[2024-07-29 10:53:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 10:53:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 533): INFO AMP_ENABLE: true
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 1.0
  CUTMIX_MINMAX: null
  MIXUP: 0.8
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 64
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /dataset/ImageNet_ILSVRC2012
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  MASK_PATCH_SIZE: 32
  MASK_RATIO: 0.6
  NUM_WORKERS: 8
  PERSISTENT_WORKERS: true
  PIN_MEMORY: true
  ZIP_MODE: false
ENABLE_AMP: false
EVAL_MODE: false
FUSED_LAYERNORM: false
MODEL:
  DDP: hfai
  DROP_PATH_RATE: 0.2
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  MLLA:
    APE: false
    DEPTHS:
    - 2
    - 4
    - 8
    - 4
    DROP_PATH_RATE: 0.1
    DROP_RATE: 0.0
    EMBED_DIM: 64
    IMAGE_SIZE: 224
    IN_CHANS: 3
    MLP_RATIO: 4.0
    NUM_HEADS:
    - 2
    - 4
    - 8
    - 16
    PATCH_SIZE: 4
    SIMPLE_DOWNSAMPLE: false
    SIMPLE_PATCH_EMBED: false
  MMCKPT: false
  NAME: vssd_mesa_retrain_tiny_e300
  NUM_CLASSES: 1000
  PRETRAINED: ''
  RESUME: ''
  RMT:
    CHUNKWISE_RECURRENTS:
    - true
    - true
    - false
    - false
    DEPTHS:
    - 2
    - 2
    - 6
    - 2
    DROP_PATH_RATE: 0.1
    EMBED_DIMS:
    - 64
    - 128
    - 256
    - 512
    HEADS_RANGES:
    - 3
    - 3
    - 3
    - 3
    INIT_VALUES:
    - 1
    - 1
    - 1
    - 1
    LAYERSCALES:
    - false
    - false
    - false
    - false
    MLP_RATIOS:
    - 3
    - 3
    - 3
    - 3
    NUM_HEADS:
    - 3
    - 6
    - 12
    - 24
    PATCH_NORM: true
  TYPE: vmamba2
  VMAMBA2:
    APE: false
    ATTN_TYPES:
    - mamba2
    - mamba2
    - mamba2
    - standard
    BIDIRECTION: false
    DEPTHS:
    - 2
    - 4
    - 8
    - 4
    DROP_PATH_RATE: 0.2
    DROP_RATE: 0.0
    D_STATE: 64
    EMBED_DIM: 64
    IMAGE_SIZE: 224
    IN_CHANS: 3
    LEPE: false
    LINEAR_ATTN_DUALITY: true
    MLP_RATIO: 4.0
    NUM_HEADS:
    - 2
    - 4
    - 8
    - 16
    PARTIAL_WIN_SIZE: -1
    PATCH_SIZE: 4
    SIMPLE_DOWNSAMPLE: false
    SIMPLE_PATCH_EMBED: false
    SSD_AEXP: false
    SSD_CHUNK_SIZE: 256
    SSD_EXPANSION: 2
    SSD_NGROUPS: 1
    SSD_POSITIVE_DA: true
  VSSM:
    ADD_SE: false
    AXIS_STAGE: []
    CONVFFN: false
    CONV_FFN_RATIO: 2
    DEPTHS:
    - 2
    - 2
    - 9
    - 2
    DOWNSAMPLE: v2
    EMBED_DIM: 96
    GMLP: false
    IN_CHANS: 3
    MLP_ACT_LAYER: gelu
    MLP_DROP_RATE: 0.0
    MLP_RATIO: 4.0
    NORM_LAYER: ln
    NUM_HEADS:
    - 1
    - 2
    - 4
    - 8
    PATCHEMBED: v2
    PATCH_NORM: true
    PATCH_SIZE: 4
    POSEMBED: false
    PRE_NORM: false
    SSM_ACT_LAYER: silu
    SSM_CONV: 3
    SSM_CONV_BIAS: true
    SSM_DROP_RATE: 0.0
    SSM_DT_RANK: auto
    SSM_D_STATE: 16
    SSM_FORWARDTYPE: v2
    SSM_INIT: v0
    SSM_RANK_RATIO: 2.0
    SSM_RATIO: 2.0
OUTPUT: ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109
PRINT_FREQ: 10
SAVE_FREQ: 1
SEED: 0
TAG: '20240725135109'
TEST:
  CROP: true
  SEQUENTIAL: false
  SHUFFLE: false
THROUGHPUT_MODE: false
TRAIN:
  ACCUMULATION_STEPS: 1
  AUTO_RESUME: true
  BASE_LR: 0.002
  CLIP_GRAD: 5.0
  EPOCHS: 300
  LAYER_DECAY: 1.0
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    GAMMA: 0.1
    MULTISTEPS: []
    NAME: cosine
    WARMUP_PREFIX: true
  MIN_LR: 2.0e-05
  MOE:
    SAVE_MASTER: false
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: adamw
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 20
  WARMUP_LR: 2.0e-06
  WEIGHT_DECAY: 0.05
TRAINCOST_MODE: false
[2024-07-29 10:53:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 534): INFO {"cfg": "./configs/mambav2_mesa/vssd_mesa_retrain_tiny_e300.yaml", "opts": null, "batch_size": 64, "data_path": "/dataset/ImageNet_ILSVRC2012", "zip": false, "cache_mode": "part", "pretrained": null, "resume": null, "accumulation_steps": null, "use_checkpoint": false, "disable_amp": false, "output": "./exclude/output_mesa", "tag": "20240725135109", "eval": false, "throughput": false, "fused_layernorm": false, "optim": null, "model_ema": true, "model_ema_decay": 0.9999, "model_ema_force_cpu": false, "memory_limit_rate": -1, "ddp": "hfai", "enable_preload": true, "enable_persistance": true, "mesa": true, "mesa_value": 1.0, "mute_repeat": false}
[2024-07-29 10:53:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 10:53:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 135): INFO VMAMBA2( (patch_embed): Stem( (conv1): ConvLayer( (conv): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (act): ReLU() ) (conv2): Sequential( (0): ConvLayer( (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (act): ReLU() ) (1): ConvLayer( (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (conv3): Sequential( (0): ConvLayer( (conv): Conv2d(32, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (norm): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (act): ReLU() ) (1): ConvLayer( (conv): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) ) (pos_drop): Dropout(p=0.0, inplace=False) (layers): ModuleList( (0): BasicLayer( dim=64, input_resolution=(56, 56), depth=2 (blocks): ModuleList( (0): VMAMBA2Block( (cpe1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64) (norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True) (attn): Mamba2( (in_proj): Linear(in_features=64, out_features=386, bias=False) (conv2d): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256) (act): SiLU() (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) (out_proj): Linear(in_features=128, out_features=64, bias=False) ) (drop_path): Identity() (cpe2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64) (norm2): LayerNorm((64,), eps=1e-05, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=64, out_features=256,
bias=True) (act): GELU(approximate='none') (fc2): Linear(in_features=256, out_features=64, bias=True) (drop): Dropout(p=0.0, inplace=False) ) ) (1): VMAMBA2Block( (cpe1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64) (norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True) (attn): Mamba2( (in_proj): Linear(in_features=64, out_features=386, bias=False) (conv2d): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256) (act): SiLU() (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) (out_proj): Linear(in_features=128, out_features=64, bias=False) ) (drop_path): timm.DropPath(0.0117647061124444) (cpe2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64) (norm2): LayerNorm((64,), eps=1e-05, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=64, out_features=256, bias=True) (act): GELU(approximate='none') (fc2): Linear(in_features=256, out_features=64, bias=True) (drop): Dropout(p=0.0, inplace=False) ) ) ) (downsample): PatchMerging( (conv): Sequential( (0): ConvLayer( (conv): Conv2d(64, 512, kernel_size=(1, 1), stride=(1, 1)) (act): ReLU() ) (1): ConvLayer( (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=512) (act): ReLU() ) (2): ConvLayer( (conv): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1)) (norm): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) ) ) (1): BasicLayer( dim=128, input_resolution=(28, 28), depth=4 (blocks): ModuleList( (0): VMAMBA2Block( (cpe1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128) (norm1): LayerNorm((128,), eps=1e-05, elementwise_affine=True) (attn): Mamba2( (in_proj): Linear(in_features=128, out_features=644, bias=False) (conv2d): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384) (act): SiLU() (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (out_proj): Linear(in_features=256, 
out_features=128, bias=False) ) (drop_path): timm.DropPath(0.0235294122248888) (cpe2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128) (norm2): LayerNorm((128,), eps=1e-05, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=128, out_features=512, bias=True) (act): GELU(approximate='none') (fc2): Linear(in_features=512, out_features=128, bias=True) (drop): Dropout(p=0.0, inplace=False) ) ) (1): VMAMBA2Block( (cpe1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128) (norm1): LayerNorm((128,), eps=1e-05, elementwise_affine=True) (attn): Mamba2( (in_proj): Linear(in_features=128, out_features=644, bias=False) (conv2d): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384) (act): SiLU() (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (out_proj): Linear(in_features=256, out_features=128, bias=False) ) (drop_path): timm.DropPath(0.03529411926865578) (cpe2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128) (norm2): LayerNorm((128,), eps=1e-05, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=128, out_features=512, bias=True) (act): GELU(approximate='none') (fc2): Linear(in_features=512, out_features=128, bias=True) (drop): Dropout(p=0.0, inplace=False) ) ) (2): VMAMBA2Block( (cpe1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128) (norm1): LayerNorm((128,), eps=1e-05, elementwise_affine=True) (attn): Mamba2( (in_proj): Linear(in_features=128, out_features=644, bias=False) (conv2d): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384) (act): SiLU() (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (out_proj): Linear(in_features=256, out_features=128, bias=False) ) (drop_path): timm.DropPath(0.0470588244497776) (cpe2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128) (norm2): LayerNorm((128,), eps=1e-05, 
elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=128, out_features=512, bias=True) (act): GELU(approximate='none') (fc2): Linear(in_features=512, out_features=128, bias=True) (drop): Dropout(p=0.0, inplace=False) ) ) (3): VMAMBA2Block( (cpe1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128) (norm1): LayerNorm((128,), eps=1e-05, elementwise_affine=True) (attn): Mamba2( (in_proj): Linear(in_features=128, out_features=644, bias=False) (conv2d): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384) (act): SiLU() (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (out_proj): Linear(in_features=256, out_features=128, bias=False) ) (drop_path): timm.DropPath(0.05882352963089943) (cpe2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128) (norm2): LayerNorm((128,), eps=1e-05, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=128, out_features=512, bias=True) (act): GELU(approximate='none') (fc2): Linear(in_features=512, out_features=128, bias=True) (drop): Dropout(p=0.0, inplace=False) ) ) ) (downsample): PatchMerging( (conv): Sequential( (0): ConvLayer( (conv): Conv2d(128, 1024, kernel_size=(1, 1), stride=(1, 1)) (act): ReLU() ) (1): ConvLayer( (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=1024) (act): ReLU() ) (2): ConvLayer( (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1)) (norm): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) ) ) (2): BasicLayer( dim=256, input_resolution=(14, 14), depth=8 (blocks): ModuleList( (0): VMAMBA2Block( (cpe1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (attn): Mamba2( (in_proj): Linear(in_features=256, out_features=1160, bias=False) (conv2d): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=640) (act): SiLU() 
(norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) (out_proj): Linear(in_features=512, out_features=256, bias=False) ) (drop_path): timm.DropPath(0.07058823853731155) (cpe2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=256, out_features=1024, bias=True) (act): GELU(approximate='none') (fc2): Linear(in_features=1024, out_features=256, bias=True) (drop): Dropout(p=0.0, inplace=False) ) ) (1): VMAMBA2Block( (cpe1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (attn): Mamba2( (in_proj): Linear(in_features=256, out_features=1160, bias=False) (conv2d): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=640) (act): SiLU() (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) (out_proj): Linear(in_features=512, out_features=256, bias=False) ) (drop_path): timm.DropPath(0.08235294371843338) (cpe2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=256, out_features=1024, bias=True) (act): GELU(approximate='none') (fc2): Linear(in_features=1024, out_features=256, bias=True) (drop): Dropout(p=0.0, inplace=False) ) ) (2): VMAMBA2Block( (cpe1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (attn): Mamba2( (in_proj): Linear(in_features=256, out_features=1160, bias=False) (conv2d): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=640) (act): SiLU() (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) (out_proj): Linear(in_features=512, out_features=256, bias=False) ) (drop_path): timm.DropPath(0.0941176488995552) (cpe2): Conv2d(256, 256, 
kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=256, out_features=1024, bias=True) (act): GELU(approximate='none') (fc2): Linear(in_features=1024, out_features=256, bias=True) (drop): Dropout(p=0.0, inplace=False) ) ) (3): VMAMBA2Block( (cpe1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (attn): Mamba2( (in_proj): Linear(in_features=256, out_features=1160, bias=False) (conv2d): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=640) (act): SiLU() (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) (out_proj): Linear(in_features=512, out_features=256, bias=False) ) (drop_path): timm.DropPath(0.10588235408067703) (cpe2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=256, out_features=1024, bias=True) (act): GELU(approximate='none') (fc2): Linear(in_features=1024, out_features=256, bias=True) (drop): Dropout(p=0.0, inplace=False) ) ) (4): VMAMBA2Block( (cpe1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (attn): Mamba2( (in_proj): Linear(in_features=256, out_features=1160, bias=False) (conv2d): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=640) (act): SiLU() (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) (out_proj): Linear(in_features=512, out_features=256, bias=False) ) (drop_path): timm.DropPath(0.11764705926179886) (cpe2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=256, out_features=1024, bias=True) (act): 
GELU(approximate='none') (fc2): Linear(in_features=1024, out_features=256, bias=True) (drop): Dropout(p=0.0, inplace=False) ) ) (5): VMAMBA2Block( (cpe1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (attn): Mamba2( (in_proj): Linear(in_features=256, out_features=1160, bias=False) (conv2d): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=640) (act): SiLU() (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) (out_proj): Linear(in_features=512, out_features=256, bias=False) ) (drop_path): timm.DropPath(0.12941177189350128) (cpe2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=256, out_features=1024, bias=True) (act): GELU(approximate='none') (fc2): Linear(in_features=1024, out_features=256, bias=True) (drop): Dropout(p=0.0, inplace=False) ) ) (6): VMAMBA2Block( (cpe1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (attn): Mamba2( (in_proj): Linear(in_features=256, out_features=1160, bias=False) (conv2d): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=640) (act): SiLU() (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) (out_proj): Linear(in_features=512, out_features=256, bias=False) ) (drop_path): timm.DropPath(0.1411764770746231) (cpe2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=256, out_features=1024, bias=True) (act): GELU(approximate='none') (fc2): Linear(in_features=1024, out_features=256, bias=True) (drop): Dropout(p=0.0, inplace=False) ) ) (7): VMAMBA2Block( (cpe1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), 
padding=(1, 1), groups=256) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (attn): Mamba2( (in_proj): Linear(in_features=256, out_features=1160, bias=False) (conv2d): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=640) (act): SiLU() (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) (out_proj): Linear(in_features=512, out_features=256, bias=False) ) (drop_path): timm.DropPath(0.15294118225574493) (cpe2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=256, out_features=1024, bias=True) (act): GELU(approximate='none') (fc2): Linear(in_features=1024, out_features=256, bias=True) (drop): Dropout(p=0.0, inplace=False) ) ) ) (downsample): PatchMerging( (conv): Sequential( (0): ConvLayer( (conv): Conv2d(256, 2048, kernel_size=(1, 1), stride=(1, 1)) (act): ReLU() ) (1): ConvLayer( (conv): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=2048) (act): ReLU() ) (2): ConvLayer( (conv): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1)) (norm): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) ) ) (3): BasicLayer( dim=512, input_resolution=(7, 7), depth=4 (blocks): ModuleList( (0): VMAMBA2Block( (cpe1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (norm1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) (attn): StandardAttention( (to_qkv): Linear(in_features=512, out_features=1536, bias=False) (to_out): Linear(in_features=512, out_features=512, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (drop_path): timm.DropPath(0.16470588743686676) (cpe2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (norm2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=512, out_features=2048, bias=True) (act): 
GELU(approximate='none') (fc2): Linear(in_features=2048, out_features=512, bias=True) (drop): Dropout(p=0.0, inplace=False) ) ) (1): VMAMBA2Block( (cpe1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (norm1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) (attn): StandardAttention( (to_qkv): Linear(in_features=512, out_features=1536, bias=False) (to_out): Linear(in_features=512, out_features=512, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (drop_path): timm.DropPath(0.1764705926179886) (cpe2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (norm2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=512, out_features=2048, bias=True) (act): GELU(approximate='none') (fc2): Linear(in_features=2048, out_features=512, bias=True) (drop): Dropout(p=0.0, inplace=False) ) ) (2): VMAMBA2Block( (cpe1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (norm1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) (attn): StandardAttention( (to_qkv): Linear(in_features=512, out_features=1536, bias=False) (to_out): Linear(in_features=512, out_features=512, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (drop_path): timm.DropPath(0.1882352977991104) (cpe2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (norm2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=512, out_features=2048, bias=True) (act): GELU(approximate='none') (fc2): Linear(in_features=2048, out_features=512, bias=True) (drop): Dropout(p=0.0, inplace=False) ) ) (3): VMAMBA2Block( (cpe1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (norm1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) (attn): StandardAttention( (to_qkv): Linear(in_features=512, out_features=1536, bias=False) (to_out): Linear(in_features=512, out_features=512, 
bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (drop_path): timm.DropPath(0.20000000298023224) (cpe2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (norm2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=512, out_features=2048, bias=True) (act): GELU(approximate='none') (fc2): Linear(in_features=2048, out_features=512, bias=True) (drop): Dropout(p=0.0, inplace=False) ) ) ) ) ) (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) (avgpool): AdaptiveAvgPool1d(output_size=1) (head): Linear(in_features=512, out_features=1000, bias=True) )
[2024-07-29 10:53:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 137): INFO number of params: 24270724
[2024-07-29 10:53:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 139): INFO number of GFLOPs: 0.001
[2024-07-29 10:53:23 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 10:53:23 vssd_mesa_retrain_tiny_e300] (optimizer.py 27): INFO No weight decay list: ['patch_embed.conv1.norm.weight', 'patch_embed.conv1.norm.bias', 'patch_embed.conv2.0.norm.weight', 'patch_embed.conv2.0.norm.bias', 'patch_embed.conv2.1.norm.weight', 'patch_embed.conv2.1.norm.bias', 'patch_embed.conv3.0.norm.weight', 'patch_embed.conv3.0.norm.bias', 'patch_embed.conv3.1.norm.weight', 'patch_embed.conv3.1.norm.bias', 'layers.0.blocks.0.cpe1.bias', 'layers.0.blocks.0.norm1.weight', 'layers.0.blocks.0.norm1.bias', 'layers.0.blocks.0.attn.dt_bias', 'layers.0.blocks.0.attn.A_log', 'layers.0.blocks.0.attn.D', 'layers.0.blocks.0.attn.conv2d.bias', 'layers.0.blocks.0.attn.norm.weight', 'layers.0.blocks.0.attn.norm.bias', 'layers.0.blocks.0.cpe2.bias', 'layers.0.blocks.0.norm2.weight', 'layers.0.blocks.0.norm2.bias', 'layers.0.blocks.0.mlp.fc1.bias', 'layers.0.blocks.0.mlp.fc2.bias', 'layers.0.blocks.1.cpe1.bias', 'layers.0.blocks.1.norm1.weight', 'layers.0.blocks.1.norm1.bias',
'layers.0.blocks.1.attn.dt_bias', 'layers.0.blocks.1.attn.A_log', 'layers.0.blocks.1.attn.D', 'layers.0.blocks.1.attn.conv2d.bias', 'layers.0.blocks.1.attn.norm.weight', 'layers.0.blocks.1.attn.norm.bias', 'layers.0.blocks.1.cpe2.bias', 'layers.0.blocks.1.norm2.weight', 'layers.0.blocks.1.norm2.bias', 'layers.0.blocks.1.mlp.fc1.bias', 'layers.0.blocks.1.mlp.fc2.bias', 'layers.0.downsample.conv.0.conv.bias', 'layers.0.downsample.conv.1.conv.bias', 'layers.0.downsample.conv.2.conv.bias', 'layers.0.downsample.conv.2.norm.weight', 'layers.0.downsample.conv.2.norm.bias', 'layers.1.blocks.0.cpe1.bias', 'layers.1.blocks.0.norm1.weight', 'layers.1.blocks.0.norm1.bias', 'layers.1.blocks.0.attn.dt_bias', 'layers.1.blocks.0.attn.A_log', 'layers.1.blocks.0.attn.D', 'layers.1.blocks.0.attn.conv2d.bias', 'layers.1.blocks.0.attn.norm.weight', 'layers.1.blocks.0.attn.norm.bias', 'layers.1.blocks.0.cpe2.bias', 'layers.1.blocks.0.norm2.weight', 'layers.1.blocks.0.norm2.bias', 'layers.1.blocks.0.mlp.fc1.bias', 'layers.1.blocks.0.mlp.fc2.bias', 'layers.1.blocks.1.cpe1.bias', 'layers.1.blocks.1.norm1.weight', 'layers.1.blocks.1.norm1.bias', 'layers.1.blocks.1.attn.dt_bias', 'layers.1.blocks.1.attn.A_log', 'layers.1.blocks.1.attn.D', 'layers.1.blocks.1.attn.conv2d.bias', 'layers.1.blocks.1.attn.norm.weight', 'layers.1.blocks.1.attn.norm.bias', 'layers.1.blocks.1.cpe2.bias', 'layers.1.blocks.1.norm2.weight', 'layers.1.blocks.1.norm2.bias', 'layers.1.blocks.1.mlp.fc1.bias', 'layers.1.blocks.1.mlp.fc2.bias', 'layers.1.blocks.2.cpe1.bias', 'layers.1.blocks.2.norm1.weight', 'layers.1.blocks.2.norm1.bias', 'layers.1.blocks.2.attn.dt_bias', 'layers.1.blocks.2.attn.A_log', 'layers.1.blocks.2.attn.D', 'layers.1.blocks.2.attn.conv2d.bias', 'layers.1.blocks.2.attn.norm.weight', 'layers.1.blocks.2.attn.norm.bias', 'layers.1.blocks.2.cpe2.bias', 'layers.1.blocks.2.norm2.weight', 'layers.1.blocks.2.norm2.bias', 'layers.1.blocks.2.mlp.fc1.bias', 'layers.1.blocks.2.mlp.fc2.bias', 
'layers.1.blocks.3.cpe1.bias', 'layers.1.blocks.3.norm1.weight', 'layers.1.blocks.3.norm1.bias', 'layers.1.blocks.3.attn.dt_bias', 'layers.1.blocks.3.attn.A_log', 'layers.1.blocks.3.attn.D', 'layers.1.blocks.3.attn.conv2d.bias', 'layers.1.blocks.3.attn.norm.weight', 'layers.1.blocks.3.attn.norm.bias', 'layers.1.blocks.3.cpe2.bias', 'layers.1.blocks.3.norm2.weight', 'layers.1.blocks.3.norm2.bias', 'layers.1.blocks.3.mlp.fc1.bias', 'layers.1.blocks.3.mlp.fc2.bias', 'layers.1.downsample.conv.0.conv.bias', 'layers.1.downsample.conv.1.conv.bias', 'layers.1.downsample.conv.2.conv.bias', 'layers.1.downsample.conv.2.norm.weight', 'layers.1.downsample.conv.2.norm.bias', 'layers.2.blocks.0.cpe1.bias', 'layers.2.blocks.0.norm1.weight', 'layers.2.blocks.0.norm1.bias', 'layers.2.blocks.0.attn.dt_bias', 'layers.2.blocks.0.attn.A_log', 'layers.2.blocks.0.attn.D', 'layers.2.blocks.0.attn.conv2d.bias', 'layers.2.blocks.0.attn.norm.weight', 'layers.2.blocks.0.attn.norm.bias', 'layers.2.blocks.0.cpe2.bias', 'layers.2.blocks.0.norm2.weight', 'layers.2.blocks.0.norm2.bias', 'layers.2.blocks.0.mlp.fc1.bias', 'layers.2.blocks.0.mlp.fc2.bias', 'layers.2.blocks.1.cpe1.bias', 'layers.2.blocks.1.norm1.weight', 'layers.2.blocks.1.norm1.bias', 'layers.2.blocks.1.attn.dt_bias', 'layers.2.blocks.1.attn.A_log', 'layers.2.blocks.1.attn.D', 'layers.2.blocks.1.attn.conv2d.bias', 'layers.2.blocks.1.attn.norm.weight', 'layers.2.blocks.1.attn.norm.bias', 'layers.2.blocks.1.cpe2.bias', 'layers.2.blocks.1.norm2.weight', 'layers.2.blocks.1.norm2.bias', 'layers.2.blocks.1.mlp.fc1.bias', 'layers.2.blocks.1.mlp.fc2.bias', 'layers.2.blocks.2.cpe1.bias', 'layers.2.blocks.2.norm1.weight', 'layers.2.blocks.2.norm1.bias', 'layers.2.blocks.2.attn.dt_bias', 'layers.2.blocks.2.attn.A_log', 'layers.2.blocks.2.attn.D', 'layers.2.blocks.2.attn.conv2d.bias', 'layers.2.blocks.2.attn.norm.weight', 'layers.2.blocks.2.attn.norm.bias', 'layers.2.blocks.2.cpe2.bias', 'layers.2.blocks.2.norm2.weight', 
'layers.2.blocks.2.norm2.bias', 'layers.2.blocks.2.mlp.fc1.bias', 'layers.2.blocks.2.mlp.fc2.bias', 'layers.2.blocks.3.cpe1.bias', 'layers.2.blocks.3.norm1.weight', 'layers.2.blocks.3.norm1.bias', 'layers.2.blocks.3.attn.dt_bias', 'layers.2.blocks.3.attn.A_log', 'layers.2.blocks.3.attn.D', 'layers.2.blocks.3.attn.conv2d.bias', 'layers.2.blocks.3.attn.norm.weight', 'layers.2.blocks.3.attn.norm.bias', 'layers.2.blocks.3.cpe2.bias', 'layers.2.blocks.3.norm2.weight', 'layers.2.blocks.3.norm2.bias', 'layers.2.blocks.3.mlp.fc1.bias', 'layers.2.blocks.3.mlp.fc2.bias', 'layers.2.blocks.4.cpe1.bias', 'layers.2.blocks.4.norm1.weight', 'layers.2.blocks.4.norm1.bias', 'layers.2.blocks.4.attn.dt_bias', 'layers.2.blocks.4.attn.A_log', 'layers.2.blocks.4.attn.D', 'layers.2.blocks.4.attn.conv2d.bias', 'layers.2.blocks.4.attn.norm.weight', 'layers.2.blocks.4.attn.norm.bias', 'layers.2.blocks.4.cpe2.bias', 'layers.2.blocks.4.norm2.weight', 'layers.2.blocks.4.norm2.bias', 'layers.2.blocks.4.mlp.fc1.bias', 'layers.2.blocks.4.mlp.fc2.bias', 'layers.2.blocks.5.cpe1.bias', 'layers.2.blocks.5.norm1.weight', 'layers.2.blocks.5.norm1.bias', 'layers.2.blocks.5.attn.dt_bias', 'layers.2.blocks.5.attn.A_log', 'layers.2.blocks.5.attn.D', 'layers.2.blocks.5.attn.conv2d.bias', 'layers.2.blocks.5.attn.norm.weight', 'layers.2.blocks.5.attn.norm.bias', 'layers.2.blocks.5.cpe2.bias', 'layers.2.blocks.5.norm2.weight', 'layers.2.blocks.5.norm2.bias', 'layers.2.blocks.5.mlp.fc1.bias', 'layers.2.blocks.5.mlp.fc2.bias', 'layers.2.blocks.6.cpe1.bias', 'layers.2.blocks.6.norm1.weight', 'layers.2.blocks.6.norm1.bias', 'layers.2.blocks.6.attn.dt_bias', 'layers.2.blocks.6.attn.A_log', 'layers.2.blocks.6.attn.D', 'layers.2.blocks.6.attn.conv2d.bias', 'layers.2.blocks.6.attn.norm.weight', 'layers.2.blocks.6.attn.norm.bias', 'layers.2.blocks.6.cpe2.bias', 'layers.2.blocks.6.norm2.weight', 'layers.2.blocks.6.norm2.bias', 'layers.2.blocks.6.mlp.fc1.bias', 'layers.2.blocks.6.mlp.fc2.bias', 
'layers.2.blocks.7.cpe1.bias', 'layers.2.blocks.7.norm1.weight', 'layers.2.blocks.7.norm1.bias', 'layers.2.blocks.7.attn.dt_bias', 'layers.2.blocks.7.attn.A_log', 'layers.2.blocks.7.attn.D', 'layers.2.blocks.7.attn.conv2d.bias', 'layers.2.blocks.7.attn.norm.weight', 'layers.2.blocks.7.attn.norm.bias', 'layers.2.blocks.7.cpe2.bias', 'layers.2.blocks.7.norm2.weight', 'layers.2.blocks.7.norm2.bias', 'layers.2.blocks.7.mlp.fc1.bias', 'layers.2.blocks.7.mlp.fc2.bias', 'layers.2.downsample.conv.0.conv.bias', 'layers.2.downsample.conv.1.conv.bias', 'layers.2.downsample.conv.2.conv.bias', 'layers.2.downsample.conv.2.norm.weight', 'layers.2.downsample.conv.2.norm.bias', 'layers.3.blocks.0.cpe1.bias', 'layers.3.blocks.0.norm1.weight', 'layers.3.blocks.0.norm1.bias', 'layers.3.blocks.0.attn.to_out.bias', 'layers.3.blocks.0.cpe2.bias', 'layers.3.blocks.0.norm2.weight', 'layers.3.blocks.0.norm2.bias', 'layers.3.blocks.0.mlp.fc1.bias', 'layers.3.blocks.0.mlp.fc2.bias', 'layers.3.blocks.1.cpe1.bias', 'layers.3.blocks.1.norm1.weight', 'layers.3.blocks.1.norm1.bias', 'layers.3.blocks.1.attn.to_out.bias', 'layers.3.blocks.1.cpe2.bias', 'layers.3.blocks.1.norm2.weight', 'layers.3.blocks.1.norm2.bias', 'layers.3.blocks.1.mlp.fc1.bias', 'layers.3.blocks.1.mlp.fc2.bias', 'layers.3.blocks.2.cpe1.bias', 'layers.3.blocks.2.norm1.weight', 'layers.3.blocks.2.norm1.bias', 'layers.3.blocks.2.attn.to_out.bias', 'layers.3.blocks.2.cpe2.bias', 'layers.3.blocks.2.norm2.weight', 'layers.3.blocks.2.norm2.bias', 'layers.3.blocks.2.mlp.fc1.bias', 'layers.3.blocks.2.mlp.fc2.bias', 'layers.3.blocks.3.cpe1.bias', 'layers.3.blocks.3.norm1.weight', 'layers.3.blocks.3.norm1.bias', 'layers.3.blocks.3.attn.to_out.bias', 'layers.3.blocks.3.cpe2.bias', 'layers.3.blocks.3.norm2.weight', 'layers.3.blocks.3.norm2.bias', 'layers.3.blocks.3.mlp.fc1.bias', 'layers.3.blocks.3.mlp.fc2.bias', 'norm.weight', 'norm.bias', 'head.bias']
[2024-07-29 10:53:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 195): INFO no checkpoint found in ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109, ignoring auto resume
[2024-07-29 10:53:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 10:53:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][0/625] eta 1:11:21 lr 0.000002 wd 0.0500 time 6.8502 (6.8502) data time 0.7117 (0.7117) model time 0.0000 (0.0000) loss 6.9291 (6.9291) grad_norm 0.4019 (0.4019) loss_scale 65536.0000 (65536.0000) mem 10786MB
[2024-07-29 10:53:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][10/625] eta 0:08:46 lr 0.000004 wd 0.0500 time 0.1967 (0.8567) data time 0.0011 (0.0657) model time 0.0000 (0.0000) loss 6.9182 (6.9248) grad_norm 0.3966 (0.4049) loss_scale 65536.0000 (65536.0000) mem 8983MB
[2024-07-29 10:53:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][20/625] eta 0:05:29 lr 0.000005 wd 0.0500 time 0.1990 (0.5442) data time 0.0008 (0.0349) model time 0.0000 (0.0000) loss 6.8822 (6.9205) grad_norm 0.3916 (0.3990) loss_scale 65536.0000 (65536.0000) mem 8983MB
[2024-07-29 10:53:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][30/625] eta 0:04:17 lr 0.000007 wd 0.0500 time 0.2122 (0.4332) data time 0.0007 (0.0239) model time 0.0000 (0.0000) loss 6.8961 (6.9170) grad_norm 0.3958 (0.3970) loss_scale 65536.0000 (65536.0000) mem 8983MB
[2024-07-29 10:53:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][40/625] eta 0:03:40 lr 0.000008 wd 0.0500 time 0.1967 (0.3762) data time 0.0009 (0.0183) model time 0.0000 (0.0000) loss 6.9139 (6.9176) grad_norm 0.3483 (0.3913) loss_scale 65536.0000 (65536.0000) mem 8983MB
[2024-07-29 10:53:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][50/625] eta 0:03:16 lr 0.000010 wd 0.0500 time 0.2005 (0.3418) data time 0.0008 (0.0149) model time 0.0000 (0.0000) loss 6.9139 (6.9168) grad_norm 0.3664 (0.3875) loss_scale 65536.0000 (65536.0000) mem 8983MB
[2024-07-29 10:53:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][60/625] eta 0:03:00 lr 0.000012 wd 0.0500 time 0.2052 (0.3186) data time 0.0008 (0.0126) model time 0.2044 (0.1997) loss 6.9115 (6.9165) grad_norm 0.3769 (0.3841) loss_scale 65536.0000 (65536.0000) mem 8983MB
[2024-07-29 10:53:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][70/625] eta 0:02:47 lr 0.000013 wd 0.0500 time 0.2006 (0.3020) data time 0.0009 (0.0109) model time 0.1997 (0.1996) loss 6.9086 (6.9161) grad_norm 0.3810 (0.3816) loss_scale 65536.0000 (65536.0000) mem 8983MB
[2024-07-29 10:53:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][80/625] eta 0:02:37 lr 0.000015 wd 0.0500 time 0.2002 (0.2895) data time 0.0009 (0.0097) model time 0.1993 (0.1997) loss 6.8966 (6.9156) grad_norm 0.3637 (0.3784) loss_scale 65536.0000 (65536.0000) mem 8983MB
[2024-07-29 10:53:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][90/625] eta 0:02:29 lr 0.000016 wd 0.0500 time 0.1993 (0.2801) data time 0.0008 (0.0087) model time 0.1985 (0.2007) loss 6.9229 (6.9156) grad_norm 0.3566 (0.3752) loss_scale 65536.0000 (65536.0000) mem 8983MB
[2024-07-29 10:53:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][100/625] eta 0:02:22 lr 0.000018 wd 0.0500 time 0.2003 (0.2723) data time 0.0006 (0.0080) model time 0.1997 (0.2005) loss 6.9102 (6.9148) grad_norm 0.3428 (0.3719) loss_scale 65536.0000 (65536.0000) mem 8983MB
[2024-07-29 10:54:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][110/625] eta 0:02:16 lr 0.000020 wd 0.0500 time 0.2006 (0.2658) data time 0.0009 (0.0073) model time 0.1998 (0.2004) loss 6.8931 (6.9141) grad_norm 0.3493 (0.3694) loss_scale 65536.0000 (65536.0000) mem 8983MB
[2024-07-29 10:54:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][120/625] eta 0:02:11 lr 0.000021 wd 0.0500 time 0.2008 (0.2605) data time 0.0007 (0.0068) model time 0.2002 (0.2004) loss 6.9146 (6.9132) grad_norm 0.3509 (0.3665) loss_scale 65536.0000 (65536.0000) mem 8983MB
[2024-07-29 10:54:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][130/625] eta 0:02:06 lr 0.000023 wd 0.0500 time 0.2074 (0.2560) data time 0.0008 (0.0063) model time 0.2066 (0.2004) loss 6.9039 (6.9129) grad_norm 0.3407 (0.3639) loss_scale 65536.0000 (65536.0000) mem 8983MB
[2024-07-29 10:54:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][140/625] eta 0:02:02 lr 0.000024 wd 0.0500 time 0.1982 (0.2520) data time 0.0008 (0.0060) model time 0.1974 (0.2003) loss 6.8926 (6.9119) grad_norm 0.3181 (0.3612) loss_scale 65536.0000 (65536.0000) mem 8983MB
[2024-07-29 10:54:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][150/625] eta 0:01:58 lr 0.000026 wd 0.0500 time 0.2009 (0.2487) data time 0.0010 (0.0056) model time 0.1999 (0.2003) loss 6.9029 (6.9114) grad_norm 0.3233 (0.3590) loss_scale 65536.0000 (65536.0000) mem 8983MB
[2024-07-29 10:54:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][160/625] eta 0:01:54 lr 0.000028 wd 0.0500 time 0.2003 (0.2458) data time 0.0009 (0.0053) model time 0.1994 (0.2004) loss 6.9093 (6.9109) grad_norm 0.3471 (0.3576) loss_scale 65536.0000 (65536.0000) mem 8983MB
[2024-07-29 10:54:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][170/625] eta 0:01:50 lr 0.000029 wd 0.0500 time 0.1995 (0.2432) data time 0.0010 (0.0051) model time 0.1985 (0.2005) loss 6.9046 (6.9103) grad_norm 0.3448 (0.3561) loss_scale 65536.0000 (65536.0000) mem 8983MB
[2024-07-29 10:54:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][180/625] eta 0:01:47 lr 0.000031 wd 0.0500 time 0.1977 (0.2409) data time 0.0009 (0.0048) model time 0.1968 (0.2004) loss 6.9135 (6.9098) grad_norm 0.3476 (0.3558)
loss_scale 65536.0000 (65536.0000) mem 8983MB [2024-07-29 10:54:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][190/625] eta 0:01:43 lr 0.000032 wd 0.0500 time 0.2006 (0.2388) data time 0.0010 (0.0046) model time 0.1996 (0.2004) loss 6.9049 (6.9090) grad_norm 0.3579 (0.3554) loss_scale 65536.0000 (65536.0000) mem 8983MB [2024-07-29 10:54:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][200/625] eta 0:01:40 lr 0.000034 wd 0.0500 time 0.2009 (0.2369) data time 0.0009 (0.0045) model time 0.2001 (0.2004) loss 6.8848 (6.9083) grad_norm 0.3427 (0.3553) loss_scale 65536.0000 (65536.0000) mem 8983MB [2024-07-29 10:54:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][210/625] eta 0:01:37 lr 0.000036 wd 0.0500 time 0.2036 (0.2353) data time 0.0009 (0.0043) model time 0.2027 (0.2004) loss 6.9103 (6.9074) grad_norm 0.3438 (0.3552) loss_scale 65536.0000 (65536.0000) mem 8983MB [2024-07-29 10:54:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][220/625] eta 0:01:34 lr 0.000037 wd 0.0500 time 0.2011 (0.2337) data time 0.0008 (0.0041) model time 0.2003 (0.2004) loss 6.9025 (6.9069) grad_norm 0.3495 (0.3554) loss_scale 65536.0000 (65536.0000) mem 8983MB [2024-07-29 10:54:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][230/625] eta 0:01:31 lr 0.000039 wd 0.0500 time 0.2001 (0.2323) data time 0.0008 (0.0040) model time 0.1993 (0.2004) loss 6.8938 (6.9057) grad_norm 0.4421 (0.3574) loss_scale 65536.0000 (65536.0000) mem 8983MB [2024-07-29 10:54:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][240/625] eta 0:01:28 lr 0.000040 wd 0.0500 time 0.2041 (0.2310) data time 0.0007 (0.0039) model time 0.2033 (0.2004) loss 6.8568 (6.9038) grad_norm 0.4766 (0.3612) loss_scale 65536.0000 (65536.0000) mem 8983MB [2024-07-29 10:54:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][250/625] eta 
0:01:26 lr 0.000042 wd 0.0500 time 0.2028 (0.2299) data time 0.0008 (0.0037) model time 0.2020 (0.2004) loss 6.8840 (6.9021) grad_norm 0.4351 (0.3646) loss_scale 65536.0000 (65536.0000) mem 8983MB [2024-07-29 10:54:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][260/625] eta 0:01:23 lr 0.000044 wd 0.0500 time 0.2039 (0.2288) data time 0.0007 (0.0036) model time 0.2032 (0.2004) loss 6.8314 (6.9005) grad_norm 0.6384 (0.3718) loss_scale 65536.0000 (65536.0000) mem 8983MB [2024-07-29 10:54:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][270/625] eta 0:01:20 lr 0.000045 wd 0.0500 time 0.2003 (0.2278) data time 0.0008 (0.0035) model time 0.1994 (0.2005) loss 6.8550 (6.8989) grad_norm 0.6968 (0.3823) loss_scale 65536.0000 (65536.0000) mem 8983MB [2024-07-29 10:54:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][280/625] eta 0:01:18 lr 0.000047 wd 0.0500 time 0.2008 (0.2268) data time 0.0009 (0.0034) model time 0.1998 (0.2004) loss 6.8561 (6.8974) grad_norm 0.7609 (0.3942) loss_scale 65536.0000 (65536.0000) mem 8983MB [2024-07-29 10:54:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][290/625] eta 0:01:15 lr 0.000048 wd 0.0500 time 0.2013 (0.2259) data time 0.0008 (0.0034) model time 0.2005 (0.2004) loss 6.7295 (6.8942) grad_norm 0.8362 (0.4074) loss_scale 65536.0000 (65536.0000) mem 8983MB [2024-07-29 10:54:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][300/625] eta 0:01:13 lr 0.000050 wd 0.0500 time 0.2033 (0.2251) data time 0.0009 (0.0033) model time 0.2024 (0.2004) loss 6.7916 (6.8913) grad_norm 0.8631 (0.4208) loss_scale 65536.0000 (65536.0000) mem 8983MB [2024-07-29 10:54:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][310/625] eta 0:01:10 lr 0.000052 wd 0.0500 time 0.2021 (0.2244) data time 0.0012 (0.0032) model time 0.2009 (0.2005) loss 6.8072 (6.8879) grad_norm 0.9481 (0.4330) 
loss_scale 65536.0000 (65536.0000) mem 8983MB [2024-07-29 10:54:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][320/625] eta 0:01:08 lr 0.000053 wd 0.0500 time 0.2013 (0.2237) data time 0.0007 (0.0031) model time 0.2007 (0.2005) loss 6.7304 (6.8863) grad_norm 0.6696 (0.4439) loss_scale 65536.0000 (65536.0000) mem 8983MB [2024-07-29 10:54:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][330/625] eta 0:01:05 lr 0.000055 wd 0.0500 time 0.2012 (0.2231) data time 0.0007 (0.0031) model time 0.2005 (0.2005) loss 6.7115 (6.8822) grad_norm 0.8014 (0.4609) loss_scale 65536.0000 (65536.0000) mem 8983MB [2024-07-29 10:54:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][340/625] eta 0:01:03 lr 0.000056 wd 0.0500 time 0.2025 (0.2225) data time 0.0009 (0.0030) model time 0.2016 (0.2006) loss 6.9010 (6.8801) grad_norm 1.8663 (0.4768) loss_scale 65536.0000 (65536.0000) mem 8983MB [2024-07-29 10:54:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][350/625] eta 0:01:01 lr 0.000058 wd 0.0500 time 0.2011 (0.2219) data time 0.0008 (0.0029) model time 0.2003 (0.2006) loss 6.7650 (6.8769) grad_norm 1.5238 (0.5003) loss_scale 65536.0000 (65536.0000) mem 8983MB [2024-07-29 10:54:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][360/625] eta 0:00:58 lr 0.000060 wd 0.0500 time 0.1974 (0.2213) data time 0.0007 (0.0029) model time 0.1967 (0.2006) loss 6.7285 (6.8733) grad_norm 1.3162 (0.5222) loss_scale 65536.0000 (65536.0000) mem 8983MB [2024-07-29 10:54:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][370/625] eta 0:00:56 lr 0.000061 wd 0.0500 time 0.1999 (0.2208) data time 0.0009 (0.0028) model time 0.1991 (0.2006) loss 6.7493 (6.8695) grad_norm 1.5036 (inf) loss_scale 32768.0000 (65359.3531) mem 8983MB [2024-07-29 10:54:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][380/625] eta 
0:00:53 lr 0.000063 wd 0.0500 time 0.2022 (0.2203) data time 0.0008 (0.0028) model time 0.2015 (0.2006) loss 6.6037 (6.8660) grad_norm 1.8890 (inf) loss_scale 32768.0000 (64503.9370) mem 8983MB [2024-07-29 10:54:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][390/625] eta 0:00:51 lr 0.000064 wd 0.0500 time 0.2013 (0.2198) data time 0.0007 (0.0027) model time 0.2007 (0.2006) loss 6.7447 (6.8616) grad_norm 1.2611 (inf) loss_scale 32768.0000 (63692.2762) mem 8983MB [2024-07-29 10:54:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][400/625] eta 0:00:49 lr 0.000066 wd 0.0500 time 0.2041 (0.2194) data time 0.0009 (0.0027) model time 0.2032 (0.2006) loss 6.7708 (6.8595) grad_norm 1.6367 (inf) loss_scale 32768.0000 (62921.0973) mem 8983MB [2024-07-29 10:55:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][410/625] eta 0:00:47 lr 0.000068 wd 0.0500 time 0.2031 (0.2190) data time 0.0009 (0.0026) model time 0.2023 (0.2007) loss 6.8118 (6.8566) grad_norm 1.4389 (inf) loss_scale 32768.0000 (62187.4453) mem 8983MB [2024-07-29 10:55:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][420/625] eta 0:00:44 lr 0.000069 wd 0.0500 time 0.2069 (0.2187) data time 0.0008 (0.0026) model time 0.2061 (0.2009) loss 6.7417 (6.8529) grad_norm 1.6089 (inf) loss_scale 32768.0000 (61488.6461) mem 8983MB [2024-07-29 10:55:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][430/625] eta 0:00:42 lr 0.000071 wd 0.0500 time 0.2027 (0.2184) data time 0.0006 (0.0026) model time 0.2020 (0.2009) loss 6.6480 (6.8506) grad_norm 1.1044 (inf) loss_scale 32768.0000 (60822.2738) mem 8983MB [2024-07-29 10:55:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][440/625] eta 0:00:40 lr 0.000072 wd 0.0500 time 0.1989 (0.2181) data time 0.0008 (0.0025) model time 0.1981 (0.2010) loss 6.6515 (6.8477) grad_norm 1.4485 (inf) loss_scale 32768.0000 
(60186.1224) mem 8983MB [2024-07-29 10:55:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][450/625] eta 0:00:38 lr 0.000074 wd 0.0500 time 0.2016 (0.2178) data time 0.0006 (0.0025) model time 0.2010 (0.2011) loss 6.5916 (6.8441) grad_norm 1.3294 (inf) loss_scale 32768.0000 (59578.1818) mem 8983MB [2024-07-29 10:55:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][460/625] eta 0:00:35 lr 0.000076 wd 0.0500 time 0.2056 (0.2174) data time 0.0009 (0.0025) model time 0.2047 (0.2011) loss 6.6101 (6.8397) grad_norm 1.4637 (inf) loss_scale 32768.0000 (58996.6161) mem 8983MB [2024-07-29 10:55:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][470/625] eta 0:00:33 lr 0.000077 wd 0.0500 time 0.2050 (0.2171) data time 0.0009 (0.0024) model time 0.2041 (0.2011) loss 6.7301 (6.8354) grad_norm 1.5262 (inf) loss_scale 32768.0000 (58439.7452) mem 8983MB [2024-07-29 10:55:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][480/625] eta 0:00:31 lr 0.000079 wd 0.0500 time 0.2039 (0.2169) data time 0.0006 (0.0024) model time 0.2033 (0.2011) loss 6.7532 (6.8321) grad_norm 1.6353 (inf) loss_scale 32768.0000 (57906.0291) mem 8983MB [2024-07-29 10:55:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][490/625] eta 0:00:29 lr 0.000080 wd 0.0500 time 0.1987 (0.2170) data time 0.0009 (0.0024) model time 0.1978 (0.2016) loss 6.7493 (6.8304) grad_norm 1.3311 (inf) loss_scale 32768.0000 (57394.0530) mem 8983MB [2024-07-29 10:55:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][500/625] eta 0:00:27 lr 0.000082 wd 0.0500 time 0.2035 (0.2167) data time 0.0008 (0.0023) model time 0.2027 (0.2016) loss 6.8256 (6.8267) grad_norm 2.4625 (inf) loss_scale 32768.0000 (56902.5150) mem 8983MB [2024-07-29 10:55:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][510/625] eta 0:00:24 lr 0.000084 wd 0.0500 time 0.2021 
(0.2164) data time 0.0008 (0.0023) model time 0.2013 (0.2016) loss 6.7800 (6.8247) grad_norm 1.6820 (inf) loss_scale 32768.0000 (56430.2153) mem 8983MB [2024-07-29 10:55:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][520/625] eta 0:00:22 lr 0.000085 wd 0.0500 time 0.2059 (0.2162) data time 0.0006 (0.0023) model time 0.2052 (0.2016) loss 6.6377 (6.8217) grad_norm 1.2934 (inf) loss_scale 32768.0000 (55976.0461) mem 8983MB [2024-07-29 10:55:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][530/625] eta 0:00:20 lr 0.000087 wd 0.0500 time 0.2133 (0.2160) data time 0.0008 (0.0023) model time 0.2126 (0.2017) loss 6.7077 (6.8178) grad_norm 1.6276 (inf) loss_scale 32768.0000 (55538.9831) mem 8983MB [2024-07-29 10:55:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][540/625] eta 0:00:18 lr 0.000088 wd 0.0500 time 0.2005 (0.2157) data time 0.0010 (0.0022) model time 0.1996 (0.2017) loss 6.6588 (6.8147) grad_norm 1.1225 (inf) loss_scale 32768.0000 (55118.0776) mem 8983MB [2024-07-29 10:55:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][550/625] eta 0:00:16 lr 0.000090 wd 0.0500 time 0.2000 (0.2155) data time 0.0008 (0.0022) model time 0.1992 (0.2017) loss 6.6438 (6.8110) grad_norm 1.3903 (inf) loss_scale 32768.0000 (54712.4501) mem 8983MB [2024-07-29 10:55:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][560/625] eta 0:00:13 lr 0.000092 wd 0.0500 time 0.1998 (0.2152) data time 0.0009 (0.0022) model time 0.1989 (0.2017) loss 6.6434 (6.8089) grad_norm 2.8690 (inf) loss_scale 32768.0000 (54321.2834) mem 8983MB [2024-07-29 10:55:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][570/625] eta 0:00:11 lr 0.000093 wd 0.0500 time 0.2020 (0.2151) data time 0.0008 (0.0022) model time 0.2012 (0.2017) loss 6.7450 (6.8068) grad_norm 1.8537 (inf) loss_scale 32768.0000 (53943.8179) mem 8983MB [2024-07-29 10:55:37 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][580/625] eta 0:00:09 lr 0.000095 wd 0.0500 time 0.2000 (0.2157) data time 0.0007 (0.0021) model time 0.1993 (0.2027) loss 6.7323 (6.8038) grad_norm 1.8756 (inf) loss_scale 32768.0000 (53579.3460) mem 8983MB [2024-07-29 10:55:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][590/625] eta 0:00:07 lr 0.000096 wd 0.0500 time 0.2048 (0.2155) data time 0.0008 (0.0021) model time 0.2040 (0.2027) loss 6.7875 (6.8012) grad_norm 1.4940 (inf) loss_scale 32768.0000 (53227.2081) mem 8983MB [2024-07-29 10:55:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 10:55:40 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 10:55:40 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 11:06:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 11:06:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 11:06:45 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 11:06:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 11:06:58 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... 
[2024-07-29 11:06:58 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 11:06:58 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 11:06:58 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 0) [2024-07-29 11:06:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 11:07:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][600/625] eta 0:00:32 lr 0.000098 wd 0.0500 time 0.2029 (1.2929) data time 0.0011 (0.1698) model time 0.2019 (1.1231) loss 6.7115 (6.6926) grad_norm 1.3944 (1.6974) loss_scale 32768.0000 (32768.0000) mem 8977MB [2024-07-29 11:07:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][610/625] eta 0:00:09 lr 0.000100 wd 0.0500 time 0.1992 (0.6529) data time 0.0005 (0.0706) model time 0.1986 (0.5822) loss 6.5305 (6.6569) grad_norm 1.6367 (1.6179) loss_scale 32768.0000 (32768.0000) mem 8977MB [2024-07-29 11:07:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][620/625] eta 0:00:02 lr 0.000101 wd 0.0500 time 0.1985 (0.4858) data time 0.0004 (0.0449) model time 0.1982 (0.4409) loss 6.6470 (6.6513) grad_norm 1.4153 (1.5071) loss_scale 32768.0000 (32768.0000) mem 8977MB [2024-07-29 11:07:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 0 training takes 0:00:13 [2024-07-29 11:07:16 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 11:07:19 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 11:07:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.405 (0.405) Loss 6.0703 (6.0703) Acc@1 1.318 (1.318) Acc@5 8.447 (8.447) Mem 8977MB [2024-07-29 11:07:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.089) Loss 6.3516 (6.1683) Acc@1 0.049 (1.354) Acc@5 2.197 (6.583) Mem 8977MB [2024-07-29 11:07:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.072) Loss 6.2578 (6.1810) Acc@1 3.076 (1.728) Acc@5 6.104 (6.906) Mem 8977MB [2024-07-29 11:07:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 2.179 Acc@5 7.909 [2024-07-29 11:07:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 2.2% [2024-07-29 11:07:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 2.18% [2024-07-29 11:07:22 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 11:07:22 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-29 11:07:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.636 (0.636) Loss 7.0312 (7.0312) Acc@1 0.000 (0.000) Acc@5 0.000 (0.000) Mem 8977MB [2024-07-29 11:07:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.053 (0.109) Loss 6.9883 (6.9563) Acc@1 0.000 (0.000) Acc@5 0.000 (0.222) Mem 8977MB [2024-07-29 11:07:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.083) Loss 6.9102 (6.9667) Acc@1 0.000 (0.000) Acc@5 2.441 (0.470) Mem 8977MB [2024-07-29 11:07:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.100 Acc@5 0.504 [2024-07-29 11:07:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.1% [2024-07-29 11:07:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 0.10% [2024-07-29 11:07:24 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 11:07:25 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-29 11:07:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][0/625] eta 0:06:36 lr 0.000102 wd 0.0500 time 0.6341 (0.6341) data time 0.3545 (0.3545) model time 0.0000 (0.0000) loss 6.5495 (6.5495) grad_norm 1.8678 (1.8678) loss_scale 32768.0000 (32768.0000) mem 8971MB [2024-07-29 11:07:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][10/625] eta 0:02:27 lr 0.000103 wd 0.0500 time 0.2026 (0.2404) data time 0.0008 (0.0330) model time 0.0000 (0.0000) loss 6.6528 (6.5679) grad_norm 2.0852 (1.7078) loss_scale 32768.0000 (32768.0000) mem 8975MB [2024-07-29 11:07:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][20/625] eta 0:02:14 lr 0.000105 wd 0.0500 time 0.2001 (0.2225) data time 0.0007 (0.0177) model time 0.0000 (0.0000) loss 6.4884 (6.5821) grad_norm 2.4211 (1.6396) loss_scale 32768.0000 (32768.0000) mem 8975MB [2024-07-29 11:07:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][30/625] eta 0:02:08 lr 0.000107 wd 0.0500 time 0.2082 (0.2158) data time 0.0007 (0.0123) model time 0.0000 (0.0000) loss 6.6874 (6.5859) grad_norm 1.8228 (1.6811) loss_scale 32768.0000 (32768.0000) mem 8975MB [2024-07-29 11:07:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][40/625] eta 0:02:04 lr 0.000108 wd 0.0500 time 0.1960 (0.2124) data time 0.0009 (0.0095) model time 0.0000 (0.0000) loss 6.7099 (6.5693) grad_norm 1.4309 (1.6843) loss_scale 32768.0000 (32768.0000) mem 8975MB [2024-07-29 11:07:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][50/625] eta 0:02:00 lr 0.000110 wd 0.0500 time 0.2009 (0.2103) data time 0.0009 (0.0078) model time 0.0000 (0.0000) loss 6.5715 (6.5736) grad_norm 1.2260 (1.6397) loss_scale 32768.0000 (32768.0000) mem 8975MB [2024-07-29 11:07:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][60/625] eta 0:01:58 lr 0.000111 wd 0.0500 time 0.2003 (0.2089) data 
time 0.0007 (0.0067) model time 0.1996 (0.2012) loss 6.2891 (6.5684) grad_norm 1.2091 (1.6015) loss_scale 32768.0000 (32768.0000) mem 8975MB [2024-07-29 11:07:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][70/625] eta 0:01:55 lr 0.000113 wd 0.0500 time 0.2056 (0.2082) data time 0.0007 (0.0059) model time 0.2049 (0.2021) loss 6.8095 (6.5800) grad_norm 1.5726 (1.5820) loss_scale 32768.0000 (32768.0000) mem 8975MB [2024-07-29 11:07:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][80/625] eta 0:01:53 lr 0.000115 wd 0.0500 time 0.1980 (0.2075) data time 0.0008 (0.0052) model time 0.1971 (0.2019) loss 6.6814 (6.5820) grad_norm 2.0036 (1.6118) loss_scale 32768.0000 (32768.0000) mem 8975MB [2024-07-29 11:07:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][90/625] eta 0:01:50 lr 0.000116 wd 0.0500 time 0.1974 (0.2070) data time 0.0007 (0.0048) model time 0.1967 (0.2020) loss 6.6201 (6.5827) grad_norm 1.2655 (1.5988) loss_scale 32768.0000 (32768.0000) mem 8975MB [2024-07-29 11:07:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][100/625] eta 0:01:48 lr 0.000118 wd 0.0500 time 0.1975 (0.2065) data time 0.0009 (0.0044) model time 0.1965 (0.2018) loss 6.5328 (6.5754) grad_norm 2.6055 (1.6258) loss_scale 32768.0000 (32768.0000) mem 8975MB [2024-07-29 11:07:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][110/625] eta 0:01:46 lr 0.000119 wd 0.0500 time 0.2035 (0.2062) data time 0.0009 (0.0041) model time 0.2026 (0.2018) loss 6.5472 (6.5680) grad_norm 1.8129 (1.6422) loss_scale 32768.0000 (32768.0000) mem 8975MB [2024-07-29 11:07:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][120/625] eta 0:01:43 lr 0.000121 wd 0.0500 time 0.1971 (0.2059) data time 0.0008 (0.0038) model time 0.1962 (0.2018) loss 6.6654 (6.5657) grad_norm 1.4296 (1.6477) loss_scale 32768.0000 (32768.0000) mem 8975MB [2024-07-29 
11:07:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][130/625] eta 0:01:41 lr 0.000123 wd 0.0500 time 0.2027 (0.2055) data time 0.0008 (0.0036) model time 0.2020 (0.2016) loss 6.6274 (6.5677) grad_norm 1.5760 (1.6654) loss_scale 32768.0000 (32768.0000) mem 8975MB [2024-07-29 11:07:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][140/625] eta 0:01:39 lr 0.000124 wd 0.0500 time 0.2036 (0.2054) data time 0.0010 (0.0034) model time 0.2026 (0.2017) loss 6.6838 (6.5678) grad_norm 1.3663 (1.6480) loss_scale 32768.0000 (32768.0000) mem 8975MB [2024-07-29 11:07:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][150/625] eta 0:01:37 lr 0.000126 wd 0.0500 time 0.2017 (0.2052) data time 0.0009 (0.0032) model time 0.2008 (0.2017) loss 6.5199 (6.5599) grad_norm 1.3122 (1.6482) loss_scale 32768.0000 (32768.0000) mem 8975MB [2024-07-29 11:07:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][160/625] eta 0:01:35 lr 0.000127 wd 0.0500 time 0.1991 (0.2050) data time 0.0009 (0.0031) model time 0.1982 (0.2017) loss 6.6181 (6.5572) grad_norm 1.3728 (1.6589) loss_scale 32768.0000 (32768.0000) mem 8975MB [2024-07-29 11:08:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][170/625] eta 0:01:33 lr 0.000129 wd 0.0500 time 0.2034 (0.2049) data time 0.0007 (0.0029) model time 0.2027 (0.2017) loss 6.5891 (6.5507) grad_norm 1.1336 (1.6479) loss_scale 32768.0000 (32768.0000) mem 8975MB [2024-07-29 11:08:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][180/625] eta 0:01:31 lr 0.000131 wd 0.0500 time 0.2009 (0.2047) data time 0.0008 (0.0028) model time 0.2001 (0.2016) loss 6.4336 (6.5457) grad_norm 2.0486 (1.6551) loss_scale 32768.0000 (32768.0000) mem 8975MB [2024-07-29 11:08:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][190/625] eta 0:01:28 lr 0.000132 wd 0.0500 time 0.1971 (0.2044) data 
time 0.0007 (0.0027) model time 0.1964 (0.2014) loss 6.8005 (6.5436) grad_norm 1.5205 (1.6575) loss_scale 32768.0000 (32768.0000) mem 8975MB
[2024-07-29 11:08:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][200/625] eta 0:01:26 lr 0.000134 wd 0.0500 time 0.1971 (0.2044) data time 0.0007 (0.0026) model time 0.1964 (0.2015) loss 6.4381 (6.5396) grad_norm 1.3586 (inf) loss_scale 16384.0000 (32034.3881) mem 8975MB
[2024-07-29 11:08:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][210/625] eta 0:01:24 lr 0.000135 wd 0.0500 time 0.2020 (0.2042) data time 0.0007 (0.0025) model time 0.2013 (0.2015) loss 6.6488 (6.5373) grad_norm 2.2639 (inf) loss_scale 16384.0000 (31292.6635) mem 8975MB
[2024-07-29 11:08:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][220/625] eta 0:01:23 lr 0.000137 wd 0.0500 time 0.1934 (0.2050) data time 0.0007 (0.0025) model time 0.1927 (0.2025) loss 6.5251 (6.5324) grad_norm 1.4472 (inf) loss_scale 16384.0000 (30618.0633) mem 8975MB
[2024-07-29 11:08:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][230/625] eta 0:01:20 lr 0.000139 wd 0.0500 time 0.2006 (0.2048) data time 0.0009 (0.0024) model time 0.1997 (0.2024) loss 6.6396 (6.5276) grad_norm 1.4530 (inf) loss_scale 16384.0000 (30001.8701) mem 8975MB
[2024-07-29 11:08:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][240/625] eta 0:01:18 lr 0.000140 wd 0.0500 time 0.1990 (0.2047) data time 0.0009 (0.0023) model time 0.1982 (0.2024) loss 6.4746 (6.5212) grad_norm 1.6348 (inf) loss_scale 16384.0000 (29436.8133) mem 8975MB
[2024-07-29 11:08:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][250/625] eta 0:01:16 lr 0.000142 wd 0.0500 time 0.2014 (0.2047) data time 0.0008 (0.0023) model time 0.2006 (0.2024) loss 6.2632 (6.5206) grad_norm 1.6956 (inf) loss_scale 16384.0000 (28916.7809) mem 8975MB
[2024-07-29 11:08:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][260/625] eta 0:01:14 lr 0.000143 wd 0.0500 time 0.1980 (0.2046) data time 0.0009 (0.0022) model time 0.1971 (0.2023) loss 6.3045 (6.5167) grad_norm 1.9163 (inf) loss_scale 16384.0000 (28436.5977) mem 8975MB
[2024-07-29 11:08:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][270/625] eta 0:01:12 lr 0.000145 wd 0.0500 time 0.2020 (0.2045) data time 0.0009 (0.0022) model time 0.2012 (0.2023) loss 6.6023 (6.5138) grad_norm 1.3431 (inf) loss_scale 16384.0000 (27991.8524) mem 8975MB
[2024-07-29 11:08:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][280/625] eta 0:01:10 lr 0.000147 wd 0.0500 time 0.1984 (0.2044) data time 0.0008 (0.0021) model time 0.1976 (0.2023) loss 6.5107 (6.5114) grad_norm 1.7764 (inf) loss_scale 16384.0000 (27578.7616) mem 8975MB
[2024-07-29 11:08:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][290/625] eta 0:01:08 lr 0.000148 wd 0.0500 time 0.2027 (0.2044) data time 0.0008 (0.0021) model time 0.2019 (0.2023) loss 6.3950 (6.5123) grad_norm 1.7652 (inf) loss_scale 16384.0000 (27194.0619) mem 8975MB
[2024-07-29 11:08:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][300/625] eta 0:01:06 lr 0.000150 wd 0.0500 time 0.2008 (0.2043) data time 0.0009 (0.0020) model time 0.1999 (0.2022) loss 6.6222 (6.5100) grad_norm 1.6450 (inf) loss_scale 16384.0000 (26834.9236) mem 8975MB
[2024-07-29 11:08:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][310/625] eta 0:01:04 lr 0.000151 wd 0.0500 time 0.2062 (0.2042) data time 0.0007 (0.0020) model time 0.2055 (0.2022) loss 6.6559 (6.5099) grad_norm 1.2785 (inf) loss_scale 16384.0000 (26498.8810) mem 8975MB
[2024-07-29 11:08:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][320/625] eta 0:01:02 lr 0.000153 wd 0.0500 time 0.2021 (0.2041) data time 0.0007 (0.0020) model time 0.2014 (0.2021) loss 6.6889 (6.5088) grad_norm 1.7326 (inf) loss_scale 16384.0000 (26183.7757) mem 8975MB
[2024-07-29 11:08:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][330/625] eta 0:01:00 lr 0.000155 wd 0.0500 time 0.2004 (0.2041) data time 0.0007 (0.0019) model time 0.1997 (0.2021) loss 6.7029 (6.5071) grad_norm 1.7607 (inf) loss_scale 16384.0000 (25887.7100) mem 8975MB
[2024-07-29 11:08:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][340/625] eta 0:00:58 lr 0.000156 wd 0.0500 time 0.2028 (0.2040) data time 0.0008 (0.0019) model time 0.2020 (0.2021) loss 6.4506 (6.5028) grad_norm 2.1442 (inf) loss_scale 16384.0000 (25609.0088) mem 8975MB
[2024-07-29 11:08:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][350/625] eta 0:00:56 lr 0.000158 wd 0.0500 time 0.1976 (0.2040) data time 0.0006 (0.0019) model time 0.1970 (0.2021) loss 6.5770 (6.4999) grad_norm 1.9690 (inf) loss_scale 16384.0000 (25346.1880) mem 8975MB
[2024-07-29 11:08:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][360/625] eta 0:00:54 lr 0.000159 wd 0.0500 time 0.2036 (0.2051) data time 0.0007 (0.0018) model time 0.2029 (0.2034) loss 6.2621 (6.4926) grad_norm 1.5212 (inf) loss_scale 16384.0000 (25097.9280) mem 8975MB
[2024-07-29 11:08:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][370/625] eta 0:00:52 lr 0.000161 wd 0.0500 time 0.2106 (0.2050) data time 0.0008 (0.0018) model time 0.2098 (0.2034) loss 6.5305 (6.4927) grad_norm 1.7109 (inf) loss_scale 16384.0000 (24863.0512) mem 8975MB
[2024-07-29 11:08:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][380/625] eta 0:00:50 lr 0.000163 wd 0.0500 time 0.1988 (0.2050) data time 0.0011 (0.0018) model time 0.1977 (0.2034) loss 6.4545 (6.4901) grad_norm 1.4956 (inf) loss_scale 16384.0000 (24640.5039) mem 8975MB
[2024-07-29 11:08:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][390/625] eta 0:00:48 lr 0.000164 wd 0.0500 time 0.1992 (0.2049) data time 0.0008 (0.0018) model time 0.1984 (0.2033) loss 6.3282 (6.4866) grad_norm 1.3421 (inf) loss_scale 16384.0000 (24429.3402) mem 8975MB
[2024-07-29 11:08:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][400/625] eta 0:00:46 lr 0.000166 wd 0.0500 time 0.2036 (0.2055) data time 0.0006 (0.0017) model time 0.2029 (0.2040) loss 6.3189 (6.4872) grad_norm 1.9307 (inf) loss_scale 16384.0000 (24228.7082) mem 8975MB
[2024-07-29 11:08:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][410/625] eta 0:00:44 lr 0.000167 wd 0.0500 time 0.2014 (0.2055) data time 0.0007 (0.0017) model time 0.2006 (0.2039) loss 6.1952 (6.4857) grad_norm 2.0638 (inf) loss_scale 16384.0000 (24037.8394) mem 8975MB
[2024-07-29 11:08:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][420/625] eta 0:00:42 lr 0.000169 wd 0.0500 time 0.2064 (0.2054) data time 0.0008 (0.0017) model time 0.2056 (0.2039) loss 5.9908 (6.4812) grad_norm 1.6691 (inf) loss_scale 16384.0000 (23856.0380) mem 8975MB
[2024-07-29 11:08:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][430/625] eta 0:00:40 lr 0.000171 wd 0.0500 time 0.2004 (0.2054) data time 0.0007 (0.0017) model time 0.1998 (0.2038) loss 6.1009 (6.4765) grad_norm 1.8516 (inf) loss_scale 16384.0000 (23682.6729) mem 8975MB
[2024-07-29 11:08:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][440/625] eta 0:00:37 lr 0.000172 wd 0.0500 time 0.2004 (0.2053) data time 0.0008 (0.0017) model time 0.1995 (0.2038) loss 6.3016 (6.4709) grad_norm 2.6183 (inf) loss_scale 16384.0000 (23517.1701) mem 8975MB
[2024-07-29 11:08:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][450/625] eta 0:00:35 lr 0.000174 wd 0.0500 time 0.1997 (0.2052) data time 0.0007 (0.0016) model time 0.1990 (0.2037) loss 6.5063 (6.4670) grad_norm 1.5210 (inf) loss_scale 16384.0000 (23359.0067) mem 8975MB
[2024-07-29 11:09:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][460/625] eta 0:00:33 lr 0.000175 wd 0.0500 time 0.1994 (0.2051) data time 0.0009 (0.0016) model time 0.1985 (0.2036) loss 6.5585 (6.4664) grad_norm 2.1898 (inf) loss_scale 16384.0000 (23207.7050) mem 8975MB
[2024-07-29 11:09:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][470/625] eta 0:00:31 lr 0.000177 wd 0.0500 time 0.2062 (0.2051) data time 0.0010 (0.0016) model time 0.2052 (0.2036) loss 6.4629 (6.4635) grad_norm 1.7925 (inf) loss_scale 16384.0000 (23062.8280) mem 8975MB
[2024-07-29 11:09:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][480/625] eta 0:00:29 lr 0.000179 wd 0.0500 time 0.2018 (0.2050) data time 0.0007 (0.0016) model time 0.2011 (0.2035) loss 6.5565 (6.4638) grad_norm 2.4006 (inf) loss_scale 16384.0000 (22923.9751) mem 8975MB
[2024-07-29 11:09:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][490/625] eta 0:00:27 lr 0.000180 wd 0.0500 time 0.1980 (0.2050) data time 0.0007 (0.0016) model time 0.1973 (0.2035) loss 5.9820 (6.4607) grad_norm 1.6748 (inf) loss_scale 16384.0000 (22790.7780) mem 8975MB
[2024-07-29 11:09:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][500/625] eta 0:00:25 lr 0.000182 wd 0.0500 time 0.1988 (0.2049) data time 0.0009 (0.0016) model time 0.1979 (0.2035) loss 6.3576 (6.4568) grad_norm 1.4601 (inf) loss_scale 16384.0000 (22662.8982) mem 8975MB
[2024-07-29 11:09:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][510/625] eta 0:00:23 lr 0.000183 wd 0.0500 time 0.2030 (0.2049) data time 0.0007 (0.0016) model time 0.2023 (0.2034) loss 5.8733 (6.4527) grad_norm 1.9565 (inf) loss_scale 16384.0000 (22540.0235) mem 8975MB
[2024-07-29 11:09:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][520/625] eta 0:00:21 lr 0.000185 wd 0.0500 time 0.2056 (0.2048) data time 0.0006 (0.0015) model time 0.2050 (0.2034) loss 6.4315 (6.4510) grad_norm 2.2856 (inf) loss_scale 16384.0000 (22421.8656) mem 8975MB
[2024-07-29 11:09:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][530/625] eta 0:00:19 lr 0.000187 wd 0.0500 time 0.1991 (0.2047) data time 0.0006 (0.0015) model time 0.1985 (0.2033) loss 6.6344 (6.4508) grad_norm 1.5576 (inf) loss_scale 16384.0000 (22308.1582) mem 8975MB
[2024-07-29 11:09:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][540/625] eta 0:00:17 lr 0.000188 wd 0.0500 time 0.2017 (0.2047) data time 0.0010 (0.0015) model time 0.2008 (0.2033) loss 6.3280 (6.4497) grad_norm 2.0280 (inf) loss_scale 16384.0000 (22198.6543) mem 8975MB
[2024-07-29 11:09:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][550/625] eta 0:00:15 lr 0.000190 wd 0.0500 time 0.2000 (0.2046) data time 0.0007 (0.0015) model time 0.1993 (0.2032) loss 6.3490 (6.4471) grad_norm 1.5547 (inf) loss_scale 16384.0000 (22093.1252) mem 8975MB
[2024-07-29 11:09:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][560/625] eta 0:00:13 lr 0.000191 wd 0.0500 time 0.2039 (0.2046) data time 0.0008 (0.0015) model time 0.2031 (0.2031) loss 6.6315 (6.4456) grad_norm 1.8057 (inf) loss_scale 16384.0000 (21991.3583) mem 8975MB
[2024-07-29 11:09:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][570/625] eta 0:00:11 lr 0.000193 wd 0.0500 time 0.2001 (0.2045) data time 0.0007 (0.0015) model time 0.1994 (0.2031) loss 6.3213 (6.4414) grad_norm 1.4644 (inf) loss_scale 16384.0000 (21893.1559) mem 8975MB
[2024-07-29 11:09:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][580/625] eta 0:00:09 lr 0.000195 wd 0.0500 time 0.1998 (0.2045) data time 0.0009 (0.0015) model time 0.1989 (0.2031) loss 5.9089 (6.4378) grad_norm 1.6943 (inf) loss_scale 16384.0000 (21798.3339) mem 8975MB
[2024-07-29 11:09:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][590/625] eta 0:00:07 lr 0.000196 wd 0.0500 time 0.1990 (0.2045) data time 0.0009 (0.0015) model time 0.1981 (0.2030) loss 6.2702 (6.4370) grad_norm 1.7601 (inf) loss_scale 16384.0000 (21706.7208) mem 8975MB
[2024-07-29 11:09:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][600/625] eta 0:00:05 lr 0.000198 wd 0.0500 time 0.2011 (0.2044) data time 0.0008 (0.0014) model time 0.2003 (0.2030) loss 6.3569 (6.4339) grad_norm 1.9889 (inf) loss_scale 16384.0000 (21618.1564) mem 8975MB
[2024-07-29 11:09:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][610/625] eta 0:00:03 lr 0.000199 wd 0.0500 time 0.2019 (0.2043) data time 0.0006 (0.0014) model time 0.2013 (0.2029) loss 5.9597 (6.4296) grad_norm 1.6884 (inf) loss_scale 16384.0000 (21532.4910) mem 8975MB
[2024-07-29 11:09:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][620/625] eta 0:00:01 lr 0.000201 wd 0.0500 time 0.1995 (0.2043) data time 0.0005 (0.0014) model time 0.1989 (0.2029) loss 5.9463 (6.4264) grad_norm 2.4453 (inf) loss_scale 16384.0000 (21449.5845) mem 8975MB
[2024-07-29 11:09:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 1 training takes 0:02:07
[2024-07-29 11:09:33 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 11:09:33 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 11:09:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.602 (0.602) Loss 5.0781 (5.0781) Acc@1 6.152 (6.152) Acc@5 21.875 (21.875) Mem 8975MB
[2024-07-29 11:09:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.108) Loss 5.6016 (5.1971) Acc@1 2.881 (6.965) Acc@5 13.721 (22.053) Mem 8975MB
[2024-07-29 11:09:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.083) Loss 5.4727 (5.2666) Acc@1 3.857 (7.075) Acc@5 13.330 (20.894) Mem 8975MB
[2024-07-29 11:09:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 7.812 Acc@5 22.283
[2024-07-29 11:09:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 7.8%
[2024-07-29 11:09:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 7.81%
[2024-07-29 11:09:35 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 11:09:36 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 11:09:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.420 (0.420) Loss 7.0078 (7.0078) Acc@1 0.000 (0.000) Acc@5 0.000 (0.000) Mem 8975MB
[2024-07-29 11:09:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.091) Loss 6.9609 (6.9379) Acc@1 0.000 (0.000) Acc@5 0.000 (0.666) Mem 8975MB
[2024-07-29 11:09:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.074) Loss 6.9219 (6.9641) Acc@1 2.441 (0.116) Acc@5 4.883 (0.581) Mem 8975MB
[2024-07-29 11:09:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.100 Acc@5 0.500
[2024-07-29 11:09:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.1%
[2024-07-29 11:09:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][0/625] eta 0:12:24 lr 0.000202 wd 0.0500 time 1.1917 (1.1917) data time 0.5267 (0.5267) model time 0.0000 (0.0000) loss 5.6568 (5.6568) grad_norm 2.8276 (2.8276) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:09:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][10/625] eta 0:03:13 lr 0.000203 wd 0.0500 time 0.1983 (0.3145) data time 0.0007 (0.0487) model time 0.0000 (0.0000) loss 6.4196 (6.1637) grad_norm 1.8532 (2.1311) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:09:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][20/625] eta 0:02:38 lr 0.000205 wd 0.0500 time 0.1985 (0.2615) data time 0.0007 (0.0259) model time 0.0000 (0.0000) loss 6.1176 (6.2630) grad_norm 1.8571 (2.1487) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:09:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][30/625] eta 0:02:24 lr 0.000207 wd 0.0500 time 0.2057 (0.2420) data time 0.0006 (0.0179) model time 0.0000 (0.0000) loss 6.3936 (6.2668) grad_norm 2.4676 (2.1422) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:09:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][40/625] eta 0:02:15 lr 0.000208 wd 0.0500 time 0.1976 (0.2322) data time 0.0009 (0.0137) model time 0.0000 (0.0000) loss 5.9075 (6.2218) grad_norm 1.6667 (2.0570) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:09:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][50/625] eta 0:02:09 lr 0.000210 wd 0.0500 time 0.1972 (0.2259) data time 0.0007 (0.0112) model time 0.0000 (0.0000) loss 5.9502 (6.2235) grad_norm 2.3111 (2.0857) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:09:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][60/625] eta 0:02:05 lr 0.000211 wd 0.0500 time 0.2022 (0.2218) data time 0.0009 (0.0095) model time 0.2013 (0.2004) loss 5.6146 (6.2020) grad_norm 1.7825 (2.0719) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:09:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][70/625] eta 0:02:01 lr 0.000213 wd 0.0500 time 0.2012 (0.2189) data time 0.0008 (0.0083) model time 0.2004 (0.2001) loss 5.6750 (6.1940) grad_norm 2.1505 (2.0833) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:09:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][80/625] eta 0:01:58 lr 0.000215 wd 0.0500 time 0.2034 (0.2167) data time 0.0007 (0.0074) model time 0.2027 (0.2003) loss 6.4443 (6.2127) grad_norm 2.9054 (2.1078) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:09:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][90/625] eta 0:01:55 lr 0.000216 wd 0.0500 time 0.1987 (0.2151) data time 0.0007 (0.0067) model time 0.1980 (0.2004) loss 6.3490 (6.2182) grad_norm 1.7087 (2.0729) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:10:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][100/625] eta 0:01:52 lr 0.000218 wd 0.0500 time 0.2001 (0.2138) data time 0.0009 (0.0061) model time 0.1992 (0.2006) loss 6.3375 (6.2067) grad_norm 1.7931 (2.0740) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:10:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][110/625] eta 0:01:49 lr 0.000219 wd 0.0500 time 0.1979 (0.2127) data time 0.0008 (0.0056) model time 0.1971 (0.2006) loss 5.8627 (6.2137) grad_norm 1.6182 (2.0478) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:10:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][120/625] eta 0:01:46 lr 0.000221 wd 0.0500 time 0.2030 (0.2117) data time 0.0009 (0.0052) model time 0.2021 (0.2005) loss 5.6349 (6.2101) grad_norm 1.5877 (2.0596) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:10:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][130/625] eta 0:01:44 lr 0.000223 wd 0.0500 time 0.2035 (0.2109) data time 0.0007 (0.0049) model time 0.2028 (0.2005) loss 6.2781 (6.2065) grad_norm 2.2134 (2.0733) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:10:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][140/625] eta 0:01:42 lr 0.000224 wd 0.0500 time 0.2055 (0.2104) data time 0.0007 (0.0046) model time 0.2047 (0.2007) loss 6.1972 (6.2018) grad_norm 2.2608 (2.0679) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:10:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][150/625] eta 0:01:39 lr 0.000226 wd 0.0500 time 0.1995 (0.2099) data time 0.0007 (0.0044) model time 0.1987 (0.2008) loss 5.3117 (6.1875) grad_norm 2.2472 (2.0681) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:10:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][160/625] eta 0:01:37 lr 0.000227 wd 0.0500 time 0.1985 (0.2093) data time 0.0010 (0.0041) model time 0.1975 (0.2008) loss 6.4319 (6.1867) grad_norm 2.0446 (2.0963) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:10:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][170/625] eta 0:01:35 lr 0.000229 wd 0.0500 time 0.2007 (0.2089) data time 0.0008 (0.0040) model time 0.2000 (0.2008) loss 5.7453 (6.1775) grad_norm 2.1073 (2.0870) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:10:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][180/625] eta 0:01:32 lr 0.000231 wd 0.0500 time 0.2019 (0.2084) data time 0.0009 (0.0038) model time 0.2010 (0.2007) loss 5.7733 (6.1757) grad_norm 2.0310 (2.0786) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:10:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][190/625] eta 0:01:30 lr 0.000232 wd 0.0500 time 0.2032 (0.2080) data time 0.0009 (0.0036) model time 0.2023 (0.2007) loss 6.5161 (6.1724) grad_norm 2.0451 (2.0843) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:10:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][200/625] eta 0:01:28 lr 0.000234 wd 0.0500 time 0.2004 (0.2077) data time 0.0008 (0.0035) model time 0.1996 (0.2006) loss 6.0507 (6.1710) grad_norm 2.3485 (2.0919) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:10:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][210/625] eta 0:01:26 lr 0.000235 wd 0.0500 time 0.1970 (0.2073) data time 0.0008 (0.0034) model time 0.1962 (0.2005) loss 5.8275 (6.1653) grad_norm 2.0261 (2.0930) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:10:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][220/625] eta 0:01:23 lr 0.000237 wd 0.0500 time 0.2030 (0.2071) data time 0.0007 (0.0033) model time 0.2023 (0.2006) loss 5.5872 (6.1619) grad_norm 2.1986 (2.1024) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:10:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][230/625] eta 0:01:21 lr 0.000239 wd 0.0500 time 0.2043 (0.2069) data time 0.0007 (0.0032) model time 0.2036 (0.2006) loss 5.9033 (6.1553) grad_norm 2.1276 (2.1040) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:10:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][240/625] eta 0:01:19 lr 0.000240 wd 0.0500 time 0.2025 (0.2066) data time 0.0007 (0.0031) model time 0.2018 (0.2006) loss 5.7968 (6.1515) grad_norm 2.0337 (2.1090) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:10:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][250/625] eta 0:01:17 lr 0.000242 wd 0.0500 time 0.1978 (0.2065) data time 0.0007 (0.0030) model time 0.1971 (0.2007) loss 5.5981 (6.1461) grad_norm 2.1992 (2.1216) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:10:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][260/625] eta 0:01:15 lr 0.000243 wd 0.0500 time 0.1994 (0.2062) data time 0.0008 (0.0029) model time 0.1986 (0.2006) loss 5.9127 (6.1430) grad_norm 1.7457 (2.1111) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 11:10:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][270/625] eta 0:01:13 lr 0.000245 wd 0.0500 time 0.1997 (0.2060) data time 0.0009 (0.0028) model time 0.1987 (0.2006) loss 6.1400 (6.1460) grad_norm 1.9373 (inf) loss_scale 8192.0000 (16172.3985) mem 8975MB
[2024-07-29 11:10:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][280/625] eta 0:01:11 lr 0.000247 wd 0.0500 time 0.2011 (0.2059) data time 0.0008 (0.0027) model time 0.2002 (0.2006) loss 6.4365 (6.1427) grad_norm 1.4109 (inf) loss_scale 8192.0000 (15888.3986) mem 8975MB
[2024-07-29 11:10:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][290/625] eta 0:01:08 lr 0.000248 wd 0.0500 time 0.2036 (0.2057) data time 0.0007 (0.0027) model time 0.2030 (0.2006) loss 5.6008 (6.1365) grad_norm 2.2228 (inf) loss_scale 8192.0000 (15623.9175) mem 8975MB
[2024-07-29 11:10:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][300/625] eta 0:01:06 lr 0.000250 wd 0.0500 time 0.2040 (0.2056) data time 0.0008 (0.0026) model time 0.2032 (0.2006) loss 6.0279 (6.1306) grad_norm 2.5360 (inf) loss_scale 8192.0000 (15377.0100) mem 8975MB
[2024-07-29 11:10:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][310/625] eta 0:01:04 lr 0.000251 wd 0.0500 time 0.2018 (0.2055) data time 0.0009 (0.0026) model time 0.2009 (0.2006) loss 6.3170 (6.1309) grad_norm 2.0173 (inf) loss_scale 8192.0000 (15145.9807) mem 8975MB
[2024-07-29 11:10:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][320/625] eta 0:01:02 lr 0.000253 wd 0.0500 time 0.2003 (0.2058) data time 0.0008 (0.0025) model time 0.1995 (0.2011) loss 5.2701 (6.1305) grad_norm 1.7085 (inf) loss_scale 8192.0000 (14929.3458) mem 8975MB
[2024-07-29 11:10:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][330/625] eta 0:01:00 lr 0.000255 wd 0.0500 time 0.2054 (0.2056) data time 0.0006 (0.0025) model time 0.2048 (0.2011) loss 6.4892 (6.1275) grad_norm 1.8984 (inf) loss_scale 8192.0000 (14725.8006) mem 8975MB
[2024-07-29 11:10:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][340/625] eta 0:00:58 lr 0.000256 wd 0.0500 time 0.2020 (0.2055) data time 0.0007 (0.0024) model time 0.2013 (0.2011) loss 5.2954 (6.1242) grad_norm 3.4505 (inf) loss_scale 8192.0000 (14534.1935) mem 8975MB
[2024-07-29 11:10:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][350/625] eta 0:00:56 lr 0.000258 wd 0.0500 time 0.2022 (0.2054) data time 0.0010 (0.0024) model time 0.2012 (0.2011) loss 5.9847 (6.1223) grad_norm 2.2189 (inf) loss_scale 8192.0000 (14353.5043) mem 8975MB
[2024-07-29 11:10:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][360/625] eta 0:00:54 lr 0.000259 wd 0.0500 time 0.2012 (0.2053) data time 0.0008 (0.0023) model time 0.2004 (0.2010) loss 5.4783 (6.1238) grad_norm 2.1125 (inf) loss_scale 8192.0000 (14182.8255) mem 8975MB
[2024-07-29 11:10:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][370/625] eta 0:00:52 lr 0.000261 wd 0.0500 time 0.2012 (0.2052) data time 0.0009 (0.0023) model time 0.2003 (0.2010) loss 5.8281 (6.1151) grad_norm 2.8616 (inf) loss_scale 8192.0000 (14021.3477) mem 8975MB
[2024-07-29 11:10:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][380/625] eta 0:00:50 lr 0.000263 wd 0.0500 time 0.2093 (0.2052) data time 0.0009 (0.0023) model time 0.2084 (0.2011) loss 6.4485 (6.1126) grad_norm 2.8771 (inf) loss_scale 8192.0000 (13868.3465) mem 8975MB
[2024-07-29 11:10:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][390/625] eta 0:00:48 lr 0.000264 wd 0.0500 time 0.2048 (0.2051) data time 0.0006 (0.0022) model time 0.2042 (0.2011) loss 5.8501 (6.1113) grad_norm 1.7995 (inf) loss_scale 8192.0000 (13723.1714) mem 8975MB
[2024-07-29 11:11:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][400/625] eta 0:00:46 lr 0.000266 wd 0.0500 time 0.2047 (0.2050) data time 0.0009 (0.0022) model time 0.2039 (0.2011) loss 6.2511 (6.1104) grad_norm 1.7363 (inf) loss_scale 8192.0000 (13585.2369) mem 8975MB
[2024-07-29 11:11:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][410/625] eta 0:00:44 lr 0.000267 wd 0.0500 time 0.2020 (0.2049) data time 0.0010 (0.0022) model time 0.2009 (0.2011) loss 5.7828 (6.1057) grad_norm 1.7741 (inf) loss_scale 8192.0000 (13454.0146) mem 8975MB
[2024-07-29 11:11:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][420/625] eta 0:00:42 lr 0.000269 wd 0.0500 time 0.1967 (0.2049) data time 0.0009 (0.0021) model time 0.1958 (0.2011) loss 6.3177 (6.1016) grad_norm 1.5360 (inf) loss_scale 8192.0000 (13329.0261) mem 8975MB
[2024-07-29 11:11:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][430/625] eta 0:00:39 lr 0.000271 wd 0.0500 time 0.2000 (0.2048) data time 0.0007 (0.0021) model time 0.1993 (0.2011) loss 5.7884 (6.0987) grad_norm 2.4190 (inf) loss_scale 8192.0000 (13209.8376) mem 8975MB
[2024-07-29 11:11:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][440/625] eta 0:00:37 lr 0.000272 wd 0.0500 time 0.2017 (0.2047) data time 0.0007 (0.0021) model time 0.2010 (0.2011) loss 6.2668 (6.0957) grad_norm 1.6564 (inf) loss_scale 8192.0000 (13096.0544) mem 8975MB
[2024-07-29 11:11:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][450/625] eta 0:00:35 lr 0.000274 wd 0.0500 time 0.2025 (0.2047) data time 0.0007 (0.0020) model time 0.2018 (0.2011) loss 6.0490 (6.0931) grad_norm 1.6544 (inf) loss_scale 8192.0000 (12987.3171) mem 8975MB
[2024-07-29 11:11:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][460/625] eta 0:00:33 lr 0.000275 wd 0.0500 time 0.1987 (0.2046) data time 0.0009 (0.0020) model time 0.1978 (0.2011) loss 6.0056 (6.0925) grad_norm 2.4770 (inf) loss_scale 8192.0000 (12883.2972) mem 8975MB
[2024-07-29 11:11:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][470/625] eta 0:00:31 lr 0.000277 wd 0.0500 time 0.2035 (0.2046) data time 0.0007 (0.0020) model time 0.2028 (0.2012) loss 6.1806 (6.0911) grad_norm 2.0554 (inf) loss_scale 8192.0000 (12783.6943) mem 8975MB
[2024-07-29 11:11:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][480/625] eta 0:00:29 lr 0.000279 wd 0.0500 time 0.2024 (0.2046) data time 0.0008 (0.0020) model time 0.2016 (0.2012) loss 5.9275 (6.0885) grad_norm 2.4988 (inf) loss_scale 8192.0000 (12688.2328) mem 8975MB
[2024-07-29 11:11:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][490/625] eta 0:00:27 lr 0.000280 wd 0.0500 time 0.1979 (0.2045) data time 0.0008 (0.0019) model time 0.1971 (0.2012) loss 6.4448 (6.0876) grad_norm 2.0801 (inf) loss_scale 8192.0000 (12596.6599) mem 8975MB
[2024-07-29 11:11:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][500/625] eta 0:00:25 lr 0.000282 wd 0.0500 time 0.2028 (0.2044) data time 0.0007 (0.0019) model time 0.2022 (0.2011) loss 5.5853 (6.0845) grad_norm 2.3127 (inf) loss_scale 8192.0000 (12508.7425) mem 8975MB
[2024-07-29 11:11:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][510/625] eta 0:00:23 lr 0.000283 wd 0.0500 time 0.2016 (0.2044) data time 0.0009 (0.0019) model time 0.2007 (0.2011) loss 5.9727 (6.0785) grad_norm 2.0687 (inf) loss_scale 8192.0000 (12424.2661) mem 8975MB
[2024-07-29 11:11:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][520/625] eta 0:00:21 lr 0.000285 wd 0.0500 time 0.1979 (0.2043) data time 0.0007 (0.0019) model time 0.1972 (0.2011) loss 6.2235 (6.0803) grad_norm 2.3641 (inf) loss_scale 8192.0000 (12343.0326) mem 8975MB
[2024-07-29 11:11:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 11:11:25 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 11:11:25 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 11:15:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 11:15:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 11:15:41 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 11:15:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 11:15:53 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 11:15:53 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 11:15:53 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 11:15:53 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 2)
[2024-07-29 11:15:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 11:16:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][530/625] eta 0:01:56 lr 0.000287 wd 0.0500 time 0.2205 (1.2303) data time 0.0010 (0.0997) model time 0.2194 (1.1306) loss 6.2442 (6.2058) grad_norm 1.9111 (2.2004) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 11:16:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][540/625] eta 0:01:01 lr 0.000288 wd 0.0500 time 0.2113 (0.7217) data time 0.0010 (0.0504) model time 0.2103 (0.6712) loss 6.1909 (6.1061) grad_norm 1.8765 (2.1017) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 11:16:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][550/625] eta 0:00:41 lr 0.000290 wd 0.0500 time 0.2102 (0.5523) data time 0.0010 (0.0340) model time 0.2092 (0.5184) loss 6.2195 (6.1244) grad_norm 1.9837 (2.1562) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 11:16:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][560/625] eta 0:00:30 lr 0.000291 wd 0.0500 time 0.2077 (0.4679) data time 0.0009 (0.0258) model time 0.2068 (0.4421) loss 5.5580 (6.0558) grad_norm 2.1976 (2.1867) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 11:16:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][570/625] eta 0:00:22 lr 0.000293 wd 0.0500 time 0.2137 (0.4170) data time 0.0009 (0.0208) model time 0.2128 (0.3962) loss 5.5453 (6.0292) grad_norm 2.2751 (2.1359) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 11:16:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][580/625] eta 0:00:17 lr 0.000295 wd 0.0500 time 0.2188 (0.3831) data time 0.0008 (0.0175) model time 0.2180 (0.3655) loss 5.8795 (6.0024) grad_norm 1.9277 (2.1160) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 11:16:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][590/625] eta 0:00:12 lr 0.000296 wd 0.0500 time 0.2129 (0.3589) data time 0.0007 (0.0152) model time 0.2122 (0.3438) loss 5.4732 (5.9838) grad_norm 1.6469 (2.1538) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 11:16:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 11:16:23 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 11:16:24 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 11:56:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 11:56:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 11:57:09 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 11:57:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 11:57:23 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 11:57:23 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 11:57:23 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 11:57:23 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 2)
[2024-07-29 11:57:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 11:57:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][600/625] eta 0:00:31 lr 0.000298 wd 0.0500 time 0.2071 (1.2552) data time 0.0007 (0.0830) model time 0.2063 (1.1722) loss 6.2199 (6.1726) grad_norm 1.5789 (2.2440) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 11:57:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][610/625] eta 0:00:10 lr 0.000299 wd 0.0500 time 0.2023 (0.6742) data time 0.0005 (0.0377) model time 0.2018 (0.6365) loss 6.3008 (6.0766) grad_norm 1.7359 (2.0382) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 11:57:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][620/625] eta 0:00:02 lr 0.000301 wd 0.0500 time 0.2080 (0.5064) data time 0.0008 (0.0245) model time 0.2072 (0.4819) loss 6.2320 (6.0832) grad_norm 2.0576 (2.1383) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 11:57:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 2 training takes 0:00:14
[2024-07-29 11:57:42 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 11:57:44 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 11:57:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.451 (0.451) Loss 3.9238 (3.9238) Acc@1 22.754 (22.754) Acc@5 48.291 (48.291) Mem 8977MB
[2024-07-29 11:57:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.096) Loss 4.7734 (4.0428) Acc@1 13.574 (20.059) Acc@5 32.715 (44.806) Mem 8977MB
[2024-07-29 11:57:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.076) Loss 4.6680 (4.2882) Acc@1 11.523 (17.729) Acc@5 30.908 (40.225) Mem 8977MB
[2024-07-29 11:57:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 18.674 Acc@5 41.313
[2024-07-29 11:57:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 18.7%
[2024-07-29 11:57:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 18.67%
[2024-07-29 11:57:47 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 11:57:49 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 11:57:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.513 (0.513) Loss 6.9453 (6.9453) Acc@1 0.000 (0.000) Acc@5 0.000 (0.000) Mem 8977MB
[2024-07-29 11:57:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.099) Loss 6.9883 (6.9087) Acc@1 0.000 (0.222) Acc@5 0.000 (0.444) Mem 8977MB
[2024-07-29 11:57:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.077) Loss 6.9492 (6.9634) Acc@1 0.000 (0.116) Acc@5 2.441 (0.581) Mem 8977MB
[2024-07-29 11:57:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 0.100 Acc@5 0.500
[2024-07-29 11:57:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.1%
[2024-07-29 11:57:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 0.10%
[2024-07-29 11:57:51 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 11:57:52 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 11:57:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][0/625] eta 0:07:57 lr 0.000302 wd 0.0500 time 0.7636 (0.7636) data time 0.4777 (0.4777) model time 0.0000 (0.0000) loss 6.1682 (6.1682) grad_norm 1.6318 (1.6318) loss_scale 8192.0000 (8192.0000) mem 8971MB
[2024-07-29 11:57:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][10/625] eta 0:02:40 lr 0.000303 wd 0.0500 time 0.2047 (0.2616) data time 0.0010 (0.0444) model time 0.0000 (0.0000) loss 5.9159 (5.8234) grad_norm 2.7472 (2.7212) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:57:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][20/625] eta 0:02:24 lr 0.000305 wd 0.0500 time 0.2100 (0.2389) data time 0.0010 (0.0237) model time 0.0000 (0.0000) loss 6.0021 (5.8736) grad_norm 3.5622 (2.7110) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:57:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][30/625] eta 0:02:17 lr 0.000306 wd 0.0500 time 0.2154 (0.2313) data time 0.0011 (0.0164) model time 0.0000 (0.0000) loss 5.9177 (5.8675) grad_norm 2.0206 (2.5775) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][40/625] eta 0:02:13 lr 0.000308 wd 0.0500 time 0.2093 (0.2276) data time 0.0011 (0.0126) model time 0.0000 (0.0000) loss 6.3902 (5.8371) grad_norm 2.6121 (2.5124) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][50/625] eta 0:02:09 lr 0.000310 wd 0.0500 time 0.2052 (0.2254) data time 0.0008 (0.0104) model time 0.0000 (0.0000) loss 4.9903 (5.8076) grad_norm 1.6994 (2.3961) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][60/625] eta 0:02:06 lr 0.000311 wd 0.0500 time 0.2091 (0.2232) data time 0.0008 (0.0088) model time 0.2083 (0.2111) loss 6.2025 (5.8217) grad_norm 1.6891 (2.4189) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][70/625] eta 0:02:03 lr 0.000313 wd 0.0500 time 0.2113 (0.2220) data time 0.0009 (0.0077) model time 0.2105 (0.2122) loss 6.3135 (5.8656) grad_norm 2.2897 (2.3763) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][80/625] eta 0:02:00 lr 0.000314 wd 0.0500 time 0.2094 (0.2208) data time 0.0007 (0.0069) model time 0.2087 (0.2119) loss 5.5894 (5.8662) grad_norm 2.7794 (2.3600) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][90/625] eta 0:01:57 lr 0.000316 wd 0.0500 time 0.2084 (0.2196) data time 0.0010 (0.0063) model time 0.2074 (0.2111) loss 6.1454 (5.8701) grad_norm 2.4211 (2.3346) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][100/625] eta 0:01:54 lr 0.000318 wd 0.0500 time 0.2129 (0.2189) data time 0.0012 (0.0057) model time 0.2117 (0.2112) loss 6.2106 (5.8659) grad_norm 1.6089 (2.2788) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][110/625] eta 0:01:52 lr 0.000319 wd 0.0500 time 0.2068 (0.2183) data time 0.0011 (0.0053) model time 0.2057 (0.2112) loss 6.3158 (5.8568) grad_norm 1.4563 (2.2331) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][120/625] eta 0:01:49 lr 0.000321 wd 0.0500 time 0.2099 (0.2177) data time 0.0010 (0.0050) model time 0.2089 (0.2111) loss 5.8099 (5.8480) grad_norm 2.1117 (2.2193) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][130/625] eta 0:01:47 lr 0.000322 wd 0.0500 time 0.2113 (0.2172) data time 0.0008 (0.0047) model time 0.2105 (0.2110) loss 5.1637 (5.8469) grad_norm 2.6457 (2.2242) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][140/625] eta 0:01:45 lr 0.000324 wd 0.0500 time 0.2077 (0.2168) data time 0.0010 (0.0044) model time 0.2067 (0.2109) loss 6.0223 (5.8395) grad_norm 2.7462 (2.2100) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][150/625] eta 0:01:42 lr 0.000326 wd 0.0500 time 0.2204 (0.2167) data time 0.0011 (0.0042) model time 0.2194 (0.2111) loss 6.2016 (5.8318) grad_norm 1.8133 (2.2018) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][160/625] eta 0:01:40 lr 0.000327 wd 0.0500 time 0.2108 (0.2164) data time 0.0007 (0.0040) model time 0.2100 (0.2112) loss 5.8319 (5.8321) grad_norm 2.0732 (2.2052) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][170/625] eta 0:01:38 lr 0.000329 wd 0.0500 time 0.2100 (0.2162) data time 0.0008 (0.0038) model time 0.2092 (0.2112) loss 5.2152 (5.8176) grad_norm 3.7301 (2.2197) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][180/625] eta 0:01:36 lr 0.000330 wd 0.0500 time 0.2142 (0.2158) data time 0.0010 (0.0037) model time 0.2131 (0.2110) loss 5.1439 (5.8091) grad_norm 2.5631 (2.2169) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][190/625] eta 0:01:33 lr 0.000332 wd 0.0500 time 0.2148 (0.2158) data time 0.0008 (0.0035) model time 0.2140 (0.2112) loss 5.3610 (5.8040) grad_norm 2.0378 (2.2333) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][200/625] eta 0:01:31 lr 0.000334 wd 0.0500 time 0.2137 (0.2156) data time 0.0011 (0.0034) model time 0.2126 (0.2112) loss 5.2045 (5.7975) grad_norm 1.5359 (2.2197) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][210/625] eta 0:01:29 lr 0.000335 wd 0.0500 time 0.2080 (0.2166) data time 0.0010 (0.0033) model time 0.2070 (0.2127) loss 5.9969 (5.7980) grad_norm 1.7091 (2.2039) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][220/625] eta 0:01:27 lr 0.000337 wd 0.0500 time 0.2117 (0.2164) data time 0.0011 (0.0032) model time 0.2107 (0.2126) loss 6.1178 (5.7841) grad_norm 3.0117 (2.1947) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][230/625] eta 0:01:25 lr 0.000338 wd 0.0500 time 0.2321 (0.2163) data time 0.0008 (0.0031) model time 0.2313 (0.2126) loss 5.7141 (5.7733) grad_norm 1.4681 (2.1973) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][240/625] eta 0:01:23 lr 0.000340 wd 0.0500 time 0.2154 (0.2162) data time 0.0008 (0.0030) model time 0.2146 (0.2126) loss 6.0924 (5.7654) grad_norm 1.7143 (2.1981) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][250/625] eta 0:01:21 lr 0.000342 wd 0.0500 time 0.2190 (0.2160) data time 0.0010 (0.0029) model time 0.2180 (0.2126) loss 6.0169 (5.7640) grad_norm 1.8132 (2.1957) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][260/625] eta 0:01:18 lr 0.000343 wd 0.0500 time 0.2054 (0.2159) data time 0.0010 (0.0029) model time 0.2044 (0.2125) loss 5.5515 (5.7542) grad_norm 2.8386 (2.1923) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][270/625] eta 0:01:16 lr 0.000345 wd 0.0500 time 0.2165 (0.2158) data time 0.0007 (0.0028) model time 0.2158 (0.2125) loss 5.7149 (5.7454) grad_norm 1.7643 (2.1913) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][280/625] eta 0:01:14 lr 0.000346 wd 0.0500 time 0.2531 (0.2159) data time 0.0008 (0.0027) model time 0.2523 (0.2127) loss 6.0925 (5.7454) grad_norm 3.1228 (2.1933) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][290/625] eta 0:01:12 lr 0.000348 wd 0.0500 time 0.2171 (0.2158) data time 0.0010 (0.0027) model time 0.2161 (0.2127) loss 6.0244 (5.7546) grad_norm 1.5712 (2.1860) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][300/625] eta 0:01:10 lr 0.000350 wd 0.0500 time 0.2119 (0.2156) data time 0.0010 (0.0026) model time 0.2109 (0.2126) loss 5.9306 (5.7536) grad_norm 2.2994 (2.1907) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:58:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][310/625] eta 0:01:07 lr 0.000351 wd 0.0500 time 0.2111 (0.2155) data time 0.0010 (0.0026) model time 0.2101 (0.2125) loss 5.7530 (5.7546) grad_norm 1.5904 (2.1844) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:59:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][320/625] eta 0:01:05 lr 0.000353 wd 0.0500 time 0.2092 (0.2155) data time 0.0008 (0.0025) model time 0.2084 (0.2126) loss 6.0915 (5.7549) grad_norm 2.4478 (2.1768) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:59:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][330/625] eta 0:01:03 lr 0.000354 wd 0.0500 time 0.2241 (0.2155) data time 0.0008 (0.0025) model time 0.2233 (0.2126) loss 5.8712 (5.7496) grad_norm 2.1697 (2.1831) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:59:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][340/625] eta 0:01:01 lr 0.000356 wd 0.0500 time 0.2153 (0.2154) data time 0.0007 (0.0024) model time 0.2145 (0.2126) loss 6.0702 (5.7451) grad_norm 1.6142 (2.1996) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:59:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][350/625] eta 0:00:59 lr 0.000358 wd 0.0500 time 0.4313 (0.2159) data time 0.0010 (0.0024) model time 0.4303 (0.2133) loss 5.0776 (5.7393) grad_norm 2.9196 (2.2063) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:59:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][360/625] eta 0:00:57 lr 0.000359 wd 0.0500 time 0.2116 (0.2158) data time 0.0010 (0.0023) model time 0.2106 (0.2132) loss 6.1114 (5.7344) grad_norm 3.3173 (2.2088) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:59:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][370/625] eta 0:00:54 lr 0.000361 wd 0.0500 time 0.2061 (0.2157) data time 0.0010 (0.0023) model time 0.2051 (0.2131) loss 5.3527 (5.7388) grad_norm 3.1047 (2.2077) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:59:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][380/625] eta 0:00:52 lr 0.000362 wd 0.0500 time 0.2137 (0.2156) data time 0.0011 (0.0023) model time 0.2126 (0.2131) loss 6.2593 (5.7413) grad_norm 2.3415 (2.2185) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 11:59:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 11:59:16 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 11:59:16 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 12:18:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 12:18:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 12:18:35 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 12:18:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 12:18:43 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 12:18:43 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 12:18:43 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 12:18:43 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 3)
[2024-07-29 12:18:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 12:18:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][390/625] eta 0:08:11 lr 0.000364 wd 0.0500 time 0.2050 (2.0923) data time 0.0006 (0.1837) model time 0.2045 (1.9086) loss 5.9551 (5.9363) grad_norm 1.7370 (2.4534) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 12:18:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][400/625] eta 0:02:46 lr 0.000366 wd 0.0500 time 0.1959 (0.7398) data time 0.0007 (0.0532) model time 0.1953 (0.6866) loss 6.0611 (5.8341) grad_norm 2.1369 (2.2974) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 12:18:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][410/625] eta 0:01:50 lr 0.000367 wd 0.0500 time 0.1974 (0.5147) data time 0.0009 (0.0314) model time 0.1966 (0.4833) loss 5.4539 (5.8584) grad_norm 3.4672 (2.2316) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 12:19:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][420/625] eta 0:01:26 lr 0.000369 wd 0.0500 time 0.1918 (0.4219) data time 0.0008 (0.0224) model time 0.1911 (0.3995) loss 5.2251 (5.8310) grad_norm 1.7617 (2.3111) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 12:19:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][430/625] eta 0:01:12 lr 0.000370 wd 0.0500 time 0.2030 (0.3717) data time 0.0006 (0.0176) model time 0.2024 (0.3542) loss 5.9052 (5.7970) grad_norm 1.9529 (2.3493) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 12:19:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][440/625] eta 0:01:02 lr 0.000372 wd 0.0500 time 0.1999 (0.3398) data time 0.0007 (0.0145) model time 0.1992 (0.3253) loss 6.2107 (5.7883) grad_norm 1.3450 (2.2710) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 12:19:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 12:19:07 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 12:19:09 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 12:21:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 12:21:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 12:21:36 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 12:21:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 12:21:48 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 12:21:49 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 12:21:49 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 12:21:49 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 3)
[2024-07-29 12:21:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 12:22:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][450/625] eta 0:12:26 lr 0.000374 wd 0.0500 time 0.9943 (4.2629) data time 0.0010 (0.5619) model time 0.9933 (3.7010) loss 6.3064 (6.2727) grad_norm 1.8793 (1.9531) loss_scale 8192.0000 (8192.0000) mem 8973MB
[2024-07-29 12:22:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][460/625] eta 0:02:26 lr 0.000375 wd 0.0500 time 0.2033 (0.8859) data time 0.0010 (0.0946) model time 0.2023 (0.7913) loss 5.1697 (5.8846) grad_norm 1.6051 (2.0866) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 12:22:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][470/625] eta 0:01:29 lr 0.000377 wd 0.0500 time 0.2036 (0.5797) data time 0.0010 (0.0521) model time 0.2026 (0.5275) loss 5.9101 (5.8934) grad_norm 2.1632 (2.1803) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 12:22:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][480/625] eta 0:01:07 lr 0.000378 wd 0.0500 time 0.2360 (0.4653) data time 0.0008 (0.0362) model time 0.2353 (0.4291) loss 5.5300 (5.8682) grad_norm 1.9265 (2.1621) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 12:22:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][490/625] eta 0:00:54 lr 0.000380 wd 0.0500 time 0.2057 (0.4046) data time 0.0010 (0.0278) model time 0.2048 (0.3768) loss 5.8214 (5.8099) grad_norm 2.2434 (2.2091) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 12:22:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][500/625] eta 0:00:45 lr 0.000382 wd 0.0500 time 0.2053 (0.3672) data time 0.0010 (0.0227) model time 0.2043 (0.3445) loss 5.4959 (5.7811) grad_norm 1.8422 (2.1569) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 12:22:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][510/625] eta 0:00:39 lr 0.000383 wd 0.0500 time 0.2070 (0.3419) data time 0.0008 (0.0192) model time 0.2063 (0.3227) loss 6.1125 (5.7463) grad_norm 2.8950 (2.1437) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 12:22:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][520/625] eta 0:00:34 lr 0.000385 wd 0.0500 time 0.2115 (0.3239) data time 0.0011 (0.0167) model time 0.2104 (0.3072) loss 5.5307 (5.7047) grad_norm 1.4326 (2.1764) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 12:22:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][530/625] eta 0:00:29 lr 0.000386 wd 0.0500 time 0.2093 (0.3098) data time 0.0010 (0.0148) model time 0.2083 (0.2951) loss 5.9621 (5.6935) grad_norm 3.1409 (2.2431) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 12:22:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][540/625] eta 0:00:25 lr 0.000388 wd 0.0500 time 0.2104 (0.2990) data time 0.0007 (0.0133) model time 0.2097 (0.2857) loss 4.5803 (5.6701) grad_norm 1.7867 (2.2409) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 12:22:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][550/625] eta 0:00:21 lr 0.000390 wd 0.0500 time 0.2122 (0.2905) data time 0.0008 (0.0121) model time 0.2114 (0.2784) loss 5.9814 (5.6826) grad_norm 2.0038 (2.2375) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 12:22:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][560/625] eta 0:00:18 lr 0.000391 wd 0.0500 time 0.2130 (0.2836) data time 0.0011 (0.0111) model time 0.2120 (0.2725) loss 5.8253 (5.6811) grad_norm 2.1657 (2.1946) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 12:22:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][570/625] eta 0:00:15 lr 0.000393 wd 0.0500 time 0.2092 (0.2776) data time 0.0007 (0.0103) model time 0.2085 (0.2673) loss 5.7574 (5.6767) grad_norm 2.0390 (2.2036) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 12:22:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][580/625] eta 0:00:12 lr 0.000394 wd 0.0500 time 0.2129 (0.2727) data time 0.0011 (0.0096) model time 0.2118 (0.2631) loss 5.7403 (5.6646) grad_norm 2.0985 (2.2039) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 12:22:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][590/625] eta 0:00:09 lr 0.000396 wd 0.0500 time 0.2144 (0.2684) data time 0.0010 (0.0090) model time 0.2134 (0.2594) loss 5.6785 (5.6577) grad_norm 1.8785 (2.1968) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 12:22:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][600/625] eta 0:00:06 lr 0.000398 wd 0.0500 time 0.2130 (0.2647) data time 0.0011 (0.0085) model time 0.2119 (0.2563) loss 5.7498 (5.6466) grad_norm 1.6600 (2.1937) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 12:22:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][610/625] eta 0:00:03 lr 0.000399 wd 0.0500 time 0.2063 (0.2616) data time 0.0007 (0.0080) model time 0.2056 (0.2536) loss 5.8590 (5.6536) grad_norm 2.3811 (2.1822) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 12:22:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][620/625] eta 0:00:01 lr 0.000401 wd 0.0500 time 0.2069 (0.2585) data time 0.0007 (0.0076) model time 0.2063 (0.2509) loss 5.3568 (5.6537) grad_norm 1.6228 (2.2003) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 12:22:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 3 training takes 0:00:45
[2024-07-29 12:22:39 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 12:22:40 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 12:22:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.468 (0.468) Loss 2.9141 (2.9141) Acc@1 39.697 (39.697) Acc@5 69.531 (69.531) Mem 8975MB
[2024-07-29 12:22:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 3.9766 (3.1474) Acc@1 24.658 (34.486) Acc@5 48.096 (62.891) Mem 8975MB
[2024-07-29 12:22:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 4.0312 (3.4902) Acc@1 18.408 (29.755) Acc@5 42.383 (55.892) Mem 8975MB
[2024-07-29 12:22:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 30.170 Acc@5 56.354
[2024-07-29 12:22:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 30.2%
[2024-07-29 12:22:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 30.17%
[2024-07-29 12:22:44 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 12:22:45 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 12:22:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.449 (0.449) Loss 6.8516 (6.8516) Acc@1 0.000 (0.000) Acc@5 0.000 (0.000) Mem 8975MB
[2024-07-29 12:22:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.094) Loss 7.0352 (6.8775) Acc@1 0.000 (0.222) Acc@5 0.000 (0.444) Mem 8975MB
[2024-07-29 12:22:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.075) Loss 6.9961 (6.9771) Acc@1 0.000 (0.116) Acc@5 2.344 (0.577) Mem 8975MB
[2024-07-29 12:22:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 0.100 Acc@5 0.506
[2024-07-29 12:22:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.1%
[2024-07-29 12:22:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 0.10%
[2024-07-29 12:22:47 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 12:22:48 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 12:22:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][0/625] eta 0:07:34 lr 0.000402 wd 0.0500 time 0.7277 (0.7277) data time 0.4204 (0.4204) model time 0.0000 (0.0000) loss 5.3805 (5.3805) grad_norm 1.8482 (1.8482) loss_scale 8192.0000 (8192.0000) mem 8971MB
[2024-07-29 12:22:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][10/625] eta 0:02:37 lr 0.000403 wd 0.0500 time 0.2088 (0.2568) data time 0.0008 (0.0394) model time 0.0000 (0.0000) loss 5.4235 (5.3984) grad_norm 2.3909 (2.0319) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:22:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][20/625] eta 0:02:21 lr 0.000405 wd 0.0500 time 0.1996 (0.2342) data time 0.0010 (0.0211) model time 0.0000 (0.0000) loss 5.6835 (5.4328) grad_norm 2.3053 (2.1922) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:22:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][30/625] eta 0:02:15 lr 0.000406 wd 0.0500 time 0.2061 (0.2282) data time 0.0008 (0.0147) model time 0.0000 (0.0000) loss 5.5944 (5.3482) grad_norm 1.7866 (2.1126) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:22:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][40/625] eta 0:02:11 lr 0.000408 wd 0.0500 time 0.2233 (0.2251) data time 0.0009 (0.0114) model time 0.0000 (0.0000) loss 5.8169 (5.3655) grad_norm 2.5649 (2.1155) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:23:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][50/625] eta 0:02:08 lr 0.000410 wd 0.0500 time 0.2170 (0.2228) data time 0.0009 (0.0093) model time 0.0000 (0.0000) loss 5.7709 (5.4091) grad_norm 1.8861 (2.1000) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:23:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][60/625] eta 0:02:05 lr 0.000411 wd 0.0500 time 0.2082 (0.2213) data time 0.0008 (0.0080) model time 0.2073 (0.2124) loss 5.4046 (5.4077) grad_norm 2.5563 (2.1116) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:23:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][70/625] eta 0:02:02 lr 0.000413 wd 0.0500 time 0.2096 (0.2202) data time 0.0011 (0.0070) model time 0.2085 (0.2123) loss 6.0538 (5.4150) grad_norm 1.4688 (2.0965) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:23:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][80/625] eta 0:01:59 lr 0.000414 wd 0.0500 time 0.2131 (0.2199) data time 0.0009 (0.0063) model time 0.2123 (0.2137) loss 4.1406 (5.3794) grad_norm 2.6405 (2.1192) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:23:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][90/625] eta 0:01:57 lr 0.000416 wd 0.0500 time 0.2153 (0.2191) data time 0.0009 (0.0057) model time 0.2143 (0.2132) loss 4.6276 (5.3691) grad_norm 3.0481 (2.1348) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:23:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][100/625] eta 0:01:54 lr 0.000418 wd 0.0500 time 0.2185 (0.2186) data time 0.0012 (0.0053) model time 0.2173 (0.2131) loss 5.6183 (5.3976) grad_norm 2.7216 (2.1202) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:23:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][110/625] eta 0:01:53 lr 0.000419 wd 0.0500 time 0.2085 (0.2205) data time 0.0007 (0.0049) model time 0.2077 (0.2173) loss 5.8501 (5.3923) grad_norm 1.9950 (2.1096) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:23:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][120/625] eta 0:01:51 lr 0.000421 wd 0.0500 time 0.2341 (0.2202) data time 0.0011 (0.0046) model time 0.2330 (0.2172) loss 4.7691 (5.3719) grad_norm 2.3395 (2.1513) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:23:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][130/625] eta 0:01:48 lr 0.000422 wd 0.0500 time 0.2092 (0.2197) data time 0.0008 (0.0043) model time 0.2084 (0.2166) loss 6.0538 (5.3753) grad_norm 2.4237 (2.1628) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:23:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][140/625] eta 0:01:46 lr 0.000424 wd 0.0500 time 0.2107 (0.2192) data time 0.0011 (0.0041) model time 0.2096 (0.2159) loss 5.9732 (5.3998) grad_norm 1.7088 (2.1475) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:23:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][150/625] eta 0:01:43 lr 0.000426 wd 0.0500 time 0.2101 (0.2187) data time 0.0008 (0.0039) model time 0.2093 (0.2155) loss 4.6476 (5.4195) grad_norm 1.6498 (2.1476) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:23:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][160/625] eta 0:01:41 lr 0.000427 wd 0.0500 time 0.2179 (0.2185) data time 0.0010 (0.0037) model time 0.2169 (0.2154) loss 4.9677 (5.4167) grad_norm 1.9967 (2.1453) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:23:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][170/625] eta 0:01:39 lr 0.000429 wd 0.0500 time 0.2148 (0.2181) data time 0.0008 (0.0036) model time 0.2141 (0.2150) loss 5.3148 (5.4285) grad_norm 1.6588 (2.1437) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:23:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][180/625] eta 0:01:36 lr 0.000430 wd 0.0500 time 0.2139 (0.2177) data time 0.0010 (0.0034) model time 0.2129 (0.2146) loss 5.0329 (5.4300) grad_norm 2.6701 (2.1486) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:23:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][190/625] eta 0:01:34 lr 0.000432 wd 0.0500 time 0.2088 (0.2175) data time 0.0008 (0.0033) 
model time 0.2080 (0.2144) loss 4.7927 (5.4274) grad_norm 3.1488 (2.1458) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:23:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][200/625] eta 0:01:32 lr 0.000434 wd 0.0500 time 0.2095 (0.2172) data time 0.0010 (0.0032) model time 0.2085 (0.2142) loss 5.4405 (5.4246) grad_norm 2.8966 (2.1548) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:23:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][210/625] eta 0:01:30 lr 0.000435 wd 0.0500 time 0.2195 (0.2170) data time 0.0007 (0.0031) model time 0.2188 (0.2141) loss 5.7161 (5.4174) grad_norm 2.5141 (2.1418) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:23:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][220/625] eta 0:01:28 lr 0.000437 wd 0.0500 time 0.2063 (0.2178) data time 0.0009 (0.0030) model time 0.2054 (0.2152) loss 5.1536 (5.4185) grad_norm 1.4605 (2.1250) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:23:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][230/625] eta 0:01:26 lr 0.000438 wd 0.0500 time 0.2099 (0.2187) data time 0.0010 (0.0029) model time 0.2089 (0.2164) loss 5.4727 (5.4250) grad_norm 1.8946 (2.1159) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:23:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][240/625] eta 0:01:24 lr 0.000440 wd 0.0500 time 0.2120 (0.2184) data time 0.0009 (0.0028) model time 0.2111 (0.2162) loss 4.4141 (5.4274) grad_norm 2.5255 (2.1153) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:23:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][250/625] eta 0:01:21 lr 0.000442 wd 0.0500 time 0.2142 (0.2184) data time 0.0010 (0.0028) model time 0.2132 (0.2162) loss 5.8373 (5.4309) grad_norm 1.9988 (2.1076) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:23:45 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [4/300][260/625] eta 0:01:19 lr 0.000443 wd 0.0500 time 0.2174 (0.2182) data time 0.0011 (0.0027) model time 0.2162 (0.2160) loss 5.9728 (5.4400) grad_norm 2.2834 (2.1036) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:23:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][270/625] eta 0:01:17 lr 0.000445 wd 0.0500 time 0.2167 (0.2180) data time 0.0010 (0.0026) model time 0.2157 (0.2159) loss 5.3476 (5.4405) grad_norm 1.9744 (2.0937) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:23:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][280/625] eta 0:01:15 lr 0.000446 wd 0.0500 time 0.2128 (0.2179) data time 0.0011 (0.0026) model time 0.2117 (0.2157) loss 4.9016 (5.4330) grad_norm 2.1272 (2.0883) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:23:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][290/625] eta 0:01:12 lr 0.000448 wd 0.0500 time 0.2103 (0.2177) data time 0.0011 (0.0025) model time 0.2092 (0.2156) loss 4.9494 (5.4224) grad_norm 3.2584 (2.1040) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:23:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][300/625] eta 0:01:10 lr 0.000450 wd 0.0500 time 0.2415 (0.2178) data time 0.0010 (0.0025) model time 0.2405 (0.2157) loss 5.9426 (5.4159) grad_norm 1.5499 (2.1219) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:23:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][310/625] eta 0:01:08 lr 0.000451 wd 0.0500 time 0.2185 (0.2185) data time 0.0011 (0.0024) model time 0.2174 (0.2166) loss 5.1079 (5.4206) grad_norm 1.9186 (2.1166) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:23:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][320/625] eta 0:01:06 lr 0.000453 wd 0.0500 time 0.2099 (0.2183) data time 0.0008 (0.0024) model time 0.2091 (0.2164) loss 
5.6308 (5.4158) grad_norm 1.8609 (2.1075) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][330/625] eta 0:01:04 lr 0.000454 wd 0.0500 time 0.2092 (0.2181) data time 0.0011 (0.0024) model time 0.2081 (0.2162) loss 5.4559 (5.4154) grad_norm 2.0410 (2.0975) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][340/625] eta 0:01:02 lr 0.000456 wd 0.0500 time 0.2141 (0.2180) data time 0.0009 (0.0023) model time 0.2132 (0.2161) loss 5.5848 (5.4230) grad_norm 1.5987 (2.0894) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][350/625] eta 0:00:59 lr 0.000458 wd 0.0500 time 0.2132 (0.2179) data time 0.0011 (0.0023) model time 0.2121 (0.2160) loss 5.7998 (5.4114) grad_norm 2.7535 (2.0941) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][360/625] eta 0:00:57 lr 0.000459 wd 0.0500 time 0.2126 (0.2178) data time 0.0008 (0.0022) model time 0.2118 (0.2159) loss 4.0408 (5.4025) grad_norm 1.7654 (2.0910) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][370/625] eta 0:00:55 lr 0.000461 wd 0.0500 time 0.2075 (0.2177) data time 0.0007 (0.0022) model time 0.2067 (0.2158) loss 5.7140 (5.4028) grad_norm 1.8379 (2.0897) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][380/625] eta 0:00:53 lr 0.000462 wd 0.0500 time 0.2192 (0.2176) data time 0.0009 (0.0022) model time 0.2183 (0.2157) loss 3.8701 (5.4051) grad_norm 2.4155 (2.0899) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO 
Train: [4/300][390/625] eta 0:00:51 lr 0.000464 wd 0.0500 time 0.2244 (0.2176) data time 0.0010 (0.0022) model time 0.2234 (0.2157) loss 5.7984 (5.4095) grad_norm 1.9008 (2.0837) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][400/625] eta 0:00:48 lr 0.000466 wd 0.0500 time 0.2107 (0.2174) data time 0.0007 (0.0021) model time 0.2100 (0.2155) loss 5.9336 (5.4084) grad_norm 1.9562 (2.0847) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][410/625] eta 0:00:46 lr 0.000467 wd 0.0500 time 0.2102 (0.2172) data time 0.0007 (0.0021) model time 0.2094 (0.2153) loss 5.5047 (5.4087) grad_norm 1.9548 (2.0898) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][420/625] eta 0:00:44 lr 0.000469 wd 0.0500 time 0.2142 (0.2173) data time 0.0008 (0.0021) model time 0.2135 (0.2154) loss 6.0374 (5.4115) grad_norm 1.6808 (2.0817) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][430/625] eta 0:00:42 lr 0.000470 wd 0.0500 time 0.2069 (0.2172) data time 0.0010 (0.0021) model time 0.2059 (0.2153) loss 5.5761 (5.4067) grad_norm 1.8129 (2.0749) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][440/625] eta 0:00:40 lr 0.000472 wd 0.0500 time 0.2102 (0.2172) data time 0.0009 (0.0021) model time 0.2092 (0.2154) loss 5.7012 (5.4077) grad_norm 1.9876 (2.0756) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][450/625] eta 0:00:38 lr 0.000474 wd 0.0500 time 0.2200 (0.2171) data time 0.0007 (0.0020) model time 0.2193 (0.2153) loss 5.3593 (5.4112) grad_norm 1.9927 
(2.0777) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][460/625] eta 0:00:35 lr 0.000475 wd 0.0500 time 0.2229 (0.2170) data time 0.0009 (0.0020) model time 0.2220 (0.2152) loss 4.8775 (5.4097) grad_norm 1.8662 (2.0762) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][470/625] eta 0:00:33 lr 0.000477 wd 0.0500 time 0.2067 (0.2170) data time 0.0008 (0.0020) model time 0.2059 (0.2152) loss 5.5478 (5.4056) grad_norm 1.3793 (2.0708) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][480/625] eta 0:00:31 lr 0.000478 wd 0.0500 time 0.2111 (0.2169) data time 0.0011 (0.0020) model time 0.2100 (0.2151) loss 4.8069 (5.4016) grad_norm 2.3313 (2.0630) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][490/625] eta 0:00:29 lr 0.000480 wd 0.0500 time 0.2050 (0.2168) data time 0.0011 (0.0020) model time 0.2039 (0.2150) loss 5.6490 (5.3970) grad_norm 1.9621 (2.0588) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][500/625] eta 0:00:27 lr 0.000482 wd 0.0500 time 0.2142 (0.2168) data time 0.0009 (0.0019) model time 0.2133 (0.2150) loss 4.8328 (5.3981) grad_norm 2.2950 (2.0604) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][510/625] eta 0:00:24 lr 0.000483 wd 0.0500 time 0.2087 (0.2167) data time 0.0008 (0.0019) model time 0.2080 (0.2149) loss 5.6781 (5.3972) grad_norm 1.7292 (2.0575) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][520/625] eta 
0:00:22 lr 0.000485 wd 0.0500 time 0.2165 (0.2167) data time 0.0009 (0.0019) model time 0.2155 (0.2149) loss 4.1951 (5.3872) grad_norm 1.6795 (2.0550) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][530/625] eta 0:00:20 lr 0.000486 wd 0.0500 time 0.2028 (0.2165) data time 0.0008 (0.0019) model time 0.2020 (0.2148) loss 4.5973 (5.3862) grad_norm 2.5037 (2.0535) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][540/625] eta 0:00:18 lr 0.000488 wd 0.0500 time 0.2103 (0.2165) data time 0.0010 (0.0019) model time 0.2094 (0.2147) loss 4.6554 (5.3792) grad_norm 2.1749 (2.0529) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][550/625] eta 0:00:16 lr 0.000490 wd 0.0500 time 0.2107 (0.2164) data time 0.0007 (0.0019) model time 0.2101 (0.2146) loss 4.0676 (5.3745) grad_norm 2.1781 (2.0546) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][560/625] eta 0:00:14 lr 0.000491 wd 0.0500 time 0.2083 (0.2163) data time 0.0008 (0.0018) model time 0.2075 (0.2145) loss 6.0726 (5.3804) grad_norm 2.1348 (2.0517) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][570/625] eta 0:00:11 lr 0.000493 wd 0.0500 time 0.2159 (0.2162) data time 0.0008 (0.0018) model time 0.2151 (0.2145) loss 5.7689 (5.3808) grad_norm 1.6402 (2.0477) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][580/625] eta 0:00:09 lr 0.000494 wd 0.0500 time 0.2165 (0.2161) data time 0.0010 (0.0018) model time 0.2155 (0.2144) loss 5.4065 (5.3759) grad_norm 1.6880 (2.0472) loss_scale 
8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][590/625] eta 0:00:07 lr 0.000496 wd 0.0500 time 0.2094 (0.2161) data time 0.0009 (0.0018) model time 0.2085 (0.2143) loss 4.6631 (5.3779) grad_norm 1.9913 (2.0458) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:24:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][600/625] eta 0:00:05 lr 0.000498 wd 0.0500 time 0.2191 (0.2160) data time 0.0010 (0.0018) model time 0.2181 (0.2143) loss 4.7445 (5.3789) grad_norm 1.3231 (2.0416) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:25:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][610/625] eta 0:00:03 lr 0.000499 wd 0.0500 time 0.2076 (0.2160) data time 0.0005 (0.0018) model time 0.2071 (0.2142) loss 5.4292 (5.3764) grad_norm 1.8236 (2.0376) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:25:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][620/625] eta 0:00:01 lr 0.000501 wd 0.0500 time 0.2095 (0.2158) data time 0.0005 (0.0018) model time 0.2090 (0.2141) loss 5.4624 (5.3733) grad_norm 2.0854 (2.0391) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:25:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 4 training takes 0:02:14 [2024-07-29 12:25:03 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 12:25:04 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
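The `lr` column in the epoch-4 and epoch-5 entries rises by roughly 1.6e-6 every 10 iterations. That is the slope of a linear warmup from `WARMUP_LR` (2e-06) to `BASE_LR` (0.002) over `WARMUP_EPOCHS` × 625 = 12,500 iterations. The sketch below reproduces the logged values under that assumption.

The cosine branch is also an assumption. It guesses how `NAME: cosine` with `WARMUP_PREFIX: true` behaves once warmup ends: decay to `MIN_LR`, with the cosine period starting after warmup. The logged epochs only exercise the warmup branch.

```python
import math

# Values copied from the config dump at the top of this log.
BASE_LR = 0.002
WARMUP_LR = 2.0e-06
MIN_LR = 2.0e-05
EPOCHS = 300
WARMUP_EPOCHS = 20
ITERS_PER_EPOCH = 625  # from the "[epoch/300][step/625]" counters

WARMUP_STEPS = WARMUP_EPOCHS * ITERS_PER_EPOCH
TOTAL_STEPS = EPOCHS * ITERS_PER_EPOCH


def lr_at(epoch: int, step: int) -> float:
    """Per-iteration LR, assuming linear warmup followed by a cosine decay
    whose period starts after warmup (WARMUP_PREFIX: true)."""
    t = epoch * ITERS_PER_EPOCH + step
    if t < WARMUP_STEPS:
        # Linear ramp from WARMUP_LR toward BASE_LR.
        return WARMUP_LR + (BASE_LR - WARMUP_LR) * t / WARMUP_STEPS
    # Assumed cosine decay from BASE_LR to MIN_LR over the remaining steps.
    t_cos = t - WARMUP_STEPS
    period = TOTAL_STEPS - WARMUP_STEPS
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1 + math.cos(math.pi * t_cos / period))


for epoch, step in [(4, 0), (4, 620), (5, 390)]:
    # Same six-decimal formatting as the log's "lr" field.
    print(f"[{epoch}/300][{step}/625] lr {lr_at(epoch, step):.6f}")
```

Printed this way, the three steps give 0.000402, 0.000501 and 0.000564. These match the `lr` values logged at `[4/300][0/625]`, `[4/300][620/625]` and `[5/300][390/625]`.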
[2024-07-29 12:25:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.534 (0.534) Loss 2.2949 (2.2949) Acc@1 51.465 (51.465) Acc@5 77.539 (77.539) Mem 8978MB [2024-07-29 12:25:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.103) Loss 3.5508 (2.5975) Acc@1 29.688 (43.821) Acc@5 54.248 (72.243) Mem 8978MB [2024-07-29 12:25:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 3.6094 (2.9939) Acc@1 25.098 (37.777) Acc@5 52.100 (64.658) Mem 8978MB [2024-07-29 12:25:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 38.334 Acc@5 64.941 [2024-07-29 12:25:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 38.3% [2024-07-29 12:25:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 38.33% [2024-07-29 12:25:06 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 12:25:07 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
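A second evaluation pass follows, logged from `main_hfai_mnodes.py` line 272 instead of 257. It reports Acc@1 0.100, which is chance level for 1000 classes. The run arguments enable `model_ema` with `model_ema_decay` 0.9999.

This sketch assumes two things, neither stated in the log:
- the second pass scores the EMA copy of the weights;
- the EMA is updated once per iteration as `ema = decay * ema + (1 - decay) * w`, with no decay warmup.

Under those assumptions, the share of the EMA still contributed by the initial weights after n updates is decay^n. The helper names are illustrative only.

```python
import math

DECAY = 0.9999          # model_ema_decay from the run arguments
ITERS_PER_EPOCH = 625   # from the "[epoch/300][step/625]" counters


def init_weight_share(num_updates: int, decay: float = DECAY) -> float:
    """Fraction of an EMA copy still contributed by the initial weights
    after `num_updates` updates of the form ema = decay*ema + (1-decay)*w."""
    return decay ** num_updates


def updates_until(share: float, decay: float = DECAY) -> int:
    """Smallest number of updates after which the initial-weight share is at most `share`."""
    return math.ceil(math.log(share) / math.log(decay))


steps_done = 5 * ITERS_PER_EPOCH  # epochs 0-4 finished
print(f"after {steps_done} steps: {init_weight_share(steps_done):.3f}")
n = updates_until(0.01)
print(f"below 1% after {n} steps (~{n / ITERS_PER_EPOCH:.1f} epochs)")
```

After 3,125 updates, about 73% of such an EMA would still come from initialization. That is consistent with the chance-level result. Under these assumptions, the initial share falls below 1% only after 46,050 updates, roughly 74 epochs.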
[2024-07-29 12:25:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.448 (0.448) Loss 6.7930 (6.7930) Acc@1 0.000 (0.000) Acc@5 0.000 (0.000) Mem 8978MB [2024-07-29 12:25:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.063 (0.098) Loss 7.1562 (6.8683) Acc@1 0.000 (0.222) Acc@5 0.000 (0.879) Mem 8978MB [2024-07-29 12:25:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 7.0117 (7.0011) Acc@1 0.000 (0.116) Acc@5 2.441 (0.593) Mem 8978MB [2024-07-29 12:25:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.100 Acc@5 0.510 [2024-07-29 12:25:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.1% [2024-07-29 12:25:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][0/625] eta 0:11:22 lr 0.000502 wd 0.0500 time 1.0921 (1.0921) data time 0.6701 (0.6701) model time 0.0000 (0.0000) loss 4.3233 (4.3233) grad_norm 1.3877 (1.3877) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:25:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][10/625] eta 0:03:00 lr 0.000503 wd 0.0500 time 0.2072 (0.2941) data time 0.0011 (0.0620) model time 0.0000 (0.0000) loss 5.7520 (4.7241) grad_norm 1.5138 (2.2004) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:25:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][20/625] eta 0:02:33 lr 0.000505 wd 0.0500 time 0.2089 (0.2545) data time 0.0008 (0.0330) model time 0.0000 (0.0000) loss 4.8762 (5.0224) grad_norm 2.3738 (2.1118) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:25:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][30/625] eta 0:02:23 lr 0.000506 wd 0.0500 time 0.2175 (0.2418) data time 0.0008 (0.0227) model time 0.0000 (0.0000) loss 5.6469 (5.0332) grad_norm 1.6378 (2.0165) loss_scale 8192.0000 
(8192.0000) mem 8978MB [2024-07-29 12:25:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][40/625] eta 0:02:18 lr 0.000508 wd 0.0500 time 0.2144 (0.2374) data time 0.0010 (0.0175) model time 0.0000 (0.0000) loss 4.5035 (5.0448) grad_norm 2.3514 (1.9638) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:25:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][50/625] eta 0:02:14 lr 0.000509 wd 0.0500 time 0.2085 (0.2339) data time 0.0008 (0.0143) model time 0.0000 (0.0000) loss 4.8424 (5.0742) grad_norm 1.7458 (1.9350) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:25:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][60/625] eta 0:02:12 lr 0.000511 wd 0.0500 time 0.2083 (0.2338) data time 0.0010 (0.0122) model time 0.2073 (0.2317) loss 5.5407 (5.1005) grad_norm 2.4972 (1.9606) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:25:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][70/625] eta 0:02:07 lr 0.000513 wd 0.0500 time 0.2085 (0.2306) data time 0.0008 (0.0106) model time 0.2077 (0.2210) loss 4.3453 (5.1000) grad_norm 1.5269 (1.9872) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:25:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][80/625] eta 0:02:04 lr 0.000514 wd 0.0500 time 0.2091 (0.2282) data time 0.0009 (0.0094) model time 0.2083 (0.2173) loss 4.0408 (5.0832) grad_norm 1.7335 (1.9784) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:25:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][90/625] eta 0:02:01 lr 0.000516 wd 0.0500 time 0.2108 (0.2263) data time 0.0008 (0.0086) model time 0.2100 (0.2155) loss 5.6759 (5.0924) grad_norm 2.5977 (1.9736) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:25:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][100/625] eta 0:01:58 lr 0.000517 wd 0.0500 time 0.2142 
(0.2252) data time 0.0008 (0.0078) model time 0.2134 (0.2151) loss 5.9414 (5.0994) grad_norm 2.9907 (1.9889) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:25:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][110/625] eta 0:01:55 lr 0.000519 wd 0.0500 time 0.2068 (0.2240) data time 0.0008 (0.0072) model time 0.2060 (0.2145) loss 4.1708 (5.1040) grad_norm 2.7350 (2.0035) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:25:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][120/625] eta 0:01:52 lr 0.000521 wd 0.0500 time 0.2154 (0.2230) data time 0.0011 (0.0067) model time 0.2143 (0.2140) loss 5.5371 (5.1258) grad_norm 2.0108 (1.9841) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:25:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][130/625] eta 0:01:49 lr 0.000522 wd 0.0500 time 0.2140 (0.2221) data time 0.0009 (0.0063) model time 0.2131 (0.2134) loss 5.9446 (5.1694) grad_norm 2.0919 (1.9666) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:25:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][140/625] eta 0:01:47 lr 0.000524 wd 0.0500 time 0.2093 (0.2212) data time 0.0010 (0.0059) model time 0.2083 (0.2129) loss 3.8776 (5.1510) grad_norm 1.7707 (1.9513) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:25:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][150/625] eta 0:01:44 lr 0.000525 wd 0.0500 time 0.2076 (0.2205) data time 0.0010 (0.0056) model time 0.2066 (0.2125) loss 5.7716 (5.1329) grad_norm 1.7739 (1.9507) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:25:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][160/625] eta 0:01:42 lr 0.000527 wd 0.0500 time 0.2096 (0.2199) data time 0.0010 (0.0053) model time 0.2086 (0.2122) loss 4.8958 (5.1388) grad_norm 1.6323 (1.9452) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 
12:25:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][170/625] eta 0:01:39 lr 0.000529 wd 0.0500 time 0.2120 (0.2194) data time 0.0007 (0.0050) model time 0.2112 (0.2121) loss 5.5109 (5.1528) grad_norm 2.3189 (1.9507) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:25:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][180/625] eta 0:01:37 lr 0.000530 wd 0.0500 time 0.2083 (0.2189) data time 0.0007 (0.0048) model time 0.2076 (0.2119) loss 6.0125 (5.1617) grad_norm 1.6320 (1.9343) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:25:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][190/625] eta 0:01:35 lr 0.000532 wd 0.0500 time 0.2218 (0.2187) data time 0.0010 (0.0046) model time 0.2208 (0.2120) loss 5.5332 (5.1622) grad_norm 2.3927 (1.9409) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:25:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][200/625] eta 0:01:32 lr 0.000533 wd 0.0500 time 0.2215 (0.2183) data time 0.0010 (0.0045) model time 0.2205 (0.2119) loss 5.2067 (5.1567) grad_norm 1.5803 (1.9249) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:25:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][210/625] eta 0:01:30 lr 0.000535 wd 0.0500 time 0.2068 (0.2180) data time 0.0007 (0.0043) model time 0.2061 (0.2118) loss 5.6285 (5.1660) grad_norm 1.7413 (1.9192) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:25:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][220/625] eta 0:01:28 lr 0.000537 wd 0.0500 time 0.2106 (0.2176) data time 0.0009 (0.0042) model time 0.2097 (0.2116) loss 5.5417 (5.1658) grad_norm 2.2206 (1.9279) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:25:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][230/625] eta 0:01:25 lr 0.000538 wd 0.0500 time 0.2068 (0.2173) data time 0.0008 
(0.0040) model time 0.2060 (0.2115) loss 5.3587 (5.1519) grad_norm 1.7767 (1.9233) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:26:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][240/625] eta 0:01:23 lr 0.000540 wd 0.0500 time 0.2174 (0.2171) data time 0.0007 (0.0039) model time 0.2167 (0.2115) loss 5.5642 (5.1606) grad_norm 2.2337 (1.9289) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:26:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][250/625] eta 0:01:21 lr 0.000541 wd 0.0500 time 0.2139 (0.2169) data time 0.0010 (0.0038) model time 0.2129 (0.2115) loss 5.7479 (5.1605) grad_norm 3.0248 (1.9450) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:26:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][260/625] eta 0:01:19 lr 0.000543 wd 0.0500 time 0.2080 (0.2167) data time 0.0008 (0.0037) model time 0.2072 (0.2114) loss 6.2126 (5.1694) grad_norm 1.6806 (1.9415) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:26:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][270/625] eta 0:01:16 lr 0.000545 wd 0.0500 time 0.2053 (0.2165) data time 0.0010 (0.0036) model time 0.2044 (0.2113) loss 4.0796 (5.1601) grad_norm 2.5453 (1.9410) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:26:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][280/625] eta 0:01:14 lr 0.000546 wd 0.0500 time 0.2051 (0.2162) data time 0.0010 (0.0035) model time 0.2041 (0.2112) loss 5.8397 (5.1639) grad_norm 1.9196 (1.9339) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:26:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][290/625] eta 0:01:12 lr 0.000548 wd 0.0500 time 0.2077 (0.2161) data time 0.0007 (0.0034) model time 0.2069 (0.2113) loss 4.7472 (5.1608) grad_norm 2.0643 (1.9438) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:26:14 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][300/625] eta 0:01:10 lr 0.000549 wd 0.0500 time 0.2215 (0.2161) data time 0.0009 (0.0033) model time 0.2206 (0.2113) loss 5.0024 (5.1568) grad_norm 1.8270 (1.9513) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:26:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][310/625] eta 0:01:07 lr 0.000551 wd 0.0500 time 0.2065 (0.2158) data time 0.0010 (0.0033) model time 0.2054 (0.2112) loss 5.4327 (5.1599) grad_norm 1.4069 (1.9434) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:26:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][320/625] eta 0:01:05 lr 0.000553 wd 0.0500 time 0.2151 (0.2158) data time 0.0008 (0.0032) model time 0.2143 (0.2112) loss 5.1132 (5.1667) grad_norm 1.7107 (1.9431) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:26:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][330/625] eta 0:01:03 lr 0.000554 wd 0.0500 time 0.2116 (0.2156) data time 0.0008 (0.0031) model time 0.2108 (0.2112) loss 4.2337 (5.1713) grad_norm 1.6565 (1.9577) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:26:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][340/625] eta 0:01:01 lr 0.000556 wd 0.0500 time 0.2117 (0.2155) data time 0.0009 (0.0031) model time 0.2108 (0.2112) loss 5.4564 (5.1703) grad_norm 1.9906 (1.9628) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:26:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][350/625] eta 0:00:59 lr 0.000557 wd 0.0500 time 0.2128 (0.2154) data time 0.0010 (0.0030) model time 0.2118 (0.2112) loss 5.5075 (5.1696) grad_norm 2.1690 (1.9711) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:26:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][360/625] eta 0:00:57 lr 0.000559 wd 0.0500 time 0.2090 (0.2154) data time 0.0007 (0.0030) 
model time 0.2083 (0.2113) loss 4.3920 (5.1649) grad_norm 1.5292 (1.9721) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:26:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][370/625] eta 0:00:54 lr 0.000561 wd 0.0500 time 0.2126 (0.2155) data time 0.0008 (0.0029) model time 0.2118 (0.2115) loss 4.6748 (5.1619) grad_norm 1.6885 (1.9705) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:26:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][380/625] eta 0:00:52 lr 0.000562 wd 0.0500 time 0.2049 (0.2155) data time 0.0007 (0.0029) model time 0.2042 (0.2115) loss 5.7263 (5.1714) grad_norm 1.7147 (1.9690) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:26:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][390/625] eta 0:00:50 lr 0.000564 wd 0.0500 time 0.2051 (0.2154) data time 0.0012 (0.0028) model time 0.2039 (0.2115) loss 5.0375 (5.1734) grad_norm 1.8437 (1.9614) loss_scale 16384.0000 (8233.9028) mem 8978MB [2024-07-29 12:26:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][400/625] eta 0:00:48 lr 0.000565 wd 0.0500 time 0.2157 (0.2153) data time 0.0011 (0.0028) model time 0.2147 (0.2115) loss 4.1623 (5.1722) grad_norm 2.1794 (1.9563) loss_scale 16384.0000 (8437.1471) mem 8978MB [2024-07-29 12:26:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][410/625] eta 0:00:46 lr 0.000567 wd 0.0500 time 0.2080 (0.2152) data time 0.0009 (0.0027) model time 0.2072 (0.2115) loss 5.7898 (5.1722) grad_norm 2.3635 (1.9565) loss_scale 16384.0000 (8630.5012) mem 8978MB [2024-07-29 12:26:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][420/625] eta 0:00:44 lr 0.000569 wd 0.0500 time 0.2213 (0.2152) data time 0.0009 (0.0027) model time 0.2204 (0.2115) loss 5.7447 (5.1663) grad_norm 1.5591 (1.9534) loss_scale 16384.0000 (8814.6698) mem 8978MB [2024-07-29 12:26:42 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][430/625] eta 0:00:41 lr 0.000570 wd 0.0500 time 0.2117 (0.2151) data time 0.0008 (0.0026) model time 0.2109 (0.2115) loss 4.4538 (5.1663) grad_norm 2.1457 (1.9475) loss_scale 16384.0000 (8990.2923) mem 8978MB [2024-07-29 12:26:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][440/625] eta 0:00:39 lr 0.000572 wd 0.0500 time 0.2167 (0.2151) data time 0.0008 (0.0026) model time 0.2158 (0.2116) loss 5.8652 (5.1690) grad_norm 1.5365 (1.9436) loss_scale 16384.0000 (9157.9501) mem 8978MB [2024-07-29 12:26:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][450/625] eta 0:00:37 lr 0.000573 wd 0.0500 time 0.2125 (0.2152) data time 0.0012 (0.0026) model time 0.2113 (0.2117) loss 5.2363 (5.1619) grad_norm 1.4805 (1.9374) loss_scale 16384.0000 (9318.1729) mem 8978MB [2024-07-29 12:26:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][460/625] eta 0:00:35 lr 0.000575 wd 0.0500 time 0.4532 (0.2156) data time 0.0008 (0.0026) model time 0.4524 (0.2122) loss 5.2968 (5.1657) grad_norm 1.8736 (1.9344) loss_scale 16384.0000 (9471.4447) mem 8978MB [2024-07-29 12:26:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][470/625] eta 0:00:33 lr 0.000577 wd 0.0500 time 0.2219 (0.2161) data time 0.0010 (0.0025) model time 0.2209 (0.2128) loss 4.7724 (5.1675) grad_norm 2.2793 (1.9319) loss_scale 16384.0000 (9618.2081) mem 8978MB [2024-07-29 12:26:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][480/625] eta 0:00:31 lr 0.000578 wd 0.0500 time 0.2142 (0.2161) data time 0.0009 (0.0025) model time 0.2133 (0.2128) loss 5.4911 (5.1751) grad_norm 1.4947 (1.9296) loss_scale 16384.0000 (9758.8690) mem 8978MB [2024-07-29 12:26:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][490/625] eta 0:00:29 lr 0.000580 wd 0.0500 time 0.2173 (0.2161) data time 0.0010 
(0.0025) model time 0.2163 (0.2129) loss 4.5664 (5.1776) grad_norm 2.1980 (1.9276) loss_scale 16384.0000 (9893.8004) mem 8978MB [2024-07-29 12:26:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][500/625] eta 0:00:27 lr 0.000581 wd 0.0500 time 0.2151 (0.2160) data time 0.0007 (0.0024) model time 0.2144 (0.2129) loss 4.5247 (5.1785) grad_norm 2.3738 (1.9351) loss_scale 16384.0000 (10023.3453) mem 8978MB [2024-07-29 12:27:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][510/625] eta 0:00:24 lr 0.000583 wd 0.0500 time 0.2136 (0.2161) data time 0.0010 (0.0024) model time 0.2126 (0.2130) loss 4.7980 (5.1734) grad_norm 1.6537 (1.9369) loss_scale 16384.0000 (10147.8200) mem 8978MB [2024-07-29 12:27:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][520/625] eta 0:00:22 lr 0.000585 wd 0.0500 time 0.2177 (0.2161) data time 0.0007 (0.0024) model time 0.2170 (0.2131) loss 4.7058 (5.1641) grad_norm 1.5427 (1.9358) loss_scale 16384.0000 (10267.5163) mem 8978MB [2024-07-29 12:27:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][530/625] eta 0:00:20 lr 0.000586 wd 0.0500 time 0.2079 (0.2161) data time 0.0009 (0.0024) model time 0.2070 (0.2131) loss 5.0342 (5.1642) grad_norm 1.9425 (1.9366) loss_scale 16384.0000 (10382.7043) mem 8978MB [2024-07-29 12:27:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][540/625] eta 0:00:18 lr 0.000588 wd 0.0500 time 0.2154 (0.2161) data time 0.0010 (0.0023) model time 0.2144 (0.2132) loss 3.7643 (5.1631) grad_norm 2.0113 (1.9362) loss_scale 16384.0000 (10493.6340) mem 8978MB [2024-07-29 12:27:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][550/625] eta 0:00:16 lr 0.000589 wd 0.0500 time 0.2108 (0.2160) data time 0.0009 (0.0023) model time 0.2099 (0.2131) loss 5.6725 (5.1652) grad_norm 1.5130 (1.9303) loss_scale 16384.0000 (10600.5372) mem 8978MB [2024-07-29 12:27:10 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][560/625] eta 0:00:14 lr 0.000591 wd 0.0500 time 0.2129 (0.2159) data time 0.0010 (0.0023) model time 0.2119 (0.2130) loss 5.2022 (5.1670) grad_norm 2.3713 (1.9277) loss_scale 16384.0000 (10703.6292) mem 8978MB [2024-07-29 12:27:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][570/625] eta 0:00:11 lr 0.000593 wd 0.0500 time 0.2086 (0.2159) data time 0.0010 (0.0023) model time 0.2077 (0.2130) loss 5.5936 (5.1702) grad_norm 2.7706 (1.9326) loss_scale 16384.0000 (10803.1103) mem 8978MB [2024-07-29 12:27:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][580/625] eta 0:00:09 lr 0.000594 wd 0.0500 time 0.2102 (0.2158) data time 0.0010 (0.0023) model time 0.2092 (0.2129) loss 4.9419 (5.1694) grad_norm 1.4365 (1.9291) loss_scale 16384.0000 (10899.1670) mem 8978MB [2024-07-29 12:27:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][590/625] eta 0:00:07 lr 0.000596 wd 0.0500 time 0.2126 (0.2158) data time 0.0008 (0.0023) model time 0.2118 (0.2130) loss 3.7613 (5.1677) grad_norm 1.6596 (1.9233) loss_scale 16384.0000 (10991.9729) mem 8978MB [2024-07-29 12:27:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][600/625] eta 0:00:05 lr 0.000597 wd 0.0500 time 0.4485 (0.2162) data time 0.0008 (0.0022) model time 0.4477 (0.2134) loss 5.9039 (5.1693) grad_norm 1.8541 (1.9229) loss_scale 16384.0000 (11081.6905) mem 8978MB [2024-07-29 12:27:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][610/625] eta 0:00:03 lr 0.000599 wd 0.0500 time 0.2090 (0.2161) data time 0.0005 (0.0022) model time 0.2085 (0.2134) loss 4.9865 (5.1700) grad_norm 2.1223 (1.9241) loss_scale 16384.0000 (11168.4714) mem 8978MB [2024-07-29 12:27:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][620/625] eta 0:00:01 lr 0.000601 wd 0.0500 time 0.2066 (0.2160) data time 0.0007 
(0.0022) model time 0.2059 (0.2133) loss 5.3135 (5.1678) grad_norm 1.8159 (1.9207) loss_scale 16384.0000 (11252.4573) mem 8978MB [2024-07-29 12:27:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 5 training takes 0:02:14 [2024-07-29 12:27:24 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 12:27:25 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 12:27:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.551 (0.551) Loss 1.9121 (1.9121) Acc@1 57.520 (57.520) Acc@5 82.617 (82.617) Mem 8978MB [2024-07-29 12:27:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.059 (0.105) Loss 3.0352 (2.2330) Acc@1 36.475 (50.471) Acc@5 64.453 (78.196) Mem 8978MB [2024-07-29 12:27:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.081) Loss 3.1895 (2.6027) Acc@1 33.984 (44.541) Acc@5 59.473 (71.124) Mem 8978MB [2024-07-29 12:27:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 44.808 Acc@5 71.229 [2024-07-29 12:27:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 44.8% [2024-07-29 12:27:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 44.81% [2024-07-29 12:27:27 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 12:27:28 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-29 12:27:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.460 (0.460) Loss 6.8125 (6.8125) Acc@1 0.000 (0.000) Acc@5 0.000 (0.000) Mem 8978MB [2024-07-29 12:27:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 7.2461 (6.8672) Acc@1 0.000 (0.222) Acc@5 0.000 (0.888) Mem 8978MB [2024-07-29 12:27:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 7.1016 (7.0342) Acc@1 0.000 (0.116) Acc@5 0.000 (0.581) Mem 8978MB [2024-07-29 12:27:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.100 Acc@5 0.500 [2024-07-29 12:27:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.1% [2024-07-29 12:27:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][0/625] eta 0:11:19 lr 0.000601 wd 0.0500 time 1.0879 (1.0879) data time 0.6968 (0.6968) model time 0.0000 (0.0000) loss 4.6496 (4.6496) grad_norm 1.6236 (1.6236) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:27:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][10/625] eta 0:03:04 lr 0.000603 wd 0.0500 time 0.2296 (0.2995) data time 0.0007 (0.0656) model time 0.0000 (0.0000) loss 4.6492 (5.0037) grad_norm 1.8757 (1.8630) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:27:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][20/625] eta 0:02:42 lr 0.000605 wd 0.0500 time 0.2102 (0.2679) data time 0.0008 (0.0349) model time 0.0000 (0.0000) loss 5.7290 (5.0826) grad_norm 2.0706 (1.7990) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:27:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][30/625] eta 0:02:30 lr 0.000606 wd 0.0500 time 0.2127 (0.2531) data time 0.0010 (0.0246) model time 0.0000 (0.0000) loss 5.1858 (5.0287) grad_norm 3.1528 (1.8380) loss_scale 
16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:27:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][40/625] eta 0:02:21 lr 0.000608 wd 0.0500 time 0.2136 (0.2427) data time 0.0010 (0.0189) model time 0.0000 (0.0000) loss 4.5742 (5.0329) grad_norm 1.5217 (1.8593) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:27:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][50/625] eta 0:02:15 lr 0.000609 wd 0.0500 time 0.2042 (0.2363) data time 0.0012 (0.0154) model time 0.0000 (0.0000) loss 3.9482 (4.9899) grad_norm 1.9120 (1.8355) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:27:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][60/625] eta 0:02:11 lr 0.000611 wd 0.0500 time 0.2285 (0.2325) data time 0.0008 (0.0131) model time 0.2277 (0.2123) loss 3.7456 (4.9264) grad_norm 2.4406 (1.8695) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:27:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][70/625] eta 0:02:07 lr 0.000613 wd 0.0500 time 0.2080 (0.2293) data time 0.0008 (0.0114) model time 0.2072 (0.2104) loss 4.6974 (4.8954) grad_norm 2.1880 (1.8616) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:27:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][80/625] eta 0:02:03 lr 0.000614 wd 0.0500 time 0.2204 (0.2271) data time 0.0008 (0.0101) model time 0.2196 (0.2104) loss 5.8194 (4.9470) grad_norm 1.5427 (1.8081) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:27:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][90/625] eta 0:02:00 lr 0.000616 wd 0.0500 time 0.2066 (0.2252) data time 0.0011 (0.0091) model time 0.2055 (0.2101) loss 5.2720 (4.9600) grad_norm 1.5356 (1.7882) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:27:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][100/625] eta 0:01:57 lr 
0.000617 wd 0.0500 time 0.2165 (0.2241) data time 0.0010 (0.0083) model time 0.2155 (0.2106) loss 4.8026 (4.9962) grad_norm 2.4455 (1.8082) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:27:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][110/625] eta 0:01:54 lr 0.000619 wd 0.0500 time 0.2152 (0.2230) data time 0.0010 (0.0077) model time 0.2142 (0.2107) loss 5.1316 (4.9783) grad_norm 2.6700 (1.8482) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:27:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][120/625] eta 0:01:52 lr 0.000621 wd 0.0500 time 0.2089 (0.2231) data time 0.0010 (0.0071) model time 0.2078 (0.2123) loss 5.7200 (4.9665) grad_norm 1.8598 (1.8425) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:27:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][130/625] eta 0:01:50 lr 0.000622 wd 0.0500 time 0.2106 (0.2231) data time 0.0013 (0.0068) model time 0.2093 (0.2133) loss 5.3325 (4.9692) grad_norm 1.7961 (1.8360) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][140/625] eta 0:01:47 lr 0.000624 wd 0.0500 time 0.2274 (0.2223) data time 0.0008 (0.0064) model time 0.2266 (0.2131) loss 5.2056 (4.9449) grad_norm 1.4531 (1.8226) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][150/625] eta 0:01:45 lr 0.000625 wd 0.0500 time 0.2126 (0.2215) data time 0.0008 (0.0061) model time 0.2118 (0.2127) loss 4.1634 (4.9413) grad_norm 2.3068 (1.8245) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][160/625] eta 0:01:42 lr 0.000627 wd 0.0500 time 0.2092 (0.2209) data time 0.0007 (0.0058) model time 0.2085 (0.2125) loss 4.4189 (4.9459) grad_norm 3.2398 (1.8503) loss_scale 
16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][170/625] eta 0:01:40 lr 0.000629 wd 0.0500 time 0.2048 (0.2202) data time 0.0009 (0.0055) model time 0.2039 (0.2121) loss 4.8697 (4.9574) grad_norm 1.2353 (1.8471) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][180/625] eta 0:01:37 lr 0.000630 wd 0.0500 time 0.2101 (0.2197) data time 0.0007 (0.0053) model time 0.2093 (0.2119) loss 5.4870 (4.9786) grad_norm 1.4501 (1.8527) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][190/625] eta 0:01:35 lr 0.000632 wd 0.0500 time 0.2168 (0.2195) data time 0.0010 (0.0050) model time 0.2159 (0.2121) loss 5.3393 (4.9798) grad_norm 2.0076 (1.8699) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][200/625] eta 0:01:33 lr 0.000633 wd 0.0500 time 0.2126 (0.2192) data time 0.0009 (0.0048) model time 0.2117 (0.2121) loss 5.4022 (4.9815) grad_norm 1.9188 (1.8770) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][210/625] eta 0:01:30 lr 0.000635 wd 0.0500 time 0.2150 (0.2187) data time 0.0010 (0.0047) model time 0.2140 (0.2119) loss 5.2306 (4.9925) grad_norm 2.0168 (1.8904) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][220/625] eta 0:01:28 lr 0.000637 wd 0.0500 time 0.2088 (0.2184) data time 0.0007 (0.0045) model time 0.2081 (0.2118) loss 3.7911 (4.9643) grad_norm 1.5673 (1.8840) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][230/625] eta 0:01:26 lr 
0.000638 wd 0.0500 time 0.2143 (0.2181) data time 0.0009 (0.0044) model time 0.2133 (0.2117) loss 5.0332 (4.9484) grad_norm 1.5592 (1.8808) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][240/625] eta 0:01:23 lr 0.000640 wd 0.0500 time 0.2087 (0.2179) data time 0.0008 (0.0042) model time 0.2079 (0.2118) loss 5.7372 (4.9441) grad_norm 1.9020 (1.8939) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][250/625] eta 0:01:21 lr 0.000641 wd 0.0500 time 0.2052 (0.2177) data time 0.0010 (0.0041) model time 0.2042 (0.2117) loss 3.9448 (4.9452) grad_norm 2.3438 (1.9035) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][260/625] eta 0:01:19 lr 0.000643 wd 0.0500 time 0.2065 (0.2174) data time 0.0012 (0.0040) model time 0.2054 (0.2115) loss 5.1804 (4.9523) grad_norm 1.5780 (1.8991) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][270/625] eta 0:01:17 lr 0.000645 wd 0.0500 time 0.2073 (0.2172) data time 0.0011 (0.0039) model time 0.2062 (0.2115) loss 4.8818 (4.9575) grad_norm 1.3556 (1.8846) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][280/625] eta 0:01:14 lr 0.000646 wd 0.0500 time 0.2097 (0.2170) data time 0.0007 (0.0038) model time 0.2091 (0.2115) loss 4.1988 (4.9688) grad_norm 2.4257 (1.8797) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][290/625] eta 0:01:12 lr 0.000648 wd 0.0500 time 0.2235 (0.2169) data time 0.0011 (0.0037) model time 0.2223 (0.2116) loss 5.5333 (4.9678) grad_norm 1.2245 (1.8689) loss_scale 
16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][300/625] eta 0:01:10 lr 0.000649 wd 0.0500 time 0.2127 (0.2167) data time 0.0009 (0.0036) model time 0.2118 (0.2115) loss 4.5311 (4.9656) grad_norm 1.9680 (1.8667) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][310/625] eta 0:01:08 lr 0.000651 wd 0.0500 time 0.2077 (0.2165) data time 0.0008 (0.0035) model time 0.2070 (0.2114) loss 4.5414 (4.9691) grad_norm 1.6361 (1.8683) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][320/625] eta 0:01:05 lr 0.000653 wd 0.0500 time 0.2118 (0.2164) data time 0.0009 (0.0035) model time 0.2108 (0.2114) loss 4.3020 (4.9662) grad_norm 1.9658 (1.8705) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][330/625] eta 0:01:03 lr 0.000654 wd 0.0500 time 0.2047 (0.2162) data time 0.0011 (0.0034) model time 0.2035 (0.2113) loss 5.5301 (4.9561) grad_norm 1.9683 (1.8736) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][340/625] eta 0:01:01 lr 0.000656 wd 0.0500 time 0.2095 (0.2160) data time 0.0007 (0.0033) model time 0.2088 (0.2113) loss 4.0156 (4.9559) grad_norm 2.4441 (1.8741) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][350/625] eta 0:00:59 lr 0.000657 wd 0.0500 time 0.2152 (0.2160) data time 0.0010 (0.0033) model time 0.2142 (0.2113) loss 5.2777 (4.9610) grad_norm 2.2529 (1.8770) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][360/625] eta 0:00:57 lr 
0.000659 wd 0.0500 time 0.2098 (0.2159) data time 0.0009 (0.0032) model time 0.2089 (0.2113) loss 4.3331 (4.9528) grad_norm 1.9743 (1.8853) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][370/625] eta 0:00:55 lr 0.000661 wd 0.0500 time 0.2060 (0.2158) data time 0.0008 (0.0032) model time 0.2052 (0.2113) loss 5.6784 (4.9606) grad_norm 1.9221 (1.8840) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][380/625] eta 0:00:52 lr 0.000662 wd 0.0500 time 0.2077 (0.2156) data time 0.0011 (0.0031) model time 0.2066 (0.2112) loss 5.2081 (4.9570) grad_norm 1.4763 (1.8865) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][390/625] eta 0:00:50 lr 0.000664 wd 0.0500 time 0.2050 (0.2155) data time 0.0007 (0.0031) model time 0.2042 (0.2112) loss 5.5944 (4.9542) grad_norm 1.4382 (1.8785) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][400/625] eta 0:00:48 lr 0.000665 wd 0.0500 time 0.2184 (0.2154) data time 0.0007 (0.0030) model time 0.2177 (0.2112) loss 4.3722 (4.9547) grad_norm 2.1938 (1.8787) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:28:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][410/625] eta 0:00:46 lr 0.000667 wd 0.0500 time 0.2134 (0.2154) data time 0.0010 (0.0030) model time 0.2125 (0.2112) loss 4.9866 (4.9565) grad_norm 1.7746 (1.8778) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][420/625] eta 0:00:44 lr 0.000669 wd 0.0500 time 0.2093 (0.2153) data time 0.0008 (0.0029) model time 0.2085 (0.2113) loss 4.0945 (4.9502) grad_norm 1.3403 (1.8795) loss_scale 
16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][430/625] eta 0:00:41 lr 0.000670 wd 0.0500 time 0.2100 (0.2154) data time 0.0010 (0.0029) model time 0.2091 (0.2114) loss 4.4575 (4.9460) grad_norm 1.1635 (1.8756) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][440/625] eta 0:00:39 lr 0.000672 wd 0.0500 time 0.2176 (0.2154) data time 0.0007 (0.0028) model time 0.2169 (0.2115) loss 3.6073 (4.9415) grad_norm 1.5956 (1.8710) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][450/625] eta 0:00:37 lr 0.000673 wd 0.0500 time 0.2113 (0.2154) data time 0.0008 (0.0028) model time 0.2105 (0.2115) loss 5.4765 (4.9461) grad_norm 2.0806 (1.8711) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][460/625] eta 0:00:35 lr 0.000675 wd 0.0500 time 0.2099 (0.2153) data time 0.0010 (0.0028) model time 0.2089 (0.2115) loss 5.0062 (4.9483) grad_norm 1.6267 (1.8681) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][470/625] eta 0:00:33 lr 0.000677 wd 0.0500 time 0.2102 (0.2153) data time 0.0008 (0.0027) model time 0.2094 (0.2116) loss 4.1781 (4.9415) grad_norm 1.4578 (1.8647) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][480/625] eta 0:00:31 lr 0.000678 wd 0.0500 time 0.2066 (0.2152) data time 0.0010 (0.0027) model time 0.2056 (0.2116) loss 4.5067 (4.9435) grad_norm 1.5657 (1.8600) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][490/625] eta 0:00:29 lr 
0.000680 wd 0.0500 time 0.2120 (0.2152) data time 0.0011 (0.0027) model time 0.2109 (0.2116) loss 4.2955 (4.9361) grad_norm 2.9290 (1.8630) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][500/625] eta 0:00:26 lr 0.000681 wd 0.0500 time 0.2112 (0.2153) data time 0.0008 (0.0026) model time 0.2104 (0.2117) loss 4.0193 (4.9264) grad_norm 1.3886 (1.8618) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][510/625] eta 0:00:24 lr 0.000683 wd 0.0500 time 0.2075 (0.2152) data time 0.0008 (0.0026) model time 0.2067 (0.2117) loss 4.8132 (4.9260) grad_norm 1.9634 (1.8580) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][520/625] eta 0:00:22 lr 0.000685 wd 0.0500 time 0.2154 (0.2151) data time 0.0010 (0.0026) model time 0.2145 (0.2116) loss 4.4751 (4.9220) grad_norm 1.5456 (1.8537) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][530/625] eta 0:00:20 lr 0.000686 wd 0.0500 time 0.2071 (0.2151) data time 0.0011 (0.0025) model time 0.2060 (0.2116) loss 5.3888 (4.9189) grad_norm 1.4845 (1.8527) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][540/625] eta 0:00:18 lr 0.000688 wd 0.0500 time 0.2085 (0.2150) data time 0.0010 (0.0025) model time 0.2075 (0.2116) loss 5.1195 (4.9160) grad_norm 1.5687 (1.8504) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][550/625] eta 0:00:16 lr 0.000689 wd 0.0500 time 0.2060 (0.2149) data time 0.0010 (0.0025) model time 0.2050 (0.2115) loss 5.5724 (4.9150) grad_norm 1.3268 (1.8444) loss_scale 
16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][560/625] eta 0:00:13 lr 0.000691 wd 0.0500 time 0.2127 (0.2148) data time 0.0010 (0.0025) model time 0.2118 (0.2115) loss 4.2341 (4.9063) grad_norm 1.7061 (1.8419) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][570/625] eta 0:00:11 lr 0.000693 wd 0.0500 time 0.2080 (0.2147) data time 0.0007 (0.0024) model time 0.2073 (0.2115) loss 5.3040 (4.9070) grad_norm 1.6457 (1.8444) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][580/625] eta 0:00:09 lr 0.000694 wd 0.0500 time 0.2127 (0.2147) data time 0.0011 (0.0024) model time 0.2117 (0.2115) loss 5.6279 (4.9101) grad_norm 1.4688 (1.8423) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][590/625] eta 0:00:07 lr 0.000696 wd 0.0500 time 0.2088 (0.2147) data time 0.0010 (0.0024) model time 0.2078 (0.2115) loss 5.1389 (4.9090) grad_norm 1.6692 (1.8409) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][600/625] eta 0:00:05 lr 0.000697 wd 0.0500 time 0.2123 (0.2146) data time 0.0010 (0.0024) model time 0.2113 (0.2114) loss 5.3551 (4.9100) grad_norm 1.6756 (1.8433) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][610/625] eta 0:00:03 lr 0.000699 wd 0.0500 time 0.2061 (0.2146) data time 0.0005 (0.0023) model time 0.2056 (0.2114) loss 5.7807 (4.9134) grad_norm 1.8155 (1.8426) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][620/625] eta 0:00:01 lr 
0.000701 wd 0.0500 time 0.2101 (0.2145) data time 0.0007 (0.0023) model time 0.2094 (0.2114) loss 4.6863 (4.9116) grad_norm 2.3039 (1.8473) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 6 training takes 0:02:14 [2024-07-29 12:29:44 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 12:29:45 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 12:29:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.477 (0.477) Loss 1.7910 (1.7910) Acc@1 62.158 (62.158) Acc@5 83.740 (83.740) Mem 8978MB [2024-07-29 12:29:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 2.6895 (2.0409) Acc@1 44.238 (54.701) Acc@5 70.166 (81.357) Mem 8978MB [2024-07-29 12:29:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 2.9082 (2.3780) Acc@1 38.672 (48.947) Acc@5 65.332 (75.167) Mem 8978MB [2024-07-29 12:29:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 48.918 Acc@5 75.138 [2024-07-29 12:29:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 48.9% [2024-07-29 12:29:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 48.92% [2024-07-29 12:29:47 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 12:29:47 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-29 12:29:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.554 (0.554) Loss 6.8477 (6.8477) Acc@1 0.000 (0.000) Acc@5 0.000 (0.000) Mem 8978MB [2024-07-29 12:29:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.059 (0.105) Loss 7.2227 (6.8626) Acc@1 0.000 (0.222) Acc@5 0.000 (0.790) Mem 8978MB [2024-07-29 12:29:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.081) Loss 7.1250 (7.0363) Acc@1 0.000 (0.116) Acc@5 0.000 (0.614) Mem 8978MB [2024-07-29 12:29:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.100 Acc@5 0.528 [2024-07-29 12:29:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.1% [2024-07-29 12:29:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][0/625] eta 0:11:47 lr 0.000701 wd 0.0500 time 1.1321 (1.1321) data time 0.7150 (0.7150) model time 0.0000 (0.0000) loss 5.4005 (5.4005) grad_norm 1.6336 (1.6336) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][10/625] eta 0:03:04 lr 0.000703 wd 0.0500 time 0.2532 (0.2999) data time 0.0008 (0.0660) model time 0.0000 (0.0000) loss 3.6209 (4.6587) grad_norm 1.6786 (1.5452) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][20/625] eta 0:02:36 lr 0.000704 wd 0.0500 time 0.2180 (0.2587) data time 0.0009 (0.0353) model time 0.0000 (0.0000) loss 5.2808 (4.8985) grad_norm 1.4676 (1.5588) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 12:29:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][30/625] eta 0:02:25 lr 0.000706 wd 0.0500 time 0.2168 (0.2439) data time 0.0010 (0.0243) model time 0.0000 (0.0000) loss 3.5704 (4.8210) grad_norm 1.4709 (1.6285) loss_scale 
16384.0000 (16384.0000) mem 8978MB
[2024-07-29 12:29:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][40/625] eta 0:02:18 lr 0.000708 wd 0.0500 time 0.2162 (0.2364) data time 0.0009 (0.0186) model time 0.0000 (0.0000) loss 3.7457 (4.8868) grad_norm 1.5573 (inf) loss_scale 8192.0000 (15384.9756) mem 8978MB
[2024-07-29 12:30:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][50/625] eta 0:02:13 lr 0.000709 wd 0.0500 time 0.2159 (0.2324) data time 0.0007 (0.0152) model time 0.0000 (0.0000) loss 5.0521 (4.8707) grad_norm 2.1337 (inf) loss_scale 8192.0000 (13974.5882) mem 8978MB
[2024-07-29 12:30:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][60/625] eta 0:02:11 lr 0.000711 wd 0.0500 time 0.2139 (0.2324) data time 0.0011 (0.0129) model time 0.2128 (0.2315) loss 3.8223 (4.8706) grad_norm 2.1943 (inf) loss_scale 8192.0000 (13026.6230) mem 8978MB
[2024-07-29 12:30:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][70/625] eta 0:02:07 lr 0.000712 wd 0.0500 time 0.2064 (0.2295) data time 0.0008 (0.0112) model time 0.2056 (0.2209) loss 4.3382 (4.8572) grad_norm 2.0978 (inf) loss_scale 8192.0000 (12345.6901) mem 8978MB
[2024-07-29 12:30:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][80/625] eta 0:02:03 lr 0.000714 wd 0.0500 time 0.2142 (0.2273) data time 0.0007 (0.0099) model time 0.2135 (0.2177) loss 5.5197 (4.8654) grad_norm 2.3036 (inf) loss_scale 8192.0000 (11832.8889) mem 8978MB
[2024-07-29 12:30:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][90/625] eta 0:02:00 lr 0.000716 wd 0.0500 time 0.2053 (0.2257) data time 0.0010 (0.0090) model time 0.2043 (0.2160) loss 5.2666 (4.8894) grad_norm 2.3489 (inf) loss_scale 8192.0000 (11432.7912) mem 8978MB
[2024-07-29 12:30:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][100/625] eta 0:01:57 lr 0.000717 wd 0.0500 time 0.2097 (0.2243) data time 0.0009 (0.0082) model time 0.2088 (0.2151) loss 5.2078 (4.8915) grad_norm 1.2747 (inf) loss_scale 8192.0000 (11111.9208) mem 8978MB
[2024-07-29 12:30:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][110/625] eta 0:01:55 lr 0.000719 wd 0.0500 time 0.2331 (0.2236) data time 0.0010 (0.0075) model time 0.2321 (0.2151) loss 4.0439 (4.8918) grad_norm 1.4698 (inf) loss_scale 8192.0000 (10848.8649) mem 8978MB
[2024-07-29 12:30:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][120/625] eta 0:01:52 lr 0.000720 wd 0.0500 time 0.2156 (0.2225) data time 0.0008 (0.0070) model time 0.2148 (0.2143) loss 4.1572 (4.9031) grad_norm 2.5420 (inf) loss_scale 8192.0000 (10629.2893) mem 8978MB
[2024-07-29 12:30:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][130/625] eta 0:01:49 lr 0.000722 wd 0.0500 time 0.2080 (0.2217) data time 0.0011 (0.0066) model time 0.2069 (0.2138) loss 5.1556 (4.8960) grad_norm 1.3305 (inf) loss_scale 8192.0000 (10443.2366) mem 8978MB
[2024-07-29 12:30:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][140/625] eta 0:01:47 lr 0.000724 wd 0.0500 time 0.2069 (0.2209) data time 0.0010 (0.0062) model time 0.2059 (0.2133) loss 3.8307 (4.9012) grad_norm 1.5910 (inf) loss_scale 8192.0000 (10283.5745) mem 8978MB
[2024-07-29 12:30:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][150/625] eta 0:01:44 lr 0.000725 wd 0.0500 time 0.2136 (0.2202) data time 0.0009 (0.0058) model time 0.2126 (0.2129) loss 4.6838 (4.8812) grad_norm 1.2326 (inf) loss_scale 8192.0000 (10145.0596) mem 8978MB
[2024-07-29 12:30:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][160/625] eta 0:01:42 lr 0.000727 wd 0.0500 time 0.2160 (0.2211) data time 0.0007 (0.0055) model time 0.2153 (0.2149) loss 5.4323 (4.8741) grad_norm 2.0411 (inf) loss_scale 8192.0000 (10023.7516) mem 8978MB
[2024-07-29 12:30:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][170/625] eta 0:01:40 lr 0.000728 wd 0.0500 time 0.2127 (0.2218) data time 0.0012 (0.0053) model time 0.2115 (0.2162) loss 4.6827 (4.8607) grad_norm 1.8721 (inf) loss_scale 8192.0000 (9916.6316) mem 8978MB
[2024-07-29 12:30:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][180/625] eta 0:01:38 lr 0.000730 wd 0.0500 time 0.2135 (0.2212) data time 0.0008 (0.0050) model time 0.2127 (0.2158) loss 3.6691 (4.8446) grad_norm 1.2214 (inf) loss_scale 8192.0000 (9821.3481) mem 8978MB
[2024-07-29 12:30:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][190/625] eta 0:01:35 lr 0.000732 wd 0.0500 time 0.2091 (0.2207) data time 0.0011 (0.0048) model time 0.2080 (0.2154) loss 4.7932 (4.8396) grad_norm 1.7198 (inf) loss_scale 8192.0000 (9736.0419) mem 8978MB
[2024-07-29 12:30:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][200/625] eta 0:01:33 lr 0.000733 wd 0.0500 time 0.2255 (0.2205) data time 0.0008 (0.0046) model time 0.2247 (0.2153) loss 5.2842 (4.8322) grad_norm 2.4663 (inf) loss_scale 8192.0000 (9659.2239) mem 8978MB
[2024-07-29 12:30:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][210/625] eta 0:01:31 lr 0.000735 wd 0.0500 time 0.2134 (0.2200) data time 0.0010 (0.0045) model time 0.2124 (0.2150) loss 4.6964 (4.8252) grad_norm 1.5282 (inf) loss_scale 8192.0000 (9589.6872) mem 8978MB
[2024-07-29 12:30:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][220/625] eta 0:01:28 lr 0.000736 wd 0.0500 time 0.2156 (0.2197) data time 0.0009 (0.0043) model time 0.2147 (0.2149) loss 5.6083 (4.8333) grad_norm 2.1077 (inf) loss_scale 8192.0000 (9526.4434) mem 8978MB
[2024-07-29 12:30:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][230/625] eta 0:01:26 lr 0.000738 wd 0.0500 time 0.2063 (0.2192) data time 0.0010 (0.0042) model time 0.2053 (0.2144) loss 5.3004 (4.8418) grad_norm 1.8940 (inf) loss_scale 8192.0000 (9468.6753) mem 8978MB
[2024-07-29 12:30:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][240/625] eta 0:01:24 lr 0.000740 wd 0.0500 time 0.2122 (0.2189) data time 0.0008 (0.0040) model time 0.2114 (0.2142) loss 5.2871 (4.8338) grad_norm 2.8105 (inf) loss_scale 8192.0000 (9415.7012) mem 8978MB
[2024-07-29 12:30:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][250/625] eta 0:01:22 lr 0.000741 wd 0.0500 time 0.2081 (0.2187) data time 0.0009 (0.0039) model time 0.2073 (0.2141) loss 3.5670 (4.8213) grad_norm 1.7325 (inf) loss_scale 8192.0000 (9366.9482) mem 8978MB
[2024-07-29 12:30:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][260/625] eta 0:01:19 lr 0.000743 wd 0.0500 time 0.2068 (0.2185) data time 0.0010 (0.0038) model time 0.2057 (0.2140) loss 4.5581 (4.8172) grad_norm 1.7668 (inf) loss_scale 8192.0000 (9321.9310) mem 8978MB
[2024-07-29 12:30:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][270/625] eta 0:01:17 lr 0.000744 wd 0.0500 time 0.2036 (0.2181) data time 0.0011 (0.0037) model time 0.2025 (0.2138) loss 5.4824 (4.8228) grad_norm 1.4885 (inf) loss_scale 8192.0000 (9280.2362) mem 8978MB
[2024-07-29 12:30:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][280/625] eta 0:01:15 lr 0.000746 wd 0.0500 time 0.2142 (0.2180) data time 0.0007 (0.0036) model time 0.2135 (0.2137) loss 5.7160 (4.8318) grad_norm 1.7411 (inf) loss_scale 8192.0000 (9241.5089) mem 8978MB
[2024-07-29 12:30:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][290/625] eta 0:01:12 lr 0.000748 wd 0.0500 time 0.2121 (0.2177) data time 0.0011 (0.0035) model time 0.2111 (0.2136) loss 4.8378 (4.8258) grad_norm 1.6871 (inf) loss_scale 8192.0000 (9205.4433) mem 8978MB
[2024-07-29 12:30:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][300/625] eta 0:01:10 lr 0.000749 wd 0.0500 time 0.2080 (0.2175) data time 0.0007 (0.0035) model time 0.2073 (0.2134) loss 4.4226 (4.8254) grad_norm 1.9030 (inf) loss_scale 8192.0000 (9171.7741) mem 8978MB
[2024-07-29 12:30:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][310/625] eta 0:01:08 lr 0.000751 wd 0.0500 time 0.2124 (0.2180) data time 0.0009 (0.0034) model time 0.2115 (0.2142) loss 4.8749 (4.8206) grad_norm 1.1716 (inf) loss_scale 8192.0000 (9140.2701) mem 8978MB
[2024-07-29 12:30:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][320/625] eta 0:01:06 lr 0.000752 wd 0.0500 time 0.2111 (0.2179) data time 0.0008 (0.0033) model time 0.2102 (0.2141) loss 5.0751 (4.8167) grad_norm 1.3321 (inf) loss_scale 8192.0000 (9110.7290) mem 8978MB
[2024-07-29 12:31:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][330/625] eta 0:01:04 lr 0.000754 wd 0.0500 time 0.2189 (0.2178) data time 0.0011 (0.0032) model time 0.2178 (0.2140) loss 4.9353 (4.8232) grad_norm 1.7911 (inf) loss_scale 8192.0000 (9082.9728) mem 8978MB
[2024-07-29 12:31:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][340/625] eta 0:01:02 lr 0.000756 wd 0.0500 time 0.2096 (0.2175) data time 0.0010 (0.0032) model time 0.2086 (0.2139) loss 4.2467 (4.8146) grad_norm 1.9364 (inf) loss_scale 8192.0000 (9056.8446) mem 8978MB
[2024-07-29 12:31:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][350/625] eta 0:00:59 lr 0.000757 wd 0.0500 time 0.2160 (0.2175) data time 0.0011 (0.0031) model time 0.2148 (0.2139) loss 5.4662 (4.8181) grad_norm 1.4051 (inf) loss_scale 8192.0000 (9032.2051) mem 8978MB
[2024-07-29 12:31:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][360/625] eta 0:00:57 lr 0.000759 wd 0.0500 time 0.2104 (0.2174) data time 0.0009 (0.0031) model time 0.2095 (0.2139) loss 4.2503 (4.8215) grad_norm 1.4083 (inf) loss_scale 8192.0000 (9008.9307) mem 8978MB
[2024-07-29 12:31:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][370/625] eta 0:00:55 lr 0.000760 wd 0.0500 time 0.2102 (0.2173) data time 0.0008 (0.0030) model time 0.2094 (0.2138) loss 3.9312 (4.8120) grad_norm 1.7645 (inf) loss_scale 8192.0000 (8986.9111) mem 8978MB
[2024-07-29 12:31:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][380/625] eta 0:00:53 lr 0.000762 wd 0.0500 time 0.2097 (0.2172) data time 0.0009 (0.0030) model time 0.2088 (0.2138) loss 4.7738 (4.8163) grad_norm 1.7481 (inf) loss_scale 8192.0000 (8966.0472) mem 8978MB
[2024-07-29 12:31:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][390/625] eta 0:00:51 lr 0.000764 wd 0.0500 time 0.2075 (0.2171) data time 0.0009 (0.0029) model time 0.2066 (0.2137) loss 4.7678 (4.8187) grad_norm 1.4127 (inf) loss_scale 8192.0000 (8946.2506) mem 8978MB
[2024-07-29 12:31:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][400/625] eta 0:00:48 lr 0.000765 wd 0.0500 time 0.2185 (0.2169) data time 0.0008 (0.0029) model time 0.2177 (0.2136) loss 5.5112 (4.8218) grad_norm 2.8787 (inf) loss_scale 8192.0000 (8927.4414) mem 8978MB
[2024-07-29 12:31:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][410/625] eta 0:00:46 lr 0.000767 wd 0.0500 time 0.2101 (0.2168) data time 0.0010 (0.0028) model time 0.2092 (0.2135) loss 3.3888 (4.8181) grad_norm 1.7948 (inf) loss_scale 8192.0000 (8909.5474) mem 8978MB
[2024-07-29 12:31:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][420/625] eta 0:00:44 lr 0.000768 wd 0.0500 time 0.2082 (0.2167) data time 0.0010 (0.0028) model time 0.2072 (0.2134) loss 4.1431 (4.8157) grad_norm 1.8292 (inf) loss_scale 8192.0000 (8892.5036) mem 8978MB
[2024-07-29 12:31:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][430/625] eta 0:00:42 lr 0.000770 wd 0.0500 time 0.2136 (0.2166) data time 0.0008 (0.0027) model time 0.2129 (0.2134) loss 4.7124 (4.8115) grad_norm 2.5819 (inf) loss_scale 8192.0000 (8876.2506) mem 8978MB
[2024-07-29 12:31:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][440/625] eta 0:00:40 lr 0.000772 wd 0.0500 time 0.2097 (0.2165) data time 0.0010 (0.0027) model time 0.2087 (0.2133) loss 4.9090 (4.8185) grad_norm 1.4174 (inf) loss_scale 8192.0000 (8860.7347) mem 8978MB
[2024-07-29 12:31:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][450/625] eta 0:00:37 lr 0.000773 wd 0.0500 time 0.2144 (0.2164) data time 0.0007 (0.0027) model time 0.2136 (0.2133) loss 3.4825 (4.8130) grad_norm 1.6198 (inf) loss_scale 8192.0000 (8845.9069) mem 8978MB
[2024-07-29 12:31:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][460/625] eta 0:00:35 lr 0.000775 wd 0.0500 time 0.2235 (0.2166) data time 0.0007 (0.0026) model time 0.2228 (0.2136) loss 4.9049 (4.8101) grad_norm 1.8566 (inf) loss_scale 8192.0000 (8831.7223) mem 8978MB
[2024-07-29 12:31:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][470/625] eta 0:00:33 lr 0.000776 wd 0.0500 time 0.2152 (0.2165) data time 0.0009 (0.0026) model time 0.2144 (0.2135) loss 4.9664 (4.8122) grad_norm 1.4470 (inf) loss_scale 8192.0000 (8818.1401) mem 8978MB
[2024-07-29 12:31:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][480/625] eta 0:00:31 lr 0.000778 wd 0.0500 time 0.2141 (0.2164) data time 0.0010 (0.0026) model time 0.2131 (0.2134) loss 4.5056 (4.8145) grad_norm 1.5225 (inf) loss_scale 8192.0000 (8805.1227) mem 8978MB
[2024-07-29 12:31:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][490/625] eta 0:00:29 lr 0.000780 wd 0.0500 time 0.2064 (0.2163) data time 0.0009 (0.0025) model time 0.2055 (0.2134) loss 4.3683 (4.8100) grad_norm 1.3260 (inf) loss_scale 8192.0000 (8792.6354) mem 8978MB
[2024-07-29 12:31:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][500/625] eta 0:00:27 lr 0.000781 wd 0.0500 time 0.2144 (0.2163) data time 0.0009 (0.0025) model time 0.2135 (0.2134) loss 4.9455 (4.8128) grad_norm 1.1111 (inf) loss_scale 8192.0000 (8780.6467) mem 8978MB
[2024-07-29 12:31:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][510/625] eta 0:00:24 lr 0.000783 wd 0.0500 time 0.2061 (0.2162) data time 0.0011 (0.0025) model time 0.2050 (0.2133) loss 4.6091 (4.8215) grad_norm 1.4384 (inf) loss_scale 8192.0000 (8769.1272) mem 8978MB
[2024-07-29 12:31:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][520/625] eta 0:00:22 lr 0.000784 wd 0.0500 time 0.2111 (0.2161) data time 0.0011 (0.0025) model time 0.2100 (0.2133) loss 3.9909 (4.8217) grad_norm 1.5959 (inf) loss_scale 8192.0000 (8758.0499) mem 8978MB
[2024-07-29 12:31:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][530/625] eta 0:00:20 lr 0.000786 wd 0.0500 time 0.2124 (0.2161) data time 0.0010 (0.0024) model time 0.2114 (0.2132) loss 4.5429 (4.8205) grad_norm 3.1795 (inf) loss_scale 8192.0000 (8747.3898) mem 8978MB
[2024-07-29 12:31:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][540/625] eta 0:00:18 lr 0.000788 wd 0.0500 time 0.2150 (0.2160) data time 0.0011 (0.0024) model time 0.2140 (0.2132) loss 4.4892 (4.8161) grad_norm 1.7994 (inf) loss_scale 8192.0000 (8737.1238) mem 8978MB
[2024-07-29 12:31:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][550/625] eta 0:00:16 lr 0.000789 wd 0.0500 time 0.2191 (0.2160) data time 0.0008 (0.0024) model time 0.2183 (0.2132) loss 4.5890 (4.8165) grad_norm 1.6116 (inf) loss_scale 8192.0000 (8727.2305) mem 8978MB
[2024-07-29 12:31:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][560/625] eta 0:00:14 lr 0.000791 wd 0.0500 time 0.2074 (0.2159) data time 0.0013 (0.0024) model time 0.2061 (0.2131) loss 5.3645 (4.8100) grad_norm 1.6927 (inf) loss_scale 8192.0000 (8717.6898) mem 8978MB
[2024-07-29 12:31:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][570/625] eta 0:00:11 lr 0.000792 wd 0.0500 time 0.2125 (0.2158) data time 0.0011 (0.0023) model time 0.2113 (0.2130) loss 4.8996 (4.8108) grad_norm 1.7853 (inf) loss_scale 8192.0000 (8708.4834) mem 8978MB
[2024-07-29 12:31:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][580/625] eta 0:00:09 lr 0.000794 wd 0.0500 time 0.2148 (0.2157) data time 0.0010 (0.0023) model time 0.2138 (0.2130) loss 5.1666 (4.8109) grad_norm 3.0123 (inf) loss_scale 8192.0000 (8699.5938) mem 8978MB
[2024-07-29 12:31:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][590/625] eta 0:00:07 lr 0.000796 wd 0.0500 time 0.2054 (0.2156) data time 0.0008 (0.0023) model time 0.2046 (0.2130) loss 5.5143 (4.8150) grad_norm 1.4662 (inf) loss_scale 8192.0000 (8691.0051) mem 8978MB
[2024-07-29 12:31:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][600/625] eta 0:00:05 lr 0.000797 wd 0.0500 time 0.2094 (0.2156) data time 0.0011 (0.0023) model time 0.2084 (0.2130) loss 3.5965 (4.8101) grad_norm 2.2711 (inf) loss_scale 8192.0000 (8682.7022) mem 8978MB
[2024-07-29 12:32:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][610/625] eta 0:00:03 lr 0.000799 wd 0.0500 time 0.2062 (0.2156) data time 0.0005 (0.0023) model time 0.2056 (0.2129) loss 4.1284 (4.8102) grad_norm 1.5997 (inf) loss_scale 8192.0000 (8674.6710) mem 8978MB
[2024-07-29 12:32:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][620/625] eta 0:00:01 lr 0.000800 wd 0.0500 time 0.2083 (0.2154) data time 0.0007 (0.0022) model time 0.2076 (0.2128) loss 4.0520 (4.8103) grad_norm 1.6617 (inf) loss_scale 8192.0000 (8666.8986) mem 8978MB
[2024-07-29 12:32:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 7 training takes 0:02:14
[2024-07-29 12:32:04 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 12:32:05 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 12:32:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.493 (0.493) Loss 1.6221 (1.6221) Acc@1 66.211 (66.211) Acc@5 86.279 (86.279) Mem 8978MB
[2024-07-29 12:32:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.101) Loss 2.6133 (1.8295) Acc@1 45.850 (59.601) Acc@5 71.680 (84.419) Mem 8978MB
[2024-07-29 12:32:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 2.6680 (2.1964) Acc@1 44.189 (53.083) Acc@5 68.115 (77.837) Mem 8978MB
[2024-07-29 12:32:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 53.083 Acc@5 77.749
[2024-07-29 12:32:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 53.1%
[2024-07-29 12:32:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 53.08%
[2024-07-29 12:32:06 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 12:32:07 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 12:32:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.466 (0.466) Loss 6.8828 (6.8828) Acc@1 0.000 (0.000) Acc@5 0.000 (0.000) Mem 8978MB
[2024-07-29 12:32:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 7.1719 (6.8786) Acc@1 0.000 (0.222) Acc@5 0.000 (0.666) Mem 8978MB
[2024-07-29 12:32:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 7.1289 (7.0417) Acc@1 0.000 (0.116) Acc@5 0.000 (0.581) Mem 8978MB
[2024-07-29 12:32:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 0.100 Acc@5 0.500
[2024-07-29 12:32:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.1%
[2024-07-29 12:32:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][0/625] eta 0:11:25 lr 0.000801 wd 0.0500 time 1.0965 (1.0965) data time 0.6896 (0.6896) model time 0.0000 (0.0000) loss 5.3971 (5.3971) grad_norm 2.0235 (2.0235) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:32:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][10/625] eta 0:03:00 lr 0.000803 wd 0.0500 time 0.2146 (0.2934) data time 0.0008 (0.0638) model time 0.0000 (0.0000) loss 5.3045 (4.8982) grad_norm 1.7296 (1.5862) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:32:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][20/625] eta 0:02:34 lr 0.000804 wd 0.0500 time 0.2113 (0.2549) data time 0.0008 (0.0339) model time 0.0000 (0.0000) loss 5.3925 (4.9263) grad_norm 2.1553 (1.5995) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:32:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][30/625] eta 0:02:24 lr 0.000806 wd 0.0500 time 0.2229 (0.2421) data time 0.0007 (0.0233) model time 0.0000 (0.0000) loss 5.3488 (4.9922) grad_norm 1.3592 (1.5863) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:32:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][40/625] eta 0:02:17 lr 0.000808 wd 0.0500 time 0.2166 (0.2351) data time 0.0008 (0.0179) model time 0.0000 (0.0000) loss 3.8856 (4.9288) grad_norm 2.0247 (1.6923) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:32:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][50/625] eta 0:02:12 lr 0.000809 wd 0.0500 time 0.2170 (0.2310) data time 0.0009 (0.0146) model time 0.0000 (0.0000) loss 5.0326 (4.9272) grad_norm 1.1525 (1.7104) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:32:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][60/625] eta 0:02:08 lr 0.000811 wd 0.0500 time 0.2060 (0.2281) data time 0.0012 (0.0124) model time 0.2048 (0.2118) loss 5.3592 (4.8954) grad_norm 2.3202 (1.7335) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:32:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][70/625] eta 0:02:05 lr 0.000812 wd 0.0500 time 0.2137 (0.2265) data time 0.0010 (0.0108) model time 0.2127 (0.2138) loss 3.5542 (4.8478) grad_norm 1.8356 (1.7639) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:32:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][80/625] eta 0:02:02 lr 0.000814 wd 0.0500 time 0.2153 (0.2245) data time 0.0007 (0.0096) model time 0.2146 (0.2123) loss 5.2372 (4.8282) grad_norm 1.3541 (1.7475) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:32:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][90/625] eta 0:01:59 lr 0.000816 wd 0.0500 time 0.2201 (0.2232) data time 0.0010 (0.0086) model time 0.2191 (0.2122) loss 4.9025 (4.8279) grad_norm 1.3495 (1.7297) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:32:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][100/625] eta 0:01:56 lr 0.000817 wd 0.0500 time 0.2111 (0.2222) data time 0.0007 (0.0079) model time 0.2104 (0.2122) loss 5.3581 (4.8260) grad_norm 1.6023 (1.7061) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:32:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][110/625] eta 0:01:54 lr 0.000819 wd 0.0500 time 0.2075 (0.2215) data time 0.0011 (0.0073) model time 0.2064 (0.2124) loss 3.9967 (4.7898) grad_norm 2.5386 (1.7242) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:32:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][120/625] eta 0:01:51 lr 0.000820 wd 0.0500 time 0.2213 (0.2208) data time 0.0010 (0.0068) model time 0.2203 (0.2123) loss 4.7100 (4.7670) grad_norm 1.3335 (1.7113) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:32:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][130/625] eta 0:01:48 lr 0.000822 wd 0.0500 time 0.2094 (0.2201) data time 0.0009 (0.0063) model time 0.2085 (0.2121) loss 4.3250 (4.7583) grad_norm 1.6609 (1.7073) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:32:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][140/625] eta 0:01:46 lr 0.000824 wd 0.0500 time 0.2335 (0.2199) data time 0.0008 (0.0060) model time 0.2328 (0.2124) loss 5.6418 (4.7639) grad_norm 1.5019 (1.6981) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:32:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][150/625] eta 0:01:44 lr 0.000825 wd 0.0500 time 0.2145 (0.2195) data time 0.0008 (0.0056) model time 0.2138 (0.2126) loss 3.6354 (4.7498) grad_norm 2.4581 (1.7092) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:32:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][160/625] eta 0:01:41 lr 0.000827 wd 0.0500 time 0.2125 (0.2192) data time 0.0010 (0.0054) model time 0.2116 (0.2127) loss 4.2683 (4.7482) grad_norm 1.7894 (1.7298) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:32:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][170/625] eta 0:01:39 lr 0.000828 wd 0.0500 time 0.2099 (0.2188) data time 0.0008 (0.0051) model time 0.2091 (0.2125) loss 3.9075 (4.7322) grad_norm 1.7541 (1.7328) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:32:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][180/625] eta 0:01:37 lr 0.000830 wd 0.0500 time 0.2054 (0.2185) data time 0.0011 (0.0049) model time 0.2044 (0.2125) loss 4.0452 (4.7102) grad_norm 1.6271 (1.7193) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:32:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][190/625] eta 0:01:35 lr 0.000832 wd 0.0500 time 0.2095 (0.2185) data time 0.0007 (0.0047) model time 0.2088 (0.2128) loss 3.7092 (4.6846) grad_norm 1.1397 (1.7088) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:32:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][200/625] eta 0:01:32 lr 0.000833 wd 0.0500 time 0.2149 (0.2181) data time 0.0009 (0.0045) model time 0.2139 (0.2126) loss 4.7674 (4.6852) grad_norm 1.3390 (1.7080) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:32:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][210/625] eta 0:01:30 lr 0.000835 wd 0.0500 time 0.2075 (0.2178) data time 0.0011 (0.0044) model time 0.2063 (0.2125) loss 3.5575 (4.6683) grad_norm 1.3968 (1.7065) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:32:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][220/625] eta 0:01:28 lr 0.000836 wd 0.0500 time 0.2070 (0.2176) data time 0.0009 (0.0042) model time 0.2061 (0.2124) loss 5.5289 (4.6743) grad_norm 1.5042 (1.7002) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:32:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][230/625] eta 0:01:25 lr 0.000838 wd 0.0500 time 0.2129 (0.2174) data time 0.0008 (0.0041) model time 0.2121 (0.2124) loss 4.6273 (4.6687) grad_norm 2.5115 (1.6969) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][240/625] eta 0:01:23 lr 0.000840 wd 0.0500 time 0.2240 (0.2175) data time 0.0011 (0.0040) model time 0.2228 (0.2127) loss 5.0213 (4.6729) grad_norm 1.3396 (1.6981) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][250/625] eta 0:01:21 lr 0.000841 wd 0.0500 time 0.2152 (0.2174) data time 0.0011 (0.0038) model time 0.2141 (0.2128) loss 5.2853 (4.6706) grad_norm 2.1057 (1.7067) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][260/625] eta 0:01:19 lr 0.000843 wd 0.0500 time 0.2066 (0.2172) data time 0.0011 (0.0037) model time 0.2055 (0.2127) loss 4.8261 (4.6861) grad_norm 1.5401 (1.7070) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][270/625] eta 0:01:17 lr 0.000844 wd 0.0500 time 0.2708 (0.2173) data time 0.0009 (0.0036) model time 0.2699 (0.2130) loss 3.6322 (4.6825) grad_norm 2.6347 (1.7229) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][280/625] eta 0:01:14 lr 0.000846 wd 0.0500 time 0.2103 (0.2172) data time 0.0009 (0.0035) model time 0.2094 (0.2131) loss 4.2697 (4.6768) grad_norm 1.7505 (1.7249) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][290/625] eta 0:01:12 lr 0.000848 wd 0.0500 time 0.2061 (0.2171) data time 0.0010 (0.0035) model time 0.2051 (0.2130) loss 4.8413 (4.6789) grad_norm 2.3208 (1.7227) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][300/625] eta 0:01:10 lr 0.000849 wd 0.0500 time 0.2204 (0.2169) data time 0.0010 (0.0034) model time 0.2194 (0.2129) loss 5.3759 (4.6880) grad_norm 1.8454 (1.7253) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][310/625] eta 0:01:08 lr 0.000851 wd 0.0500 time 0.2169 (0.2169) data time 0.0010 (0.0033) model time 0.2158 (0.2130) loss 4.8055 (4.6774) grad_norm 1.6591 (1.7233) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][320/625] eta 0:01:06 lr 0.000852 wd 0.0500 time 0.2190 (0.2168) data time 0.0008 (0.0033) model time 0.2183 (0.2129) loss 5.3326 (4.6864) grad_norm 1.2598 (1.7146) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][330/625] eta 0:01:03 lr 0.000854 wd 0.0500 time 0.2295 (0.2168) data time 0.0008 (0.0032) model time 0.2286 (0.2131) loss 4.6890 (4.6854) grad_norm 1.9167 (1.7078) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][340/625] eta 0:01:01 lr 0.000856 wd 0.0500 time 0.2086 (0.2173) data time 0.0011 (0.0032) model time 0.2075 (0.2138) loss 4.5791 (4.6902) grad_norm 2.4235 (1.7041) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][350/625] eta 0:00:59 lr 0.000857 wd 0.0500 time 0.2037 (0.2178) data time 0.0009 (0.0031) model time 0.2028 (0.2145) loss 3.9517 (4.6832) grad_norm 2.3331 (1.7049) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][360/625] eta 0:00:57 lr 0.000859 wd 0.0500 time 0.2166 (0.2183) data time 0.0011 (0.0030) model time 0.2155 (0.2151) loss 5.1917 (4.6862) grad_norm 1.5420 (1.7017) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][370/625] eta 0:00:55 lr 0.000860 wd 0.0500 time 0.2126 (0.2182) data time 0.0011 (0.0030) model time 0.2115 (0.2150) loss 3.3648 (4.6809) grad_norm 1.2500 (1.6994) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][380/625] eta 0:00:53 lr 0.000862 wd 0.0500 time 0.2179 (0.2180) data time 0.0010 (0.0029) model time 0.2169 (0.2149) loss 5.0970 (4.6866) grad_norm 1.9086 (1.6991) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][390/625] eta 0:00:51 lr 0.000864 wd 0.0500 time 0.2130 (0.2180) data time 0.0007 (0.0029) model time 0.2123 (0.2149) loss 4.2764 (4.6780) grad_norm 2.4843 (1.7032) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][400/625] eta 0:00:49 lr 0.000865 wd 0.0500 time 0.2108 (0.2179) data time 0.0008 (0.0028) model time 0.2100 (0.2149) loss 3.8705 (4.6767) grad_norm 1.7610 (1.7045) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][410/625] eta 0:00:46 lr 0.000867 wd 0.0500 time 0.2071 (0.2179) data time 0.0009 (0.0028) model time 0.2062 (0.2149) loss 4.5657 (4.6692) grad_norm 1.6650 (1.7047) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][420/625] eta 0:00:44 lr 0.000868 wd 0.0500 time 0.2138 (0.2178) data time 0.0011 (0.0028) model time 0.2127 (0.2149) loss 5.2039 (4.6750) grad_norm 1.4258 (1.6997) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][430/625] eta 0:00:42 lr 0.000870 wd 0.0500 time 0.2163 (0.2177) data time 0.0009 (0.0027) model time 0.2154 (0.2147) loss 5.2181 (4.6725) grad_norm 1.5800 (1.6907) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][440/625] eta 0:00:40 lr 0.000872 wd 0.0500 time 0.2221 (0.2176) data time 0.0008 (0.0027) model time 0.2213 (0.2147) loss 4.7340 (4.6735) grad_norm 1.2991 (1.6853) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][450/625] eta 0:00:38 lr 0.000873 wd 0.0500 time 0.2065 (0.2174) data time 0.0011 (0.0027) model time 0.2054 (0.2146) loss 4.5241 (4.6762) grad_norm 1.6632 (1.6894) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][460/625] eta 0:00:35 lr 0.000875 wd 0.0500 time 0.2135 (0.2173) data time 0.0008 (0.0026) model time 0.2127 (0.2145) loss 3.8808 (4.6780) grad_norm 1.4847 (1.6841) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][470/625] eta 0:00:33 lr 0.000876 wd 0.0500 time 0.2057 (0.2172) data time 0.0008 (0.0026) model time 0.2048 (0.2144) loss 5.1667 (4.6693) grad_norm 2.2564 (1.6825) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][480/625] eta 0:00:31 lr 0.000878 wd 0.0500 time 0.2134 (0.2171) data time 0.0007 (0.0026) model time 0.2127 (0.2144) loss 4.8068 (4.6687) grad_norm 1.2321 (1.6886) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][490/625] eta 0:00:29 lr 0.000880 wd 0.0500 time 0.2129 (0.2171) data time 0.0007 (0.0025) model time 0.2121 (0.2143) loss 3.5848 (4.6639) grad_norm 1.4673 (1.6898) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:33:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][500/625] eta 0:00:27 lr 0.000881 wd 0.0500 time 0.2194 (0.2170) data time 0.0012 (0.0025) model time 0.2183 (0.2143) loss 5.0301 (4.6637) grad_norm 1.6849 (1.6913) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:34:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][510/625] eta 0:00:24 lr 0.000883 wd 0.0500 time 0.2095 (0.2169) data time 0.0010 (0.0025) model time 0.2085 (0.2143) loss 3.5140 (4.6617) grad_norm 1.8161 (1.6947) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:34:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][520/625] eta 0:00:22 lr 0.000884 wd 0.0500 time 0.2097 (0.2169) data time 0.0008 (0.0024) model time 0.2088 (0.2142) loss 4.7240 (4.6614) grad_norm 1.7770 (1.6932) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:34:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][530/625] eta 0:00:20 lr 0.000886 wd 0.0500 time 0.2085 (0.2168) data time 0.0011 (0.0024) model time 0.2074 (0.2141) loss 4.5388 (4.6664) grad_norm 1.9886 (1.6982) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:34:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][540/625] eta 0:00:18 lr 0.000888 wd 0.0500 time 0.2175 (0.2167) data time 0.0010 (0.0024) model time 0.2165 (0.2141) loss 3.8004 (4.6639) grad_norm 1.5307 (1.7012) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:34:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][550/625] eta 0:00:16 lr 0.000889 wd 0.0500 time 0.2067 (0.2166) data time 0.0009 (0.0024) model time 0.2058 (0.2140) loss 5.0383 (4.6666) grad_norm 2.2710 (1.7017) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:34:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][560/625] eta 0:00:14 lr 0.000891 wd 0.0500 time 0.2072 (0.2165) data time 0.0010 (0.0023) model time 0.2062 (0.2139) loss 5.3146 (4.6736) grad_norm 1.4737 (1.6994) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:34:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][570/625] eta 0:00:11 lr 0.000892 wd 0.0500 time 0.2052 (0.2165) data time 0.0009 (0.0023) model time 0.2043 (0.2139) loss 4.4372 (4.6747) grad_norm 2.4751 (1.7073) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:34:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][580/625] eta 0:00:09 lr 0.000894 wd 0.0500 time 0.2096 (0.2164) data time 0.0008 (0.0023) model time 0.2088 (0.2139) loss 3.7197 (4.6750) grad_norm 1.3704 (1.7054) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:34:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][590/625] eta 0:00:07 lr 0.000896 wd 0.0500 time 0.2187 (0.2164) data time 0.0008 (0.0023) model time 0.2179 (0.2139) loss 4.1503 (4.6756) grad_norm 1.2147 (1.7030) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:34:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][600/625] eta 0:00:05 lr 0.000897 wd 0.0500 time 0.2188 (0.2163) data time 0.0010 (0.0023) model time 0.2178 (0.2138) loss 4.9195 (4.6739) grad_norm 1.3571 (1.6982) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:34:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][610/625] eta 0:00:03 lr 0.000899 wd 0.0500 time 0.2076 (0.2169) data time 0.0005 (0.0022) model time 0.2071 (0.2145) loss 4.5186 (4.6689) grad_norm 1.2795 (1.7015) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 12:34:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][620/625] eta 0:00:01 lr 0.000900 wd 0.0500 time 0.2092 (0.2168) data time 0.0005 (0.0022) model time 0.2088 (0.2144) loss 4.6792 (4.6743) grad_norm 1.5092 
(1.7018) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:34:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 8 training takes 0:02:15 [2024-07-29 12:34:25 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 12:34:25 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 12:34:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.508 (0.508) Loss 1.4824 (1.4824) Acc@1 67.627 (67.627) Acc@5 89.111 (89.111) Mem 8978MB [2024-07-29 12:34:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.101) Loss 2.4238 (1.7229) Acc@1 47.754 (61.670) Acc@5 75.195 (86.488) Mem 8978MB [2024-07-29 12:34:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 2.5410 (2.0556) Acc@1 47.559 (55.866) Acc@5 71.484 (80.769) Mem 8978MB [2024-07-29 12:34:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 55.960 Acc@5 80.716 [2024-07-29 12:34:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 56.0% [2024-07-29 12:34:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 55.96% [2024-07-29 12:34:27 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 12:34:28 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-29 12:34:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.499 (0.499) Loss 6.9062 (6.9062) Acc@1 0.000 (0.000) Acc@5 0.000 (0.000) Mem 8978MB [2024-07-29 12:34:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.101) Loss 7.1406 (6.8828) Acc@1 0.000 (0.000) Acc@5 0.000 (0.666) Mem 8978MB [2024-07-29 12:34:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 7.1289 (7.0389) Acc@1 0.000 (0.116) Acc@5 0.000 (0.581) Mem 8978MB [2024-07-29 12:34:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.100 Acc@5 0.500 [2024-07-29 12:34:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.1% [2024-07-29 12:34:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][0/625] eta 0:11:48 lr 0.000901 wd 0.0500 time 1.1332 (1.1332) data time 0.5849 (0.5849) model time 0.0000 (0.0000) loss 5.4543 (5.4543) grad_norm 1.3443 (1.3443) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:34:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][10/625] eta 0:03:14 lr 0.000903 wd 0.0500 time 0.2120 (0.3159) data time 0.0009 (0.0543) model time 0.0000 (0.0000) loss 4.2399 (4.8397) grad_norm 1.2529 (1.5192) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:34:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][20/625] eta 0:02:40 lr 0.000904 wd 0.0500 time 0.2063 (0.2656) data time 0.0010 (0.0289) model time 0.0000 (0.0000) loss 4.0692 (4.8083) grad_norm 1.3507 (1.4696) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:34:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][30/625] eta 0:02:27 lr 0.000906 wd 0.0500 time 0.2097 (0.2479) data time 0.0011 (0.0199) model time 0.0000 (0.0000) loss 4.5345 (4.6392) grad_norm 1.3321 (1.5152) loss_scale 8192.0000 
(8192.0000) mem 8978MB [2024-07-29 12:34:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][40/625] eta 0:02:19 lr 0.000907 wd 0.0500 time 0.2096 (0.2390) data time 0.0010 (0.0153) model time 0.0000 (0.0000) loss 4.7188 (4.5473) grad_norm 2.2053 (1.5998) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:34:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][50/625] eta 0:02:14 lr 0.000909 wd 0.0500 time 0.2053 (0.2331) data time 0.0012 (0.0125) model time 0.0000 (0.0000) loss 4.9964 (4.5868) grad_norm 1.2707 (1.6077) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:34:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][60/625] eta 0:02:09 lr 0.000911 wd 0.0500 time 0.2115 (0.2298) data time 0.0008 (0.0106) model time 0.2107 (0.2117) loss 4.6643 (4.5760) grad_norm 1.6873 (1.6236) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:34:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][70/625] eta 0:02:06 lr 0.000912 wd 0.0500 time 0.2178 (0.2273) data time 0.0007 (0.0093) model time 0.2170 (0.2115) loss 4.2572 (4.5855) grad_norm 1.7495 (1.6680) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:34:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][80/625] eta 0:02:02 lr 0.000914 wd 0.0500 time 0.2094 (0.2255) data time 0.0009 (0.0083) model time 0.2085 (0.2115) loss 3.4212 (4.5881) grad_norm 1.6180 (1.6506) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:34:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][90/625] eta 0:02:00 lr 0.000915 wd 0.0500 time 0.2069 (0.2246) data time 0.0014 (0.0075) model time 0.2055 (0.2126) loss 4.7770 (4.6231) grad_norm 2.1566 (1.6540) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:34:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][100/625] eta 0:01:57 lr 0.000917 wd 0.0500 time 0.2101 
(0.2237) data time 0.0010 (0.0069) model time 0.2091 (0.2130) loss 5.0913 (4.5942) grad_norm 1.4691 (1.6914) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:34:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][110/625] eta 0:01:54 lr 0.000919 wd 0.0500 time 0.2119 (0.2228) data time 0.0012 (0.0064) model time 0.2107 (0.2129) loss 3.9716 (4.6007) grad_norm 1.6075 (1.6921) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:34:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][120/625] eta 0:01:52 lr 0.000920 wd 0.0500 time 0.2150 (0.2221) data time 0.0011 (0.0060) model time 0.2139 (0.2128) loss 4.7200 (4.6103) grad_norm 1.9642 (1.6842) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:34:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][130/625] eta 0:01:49 lr 0.000922 wd 0.0500 time 0.2087 (0.2217) data time 0.0012 (0.0056) model time 0.2076 (0.2132) loss 4.3029 (4.5923) grad_norm 1.6535 (1.6720) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][140/625] eta 0:01:47 lr 0.000923 wd 0.0500 time 0.2205 (0.2213) data time 0.0007 (0.0053) model time 0.2198 (0.2134) loss 5.2441 (4.5782) grad_norm 1.3736 (1.6612) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][150/625] eta 0:01:44 lr 0.000925 wd 0.0500 time 0.2132 (0.2209) data time 0.0010 (0.0051) model time 0.2122 (0.2133) loss 4.7438 (4.5684) grad_norm 1.6568 (1.6591) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][160/625] eta 0:01:42 lr 0.000927 wd 0.0500 time 0.2309 (0.2206) data time 0.0009 (0.0049) model time 0.2300 (0.2135) loss 4.1593 (4.5718) grad_norm 1.7433 (1.6543) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 
12:35:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][170/625] eta 0:01:40 lr 0.000928 wd 0.0500 time 0.2089 (0.2201) data time 0.0011 (0.0047) model time 0.2078 (0.2133) loss 4.0234 (4.5529) grad_norm 1.1363 (1.6531) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][180/625] eta 0:01:37 lr 0.000930 wd 0.0500 time 0.2223 (0.2198) data time 0.0007 (0.0045) model time 0.2216 (0.2132) loss 4.8745 (4.5596) grad_norm 1.8453 (1.6574) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][190/625] eta 0:01:35 lr 0.000931 wd 0.0500 time 0.2072 (0.2194) data time 0.0008 (0.0043) model time 0.2064 (0.2131) loss 5.1427 (4.5614) grad_norm 1.7513 (1.6567) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][200/625] eta 0:01:33 lr 0.000933 wd 0.0500 time 0.2099 (0.2190) data time 0.0012 (0.0041) model time 0.2087 (0.2130) loss 4.6745 (4.5581) grad_norm 1.3370 (1.6472) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][210/625] eta 0:01:30 lr 0.000935 wd 0.0500 time 0.2273 (0.2190) data time 0.0012 (0.0040) model time 0.2262 (0.2132) loss 4.4172 (4.5587) grad_norm 1.7907 (1.6500) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][220/625] eta 0:01:28 lr 0.000936 wd 0.0500 time 0.2187 (0.2190) data time 0.0010 (0.0039) model time 0.2177 (0.2134) loss 4.6977 (4.5511) grad_norm 1.9650 (1.6514) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][230/625] eta 0:01:26 lr 0.000938 wd 0.0500 time 0.2267 (0.2190) data time 0.0011 
(0.0038) model time 0.2256 (0.2137) loss 5.1444 (4.5554) grad_norm 2.0097 (1.6494) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][240/625] eta 0:01:24 lr 0.000939 wd 0.0500 time 0.2062 (0.2187) data time 0.0011 (0.0037) model time 0.2051 (0.2135) loss 4.1150 (4.5534) grad_norm 1.7289 (1.6540) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][250/625] eta 0:01:21 lr 0.000941 wd 0.0500 time 0.2064 (0.2186) data time 0.0011 (0.0036) model time 0.2053 (0.2136) loss 4.8681 (4.5442) grad_norm 1.3636 (1.6570) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][260/625] eta 0:01:19 lr 0.000943 wd 0.0500 time 0.2224 (0.2188) data time 0.0010 (0.0036) model time 0.2214 (0.2138) loss 4.3861 (4.5557) grad_norm 1.3980 (1.6521) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][270/625] eta 0:01:17 lr 0.000944 wd 0.0500 time 0.2106 (0.2187) data time 0.0011 (0.0036) model time 0.2095 (0.2138) loss 4.5271 (4.5611) grad_norm 2.2098 (1.6514) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][280/625] eta 0:01:15 lr 0.000946 wd 0.0500 time 0.2035 (0.2185) data time 0.0011 (0.0035) model time 0.2025 (0.2138) loss 3.9091 (4.5672) grad_norm 1.5709 (1.6460) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][290/625] eta 0:01:13 lr 0.000947 wd 0.0500 time 0.2070 (0.2185) data time 0.0010 (0.0035) model time 0.2060 (0.2138) loss 4.5219 (4.5719) grad_norm 1.3300 (1.6411) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:35 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][300/625] eta 0:01:10 lr 0.000949 wd 0.0500 time 0.2264 (0.2183) data time 0.0007 (0.0034) model time 0.2257 (0.2138) loss 4.0311 (4.5650) grad_norm 2.2920 (1.6406) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][310/625] eta 0:01:08 lr 0.000951 wd 0.0500 time 0.2090 (0.2182) data time 0.0008 (0.0033) model time 0.2081 (0.2137) loss 5.2315 (4.5691) grad_norm 2.0804 (1.6382) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][320/625] eta 0:01:06 lr 0.000952 wd 0.0500 time 0.2079 (0.2183) data time 0.0015 (0.0032) model time 0.2064 (0.2140) loss 4.8653 (4.5733) grad_norm 1.4964 (1.6315) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][330/625] eta 0:01:04 lr 0.000954 wd 0.0500 time 0.2100 (0.2181) data time 0.0010 (0.0032) model time 0.2089 (0.2139) loss 5.1366 (4.5830) grad_norm 1.8064 (1.6261) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][340/625] eta 0:01:02 lr 0.000955 wd 0.0500 time 0.2036 (0.2180) data time 0.0011 (0.0031) model time 0.2026 (0.2139) loss 4.6343 (4.5804) grad_norm 1.5371 (1.6292) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][350/625] eta 0:00:59 lr 0.000957 wd 0.0500 time 0.2088 (0.2181) data time 0.0010 (0.0031) model time 0.2078 (0.2141) loss 4.7912 (4.5866) grad_norm 1.5970 (1.6257) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][360/625] eta 0:00:57 lr 0.000959 wd 0.0500 time 0.2082 (0.2181) data time 0.0009 (0.0031) 
model time 0.2073 (0.2141) loss 4.9627 (4.5873) grad_norm 1.1243 (1.6301) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][370/625] eta 0:00:55 lr 0.000960 wd 0.0500 time 0.2112 (0.2179) data time 0.0009 (0.0030) model time 0.2104 (0.2140) loss 4.9313 (4.5940) grad_norm 1.6279 (1.6358) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][380/625] eta 0:00:53 lr 0.000962 wd 0.0500 time 0.2071 (0.2178) data time 0.0007 (0.0030) model time 0.2064 (0.2139) loss 5.3883 (4.5961) grad_norm 1.7811 (1.6430) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][390/625] eta 0:00:51 lr 0.000963 wd 0.0500 time 0.3814 (0.2182) data time 0.0010 (0.0029) model time 0.3804 (0.2145) loss 5.1795 (4.5972) grad_norm 2.3541 (1.6431) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][400/625] eta 0:00:49 lr 0.000965 wd 0.0500 time 0.2143 (0.2185) data time 0.0010 (0.0029) model time 0.2133 (0.2149) loss 4.6236 (4.5901) grad_norm 1.4419 (1.6424) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:35:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][410/625] eta 0:00:46 lr 0.000967 wd 0.0500 time 0.2173 (0.2184) data time 0.0010 (0.0028) model time 0.2163 (0.2149) loss 3.5038 (4.5867) grad_norm 1.3426 (1.6381) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 12:36:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 12:36:01 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... 
[2024-07-29 12:36:01 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 12:38:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 12:38:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 12:38:22 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 12:38:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 12:38:33 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-29 12:38:33 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 12:38:33 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 12:38:33 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 9) [2024-07-29 12:38:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 12:38:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][420/625] eta 0:07:02 lr 0.000968 wd 0.0500 time 0.2173 (2.0589) data time 0.0007 (0.1514) model time 0.2166 (1.9074) loss 5.3626 (5.1443) grad_norm 1.6387 (1.7797) loss_scale 8192.0000 (8192.0000) mem 8976MB [2024-07-29 12:38:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][430/625] eta 0:02:41 lr 0.000970 wd 0.0500 time 0.2125 (0.8280) data time 0.0009 (0.0512) model time 0.2116 (0.7768) loss 4.8891 (4.9046) 
grad_norm 1.3968 (1.6048) loss_scale 8192.0000 (8192.0000) mem 8976MB [2024-07-29 12:38:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][440/625] eta 0:01:47 lr 0.000971 wd 0.0500 time 0.2055 (0.5821) data time 0.0009 (0.0312) model time 0.2046 (0.5509) loss 5.1962 (4.9125) grad_norm 1.8822 (1.5275) loss_scale 8192.0000 (8192.0000) mem 8976MB [2024-07-29 12:38:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][450/625] eta 0:01:23 lr 0.000973 wd 0.0500 time 0.2328 (0.4786) data time 0.0010 (0.0231) model time 0.2318 (0.4555) loss 4.5374 (4.8729) grad_norm 1.5111 (1.5414) loss_scale 8192.0000 (8192.0000) mem 8976MB [2024-07-29 12:38:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][460/625] eta 0:01:09 lr 0.000975 wd 0.0500 time 0.2051 (0.4192) data time 0.0011 (0.0182) model time 0.2040 (0.4010) loss 4.8765 (4.8336) grad_norm 1.3397 (1.5364) loss_scale 8192.0000 (8192.0000) mem 8976MB [2024-07-29 12:38:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][470/625] eta 0:00:59 lr 0.000976 wd 0.0500 time 0.2096 (0.3822) data time 0.0007 (0.0151) model time 0.2089 (0.3671) loss 3.7030 (4.7893) grad_norm 2.3688 (1.5308) loss_scale 8192.0000 (8192.0000) mem 8976MB [2024-07-29 12:39:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][480/625] eta 0:00:51 lr 0.000978 wd 0.0500 time 0.2126 (0.3566) data time 0.0010 (0.0131) model time 0.2117 (0.3436) loss 5.3274 (4.7695) grad_norm 1.7373 (1.5441) loss_scale 8192.0000 (8192.0000) mem 8976MB [2024-07-29 12:39:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][490/625] eta 0:00:45 lr 0.000979 wd 0.0500 time 0.2127 (0.3387) data time 0.0010 (0.0116) model time 0.2117 (0.3271) loss 3.6649 (4.7081) grad_norm 1.7653 (1.5893) loss_scale 8192.0000 (8192.0000) mem 8976MB [2024-07-29 12:39:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[9/300][500/625] eta 0:00:40 lr 0.000981 wd 0.0500 time 0.2045 (0.3244) data time 0.0009 (0.0104) model time 0.2036 (0.3140) loss 4.4474 (4.6804) grad_norm 1.2672 (1.5798) loss_scale 8192.0000 (8192.0000) mem 8976MB [2024-07-29 12:39:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][510/625] eta 0:00:36 lr 0.000983 wd 0.0500 time 0.2148 (0.3141) data time 0.0009 (0.0099) model time 0.2138 (0.3042) loss 5.0191 (4.6863) grad_norm 2.2817 (1.5902) loss_scale 8192.0000 (8192.0000) mem 8976MB [2024-07-29 12:39:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][520/625] eta 0:00:31 lr 0.000984 wd 0.0500 time 0.2093 (0.3047) data time 0.0011 (0.0093) model time 0.2081 (0.2954) loss 4.3436 (4.7053) grad_norm 1.2135 (1.5990) loss_scale 8192.0000 (8192.0000) mem 8976MB [2024-07-29 12:39:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][530/625] eta 0:00:28 lr 0.000986 wd 0.0500 time 0.2154 (0.2970) data time 0.0007 (0.0086) model time 0.2146 (0.2884) loss 3.8105 (4.6877) grad_norm 2.6588 (1.6209) loss_scale 8192.0000 (8192.0000) mem 8976MB [2024-07-29 12:39:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][540/625] eta 0:00:24 lr 0.000987 wd 0.0500 time 0.2116 (0.2902) data time 0.0007 (0.0080) model time 0.2109 (0.2822) loss 4.3123 (4.6902) grad_norm 1.2036 (1.6497) loss_scale 8192.0000 (8192.0000) mem 8976MB [2024-07-29 12:39:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 12:39:15 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 12:39:18 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 12:42:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 12:42:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 12:42:50 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 12:43:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 12:43:01 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-29 12:43:02 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 12:43:02 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 12:43:02 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 9) [2024-07-29 12:43:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 12:44:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 12:44:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 12:45:03 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-07-29 12:45:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 12:45:15 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-29 12:45:15 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 12:45:15 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 12:45:15 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 9) [2024-07-29 12:45:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 12:45:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][550/625] eta 0:02:07 lr 0.000989 wd 0.0500 time 0.2049 (1.7020) data time 0.0007 (0.1593) model time 0.2043 (1.5427) loss 5.5381 (5.1823) grad_norm 1.4426 (1.3659) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 12:45:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][560/625] eta 0:00:45 lr 0.000991 wd 0.0500 time 0.2078 (0.7010) data time 0.0009 (0.0537) model time 0.2070 (0.6473) loss 5.1093 (4.9084) grad_norm 1.3860 (1.3936) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 12:45:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][570/625] eta 0:00:27 lr 0.000992 wd 0.0500 time 0.1972 (0.4999) data time 0.0008 (0.0326) model time 0.1964 (0.4673) loss 4.8301 (4.8920) grad_norm 1.5704 (1.4144) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 12:45:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][580/625] eta 0:00:18 lr 0.000994 wd 0.0500 time 0.1966 (0.4145) data time 0.0009 (0.0235) model time 0.1957 (0.3910) loss 4.7130 (4.8560) 
grad_norm 1.8331 (1.5067) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 12:45:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][590/625] eta 0:00:12 lr 0.000995 wd 0.0500 time 0.1997 (0.3684) data time 0.0008 (0.0187) model time 0.1989 (0.3497) loss 4.8068 (4.7859) grad_norm 1.6719 (1.5810) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 12:45:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][600/625] eta 0:00:08 lr 0.000997 wd 0.0500 time 0.1983 (0.3378) data time 0.0006 (0.0155) model time 0.1976 (0.3223) loss 3.3508 (4.7442) grad_norm 2.0206 (1.5951) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 12:45:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][610/625] eta 0:00:04 lr 0.000999 wd 0.0500 time 0.2155 (0.3168) data time 0.0006 (0.0133) model time 0.2149 (0.3035) loss 4.8362 (4.7282) grad_norm 1.2525 (1.5975) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 12:45:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][620/625] eta 0:00:01 lr 0.001000 wd 0.0500 time 0.2112 (0.3018) data time 0.0006 (0.0116) model time 0.2107 (0.2902) loss 3.5274 (4.6543) grad_norm 1.3954 (1.5978) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 12:45:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 9 training takes 0:00:23 [2024-07-29 12:45:43 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 12:45:46 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 12:45:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.395 (0.395) Loss 1.3398 (1.3398) Acc@1 71.924 (71.924) Acc@5 89.990 (89.990) Mem 8977MB [2024-07-29 12:45:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.088) Loss 2.2266 (1.6174) Acc@1 52.734 (63.761) Acc@5 77.783 (87.522) Mem 8977MB [2024-07-29 12:45:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.072) Loss 2.4570 (1.9302) Acc@1 46.338 (57.929) Acc@5 74.463 (82.171) Mem 8977MB [2024-07-29 12:45:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 57.815 Acc@5 82.198 [2024-07-29 12:45:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 57.8% [2024-07-29 12:45:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 57.82% [2024-07-29 12:45:49 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 12:45:51 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-29 12:45:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.402 (0.402) Loss 6.9453 (6.9453) Acc@1 0.000 (0.000) Acc@5 0.000 (0.000) Mem 8977MB [2024-07-29 12:45:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.087) Loss 7.1172 (6.8949) Acc@1 0.000 (0.000) Acc@5 0.000 (0.444) Mem 8977MB [2024-07-29 12:45:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.072) Loss 7.1211 (7.0270) Acc@1 0.000 (0.116) Acc@5 0.000 (0.581) Mem 8977MB [2024-07-29 12:45:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.100 Acc@5 0.500 [2024-07-29 12:45:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.1% [2024-07-29 12:45:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 0.10% [2024-07-29 12:45:53 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 12:45:55 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-29 12:45:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][0/625] eta 0:07:43 lr 0.001001 wd 0.0500 time 0.7413 (0.7413) data time 0.4243 (0.4243) model time 0.0000 (0.0000) loss 4.7936 (4.7936) grad_norm 1.7239 (1.7239) loss_scale 8192.0000 (8192.0000) mem 8971MB [2024-07-29 12:45:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][10/625] eta 0:02:39 lr 0.001003 wd 0.0500 time 0.2088 (0.2597) data time 0.0006 (0.0401) model time 0.0000 (0.0000) loss 5.3407 (4.5715) grad_norm 2.3625 (1.6505) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 12:46:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][20/625] eta 0:02:22 lr 0.001004 wd 0.0500 time 0.2001 (0.2351) data time 0.0008 (0.0214) model time 0.0000 (0.0000) loss 5.0269 (4.6864) grad_norm 1.5142 (1.7082) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 12:46:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][30/625] eta 0:02:13 lr 0.001006 wd 0.0500 time 0.2035 (0.2237) data time 0.0008 (0.0148) model time 0.0000 (0.0000) loss 4.5279 (4.7107) grad_norm 1.2947 (1.6127) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 12:46:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][40/625] eta 0:02:07 lr 0.001007 wd 0.0500 time 0.2045 (0.2183) data time 0.0006 (0.0114) model time 0.0000 (0.0000) loss 5.4042 (4.7136) grad_norm 1.2751 (1.5936) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 12:46:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][50/625] eta 0:02:03 lr 0.001009 wd 0.0500 time 0.2008 (0.2151) data time 0.0006 (0.0094) model time 0.0000 (0.0000) loss 4.4851 (4.6104) grad_norm 1.1478 (1.5628) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 12:46:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][60/625] eta 0:02:00 lr 0.001011 wd 0.0500 time 0.1997 (0.2126) data time 
0.0007 (0.0080) model time 0.1990 (0.1992) loss 3.2294 (4.5941) grad_norm 1.3769 (1.6202) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 12:46:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][70/625] eta 0:01:57 lr 0.001012 wd 0.0500 time 0.2052 (0.2110) data time 0.0008 (0.0070) model time 0.2044 (0.1996) loss 4.6668 (4.5758) grad_norm 1.3530 (1.6197) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 12:46:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][80/625] eta 0:01:54 lr 0.001014 wd 0.0500 time 0.2008 (0.2097) data time 0.0009 (0.0062) model time 0.1999 (0.1998) loss 4.8298 (4.5791) grad_norm 1.3672 (1.5932) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 12:46:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][90/625] eta 0:01:52 lr 0.001015 wd 0.0500 time 0.1989 (0.2095) data time 0.0007 (0.0056) model time 0.1981 (0.2016) loss 3.5904 (4.5668) grad_norm 2.9545 (1.6135) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 12:46:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][100/625] eta 0:01:50 lr 0.001017 wd 0.0500 time 0.2009 (0.2099) data time 0.0007 (0.0053) model time 0.2003 (0.2034) loss 3.4780 (4.5398) grad_norm 1.4842 (1.6087) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 12:46:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][110/625] eta 0:01:47 lr 0.001019 wd 0.0500 time 0.2057 (0.2093) data time 0.0007 (0.0049) model time 0.2050 (0.2032) loss 3.8968 (4.5438) grad_norm 1.7112 (1.5961) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 12:46:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][120/625] eta 0:01:45 lr 0.001020 wd 0.0500 time 0.2008 (0.2086) data time 0.0008 (0.0046) model time 0.2000 (0.2028) loss 4.8202 (4.5315) grad_norm 1.5998 (1.5856) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 12:46:22 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][130/625] eta 0:01:42 lr 0.001022 wd 0.0500 time 0.2002 (0.2079) data time 0.0006 (0.0043) model time 0.1995 (0.2023) loss 4.5836 (4.5179) grad_norm 1.0408 (1.5756) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 12:46:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][140/625] eta 0:01:40 lr 0.001023 wd 0.0500 time 0.1981 (0.2074) data time 0.0008 (0.0040) model time 0.1972 (0.2021) loss 4.1311 (4.5087) grad_norm 2.2256 (1.5923) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 12:46:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][150/625] eta 0:01:38 lr 0.001025 wd 0.0500 time 0.2028 (0.2071) data time 0.0008 (0.0038) model time 0.2020 (0.2020) loss 5.0366 (4.5184) grad_norm 1.7139 (1.6085) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 12:46:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][160/625] eta 0:01:36 lr 0.001027 wd 0.0500 time 0.2089 (0.2069) data time 0.0008 (0.0037) model time 0.2081 (0.2021) loss 5.0265 (4.5087) grad_norm 1.5161 (1.5985) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 12:46:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][170/625] eta 0:01:34 lr 0.001028 wd 0.0500 time 0.2006 (0.2067) data time 0.0006 (0.0035) model time 0.2000 (0.2021) loss 3.3717 (4.4893) grad_norm 1.8362 (1.6083) loss_scale 16384.0000 (8671.0643) mem 8975MB [2024-07-29 12:46:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][180/625] eta 0:01:32 lr 0.001030 wd 0.0500 time 0.1852 (0.2080) data time 0.0009 (0.0034) model time 0.1843 (0.2043) loss 3.3889 (4.4769) grad_norm 1.1957 (1.6040) loss_scale 16384.0000 (9097.1934) mem 8975MB [2024-07-29 12:46:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][190/625] eta 0:01:30 lr 0.001031 wd 0.0500 time 0.1995 (0.2079) data time 0.0007 
(0.0032) model time 0.1988 (0.2043) loss 5.5410 (4.4694) grad_norm 1.2297 (1.5975) loss_scale 16384.0000 (9478.7016) mem 8975MB [2024-07-29 12:46:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][200/625] eta 0:01:28 lr 0.001033 wd 0.0500 time 0.2007 (0.2076) data time 0.0009 (0.0031) model time 0.1998 (0.2041) loss 4.9298 (4.4836) grad_norm 2.2133 (1.5965) loss_scale 16384.0000 (9822.2488) mem 8975MB [2024-07-29 12:46:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][210/625] eta 0:01:26 lr 0.001035 wd 0.0500 time 0.2002 (0.2073) data time 0.0009 (0.0030) model time 0.1993 (0.2038) loss 3.4748 (4.4779) grad_norm 1.4790 (1.5904) loss_scale 16384.0000 (10133.2322) mem 8975MB [2024-07-29 12:46:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][220/625] eta 0:01:23 lr 0.001036 wd 0.0500 time 0.2010 (0.2070) data time 0.0007 (0.0029) model time 0.2003 (0.2036) loss 4.2028 (4.4617) grad_norm 1.5111 (1.5868) loss_scale 16384.0000 (10416.0724) mem 8975MB [2024-07-29 12:46:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][230/625] eta 0:01:21 lr 0.001038 wd 0.0500 time 0.2024 (0.2067) data time 0.0008 (0.0028) model time 0.2016 (0.2034) loss 4.0502 (4.4608) grad_norm 4.0304 (1.6129) loss_scale 16384.0000 (10674.4242) mem 8975MB [2024-07-29 12:46:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][240/625] eta 0:01:19 lr 0.001039 wd 0.0500 time 0.2016 (0.2065) data time 0.0009 (0.0027) model time 0.2007 (0.2032) loss 4.3328 (4.4781) grad_norm 1.7033 (1.6206) loss_scale 16384.0000 (10911.3361) mem 8975MB [2024-07-29 12:46:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][250/625] eta 0:01:17 lr 0.001041 wd 0.0500 time 0.1999 (0.2065) data time 0.0008 (0.0027) model time 0.1991 (0.2033) loss 4.8921 (4.4871) grad_norm 1.9335 (1.6190) loss_scale 16384.0000 (11129.3705) mem 8975MB [2024-07-29 12:46:49 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][260/625] eta 0:01:15 lr 0.001043 wd 0.0500 time 0.2061 (0.2066) data time 0.0008 (0.0026) model time 0.2053 (0.2035) loss 4.7610 (4.4877) grad_norm 1.1456 (1.6081) loss_scale 16384.0000 (11330.6973) mem 8975MB [2024-07-29 12:46:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][270/625] eta 0:01:13 lr 0.001044 wd 0.0500 time 0.2030 (0.2066) data time 0.0008 (0.0026) model time 0.2022 (0.2036) loss 4.0471 (4.4888) grad_norm 2.6127 (1.6108) loss_scale 16384.0000 (11517.1661) mem 8975MB [2024-07-29 12:46:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][280/625] eta 0:01:11 lr 0.001046 wd 0.0500 time 0.2044 (0.2064) data time 0.0007 (0.0025) model time 0.2036 (0.2035) loss 5.6484 (4.4910) grad_norm 1.3120 (1.6173) loss_scale 16384.0000 (11690.3630) mem 8975MB [2024-07-29 12:46:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 12:46:54 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 12:46:54 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 12:48:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 12:48:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 12:48:54 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-07-29 12:49:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 12:49:07 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-29 12:49:07 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 12:49:07 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 12:49:07 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 10) [2024-07-29 12:49:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 12:49:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][290/625] eta 0:06:45 lr 0.001047 wd 0.0500 time 0.2083 (1.2098) data time 0.0008 (0.1167) model time 0.2075 (1.0931) loss 4.6908 (4.9618) grad_norm 1.3598 (1.4938) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:49:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][300/625] eta 0:03:33 lr 0.001049 wd 0.0500 time 0.2138 (0.6568) data time 0.0009 (0.0525) model time 0.2129 (0.6043) loss 5.0833 (4.7585) grad_norm 1.5903 (1.6115) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:49:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][310/625] eta 0:02:36 lr 0.001051 wd 0.0500 time 0.2095 (0.4981) data time 0.0010 (0.0342) model time 0.2085 (0.4639) loss 4.9994 (4.8068) grad_norm 1.3139 (1.5624) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:49:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][320/625] eta 0:02:08 lr 0.001052 wd 0.0500 time 0.2121 (0.4229) data time 0.0009 (0.0254) model time 0.2111 (0.3975) loss 4.4545 
(4.7338) grad_norm 2.3992 (1.6066) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:49:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][330/625] eta 0:01:51 lr 0.001054 wd 0.0500 time 0.2239 (0.3794) data time 0.0008 (0.0204) model time 0.2231 (0.3590) loss 4.7508 (4.6902) grad_norm 1.4525 (1.6325) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:49:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][340/625] eta 0:01:40 lr 0.001055 wd 0.0500 time 0.2215 (0.3510) data time 0.0009 (0.0171) model time 0.2206 (0.3339) loss 3.7209 (4.6646) grad_norm 1.1482 (1.6375) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:49:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][350/625] eta 0:01:30 lr 0.001057 wd 0.0500 time 0.2141 (0.3307) data time 0.0008 (0.0147) model time 0.2133 (0.3160) loss 3.8350 (4.6428) grad_norm 1.3711 (1.5994) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:49:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][360/625] eta 0:01:23 lr 0.001059 wd 0.0500 time 0.2278 (0.3157) data time 0.0011 (0.0130) model time 0.2267 (0.3027) loss 4.2505 (4.6103) grad_norm 1.1858 (1.5708) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:49:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][370/625] eta 0:01:17 lr 0.001060 wd 0.0500 time 0.2200 (0.3042) data time 0.0012 (0.0117) model time 0.2187 (0.2925) loss 4.8749 (4.5819) grad_norm 1.5273 (1.6059) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:49:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][380/625] eta 0:01:12 lr 0.001062 wd 0.0500 time 0.2110 (0.2951) data time 0.0009 (0.0106) model time 0.2102 (0.2845) loss 5.2801 (4.5809) grad_norm 1.4959 (1.6098) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:49:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 
367): INFO Train: [10/300][390/625] eta 0:01:07 lr 0.001063 wd 0.0500 time 0.2099 (0.2876) data time 0.0009 (0.0097) model time 0.2091 (0.2779) loss 3.3793 (4.5896) grad_norm 2.1325 (1.6143) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:49:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][400/625] eta 0:01:03 lr 0.001065 wd 0.0500 time 0.2164 (0.2813) data time 0.0010 (0.0090) model time 0.2154 (0.2723) loss 4.0044 (4.5809) grad_norm 1.4124 (1.6046) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:49:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][410/625] eta 0:00:59 lr 0.001067 wd 0.0500 time 0.2130 (0.2760) data time 0.0008 (0.0084) model time 0.2122 (0.2676) loss 4.7754 (4.5700) grad_norm 1.3277 (1.5978) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:49:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][420/625] eta 0:00:55 lr 0.001068 wd 0.0500 time 0.2206 (0.2716) data time 0.0012 (0.0079) model time 0.2194 (0.2638) loss 4.4720 (4.5648) grad_norm 1.3444 (1.6061) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:49:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][430/625] eta 0:00:52 lr 0.001070 wd 0.0500 time 0.2180 (0.2677) data time 0.0009 (0.0074) model time 0.2171 (0.2603) loss 4.6620 (4.5539) grad_norm 2.4986 (1.6034) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:49:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][440/625] eta 0:00:48 lr 0.001071 wd 0.0500 time 0.2201 (0.2643) data time 0.0008 (0.0070) model time 0.2193 (0.2573) loss 3.5213 (4.5481) grad_norm 1.3446 (1.5973) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:49:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][450/625] eta 0:00:45 lr 0.001073 wd 0.0500 time 0.2212 (0.2617) data time 0.0010 (0.0067) model time 0.2202 (0.2550) loss 
5.2166 (4.5539) grad_norm 1.8095 (1.5955) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:49:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][460/625] eta 0:00:42 lr 0.001075 wd 0.0500 time 0.2152 (0.2590) data time 0.0009 (0.0064) model time 0.2143 (0.2526) loss 3.8754 (4.5359) grad_norm 1.0896 (1.5898) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:49:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][470/625] eta 0:00:39 lr 0.001076 wd 0.0500 time 0.2172 (0.2568) data time 0.0010 (0.0061) model time 0.2162 (0.2507) loss 5.1777 (4.5288) grad_norm 2.5444 (1.5885) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:50:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][480/625] eta 0:00:36 lr 0.001078 wd 0.0500 time 0.2188 (0.2547) data time 0.0010 (0.0058) model time 0.2178 (0.2488) loss 3.5290 (4.5188) grad_norm 1.3627 (1.5831) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:50:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][490/625] eta 0:00:34 lr 0.001079 wd 0.0500 time 0.2136 (0.2526) data time 0.0010 (0.0056) model time 0.2126 (0.2470) loss 4.9643 (4.5068) grad_norm 1.2289 (1.5712) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:50:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][500/625] eta 0:00:31 lr 0.001081 wd 0.0500 time 0.2192 (0.2509) data time 0.0010 (0.0054) model time 0.2182 (0.2455) loss 4.2071 (4.4996) grad_norm 2.0930 (1.5753) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:50:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][510/625] eta 0:00:28 lr 0.001083 wd 0.0500 time 0.2097 (0.2493) data time 0.0011 (0.0052) model time 0.2087 (0.2441) loss 5.1054 (4.5084) grad_norm 1.4699 (1.5776) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:50:10 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [10/300][520/625] eta 0:00:26 lr 0.001084 wd 0.0500 time 0.2086 (0.2479) data time 0.0009 (0.0050) model time 0.2077 (0.2429) loss 5.1095 (4.4981) grad_norm 1.3903 (1.5744) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:50:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][530/625] eta 0:00:23 lr 0.001086 wd 0.0500 time 0.2076 (0.2465) data time 0.0011 (0.0049) model time 0.2064 (0.2416) loss 3.5957 (4.4863) grad_norm 1.2973 (1.5736) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:50:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][540/625] eta 0:00:20 lr 0.001087 wd 0.0500 time 0.2143 (0.2452) data time 0.0010 (0.0047) model time 0.2133 (0.2405) loss 4.1919 (4.4658) grad_norm 1.1277 (1.5707) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:50:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][550/625] eta 0:00:18 lr 0.001089 wd 0.0500 time 0.2185 (0.2441) data time 0.0009 (0.0046) model time 0.2176 (0.2395) loss 4.8076 (4.4595) grad_norm 1.3035 (1.5821) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:50:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][560/625] eta 0:00:15 lr 0.001091 wd 0.0500 time 0.2164 (0.2430) data time 0.0010 (0.0045) model time 0.2154 (0.2386) loss 3.0785 (4.4619) grad_norm 1.3046 (1.5900) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:50:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][570/625] eta 0:00:13 lr 0.001092 wd 0.0500 time 0.2147 (0.2419) data time 0.0008 (0.0043) model time 0.2139 (0.2376) loss 4.9560 (4.4662) grad_norm 1.3174 (1.5859) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:50:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][580/625] eta 0:00:10 lr 0.001094 wd 0.0500 time 0.2179 (0.2409) data time 0.0011 (0.0042) model time 
0.2168 (0.2367) loss 4.2819 (4.4486) grad_norm 1.4190 (1.5814) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:50:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][590/625] eta 0:00:08 lr 0.001095 wd 0.0500 time 0.2121 (0.2400) data time 0.0008 (0.0041) model time 0.2113 (0.2359) loss 4.0313 (4.4439) grad_norm 2.0126 (1.5731) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:50:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][600/625] eta 0:00:05 lr 0.001097 wd 0.0500 time 0.2297 (0.2393) data time 0.0012 (0.0040) model time 0.2285 (0.2353) loss 3.9611 (4.4538) grad_norm 1.2170 (1.5699) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:50:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][610/625] eta 0:00:03 lr 0.001099 wd 0.0500 time 0.2062 (0.2385) data time 0.0006 (0.0040) model time 0.2056 (0.2346) loss 4.7196 (4.4632) grad_norm 1.9699 (1.5635) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:50:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][620/625] eta 0:00:01 lr 0.001100 wd 0.0500 time 0.2133 (0.2377) data time 0.0008 (0.0039) model time 0.2125 (0.2339) loss 4.6290 (4.4601) grad_norm 1.1575 (1.5646) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 12:50:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 10 training takes 0:01:21 [2024-07-29 12:50:32 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 12:50:34 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 12:50:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.447 (0.447) Loss 1.3359 (1.3359) Acc@1 72.754 (72.754) Acc@5 90.283 (90.283) Mem 8977MB [2024-07-29 12:50:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.096) Loss 2.2090 (1.5850) Acc@1 53.467 (64.968) Acc@5 78.662 (88.210) Mem 8977MB [2024-07-29 12:50:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 2.3438 (1.8893) Acc@1 51.172 (59.115) Acc@5 74.561 (83.052) Mem 8977MB [2024-07-29 12:50:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 59.097 Acc@5 83.065 [2024-07-29 12:50:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 59.1% [2024-07-29 12:50:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 59.10% [2024-07-29 12:50:37 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 12:50:38 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-29 12:50:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.454 (0.454) Loss 6.9375 (6.9375) Acc@1 0.000 (0.000) Acc@5 0.000 (0.000) Mem 8977MB [2024-07-29 12:50:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.057 (0.097) Loss 7.0742 (6.9027) Acc@1 0.000 (0.000) Acc@5 0.000 (0.195) Mem 8977MB [2024-07-29 12:50:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.077) Loss 7.0938 (7.0099) Acc@1 0.000 (0.116) Acc@5 0.000 (0.549) Mem 8977MB [2024-07-29 12:50:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.100 Acc@5 0.566 [2024-07-29 12:50:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.1% [2024-07-29 12:50:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 0.10% [2024-07-29 12:50:40 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 12:50:42 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-29 12:50:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][0/625] eta 0:07:39 lr 0.001101 wd 0.0500 time 0.7356 (0.7356) data time 0.4534 (0.4534) model time 0.0000 (0.0000) loss 4.4155 (4.4155) grad_norm 1.1347 (1.1347) loss_scale 16384.0000 (16384.0000) mem 8971MB [2024-07-29 12:50:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][10/625] eta 0:02:41 lr 0.001102 wd 0.0500 time 0.2068 (0.2624) data time 0.0009 (0.0422) model time 0.0000 (0.0000) loss 4.8239 (4.4407) grad_norm 1.2214 (1.4425) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:50:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][20/625] eta 0:02:24 lr 0.001104 wd 0.0500 time 0.2175 (0.2393) data time 0.0008 (0.0226) model time 0.0000 (0.0000) loss 4.3859 (4.3876) grad_norm 1.4450 (1.5822) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:50:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][30/625] eta 0:02:16 lr 0.001106 wd 0.0500 time 0.2124 (0.2302) data time 0.0009 (0.0157) model time 0.0000 (0.0000) loss 4.8571 (4.3529) grad_norm 1.2919 (1.6123) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:50:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][40/625] eta 0:02:12 lr 0.001107 wd 0.0500 time 0.2258 (0.2267) data time 0.0011 (0.0121) model time 0.0000 (0.0000) loss 3.4094 (4.3361) grad_norm 1.1654 (1.6104) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:50:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][50/625] eta 0:02:08 lr 0.001109 wd 0.0500 time 0.2125 (0.2242) data time 0.0009 (0.0099) model time 0.0000 (0.0000) loss 4.7680 (4.3118) grad_norm 1.5595 (1.5609) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:50:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][60/625] eta 0:02:05 lr 0.001110 wd 0.0500 time 0.2174 
(0.2228) data time 0.0011 (0.0085) model time 0.2163 (0.2146) loss 4.0065 (4.3801) grad_norm 1.3443 (1.5860) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:50:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][70/625] eta 0:02:04 lr 0.001112 wd 0.0500 time 0.2075 (0.2245) data time 0.0011 (0.0075) model time 0.2065 (0.2240) loss 4.5461 (4.4156) grad_norm 1.0596 (1.5620) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:51:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][80/625] eta 0:02:01 lr 0.001114 wd 0.0500 time 0.2164 (0.2237) data time 0.0009 (0.0067) model time 0.2155 (0.2217) loss 3.5442 (4.3894) grad_norm 1.1890 (1.5874) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:51:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][90/625] eta 0:01:59 lr 0.001115 wd 0.0500 time 0.2100 (0.2229) data time 0.0008 (0.0061) model time 0.2093 (0.2201) loss 5.2533 (4.4430) grad_norm 1.3209 (1.5527) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:51:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][100/625] eta 0:01:56 lr 0.001117 wd 0.0500 time 0.2232 (0.2225) data time 0.0008 (0.0056) model time 0.2224 (0.2196) loss 4.7091 (4.4592) grad_norm 1.4141 (1.5406) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:51:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][110/625] eta 0:01:54 lr 0.001118 wd 0.0500 time 0.2043 (0.2220) data time 0.0009 (0.0052) model time 0.2034 (0.2189) loss 3.6148 (4.4386) grad_norm 1.7265 (1.5183) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:51:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][120/625] eta 0:01:51 lr 0.001120 wd 0.0500 time 0.2174 (0.2214) data time 0.0008 (0.0048) model time 0.2165 (0.2182) loss 3.8627 (4.4154) grad_norm 1.2626 (1.5150) loss_scale 16384.0000 (16384.0000) mem 8975MB 
[2024-07-29 12:51:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][130/625] eta 0:01:49 lr 0.001122 wd 0.0500 time 0.2076 (0.2209) data time 0.0009 (0.0045) model time 0.2067 (0.2176) loss 4.1293 (4.3813) grad_norm 1.4835 (1.5314) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:51:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][140/625] eta 0:01:47 lr 0.001123 wd 0.0500 time 0.2452 (0.2207) data time 0.0008 (0.0043) model time 0.2445 (0.2175) loss 5.2487 (4.3897) grad_norm 1.3475 (1.5160) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:51:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][150/625] eta 0:01:44 lr 0.001125 wd 0.0500 time 0.2208 (0.2204) data time 0.0008 (0.0041) model time 0.2200 (0.2173) loss 3.2645 (4.3862) grad_norm 1.7870 (1.5263) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:51:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][160/625] eta 0:01:42 lr 0.001126 wd 0.0500 time 0.2112 (0.2200) data time 0.0007 (0.0039) model time 0.2105 (0.2170) loss 3.9744 (4.3763) grad_norm 1.0599 (1.5127) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:51:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][170/625] eta 0:01:39 lr 0.001128 wd 0.0500 time 0.2129 (0.2196) data time 0.0008 (0.0037) model time 0.2121 (0.2166) loss 4.9589 (4.4133) grad_norm 2.2112 (1.5356) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:51:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][180/625] eta 0:01:37 lr 0.001130 wd 0.0500 time 0.2181 (0.2196) data time 0.0011 (0.0036) model time 0.2170 (0.2167) loss 4.5273 (4.4019) grad_norm 1.2491 (1.5530) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:51:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][190/625] eta 0:01:35 lr 0.001131 wd 0.0500 time 
0.2128 (0.2193) data time 0.0009 (0.0035) model time 0.2119 (0.2164) loss 5.0747 (4.3993) grad_norm 1.7376 (1.5668) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:51:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][200/625] eta 0:01:33 lr 0.001133 wd 0.0500 time 0.2149 (0.2192) data time 0.0011 (0.0033) model time 0.2138 (0.2163) loss 4.3799 (4.3962) grad_norm 1.9969 (1.5641) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:51:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][210/625] eta 0:01:31 lr 0.001134 wd 0.0500 time 0.2158 (0.2204) data time 0.0008 (0.0032) model time 0.2150 (0.2180) loss 5.1687 (4.4091) grad_norm 1.5469 (1.5760) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:51:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][220/625] eta 0:01:29 lr 0.001136 wd 0.0500 time 0.2071 (0.2200) data time 0.0011 (0.0031) model time 0.2060 (0.2177) loss 4.2673 (4.4182) grad_norm 1.2365 (1.5692) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:51:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][230/625] eta 0:01:26 lr 0.001138 wd 0.0500 time 0.2354 (0.2199) data time 0.0007 (0.0030) model time 0.2347 (0.2176) loss 3.9424 (4.4185) grad_norm 1.5151 (1.5642) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:51:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][240/625] eta 0:01:24 lr 0.001139 wd 0.0500 time 0.2185 (0.2196) data time 0.0010 (0.0030) model time 0.2175 (0.2173) loss 4.8620 (4.4272) grad_norm 1.2910 (1.5622) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:51:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][250/625] eta 0:01:22 lr 0.001141 wd 0.0500 time 0.2201 (0.2194) data time 0.0010 (0.0029) model time 0.2191 (0.2171) loss 3.8869 (4.4250) grad_norm 1.6934 (1.5592) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:51:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][260/625] eta 0:01:20 lr 0.001142 wd 0.0500 time 0.2162 (0.2193) data time 0.0010 (0.0028) model time 0.2152 (0.2170) loss 3.2919 (4.4218) grad_norm 1.4958 (1.5597) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:51:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][270/625] eta 0:01:17 lr 0.001144 wd 0.0500 time 0.2225 (0.2192) data time 0.0012 (0.0028) model time 0.2214 (0.2169) loss 4.9197 (4.4260) grad_norm 1.6546 (1.5657) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:51:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][280/625] eta 0:01:15 lr 0.001146 wd 0.0500 time 0.2247 (0.2191) data time 0.0007 (0.0027) model time 0.2240 (0.2169) loss 5.1844 (4.4311) grad_norm 1.3762 (1.5621) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:51:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][290/625] eta 0:01:13 lr 0.001147 wd 0.0500 time 0.2126 (0.2189) data time 0.0009 (0.0026) model time 0.2117 (0.2167) loss 5.1520 (4.4332) grad_norm 1.2095 (1.5528) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:51:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][300/625] eta 0:01:11 lr 0.001149 wd 0.0500 time 0.2138 (0.2187) data time 0.0008 (0.0026) model time 0.2131 (0.2165) loss 5.3867 (4.4241) grad_norm 1.0152 (1.5558) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:51:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][310/625] eta 0:01:08 lr 0.001150 wd 0.0500 time 0.2146 (0.2186) data time 0.0007 (0.0025) model time 0.2138 (0.2164) loss 5.2736 (4.4207) grad_norm 1.4508 (1.5533) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:51:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][320/625] eta 0:01:06 lr 0.001152 wd 0.0500 time 0.2217 (0.2185) data time 0.0010 (0.0025) model time 0.2207 (0.2163) loss 3.9371 (4.4086) grad_norm 1.3490 (1.5465) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:51:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][330/625] eta 0:01:04 lr 0.001154 wd 0.0500 time 0.2167 (0.2183) data time 0.0009 (0.0025) model time 0.2158 (0.2161) loss 5.1739 (4.4172) grad_norm 1.3334 (1.5497) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:51:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][340/625] eta 0:01:02 lr 0.001155 wd 0.0500 time 0.2261 (0.2182) data time 0.0008 (0.0024) model time 0.2253 (0.2160) loss 4.4796 (4.4161) grad_norm 1.9634 (1.5533) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:51:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][350/625] eta 0:00:59 lr 0.001157 wd 0.0500 time 0.2208 (0.2181) data time 0.0010 (0.0024) model time 0.2198 (0.2160) loss 4.8319 (4.4120) grad_norm 2.2450 (1.5580) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][360/625] eta 0:00:57 lr 0.001158 wd 0.0500 time 0.2140 (0.2180) data time 0.0007 (0.0023) model time 0.2133 (0.2159) loss 5.0858 (4.4077) grad_norm 1.3574 (1.5583) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][370/625] eta 0:00:55 lr 0.001160 wd 0.0500 time 0.2171 (0.2178) data time 0.0009 (0.0023) model time 0.2162 (0.2157) loss 3.9449 (4.4008) grad_norm 2.2638 (1.5556) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][380/625] eta 0:00:53 lr 0.001162 wd 0.0500 time 0.2158 (0.2178) data time 0.0007 (0.0023) model time 0.2150 (0.2157) loss 4.8862 (4.3982) grad_norm 1.3988 (1.5640) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][390/625] eta 0:00:51 lr 0.001163 wd 0.0500 time 0.2144 (0.2178) data time 0.0010 (0.0022) model time 0.2134 (0.2157) loss 4.5504 (4.4022) grad_norm 1.3345 (1.5648) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][400/625] eta 0:00:48 lr 0.001165 wd 0.0500 time 0.2255 (0.2177) data time 0.0012 (0.0022) model time 0.2243 (0.2157) loss 4.5673 (4.4108) grad_norm 1.2585 (1.5608) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][410/625] eta 0:00:46 lr 0.001166 wd 0.0500 time 0.2196 (0.2176) data time 0.0010 (0.0022) model time 0.2186 (0.2156) loss 3.6126 (4.4094) grad_norm 1.4845 (1.5577) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][420/625] eta 0:00:44 lr 0.001168 wd 0.0500 time 0.2064 (0.2177) data time 0.0010 (0.0022) model time 0.2054 (0.2157) loss 4.3623 (4.4123) grad_norm 1.2876 (1.5561) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][430/625] eta 0:00:42 lr 0.001170 wd 0.0500 time 0.2202 (0.2176) data time 0.0007 (0.0021) model time 0.2195 (0.2157) loss 5.1021 (4.4159) grad_norm 1.7770 (1.5573) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][440/625] eta 0:00:40 lr 0.001171 wd 0.0500 time 0.2159 (0.2176) data time 0.0012 (0.0021) model time 0.2147 (0.2156) loss 4.9292 (4.4165) grad_norm 1.4474 (1.5572) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][450/625] eta 0:00:38 lr 0.001173 wd 0.0500 time 0.2121 (0.2175) data time 0.0010 (0.0021) model time 0.2111 (0.2156) loss 4.7588 (4.4123) grad_norm 1.4077 (1.5600) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][460/625] eta 0:00:35 lr 0.001174 wd 0.0500 time 0.2214 (0.2175) data time 0.0010 (0.0021) model time 0.2204 (0.2156) loss 3.9705 (4.4105) grad_norm 1.1310 (1.5552) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][470/625] eta 0:00:33 lr 0.001176 wd 0.0500 time 0.2143 (0.2174) data time 0.0010 (0.0020) model time 0.2133 (0.2155) loss 3.2260 (4.3988) grad_norm 1.5456 (1.5558) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][480/625] eta 0:00:31 lr 0.001178 wd 0.0500 time 0.2199 (0.2173) data time 0.0010 (0.0020) model time 0.2189 (0.2154) loss 3.8870 (4.4027) grad_norm 1.3836 (1.5517) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][490/625] eta 0:00:29 lr 0.001179 wd 0.0500 time 0.2150 (0.2173) data time 0.0007 (0.0020) model time 0.2143 (0.2153) loss 4.5609 (4.3979) grad_norm 1.3842 (1.5580) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][500/625] eta 0:00:27 lr 0.001181 wd 0.0500 time 0.2093 (0.2172) data time 0.0008 (0.0020) model time 0.2085 (0.2153) loss 5.0653 (4.3938) grad_norm 1.2250 (1.5575) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][510/625] eta 0:00:24 lr 0.001182 wd 0.0500 time 0.2131 (0.2171) data time 0.0008 (0.0020) model time 0.2123 (0.2152) loss 4.4125 (4.3907) grad_norm 1.0055 (1.5528) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][520/625] eta 0:00:22 lr 0.001184 wd 0.0500 time 0.2142 (0.2171) data time 0.0011 (0.0020) model time 0.2131 (0.2152) loss 2.9733 (4.3883) grad_norm 1.2366 (1.5499) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][530/625] eta 0:00:20 lr 0.001186 wd 0.0500 time 0.2164 (0.2171) data time 0.0009 (0.0019) model time 0.2154 (0.2152) loss 4.8067 (4.3893) grad_norm 1.7923 (1.5486) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][540/625] eta 0:00:18 lr 0.001187 wd 0.0500 time 0.2167 (0.2171) data time 0.0011 (0.0019) model time 0.2156 (0.2152) loss 4.3075 (4.3861) grad_norm 1.3095 (1.5495) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][550/625] eta 0:00:16 lr 0.001189 wd 0.0500 time 0.2233 (0.2170) data time 0.0007 (0.0019) model time 0.2225 (0.2152) loss 4.4978 (4.3853) grad_norm 1.2747 (1.5496) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][560/625] eta 0:00:14 lr 0.001190 wd 0.0500 time 0.2137 (0.2170) data time 0.0012 (0.0019) model time 0.2125 (0.2152) loss 4.8029 (4.3888) grad_norm 1.1365 (1.5437) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][570/625] eta 0:00:11 lr 0.001192 wd 0.0500 time 0.2138 (0.2170) data time 0.0009 (0.0019) model time 0.2130 (0.2152) loss 4.9716 (4.3867) grad_norm 1.5084 (1.5421) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][580/625] eta 0:00:09 lr 0.001194 wd 0.0500 time 0.2123 (0.2170) data time 0.0012 (0.0019) model time 0.2111 (0.2152) loss 4.6483 (4.3914) grad_norm 1.1958 (1.5362) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][590/625] eta 0:00:07 lr 0.001195 wd 0.0500 time 0.2077 (0.2169) data time 0.0009 (0.0019) model time 0.2068 (0.2151) loss 4.4334 (4.3966) grad_norm 2.8011 (1.5373) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][600/625] eta 0:00:05 lr 0.001197 wd 0.0500 time 0.2097 (0.2169) data time 0.0011 (0.0018) model time 0.2086 (0.2151) loss 3.5117 (4.3894) grad_norm 1.6719 (1.5458) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][610/625] eta 0:00:03 lr 0.001198 wd 0.0500 time 0.2171 (0.2169) data time 0.0005 (0.0018) model time 0.2166 (0.2151) loss 4.3684 (4.3828) grad_norm 1.2841 (1.5450) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][620/625] eta 0:00:01 lr 0.001200 wd 0.0500 time 0.2124 (0.2170) data time 0.0008 (0.0018) model time 0.2117 (0.2153) loss 4.6866 (4.3803) grad_norm 1.2198 (1.5403) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:52:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 11 training takes 0:02:15
[2024-07-29 12:52:57 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 12:52:58 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 12:52:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.566 (0.566) Loss 1.3291 (1.3291) Acc@1 70.801 (70.801) Acc@5 90.283 (90.283) Mem 8975MB
[2024-07-29 12:52:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.059 (0.108) Loss 2.1074 (1.4897) Acc@1 55.029 (66.451) Acc@5 80.176 (89.373) Mem 8975MB
[2024-07-29 12:53:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.083) Loss 2.2852 (1.7859) Acc@1 51.465 (60.921) Acc@5 76.807 (84.584) Mem 8975MB
[2024-07-29 12:53:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 60.779 Acc@5 84.489
[2024-07-29 12:53:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 60.8%
[2024-07-29 12:53:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 60.78%
[2024-07-29 12:53:00 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 12:53:01 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 12:53:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.504 (0.504) Loss 6.9414 (6.9414) Acc@1 0.000 (0.000) Acc@5 0.000 (0.000) Mem 8975MB
[2024-07-29 12:53:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.067 (0.104) Loss 7.0234 (6.9215) Acc@1 0.000 (0.000) Acc@5 0.000 (0.000) Mem 8975MB
[2024-07-29 12:53:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.081) Loss 7.0312 (6.9944) Acc@1 0.000 (0.116) Acc@5 0.000 (0.465) Mem 8975MB
[2024-07-29 12:53:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.100 Acc@5 0.500
[2024-07-29 12:53:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.1%
[2024-07-29 12:53:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][0/625] eta 0:12:20 lr 0.001201 wd 0.0500 time 1.1846 (1.1846) data time 0.7310 (0.7310) model time 0.0000 (0.0000) loss 4.3337 (4.3337) grad_norm 1.1616 (1.1616) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][10/625] eta 0:03:06 lr 0.001202 wd 0.0500 time 0.2148 (0.3035) data time 0.0010 (0.0675) model time 0.0000 (0.0000) loss 3.5077 (4.3577) grad_norm 2.5142 (1.6406) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][20/625] eta 0:02:37 lr 0.001204 wd 0.0500 time 0.2118 (0.2609) data time 0.0012 (0.0359) model time 0.0000 (0.0000) loss 2.8017 (4.2887) grad_norm 1.5925 (1.6256) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][30/625] eta 0:02:26 lr 0.001206 wd 0.0500 time 0.2196 (0.2458) data time 0.0010 (0.0247) model time 0.0000 (0.0000) loss 4.8415 (4.3192) grad_norm 1.6623 (1.6154) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][40/625] eta 0:02:19 lr 0.001207 wd 0.0500 time 0.2192 (0.2383) data time 0.0008 (0.0189) model time 0.0000 (0.0000) loss 5.2512 (4.3452) grad_norm 1.3568 (1.6393) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][50/625] eta 0:02:14 lr 0.001209 wd 0.0500 time 0.2101 (0.2337) data time 0.0008 (0.0154) model time 0.0000 (0.0000) loss 3.6558 (4.3524) grad_norm 1.1000 (1.5931) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][60/625] eta 0:02:10 lr 0.001210 wd 0.0500 time 0.2227 (0.2307) data time 0.0011 (0.0131) model time 0.2216 (0.2140) loss 3.7040 (4.3012) grad_norm 1.3804 (1.5674) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][70/625] eta 0:02:06 lr 0.001212 wd 0.0500 time 0.2177 (0.2287) data time 0.0007 (0.0114) model time 0.2170 (0.2148) loss 5.1861 (4.3427) grad_norm 1.0474 (1.5430) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][80/625] eta 0:02:03 lr 0.001214 wd 0.0500 time 0.2158 (0.2269) data time 0.0010 (0.0101) model time 0.2149 (0.2141) loss 3.8653 (4.3404) grad_norm 1.1784 (1.5279) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][90/625] eta 0:02:00 lr 0.001215 wd 0.0500 time 0.2078 (0.2252) data time 0.0009 (0.0091) model time 0.2069 (0.2131) loss 5.4124 (4.3670) grad_norm 1.4669 (1.5156) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][100/625] eta 0:01:57 lr 0.001217 wd 0.0500 time 0.2097 (0.2241) data time 0.0010 (0.0083) model time 0.2087 (0.2130) loss 4.6861 (4.3797) grad_norm 1.3507 (1.5533) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][110/625] eta 0:01:54 lr 0.001218 wd 0.0500 time 0.2159 (0.2230) data time 0.0010 (0.0077) model time 0.2149 (0.2128) loss 4.4153 (4.3624) grad_norm 1.6205 (1.5716) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][120/625] eta 0:01:52 lr 0.001220 wd 0.0500 time 0.2240 (0.2226) data time 0.0007 (0.0071) model time 0.2233 (0.2133) loss 2.9673 (4.3397) grad_norm 1.2234 (1.5728) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][130/625] eta 0:01:49 lr 0.001222 wd 0.0500 time 0.2176 (0.2221) data time 0.0008 (0.0067) model time 0.2168 (0.2135) loss 3.6478 (4.3412) grad_norm 1.2890 (1.5497) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][140/625] eta 0:01:47 lr 0.001223 wd 0.0500 time 0.2204 (0.2216) data time 0.0007 (0.0063) model time 0.2197 (0.2137) loss 3.9958 (4.3468) grad_norm 1.4795 (1.5410) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][150/625] eta 0:01:45 lr 0.001225 wd 0.0500 time 0.2118 (0.2212) data time 0.0009 (0.0059) model time 0.2110 (0.2137) loss 5.4805 (4.3704) grad_norm 1.1581 (1.5312) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][160/625] eta 0:01:42 lr 0.001226 wd 0.0500 time 0.2227 (0.2210) data time 0.0010 (0.0056) model time 0.2217 (0.2140) loss 4.5812 (4.3691) grad_norm 1.6137 (1.5317) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][170/625] eta 0:01:40 lr 0.001228 wd 0.0500 time 0.2155 (0.2207) data time 0.0009 (0.0054) model time 0.2146 (0.2141) loss 4.7113 (4.3600) grad_norm 1.4996 (1.5214) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][180/625] eta 0:01:38 lr 0.001230 wd 0.0500 time 0.2196 (0.2204) data time 0.0010 (0.0051) model time 0.2185 (0.2141) loss 4.3085 (4.3583) grad_norm 1.2513 (1.5154) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][190/625] eta 0:01:35 lr 0.001231 wd 0.0500 time 0.2112 (0.2201) data time 0.0012 (0.0049) model time 0.2100 (0.2141) loss 3.1154 (4.3483) grad_norm 1.7237 (1.5069) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][200/625] eta 0:01:33 lr 0.001233 wd 0.0500 time 0.2110 (0.2197) data time 0.0009 (0.0047) model time 0.2101 (0.2139) loss 5.2607 (4.3461) grad_norm 1.2772 (1.5011) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][210/625] eta 0:01:31 lr 0.001234 wd 0.0500 time 0.2100 (0.2194) data time 0.0009 (0.0045) model time 0.2091 (0.2138) loss 4.5558 (4.3623) grad_norm 1.4188 (1.5057) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][220/625] eta 0:01:28 lr 0.001236 wd 0.0500 time 0.2150 (0.2192) data time 0.0007 (0.0044) model time 0.2143 (0.2137) loss 4.2599 (4.3773) grad_norm 1.7248 (1.5207) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][230/625] eta 0:01:26 lr 0.001238 wd 0.0500 time 0.2120 (0.2190) data time 0.0008 (0.0043) model time 0.2112 (0.2137) loss 5.4074 (4.3724) grad_norm 1.3815 (1.5191) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][240/625] eta 0:01:24 lr 0.001239 wd 0.0500 time 0.2114 (0.2187) data time 0.0009 (0.0041) model time 0.2105 (0.2136) loss 4.5296 (4.3834) grad_norm 1.2461 (1.5199) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][250/625] eta 0:01:21 lr 0.001241 wd 0.0500 time 0.2124 (0.2185) data time 0.0009 (0.0040) model time 0.2115 (0.2135) loss 3.7700 (4.3839) grad_norm 1.4895 (1.5114) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:53:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][260/625] eta 0:01:19 lr 0.001242 wd 0.0500 time 0.2070 (0.2183) data time 0.0008 (0.0039) model time 0.2062 (0.2135) loss 3.0322 (4.3747) grad_norm 2.0847 (1.5139) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][270/625] eta 0:01:17 lr 0.001244 wd 0.0500 time 0.2121 (0.2182) data time 0.0012 (0.0038) model time 0.2108 (0.2135) loss 4.5715 (4.3714) grad_norm 2.1035 (1.5135) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][280/625] eta 0:01:15 lr 0.001246 wd 0.0500 time 0.2339 (0.2181) data time 0.0010 (0.0037) model time 0.2329 (0.2135) loss 3.7476 (4.3743) grad_norm 2.0756 (1.5082) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][290/625] eta 0:01:13 lr 0.001247 wd 0.0500 time 0.2127 (0.2181) data time 0.0009 (0.0036) model time 0.2118 (0.2136) loss 4.9312 (4.3698) grad_norm 1.7148 (1.5029) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][300/625] eta 0:01:10 lr 0.001249 wd 0.0500 time 0.2107 (0.2179) data time 0.0008 (0.0035) model time 0.2098 (0.2136) loss 3.9871 (4.3758) grad_norm 1.6814 (1.5003) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][310/625] eta 0:01:08 lr 0.001250 wd 0.0500 time 0.2201 (0.2178) data time 0.0009 (0.0034) model time 0.2193 (0.2136) loss 5.6057 (4.3884) grad_norm 1.4807 (1.4958) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][320/625] eta 0:01:06 lr 0.001252 wd 0.0500 time 0.2173 (0.2177) data time 0.0010 (0.0034) model time 0.2163 (0.2136) loss 4.5017 (4.3974) grad_norm 1.2215 (1.4972) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][330/625] eta 0:01:04 lr 0.001254 wd 0.0500 time 0.2163 (0.2176) data time 0.0008 (0.0033) model time 0.2155 (0.2136) loss 3.7465 (4.3945) grad_norm 1.9099 (1.5036) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][340/625] eta 0:01:01 lr 0.001255 wd 0.0500 time 0.2155 (0.2174) data time 0.0010 (0.0032) model time 0.2145 (0.2135) loss 4.4471 (4.3961) grad_norm 1.2708 (1.5033) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][350/625] eta 0:00:59 lr 0.001257 wd 0.0500 time 0.2200 (0.2174) data time 0.0010 (0.0032) model time 0.2191 (0.2135) loss 3.5302 (4.3847) grad_norm 3.4342 (1.5135) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][360/625] eta 0:00:57 lr 0.001258 wd 0.0500 time 0.2115 (0.2173) data time 0.0009 (0.0031) model time 0.2106 (0.2135) loss 5.4867 (4.3852) grad_norm 1.4109 (1.5126) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][370/625] eta 0:00:55 lr 0.001260 wd 0.0500 time 0.2141 (0.2178) data time 0.0010 (0.0031) model time 0.2131 (0.2142) loss 4.5059 (4.3798) grad_norm 1.3140 (1.5073) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][380/625] eta 0:00:53 lr 0.001262 wd 0.0500 time 0.2157 (0.2178) data time 0.0009 (0.0030) model time 0.2148 (0.2142) loss 4.6319 (4.3825) grad_norm 1.1575 (1.5009) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][390/625] eta 0:00:51 lr 0.001263 wd 0.0500 time 0.2139 (0.2178) data time 0.0011 (0.0030) model time 0.2128 (0.2143) loss 4.7610 (4.3879) grad_norm 1.0843 (1.5029) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][400/625] eta 0:00:49 lr 0.001265 wd 0.0500 time 0.2173 (0.2178) data time 0.0009 (0.0029) model time 0.2164 (0.2144) loss 4.2773 (4.3916) grad_norm 1.7987 (1.5036) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][410/625] eta 0:00:46 lr 0.001266 wd 0.0500 time 0.2109 (0.2177) data time 0.0010 (0.0029) model time 0.2099 (0.2143) loss 4.5182 (4.3943) grad_norm 1.0769 (1.4971) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][420/625] eta 0:00:44 lr 0.001268 wd 0.0500 time 0.2246 (0.2177) data time 0.0008 (0.0028) model time 0.2238 (0.2144) loss 3.3663 (4.3953) grad_norm 1.2414 (1.4959) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][430/625] eta 0:00:42 lr 0.001270 wd 0.0500 time 0.2101 (0.2182) data time 0.0012 (0.0028) model time 0.2089 (0.2150) loss 4.9902 (4.3974) grad_norm 1.4965 (1.5001) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][440/625] eta 0:00:40 lr 0.001271 wd 0.0500 time 0.2126 (0.2181) data time 0.0007 (0.0027) model time 0.2118 (0.2150) loss 4.3291 (4.4034) grad_norm 1.7698 (1.5043) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][450/625] eta 0:00:38 lr 0.001273 wd 0.0500 time 0.2154 (0.2181) data time 0.0009 (0.0027) model time 0.2145 (0.2150) loss 4.5347 (4.3996) grad_norm 1.5249 (1.5076) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][460/625] eta 0:00:35 lr 0.001274 wd 0.0500 time 0.2109 (0.2180) data time 0.0011 (0.0027) model time 0.2099 (0.2150) loss 3.5547 (4.3950) grad_norm 1.7766 (1.5102) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][470/625] eta 0:00:33 lr 0.001276 wd 0.0500 time 0.2143 (0.2180) data time 0.0010 (0.0026) model time 0.2133 (0.2150) loss 4.0243 (4.3961) grad_norm 0.9274 (1.5092) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][480/625] eta 0:00:31 lr 0.001278 wd 0.0500 time 0.2147 (0.2179) data time 0.0012 (0.0026) model time 0.2135 (0.2149) loss 3.5338 (4.3971) grad_norm 1.2934 (1.5070) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][490/625] eta 0:00:29 lr 0.001279 wd 0.0500 time 0.2169 (0.2179) data time 0.0007 (0.0026) model time 0.2162 (0.2149) loss 5.3353 (4.4003) grad_norm 2.7539 (1.5105) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][500/625] eta 0:00:27 lr 0.001281 wd 0.0500 time 0.2150 (0.2179) data time 0.0011 (0.0025) model time 0.2138 (0.2150) loss 4.4075 (4.3974) grad_norm 1.3925 (1.5097) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][510/625] eta 0:00:25 lr 0.001282 wd 0.0500 time 0.2154 (0.2178) data time 0.0011 (0.0025) model time 0.2143 (0.2150) loss 4.5547 (4.3928) grad_norm 1.3861 (1.5089) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][520/625] eta 0:00:22 lr 0.001284 wd 0.0500 time 0.2120 (0.2178) data time 0.0008 (0.0025) model time 0.2113 (0.2150) loss 3.8289 (4.3821) grad_norm 1.0256 (1.5042) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:54:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][530/625] eta 0:00:20 lr 0.001286 wd 0.0500 time 0.2113 (0.2177) data time 0.0009 (0.0025) model time 0.2104 (0.2149) loss 4.4427 (4.3767) grad_norm 1.0660 (1.5035) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:55:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][540/625] eta 0:00:18 lr 0.001287 wd 0.0500 time 0.2080 (0.2177) data time 0.0011 (0.0024) model time 0.2069 (0.2149) loss 4.7296 (4.3793) grad_norm 1.5508 (1.5038) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:55:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][550/625] eta 0:00:16 lr 0.001289 wd 0.0500 time 0.2207 (0.2177) data time 0.0009 (0.0024) model time 0.2198 (0.2150) loss 4.5554 (4.3810) grad_norm 1.1822 (1.5009) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:55:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][560/625] eta 0:00:14 lr 0.001290 wd 0.0500 time 0.2104 (0.2177) data time 0.0009 (0.0024) model time 0.2095 (0.2150) loss 3.3235 (4.3794) grad_norm 1.5910 (1.4975) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:55:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][570/625] eta 0:00:11 lr 0.001292 wd 0.0500 time 0.2260 (0.2177) data time 0.0010 (0.0024) model time 0.2250 (0.2150) loss 4.2924 (4.3776) grad_norm 1.3424 (1.4978) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:55:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][580/625] eta 0:00:09 lr 0.001294 wd 0.0500 time 0.2188 (0.2177) data time 0.0009 (0.0023) model time 0.2180 (0.2150) loss 4.6569 (4.3736) grad_norm 1.4057 (1.5035) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:55:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][590/625] eta 0:00:07 lr 0.001295 wd 0.0500 time 0.2154 (0.2176) data time 0.0009 (0.0023) model time 0.2146 (0.2150) loss 4.1227 (4.3704) grad_norm 1.3693 (1.5009) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:55:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][600/625] eta 0:00:05 lr 0.001297 wd 0.0500 time 0.2142 (0.2176) data time 0.0007 (0.0023) model time 0.2135 (0.2150) loss 3.0075 (4.3616) grad_norm 1.3576 (1.4978) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:55:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][610/625] eta 0:00:03 lr 0.001298 wd 0.0500 time 0.2198 (0.2180) data time 0.0007 (0.0023) model time 0.2191 (0.2155) loss 4.7564 (4.3618) grad_norm 1.5005 (1.4963) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:55:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][620/625] eta 0:00:01 lr 0.001300 wd 0.0500 time 0.2074 (0.2179) data time 0.0006 (0.0023) model time 0.2068 (0.2154) loss 4.3196 (4.3604) grad_norm 1.5348 (1.4956) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:55:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 12 training takes 0:02:16
[2024-07-29 12:55:19 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 12:55:19 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 12:55:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.556 (0.556) Loss 1.2109 (1.2109) Acc@1 74.561 (74.561) Acc@5 92.676 (92.676) Mem 8975MB
[2024-07-29 12:55:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.107) Loss 2.0703 (1.4544) Acc@1 55.713 (68.235) Acc@5 82.080 (90.137) Mem 8975MB
[2024-07-29 12:55:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.082) Loss 2.2031 (1.7759) Acc@1 53.955 (62.379) Acc@5 78.516 (85.224) Mem 8975MB
[2024-07-29 12:55:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 62.174 Acc@5 85.185
[2024-07-29 12:55:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 62.2%
[2024-07-29 12:55:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 62.17%
[2024-07-29 12:55:21 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 12:55:22 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 12:55:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.464 (0.464) Loss 6.9688 (6.9688) Acc@1 0.000 (0.000) Acc@5 0.000 (0.000) Mem 8975MB [2024-07-29 12:55:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 6.9961 (6.9535) Acc@1 0.000 (0.000) Acc@5 0.000 (0.209) Mem 8975MB [2024-07-29 12:55:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 7.0117 (6.9933) Acc@1 0.000 (0.021) Acc@5 0.000 (0.514) Mem 8975MB [2024-07-29 12:55:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.116 Acc@5 0.542 [2024-07-29 12:55:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.1% [2024-07-29 12:55:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 0.12% [2024-07-29 12:55:24 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 12:55:25 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-29 12:55:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][0/625] eta 0:06:11 lr 0.001301 wd 0.0500 time 0.5943 (0.5943) data time 0.3925 (0.3925) model time 0.0000 (0.0000) loss 4.6609 (4.6609) grad_norm 1.1504 (1.1504) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:55:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][10/625] eta 0:02:32 lr 0.001302 wd 0.0500 time 0.2136 (0.2475) data time 0.0010 (0.0368) model time 0.0000 (0.0000) loss 4.7995 (4.6518) grad_norm 1.2413 (1.2246) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:55:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][20/625] eta 0:02:21 lr 0.001304 wd 0.0500 time 0.2141 (0.2331) data time 0.0008 (0.0198) model time 0.0000 (0.0000) loss 3.5930 (4.5220) grad_norm 1.5076 (1.3295) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:55:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][30/625] eta 0:02:14 lr 0.001305 wd 0.0500 time 0.2172 (0.2267) data time 0.0009 (0.0138) model time 0.0000 (0.0000) loss 4.2201 (4.4200) grad_norm 1.7884 (1.5097) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:55:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][40/625] eta 0:02:10 lr 0.001307 wd 0.0500 time 0.2251 (0.2238) data time 0.0010 (0.0107) model time 0.0000 (0.0000) loss 4.6717 (4.4413) grad_norm 1.5673 (1.4949) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:55:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][50/625] eta 0:02:08 lr 0.001309 wd 0.0500 time 0.2131 (0.2239) data time 0.0009 (0.0088) model time 0.0000 (0.0000) loss 3.1492 (4.3644) grad_norm 1.2037 (1.4779) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:55:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][60/625] eta 0:02:06 lr 0.001310 wd 0.0500 time 0.2145 
(0.2244) data time 0.0011 (0.0079) model time 0.2134 (0.2234) loss 4.0692 (4.2636) grad_norm 1.0343 (1.4508) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:55:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][70/625] eta 0:02:03 lr 0.001312 wd 0.0500 time 0.2124 (0.2228) data time 0.0008 (0.0069) model time 0.2117 (0.2179) loss 4.6808 (4.2114) grad_norm 1.1655 (1.4528) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:55:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][80/625] eta 0:02:00 lr 0.001313 wd 0.0500 time 0.2145 (0.2219) data time 0.0007 (0.0062) model time 0.2138 (0.2166) loss 5.5769 (4.2129) grad_norm 1.8602 (1.4646) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:55:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][90/625] eta 0:01:58 lr 0.001315 wd 0.0500 time 0.2157 (0.2211) data time 0.0008 (0.0056) model time 0.2149 (0.2159) loss 3.6824 (4.2016) grad_norm 1.4642 (1.4935) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:55:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][100/625] eta 0:01:55 lr 0.001317 wd 0.0500 time 0.2102 (0.2204) data time 0.0010 (0.0052) model time 0.2092 (0.2153) loss 3.9284 (4.2266) grad_norm 2.2896 (1.5111) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:55:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][110/625] eta 0:01:53 lr 0.001318 wd 0.0500 time 0.2086 (0.2200) data time 0.0010 (0.0048) model time 0.2076 (0.2153) loss 3.8580 (4.2709) grad_norm 1.7493 (1.5145) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:55:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][120/625] eta 0:01:50 lr 0.001320 wd 0.0500 time 0.2149 (0.2196) data time 0.0010 (0.0045) model time 0.2139 (0.2150) loss 4.5130 (4.2636) grad_norm 1.4486 (1.5316) loss_scale 16384.0000 (16384.0000) mem 8975MB 
[2024-07-29 12:55:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][130/625] eta 0:01:48 lr 0.001321 wd 0.0500 time 0.2212 (0.2192) data time 0.0011 (0.0043) model time 0.2201 (0.2148) loss 4.5156 (4.2481) grad_norm 1.6602 (1.5199) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:55:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][140/625] eta 0:01:46 lr 0.001323 wd 0.0500 time 0.2137 (0.2190) data time 0.0010 (0.0040) model time 0.2127 (0.2150) loss 4.3224 (4.2664) grad_norm 1.4504 (1.5113) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:55:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][150/625] eta 0:01:44 lr 0.001325 wd 0.0500 time 0.2113 (0.2193) data time 0.0008 (0.0038) model time 0.2104 (0.2156) loss 4.7557 (4.2582) grad_norm 2.2256 (1.5340) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:56:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][160/625] eta 0:01:42 lr 0.001326 wd 0.0500 time 0.2130 (0.2195) data time 0.0009 (0.0038) model time 0.2121 (0.2159) loss 3.4837 (4.2343) grad_norm 2.6864 (1.5727) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:56:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][170/625] eta 0:01:39 lr 0.001328 wd 0.0500 time 0.2269 (0.2194) data time 0.0007 (0.0037) model time 0.2262 (0.2159) loss 4.8821 (4.2470) grad_norm 1.5443 (1.5783) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:56:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][180/625] eta 0:01:37 lr 0.001329 wd 0.0500 time 0.2169 (0.2191) data time 0.0011 (0.0035) model time 0.2157 (0.2158) loss 4.8121 (4.2457) grad_norm 1.3643 (1.5531) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:56:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][190/625] eta 0:01:35 lr 0.001331 wd 0.0500 time 
0.2104 (0.2188) data time 0.0010 (0.0034) model time 0.2094 (0.2154) loss 3.5354 (4.2506) grad_norm 1.8131 (1.5490) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:56:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][200/625] eta 0:01:32 lr 0.001333 wd 0.0500 time 0.2196 (0.2186) data time 0.0011 (0.0033) model time 0.2185 (0.2154) loss 4.3210 (4.2495) grad_norm 1.5332 (1.5509) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:56:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][210/625] eta 0:01:30 lr 0.001334 wd 0.0500 time 0.2135 (0.2185) data time 0.0009 (0.0032) model time 0.2126 (0.2153) loss 4.5482 (4.2579) grad_norm 1.2516 (1.5514) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:56:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][220/625] eta 0:01:28 lr 0.001336 wd 0.0500 time 0.2142 (0.2196) data time 0.0007 (0.0031) model time 0.2134 (0.2168) loss 4.4427 (4.2532) grad_norm 1.5681 (1.5478) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:56:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][230/625] eta 0:01:26 lr 0.001337 wd 0.0500 time 0.2269 (0.2195) data time 0.0009 (0.0030) model time 0.2260 (0.2168) loss 4.5419 (4.2561) grad_norm 1.2993 (1.5371) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:56:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][240/625] eta 0:01:24 lr 0.001339 wd 0.0500 time 0.2123 (0.2193) data time 0.0011 (0.0029) model time 0.2113 (0.2166) loss 4.4936 (4.2713) grad_norm 1.0180 (1.5323) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:56:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][250/625] eta 0:01:22 lr 0.001341 wd 0.0500 time 0.2200 (0.2191) data time 0.0011 (0.0029) model time 0.2190 (0.2165) loss 4.2947 (4.2613) grad_norm 1.9022 (1.5287) loss_scale 16384.0000 (16384.0000) 
mem 8975MB [2024-07-29 12:56:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][260/625] eta 0:01:19 lr 0.001342 wd 0.0500 time 0.2173 (0.2189) data time 0.0010 (0.0028) model time 0.2163 (0.2164) loss 5.1567 (4.2567) grad_norm 1.5777 (1.5395) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:56:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][270/625] eta 0:01:17 lr 0.001344 wd 0.0500 time 0.2121 (0.2187) data time 0.0007 (0.0028) model time 0.2113 (0.2162) loss 3.7406 (4.2540) grad_norm 1.7513 (1.5402) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:56:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][280/625] eta 0:01:15 lr 0.001345 wd 0.0500 time 0.2145 (0.2186) data time 0.0009 (0.0027) model time 0.2136 (0.2160) loss 4.9967 (4.2477) grad_norm 1.3159 (1.5334) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:56:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][290/625] eta 0:01:13 lr 0.001347 wd 0.0500 time 0.2042 (0.2184) data time 0.0007 (0.0027) model time 0.2035 (0.2158) loss 3.1922 (4.2518) grad_norm inf (inf) loss_scale 16384.0000 (16609.2096) mem 8975MB [2024-07-29 12:56:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][300/625] eta 0:01:11 lr 0.001349 wd 0.0500 time 0.2162 (0.2186) data time 0.0008 (0.0026) model time 0.2154 (0.2161) loss 5.2729 (4.2528) grad_norm 1.1994 (inf) loss_scale 16384.0000 (16601.7276) mem 8975MB [2024-07-29 12:56:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][310/625] eta 0:01:08 lr 0.001350 wd 0.0500 time 0.2221 (0.2187) data time 0.0008 (0.0026) model time 0.2213 (0.2164) loss 4.6780 (4.2523) grad_norm 1.4236 (inf) loss_scale 16384.0000 (16594.7267) mem 8975MB [2024-07-29 12:56:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][320/625] eta 0:01:06 lr 0.001352 wd 0.0500 time 0.2225 
(0.2186) data time 0.0009 (0.0025) model time 0.2216 (0.2163) loss 3.7590 (4.2515) grad_norm 1.1146 (inf) loss_scale 16384.0000 (16588.1620) mem 8975MB [2024-07-29 12:56:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][330/625] eta 0:01:04 lr 0.001353 wd 0.0500 time 0.2211 (0.2186) data time 0.0011 (0.0025) model time 0.2200 (0.2163) loss 3.5407 (4.2356) grad_norm 1.5897 (inf) loss_scale 16384.0000 (16581.9940) mem 8975MB [2024-07-29 12:56:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][340/625] eta 0:01:02 lr 0.001355 wd 0.0500 time 0.2144 (0.2185) data time 0.0009 (0.0024) model time 0.2134 (0.2163) loss 4.1045 (4.2316) grad_norm 1.0735 (inf) loss_scale 16384.0000 (16576.1877) mem 8975MB [2024-07-29 12:56:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][350/625] eta 0:01:00 lr 0.001357 wd 0.0500 time 0.2136 (0.2185) data time 0.0009 (0.0024) model time 0.2127 (0.2162) loss 4.0516 (4.2319) grad_norm 1.3261 (inf) loss_scale 16384.0000 (16570.7123) mem 8975MB [2024-07-29 12:56:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][360/625] eta 0:00:57 lr 0.001358 wd 0.0500 time 0.2126 (0.2184) data time 0.0008 (0.0024) model time 0.2118 (0.2162) loss 2.4343 (4.2227) grad_norm 2.4999 (inf) loss_scale 16384.0000 (16565.5402) mem 8975MB [2024-07-29 12:56:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][370/625] eta 0:00:55 lr 0.001360 wd 0.0500 time 0.2133 (0.2182) data time 0.0014 (0.0023) model time 0.2119 (0.2160) loss 3.1496 (4.2226) grad_norm 1.2855 (inf) loss_scale 16384.0000 (16560.6469) mem 8975MB [2024-07-29 12:56:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][380/625] eta 0:00:53 lr 0.001361 wd 0.0500 time 0.2260 (0.2182) data time 0.0011 (0.0023) model time 0.2249 (0.2160) loss 4.0814 (4.2179) grad_norm 1.4375 (inf) loss_scale 16384.0000 (16556.0105) mem 8975MB [2024-07-29 
12:56:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][390/625] eta 0:00:51 lr 0.001363 wd 0.0500 time 0.2184 (0.2181) data time 0.0008 (0.0023) model time 0.2177 (0.2159) loss 4.5474 (4.2159) grad_norm 1.6237 (inf) loss_scale 16384.0000 (16551.6113) mem 8975MB [2024-07-29 12:56:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][400/625] eta 0:00:49 lr 0.001365 wd 0.0500 time 0.2147 (0.2180) data time 0.0011 (0.0022) model time 0.2136 (0.2159) loss 3.6585 (4.2135) grad_norm 1.5924 (inf) loss_scale 16384.0000 (16547.4314) mem 8975MB [2024-07-29 12:56:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][410/625] eta 0:00:46 lr 0.001366 wd 0.0500 time 0.2191 (0.2179) data time 0.0010 (0.0022) model time 0.2182 (0.2158) loss 4.3172 (4.2135) grad_norm 1.2450 (inf) loss_scale 16384.0000 (16543.4550) mem 8975MB [2024-07-29 12:56:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][420/625] eta 0:00:44 lr 0.001368 wd 0.0500 time 0.2245 (0.2179) data time 0.0010 (0.0022) model time 0.2235 (0.2158) loss 3.3447 (4.2211) grad_norm 1.6372 (inf) loss_scale 16384.0000 (16539.6675) mem 8975MB [2024-07-29 12:56:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][430/625] eta 0:00:42 lr 0.001369 wd 0.0500 time 0.2166 (0.2178) data time 0.0012 (0.0022) model time 0.2155 (0.2158) loss 4.8006 (4.2218) grad_norm 2.3486 (inf) loss_scale 16384.0000 (16536.0557) mem 8975MB [2024-07-29 12:57:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][440/625] eta 0:00:40 lr 0.001371 wd 0.0500 time 0.2390 (0.2178) data time 0.0009 (0.0021) model time 0.2380 (0.2158) loss 4.6663 (4.2268) grad_norm 1.0009 (inf) loss_scale 16384.0000 (16532.6077) mem 8975MB [2024-07-29 12:57:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][450/625] eta 0:00:38 lr 0.001373 wd 0.0500 time 0.2113 (0.2178) data time 0.0009 
(0.0021) model time 0.2104 (0.2158) loss 4.5316 (4.2266) grad_norm 1.2280 (inf) loss_scale 16384.0000 (16529.3126) mem 8975MB [2024-07-29 12:57:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][460/625] eta 0:00:35 lr 0.001374 wd 0.0500 time 0.2173 (0.2178) data time 0.0008 (0.0021) model time 0.2165 (0.2158) loss 4.7799 (4.2252) grad_norm 1.2598 (inf) loss_scale 16384.0000 (16526.1605) mem 8975MB [2024-07-29 12:57:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][470/625] eta 0:00:33 lr 0.001376 wd 0.0500 time 0.2157 (0.2178) data time 0.0010 (0.0021) model time 0.2148 (0.2158) loss 3.9332 (4.2153) grad_norm 1.7808 (inf) loss_scale 16384.0000 (16523.1423) mem 8975MB [2024-07-29 12:57:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][480/625] eta 0:00:31 lr 0.001377 wd 0.0500 time 0.2074 (0.2181) data time 0.0009 (0.0021) model time 0.2065 (0.2161) loss 4.5833 (4.2236) grad_norm 1.1162 (inf) loss_scale 16384.0000 (16520.2495) mem 8975MB [2024-07-29 12:57:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][490/625] eta 0:00:29 lr 0.001379 wd 0.0500 time 0.2151 (0.2180) data time 0.0009 (0.0020) model time 0.2142 (0.2161) loss 4.6662 (4.2190) grad_norm 2.1806 (inf) loss_scale 16384.0000 (16517.4745) mem 8975MB [2024-07-29 12:57:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][500/625] eta 0:00:27 lr 0.001381 wd 0.0500 time 0.2152 (0.2180) data time 0.0007 (0.0020) model time 0.2145 (0.2161) loss 5.0500 (4.2264) grad_norm 1.3830 (inf) loss_scale 16384.0000 (16514.8104) mem 8975MB [2024-07-29 12:57:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][510/625] eta 0:00:25 lr 0.001382 wd 0.0500 time 0.2133 (0.2180) data time 0.0010 (0.0020) model time 0.2123 (0.2161) loss 4.5551 (4.2223) grad_norm 1.7355 (inf) loss_scale 16384.0000 (16512.2505) mem 8975MB [2024-07-29 12:57:19 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][520/625] eta 0:00:22 lr 0.001384 wd 0.0500 time 0.2205 (0.2180) data time 0.0011 (0.0020) model time 0.2194 (0.2161) loss 3.7821 (4.2242) grad_norm 1.4799 (inf) loss_scale 16384.0000 (16509.7889) mem 8975MB [2024-07-29 12:57:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][530/625] eta 0:00:20 lr 0.001385 wd 0.0500 time 0.2098 (0.2179) data time 0.0011 (0.0020) model time 0.2087 (0.2160) loss 4.2201 (4.2261) grad_norm 1.4890 (inf) loss_scale 16384.0000 (16507.4200) mem 8975MB [2024-07-29 12:57:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][540/625] eta 0:00:18 lr 0.001387 wd 0.0500 time 0.2154 (0.2179) data time 0.0011 (0.0019) model time 0.2143 (0.2161) loss 4.8222 (4.2314) grad_norm 1.1598 (inf) loss_scale 16384.0000 (16505.1386) mem 8975MB [2024-07-29 12:57:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][550/625] eta 0:00:16 lr 0.001389 wd 0.0500 time 0.2219 (0.2180) data time 0.0010 (0.0019) model time 0.2210 (0.2161) loss 3.8199 (4.2327) grad_norm 1.6843 (inf) loss_scale 16384.0000 (16502.9401) mem 8975MB [2024-07-29 12:57:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][560/625] eta 0:00:14 lr 0.001390 wd 0.0500 time 0.2199 (0.2179) data time 0.0009 (0.0019) model time 0.2190 (0.2161) loss 3.9486 (4.2324) grad_norm 1.4178 (inf) loss_scale 16384.0000 (16500.8200) mem 8975MB [2024-07-29 12:57:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][570/625] eta 0:00:11 lr 0.001392 wd 0.0500 time 0.2123 (0.2179) data time 0.0008 (0.0019) model time 0.2115 (0.2161) loss 5.1042 (4.2377) grad_norm 1.3419 (inf) loss_scale 16384.0000 (16498.7741) mem 8975MB [2024-07-29 12:57:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][580/625] eta 0:00:09 lr 0.001393 wd 0.0500 time 0.2173 (0.2178) data time 0.0008 (0.0019) 
model time 0.2165 (0.2160) loss 3.0902 (4.2369) grad_norm 1.7253 (inf) loss_scale 16384.0000 (16496.7986) mem 8975MB [2024-07-29 12:57:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][590/625] eta 0:00:07 lr 0.001395 wd 0.0500 time 0.2139 (0.2178) data time 0.0010 (0.0019) model time 0.2129 (0.2160) loss 3.8042 (4.2366) grad_norm 1.6202 (inf) loss_scale 16384.0000 (16494.8900) mem 8975MB [2024-07-29 12:57:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][600/625] eta 0:00:05 lr 0.001397 wd 0.0500 time 0.2136 (0.2177) data time 0.0007 (0.0019) model time 0.2128 (0.2159) loss 4.8561 (4.2401) grad_norm 1.2729 (inf) loss_scale 16384.0000 (16493.0449) mem 8975MB [2024-07-29 12:57:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][610/625] eta 0:00:03 lr 0.001398 wd 0.0500 time 0.2154 (0.2177) data time 0.0007 (0.0019) model time 0.2146 (0.2159) loss 4.0714 (4.2323) grad_norm 1.5035 (inf) loss_scale 16384.0000 (16491.2602) mem 8975MB [2024-07-29 12:57:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][620/625] eta 0:00:01 lr 0.001400 wd 0.0500 time 0.2092 (0.2176) data time 0.0007 (0.0018) model time 0.2085 (0.2158) loss 3.7036 (4.2325) grad_norm 0.9335 (inf) loss_scale 16384.0000 (16489.5330) mem 8975MB [2024-07-29 12:57:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 13 training takes 0:02:15 [2024-07-29 12:57:41 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 12:57:42 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 12:57:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.533 (0.533) Loss 1.1914 (1.1914) Acc@1 75.098 (75.098) Acc@5 92.676 (92.676) Mem 8975MB [2024-07-29 12:57:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.103) Loss 1.9863 (1.4291) Acc@1 56.592 (68.550) Acc@5 82.373 (90.252) Mem 8975MB [2024-07-29 12:57:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 2.2168 (1.7175) Acc@1 52.637 (62.951) Acc@5 78.369 (85.531) Mem 8975MB [2024-07-29 12:57:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 62.866 Acc@5 85.515 [2024-07-29 12:57:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 62.9% [2024-07-29 12:57:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 62.87% [2024-07-29 12:57:44 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 12:57:44 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-29 12:57:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.549 (0.549) Loss 6.9688 (6.9688) Acc@1 0.000 (0.000) Acc@5 0.000 (0.000) Mem 8975MB [2024-07-29 12:57:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.104) Loss 6.9883 (6.9751) Acc@1 0.000 (0.009) Acc@5 0.000 (0.222) Mem 8975MB [2024-07-29 12:57:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.081) Loss 7.0039 (6.9933) Acc@1 0.000 (0.058) Acc@5 0.000 (0.430) Mem 8975MB [2024-07-29 12:57:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.146 Acc@5 0.470 [2024-07-29 12:57:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.1% [2024-07-29 12:57:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 0.15% [2024-07-29 12:57:46 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 12:57:47 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-29 12:57:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][0/625] eta 0:07:49 lr 0.001401 wd 0.0500 time 0.7511 (0.7511) data time 0.5507 (0.5507) model time 0.0000 (0.0000) loss 4.0330 (4.0330) grad_norm 2.5055 (2.5055) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:57:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][10/625] eta 0:02:42 lr 0.001402 wd 0.0500 time 0.2117 (0.2640) data time 0.0008 (0.0510) model time 0.0000 (0.0000) loss 4.2783 (3.9263) grad_norm 1.1983 (1.5266) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:57:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][20/625] eta 0:02:25 lr 0.001404 wd 0.0500 time 0.2237 (0.2407) data time 0.0007 (0.0272) model time 0.0000 (0.0000) loss 4.3408 (3.9685) grad_norm 1.2811 (1.4222) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:57:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][30/625] eta 0:02:19 lr 0.001405 wd 0.0500 time 0.2188 (0.2342) data time 0.0008 (0.0188) model time 0.0000 (0.0000) loss 3.4794 (4.0126) grad_norm 1.1738 (1.3623) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:57:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][40/625] eta 0:02:14 lr 0.001407 wd 0.0500 time 0.2171 (0.2299) data time 0.0008 (0.0145) model time 0.0000 (0.0000) loss 2.9199 (4.0586) grad_norm 1.6060 (1.3568) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:57:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][50/625] eta 0:02:10 lr 0.001409 wd 0.0500 time 0.2122 (0.2267) data time 0.0008 (0.0119) model time 0.0000 (0.0000) loss 4.6991 (4.1269) grad_norm 1.2774 (1.3352) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:58:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][60/625] eta 0:02:09 lr 0.001410 wd 0.0500 time 0.2202 
(0.2284) data time 0.0011 (0.0101) model time 0.2191 (0.2356) loss 4.4430 (4.1493) grad_norm 1.5694 (1.3413) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:58:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][70/625] eta 0:02:05 lr 0.001412 wd 0.0500 time 0.2157 (0.2266) data time 0.0011 (0.0088) model time 0.2146 (0.2252) loss 3.9888 (4.1399) grad_norm 1.1662 (1.3553) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:58:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][80/625] eta 0:02:02 lr 0.001413 wd 0.0500 time 0.2053 (0.2252) data time 0.0011 (0.0079) model time 0.2042 (0.2214) loss 4.2926 (4.1521) grad_norm 1.4485 (1.3923) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:58:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][90/625] eta 0:02:00 lr 0.001415 wd 0.0500 time 0.2171 (0.2243) data time 0.0012 (0.0072) model time 0.2160 (0.2201) loss 4.3885 (4.1453) grad_norm 1.2041 (1.4059) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:58:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][100/625] eta 0:01:57 lr 0.001417 wd 0.0500 time 0.2209 (0.2235) data time 0.0008 (0.0066) model time 0.2201 (0.2191) loss 5.4416 (4.1761) grad_norm 1.4070 (1.3975) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:58:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][110/625] eta 0:01:54 lr 0.001418 wd 0.0500 time 0.2129 (0.2227) data time 0.0008 (0.0061) model time 0.2121 (0.2181) loss 3.5121 (4.1968) grad_norm 2.2909 (1.4011) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:58:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][120/625] eta 0:01:52 lr 0.001420 wd 0.0500 time 0.2218 (0.2222) data time 0.0009 (0.0057) model time 0.2209 (0.2178) loss 3.0605 (4.2018) grad_norm 1.2623 (1.4031) loss_scale 16384.0000 (16384.0000) mem 8975MB 
[2024-07-29 12:58:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][130/625] eta 0:01:49 lr 0.001421 wd 0.0500 time 0.2190 (0.2218) data time 0.0010 (0.0053) model time 0.2180 (0.2175) loss 4.4439 (4.2107) grad_norm 1.1237 (1.3967) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:58:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][140/625] eta 0:01:47 lr 0.001423 wd 0.0500 time 0.2180 (0.2214) data time 0.0007 (0.0050) model time 0.2172 (0.2172) loss 5.0216 (4.2086) grad_norm 1.2744 (1.3961) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:58:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][150/625] eta 0:01:44 lr 0.001425 wd 0.0500 time 0.2272 (0.2210) data time 0.0007 (0.0047) model time 0.2265 (0.2170) loss 3.7133 (4.2016) grad_norm 1.3934 (1.4116) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:58:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][160/625] eta 0:01:42 lr 0.001426 wd 0.0500 time 0.2089 (0.2206) data time 0.0011 (0.0045) model time 0.2078 (0.2167) loss 3.3239 (4.1971) grad_norm 1.0308 (1.3979) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:58:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][170/625] eta 0:01:40 lr 0.001428 wd 0.0500 time 0.2092 (0.2202) data time 0.0008 (0.0043) model time 0.2084 (0.2162) loss 3.0473 (4.1988) grad_norm 1.1922 (1.3941) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:58:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][180/625] eta 0:01:37 lr 0.001429 wd 0.0500 time 0.2195 (0.2197) data time 0.0010 (0.0041) model time 0.2185 (0.2159) loss 3.6665 (4.1901) grad_norm 1.2037 (1.3947) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 12:58:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][190/625] eta 0:01:35 lr 0.001431 wd 0.0500 time 
0.2155 (0.2194) data time 0.0010 (0.0040) model time 0.2145 (0.2156) loss 4.6163 (4.2184) grad_norm 1.9737 (1.4036) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:58:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][200/625] eta 0:01:33 lr 0.001433 wd 0.0500 time 0.2102 (0.2191) data time 0.0011 (0.0038) model time 0.2091 (0.2154) loss 3.8522 (4.1956) grad_norm 1.1022 (1.4146) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:58:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][210/625] eta 0:01:30 lr 0.001434 wd 0.0500 time 0.2202 (0.2190) data time 0.0008 (0.0037) model time 0.2194 (0.2155) loss 4.9607 (4.2022) grad_norm 2.1665 (1.4292) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:58:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][220/625] eta 0:01:28 lr 0.001436 wd 0.0500 time 0.2157 (0.2188) data time 0.0011 (0.0036) model time 0.2147 (0.2153) loss 4.4564 (4.2026) grad_norm 1.2839 (1.4415) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:58:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][230/625] eta 0:01:26 lr 0.001437 wd 0.0500 time 0.2126 (0.2196) data time 0.0010 (0.0035) model time 0.2116 (0.2164) loss 4.4097 (4.2087) grad_norm 2.0670 (1.4516) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:58:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][240/625] eta 0:01:24 lr 0.001439 wd 0.0500 time 0.2104 (0.2193) data time 0.0010 (0.0034) model time 0.2093 (0.2162) loss 3.8282 (4.2267) grad_norm 1.1265 (1.4549) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:58:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][250/625] eta 0:01:22 lr 0.001441 wd 0.0500 time 0.2134 (0.2191) data time 0.0011 (0.0033) model time 0.2123 (0.2160) loss 4.3667 (4.2098) grad_norm 1.2791 (1.4552) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:58:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][260/625] eta 0:01:19 lr 0.001442 wd 0.0500 time 0.2172 (0.2189) data time 0.0010 (0.0032) model time 0.2162 (0.2159) loss 3.8265 (4.2089) grad_norm 1.9373 (1.4587) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:58:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][270/625] eta 0:01:17 lr 0.001444 wd 0.0500 time 0.2183 (0.2188) data time 0.0010 (0.0031) model time 0.2173 (0.2158) loss 4.4543 (4.2161) grad_norm 1.0925 (1.4594) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:58:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][280/625] eta 0:01:15 lr 0.001445 wd 0.0500 time 0.2097 (0.2186) data time 0.0008 (0.0031) model time 0.2089 (0.2157) loss 3.6629 (4.2080) grad_norm 1.0825 (1.4603) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:58:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][290/625] eta 0:01:13 lr 0.001447 wd 0.0500 time 0.2141 (0.2184) data time 0.0009 (0.0030) model time 0.2132 (0.2155) loss 4.4822 (4.2019) grad_norm 1.3839 (1.4595) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:58:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][300/625] eta 0:01:10 lr 0.001449 wd 0.0500 time 0.2166 (0.2182) data time 0.0010 (0.0029) model time 0.2156 (0.2154) loss 4.3582 (4.2076) grad_norm 1.2575 (1.4532) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:58:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][310/625] eta 0:01:08 lr 0.001450 wd 0.0500 time 0.2157 (0.2181) data time 0.0010 (0.0029) model time 0.2147 (0.2153) loss 4.1076 (4.2115) grad_norm 0.9137 (1.4504) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:58:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][320/625] eta 0:01:06 lr 0.001452 wd 0.0500 time 0.2162 (0.2180) data time 0.0007 (0.0028) model time 0.2155 (0.2153) loss 3.8077 (4.2129) grad_norm 1.6900 (1.4491) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:58:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][330/625] eta 0:01:04 lr 0.001453 wd 0.0500 time 0.2117 (0.2179) data time 0.0011 (0.0028) model time 0.2105 (0.2152) loss 4.1876 (4.2128) grad_norm 1.7876 (1.4488) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][340/625] eta 0:01:02 lr 0.001455 wd 0.0500 time 0.2092 (0.2178) data time 0.0008 (0.0027) model time 0.2084 (0.2151) loss 4.7200 (4.2219) grad_norm 1.3567 (1.4602) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][350/625] eta 0:00:59 lr 0.001457 wd 0.0500 time 0.2136 (0.2177) data time 0.0010 (0.0027) model time 0.2126 (0.2151) loss 4.9575 (4.2306) grad_norm 1.2336 (1.4550) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][360/625] eta 0:00:57 lr 0.001458 wd 0.0500 time 0.2119 (0.2177) data time 0.0011 (0.0026) model time 0.2108 (0.2150) loss 4.2663 (4.2273) grad_norm 1.0528 (1.4514) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][370/625] eta 0:00:55 lr 0.001460 wd 0.0500 time 0.2196 (0.2176) data time 0.0010 (0.0026) model time 0.2187 (0.2150) loss 4.4577 (4.2303) grad_norm 1.0343 (1.4473) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][380/625] eta 0:00:53 lr 0.001461 wd 0.0500 time 0.2199 (0.2175) data time 0.0010 (0.0026) model time 0.2190 (0.2149) loss 4.5802 (4.2233) grad_norm 1.7948 (1.4535) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][390/625] eta 0:00:51 lr 0.001463 wd 0.0500 time 0.2168 (0.2174) data time 0.0007 (0.0025) model time 0.2162 (0.2149) loss 3.0301 (4.2233) grad_norm 1.5721 (1.4562) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][400/625] eta 0:00:48 lr 0.001465 wd 0.0500 time 0.2188 (0.2174) data time 0.0009 (0.0025) model time 0.2179 (0.2149) loss 4.4779 (4.2235) grad_norm 1.9647 (1.4539) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][410/625] eta 0:00:46 lr 0.001466 wd 0.0500 time 0.2054 (0.2173) data time 0.0012 (0.0025) model time 0.2042 (0.2148) loss 4.5169 (4.2246) grad_norm 1.3634 (1.4525) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][420/625] eta 0:00:44 lr 0.001468 wd 0.0500 time 0.2119 (0.2172) data time 0.0011 (0.0024) model time 0.2108 (0.2147) loss 4.7394 (4.2304) grad_norm 2.6531 (1.4626) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][430/625] eta 0:00:42 lr 0.001469 wd 0.0500 time 0.2130 (0.2172) data time 0.0011 (0.0024) model time 0.2118 (0.2148) loss 4.5786 (4.2357) grad_norm 1.1119 (1.4603) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][440/625] eta 0:00:40 lr 0.001471 wd 0.0500 time 0.2173 (0.2172) data time 0.0011 (0.0024) model time 0.2162 (0.2148) loss 4.5672 (4.2305) grad_norm 1.0001 (1.4677) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][450/625] eta 0:00:37 lr 0.001473 wd 0.0500 time 0.2106 (0.2171) data time 0.0009 (0.0023) model time 0.2097 (0.2148) loss 3.7955 (4.2300) grad_norm 1.5045 (1.4740) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][460/625] eta 0:00:35 lr 0.001474 wd 0.0500 time 0.2259 (0.2171) data time 0.0011 (0.0023) model time 0.2248 (0.2148) loss 4.7843 (4.2325) grad_norm 1.2343 (1.4707) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][470/625] eta 0:00:33 lr 0.001476 wd 0.0500 time 0.2126 (0.2170) data time 0.0010 (0.0023) model time 0.2115 (0.2147) loss 4.3060 (4.2341) grad_norm 1.3478 (1.4670) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][480/625] eta 0:00:31 lr 0.001477 wd 0.0500 time 0.2248 (0.2170) data time 0.0010 (0.0023) model time 0.2238 (0.2147) loss 4.8120 (4.2372) grad_norm 1.2916 (1.4648) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][490/625] eta 0:00:29 lr 0.001479 wd 0.0500 time 0.2083 (0.2169) data time 0.0009 (0.0022) model time 0.2074 (0.2147) loss 4.0950 (4.2466) grad_norm 2.4087 (1.4674) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][500/625] eta 0:00:27 lr 0.001481 wd 0.0500 time 0.2126 (0.2168) data time 0.0008 (0.0022) model time 0.2118 (0.2146) loss 4.5298 (4.2478) grad_norm 2.2675 (1.4691) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][510/625] eta 0:00:24 lr 0.001482 wd 0.0500 time 0.2089 (0.2168) data time 0.0011 (0.0022) model time 0.2077 (0.2145) loss 4.6135 (4.2507) grad_norm 1.1599 (1.4687) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][520/625] eta 0:00:22 lr 0.001484 wd 0.0500 time 0.2187 (0.2167) data time 0.0011 (0.0022) model time 0.2176 (0.2145) loss 3.1055 (4.2458) grad_norm 1.7085 (1.4675) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][530/625] eta 0:00:20 lr 0.001485 wd 0.0500 time 0.2147 (0.2166) data time 0.0010 (0.0021) model time 0.2137 (0.2145) loss 4.5032 (4.2457) grad_norm 1.6726 (1.4721) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][540/625] eta 0:00:18 lr 0.001487 wd 0.0500 time 0.2138 (0.2166) data time 0.0007 (0.0021) model time 0.2131 (0.2145) loss 4.0367 (4.2404) grad_norm 1.0111 (1.4715) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][550/625] eta 0:00:16 lr 0.001489 wd 0.0500 time 0.2150 (0.2166) data time 0.0013 (0.0021) model time 0.2137 (0.2145) loss 4.4831 (4.2431) grad_norm 1.9107 (1.4735) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][560/625] eta 0:00:14 lr 0.001490 wd 0.0500 time 0.2138 (0.2166) data time 0.0011 (0.0021) model time 0.2128 (0.2145) loss 3.7934 (4.2411) grad_norm 1.4074 (1.4710) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][570/625] eta 0:00:11 lr 0.001492 wd 0.0500 time 0.2088 (0.2166) data time 0.0009 (0.0021) model time 0.2079 (0.2145) loss 4.2856 (4.2347) grad_norm 1.6853 (1.4688) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][580/625] eta 0:00:09 lr 0.001493 wd 0.0500 time 0.2168 (0.2166) data time 0.0008 (0.0021) model time 0.2160 (0.2145) loss 4.9026 (4.2314) grad_norm 0.8986 (1.4659) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][590/625] eta 0:00:07 lr 0.001495 wd 0.0500 time 0.2127 (0.2165) data time 0.0010 (0.0021) model time 0.2117 (0.2145) loss 4.3179 (4.2319) grad_norm 1.1660 (1.4617) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][600/625] eta 0:00:05 lr 0.001497 wd 0.0500 time 0.2150 (0.2165) data time 0.0011 (0.0020) model time 0.2140 (0.2145) loss 3.6235 (4.2318) grad_norm 1.0663 (1.4596) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 12:59:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][610/625] eta 0:00:03 lr 0.001498 wd 0.0500 time 0.2097 (0.2165) data time 0.0005 (0.0020) model time 0.2092 (0.2144) loss 4.9261 (4.2289) grad_norm 1.0868 (1.4565) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][620/625] eta 0:00:01 lr 0.001500 wd 0.0500 time 0.2124 (0.2168) data time 0.0007 (0.0020) model time 0.2117 (0.2147) loss 3.9339 (4.2280) grad_norm 1.1651 (1.4617) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 14 training takes 0:02:15
[2024-07-29 13:00:02 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 13:00:03 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
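The `eta` field in each `Train:` record is consistent with the running-average batch time (the parenthesised value after `time`) multiplied by the iterations left in the epoch and truncated to whole seconds, as Swin-Transformer-style training loops compute it. The source of `main_hfai_mnodes.py` is not part of this log, so the formula below is an assumption checked only against the printed numbers.

```python
import datetime


def eta(avg_batch_time: float, idx: int, num_steps: int) -> str:
    """Assumed ETA formula: average seconds per iteration x iterations left."""
    remaining = num_steps - idx
    # Truncate to whole seconds, then format as H:MM:SS via timedelta.
    return str(datetime.timedelta(seconds=int(avg_batch_time * remaining)))


# Averages copied from the [14/300][200/625] and [15/300][0/625] records in this log.
print(eta(0.2191, 200, 625))
print(eta(0.6539, 0, 625))
```

Both calls reproduce the logged values (`0:01:33` and `0:06:48`). Because the logged average is already rounded to four decimals, a recomputed ETA could occasionally differ from the printed one by a second.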
[2024-07-29 13:00:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.533 (0.533) Loss 1.1943 (1.1943) Acc@1 75.195 (75.195) Acc@5 92.822 (92.822) Mem 8975MB
[2024-07-29 13:00:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.104) Loss 2.0176 (1.4094) Acc@1 56.299 (69.154) Acc@5 82.129 (90.785) Mem 8975MB
[2024-07-29 13:00:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.081) Loss 2.0312 (1.6892) Acc@1 57.422 (64.014) Acc@5 81.494 (86.598) Mem 8975MB
[2024-07-29 13:00:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 63.806 Acc@5 86.546
[2024-07-29 13:00:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 63.8%
[2024-07-29 13:00:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 63.81%
[2024-07-29 13:00:05 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 13:00:06 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 13:00:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.513 (0.513) Loss 6.9531 (6.9531) Acc@1 0.000 (0.000) Acc@5 0.000 (0.000) Mem 8975MB
[2024-07-29 13:00:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.103) Loss 6.9570 (6.9957) Acc@1 0.000 (0.098) Acc@5 0.000 (0.222) Mem 8975MB
[2024-07-29 13:00:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.081) Loss 6.9922 (6.9911) Acc@1 0.000 (0.130) Acc@5 0.000 (0.500) Mem 8975MB
[2024-07-29 13:00:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.180 Acc@5 0.528
[2024-07-29 13:00:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.2%
[2024-07-29 13:00:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 0.18%
[2024-07-29 13:00:08 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 13:00:09 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
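The EMA evaluation reports 0.18% top-1, while the raw weights reach 63.81% after the same epoch. One way to quantify how much of the random initialization an exponential moving average of weights still carries is sketched below. It assumes a timm-style update `ema = decay * ema + (1 - decay) * weights` applied once per iteration; the update code is not shown in this log, so treat that as an assumption. Under it, the initial weights keep a coefficient of `decay ** steps`. With `model_ema_decay: 0.9999` from the run's arguments and 625 iterations per epoch, that coefficient is still about 0.39 after epochs 0 to 14, which may help explain why the EMA model lags so far behind at this stage.

```python
def init_coefficient(decay: float, steps: int) -> float:
    """Weight the initial parameters keep in an EMA after `steps` updates.

    Assumes ema <- decay * ema + (1 - decay) * weights once per update, so the
    starting value is multiplied by `decay` at every step.
    """
    return decay ** steps


# Epochs 0..14 at 625 iterations each, one EMA update per iteration (assumed).
steps = 15 * 625
print(f"{init_coefficient(0.9999, steps):.3f}")  # 0.392
```

By the same formula, the initial weights' share would only fall below 1% after about 46,000 updates, roughly 74 epochs at 625 iterations each.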
[2024-07-29 13:00:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][0/625] eta 0:06:48 lr 0.001501 wd 0.0500 time 0.6539 (0.6539) data time 0.4450 (0.4450) model time 0.0000 (0.0000) loss 2.9421 (2.9421) grad_norm 1.0356 (1.0356) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][10/625] eta 0:02:34 lr 0.001502 wd 0.0500 time 0.2132 (0.2516) data time 0.0010 (0.0416) model time 0.0000 (0.0000) loss 4.0799 (4.0440) grad_norm 1.0494 (1.2364) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][20/625] eta 0:02:20 lr 0.001504 wd 0.0500 time 0.2109 (0.2327) data time 0.0009 (0.0223) model time 0.0000 (0.0000) loss 3.4427 (3.7413) grad_norm 1.0741 (1.2844) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][30/625] eta 0:02:14 lr 0.001505 wd 0.0500 time 0.2112 (0.2269) data time 0.0013 (0.0155) model time 0.0000 (0.0000) loss 4.6534 (3.8291) grad_norm 1.2746 (1.3552) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][40/625] eta 0:02:10 lr 0.001507 wd 0.0500 time 0.2130 (0.2239) data time 0.0008 (0.0120) model time 0.0000 (0.0000) loss 3.6523 (3.8918) grad_norm 1.7512 (1.3535) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][50/625] eta 0:02:07 lr 0.001508 wd 0.0500 time 0.2236 (0.2221) data time 0.0010 (0.0099) model time 0.0000 (0.0000) loss 4.8809 (3.9312) grad_norm 2.3264 (1.3907) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][60/625] eta 0:02:04 lr 0.001510 wd 0.0500 time 0.2110 (0.2210) data time 0.0010 (0.0084) model time 0.2101 (0.2145) loss 3.1089 (3.9699) grad_norm 1.8044 (1.4087) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][70/625] eta 0:02:02 lr 0.001512 wd 0.0500 time 0.2156 (0.2203) data time 0.0009 (0.0074) model time 0.2146 (0.2145) loss 4.9298 (3.9881) grad_norm 1.0546 (1.3645) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][80/625] eta 0:01:59 lr 0.001513 wd 0.0500 time 0.2139 (0.2194) data time 0.0010 (0.0066) model time 0.2129 (0.2137) loss 4.3738 (4.0091) grad_norm 1.0414 (1.3940) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][90/625] eta 0:01:57 lr 0.001515 wd 0.0500 time 0.2128 (0.2191) data time 0.0007 (0.0060) model time 0.2121 (0.2142) loss 4.3602 (4.0476) grad_norm 1.7199 (1.3793) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][100/625] eta 0:01:54 lr 0.001516 wd 0.0500 time 0.2161 (0.2187) data time 0.0007 (0.0055) model time 0.2154 (0.2140) loss 3.2508 (4.0796) grad_norm 1.0462 (1.3599) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][110/625] eta 0:01:52 lr 0.001518 wd 0.0500 time 0.2118 (0.2182) data time 0.0008 (0.0051) model time 0.2111 (0.2137) loss 4.7454 (4.0624) grad_norm 1.8881 (1.3647) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][120/625] eta 0:01:50 lr 0.001520 wd 0.0500 time 0.2186 (0.2180) data time 0.0010 (0.0048) model time 0.2176 (0.2140) loss 3.5431 (4.0567) grad_norm 0.9469 (1.3813) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][130/625] eta 0:01:47 lr 0.001521 wd 0.0500 time 0.2133 (0.2180) data time 0.0010 (0.0045) model time 0.2124 (0.2142) loss 3.4727 (4.0800) grad_norm 1.7955 (1.3909) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][140/625] eta 0:01:45 lr 0.001523 wd 0.0500 time 0.2181 (0.2176) data time 0.0008 (0.0043) model time 0.2174 (0.2140) loss 3.8506 (4.0782) grad_norm 1.1872 (1.3836) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][150/625] eta 0:01:43 lr 0.001524 wd 0.0500 time 0.2093 (0.2176) data time 0.0010 (0.0041) model time 0.2083 (0.2142) loss 5.2127 (4.0953) grad_norm 1.3715 (1.3681) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][160/625] eta 0:01:41 lr 0.001526 wd 0.0500 time 0.2240 (0.2188) data time 0.0011 (0.0039) model time 0.2229 (0.2162) loss 4.6499 (4.0987) grad_norm 1.8152 (1.3762) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][170/625] eta 0:01:39 lr 0.001528 wd 0.0500 time 0.2133 (0.2187) data time 0.0011 (0.0037) model time 0.2122 (0.2161) loss 4.9735 (4.0973) grad_norm 1.6707 (1.3943) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][180/625] eta 0:01:37 lr 0.001529 wd 0.0500 time 0.2162 (0.2186) data time 0.0011 (0.0036) model time 0.2151 (0.2161) loss 4.5161 (4.0981) grad_norm 1.3788 (1.3921) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][190/625] eta 0:01:35 lr 0.001531 wd 0.0500 time 0.2120 (0.2185) data time 0.0008 (0.0035) model time 0.2112 (0.2160) loss 4.0625 (4.0932) grad_norm 1.4622 (1.3903) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][200/625] eta 0:01:32 lr 0.001532 wd 0.0500 time 0.2158 (0.2183) data time 0.0011 (0.0033) model time 0.2147 (0.2159) loss 4.6768 (4.0993) grad_norm 1.0248 (1.3822) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][210/625] eta 0:01:30 lr 0.001534 wd 0.0500 time 0.2154 (0.2182) data time 0.0010 (0.0032) model time 0.2143 (0.2159) loss 3.5021 (4.1031) grad_norm 2.0023 (1.3932) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][220/625] eta 0:01:28 lr 0.001536 wd 0.0500 time 0.2170 (0.2181) data time 0.0007 (0.0031) model time 0.2163 (0.2157) loss 3.5709 (4.1014) grad_norm 1.3422 (1.3987) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:00:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][230/625] eta 0:01:26 lr 0.001537 wd 0.0500 time 0.2101 (0.2179) data time 0.0009 (0.0030) model time 0.2092 (0.2156) loss 3.8357 (4.1018) grad_norm 1.2439 (1.3951) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][240/625] eta 0:01:23 lr 0.001539 wd 0.0500 time 0.2102 (0.2178) data time 0.0010 (0.0030) model time 0.2092 (0.2155) loss 3.0140 (4.0930) grad_norm 1.5595 (1.4009) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][250/625] eta 0:01:21 lr 0.001540 wd 0.0500 time 0.2086 (0.2175) data time 0.0010 (0.0029) model time 0.2076 (0.2152) loss 4.3791 (4.0968) grad_norm 1.3195 (1.4000) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][260/625] eta 0:01:19 lr 0.001542 wd 0.0500 time 0.2209 (0.2176) data time 0.0009 (0.0028) model time 0.2200 (0.2154) loss 3.6177 (4.0941) grad_norm 0.8675 (1.4024) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][270/625] eta 0:01:17 lr 0.001544 wd 0.0500 time 0.2128 (0.2175) data time 0.0010 (0.0028) model time 0.2118 (0.2153) loss 3.0269 (4.0945) grad_norm 1.9886 (1.4244) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][280/625] eta 0:01:14 lr 0.001545 wd 0.0500 time 0.2161 (0.2173) data time 0.0009 (0.0027) model time 0.2152 (0.2152) loss 4.2999 (4.1033) grad_norm 1.3121 (1.4328) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][290/625] eta 0:01:13 lr 0.001547 wd 0.0500 time 0.4455 (0.2180) data time 0.0007 (0.0027) model time 0.4448 (0.2160) loss 4.5684 (4.1081) grad_norm 0.9929 (1.4295) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][300/625] eta 0:01:10 lr 0.001548 wd 0.0500 time 0.2121 (0.2178) data time 0.0007 (0.0026) model time 0.2113 (0.2158) loss 4.5627 (4.0949) grad_norm 1.0297 (1.4264) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][310/625] eta 0:01:08 lr 0.001550 wd 0.0500 time 0.2187 (0.2179) data time 0.0010 (0.0026) model time 0.2177 (0.2159) loss 4.4303 (4.1013) grad_norm 1.0142 (1.4209) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][320/625] eta 0:01:06 lr 0.001552 wd 0.0500 time 0.2148 (0.2177) data time 0.0008 (0.0025) model time 0.2140 (0.2158) loss 5.2493 (4.1037) grad_norm 1.0179 (1.4148) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][330/625] eta 0:01:04 lr 0.001553 wd 0.0500 time 0.2160 (0.2176) data time 0.0009 (0.0025) model time 0.2151 (0.2157) loss 4.5563 (4.1109) grad_norm 1.4534 (1.4120) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][340/625] eta 0:01:02 lr 0.001555 wd 0.0500 time 0.2094 (0.2177) data time 0.0011 (0.0024) model time 0.2083 (0.2158) loss 3.1492 (4.1114) grad_norm 1.4025 (1.4077) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][350/625] eta 0:00:59 lr 0.001556 wd 0.0500 time 0.2181 (0.2177) data time 0.0007 (0.0024) model time 0.2173 (0.2159) loss 4.8539 (4.1078) grad_norm 1.7212 (1.4071) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][360/625] eta 0:00:57 lr 0.001558 wd 0.0500 time 0.2161 (0.2177) data time 0.0008 (0.0024) model time 0.2153 (0.2158) loss 4.8555 (4.1131) grad_norm 1.3075 (1.4026) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][370/625] eta 0:00:55 lr 0.001560 wd 0.0500 time 0.2162 (0.2176) data time 0.0007 (0.0023) model time 0.2155 (0.2158) loss 4.1872 (4.1167) grad_norm 2.2594 (1.4035) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][380/625] eta 0:00:53 lr 0.001561 wd 0.0500 time 0.2134 (0.2177) data time 0.0009 (0.0023) model time 0.2125 (0.2159) loss 4.0796 (4.1159) grad_norm 0.9452 (1.4103) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][390/625] eta 0:00:51 lr 0.001563 wd 0.0500 time 0.2114 (0.2176) data time 0.0011 (0.0023) model time 0.2103 (0.2158) loss 4.3370 (4.1216) grad_norm 1.2383 (1.4097) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][400/625] eta 0:00:48 lr 0.001564 wd 0.0500 time 0.2067 (0.2175) data time 0.0009 (0.0022) model time 0.2058 (0.2157) loss 3.8818 (4.1321) grad_norm 1.1310 (1.4034) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][410/625] eta 0:00:46 lr 0.001566 wd 0.0500 time 0.2201 (0.2174) data time 0.0007 (0.0022) model time 0.2194 (0.2156) loss 3.0098 (4.1308) grad_norm 0.9749 (1.4001) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][420/625] eta 0:00:44 lr 0.001568 wd 0.0500 time 0.2203 (0.2174) data time 0.0010 (0.0022) model time 0.2193 (0.2156) loss 5.0813 (4.1359) grad_norm 1.2045 (1.3938) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][430/625] eta 0:00:42 lr 0.001569 wd 0.0500 time 0.2110 (0.2173) data time 0.0010 (0.0021) model time 0.2100 (0.2156) loss 4.2253 (4.1330) grad_norm 1.4604 (1.3986) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][440/625] eta 0:00:40 lr 0.001571 wd 0.0500 time 0.2101 (0.2173) data time 0.0007 (0.0021) model time 0.2093 (0.2156) loss 4.9267 (4.1317) grad_norm 1.9343 (1.4056) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][450/625] eta 0:00:38 lr 0.001572 wd 0.0500 time 0.2122 (0.2172) data time 0.0011 (0.0021) model time 0.2111 (0.2155) loss 4.5705 (4.1351) grad_norm 1.1608 (1.4056) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][460/625] eta 0:00:35 lr 0.001574 wd 0.0500 time 0.2202 (0.2172) data time 0.0008 (0.0021) model time 0.2194 (0.2155) loss 4.4561 (4.1372) grad_norm 1.9535 (1.4035) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][470/625] eta 0:00:33 lr 0.001576 wd 0.0500 time 0.2184 (0.2172) data time 0.0010 (0.0020) model time 0.2174 (0.2155) loss 3.8790 (4.1419) grad_norm 1.4836 (1.4059) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][480/625] eta 0:00:31 lr 0.001577 wd 0.0500 time 0.2195 (0.2171) data time 0.0008 (0.0020) model time 0.2187 (0.2155) loss 2.9563 (4.1440) grad_norm 1.5828 (1.4072) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][490/625] eta 0:00:29 lr 0.001579 wd 0.0500 time 0.2114 (0.2171) data time 0.0010 (0.0020) model time 0.2104 (0.2155) loss 3.8037 (4.1389) grad_norm 1.0566 (1.4084) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:01:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][500/625] eta 0:00:27 lr 0.001580 wd 0.0500 time 0.2131 (0.2171) data time 0.0010 (0.0020) model time 0.2121 (0.2155) loss 4.1988 (4.1343) grad_norm 1.3828 (1.4157) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][510/625] eta 0:00:24 lr 0.001582 wd 0.0500 time 0.2125 (0.2171) data time 0.0007 (0.0020) model time 0.2118 (0.2154) loss 4.8556 (4.1384) grad_norm 2.2666 (1.4213) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][520/625] eta 0:00:22 lr 0.001584 wd 0.0500 time 0.2200 (0.2171) data time 0.0010 (0.0020) model time 0.2190 (0.2155) loss 4.5438 (4.1389) grad_norm 1.1680 (1.4192) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][530/625] eta 0:00:20 lr 0.001585 wd 0.0500 time 0.2121 (0.2170) data time 0.0009 (0.0019) model time 0.2112 (0.2154) loss 3.7519 (4.1364) grad_norm 1.4921 (1.4181) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][540/625] eta 0:00:18 lr 0.001587 wd 0.0500 time 0.2127 (0.2170) data time 0.0007 (0.0019) model time 0.2119 (0.2154) loss 5.2528 (4.1400) grad_norm 1.2605 (1.4146) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][550/625] eta 0:00:16 lr 0.001588 wd 0.0500 time 0.2112 (0.2169) data time 0.0009 (0.0019) model time 0.2104 (0.2153) loss 3.7264 (4.1422) grad_norm 1.5752 (1.4171) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][560/625] eta 0:00:14 lr 0.001590 wd 0.0500 time 0.2277 (0.2170) data time 0.0011 (0.0019) model time 0.2266 (0.2154) loss 4.6011 (4.1390) grad_norm 1.0604 (1.4170) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][570/625] eta 0:00:11 lr 0.001592 wd 0.0500 time 0.2154 (0.2169) data time 0.0010 (0.0019) model time 0.2144 (0.2153) loss 3.3632 (4.1398) grad_norm 1.6524 (1.4148) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][580/625] eta 0:00:09 lr 0.001593 wd 0.0500 time 0.2113 (0.2169) data time 0.0010 (0.0019) model time 0.2102 (0.2153) loss 4.9440 (4.1427) grad_norm 1.2999 (1.4128) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][590/625] eta 0:00:07 lr 0.001595 wd 0.0500 time 0.2077 (0.2169) data time 0.0010 (0.0019) model time 0.2066 (0.2153) loss 4.7302 (4.1406) grad_norm 1.1303 (1.4097) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][600/625] eta 0:00:05 lr 0.001596 wd 0.0500 time 0.2073 (0.2168) data time 0.0011 (0.0018) model time 0.2061 (0.2153) loss 4.7925 (4.1389) grad_norm 1.8512 (1.4109) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][610/625] eta 0:00:03 lr 0.001598 wd 0.0500 time 0.2139 (0.2168) data time 0.0005 (0.0018) model time 0.2135 (0.2152) loss 4.6157 (4.1380) grad_norm 1.9644 (1.4134) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][620/625] eta 0:00:01 lr 0.001600 wd 0.0500 time 0.2105 (0.2167) data time 0.0005 (0.0018) model time 0.2100 (0.2151) loss 5.1576 (4.1393) grad_norm 1.2788 (1.4101) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 15 training takes 0:02:15
[2024-07-29 13:02:24 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 13:02:25 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 13:02:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.527 (0.527) Loss 1.1172 (1.1172) Acc@1 77.539 (77.539) Acc@5 94.141 (94.141) Mem 8975MB
[2024-07-29 13:02:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.104) Loss 1.9307 (1.3807) Acc@1 58.301 (70.508) Acc@5 83.789 (91.477) Mem 8975MB
[2024-07-29 13:02:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.081) Loss 2.0781 (1.6617) Acc@1 55.859 (64.737) Acc@5 81.885 (87.163) Mem 8975MB
[2024-07-29 13:02:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 64.631 Acc@5 87.132
[2024-07-29 13:02:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 64.6%
[2024-07-29 13:02:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 64.63%
[2024-07-29 13:02:27 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 13:02:28 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 13:02:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.545 (0.545) Loss 6.8984 (6.8984) Acc@1 0.000 (0.000) Acc@5 0.146 (0.146) Mem 8975MB
[2024-07-29 13:02:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.103) Loss 6.9297 (7.0103) Acc@1 0.000 (0.067) Acc@5 0.049 (0.240) Mem 8975MB
[2024-07-29 13:02:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 6.9844 (6.9900) Acc@1 0.000 (0.130) Acc@5 0.146 (0.602) Mem 8975MB
[2024-07-29 13:02:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 0.142 Acc@5 0.572
[2024-07-29 13:02:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.1%
[2024-07-29 13:02:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][0/625] eta 0:12:13 lr 0.001600 wd 0.0500 time 1.1737 (1.1737) data time 0.8795 (0.8795) model time 0.0000 (0.0000) loss 3.2769 (3.2769) grad_norm 1.4791 (1.4791) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][10/625] eta 0:03:07 lr 0.001602 wd 0.0500 time 0.2127 (0.3048) data time 0.0009 (0.0810) model time 0.0000 (0.0000) loss 4.8779 (3.9611) grad_norm 0.9275 (1.3958) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][20/625] eta 0:02:39 lr 0.001604 wd 0.0500 time 0.2212 (0.2629) data time 0.0008 (0.0429) model time 0.0000 (0.0000) loss 5.1035 (4.1223) grad_norm 1.1151 (1.3660) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][30/625] eta 0:02:27 lr 0.001605 wd 0.0500 time 0.2122 (0.2481) data time 0.0011 (0.0294) model time 0.0000 (0.0000) loss 4.5142 (4.0584) grad_norm 1.8929 (1.4103) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][40/625] eta 0:02:20 lr 0.001607 wd 0.0500 time 0.2082 (0.2397) data time 0.0011 (0.0225) model time 0.0000 (0.0000) loss 4.1206 (4.1151) grad_norm 1.9018 (1.4594) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][50/625] eta 0:02:15 lr 0.001608 wd 0.0500 time 0.2209 (0.2353) data time 0.0007 (0.0183) model time 0.0000 (0.0000) loss 4.0095 (4.0813) grad_norm 1.1125 (1.4440) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][60/625] eta 0:02:11 lr 0.001610 wd 0.0500 time 0.2160 (0.2322) data time 0.0010 (0.0155) model time 0.2151 (0.2153) loss 3.8092 (4.0808) grad_norm 1.0885 (1.4201) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][70/625] eta 0:02:07 lr 0.001612 wd 0.0500 time 0.2154 (0.2300) data time 0.0011 (0.0134) model time 0.2144 (0.2154) loss 3.3814 (4.0712) grad_norm 1.8253 (1.4155) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][80/625] eta 0:02:04 lr 0.001613 wd 0.0500 time 0.2201 (0.2287) data time 0.0008 (0.0119) model time 0.2193 (0.2165) loss 4.7515 (4.0516) grad_norm 1.1350 (1.4018) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][90/625] eta 0:02:01 lr 0.001615 wd 0.0500 time 0.2166 (0.2273) data time 0.0007 (0.0107) model time 0.2159 (0.2160) loss 4.8931 (4.0761) grad_norm 1.1870 (1.3961) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][100/625] eta 0:01:58 lr 0.001616 wd 0.0500 time 0.2074 (0.2259) data time 0.0010 (0.0098) model time 0.2064 (0.2152) loss 3.8831 (4.1031) grad_norm 1.4535 (1.4487) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][110/625] eta 0:01:55 lr 0.001618 wd 0.0500 time 0.2112 (0.2249) data time 0.0008 (0.0090) model time 0.2103 (0.2150) loss 4.7471 (4.1085) grad_norm 1.6502 (1.4588) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][120/625] eta 0:01:53 lr 0.001620 wd 0.0500 time 0.2109 (0.2241) data time 0.0010 (0.0083) model time 0.2099 (0.2149) loss 4.6016 (4.1330) grad_norm 1.1057 (1.4431) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:02:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][130/625] eta 0:01:50 lr 0.001621 wd 0.0500 time 0.2161 (0.2235) data time 0.0009 (0.0078) model time 0.2152 (0.2149) loss 3.4039 (4.1266) grad_norm 1.4396 (1.4287) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][140/625] eta 0:01:48 lr 0.001623 wd 0.0500 time 0.2142 (0.2231) data time 0.0012 (0.0073) model time 0.2130 (0.2151) loss 4.4237 (4.1337) grad_norm 1.2295 (1.4179) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][150/625] eta 0:01:45 lr 0.001624 wd 0.0500 time 0.2071 (0.2227) data time 0.0011 (0.0069) model time 0.2060 (0.2152) loss 4.2591 (4.1477) grad_norm 1.1485 (1.4379) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][160/625] eta 0:01:43 lr 0.001626 wd 0.0500 time 0.2156 (0.2226) data time 0.0011 (0.0065) model time 0.2145 (0.2156) loss 4.2638 (4.1578) grad_norm 1.5841 (1.4339) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][170/625] eta 0:01:41 lr 0.001628 wd 0.0500 time 0.2128 (0.2220) data time 0.0007 (0.0062) model time 0.2121 (0.2153) loss 4.4512 (4.1684) grad_norm 1.1822 (1.4269) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][180/625] eta 0:01:38 lr 0.001629 wd 0.0500 time 0.2102 (0.2217) data time 0.0009 (0.0059) model time 0.2093 (0.2152) loss 4.7064 (4.1719) grad_norm 0.9544 (1.4153) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][190/625] eta 0:01:36 lr 0.001631 wd 0.0500 time 0.2157 (0.2212) data time 0.0007 (0.0057) model time 0.2149 (0.2151) loss 4.6297 (4.1655) grad_norm 1.1585 (1.4121) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][200/625] eta 0:01:33 lr 0.001632 wd 0.0500 time 0.2153 (0.2209) data time 0.0010 (0.0054) model time 0.2143 (0.2149) loss 2.8859 (4.1636) grad_norm 1.5018 (1.4139) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][210/625] eta 0:01:31 lr 0.001634 wd 0.0500 time 0.2144 (0.2205) data time 0.0010 (0.0052) model time 0.2134 (0.2147) loss 4.3618 (4.1633) grad_norm 0.9971 (1.4111) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][220/625] eta 0:01:29 lr 0.001636 wd 0.0500 time 0.2139 (0.2203) data time 0.0007 (0.0050) model time 0.2131 (0.2147) loss 4.7323 (4.1642) grad_norm 1.2421 (1.4009) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][230/625] eta 0:01:27 lr 0.001637 wd 0.0500 time 0.2241 (0.2208) data time 0.0009 (0.0049) model time 0.2232 (0.2156) loss 4.4341 (4.1574) grad_norm 2.1881 (1.3964) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][240/625] eta 0:01:24 lr 0.001639 wd 0.0500 time 0.2170 (0.2207) data time 0.0009 (0.0047) model time 0.2162 (0.2157) loss 5.0872 (4.1576) grad_norm 1.3978 (1.3944) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][250/625] eta 0:01:22 lr 0.001640 wd 0.0500 time 0.2071 (0.2205) data time 0.0009 (0.0046) model time 0.2062 (0.2156) loss 4.8890 (4.1650) grad_norm 1.1008 (1.3877) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][260/625] eta 0:01:20 lr 0.001642 wd 0.0500 time 0.2208 (0.2203) data time 0.0007 (0.0044) model time 0.2201 (0.2155) loss 4.6748 (4.1667) grad_norm 3.9014 (1.4006) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][270/625] eta 0:01:18 lr 0.001644 wd 0.0500 time 0.2115 (0.2201) data time 0.0008 (0.0043) model time 0.2107 (0.2155) loss 4.9716 (4.1686) grad_norm 1.2889 (1.4218) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][280/625] eta 0:01:15 lr 0.001645 wd 0.0500 time 0.2146 (0.2198) data time 0.0011 (0.0042) model time 0.2136 (0.2153) loss 4.4539 (4.1745) grad_norm 1.1108 (1.4142) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][290/625] eta 0:01:13 lr 0.001647 wd 0.0500 time 0.2213 (0.2197) data time 0.0010 (0.0041) model time 0.2203 (0.2153) loss 3.4614 (4.1654) grad_norm 1.0765 (1.4037) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][300/625] eta 0:01:11 lr 0.001648 wd 0.0500 time 0.2147 (0.2195) data time 0.0012 (0.0040) model time 0.2135 (0.2152) loss 4.4428 (4.1765) grad_norm 2.0418 (1.4050) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][310/625] eta 0:01:09 lr 0.001650 wd 0.0500 time 0.2135 (0.2194) data time 0.0009 (0.0039) model time 0.2125 (0.2152) loss 4.0452 (4.1750) grad_norm 1.5776 (1.4102) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][320/625] eta 0:01:06 lr 0.001652 wd 0.0500 time 0.2088 (0.2193) data time 0.0012 (0.0038) model time 0.2077 (0.2152) loss 4.7518 (4.1747) grad_norm 1.4017 (1.4087) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][330/625] eta 0:01:05 lr 0.001653 wd 0.0500 time 0.2407 (0.2206) data time 0.0010 (0.0037) model time 0.2397 (0.2169) loss 4.2979 (4.1766) grad_norm 1.2548 (1.4082) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][340/625] eta 0:01:02 lr 0.001655 wd 0.0500 time 0.2186 (0.2205) data time 0.0010 (0.0037) model time 0.2176 (0.2168) loss 4.0188 (4.1720) grad_norm 1.2521 (1.4103) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][350/625] eta 0:01:00 lr 0.001656 wd 0.0500 time 0.2121 (0.2202) data time 0.0010 (0.0036) model time 0.2111 (0.2166) loss 4.3489 (4.1619) grad_norm 1.7426 (1.4079) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][360/625] eta 0:00:58 lr 0.001658 wd 0.0500 time 0.2165 (0.2201) data time 0.0010 (0.0035) model time 0.2155 (0.2165) loss 4.1838 (4.1665) grad_norm 1.2635 (1.4135) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][370/625] eta 0:00:56 lr 0.001660 wd 0.0500 time 0.2124 (0.2199) data time 0.0010 (0.0035) model time 0.2114 (0.2164) loss 3.1226 (4.1581) grad_norm 1.2120 (1.4133) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][380/625] eta 0:00:53 lr 0.001661 wd 0.0500 time 0.2102 (0.2198) data time 0.0009 (0.0034) model time 0.2093 (0.2163) loss 4.3237 (4.1567) grad_norm 1.0006 (1.4127) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][390/625] eta 0:00:51 lr 0.001663 wd 0.0500 time 0.2123 (0.2197) data time 0.0010 (0.0033) model time 0.2112 (0.2163) loss 4.1411 (4.1562) grad_norm 2.0116 (1.4089) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:03:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][400/625] eta 0:00:49 lr 0.001664 wd 0.0500 time 0.2048 (0.2196) data time 0.0009 (0.0033) model time 0.2039 (0.2162) loss 4.0027 (4.1499) grad_norm 1.1587 (1.4059) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:04:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][410/625] eta 0:00:47 lr 0.001666 wd 0.0500 time 0.2151 (0.2196) data time 0.0008 (0.0032) model time 0.2143 (0.2162) loss 3.1121 (4.1482) grad_norm 1.9732 (1.4165) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:04:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][420/625] eta 0:00:44 lr 0.001668 wd 0.0500 time 0.2140 (0.2195) data time 0.0008 (0.0032) model time 0.2132 (0.2162) loss 2.8560 (4.1460) grad_norm 1.1421 (1.4143) loss_scale 32768.0000 (16617.5012) mem 8975MB
[2024-07-29 13:04:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][430/625] eta 0:00:42 lr 0.001669 wd 0.0500 time 0.2142 (0.2194) data time 0.0010 (0.0031) model time 0.2132 (0.2161) loss 3.8296 (4.1455) grad_norm 1.0214 (1.4123) loss_scale 32768.0000 (16992.2227) mem 8975MB
[2024-07-29 13:04:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][440/625] eta 0:00:40 lr 0.001671 wd 0.0500 time 0.2136 (0.2193) data time 0.0008 (0.0031) model time 0.2127 (0.2161) loss 3.5935 (4.1388) grad_norm 1.0454 (1.4076) loss_scale 32768.0000 (17349.9501) mem 8975MB
[2024-07-29 13:04:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][450/625] eta 0:00:38 lr 0.001672 wd 0.0500 time 0.2176 (0.2196) data time 0.0007 (0.0030) model time 0.2169 (0.2165) loss 4.5875 (4.1458) grad_norm 0.9024 (1.4106) loss_scale 32768.0000 (17691.8137) mem 8975MB
[2024-07-29 13:04:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][460/625] eta 0:00:36 lr 0.001674 wd 0.0500 time 0.2154 (0.2197) data time 0.0011 (0.0030) model time 0.2143 (0.2166) loss 4.3794 (4.1419) grad_norm 1.2303 (1.4142) loss_scale 32768.0000 (18018.8460) mem 8975MB
[2024-07-29 13:04:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][470/625] eta 0:00:34 lr 0.001676 wd 0.0500 time 0.2164 (0.2196) data time 0.0007 (0.0030) model time 0.2156 (0.2166) loss 4.3343 (4.1385) grad_norm 1.0226 (1.4064) loss_scale 32768.0000 (18331.9915) mem 8975MB
[2024-07-29 13:04:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][480/625] eta 0:00:31 lr 0.001677 wd 0.0500 time 0.2117 (0.2195) data time 0.0011 (0.0029) model time 0.2106 (0.2166) loss 2.9250 (4.1333) grad_norm 1.1323 (1.4016) loss_scale 32768.0000 (18632.1164) mem 8975MB
[2024-07-29 13:04:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][490/625] eta 0:00:29 lr 0.001679 wd 0.0500 time 0.2122 (0.2194) data time 0.0010 (0.0029) model time 0.2111 (0.2165) loss 4.2929 (4.1358) grad_norm 0.8824 (1.3971) loss_scale 32768.0000 (18920.0163) mem 8975MB
[2024-07-29 13:04:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][500/625] eta 0:00:27 lr 0.001680 wd 0.0500 time 0.2116 (0.2194) data time 0.0010 (0.0028) model time 0.2106 (0.2165) loss 3.6969 (4.1367) grad_norm 1.0628 (1.3937) loss_scale 32768.0000 (19196.4232) mem 8975MB
[2024-07-29 13:04:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][510/625] eta 0:00:25 lr 0.001682 wd 0.0500 time 0.2070 (0.2193) data time 0.0012 (0.0028) model time 0.2058 (0.2164) loss 4.3380 (4.1392) grad_norm 2.5170 (inf) loss_scale 16384.0000 (19397.8865) mem 8975MB
[2024-07-29 13:04:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][520/625] eta 0:00:23 lr 0.001684 wd 0.0500 time 0.2100 (0.2192) data time 0.0009 (0.0028) model time 0.2091 (0.2164) loss 5.1196 (4.1399) grad_norm 1.2862 (inf) loss_scale 16384.0000 (19340.0384) mem 8975MB
[2024-07-29 13:04:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][530/625] eta 0:00:20 lr 0.001685 wd 0.0500 time 0.2086 (0.2191) data time 0.0008 (0.0027) model time 0.2078 (0.2163) loss 3.1113 (4.1376) grad_norm 1.3928 (inf) loss_scale 16384.0000 (19284.3691) mem 8975MB
[2024-07-29 13:04:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][540/625] eta 0:00:18 lr 0.001687 wd 0.0500 time 0.2167 (0.2191) data time 0.0007 (0.0027) model time 0.2159 (0.2163) loss 3.9490 (4.1337) grad_norm 1.0995 (inf) loss_scale 16384.0000 (19230.7579) mem 8975MB
[2024-07-29 13:04:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][550/625] eta 0:00:16 lr 0.001688 wd 0.0500 time 0.2186 (0.2190) data time 0.0011 (0.0027) model time 0.2175 (0.2163) loss 4.1531 (4.1337) grad_norm 1.4257 (inf) loss_scale 16384.0000 (19179.0926) mem 8975MB
[2024-07-29 13:04:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][560/625] eta 0:00:14 lr 0.001690 wd 0.0500 time 0.2178 (0.2190) data time 0.0010 (0.0027) model time 0.2168 (0.2163) loss 3.9541 (4.1329) grad_norm 1.8022 (inf) loss_scale 16384.0000 (19129.2692) mem 8975MB
[2024-07-29 13:04:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][570/625] eta 0:00:12 lr 0.001692 wd 0.0500 time 0.2082 (0.2190) data time 0.0009 (0.0026) model time 0.2073 (0.2163) loss 5.1274 (4.1342) grad_norm 1.1337 (inf) loss_scale 16384.0000 (19081.1909) mem 8975MB
[2024-07-29 13:04:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][580/625] eta 0:00:09 lr 0.001693 wd 0.0500 time 0.2192 (0.2189) data time 0.0007 (0.0026) model time 0.2184 (0.2162) loss 4.5485 (4.1396) grad_norm 1.5714 (inf) loss_scale 16384.0000 (19034.7676) mem 8975MB
[2024-07-29 13:04:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][590/625] eta 0:00:07 lr 0.001695 wd 0.0500 time 0.2198 (0.2188) data time 0.0011 (0.0026) model time 0.2187 (0.2162) loss 2.9160 (4.1417) grad_norm 1.0778 (inf) loss_scale 16384.0000 (18989.9154) mem 8975MB
[2024-07-29 13:04:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][600/625] eta 0:00:05 lr 0.001696 wd 0.0500 time 0.2234 (0.2188) data time 0.0010 (0.0026) model time 0.2224 (0.2162) loss 4.2387 (4.1370) grad_norm 1.1761 (inf) loss_scale 16384.0000 (18946.5557) mem 8975MB
[2024-07-29 13:04:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][610/625] eta 0:00:03 lr 0.001698 wd 0.0500 time 0.2105 (0.2187) data time 0.0007 (0.0025) model time 0.2098 (0.2161) loss 3.4225 (4.1338) grad_norm 1.0107 (inf) loss_scale 16384.0000 (18904.6154) mem 8975MB
[2024-07-29 13:04:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][620/625] eta 0:00:01 lr 0.001700 wd 0.0500 time 0.2114 (0.2186) data time 0.0005 (0.0025) model time 0.2109 (0.2160) loss 4.7139 (4.1270) grad_norm 1.2922 (inf) loss_scale 16384.0000 (18864.0258) mem 8975MB
[2024-07-29 13:04:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 16 training takes 0:02:16
[2024-07-29 13:04:46 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 13:04:47 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 13:04:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.536 (0.536) Loss 1.1289 (1.1289) Acc@1 77.246 (77.246) Acc@5 93.896 (93.896) Mem 8975MB
[2024-07-29 13:04:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.057 (0.105) Loss 1.8311 (1.3621) Acc@1 60.547 (70.930) Acc@5 84.570 (91.757) Mem 8975MB
[2024-07-29 13:04:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.083) Loss 2.0273 (1.6200) Acc@1 57.227 (65.620) Acc@5 81.982 (87.723) Mem 8975MB
[2024-07-29 13:04:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 65.377 Acc@5 87.620
[2024-07-29 13:04:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 65.4%
[2024-07-29 13:04:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 65.38%
[2024-07-29 13:04:49 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 13:04:50 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 13:04:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.500 (0.500) Loss 6.8555 (6.8555) Acc@1 0.098 (0.098) Acc@5 1.221 (1.221) Mem 8975MB
[2024-07-29 13:04:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.100) Loss 6.9141 (7.0213) Acc@1 0.000 (0.058) Acc@5 1.270 (0.422) Mem 8975MB
[2024-07-29 13:04:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 6.9609 (6.9874) Acc@1 0.000 (0.123) Acc@5 1.611 (0.667) Mem 8975MB
[2024-07-29 13:04:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 0.134 Acc@5 0.618
[2024-07-29 13:04:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.1%
[2024-07-29 13:04:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][0/625] eta 0:11:41 lr 0.001700 wd 0.0500 time 1.1220 (1.1220) data time 0.4734 (0.4734) model time 0.0000 (0.0000) loss 4.1547 (4.1547) grad_norm 1.1811 (1.1811) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:04:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][10/625] eta 0:03:05 lr 0.001702 wd 0.0500 time 0.2227 (0.3016) data time 0.0011 (0.0440) model time 0.0000 (0.0000) loss 3.0525 (3.8407) grad_norm 1.6582 (1.3106) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:04:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][20/625] eta 0:02:36 lr 0.001703 wd 0.0500 time 0.2147 (0.2590) data time 0.0008 (0.0236) model time 0.0000 (0.0000) loss 4.3087 (4.0208) grad_norm 2.1840 (1.5534) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:04:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][30/625] eta 0:02:25 lr 0.001705 wd 0.0500 time 0.2129 (0.2444) data time 0.0008 (0.0163) model time 0.0000 (0.0000) loss 3.7446 (3.9755) grad_norm 0.9804 (1.5168) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][40/625] eta 0:02:18 lr 0.001707 wd 0.0500 time 0.2176 (0.2368) data time 0.0008 (0.0126) model time 0.0000 (0.0000) loss 3.5893 (4.0345) grad_norm 1.1548 (1.4175) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][50/625] eta 0:02:14 lr 0.001708 wd 0.0500 time 0.2801 (0.2337) data time 0.0010 (0.0103) model time 0.0000 (0.0000) loss 4.4183 (4.0670) grad_norm 1.2942 (1.3849) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][60/625] eta 0:02:10 lr 0.001710 wd 0.0500 time 0.2171 (0.2316) data time 0.0008 (0.0090) model time 0.2163 (0.2183) loss 4.5763 (4.0508) grad_norm 1.1726 (1.3579) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][70/625] eta 0:02:07 lr 0.001711 wd 0.0500 time 0.2119 (0.2290) data time 0.0011 (0.0079) model time 0.2108 (0.2151) loss 4.0120 (4.0209) grad_norm 1.4144 (1.3472) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][80/625] eta 0:02:03 lr 0.001713 wd 0.0500 time 0.2167 (0.2271) data time 0.0008 (0.0070) model time 0.2159 (0.2145) loss 3.7918 (4.0630) grad_norm 1.8735 (1.3720) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][90/625] eta 0:02:00 lr 0.001715 wd 0.0500 time 0.2109 (0.2257) data time 0.0012 (0.0064) model time 0.2098 (0.2139) loss 3.6383 (4.0822) grad_norm 1.4855 (1.3870) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][100/625] eta 0:01:57 lr 0.001716 wd 0.0500 time 0.2152 (0.2244) data time 0.0008 (0.0059) model time 0.2144 (0.2134) loss 3.3334 (4.0570) grad_norm 1.0373 (1.3938) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][110/625] eta 0:01:55 lr 0.001718 wd 0.0500 time 0.2212 (0.2235) data time 0.0009 (0.0055) model time 0.2203 (0.2134) loss 4.9108 (4.0630) grad_norm 1.1184 (1.3875) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][120/625] eta 0:01:52 lr 0.001719 wd 0.0500 time 0.2231 (0.2228) data time 0.0010 (0.0051) model time 0.2222 (0.2134) loss 3.0877 (4.0548) grad_norm 1.7175 (1.3942) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][130/625] eta 0:01:49 lr 0.001721 wd 0.0500 time 0.2164 (0.2221) data time 0.0009 (0.0048) model time 0.2155 (0.2134) loss 4.6882 (4.0728) grad_norm 1.8546 (1.3847) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][140/625] eta 0:01:47 lr 0.001723 wd 0.0500 time 0.2115 (0.2216) data time 0.0011 (0.0046) model time 0.2104 (0.2135) loss 4.5442 (4.0525) grad_norm 1.5605 (1.4078) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][150/625] eta 0:01:45 lr 0.001724 wd 0.0500 time 0.2117 (0.2212) data time 0.0009 (0.0043) model time 0.2108 (0.2135) loss 2.7638 (4.0541) grad_norm 1.3251 (1.4389) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][160/625] eta 0:01:42 lr 0.001726 wd 0.0500 time 0.2199 (0.2208) data time 0.0009 (0.0041) model time 0.2190 (0.2136) loss 3.7005 (4.0607) grad_norm 1.1451 (1.4279) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][170/625] eta 0:01:40 lr 0.001727 wd 0.0500 time 0.2194 (0.2204) data time 0.0010 (0.0040) model time 0.2184 (0.2135) loss 3.6007 (4.0616) grad_norm 1.2930 (1.4136) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][180/625] eta 0:01:37 lr 0.001729 wd 0.0500 time 0.2107 (0.2201) data time 0.0010 (0.0038) model time 0.2098 (0.2135) loss 4.1708 (4.0534) grad_norm 0.9619 (1.4073) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][190/625] eta 0:01:35 lr 0.001731 wd 0.0500 time 0.2167 (0.2198) data time 0.0007 (0.0037) model time 0.2160 (0.2135) loss 3.0626 (4.0522) grad_norm 1.5737 (1.4066) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][200/625] eta 0:01:33 lr 0.001732 wd 0.0500 time 0.2146 (0.2198) data time 0.0008 (0.0035) model time 0.2138 (0.2138) loss 2.9638 (4.0576) grad_norm 1.2884 (1.4020) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][210/625] eta 0:01:31 lr 0.001734 wd 0.0500 time 0.2218 (0.2200) data time 0.0009 (0.0034) model time 0.2209 (0.2144) loss 3.3672 (4.0485) grad_norm 0.9561 (1.4060) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][220/625] eta 0:01:29 lr 0.001735 wd 0.0500 time 0.2208 (0.2198) data time 0.0010 (0.0033) model time 0.2198 (0.2145) loss 4.5593 (4.0618) grad_norm 2.2092 (1.4052) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][230/625] eta 0:01:26 lr 0.001737 wd 0.0500 time 0.2138 (0.2196) data time 0.0008 (0.0032) model time 0.2129 (0.2144) loss 3.8699 (4.0664) grad_norm 0.9385 (1.3952) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][240/625] eta 0:01:24 lr 0.001739 wd 0.0500 time 0.2136 (0.2194) data time 0.0007 (0.0031) model time 0.2129 (0.2144) loss 4.1064 (4.0574) grad_norm 1.0636 (1.3845) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][250/625] eta 0:01:22 lr 0.001740 wd 0.0500 time 0.2099 (0.2191) data time 0.0011 (0.0030) model time 0.2088 (0.2142) loss 2.7560 (4.0485) grad_norm 1.4502 (1.3783) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][260/625] eta 0:01:19 lr 0.001742 wd 0.0500 time 0.2224 (0.2190) data time 0.0007 (0.0030) model time 0.2216 (0.2142) loss 4.2947 (4.0494) grad_norm 1.8641 (1.3735) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][270/625] eta 0:01:17 lr 0.001743 wd 0.0500 time 0.2097 (0.2189) data time 0.0007 (0.0029) model time 0.2090 (0.2143) loss 2.9918 (4.0487) grad_norm 1.2750 (1.3693) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:05:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][280/625] eta 0:01:15 lr 0.001745 wd 0.0500 time 0.2166 (0.2187) data time 0.0009 (0.0028) model time 0.2157 (0.2142) loss 4.7805 (4.0583) grad_norm 1.5103 (inf) loss_scale 8192.0000 (16296.5409) mem 8975MB
[2024-07-29 13:05:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][290/625] eta 0:01:13 lr 0.001747 wd 0.0500 time 0.2185 (0.2187) data time 0.0010 (0.0028) model time 0.2175 (0.2143) loss 4.2864 (4.0530) grad_norm 1.2612 (inf) loss_scale 8192.0000 (16018.0344) mem 8975MB
[2024-07-29 13:05:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][300/625] eta 0:01:11 lr 0.001748 wd 0.0500 time 0.2085 (0.2185) data time 0.0011 (0.0027) model time 0.2074 (0.2143) loss 4.1927 (4.0593) grad_norm 1.1428 (inf) loss_scale 8192.0000 (15758.0332) mem 8975MB
[2024-07-29 13:05:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][310/625] eta 0:01:08 lr 0.001750 wd 0.0500 time 0.2148 (0.2184) data time 0.0009 (0.0027) model time 0.2138 (0.2143) loss 4.0903 (4.0643) grad_norm 0.9607 (inf) loss_scale 8192.0000 (15514.7524) mem 8975MB
[2024-07-29 13:06:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][320/625] eta 0:01:06 lr 0.001751 wd 0.0500 time 0.2159 (0.2184) data time 0.0010 (0.0026) model time 0.2149 (0.2143) loss 4.4618 (4.0672) grad_norm 1.2533 (inf) loss_scale 8192.0000 (15286.6293) mem 8975MB
[2024-07-29 13:06:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][330/625] eta 0:01:04 lr 0.001753 wd 0.0500 time 0.2194 (0.2182) data time 0.0012 (0.0026) model time 0.2181 (0.2143) loss 4.5668 (4.0691) grad_norm 1.1115 (inf) loss_scale 8192.0000 (15072.2900) mem 8975MB
[2024-07-29 13:06:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][340/625] eta 0:01:02 lr 0.001755 wd 0.0500 time 0.2105 (0.2182) data time 0.0011 (0.0025) model time 0.2094 (0.2143) loss 4.3522 (4.0624) grad_norm 1.1520 (inf) loss_scale 8192.0000 (14870.5220) mem 8975MB
[2024-07-29 13:06:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][350/625] eta 0:00:59 lr 0.001756 wd 0.0500 time 0.2113 (0.2180) data time 0.0011 (0.0025) model time 0.2102 (0.2142) loss 4.2927 (4.0635) grad_norm 1.4439 (inf) loss_scale 8192.0000 (14680.2507) mem 8975MB
[2024-07-29 13:06:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][360/625] eta 0:00:57 lr 0.001758
wd 0.0500 time 0.2144 (0.2180) data time 0.0009 (0.0024) model time 0.2135 (0.2143) loss 4.7048 (4.0658) grad_norm 1.0004 (inf) loss_scale 8192.0000 (14500.5208) mem 8975MB [2024-07-29 13:06:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][370/625] eta 0:00:55 lr 0.001759 wd 0.0500 time 0.2143 (0.2180) data time 0.0011 (0.0024) model time 0.2132 (0.2143) loss 4.6640 (4.0584) grad_norm 0.9670 (inf) loss_scale 8192.0000 (14330.4798) mem 8975MB [2024-07-29 13:06:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][380/625] eta 0:00:53 lr 0.001761 wd 0.0500 time 0.2143 (0.2179) data time 0.0007 (0.0024) model time 0.2135 (0.2143) loss 4.8683 (4.0713) grad_norm 1.3171 (inf) loss_scale 8192.0000 (14169.3648) mem 8975MB [2024-07-29 13:06:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][390/625] eta 0:00:51 lr 0.001763 wd 0.0500 time 0.2116 (0.2178) data time 0.0011 (0.0023) model time 0.2105 (0.2142) loss 4.4102 (4.0752) grad_norm 1.5940 (inf) loss_scale 8192.0000 (14016.4910) mem 8975MB [2024-07-29 13:06:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][400/625] eta 0:00:48 lr 0.001764 wd 0.0500 time 0.2120 (0.2177) data time 0.0010 (0.0023) model time 0.2110 (0.2142) loss 4.1957 (4.0633) grad_norm 1.3960 (inf) loss_scale 8192.0000 (13871.2419) mem 8975MB [2024-07-29 13:06:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][410/625] eta 0:00:46 lr 0.001766 wd 0.0500 time 0.2179 (0.2176) data time 0.0009 (0.0023) model time 0.2169 (0.2142) loss 4.6655 (4.0651) grad_norm 1.4993 (inf) loss_scale 8192.0000 (13733.0608) mem 8975MB [2024-07-29 13:06:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][420/625] eta 0:00:44 lr 0.001767 wd 0.0500 time 0.2167 (0.2176) data time 0.0010 (0.0023) model time 0.2157 (0.2142) loss 4.6148 (4.0708) grad_norm 1.3768 (inf) loss_scale 8192.0000 (13601.4442) mem 8975MB 
[2024-07-29 13:06:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][430/625] eta 0:00:42 lr 0.001769 wd 0.0500 time 0.2095 (0.2175) data time 0.0011 (0.0022) model time 0.2084 (0.2142) loss 3.9406 (4.0690) grad_norm 1.9984 (inf) loss_scale 8192.0000 (13475.9350) mem 8975MB [2024-07-29 13:06:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][440/625] eta 0:00:40 lr 0.001771 wd 0.0500 time 0.2150 (0.2175) data time 0.0011 (0.0022) model time 0.2139 (0.2142) loss 3.3139 (4.0692) grad_norm 0.8538 (inf) loss_scale 8192.0000 (13356.1179) mem 8975MB [2024-07-29 13:06:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][450/625] eta 0:00:38 lr 0.001772 wd 0.0500 time 0.2148 (0.2174) data time 0.0008 (0.0022) model time 0.2140 (0.2142) loss 4.0463 (4.0712) grad_norm 0.9422 (inf) loss_scale 8192.0000 (13241.6142) mem 8975MB [2024-07-29 13:06:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][460/625] eta 0:00:35 lr 0.001774 wd 0.0500 time 0.2175 (0.2174) data time 0.0010 (0.0022) model time 0.2165 (0.2143) loss 4.4341 (4.0753) grad_norm 2.1393 (inf) loss_scale 8192.0000 (13132.0781) mem 8975MB [2024-07-29 13:06:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][470/625] eta 0:00:33 lr 0.001775 wd 0.0500 time 0.2159 (0.2179) data time 0.0008 (0.0021) model time 0.2151 (0.2148) loss 4.2603 (4.0793) grad_norm 1.8030 (inf) loss_scale 8192.0000 (13027.1932) mem 8975MB [2024-07-29 13:06:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][480/625] eta 0:00:31 lr 0.001777 wd 0.0500 time 0.2147 (0.2179) data time 0.0011 (0.0021) model time 0.2136 (0.2149) loss 3.8383 (4.0799) grad_norm 2.0968 (inf) loss_scale 8192.0000 (12926.6694) mem 8975MB [2024-07-29 13:06:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][490/625] eta 0:00:29 lr 0.001779 wd 0.0500 time 0.2213 (0.2178) data time 
0.0010 (0.0021) model time 0.2204 (0.2148) loss 3.7504 (4.0821) grad_norm 1.1528 (inf) loss_scale 8192.0000 (12830.2403) mem 8975MB [2024-07-29 13:06:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][500/625] eta 0:00:27 lr 0.001780 wd 0.0500 time 0.2153 (0.2177) data time 0.0007 (0.0021) model time 0.2146 (0.2148) loss 3.1358 (4.0867) grad_norm 1.4526 (inf) loss_scale 8192.0000 (12737.6607) mem 8975MB [2024-07-29 13:06:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][510/625] eta 0:00:25 lr 0.001782 wd 0.0500 time 0.2105 (0.2177) data time 0.0008 (0.0021) model time 0.2097 (0.2148) loss 4.6899 (4.0874) grad_norm 1.0096 (inf) loss_scale 8192.0000 (12648.7045) mem 8975MB [2024-07-29 13:06:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][520/625] eta 0:00:22 lr 0.001783 wd 0.0500 time 0.2125 (0.2177) data time 0.0011 (0.0020) model time 0.2114 (0.2148) loss 4.3046 (4.0811) grad_norm 0.9130 (inf) loss_scale 8192.0000 (12563.1631) mem 8975MB [2024-07-29 13:06:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][530/625] eta 0:00:20 lr 0.001785 wd 0.0500 time 0.2074 (0.2176) data time 0.0009 (0.0020) model time 0.2065 (0.2148) loss 4.9908 (4.0865) grad_norm 1.3214 (inf) loss_scale 8192.0000 (12480.8437) mem 8975MB [2024-07-29 13:06:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][540/625] eta 0:00:18 lr 0.001787 wd 0.0500 time 0.2106 (0.2175) data time 0.0010 (0.0020) model time 0.2096 (0.2147) loss 4.3789 (4.0848) grad_norm 1.9939 (inf) loss_scale 8192.0000 (12401.5675) mem 8975MB [2024-07-29 13:06:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][550/625] eta 0:00:16 lr 0.001788 wd 0.0500 time 0.2083 (0.2175) data time 0.0010 (0.0020) model time 0.2073 (0.2147) loss 4.1808 (4.0898) grad_norm 0.8762 (inf) loss_scale 8192.0000 (12325.1688) mem 8975MB [2024-07-29 13:06:54 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][560/625] eta 0:00:14 lr 0.001790 wd 0.0500 time 0.2120 (0.2174) data time 0.0010 (0.0020) model time 0.2110 (0.2147) loss 4.0022 (4.0920) grad_norm 0.8544 (inf) loss_scale 8192.0000 (12251.4938) mem 8975MB [2024-07-29 13:06:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][570/625] eta 0:00:11 lr 0.001791 wd 0.0500 time 0.2163 (0.2174) data time 0.0007 (0.0020) model time 0.2155 (0.2146) loss 3.9786 (4.0981) grad_norm 1.4204 (inf) loss_scale 8192.0000 (12180.3993) mem 8975MB [2024-07-29 13:06:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][580/625] eta 0:00:09 lr 0.001793 wd 0.0500 time 0.2176 (0.2174) data time 0.0007 (0.0019) model time 0.2168 (0.2147) loss 3.5368 (4.0980) grad_norm 1.0526 (inf) loss_scale 8192.0000 (12111.7522) mem 8975MB [2024-07-29 13:07:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][590/625] eta 0:00:07 lr 0.001795 wd 0.0500 time 0.2154 (0.2174) data time 0.0011 (0.0019) model time 0.2142 (0.2147) loss 4.5556 (4.0992) grad_norm 1.0009 (inf) loss_scale 8192.0000 (12045.4281) mem 8975MB [2024-07-29 13:07:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][600/625] eta 0:00:05 lr 0.001796 wd 0.0500 time 0.2155 (0.2173) data time 0.0009 (0.0019) model time 0.2146 (0.2147) loss 4.4138 (4.1027) grad_norm 2.4259 (inf) loss_scale 8192.0000 (11981.3111) mem 8975MB [2024-07-29 13:07:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][610/625] eta 0:00:03 lr 0.001798 wd 0.0500 time 0.2136 (0.2173) data time 0.0007 (0.0019) model time 0.2129 (0.2147) loss 3.7980 (4.1066) grad_norm 0.8271 (inf) loss_scale 8192.0000 (11919.2930) mem 8975MB [2024-07-29 13:07:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][620/625] eta 0:00:01 lr 0.001799 wd 0.0500 time 0.2093 (0.2172) data time 0.0007 (0.0019) model 
time 0.2086 (0.2146) loss 4.4476 (4.1021) grad_norm 2.2072 (inf) loss_scale 8192.0000 (11859.2721) mem 8975MB [2024-07-29 13:07:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 17 training takes 0:02:15 [2024-07-29 13:07:07 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 13:07:08 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 13:07:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.556 (0.556) Loss 1.0889 (1.0889) Acc@1 78.516 (78.516) Acc@5 94.678 (94.678) Mem 8975MB [2024-07-29 13:07:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.058 (0.106) Loss 1.9131 (1.3231) Acc@1 58.057 (71.400) Acc@5 83.447 (91.899) Mem 8975MB [2024-07-29 13:07:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.082) Loss 1.9629 (1.5864) Acc@1 57.227 (65.932) Acc@5 82.910 (87.860) Mem 8975MB [2024-07-29 13:07:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 65.865 Acc@5 87.844 [2024-07-29 13:07:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 65.9% [2024-07-29 13:07:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 65.86% [2024-07-29 13:07:10 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 13:07:11 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
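Note on epoch 17: near iteration 280 the averaged grad_norm becomes inf and loss_scale drops from 16384 to 8192. This pattern is consistent with a dynamic AMP loss scaler halving its scale after a gradient overflow; PyTorch's GradScaler defaults to backoff_factor=0.5. The value in parentheses appears to be a per-epoch running mean. A single inf sample keeps the mean at inf for the rest of the epoch, which would explain why grad_norm stays "(inf)" while the per-step values are finite. The sketch below rests on three assumptions: the scale is halved once at iteration 278, it does not grow back, and the bracketed number is the plain mean over iterations 0..step. Iteration 278 is inferred from the averages and is not printed in the log. `running_mean_loss_scale` is a hypothetical helper, not part of the training code.

```python
def running_mean_loss_scale(step, drop_step, before=16384.0, after=8192.0):
    """Mean of per-iteration loss scales over iterations 0..step (inclusive).

    Hypothetical model: the scale is `before` for iterations < drop_step and
    `after` from drop_step onward (one halving, no regrowth within the epoch).
    """
    n_total = step + 1
    n_after = max(0, step - drop_step + 1)
    n_before = n_total - n_after
    return (n_before * before + n_after * after) / n_total


# Bracketed loss_scale averages copied from the epoch-17 log records above.
logged = {280: 16296.5409, 290: 16018.0344, 300: 15758.0332, 620: 11859.2721}

for step, value in logged.items():
    est = running_mean_loss_scale(step, drop_step=278)
    print(f"iter {step}: estimated {est:.4f}, logged {value:.4f}")
```

Under these assumptions the estimates match the logged averages to four decimals. That supports a single overflow in this epoch, with the scale held at 8192 afterwards. Epoch 18 then starts at 8192 and reports a constant average.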
[2024-07-29 13:07:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.470 (0.470) Loss 6.7812 (6.7812) Acc@1 1.709 (1.709) Acc@5 2.197 (2.197) Mem 8975MB [2024-07-29 13:07:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 6.9023 (7.0167) Acc@1 0.000 (0.155) Acc@5 1.562 (0.479) Mem 8975MB [2024-07-29 13:07:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 6.9492 (6.9836) Acc@1 0.049 (0.140) Acc@5 1.709 (0.732) Mem 8975MB [2024-07-29 13:07:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.150 Acc@5 0.680 [2024-07-29 13:07:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.1% [2024-07-29 13:07:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][0/625] eta 0:11:13 lr 0.001800 wd 0.0500 time 1.0774 (1.0774) data time 0.7123 (0.7123) model time 0.0000 (0.0000) loss 4.2291 (4.2291) grad_norm 1.3624 (1.3624) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:07:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][10/625] eta 0:03:14 lr 0.001802 wd 0.0500 time 0.2074 (0.3159) data time 0.0007 (0.0659) model time 0.0000 (0.0000) loss 3.4375 (4.2097) grad_norm 1.1414 (1.5015) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:07:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][20/625] eta 0:02:48 lr 0.001803 wd 0.0500 time 0.2133 (0.2791) data time 0.0011 (0.0351) model time 0.0000 (0.0000) loss 4.3579 (4.1664) grad_norm 1.2529 (1.4214) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:07:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][30/625] eta 0:02:34 lr 0.001805 wd 0.0500 time 0.2144 (0.2589) data time 0.0011 (0.0241) model time 0.0000 (0.0000) loss 3.5484 (4.0600) grad_norm 1.1244 (1.4246) loss_scale 
8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:07:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][40/625] eta 0:02:25 lr 0.001807 wd 0.0500 time 0.2161 (0.2484) data time 0.0010 (0.0185) model time 0.0000 (0.0000) loss 4.5764 (4.0467) grad_norm 1.2007 (1.3654) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:07:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][50/625] eta 0:02:18 lr 0.001808 wd 0.0500 time 0.2101 (0.2417) data time 0.0009 (0.0151) model time 0.0000 (0.0000) loss 3.6546 (4.0188) grad_norm 1.0052 (1.3723) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:07:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][60/625] eta 0:02:14 lr 0.001810 wd 0.0500 time 0.2131 (0.2372) data time 0.0010 (0.0128) model time 0.2121 (0.2135) loss 3.6035 (3.9935) grad_norm 1.1018 (1.3898) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:07:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][70/625] eta 0:02:09 lr 0.001811 wd 0.0500 time 0.2083 (0.2341) data time 0.0011 (0.0112) model time 0.2072 (0.2137) loss 4.0666 (3.9913) grad_norm 0.8498 (1.3986) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:07:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][80/625] eta 0:02:06 lr 0.001813 wd 0.0500 time 0.2130 (0.2315) data time 0.0007 (0.0099) model time 0.2122 (0.2130) loss 4.2822 (4.0154) grad_norm 2.4299 (1.4442) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:07:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][90/625] eta 0:02:02 lr 0.001815 wd 0.0500 time 0.2078 (0.2296) data time 0.0008 (0.0090) model time 0.2070 (0.2131) loss 4.3861 (4.0307) grad_norm 0.8781 (1.4238) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:07:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][100/625] eta 0:01:59 lr 0.001816 wd 
0.0500 time 0.2136 (0.2282) data time 0.0012 (0.0082) model time 0.2125 (0.2133) loss 4.1432 (4.0382) grad_norm 1.3981 (1.4058) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:07:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][110/625] eta 0:01:56 lr 0.001818 wd 0.0500 time 0.2162 (0.2271) data time 0.0008 (0.0076) model time 0.2154 (0.2135) loss 3.8185 (4.0020) grad_norm 2.0927 (1.4233) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:07:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][120/625] eta 0:01:54 lr 0.001819 wd 0.0500 time 0.2142 (0.2263) data time 0.0012 (0.0070) model time 0.2130 (0.2139) loss 4.1998 (4.0230) grad_norm 1.1606 (1.4458) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:07:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][130/625] eta 0:01:51 lr 0.001821 wd 0.0500 time 0.2100 (0.2255) data time 0.0009 (0.0066) model time 0.2091 (0.2140) loss 2.8969 (4.0152) grad_norm 1.1258 (1.4421) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:07:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][140/625] eta 0:01:48 lr 0.001823 wd 0.0500 time 0.2121 (0.2245) data time 0.0009 (0.0062) model time 0.2112 (0.2137) loss 3.8720 (4.0021) grad_norm 1.4649 (1.4462) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:07:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][150/625] eta 0:01:46 lr 0.001824 wd 0.0500 time 0.2157 (0.2238) data time 0.0007 (0.0058) model time 0.2150 (0.2136) loss 3.6920 (3.9673) grad_norm 1.3888 (1.4382) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:07:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][160/625] eta 0:01:43 lr 0.001826 wd 0.0500 time 0.2132 (0.2233) data time 0.0011 (0.0056) model time 0.2122 (0.2137) loss 3.1100 (3.9588) grad_norm 0.8959 (1.4243) loss_scale 8192.0000 (8192.0000) 
mem 8975MB [2024-07-29 13:07:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][170/625] eta 0:01:41 lr 0.001827 wd 0.0500 time 0.2130 (0.2227) data time 0.0011 (0.0053) model time 0.2118 (0.2135) loss 3.2273 (3.9590) grad_norm 1.3262 (1.4068) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:07:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][180/625] eta 0:01:38 lr 0.001829 wd 0.0500 time 0.2108 (0.2220) data time 0.0011 (0.0051) model time 0.2097 (0.2132) loss 3.5030 (3.9494) grad_norm 1.9700 (1.4005) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:07:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][190/625] eta 0:01:36 lr 0.001831 wd 0.0500 time 0.2179 (0.2216) data time 0.0010 (0.0049) model time 0.2169 (0.2132) loss 3.9893 (3.9666) grad_norm 2.3446 (1.4038) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:07:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][200/625] eta 0:01:34 lr 0.001832 wd 0.0500 time 0.2107 (0.2212) data time 0.0009 (0.0047) model time 0.2098 (0.2131) loss 3.8834 (3.9660) grad_norm 1.0630 (1.4056) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:07:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][210/625] eta 0:01:31 lr 0.001834 wd 0.0500 time 0.2139 (0.2209) data time 0.0013 (0.0045) model time 0.2126 (0.2131) loss 3.2358 (3.9667) grad_norm 1.2064 (1.4097) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][220/625] eta 0:01:29 lr 0.001835 wd 0.0500 time 0.2190 (0.2206) data time 0.0008 (0.0043) model time 0.2183 (0.2132) loss 4.2657 (3.9904) grad_norm 0.9031 (1.4018) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][230/625] eta 0:01:27 lr 0.001837 wd 0.0500 time 0.2197 
(0.2204) data time 0.0007 (0.0042) model time 0.2190 (0.2132) loss 5.0288 (3.9993) grad_norm 1.2083 (1.4009) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][240/625] eta 0:01:24 lr 0.001839 wd 0.0500 time 0.2097 (0.2202) data time 0.0008 (0.0041) model time 0.2089 (0.2133) loss 5.1338 (4.0031) grad_norm 1.1374 (1.3931) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][250/625] eta 0:01:22 lr 0.001840 wd 0.0500 time 0.2123 (0.2200) data time 0.0008 (0.0039) model time 0.2115 (0.2134) loss 3.3053 (3.9885) grad_norm 1.2282 (1.3868) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][260/625] eta 0:01:20 lr 0.001842 wd 0.0500 time 0.2041 (0.2197) data time 0.0011 (0.0038) model time 0.2029 (0.2133) loss 4.4998 (4.0002) grad_norm 1.1872 (1.3828) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][270/625] eta 0:01:17 lr 0.001843 wd 0.0500 time 0.2171 (0.2196) data time 0.0010 (0.0037) model time 0.2160 (0.2133) loss 4.3353 (4.0124) grad_norm 1.3727 (1.3828) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][280/625] eta 0:01:15 lr 0.001845 wd 0.0500 time 0.2183 (0.2194) data time 0.0010 (0.0037) model time 0.2173 (0.2134) loss 4.4579 (4.0111) grad_norm 1.1824 (1.3875) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][290/625] eta 0:01:13 lr 0.001847 wd 0.0500 time 0.2141 (0.2194) data time 0.0009 (0.0036) model time 0.2132 (0.2135) loss 3.1715 (4.0142) grad_norm 1.3211 (1.3823) loss_scale 8192.0000 (8192.0000) mem 8975MB 
[2024-07-29 13:08:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][300/625] eta 0:01:11 lr 0.001848 wd 0.0500 time 0.2181 (0.2193) data time 0.0010 (0.0035) model time 0.2171 (0.2136) loss 4.1138 (4.0278) grad_norm 1.3733 (1.3856) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][310/625] eta 0:01:09 lr 0.001850 wd 0.0500 time 0.2088 (0.2192) data time 0.0009 (0.0034) model time 0.2080 (0.2137) loss 2.8388 (4.0159) grad_norm 1.1755 (1.3839) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][320/625] eta 0:01:06 lr 0.001851 wd 0.0500 time 0.2109 (0.2190) data time 0.0008 (0.0033) model time 0.2102 (0.2136) loss 3.5024 (4.0105) grad_norm 1.1207 (1.3792) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][330/625] eta 0:01:04 lr 0.001853 wd 0.0500 time 0.2159 (0.2189) data time 0.0007 (0.0033) model time 0.2152 (0.2136) loss 4.1900 (4.0088) grad_norm 1.4574 (1.3938) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][340/625] eta 0:01:02 lr 0.001855 wd 0.0500 time 0.2156 (0.2188) data time 0.0008 (0.0032) model time 0.2148 (0.2136) loss 4.4671 (4.0132) grad_norm 1.2220 (1.3920) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][350/625] eta 0:01:00 lr 0.001856 wd 0.0500 time 0.2107 (0.2186) data time 0.0010 (0.0031) model time 0.2097 (0.2136) loss 3.3072 (4.0147) grad_norm 1.6147 (1.3962) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][360/625] eta 0:00:57 lr 0.001858 wd 0.0500 time 0.2140 (0.2184) 
data time 0.0010 (0.0031) model time 0.2130 (0.2135) loss 4.4811 (4.0084) grad_norm 1.0080 (1.3969) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][370/625] eta 0:00:55 lr 0.001859 wd 0.0500 time 0.2126 (0.2183) data time 0.0011 (0.0030) model time 0.2115 (0.2135) loss 4.0978 (4.0065) grad_norm 1.0536 (1.3957) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][380/625] eta 0:00:53 lr 0.001861 wd 0.0500 time 0.2077 (0.2182) data time 0.0009 (0.0030) model time 0.2068 (0.2135) loss 3.0154 (4.0033) grad_norm 1.1333 (1.3927) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][390/625] eta 0:00:51 lr 0.001863 wd 0.0500 time 0.2125 (0.2182) data time 0.0012 (0.0029) model time 0.2113 (0.2135) loss 4.3482 (4.0058) grad_norm 1.6811 (1.3984) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][400/625] eta 0:00:49 lr 0.001864 wd 0.0500 time 0.2082 (0.2181) data time 0.0008 (0.0029) model time 0.2074 (0.2136) loss 3.3385 (3.9986) grad_norm 1.3618 (1.3939) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][410/625] eta 0:00:46 lr 0.001866 wd 0.0500 time 0.2188 (0.2181) data time 0.0009 (0.0028) model time 0.2179 (0.2136) loss 3.5733 (3.9999) grad_norm 1.0257 (1.3925) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][420/625] eta 0:00:44 lr 0.001867 wd 0.0500 time 0.2094 (0.2180) data time 0.0011 (0.0028) model time 0.2083 (0.2136) loss 4.2729 (4.0026) grad_norm 1.9781 (1.3923) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 
13:08:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][430/625] eta 0:00:42 lr 0.001869 wd 0.0500 time 0.2107 (0.2179) data time 0.0008 (0.0028) model time 0.2100 (0.2136) loss 2.9408 (4.0042) grad_norm 1.1881 (1.3889) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][440/625] eta 0:00:40 lr 0.001871 wd 0.0500 time 0.2100 (0.2178) data time 0.0010 (0.0027) model time 0.2089 (0.2136) loss 4.6379 (4.0111) grad_norm 0.9318 (1.3843) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][450/625] eta 0:00:38 lr 0.001872 wd 0.0500 time 0.2160 (0.2178) data time 0.0011 (0.0027) model time 0.2149 (0.2136) loss 2.5573 (4.0132) grad_norm 1.0957 (1.3807) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][460/625] eta 0:00:35 lr 0.001874 wd 0.0500 time 0.2202 (0.2179) data time 0.0010 (0.0027) model time 0.2191 (0.2138) loss 4.0897 (4.0132) grad_norm 1.2825 (1.3825) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][470/625] eta 0:00:33 lr 0.001875 wd 0.0500 time 0.2132 (0.2178) data time 0.0010 (0.0026) model time 0.2122 (0.2138) loss 3.1565 (4.0140) grad_norm 1.3847 (1.3818) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][480/625] eta 0:00:31 lr 0.001877 wd 0.0500 time 0.2148 (0.2177) data time 0.0011 (0.0026) model time 0.2137 (0.2138) loss 4.5628 (4.0151) grad_norm 1.2893 (1.3803) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:08:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][490/625] eta 0:00:29 lr 0.001879 wd 0.0500 time 0.2162 (0.2177) data time 
0.0008 (0.0026) model time 0.2154 (0.2138) loss 4.8687 (4.0169) grad_norm 1.6381 (1.3782) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][500/625] eta 0:00:27 lr 0.001880 wd 0.0500 time 0.2139 (0.2177) data time 0.0009 (0.0025) model time 0.2130 (0.2139) loss 3.4141 (4.0219) grad_norm 1.0264 (1.3793) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][510/625] eta 0:00:25 lr 0.001882 wd 0.0500 time 0.2168 (0.2176) data time 0.0008 (0.0025) model time 0.2160 (0.2138) loss 4.7071 (4.0217) grad_norm 0.9896 (1.3765) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][520/625] eta 0:00:22 lr 0.001883 wd 0.0500 time 0.2109 (0.2175) data time 0.0008 (0.0025) model time 0.2102 (0.2138) loss 4.7191 (4.0238) grad_norm 1.0548 (1.3797) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][530/625] eta 0:00:20 lr 0.001885 wd 0.0500 time 0.2132 (0.2175) data time 0.0008 (0.0025) model time 0.2124 (0.2138) loss 3.1757 (4.0280) grad_norm 2.6020 (1.3876) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][540/625] eta 0:00:18 lr 0.001887 wd 0.0500 time 0.2184 (0.2175) data time 0.0011 (0.0024) model time 0.2173 (0.2139) loss 4.6185 (4.0298) grad_norm 1.1632 (1.3884) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][550/625] eta 0:00:16 lr 0.001888 wd 0.0500 time 0.2126 (0.2174) data time 0.0008 (0.0024) model time 0.2118 (0.2138) loss 2.8959 (4.0285) grad_norm 1.5404 (1.3836) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][560/625] eta 0:00:14 lr 0.001890 wd 0.0500 time 0.2067 (0.2174) data time 0.0010 (0.0024) model time 0.2057 (0.2138) loss 3.8593 (4.0301) grad_norm 1.0877 (1.3793) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][570/625] eta 0:00:11 lr 0.001891 wd 0.0500 time 0.2301 (0.2174) data time 0.0007 (0.0024) model time 0.2294 (0.2139) loss 3.8870 (4.0345) grad_norm 1.0819 (1.3780) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][580/625] eta 0:00:09 lr 0.001893 wd 0.0500 time 0.2138 (0.2174) data time 0.0009 (0.0023) model time 0.2129 (0.2140) loss 4.2483 (4.0345) grad_norm 1.1218 (1.3754) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][590/625] eta 0:00:07 lr 0.001895 wd 0.0500 time 0.2125 (0.2174) data time 0.0010 (0.0023) model time 0.2115 (0.2140) loss 4.7488 (4.0376) grad_norm 1.0676 (1.3790) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][600/625] eta 0:00:05 lr 0.001896 wd 0.0500 time 0.2146 (0.2174) data time 0.0010 (0.0023) model time 0.2136 (0.2140) loss 4.7773 (4.0352) grad_norm 1.6440 (1.3779) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][610/625] eta 0:00:03 lr 0.001898 wd 0.0500 time 0.2120 (0.2174) data time 0.0005 (0.0023) model time 0.2115 (0.2140) loss 4.9542 (4.0354) grad_norm 1.2052 (1.3749) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][620/625] eta 0:00:01 lr 0.001899 wd 0.0500 time 0.2088 (0.2173) data time 0.0007 (0.0023) model time 0.2081 (0.2140) loss 4.2804 (4.0331) grad_norm 1.7122 (1.3715) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 18 training takes 0:02:15
[2024-07-29 13:09:28 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 13:09:29 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 13:09:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.468 (0.468) Loss 1.0479 (1.0479) Acc@1 77.686 (77.686) Acc@5 93.604 (93.604) Mem 8975MB
[2024-07-29 13:09:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 1.7432 (1.2825) Acc@1 60.889 (71.893) Acc@5 85.742 (92.227) Mem 8975MB
[2024-07-29 13:09:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 2.0039 (1.5451) Acc@1 57.471 (66.813) Acc@5 81.494 (88.293) Mem 8975MB
[2024-07-29 13:09:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 66.655 Acc@5 88.286
[2024-07-29 13:09:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 66.7%
[2024-07-29 13:09:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 66.65%
[2024-07-29 13:09:31 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 13:09:32 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 13:09:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.536 (0.536) Loss 6.7188 (6.7188) Acc@1 2.246 (2.246) Acc@5 3.564 (3.564) Mem 8975MB
[2024-07-29 13:09:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.103) Loss 6.9023 (7.0092) Acc@1 0.000 (0.204) Acc@5 1.074 (0.586) Mem 8975MB
[2024-07-29 13:09:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.081) Loss 6.9492 (6.9812) Acc@1 0.000 (0.144) Acc@5 1.221 (0.765) Mem 8975MB
[2024-07-29 13:09:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.152 Acc@5 0.734
[2024-07-29 13:09:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.2%
[2024-07-29 13:09:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][0/625] eta 0:12:10 lr 0.001900 wd 0.0500 time 1.1685 (1.1685) data time 0.5922 (0.5922) model time 0.0000 (0.0000) loss 2.8346 (2.8346) grad_norm 1.7197 (1.7197) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][10/625] eta 0:03:18 lr 0.001902 wd 0.0500 time 0.2000 (0.3234) data time 0.0011 (0.0548) model time 0.0000 (0.0000) loss 4.2372 (4.1247) grad_norm 1.7681 (1.6838) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][20/625] eta 0:02:44 lr 0.001903 wd 0.0500 time 0.2166 (0.2714) data time 0.0010 (0.0292) model time 0.0000 (0.0000) loss 3.8469 (4.0572) grad_norm 1.0150 (1.5381) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][30/625] eta 0:02:30 lr 0.001905 wd 0.0500 time 0.2130 (0.2526) data time 0.0012 (0.0201) model time 0.0000 (0.0000) loss 4.1676 (4.0230) grad_norm 2.0999 (1.5113) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][40/625] eta 0:02:22 lr 0.001906 wd 0.0500 time 0.2061 (0.2436) data time 0.0011 (0.0155) model time 0.0000 (0.0000) loss 3.2344 (3.9346) grad_norm 1.4546 (1.4514) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][50/625] eta 0:02:16 lr 0.001908 wd 0.0500 time 0.2183 (0.2382) data time 0.0011 (0.0127) model time 0.0000 (0.0000) loss 3.1579 (3.9168) grad_norm 1.7406 (1.3947) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][60/625] eta 0:02:12 lr 0.001910 wd 0.0500 time 0.2158 (0.2344) data time 0.0009 (0.0107) model time 0.2150 (0.2139) loss 2.9070 (3.9765) grad_norm 1.7167 (1.3747) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][70/625] eta 0:02:08 lr 0.001911 wd 0.0500 time 0.2095 (0.2315) data time 0.0009 (0.0094) model time 0.2086 (0.2132) loss 4.2259 (4.0127) grad_norm 0.8583 (1.4227) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][80/625] eta 0:02:06 lr 0.001913 wd 0.0500 time 0.2183 (0.2319) data time 0.0007 (0.0084) model time 0.2175 (0.2201) loss 2.4371 (3.9854) grad_norm 1.0632 (1.4244) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][90/625] eta 0:02:04 lr 0.001914 wd 0.0500 time 0.2131 (0.2326) data time 0.0011 (0.0076) model time 0.2120 (0.2243) loss 4.0876 (4.0016) grad_norm 1.2862 (1.4004) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][100/625] eta 0:02:01 lr 0.001916 wd 0.0500 time 0.2287 (0.2310) data time 0.0009 (0.0069) model time 0.2278 (0.2225) loss 3.6762 (3.9931) grad_norm 2.3898 (1.3915) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:09:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][110/625] eta 0:01:58 lr 0.001918 wd 0.0500 time 0.2071 (0.2296) data time 0.0011 (0.0064) model time 0.2060 (0.2212) loss 4.1800 (4.0124) grad_norm 0.8350 (1.3900) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][120/625] eta 0:01:55 lr 0.001919 wd 0.0500 time 0.2126 (0.2284) data time 0.0009 (0.0060) model time 0.2117 (0.2202) loss 4.2756 (4.0116) grad_norm 1.9361 (1.3914) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][130/625] eta 0:01:52 lr 0.001921 wd 0.0500 time 0.2128 (0.2272) data time 0.0009 (0.0056) model time 0.2120 (0.2191) loss 4.1869 (4.0161) grad_norm 1.9307 (1.3777) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][140/625] eta 0:01:49 lr 0.001922 wd 0.0500 time 0.2217 (0.2266) data time 0.0010 (0.0053) model time 0.2207 (0.2190) loss 3.9636 (4.0172) grad_norm 1.1147 (1.3651) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][150/625] eta 0:01:47 lr 0.001924 wd 0.0500 time 0.2110 (0.2257) data time 0.0008 (0.0050) model time 0.2102 (0.2183) loss 4.5880 (4.0143) grad_norm 1.2047 (1.3482) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][160/625] eta 0:01:45 lr 0.001926 wd 0.0500 time 0.2079 (0.2265) data time 0.0008 (0.0048) model time 0.2071 (0.2200) loss 4.5395 (4.0154) grad_norm 0.8969 (1.3369) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][170/625] eta 0:01:42 lr 0.001927 wd 0.0500 time 0.2131 (0.2261) data time 0.0009 (0.0045) model time 0.2122 (0.2199) loss 2.7031 (4.0187) grad_norm 1.1457 (1.3348) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][180/625] eta 0:01:40 lr 0.001929 wd 0.0500 time 0.2126 (0.2255) data time 0.0008 (0.0043) model time 0.2118 (0.2194) loss 4.8007 (4.0243) grad_norm 1.3475 (1.3373) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][190/625] eta 0:01:37 lr 0.001930 wd 0.0500 time 0.2180 (0.2250) data time 0.0008 (0.0042) model time 0.2172 (0.2190) loss 4.7645 (4.0165) grad_norm 1.3028 (1.3379) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][200/625] eta 0:01:35 lr 0.001932 wd 0.0500 time 0.2121 (0.2244) data time 0.0008 (0.0040) model time 0.2113 (0.2186) loss 3.2570 (4.0066) grad_norm 1.8408 (1.3469) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][210/625] eta 0:01:32 lr 0.001934 wd 0.0500 time 0.2063 (0.2240) data time 0.0008 (0.0039) model time 0.2055 (0.2184) loss 4.5560 (3.9938) grad_norm 2.0587 (1.3461) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][220/625] eta 0:01:30 lr 0.001935 wd 0.0500 time 0.2171 (0.2237) data time 0.0008 (0.0038) model time 0.2163 (0.2183) loss 4.8452 (3.9917) grad_norm 1.5572 (1.3421) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][230/625] eta 0:01:28 lr 0.001937 wd 0.0500 time 0.2108 (0.2234) data time 0.0009 (0.0036) model time 0.2098 (0.2181) loss 3.5388 (3.9865) grad_norm 2.0233 (1.3483) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][240/625] eta 0:01:25 lr 0.001938 wd 0.0500 time 0.2189 (0.2231) data time 0.0008 (0.0035) model time 0.2182 (0.2179) loss 4.3289 (3.9898) grad_norm 1.1184 (1.3437) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][250/625] eta 0:01:23 lr 0.001940 wd 0.0500 time 0.2174 (0.2227) data time 0.0008 (0.0034) model time 0.2166 (0.2177) loss 2.9972 (3.9932) grad_norm 1.6482 (1.3374) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][260/625] eta 0:01:21 lr 0.001942 wd 0.0500 time 0.2183 (0.2225) data time 0.0009 (0.0033) model time 0.2173 (0.2176) loss 2.4624 (3.9946) grad_norm 1.2271 (1.3331) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][270/625] eta 0:01:18 lr 0.001943 wd 0.0500 time 0.2167 (0.2222) data time 0.0010 (0.0033) model time 0.2157 (0.2174) loss 3.8325 (3.9935) grad_norm 1.6312 (1.3282) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][280/625] eta 0:01:16 lr 0.001945 wd 0.0500 time 0.2141 (0.2219) data time 0.0009 (0.0032) model time 0.2132 (0.2172) loss 3.2863 (3.9937) grad_norm 1.2341 (1.3280) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][290/625] eta 0:01:14 lr 0.001946 wd 0.0500 time 0.2120 (0.2216) data time 0.0008 (0.0031) model time 0.2112 (0.2170) loss 4.6185 (4.0093) grad_norm 1.0455 (1.3426) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][300/625] eta 0:01:11 lr 0.001948 wd 0.0500 time 0.2133 (0.2214) data time 0.0007 (0.0030) model time 0.2126 (0.2169) loss 4.3129 (4.0180) grad_norm 1.4233 (1.3349) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][310/625] eta 0:01:09 lr 0.001950 wd 0.0500 time 0.2120 (0.2212) data time 0.0010 (0.0030) model time 0.2110 (0.2168) loss 3.5684 (4.0071) grad_norm 2.1084 (1.3328) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][320/625] eta 0:01:07 lr 0.001951 wd 0.0500 time 0.2146 (0.2210) data time 0.0010 (0.0029) model time 0.2135 (0.2167) loss 3.3082 (4.0076) grad_norm 2.6234 (1.3337) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][330/625] eta 0:01:05 lr 0.001953 wd 0.0500 time 0.2143 (0.2208) data time 0.0009 (0.0029) model time 0.2134 (0.2166) loss 4.5375 (4.0041) grad_norm 1.6511 (1.3313) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][340/625] eta 0:01:02 lr 0.001954 wd 0.0500 time 0.2127 (0.2206) data time 0.0009 (0.0028) model time 0.2118 (0.2164) loss 4.7449 (4.0066) grad_norm 1.1165 (1.3257) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][350/625] eta 0:01:00 lr 0.001956 wd 0.0500 time 0.2149 (0.2204) data time 0.0008 (0.0028) model time 0.2140 (0.2162) loss 4.7438 (4.0108) grad_norm 1.0631 (1.3198) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][360/625] eta 0:00:58 lr 0.001958 wd 0.0500 time 0.2126 (0.2202) data time 0.0010 (0.0027) model time 0.2115 (0.2162) loss 4.6104 (4.0200) grad_norm 1.6061 (1.3195) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][370/625] eta 0:00:56 lr 0.001959 wd 0.0500 time 0.2194 (0.2201) data time 0.0008 (0.0027) model time 0.2186 (0.2161) loss 4.6958 (4.0152) grad_norm 1.3071 (1.3172) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:10:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][380/625] eta 0:00:53 lr 0.001961 wd 0.0500 time 0.2130 (0.2200) data time 0.0007 (0.0027) model time 0.2123 (0.2160) loss 3.7286 (4.0096) grad_norm 1.3853 (1.3135) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][390/625] eta 0:00:51 lr 0.001962 wd 0.0500 time 0.2094 (0.2199) data time 0.0009 (0.0027) model time 0.2085 (0.2160) loss 3.2061 (4.0043) grad_norm 0.9713 (1.3103) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][400/625] eta 0:00:49 lr 0.001964 wd 0.0500 time 0.2111 (0.2198) data time 0.0009 (0.0026) model time 0.2102 (0.2159) loss 3.6233 (4.0060) grad_norm 1.2606 (1.3124) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][410/625] eta 0:00:47 lr 0.001966 wd 0.0500 time 0.2073 (0.2197) data time 0.0011 (0.0026) model time 0.2062 (0.2159) loss 4.0580 (4.0090) grad_norm 1.0580 (1.3151) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][420/625] eta 0:00:45 lr 0.001967 wd 0.0500 time 0.2174 (0.2196) data time 0.0010 (0.0026) model time 0.2164 (0.2158) loss 4.2162 (4.0111) grad_norm 1.1797 (1.3169) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][430/625] eta 0:00:42 lr 0.001969 wd 0.0500 time 0.2169 (0.2195) data time 0.0007 (0.0025) model time 0.2162 (0.2158) loss 3.1561 (4.0111) grad_norm 1.2182 (1.3165) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][440/625] eta 0:00:40 lr 0.001970 wd 0.0500 time 0.2164 (0.2194) data time 0.0009 (0.0025) model time 0.2155 (0.2157) loss 3.8695 (4.0109) grad_norm 1.3760 (1.3248) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][450/625] eta 0:00:38 lr 0.001972 wd 0.0500 time 0.2115 (0.2192) data time 0.0007 (0.0025) model time 0.2108 (0.2156) loss 3.9390 (4.0075) grad_norm 0.8767 (1.3235) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][460/625] eta 0:00:36 lr 0.001974 wd 0.0500 time 0.2225 (0.2192) data time 0.0010 (0.0025) model time 0.2216 (0.2157) loss 3.6763 (4.0110) grad_norm 1.9232 (1.3244) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][470/625] eta 0:00:33 lr 0.001975 wd 0.0500 time 0.2125 (0.2191) data time 0.0009 (0.0024) model time 0.2116 (0.2156) loss 3.3562 (4.0118) grad_norm 1.0054 (1.3196) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][480/625] eta 0:00:31 lr 0.001977 wd 0.0500 time 0.2267 (0.2191) data time 0.0011 (0.0024) model time 0.2256 (0.2156) loss 4.1201 (4.0138) grad_norm 1.1322 (1.3180) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][490/625] eta 0:00:29 lr 0.001978 wd 0.0500 time 0.2213 (0.2190) data time 0.0010 (0.0024) model time 0.2204 (0.2156) loss 4.0646 (4.0205) grad_norm 0.9014 (1.3217) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][500/625] eta 0:00:27 lr 0.001980 wd 0.0500 time 0.2133 (0.2189) data time 0.0010 (0.0024) model time 0.2123 (0.2155) loss 4.4325 (4.0198) grad_norm 0.9784 (1.3230) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][510/625] eta 0:00:25 lr 0.001982 wd 0.0500 time 0.2113 (0.2189) data time 0.0013 (0.0023) model time 0.2101 (0.2156) loss 3.0338 (4.0173) grad_norm 1.1163 (1.3216) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][520/625] eta 0:00:22 lr 0.001983 wd 0.0500 time 0.2095 (0.2188) data time 0.0011 (0.0023) model time 0.2084 (0.2155) loss 4.3526 (4.0222) grad_norm 1.0102 (1.3177) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][530/625] eta 0:00:20 lr 0.001985 wd 0.0500 time 0.2174 (0.2188) data time 0.0007 (0.0023) model time 0.2166 (0.2155) loss 3.9745 (4.0306) grad_norm 2.7974 (1.3261) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][540/625] eta 0:00:18 lr 0.001986 wd 0.0500 time 0.2177 (0.2187) data time 0.0009 (0.0023) model time 0.2168 (0.2155) loss 2.9202 (4.0307) grad_norm 1.0498 (1.3275) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][550/625] eta 0:00:16 lr 0.001988 wd 0.0500 time 0.2128 (0.2187) data time 0.0009 (0.0022) model time 0.2120 (0.2155) loss 4.7457 (4.0318) grad_norm 0.9678 (1.3281) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][560/625] eta 0:00:14 lr 0.001990 wd 0.0500 time 0.2184 (0.2186) data time 0.0007 (0.0022) model time 0.2177 (0.2155) loss 4.8101 (4.0239) grad_norm 1.9509 (1.3304) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][570/625] eta 0:00:12 lr 0.001991 wd 0.0500 time 0.2134 (0.2186) data time 0.0010 (0.0022) model time 0.2124 (0.2155) loss 4.0537 (4.0275) grad_norm 1.3447 (1.3312) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][580/625] eta 0:00:09 lr 0.001993 wd 0.0500 time 0.2138 (0.2185) data time 0.0010 (0.0022) model time 0.2128 (0.2154) loss 3.3486 (4.0286) grad_norm 1.0812 (1.3305) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][590/625] eta 0:00:07 lr 0.001994 wd 0.0500 time 0.2120 (0.2184) data time 0.0010 (0.0022) model time 0.2110 (0.2154) loss 4.5201 (4.0295) grad_norm 1.6981 (1.3290) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][600/625] eta 0:00:05 lr 0.001996 wd 0.0500 time 0.2152 (0.2184) data time 0.0010 (0.0021) model time 0.2142 (0.2154) loss 4.1310 (4.0324) grad_norm 1.6155 (1.3288) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][610/625] eta 0:00:03 lr 0.001998 wd 0.0500 time 0.2114 (0.2183) data time 0.0007 (0.0021) model time 0.2107 (0.2153) loss 3.5161 (4.0308) grad_norm 1.0570 (1.3331) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][620/625] eta 0:00:01 lr 0.001999 wd 0.0500 time 0.2073 (0.2182) data time 0.0007 (0.0021) model time 0.2066 (0.2153) loss 4.7446 (4.0261) grad_norm 0.9735 (1.3322) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 19 training takes 0:02:16
[2024-07-29 13:11:50 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 13:11:51 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 13:11:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.535 (0.535) Loss 1.0420 (1.0420) Acc@1 77.588 (77.588) Acc@5 95.020 (95.020) Mem 8975MB
[2024-07-29 13:11:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.102) Loss 1.7100 (1.2570) Acc@1 62.012 (72.985) Acc@5 86.230 (92.636) Mem 8975MB
[2024-07-29 13:11:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 1.8643 (1.5173) Acc@1 60.205 (67.594) Acc@5 84.375 (88.888) Mem 8975MB
[2024-07-29 13:11:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 67.344 Acc@5 88.802
[2024-07-29 13:11:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 67.3%
[2024-07-29 13:11:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 67.34%
[2024-07-29 13:11:53 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 13:11:54 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 13:11:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.491 (0.491) Loss 6.6523 (6.6523) Acc@1 2.295 (2.295) Acc@5 4.053 (4.053) Mem 8975MB
[2024-07-29 13:11:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.099) Loss 6.8945 (7.0060) Acc@1 0.000 (0.213) Acc@5 0.928 (0.795) Mem 8975MB
[2024-07-29 13:11:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 6.9375 (6.9732) Acc@1 0.000 (0.158) Acc@5 1.123 (0.858) Mem 8975MB
[2024-07-29 13:11:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.176 Acc@5 0.860
[2024-07-29 13:11:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.2%
[2024-07-29 13:11:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][0/625] eta 0:11:09 lr 0.002000 wd 0.0500 time 1.0705 (1.0705) data time 0.7935 (0.7935) model time 0.0000 (0.0000) loss 3.8196 (3.8196) grad_norm 1.8727 (1.8727) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:11:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][10/625] eta 0:03:01 lr 0.002000 wd 0.0500 time 0.2189 (0.2958) data time 0.0011 (0.0732) model time 0.0000 (0.0000) loss 3.8535 (4.0405) grad_norm 1.2705 (1.3056) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][20/625] eta 0:02:36 lr 0.002000 wd 0.0500 time 0.2337 (0.2585) data time 0.0008 (0.0389) model time 0.0000 (0.0000) loss 3.5745 (4.1572) grad_norm 1.2390 (1.2698) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][30/625] eta 0:02:25 lr 0.002000 wd 0.0500 time 0.2130 (0.2440) data time 0.0012 (0.0267) model time 0.0000 (0.0000) loss 4.4273 (4.1314) grad_norm 1.0874 (1.1962) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][40/625] eta 0:02:18 lr 0.002000 wd 0.0500 time 0.2137 (0.2368) data time 0.0008 (0.0205) model time 0.0000 (0.0000) loss 4.5607 (4.1341) grad_norm 1.5610 (1.2625) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][50/625] eta 0:02:13 lr 0.002000 wd 0.0500 time 0.2164 (0.2323) data time 0.0009 (0.0167) model time 0.0000 (0.0000) loss 4.2268 (4.1530) grad_norm 1.2634 (1.2728) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][60/625] eta 0:02:09 lr 0.002000 wd 0.0500 time 0.2136 (0.2294) data time 0.0012 (0.0141) model time 0.2124 (0.2133) loss 4.1959 (4.0934) grad_norm 1.0910 (1.3373) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][70/625] eta 0:02:06 lr 0.002000 wd 0.0500 time 0.2101 (0.2272) data time 0.0007 (0.0123) model time 0.2094 (0.2129) loss 4.8278 (4.0214) grad_norm 0.8921 (1.3312) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][80/625] eta 0:02:02 lr 0.002000 wd 0.0500 time 0.2133 (0.2255) data time 0.0011 (0.0109) model time 0.2122 (0.2127) loss 4.2233 (4.0175) grad_norm 1.4079 (1.3360) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][90/625] eta 0:01:59 lr 0.002000 wd 0.0500 time 0.2140 (0.2242) data time 0.0011 (0.0099) model time 0.2129 (0.2128) loss 3.1308 (4.0063) grad_norm 1.0875 (1.3430) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][100/625] eta 0:01:57 lr 0.002000 wd 0.0500 time 0.2168 (0.2234) data time 0.0010 (0.0090) model time 0.2159 (0.2131) loss 4.5053 (4.0350) grad_norm 1.0690 (1.3439) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][110/625] eta 0:01:54 lr 0.002000 wd 0.0500 time 0.2144 (0.2225) data time 0.0010 (0.0083) model time 0.2134 (0.2131) loss 3.7650 (4.0454) grad_norm 0.9637 (1.3373) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][120/625] eta 0:01:52 lr 0.002000 wd 0.0500 time 0.2292 (0.2221) data time 0.0007 (0.0077) model time 0.2285 (0.2135) loss 4.2008 (4.0492) grad_norm 1.0463 (1.3156) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][130/625] eta 0:01:49 lr 0.002000 wd 0.0500 time 0.2185 (0.2215) data time 0.0008 (0.0072) model time 0.2177 (0.2136) loss 4.9151 (4.0389) grad_norm 1.0872 (1.3165) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][140/625] eta 0:01:47 lr 0.002000 wd 0.0500 time 0.2173 (0.2212) data time 0.0007 (0.0067) model time 0.2166 (0.2138) loss 5.0384 (4.0213) grad_norm 1.1968 (1.3251) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][150/625] eta 0:01:45 lr 0.002000 wd 0.0500 time 0.3784 (0.2219) data time 0.0008 (0.0064) model time 0.3776 (0.2154) loss 4.2579 (4.0041) grad_norm 1.1214 (1.3196) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][160/625] eta 0:01:43 lr 0.002000 wd 0.0500 time 0.2208 (0.2226) data time 0.0010 (0.0060) model time 0.2199 (0.2170) loss 3.8072 (3.9966) grad_norm 1.1012 (1.3077) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][170/625] eta 0:01:41 lr 0.002000 wd 0.0500 time 0.2100 (0.2222) data time 0.0007 (0.0057) model time 0.2092 (0.2168) loss 3.9694 (3.9830) grad_norm 1.0969 (1.2955) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][180/625] eta 0:01:38 lr 0.002000 wd 0.0500 time 0.2192 (0.2222) data time 0.0010 (0.0055) model time 0.2182 (0.2172) loss 3.8039 (3.9711) grad_norm 2.3023 (1.3160) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][190/625] eta 0:01:36 lr 0.002000 wd 0.0500 time 0.2062 (0.2220) data time 0.0011 (0.0053) model time 0.2051 (0.2171) loss 4.2220 (3.9788) grad_norm 0.9965 (1.3183) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][200/625] eta 0:01:34 lr 0.002000 wd 0.0500 time 0.2156 (0.2217) data time 0.0010 (0.0051) model time 0.2145 (0.2169) loss 3.7623 (3.9810) grad_norm 1.1854 (1.3113) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][210/625] eta 0:01:31 lr 0.002000 wd 0.0500 time 0.2093 (0.2214) data time 0.0010 (0.0049) model time 0.2083 (0.2168) loss 3.8367 (3.9895) grad_norm 0.8383 (1.2998) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][220/625] eta 0:01:29 lr 0.002000 wd 0.0500 time 0.2144 (0.2212) data time 0.0010 (0.0047) model time 0.2133 (0.2168) loss 4.2382 (3.9824) grad_norm 1.1491 (1.2962) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][230/625] eta 0:01:27 lr 0.002000 wd 0.0500 time 0.2133 (0.2209) data time 0.0010 (0.0046) model time 0.2123 (0.2166) loss 4.5949 (3.9901) grad_norm 1.0077 (1.3006) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][240/625] eta 0:01:24 lr 0.002000 wd 0.0500 time 0.2271 (0.2207) data time 0.0008 (0.0044) model time 0.2263 (0.2164) loss 3.8847 (3.9815) grad_norm 1.2246 (1.3083) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][250/625] eta 0:01:22 lr 0.002000 wd 0.0500 time 0.2111 (0.2204) data time 0.0011 (0.0043) model time 0.2100 (0.2163) loss 4.4599 (3.9683) grad_norm 1.8671 (1.3210) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][260/625] eta 0:01:20 lr 0.002000 wd 0.0500 time 0.2100 (0.2202) data time 0.0007 (0.0042) model time 0.2093 (0.2162) loss 3.0083 (3.9693) grad_norm 1.8561 (1.3260) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][270/625] eta 0:01:18 lr 0.002000 wd 0.0500 time 0.2210 (0.2202) data time 0.0008 (0.0041) model time 0.2202 (0.2162) loss 4.7866 (3.9747) grad_norm 1.5427 (1.3223) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][280/625] eta 0:01:15 lr 0.002000 wd 0.0500 time 0.2301 (0.2200) data time 0.0007 (0.0040) model time 0.2294 (0.2162) loss 4.6979 (3.9830) grad_norm 0.8596 (1.3122) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:12:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][290/625] eta 0:01:13 lr 0.002000 wd 0.0500 time 0.2116 (0.2199) data time 0.0009 (0.0039) model time 0.2107 (0.2161) loss 4.1949 (3.9862) grad_norm 1.4892 (1.3177) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 13:13:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][300/625] eta 0:01:11 lr 0.002000 wd 0.0500 time 0.2097 (0.2197) data time 0.0011 (0.0038) model time 0.2086 (0.2160) loss 4.1041 (3.9978) grad_norm 1.1358 (1.3192) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:13:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][310/625] eta 0:01:09 lr 0.002000 wd 0.0500 time 0.2169 (0.2196) data time 0.0011 (0.0037) model time 0.2159 (0.2159) loss 4.3489 (3.9987) grad_norm 1.0291 (1.3198) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:13:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][320/625] eta 0:01:06 lr 0.002000 wd 0.0500 time 0.2190 (0.2195) data time 0.0010 (0.0036) model time 0.2180 (0.2159) loss 4.3107 (4.0005) grad_norm 1.0765 (1.3267) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:13:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][330/625] eta 0:01:04 lr 0.002000 wd 0.0500 time 0.2222 (0.2194) data time 0.0009 (0.0035) model time 0.2213 (0.2159) loss 3.9366 (4.0059) grad_norm 0.8365 (1.3198) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:13:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][340/625] eta 0:01:02 lr 0.002000 wd 0.0500 time 0.2112 (0.2194) data time 0.0010 (0.0035) model time 0.2102 (0.2160) loss 4.5915 (4.0206) grad_norm 1.2082 (1.3127) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:13:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][350/625] eta 0:01:00 lr 0.002000 wd 0.0500 time 0.2179 (0.2200) data time 0.0010 (0.0034) model time 0.2169 (0.2167) loss 4.0895 (4.0183) grad_norm 0.9944 (1.3144) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:13:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][360/625] eta 0:00:58 lr 0.002000 wd 0.0500 time 0.2142 (0.2198) 
data time 0.0009 (0.0033) model time 0.2134 (0.2166) loss 3.1469 (4.0173) grad_norm 1.0049 (1.3134) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:13:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][370/625] eta 0:00:56 lr 0.002000 wd 0.0500 time 0.2161 (0.2197) data time 0.0010 (0.0033) model time 0.2151 (0.2165) loss 4.1648 (4.0154) grad_norm 0.9997 (1.3105) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:13:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][380/625] eta 0:00:53 lr 0.002000 wd 0.0500 time 0.2132 (0.2195) data time 0.0011 (0.0032) model time 0.2121 (0.2164) loss 4.6296 (4.0199) grad_norm 1.4052 (1.3160) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:13:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][390/625] eta 0:00:51 lr 0.002000 wd 0.0500 time 0.2082 (0.2197) data time 0.0009 (0.0032) model time 0.2074 (0.2166) loss 3.4480 (4.0099) grad_norm 1.0512 (1.3107) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:13:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][400/625] eta 0:00:49 lr 0.002000 wd 0.0500 time 0.2168 (0.2196) data time 0.0011 (0.0031) model time 0.2157 (0.2166) loss 4.0851 (4.0125) grad_norm 2.2093 (1.3159) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:13:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][410/625] eta 0:00:47 lr 0.002000 wd 0.0500 time 0.2130 (0.2195) data time 0.0008 (0.0031) model time 0.2122 (0.2165) loss 3.7518 (4.0186) grad_norm 1.3653 (1.3163) loss_scale 16384.0000 (8351.4550) mem 8975MB [2024-07-29 13:13:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][420/625] eta 0:00:44 lr 0.002000 wd 0.0500 time 0.2096 (0.2193) data time 0.0009 (0.0030) model time 0.2087 (0.2164) loss 2.8393 (4.0145) grad_norm 1.4407 (1.3127) loss_scale 16384.0000 (8542.2518) mem 8975MB [2024-07-29 
13:13:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][430/625] eta 0:00:42 lr 0.002000 wd 0.0500 time 0.2187 (0.2193) data time 0.0008 (0.0030) model time 0.2179 (0.2164) loss 3.3859 (4.0185) grad_norm 2.3927 (1.3141) loss_scale 16384.0000 (8724.1949) mem 8975MB [2024-07-29 13:13:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][440/625] eta 0:00:40 lr 0.002000 wd 0.0500 time 0.2132 (0.2192) data time 0.0010 (0.0029) model time 0.2122 (0.2163) loss 4.6664 (4.0221) grad_norm 1.5280 (1.3154) loss_scale 16384.0000 (8897.8866) mem 8975MB [2024-07-29 13:13:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][450/625] eta 0:00:38 lr 0.002000 wd 0.0500 time 0.2160 (0.2191) data time 0.0008 (0.0029) model time 0.2152 (0.2163) loss 4.6831 (4.0192) grad_norm 1.1878 (1.3126) loss_scale 16384.0000 (9063.8758) mem 8975MB [2024-07-29 13:13:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][460/625] eta 0:00:36 lr 0.002000 wd 0.0500 time 0.2129 (0.2190) data time 0.0009 (0.0028) model time 0.2120 (0.2162) loss 3.9076 (4.0196) grad_norm 0.9747 (1.3134) loss_scale 16384.0000 (9222.6638) mem 8975MB [2024-07-29 13:13:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][470/625] eta 0:00:33 lr 0.002000 wd 0.0500 time 0.2160 (0.2189) data time 0.0008 (0.0028) model time 0.2153 (0.2162) loss 4.4219 (4.0158) grad_norm 1.9800 (1.3141) loss_scale 16384.0000 (9374.7091) mem 8975MB [2024-07-29 13:13:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][480/625] eta 0:00:31 lr 0.002000 wd 0.0500 time 0.2177 (0.2189) data time 0.0010 (0.0028) model time 0.2167 (0.2162) loss 4.4668 (4.0093) grad_norm 0.9372 (1.3146) loss_scale 16384.0000 (9520.4324) mem 8975MB [2024-07-29 13:13:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][490/625] eta 0:00:29 lr 0.002000 wd 0.0500 time 0.2167 (0.2189) data 
time 0.0008 (0.0027) model time 0.2159 (0.2162) loss 3.3138 (4.0036) grad_norm 1.0230 (1.3080) loss_scale 16384.0000 (9660.2200) mem 8975MB [2024-07-29 13:13:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][500/625] eta 0:00:27 lr 0.002000 wd 0.0500 time 0.2184 (0.2188) data time 0.0008 (0.0027) model time 0.2176 (0.2162) loss 4.1704 (3.9991) grad_norm 1.2112 (1.3163) loss_scale 16384.0000 (9794.4271) mem 8975MB [2024-07-29 13:13:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][510/625] eta 0:00:25 lr 0.002000 wd 0.0500 time 0.2121 (0.2187) data time 0.0011 (0.0027) model time 0.2110 (0.2161) loss 4.4932 (3.9941) grad_norm 1.1710 (1.3136) loss_scale 16384.0000 (9923.3816) mem 8975MB [2024-07-29 13:13:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][520/625] eta 0:00:22 lr 0.002000 wd 0.0500 time 0.2141 (0.2187) data time 0.0011 (0.0026) model time 0.2131 (0.2160) loss 4.3137 (3.9930) grad_norm 0.9364 (1.3113) loss_scale 16384.0000 (10047.3858) mem 8975MB [2024-07-29 13:13:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][530/625] eta 0:00:20 lr 0.002000 wd 0.0500 time 0.2214 (0.2186) data time 0.0012 (0.0026) model time 0.2202 (0.2160) loss 3.6216 (3.9906) grad_norm 1.2960 (1.3129) loss_scale 16384.0000 (10166.7194) mem 8975MB [2024-07-29 13:13:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][540/625] eta 0:00:18 lr 0.002000 wd 0.0500 time 0.2147 (0.2185) data time 0.0009 (0.0026) model time 0.2138 (0.2159) loss 3.5999 (3.9923) grad_norm 0.8645 (1.3132) loss_scale 16384.0000 (10281.6414) mem 8975MB [2024-07-29 13:13:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][550/625] eta 0:00:16 lr 0.002000 wd 0.0500 time 0.2084 (0.2184) data time 0.0009 (0.0026) model time 0.2075 (0.2159) loss 3.0391 (3.9917) grad_norm 2.1128 (1.3123) loss_scale 16384.0000 (10392.3920) mem 8975MB [2024-07-29 
13:13:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][560/625] eta 0:00:14 lr 0.002000 wd 0.0500 time 0.2151 (0.2184) data time 0.0013 (0.0026) model time 0.2138 (0.2159) loss 4.3117 (3.9937) grad_norm 1.3032 (1.3119) loss_scale 16384.0000 (10499.1943) mem 8975MB [2024-07-29 13:14:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][570/625] eta 0:00:12 lr 0.002000 wd 0.0500 time 0.2155 (0.2184) data time 0.0009 (0.0025) model time 0.2145 (0.2159) loss 3.7506 (3.9943) grad_norm 0.8305 (1.3093) loss_scale 16384.0000 (10602.2557) mem 8975MB [2024-07-29 13:14:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][580/625] eta 0:00:09 lr 0.002000 wd 0.0500 time 0.2116 (0.2184) data time 0.0008 (0.0025) model time 0.2107 (0.2159) loss 4.3821 (3.9974) grad_norm 2.0602 (1.3080) loss_scale 16384.0000 (10701.7694) mem 8975MB [2024-07-29 13:14:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][590/625] eta 0:00:07 lr 0.002000 wd 0.0500 time 0.2163 (0.2183) data time 0.0010 (0.0025) model time 0.2153 (0.2159) loss 4.0063 (4.0001) grad_norm 1.0860 (1.3043) loss_scale 16384.0000 (10797.9154) mem 8975MB [2024-07-29 13:14:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][600/625] eta 0:00:05 lr 0.002000 wd 0.0500 time 0.2224 (0.2183) data time 0.0008 (0.0025) model time 0.2216 (0.2159) loss 4.4536 (4.0037) grad_norm 1.9289 (1.3055) loss_scale 16384.0000 (10890.8619) mem 8975MB [2024-07-29 13:14:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][610/625] eta 0:00:03 lr 0.002000 wd 0.0500 time 0.2134 (0.2183) data time 0.0007 (0.0024) model time 0.2127 (0.2158) loss 3.4018 (4.0009) grad_norm 2.1605 (1.3133) loss_scale 16384.0000 (10980.7660) mem 8975MB [2024-07-29 13:14:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][620/625] eta 0:00:01 lr 0.002000 wd 0.0500 time 0.2167 (0.2182) 
data time 0.0005 (0.0024) model time 0.2162 (0.2158) loss 4.7473 (4.0018) grad_norm 1.4688 (1.3151) loss_scale 16384.0000 (11067.7746) mem 8975MB [2024-07-29 13:14:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 20 training takes 0:02:16 [2024-07-29 13:14:12 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 13:14:13 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 13:14:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.477 (0.477) Loss 1.0371 (1.0371) Acc@1 78.711 (78.711) Acc@5 94.922 (94.922) Mem 8975MB [2024-07-29 13:14:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.059 (0.098) Loss 1.7148 (1.2450) Acc@1 61.719 (73.184) Acc@5 86.182 (92.987) Mem 8975MB [2024-07-29 13:14:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.9629 (1.4983) Acc@1 57.617 (67.911) Acc@5 82.666 (89.381) Mem 8975MB [2024-07-29 13:14:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 67.676 Acc@5 89.283 [2024-07-29 13:14:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 67.7% [2024-07-29 13:14:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 67.68% [2024-07-29 13:14:15 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 13:14:17 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-29 13:14:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.453 (0.453) Loss 6.6133 (6.6133) Acc@1 2.197 (2.197) Acc@5 4.102 (4.102) Mem 8975MB [2024-07-29 13:14:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.060 (0.095) Loss 6.8672 (7.0018) Acc@1 0.049 (0.253) Acc@5 1.270 (0.892) Mem 8975MB [2024-07-29 13:14:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 6.9180 (6.9594) Acc@1 0.049 (0.212) Acc@5 1.611 (0.942) Mem 8975MB [2024-07-29 13:14:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.238 Acc@5 0.998 [2024-07-29 13:14:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.2% [2024-07-29 13:14:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 0.24% [2024-07-29 13:14:18 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 13:14:19 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-29 13:14:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][0/625] eta 0:06:43 lr 0.002000 wd 0.0500 time 0.6458 (0.6458) data time 0.4373 (0.4373) model time 0.0000 (0.0000) loss 4.2541 (4.2541) grad_norm 1.5808 (1.5808) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 13:14:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][10/625] eta 0:02:36 lr 0.002000 wd 0.0500 time 0.2262 (0.2552) data time 0.0007 (0.0407) model time 0.0000 (0.0000) loss 3.7403 (4.0196) grad_norm 1.5178 (1.1889) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 13:14:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][20/625] eta 0:02:22 lr 0.002000 wd 0.0500 time 0.2127 (0.2357) data time 0.0010 (0.0219) model time 0.0000 (0.0000) loss 3.7062 (3.8761) grad_norm 1.4288 (1.2315) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 13:14:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][30/625] eta 0:02:17 lr 0.002000 wd 0.0500 time 0.2132 (0.2310) data time 0.0010 (0.0152) model time 0.0000 (0.0000) loss 4.5752 (4.0054) grad_norm 1.0731 (1.2811) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 13:14:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][40/625] eta 0:02:12 lr 0.002000 wd 0.0500 time 0.2129 (0.2273) data time 0.0010 (0.0117) model time 0.0000 (0.0000) loss 3.8355 (3.9564) grad_norm 1.0048 (1.2600) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 13:14:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][50/625] eta 0:02:09 lr 0.002000 wd 0.0500 time 0.2147 (0.2258) data time 0.0008 (0.0097) model time 0.0000 (0.0000) loss 4.6914 (3.9802) grad_norm 1.5366 (1.2550) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 13:14:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][60/625] eta 0:02:06 lr 0.002000 wd 0.0500 time 0.2088 
(0.2239) data time 0.0012 (0.0083) model time 0.2076 (0.2134) loss 3.9710 (3.9660) grad_norm 1.1485 (1.2577) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 13:14:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][70/625] eta 0:02:03 lr 0.002000 wd 0.0500 time 0.2086 (0.2231) data time 0.0010 (0.0073) model time 0.2075 (0.2150) loss 2.3587 (3.9407) grad_norm 1.2313 (1.2552) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 13:14:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][80/625] eta 0:02:02 lr 0.002000 wd 0.0500 time 0.2203 (0.2245) data time 0.0011 (0.0073) model time 0.2193 (0.2191) loss 4.2311 (3.9164) grad_norm 1.4157 (1.2739) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 13:14:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][90/625] eta 0:01:59 lr 0.002000 wd 0.0500 time 0.2220 (0.2239) data time 0.0010 (0.0067) model time 0.2210 (0.2186) loss 4.4241 (3.8892) grad_norm 1.1424 (1.2871) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 13:14:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][100/625] eta 0:01:57 lr 0.002000 wd 0.0500 time 0.2177 (0.2235) data time 0.0010 (0.0061) model time 0.2167 (0.2187) loss 4.1168 (3.8757) grad_norm 1.0851 (1.2847) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 13:14:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][110/625] eta 0:01:54 lr 0.002000 wd 0.0500 time 0.2188 (0.2227) data time 0.0007 (0.0057) model time 0.2181 (0.2179) loss 4.5024 (3.9000) grad_norm 1.5842 (1.3081) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 13:14:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][120/625] eta 0:01:52 lr 0.002000 wd 0.0500 time 0.2299 (0.2222) data time 0.0012 (0.0053) model time 0.2287 (0.2175) loss 4.2848 (3.9126) grad_norm 1.1373 (1.2969) loss_scale 16384.0000 (16384.0000) mem 8975MB 
[2024-07-29 13:14:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][130/625] eta 0:01:49 lr 0.002000 wd 0.0500 time 0.2095 (0.2217) data time 0.0010 (0.0050) model time 0.2084 (0.2171) loss 4.1769 (3.9465) grad_norm 0.9615 (1.2839) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 13:14:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][140/625] eta 0:01:47 lr 0.002000 wd 0.0500 time 0.2099 (0.2213) data time 0.0008 (0.0047) model time 0.2091 (0.2168) loss 3.3618 (3.9507) grad_norm 1.2940 (1.2777) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 13:14:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 13:14:51 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 13:14:52 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 13:16:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 13:16:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 13:16:58 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 13:17:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 13:17:09 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... 
[2024-07-29 13:17:10 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 13:17:10 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 13:17:10 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 21) [2024-07-29 13:17:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 13:17:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][150/625] eta 0:06:47 lr 0.002000 wd 0.0500 time 0.2072 (0.8579) data time 0.0009 (0.0707) model time 0.2062 (0.7871) loss 4.3946 (4.4425) grad_norm 1.2071 (1.1410) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:17:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][160/625] eta 0:04:06 lr 0.002000 wd 0.0500 time 0.1999 (0.5293) data time 0.0008 (0.0359) model time 0.1992 (0.4934) loss 4.4558 (4.2433) grad_norm 1.5426 (1.2065) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:17:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][170/625] eta 0:03:10 lr 0.002000 wd 0.0500 time 0.2020 (0.4191) data time 0.0008 (0.0242) model time 0.2012 (0.3948) loss 4.1831 (4.2255) grad_norm 1.6737 (1.2973) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:17:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][180/625] eta 0:02:41 lr 0.002000 wd 0.0500 time 0.1985 (0.3637) data time 0.0007 (0.0184) model time 0.1978 (0.3453) loss 3.6228 (4.1480) grad_norm 0.8923 (1.3111) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:17:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][190/625] eta 0:02:23 lr 0.002000 wd 0.0500 time 0.2018 (0.3308) data time 0.0010 (0.0150) model time 0.2009 (0.3158) loss 3.8966 (4.1208) grad_norm 0.9280 (1.2677) loss_scale 16384.0000 (16384.0000) mem 8978MB 
[2024-07-29 13:17:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][200/625] eta 0:02:11 lr 0.002000 wd 0.0500 time 0.1996 (0.3089) data time 0.0007 (0.0126) model time 0.1989 (0.2963) loss 4.1983 (4.0813) grad_norm 0.8886 (1.2333) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:17:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][210/625] eta 0:02:01 lr 0.002000 wd 0.0500 time 0.2042 (0.2935) data time 0.0006 (0.0110) model time 0.2035 (0.2826) loss 3.3219 (4.0530) grad_norm 1.0360 (1.2286) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:17:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][220/625] eta 0:01:54 lr 0.002000 wd 0.0500 time 0.2023 (0.2817) data time 0.0008 (0.0097) model time 0.2015 (0.2720) loss 4.4257 (4.0483) grad_norm 0.7588 (1.2584) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:17:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][230/625] eta 0:01:47 lr 0.002000 wd 0.0500 time 0.1993 (0.2727) data time 0.0007 (0.0087) model time 0.1985 (0.2640) loss 4.3668 (4.0272) grad_norm 1.1654 (1.2685) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:17:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][240/625] eta 0:01:42 lr 0.002000 wd 0.0500 time 0.2026 (0.2654) data time 0.0011 (0.0079) model time 0.2015 (0.2575) loss 4.0334 (4.0261) grad_norm 1.4306 (1.2981) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:17:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][250/625] eta 0:01:37 lr 0.002000 wd 0.0500 time 0.2052 (0.2594) data time 0.0008 (0.0073) model time 0.2043 (0.2521) loss 3.6856 (4.0339) grad_norm 1.4889 (1.3035) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:17:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][260/625] eta 0:01:32 lr 0.002000 wd 0.0500 time 
0.1987 (0.2545) data time 0.0007 (0.0068) model time 0.1980 (0.2477) loss 4.6366 (4.0407) grad_norm 1.2814 (1.3198) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:17:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][270/625] eta 0:01:28 lr 0.002000 wd 0.0500 time 0.1982 (0.2504) data time 0.0006 (0.0063) model time 0.1976 (0.2441) loss 3.9307 (4.0101) grad_norm 1.4543 (1.3082) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:17:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][280/625] eta 0:01:25 lr 0.002000 wd 0.0500 time 0.2003 (0.2468) data time 0.0008 (0.0059) model time 0.1995 (0.2408) loss 2.5248 (3.9962) grad_norm 1.6708 (1.3129) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:17:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][290/625] eta 0:01:21 lr 0.002000 wd 0.0500 time 0.2089 (0.2437) data time 0.0008 (0.0056) model time 0.2081 (0.2381) loss 4.5377 (3.9936) grad_norm 1.0086 (1.3053) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:17:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][300/625] eta 0:01:18 lr 0.002000 wd 0.0500 time 0.1968 (0.2410) data time 0.0009 (0.0053) model time 0.1958 (0.2356) loss 3.9314 (3.9884) grad_norm 2.9902 (1.3406) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:17:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][310/625] eta 0:01:15 lr 0.002000 wd 0.0500 time 0.1990 (0.2387) data time 0.0006 (0.0051) model time 0.1984 (0.2336) loss 2.9172 (3.9860) grad_norm 1.1407 (1.3350) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:17:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][320/625] eta 0:01:12 lr 0.002000 wd 0.0500 time 0.2033 (0.2367) data time 0.0007 (0.0048) model time 0.2026 (0.2318) loss 3.0716 (3.9657) grad_norm 2.0776 (1.3427) loss_scale 16384.0000 (16384.0000) 
mem 8978MB [2024-07-29 13:17:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][330/625] eta 0:01:09 lr 0.002000 wd 0.0500 time 0.2040 (0.2348) data time 0.0006 (0.0046) model time 0.2034 (0.2301) loss 3.3307 (3.9624) grad_norm 0.8551 (1.3479) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][340/625] eta 0:01:06 lr 0.002000 wd 0.0500 time 0.2001 (0.2330) data time 0.0010 (0.0045) model time 0.1991 (0.2285) loss 4.4254 (3.9489) grad_norm 1.0588 (1.3467) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][350/625] eta 0:01:03 lr 0.002000 wd 0.0500 time 0.2128 (0.2316) data time 0.0007 (0.0043) model time 0.2121 (0.2273) loss 3.9772 (3.9384) grad_norm 1.2745 (1.3360) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][360/625] eta 0:01:01 lr 0.002000 wd 0.0500 time 0.2002 (0.2302) data time 0.0009 (0.0041) model time 0.1993 (0.2260) loss 3.9607 (3.9316) grad_norm 1.2278 (1.3247) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][370/625] eta 0:00:58 lr 0.002000 wd 0.0500 time 0.1999 (0.2289) data time 0.0009 (0.0040) model time 0.1990 (0.2249) loss 4.3778 (3.9325) grad_norm 1.2649 (1.3265) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][380/625] eta 0:00:55 lr 0.002000 wd 0.0500 time 0.2012 (0.2277) data time 0.0009 (0.0039) model time 0.2003 (0.2238) loss 4.4080 (3.9304) grad_norm 1.3783 (1.3215) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][390/625] eta 0:00:53 lr 0.002000 wd 0.0500 
time 0.1995 (0.2267) data time 0.0008 (0.0038) model time 0.1988 (0.2229) loss 2.9133 (3.9160) grad_norm 1.2996 (1.3235) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][400/625] eta 0:00:50 lr 0.002000 wd 0.0500 time 0.1974 (0.2257) data time 0.0013 (0.0037) model time 0.1961 (0.2221) loss 3.3145 (3.9098) grad_norm 1.4161 (1.3206) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][410/625] eta 0:00:48 lr 0.002000 wd 0.0500 time 0.1994 (0.2248) data time 0.0008 (0.0036) model time 0.1986 (0.2212) loss 4.9902 (3.9030) grad_norm 2.3158 (1.3281) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][420/625] eta 0:00:45 lr 0.002000 wd 0.0500 time 0.2125 (0.2240) data time 0.0008 (0.0035) model time 0.2117 (0.2205) loss 4.1340 (3.9105) grad_norm 0.7490 (1.3271) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][430/625] eta 0:00:43 lr 0.002000 wd 0.0500 time 0.2008 (0.2232) data time 0.0010 (0.0034) model time 0.1999 (0.2198) loss 3.4154 (3.9117) grad_norm 1.4905 (1.3241) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][440/625] eta 0:00:41 lr 0.002000 wd 0.0500 time 0.2038 (0.2225) data time 0.0006 (0.0033) model time 0.2032 (0.2192) loss 3.5202 (3.8954) grad_norm 1.3880 (1.3229) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][450/625] eta 0:00:38 lr 0.002000 wd 0.0500 time 0.1991 (0.2218) data time 0.0010 (0.0032) model time 0.1981 (0.2185) loss 3.9031 (3.8909) grad_norm 1.8123 (1.3259) loss_scale 16384.0000 
(16384.0000) mem 8978MB [2024-07-29 13:18:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][460/625] eta 0:00:36 lr 0.002000 wd 0.0500 time 0.2001 (0.2212) data time 0.0009 (0.0032) model time 0.1992 (0.2180) loss 3.6520 (3.9060) grad_norm 1.4369 (1.3285) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][470/625] eta 0:00:34 lr 0.002000 wd 0.0500 time 0.1995 (0.2205) data time 0.0007 (0.0031) model time 0.1987 (0.2174) loss 4.2536 (3.9124) grad_norm 1.0534 (1.3289) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][480/625] eta 0:00:31 lr 0.002000 wd 0.0500 time 0.2051 (0.2200) data time 0.0008 (0.0030) model time 0.2042 (0.2170) loss 3.9305 (3.9155) grad_norm 0.9193 (1.3239) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][490/625] eta 0:00:29 lr 0.002000 wd 0.0500 time 0.2007 (0.2195) data time 0.0009 (0.0030) model time 0.1999 (0.2165) loss 3.3482 (3.9174) grad_norm 1.4560 (1.3215) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][500/625] eta 0:00:27 lr 0.002000 wd 0.0500 time 0.2020 (0.2190) data time 0.0007 (0.0029) model time 0.2013 (0.2161) loss 5.0083 (3.9185) grad_norm 1.1791 (1.3256) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][510/625] eta 0:00:25 lr 0.002000 wd 0.0500 time 0.2012 (0.2185) data time 0.0010 (0.0029) model time 0.2003 (0.2157) loss 4.2411 (3.9164) grad_norm 0.8924 (1.3228) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][520/625] eta 0:00:22 lr 
0.002000 wd 0.0500 time 0.1993 (0.2181) data time 0.0008 (0.0028) model time 0.1985 (0.2153) loss 4.2391 (3.9153) grad_norm 1.5436 (1.3294) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][530/625] eta 0:00:20 lr 0.002000 wd 0.0500 time 0.2043 (0.2182) data time 0.0008 (0.0028) model time 0.2035 (0.2154) loss 2.8318 (3.9088) grad_norm 1.6131 (1.3298) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][540/625] eta 0:00:18 lr 0.002000 wd 0.0500 time 0.1996 (0.2178) data time 0.0009 (0.0027) model time 0.1987 (0.2151) loss 4.0881 (3.9160) grad_norm 0.8662 (1.3267) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][550/625] eta 0:00:16 lr 0.002000 wd 0.0500 time 0.1998 (0.2174) data time 0.0007 (0.0027) model time 0.1990 (0.2147) loss 4.3320 (3.9244) grad_norm 1.4436 (1.3249) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][560/625] eta 0:00:14 lr 0.002000 wd 0.0500 time 0.1972 (0.2177) data time 0.0008 (0.0026) model time 0.1964 (0.2150) loss 3.9887 (3.9239) grad_norm 1.4784 (1.3262) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][570/625] eta 0:00:11 lr 0.002000 wd 0.0500 time 0.2018 (0.2173) data time 0.0007 (0.0026) model time 0.2010 (0.2147) loss 4.2694 (3.9301) grad_norm 1.6621 (1.3251) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][580/625] eta 0:00:09 lr 0.002000 wd 0.0500 time 0.2046 (0.2169) data time 0.0008 (0.0026) model time 0.2038 (0.2144) loss 3.4338 (3.9345) grad_norm 1.2223 (1.3263) loss_scale 
16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][590/625] eta 0:00:07 lr 0.002000 wd 0.0500 time 0.2056 (0.2166) data time 0.0006 (0.0025) model time 0.2050 (0.2141) loss 4.1608 (3.9367) grad_norm 1.4524 (1.3239) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][600/625] eta 0:00:05 lr 0.002000 wd 0.0500 time 0.1992 (0.2163) data time 0.0008 (0.0025) model time 0.1985 (0.2138) loss 4.3473 (3.9306) grad_norm 1.1556 (1.3194) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:18:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][610/625] eta 0:00:03 lr 0.002000 wd 0.0500 time 0.1993 (0.2160) data time 0.0006 (0.0025) model time 0.1987 (0.2135) loss 3.9574 (3.9221) grad_norm 1.6979 (inf) loss_scale 8192.0000 (16296.8511) mem 8978MB [2024-07-29 13:18:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][620/625] eta 0:00:01 lr 0.002000 wd 0.0500 time 0.2019 (0.2157) data time 0.0005 (0.0024) model time 0.2014 (0.2133) loss 2.9398 (3.9183) grad_norm 1.3662 (inf) loss_scale 8192.0000 (16128.0000) mem 8978MB [2024-07-29 13:18:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 21 training takes 0:01:44 [2024-07-29 13:18:58 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 13:19:00 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 13:19:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.392 (0.392) Loss 1.0068 (1.0068) Acc@1 80.078 (80.078) Acc@5 95.215 (95.215) Mem 8978MB [2024-07-29 13:19:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.088) Loss 1.6934 (1.2533) Acc@1 63.135 (73.779) Acc@5 87.646 (93.244) Mem 8978MB [2024-07-29 13:19:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.072) Loss 1.9863 (1.5081) Acc@1 58.789 (68.522) Acc@5 83.252 (89.453) Mem 8978MB [2024-07-29 13:19:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 68.368 Acc@5 89.409 [2024-07-29 13:19:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 68.4% [2024-07-29 13:19:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 68.37% [2024-07-29 13:19:04 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 13:19:06 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-29 13:19:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.390 (0.390) Loss 6.5898 (6.5898) Acc@1 2.100 (2.100) Acc@5 4.297 (4.297) Mem 8978MB [2024-07-29 13:19:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.087) Loss 6.7969 (6.9847) Acc@1 0.635 (0.373) Acc@5 2.295 (1.141) Mem 8978MB [2024-07-29 13:19:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.072) Loss 6.8672 (6.9222) Acc@1 0.244 (0.328) Acc@5 2.539 (1.290) Mem 8978MB [2024-07-29 13:19:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.368 Acc@5 1.410 [2024-07-29 13:19:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.4% [2024-07-29 13:19:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 0.37% [2024-07-29 13:19:07 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 13:19:10 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-29 13:19:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][0/625] eta 0:06:39 lr 0.002000 wd 0.0500 time 0.6390 (0.6390) data time 0.3743 (0.3743) model time 0.0000 (0.0000) loss 5.1617 (5.1617) grad_norm 1.5539 (1.5539) loss_scale 8192.0000 (8192.0000) mem 8971MB [2024-07-29 13:19:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][10/625] eta 0:02:28 lr 0.002000 wd 0.0500 time 0.1993 (0.2410) data time 0.0007 (0.0350) model time 0.0000 (0.0000) loss 3.7026 (4.0366) grad_norm 0.8782 (1.2114) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:19:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][20/625] eta 0:02:14 lr 0.002000 wd 0.0500 time 0.1981 (0.2221) data time 0.0009 (0.0188) model time 0.0000 (0.0000) loss 4.0621 (3.9314) grad_norm 1.2864 (1.1816) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:19:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][30/625] eta 0:02:08 lr 0.002000 wd 0.0500 time 0.2007 (0.2153) data time 0.0009 (0.0130) model time 0.0000 (0.0000) loss 3.7702 (4.0680) grad_norm 1.0486 (1.1835) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:19:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][40/625] eta 0:02:03 lr 0.002000 wd 0.0500 time 0.2005 (0.2119) data time 0.0007 (0.0101) model time 0.0000 (0.0000) loss 4.9127 (3.9502) grad_norm 1.3644 (1.1911) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:19:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][50/625] eta 0:02:00 lr 0.002000 wd 0.0500 time 0.2013 (0.2098) data time 0.0006 (0.0083) model time 0.0000 (0.0000) loss 3.8856 (3.9088) grad_norm 1.5663 (1.2229) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:19:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][60/625] eta 0:01:57 lr 0.002000 wd 0.0500 time 0.1983 (0.2083) data time 
0.0008 (0.0071) model time 0.1975 (0.1999) loss 3.2464 (3.9010) grad_norm 0.9095 (1.2340) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:19:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][70/625] eta 0:01:55 lr 0.002000 wd 0.0500 time 0.1985 (0.2073) data time 0.0010 (0.0062) model time 0.1975 (0.1999) loss 4.1197 (3.9441) grad_norm 1.2744 (1.2497) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:19:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][80/625] eta 0:01:52 lr 0.002000 wd 0.0500 time 0.1975 (0.2064) data time 0.0011 (0.0056) model time 0.1963 (0.1997) loss 3.6907 (3.9537) grad_norm 1.2765 (1.2592) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:19:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][90/625] eta 0:01:50 lr 0.002000 wd 0.0500 time 0.1979 (0.2059) data time 0.0010 (0.0051) model time 0.1969 (0.1999) loss 4.5606 (3.9594) grad_norm 1.5939 (1.2811) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:19:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][100/625] eta 0:01:48 lr 0.002000 wd 0.0500 time 0.1992 (0.2060) data time 0.0009 (0.0047) model time 0.1983 (0.2012) loss 3.6539 (3.9647) grad_norm 0.9965 (1.2643) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:19:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][110/625] eta 0:01:45 lr 0.002000 wd 0.0500 time 0.1982 (0.2057) data time 0.0007 (0.0043) model time 0.1975 (0.2012) loss 4.4227 (3.9651) grad_norm 0.8719 (1.2596) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:19:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][120/625] eta 0:01:44 lr 0.002000 wd 0.0500 time 0.2153 (0.2077) data time 0.0007 (0.0041) model time 0.2146 (0.2052) loss 4.3136 (3.9506) grad_norm 1.4474 (1.2729) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:19:37 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][130/625] eta 0:01:42 lr 0.002000 wd 0.0500 time 0.2043 (0.2074) data time 0.0007 (0.0038) model time 0.2035 (0.2048) loss 4.4480 (3.9527) grad_norm 0.8990 (1.2648) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:19:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][140/625] eta 0:01:40 lr 0.002000 wd 0.0500 time 0.2011 (0.2070) data time 0.0010 (0.0036) model time 0.2001 (0.2044) loss 4.3888 (3.9738) grad_norm 1.0894 (1.2648) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:19:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][150/625] eta 0:01:38 lr 0.002000 wd 0.0500 time 0.2006 (0.2065) data time 0.0010 (0.0035) model time 0.1996 (0.2038) loss 3.6048 (3.9748) grad_norm 0.7749 (1.2622) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:19:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][160/625] eta 0:01:35 lr 0.002000 wd 0.0500 time 0.1981 (0.2061) data time 0.0007 (0.0033) model time 0.1974 (0.2034) loss 3.2415 (3.9601) grad_norm 1.0891 (1.2832) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:19:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][170/625] eta 0:01:33 lr 0.002000 wd 0.0500 time 0.2024 (0.2058) data time 0.0007 (0.0032) model time 0.2017 (0.2032) loss 3.6301 (3.9617) grad_norm 1.2410 (1.2824) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:19:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][180/625] eta 0:01:31 lr 0.002000 wd 0.0500 time 0.2016 (0.2056) data time 0.0007 (0.0030) model time 0.2009 (0.2030) loss 4.2520 (3.9436) grad_norm 1.1493 (1.2772) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:19:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][190/625] eta 0:01:29 lr 0.002000 wd 0.0500 time 0.2017 (0.2055) data time 0.0006 
(0.0029) model time 0.2011 (0.2029) loss 4.2164 (3.9616) grad_norm 1.1848 (1.2728) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:19:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][200/625] eta 0:01:27 lr 0.002000 wd 0.0500 time 0.2033 (0.2053) data time 0.0008 (0.0028) model time 0.2025 (0.2027) loss 4.1238 (3.9622) grad_norm 0.8076 (1.2722) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:19:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][210/625] eta 0:01:25 lr 0.002000 wd 0.0500 time 0.2035 (0.2051) data time 0.0010 (0.0028) model time 0.2025 (0.2026) loss 3.7375 (3.9543) grad_norm 1.1348 (1.2661) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:19:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][220/625] eta 0:01:22 lr 0.002000 wd 0.0500 time 0.2065 (0.2049) data time 0.0006 (0.0027) model time 0.2059 (0.2024) loss 4.2618 (3.9447) grad_norm 0.9062 (1.2613) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:19:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][230/625] eta 0:01:20 lr 0.002000 wd 0.0500 time 0.1991 (0.2047) data time 0.0009 (0.0026) model time 0.1982 (0.2023) loss 4.4711 (3.9253) grad_norm 1.2014 (1.2668) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:19:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][240/625] eta 0:01:18 lr 0.002000 wd 0.0500 time 0.2089 (0.2046) data time 0.0008 (0.0025) model time 0.2081 (0.2022) loss 3.8172 (3.9209) grad_norm 1.9802 (1.2659) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][250/625] eta 0:01:16 lr 0.002000 wd 0.0500 time 0.1992 (0.2045) data time 0.0007 (0.0025) model time 0.1985 (0.2021) loss 4.1461 (3.9267) grad_norm 0.9021 (1.2589) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:03 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][260/625] eta 0:01:14 lr 0.002000 wd 0.0500 time 0.1990 (0.2044) data time 0.0009 (0.0024) model time 0.1981 (0.2021) loss 4.2952 (3.9307) grad_norm 1.5867 (1.2561) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][270/625] eta 0:01:12 lr 0.002000 wd 0.0500 time 0.2026 (0.2043) data time 0.0008 (0.0024) model time 0.2018 (0.2020) loss 2.7009 (3.9199) grad_norm 1.0595 (1.2581) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][280/625] eta 0:01:10 lr 0.002000 wd 0.0500 time 0.2013 (0.2042) data time 0.0007 (0.0023) model time 0.2005 (0.2019) loss 4.4665 (3.9304) grad_norm 1.4078 (1.2563) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][290/625] eta 0:01:08 lr 0.002000 wd 0.0500 time 0.2024 (0.2041) data time 0.0008 (0.0023) model time 0.2016 (0.2019) loss 3.3947 (3.9318) grad_norm 1.8012 (1.2528) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][300/625] eta 0:01:06 lr 0.002000 wd 0.0500 time 0.2022 (0.2040) data time 0.0010 (0.0022) model time 0.2012 (0.2018) loss 3.1140 (3.9291) grad_norm 1.4687 (1.2508) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][310/625] eta 0:01:04 lr 0.002000 wd 0.0500 time 0.2024 (0.2039) data time 0.0006 (0.0022) model time 0.2018 (0.2017) loss 3.4352 (3.9330) grad_norm 1.6329 (1.2568) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][320/625] eta 0:01:02 lr 0.002000 wd 0.0500 time 0.2036 (0.2038) data time 0.0006 
(0.0022) model time 0.2030 (0.2017) loss 4.9301 (3.9291) grad_norm 1.8875 (1.2565) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][330/625] eta 0:01:00 lr 0.002000 wd 0.0500 time 0.2009 (0.2037) data time 0.0007 (0.0021) model time 0.2002 (0.2016) loss 4.0126 (3.9149) grad_norm 1.0349 (1.2629) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][340/625] eta 0:00:58 lr 0.002000 wd 0.0500 time 0.2033 (0.2037) data time 0.0007 (0.0021) model time 0.2026 (0.2016) loss 2.5848 (3.9089) grad_norm 1.5057 (1.2656) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][350/625] eta 0:00:55 lr 0.002000 wd 0.0500 time 0.2008 (0.2036) data time 0.0009 (0.0021) model time 0.1999 (0.2016) loss 3.2530 (3.8996) grad_norm 0.8562 (1.2662) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][360/625] eta 0:00:53 lr 0.002000 wd 0.0500 time 0.2039 (0.2035) data time 0.0007 (0.0020) model time 0.2032 (0.2015) loss 2.5536 (3.8956) grad_norm 2.2591 (1.2645) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][370/625] eta 0:00:51 lr 0.002000 wd 0.0500 time 0.2010 (0.2035) data time 0.0007 (0.0020) model time 0.2003 (0.2015) loss 4.3948 (3.9006) grad_norm 1.7592 (1.2650) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][380/625] eta 0:00:49 lr 0.002000 wd 0.0500 time 0.1983 (0.2034) data time 0.0008 (0.0020) model time 0.1975 (0.2015) loss 3.9184 (3.8983) grad_norm 1.3248 (1.2669) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:29 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][390/625] eta 0:00:47 lr 0.002000 wd 0.0500 time 0.2008 (0.2034) data time 0.0007 (0.0019) model time 0.2002 (0.2014) loss 4.2205 (3.9054) grad_norm 1.3633 (1.2640) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][400/625] eta 0:00:45 lr 0.002000 wd 0.0500 time 0.2062 (0.2033) data time 0.0006 (0.0019) model time 0.2056 (0.2014) loss 3.4450 (3.8989) grad_norm 1.0307 (1.2650) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][410/625] eta 0:00:43 lr 0.002000 wd 0.0500 time 0.2014 (0.2033) data time 0.0009 (0.0019) model time 0.2005 (0.2014) loss 3.7655 (3.8986) grad_norm 1.2135 (1.2632) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][420/625] eta 0:00:41 lr 0.002000 wd 0.0500 time 0.2049 (0.2032) data time 0.0007 (0.0019) model time 0.2042 (0.2014) loss 4.2133 (3.8991) grad_norm 0.9656 (1.2601) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][430/625] eta 0:00:39 lr 0.002000 wd 0.0500 time 0.1993 (0.2032) data time 0.0007 (0.0018) model time 0.1986 (0.2014) loss 4.4989 (3.8960) grad_norm 1.2170 (1.2627) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][440/625] eta 0:00:37 lr 0.002000 wd 0.0500 time 0.1997 (0.2032) data time 0.0006 (0.0018) model time 0.1991 (0.2013) loss 4.2735 (3.9028) grad_norm 1.1545 (1.2617) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][450/625] eta 0:00:35 lr 0.002000 wd 0.0500 time 0.1994 (0.2031) data time 0.0008 
(0.0018) model time 0.1986 (0.2013) loss 3.0026 (3.9061) grad_norm 1.6275 (1.2590) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][460/625] eta 0:00:33 lr 0.002000 wd 0.0500 time 0.2004 (0.2031) data time 0.0006 (0.0018) model time 0.1998 (0.2013) loss 4.1661 (3.9019) grad_norm 1.2873 (1.2553) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][470/625] eta 0:00:31 lr 0.002000 wd 0.0500 time 0.2003 (0.2031) data time 0.0009 (0.0018) model time 0.1994 (0.2013) loss 4.1873 (3.8925) grad_norm 0.9374 (1.2557) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][480/625] eta 0:00:29 lr 0.002000 wd 0.0500 time 0.2026 (0.2030) data time 0.0008 (0.0017) model time 0.2018 (0.2013) loss 4.0564 (3.8933) grad_norm 1.8330 (1.2582) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][490/625] eta 0:00:27 lr 0.002000 wd 0.0500 time 0.1993 (0.2031) data time 0.0007 (0.0017) model time 0.1985 (0.2013) loss 4.9017 (3.8998) grad_norm 1.7711 (1.2591) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][500/625] eta 0:00:25 lr 0.002000 wd 0.0500 time 0.2019 (0.2031) data time 0.0008 (0.0017) model time 0.2011 (0.2013) loss 4.3125 (3.8978) grad_norm 1.5398 (1.2571) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][510/625] eta 0:00:23 lr 0.002000 wd 0.0500 time 0.2014 (0.2030) data time 0.0007 (0.0017) model time 0.2006 (0.2013) loss 4.3603 (3.8978) grad_norm 0.9580 (1.2527) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:55 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][520/625] eta 0:00:21 lr 0.002000 wd 0.0500 time 0.2011 (0.2030) data time 0.0006 (0.0017) model time 0.2005 (0.2013) loss 2.8284 (3.8937) grad_norm 1.7843 (1.2544) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:20:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][530/625] eta 0:00:19 lr 0.001999 wd 0.0500 time 0.2027 (0.2030) data time 0.0008 (0.0017) model time 0.2020 (0.2013) loss 4.6735 (3.9020) grad_norm 1.3943 (1.2548) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][540/625] eta 0:00:17 lr 0.001999 wd 0.0500 time 0.2034 (0.2034) data time 0.0006 (0.0017) model time 0.2028 (0.2017) loss 3.1161 (3.8963) grad_norm 0.9563 (1.2618) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][550/625] eta 0:00:15 lr 0.001999 wd 0.0500 time 0.2007 (0.2034) data time 0.0007 (0.0016) model time 0.1999 (0.2017) loss 4.5937 (3.8957) grad_norm 1.2065 (1.2701) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][560/625] eta 0:00:13 lr 0.001999 wd 0.0500 time 0.2064 (0.2034) data time 0.0008 (0.0016) model time 0.2056 (0.2017) loss 3.5960 (3.9013) grad_norm 0.8450 (1.2657) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][570/625] eta 0:00:11 lr 0.001999 wd 0.0500 time 0.2037 (0.2033) data time 0.0006 (0.0016) model time 0.2031 (0.2017) loss 4.5228 (3.9030) grad_norm 1.3281 (1.2656) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][580/625] eta 0:00:09 lr 0.001999 wd 0.0500 time 0.2032 (0.2033) data time 0.0006 
(0.0016) model time 0.2026 (0.2017) loss 4.4841 (3.9048) grad_norm 0.8877 (1.2699) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][590/625] eta 0:00:07 lr 0.001999 wd 0.0500 time 0.1994 (0.2033) data time 0.0008 (0.0016) model time 0.1986 (0.2017) loss 3.7153 (3.8995) grad_norm 0.9966 (1.2683) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][600/625] eta 0:00:05 lr 0.001999 wd 0.0500 time 0.2011 (0.2032) data time 0.0006 (0.0016) model time 0.2005 (0.2017) loss 2.6106 (3.8989) grad_norm 0.9297 (1.2674) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][610/625] eta 0:00:03 lr 0.001999 wd 0.0500 time 0.2035 (0.2032) data time 0.0004 (0.0016) model time 0.2031 (0.2016) loss 3.4818 (3.8943) grad_norm 1.3182 (1.2675) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][620/625] eta 0:00:01 lr 0.001999 wd 0.0500 time 0.1997 (0.2032) data time 0.0003 (0.0016) model time 0.1993 (0.2016) loss 3.5763 (3.8956) grad_norm 1.2201 (1.2674) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 22 training takes 0:02:06 [2024-07-29 13:21:17 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 13:21:17 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 13:21:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.503 (0.503) Loss 0.9150 (0.9150) Acc@1 79.346 (79.346) Acc@5 95.557 (95.557) Mem 8978MB [2024-07-29 13:21:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 1.6631 (1.1454) Acc@1 62.207 (74.512) Acc@5 86.377 (93.417) Mem 8978MB [2024-07-29 13:21:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.7900 (1.4075) Acc@1 60.449 (69.315) Acc@5 85.547 (89.844) Mem 8978MB [2024-07-29 13:21:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 69.158 Acc@5 89.789 [2024-07-29 13:21:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 69.2% [2024-07-29 13:21:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 69.16% [2024-07-29 13:21:19 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 13:21:19 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-29 13:21:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.575 (0.575) Loss 6.5508 (6.5508) Acc@1 1.855 (1.855) Acc@5 4.346 (4.346) Mem 8978MB [2024-07-29 13:21:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.103) Loss 6.7109 (6.9386) Acc@1 1.221 (0.488) Acc@5 3.320 (1.456) Mem 8978MB [2024-07-29 13:21:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.080) Loss 6.8008 (6.8508) Acc@1 0.879 (0.530) Acc@5 3.857 (1.948) Mem 8978MB [2024-07-29 13:21:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.624 Acc@5 2.223 [2024-07-29 13:21:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.6% [2024-07-29 13:21:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 0.62% [2024-07-29 13:21:21 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 13:21:22 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-29 13:21:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][0/625] eta 0:07:21 lr 0.001999 wd 0.0500 time 0.7070 (0.7070) data time 0.5111 (0.5111) model time 0.0000 (0.0000) loss 4.6728 (4.6728) grad_norm 0.9357 (0.9357) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][10/625] eta 0:02:33 lr 0.001999 wd 0.0500 time 0.2008 (0.2500) data time 0.0009 (0.0473) model time 0.0000 (0.0000) loss 3.6758 (4.2984) grad_norm 0.9781 (1.1723) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][20/625] eta 0:02:17 lr 0.001999 wd 0.0500 time 0.2004 (0.2267) data time 0.0008 (0.0253) model time 0.0000 (0.0000) loss 4.0281 (4.0626) grad_norm 0.7955 (1.2348) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][30/625] eta 0:02:09 lr 0.001999 wd 0.0500 time 0.2002 (0.2182) data time 0.0009 (0.0174) model time 0.0000 (0.0000) loss 4.3675 (3.9560) grad_norm 1.3111 (1.2901) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][40/625] eta 0:02:05 lr 0.001999 wd 0.0500 time 0.2087 (0.2142) data time 0.0009 (0.0134) model time 0.0000 (0.0000) loss 3.3410 (3.8947) grad_norm 0.8656 (1.2637) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][50/625] eta 0:02:01 lr 0.001999 wd 0.0500 time 0.2017 (0.2115) data time 0.0008 (0.0110) model time 0.0000 (0.0000) loss 2.5464 (3.8746) grad_norm 0.7050 (1.2360) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][60/625] eta 0:01:58 lr 0.001999 wd 0.0500 time 0.2117 (0.2099) data time 
0.0009 (0.0093) model time 0.2108 (0.2007) loss 4.4102 (3.8898) grad_norm 1.3935 (1.2435) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][70/625] eta 0:01:55 lr 0.001999 wd 0.0500 time 0.2148 (0.2089) data time 0.0009 (0.0081) model time 0.2139 (0.2015) loss 4.0478 (3.9396) grad_norm 0.9588 (1.2552) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][80/625] eta 0:01:53 lr 0.001999 wd 0.0500 time 0.2063 (0.2081) data time 0.0007 (0.0072) model time 0.2056 (0.2015) loss 4.3338 (3.9629) grad_norm 1.3562 (1.2866) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][90/625] eta 0:01:50 lr 0.001999 wd 0.0500 time 0.2000 (0.2074) data time 0.0007 (0.0065) model time 0.1993 (0.2014) loss 4.8163 (3.9649) grad_norm 1.9155 (1.3099) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][100/625] eta 0:01:48 lr 0.001999 wd 0.0500 time 0.2042 (0.2069) data time 0.0009 (0.0060) model time 0.2034 (0.2014) loss 3.7231 (3.9592) grad_norm 2.2147 (1.3243) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][110/625] eta 0:01:46 lr 0.001999 wd 0.0500 time 0.2051 (0.2066) data time 0.0008 (0.0055) model time 0.2043 (0.2016) loss 3.8407 (3.9439) grad_norm 1.0175 (1.3147) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][120/625] eta 0:01:44 lr 0.001999 wd 0.0500 time 0.2042 (0.2063) data time 0.0008 (0.0051) model time 0.2034 (0.2016) loss 4.2268 (3.9251) grad_norm 1.5053 (1.3034) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:49 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][130/625] eta 0:01:41 lr 0.001999 wd 0.0500 time 0.1993 (0.2059) data time 0.0006 (0.0048) model time 0.1987 (0.2014) loss 3.9623 (3.9255) grad_norm 0.8790 (1.2919) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][140/625] eta 0:01:39 lr 0.001999 wd 0.0500 time 0.1989 (0.2055) data time 0.0010 (0.0045) model time 0.1979 (0.2012) loss 3.0000 (3.9115) grad_norm 1.2052 (1.3025) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][150/625] eta 0:01:37 lr 0.001999 wd 0.0500 time 0.2096 (0.2052) data time 0.0008 (0.0043) model time 0.2088 (0.2011) loss 4.6593 (3.9234) grad_norm 1.4261 (1.2962) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][160/625] eta 0:01:35 lr 0.001999 wd 0.0500 time 0.1992 (0.2050) data time 0.0009 (0.0041) model time 0.1983 (0.2011) loss 3.7923 (3.9363) grad_norm 1.1789 (1.2864) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][170/625] eta 0:01:33 lr 0.001999 wd 0.0500 time 0.2003 (0.2047) data time 0.0007 (0.0039) model time 0.1997 (0.2010) loss 4.6245 (3.9491) grad_norm 0.9597 (1.2716) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:21:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][180/625] eta 0:01:31 lr 0.001999 wd 0.0500 time 0.1992 (0.2045) data time 0.0009 (0.0037) model time 0.1982 (0.2009) loss 3.9081 (3.9563) grad_norm 0.9038 (1.2616) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:22:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][190/625] eta 0:01:28 lr 0.001999 wd 0.0500 time 0.1981 (0.2043) data time 0.0009 
(0.0036) model time 0.1972 (0.2008) loss 4.0030 (3.9546) grad_norm 1.1920 (1.2580) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:22:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][200/625] eta 0:01:26 lr 0.001999 wd 0.0500 time 0.2025 (0.2042) data time 0.0007 (0.0035) model time 0.2019 (0.2008) loss 4.6387 (3.9493) grad_norm 1.3457 (1.2781) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:22:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][210/625] eta 0:01:24 lr 0.001999 wd 0.0500 time 0.2049 (0.2041) data time 0.0006 (0.0033) model time 0.2043 (0.2009) loss 4.3183 (3.9296) grad_norm 0.8499 (1.2761) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:22:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][220/625] eta 0:01:22 lr 0.001999 wd 0.0500 time 0.2009 (0.2040) data time 0.0009 (0.0032) model time 0.2000 (0.2009) loss 4.1020 (3.9239) grad_norm 0.8836 (1.2747) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:22:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][230/625] eta 0:01:20 lr 0.001999 wd 0.0500 time 0.2017 (0.2038) data time 0.0009 (0.0031) model time 0.2008 (0.2008) loss 4.1647 (3.9209) grad_norm 1.1530 (1.2777) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:22:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][240/625] eta 0:01:18 lr 0.001999 wd 0.0500 time 0.2005 (0.2037) data time 0.0006 (0.0030) model time 0.1999 (0.2007) loss 4.6867 (3.9222) grad_norm 1.0276 (1.2752) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:22:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][250/625] eta 0:01:16 lr 0.001999 wd 0.0500 time 0.2010 (0.2036) data time 0.0008 (0.0029) model time 0.2002 (0.2007) loss 4.4308 (3.9264) grad_norm 1.7006 (1.2783) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:22:15 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][260/625] eta 0:01:14 lr 0.001999 wd 0.0500 time 0.4097 (0.2043) data time 0.0009 (0.0029) model time 0.4087 (0.2017) loss 4.1802 (3.9300) grad_norm 1.4194 (1.2817) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 13:22:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 13:22:16 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 13:22:16 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 13:24:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 13:24:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 13:24:24 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 13:24:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 13:24:34 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... 
[2024-07-29 13:24:34 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 13:24:34 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 13:24:34 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 23)
[2024-07-29 13:24:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 13:24:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][270/625] eta 0:07:29 lr 0.001999 wd 0.0500 time 0.2111 (1.2659) data time 0.0009 (0.1101) model time 0.2102 (1.1558) loss 4.3062 (4.3285) grad_norm 1.4487 (1.2398) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 13:24:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][280/625] eta 0:03:54 lr 0.001999 wd 0.0500 time 0.2094 (0.6805) data time 0.0007 (0.0496) model time 0.2087 (0.6309) loss 4.5355 (4.1469) grad_norm 1.2780 (1.3042) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 13:24:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][290/625] eta 0:02:52 lr 0.001999 wd 0.0500 time 0.2181 (0.5136) data time 0.0012 (0.0323) model time 0.2170 (0.4814) loss 4.4945 (4.1746) grad_norm 1.9195 (1.4068) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 13:24:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][300/625] eta 0:02:21 lr 0.001999 wd 0.0500 time 0.2084 (0.4345) data time 0.0011 (0.0241) model time 0.2073 (0.4104) loss 3.6472 (4.0754) grad_norm 0.8782 (1.3223) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 13:24:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][310/625] eta 0:02:02 lr 0.001999 wd 0.0500 time 0.2063 (0.3880) data time 0.0007 (0.0193) model time 0.2055 (0.3687) loss 4.2209 (4.0618) grad_norm 1.1772 (1.2592) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 13:24:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][320/625] eta 0:01:49 lr 0.001999 wd 0.0500 time 0.2254 (0.3583) data time 0.0008 (0.0162) model time 0.2246 (0.3422) loss 3.0000 (4.0322) grad_norm 1.2359 (1.2194) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][330/625] eta 0:01:39 lr 0.001999 wd 0.0500 time 0.2091 (0.3367) data time 0.0007 (0.0140) model time 0.2084 (0.3228) loss 2.8365 (3.9905) grad_norm 1.1640 (1.2135) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][340/625] eta 0:01:31 lr 0.001999 wd 0.0500 time 0.2157 (0.3207) data time 0.0008 (0.0123) model time 0.2149 (0.3084) loss 3.2928 (3.9554) grad_norm 0.9207 (1.1919) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][350/625] eta 0:01:24 lr 0.001999 wd 0.0500 time 0.2121 (0.3085) data time 0.0010 (0.0111) model time 0.2112 (0.2974) loss 4.3770 (3.9301) grad_norm 1.1839 (1.1799) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][360/625] eta 0:01:19 lr 0.001999 wd 0.0500 time 0.2100 (0.2987) data time 0.0009 (0.0100) model time 0.2091 (0.2887) loss 4.5729 (3.9401) grad_norm 1.6031 (1.1955) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][370/625] eta 0:01:14 lr 0.001999 wd 0.0500 time 0.2084 (0.2907) data time 0.0009 (0.0092) model time 0.2075 (0.2814) loss 2.9629 (3.9526) grad_norm 0.9912 (1.1727) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][380/625] eta 0:01:09 lr 0.001999 wd 0.0500 time 0.2096 (0.2841) 
data time 0.0012 (0.0085) model time 0.2084 (0.2755) loss 3.9244 (3.9512) grad_norm 0.8703 (1.1720) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][390/625] eta 0:01:05 lr 0.001999 wd 0.0500 time 0.2045 (0.2785) data time 0.0010 (0.0080) model time 0.2035 (0.2705) loss 4.3650 (3.9386) grad_norm 1.5469 (1.1924) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][400/625] eta 0:01:01 lr 0.001999 wd 0.0500 time 0.2307 (0.2737) data time 0.0009 (0.0075) model time 0.2297 (0.2662) loss 3.7902 (3.9369) grad_norm 1.8823 (1.2221) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][410/625] eta 0:00:58 lr 0.001999 wd 0.0500 time 0.2112 (0.2698) data time 0.0009 (0.0071) model time 0.2103 (0.2627) loss 4.1046 (3.9259) grad_norm 1.1017 (1.2349) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][420/625] eta 0:00:54 lr 0.001999 wd 0.0500 time 0.2090 (0.2660) data time 0.0009 (0.0068) model time 0.2081 (0.2592) loss 3.3782 (3.9202) grad_norm 1.0337 (1.2330) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][430/625] eta 0:00:51 lr 0.001999 wd 0.0500 time 0.2022 (0.2627) data time 0.0012 (0.0064) model time 0.2009 (0.2563) loss 4.4578 (3.9295) grad_norm 0.9775 (1.2270) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][440/625] eta 0:00:48 lr 0.001999 wd 0.0500 time 0.2094 (0.2599) data time 0.0008 (0.0061) model time 0.2086 (0.2538) loss 2.9179 (3.9098) grad_norm 1.3536 (1.2333) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 
13:25:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][450/625] eta 0:00:45 lr 0.001999 wd 0.0500 time 0.2236 (0.2573) data time 0.0007 (0.0059) model time 0.2229 (0.2514) loss 4.1616 (3.9114) grad_norm 0.7538 (1.2384) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][460/625] eta 0:00:42 lr 0.001999 wd 0.0500 time 0.2170 (0.2549) data time 0.0011 (0.0056) model time 0.2159 (0.2493) loss 3.2684 (3.8987) grad_norm 1.1838 (1.2425) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][470/625] eta 0:00:39 lr 0.001999 wd 0.0500 time 0.2510 (0.2531) data time 0.0010 (0.0054) model time 0.2499 (0.2477) loss 4.0299 (3.8837) grad_norm 1.1974 (1.2406) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][480/625] eta 0:00:36 lr 0.001999 wd 0.0500 time 0.2086 (0.2511) data time 0.0010 (0.0052) model time 0.2076 (0.2459) loss 3.7294 (3.8751) grad_norm 0.9995 (1.2403) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][490/625] eta 0:00:33 lr 0.001999 wd 0.0500 time 0.2070 (0.2494) data time 0.0010 (0.0050) model time 0.2060 (0.2443) loss 4.0684 (3.8837) grad_norm 1.1443 (1.2453) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][500/625] eta 0:00:30 lr 0.001999 wd 0.0500 time 0.2091 (0.2478) data time 0.0009 (0.0049) model time 0.2082 (0.2429) loss 4.3716 (3.8779) grad_norm 1.1536 (1.2410) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][510/625] eta 0:00:28 lr 0.001999 wd 0.0500 time 0.2124 (0.2462) data time 
0.0011 (0.0047) model time 0.2114 (0.2415) loss 2.9395 (3.8678) grad_norm 0.7726 (1.2456) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][520/625] eta 0:00:25 lr 0.001999 wd 0.0500 time 0.2102 (0.2448) data time 0.0011 (0.0046) model time 0.2091 (0.2403) loss 4.6049 (3.8547) grad_norm 1.3267 (1.2437) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][530/625] eta 0:00:23 lr 0.001999 wd 0.0500 time 0.2122 (0.2436) data time 0.0007 (0.0044) model time 0.2115 (0.2391) loss 4.5890 (3.8474) grad_norm 1.4369 (1.2493) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][540/625] eta 0:00:20 lr 0.001999 wd 0.0500 time 0.2014 (0.2424) data time 0.0013 (0.0043) model time 0.2001 (0.2381) loss 3.0622 (3.8482) grad_norm 1.0133 (1.2472) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][550/625] eta 0:00:18 lr 0.001999 wd 0.0500 time 0.2136 (0.2413) data time 0.0009 (0.0042) model time 0.2127 (0.2371) loss 4.6698 (3.8513) grad_norm 0.7816 (1.2453) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][560/625] eta 0:00:15 lr 0.001999 wd 0.0500 time 0.2197 (0.2406) data time 0.0010 (0.0041) model time 0.2187 (0.2365) loss 3.5425 (3.8360) grad_norm 0.9966 (1.2422) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][570/625] eta 0:00:13 lr 0.001999 wd 0.0500 time 0.2171 (0.2398) data time 0.0008 (0.0040) model time 0.2163 (0.2358) loss 3.3189 (3.8321) grad_norm 1.2234 (1.2418) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:54 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][580/625] eta 0:00:10 lr 0.001999 wd 0.0500 time 0.2053 (0.2390) data time 0.0011 (0.0039) model time 0.2042 (0.2351) loss 3.2047 (3.8405) grad_norm 0.9838 (1.2375) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][590/625] eta 0:00:08 lr 0.001999 wd 0.0500 time 0.2176 (0.2382) data time 0.0008 (0.0038) model time 0.2168 (0.2344) loss 4.0395 (3.8481) grad_norm 0.9661 (1.2409) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:25:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][600/625] eta 0:00:05 lr 0.001999 wd 0.0500 time 0.2175 (0.2375) data time 0.0010 (0.0038) model time 0.2166 (0.2338) loss 3.9661 (3.8438) grad_norm 0.9205 (1.2338) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:26:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][610/625] eta 0:00:03 lr 0.001999 wd 0.0500 time 0.2050 (0.2367) data time 0.0005 (0.0037) model time 0.2046 (0.2330) loss 3.9592 (3.8448) grad_norm 1.4558 (1.2419) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:26:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][620/625] eta 0:00:01 lr 0.001999 wd 0.0500 time 0.2084 (0.2359) data time 0.0007 (0.0036) model time 0.2077 (0.2323) loss 3.8033 (3.8478) grad_norm 1.6666 (1.2536) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 13:26:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 23 training takes 0:01:25 [2024-07-29 13:26:04 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 13:26:05 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 13:26:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.445 (0.445) Loss 1.0410 (1.0410) Acc@1 79.834 (79.834) Acc@5 94.727 (94.727) Mem 8977MB
[2024-07-29 13:26:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 1.6348 (1.2096) Acc@1 65.137 (74.654) Acc@5 87.256 (93.208) Mem 8977MB
[2024-07-29 13:26:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.8262 (1.4325) Acc@1 59.717 (69.766) Acc@5 84.277 (89.886) Mem 8977MB
[2024-07-29 13:26:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 69.420 Acc@5 89.817
[2024-07-29 13:26:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 69.4%
[2024-07-29 13:26:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 69.42%
[2024-07-29 13:26:09 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 13:26:10 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 13:26:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.459 (0.459) Loss 6.4844 (6.4844) Acc@1 1.611 (1.611) Acc@5 4.932 (4.932) Mem 8977MB
[2024-07-29 13:26:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.094) Loss 6.5664 (6.8420) Acc@1 1.904 (0.732) Acc@5 5.176 (2.375) Mem 8977MB
[2024-07-29 13:26:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 6.6875 (6.7351) Acc@1 1.074 (0.958) Acc@5 4.492 (3.220) Mem 8977MB
[2024-07-29 13:26:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 1.102 Acc@5 3.755
[2024-07-29 13:26:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 1.1%
[2024-07-29 13:26:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 1.10%
[2024-07-29 13:26:12 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 13:26:14 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 13:26:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][0/625] eta 0:07:08 lr 0.001999 wd 0.0500 time 0.6862 (0.6862) data time 0.3971 (0.3971) model time 0.0000 (0.0000) loss 4.3797 (4.3797) grad_norm 1.1843 (1.1843) loss_scale 8192.0000 (8192.0000) mem 8971MB [2024-07-29 13:26:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][10/625] eta 0:02:37 lr 0.001999 wd 0.0500 time 0.2094 (0.2563) data time 0.0007 (0.0371) model time 0.0000 (0.0000) loss 4.4567 (3.7203) grad_norm 1.0989 (1.2466) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:26:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][20/625] eta 0:02:22 lr 0.001999 wd 0.0500 time 0.2073 (0.2353) data time 0.0010 (0.0199) model time 0.0000 (0.0000) loss 2.9885 (3.6960) grad_norm 1.2533 (1.1620) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:26:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][30/625] eta 0:02:15 lr 0.001999 wd 0.0500 time 0.2072 (0.2272) data time 0.0009 (0.0138) model time 0.0000 (0.0000) loss 4.0545 (3.6377) grad_norm 1.4827 (1.1730) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:26:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][40/625] eta 0:02:10 lr 0.001999 wd 0.0500 time 0.2122 (0.2231) data time 0.0010 (0.0107) model time 0.0000 (0.0000) loss 3.3312 (3.7643) grad_norm 0.9371 (1.1803) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:26:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][50/625] eta 0:02:09 lr 0.001999 wd 0.0500 time 0.2111 (0.2252) data time 0.0009 (0.0088) model time 0.0000 (0.0000) loss 3.8304 (3.8141) grad_norm 0.9608 (1.1626) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:26:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][60/625] eta 0:02:08 lr 0.001999 wd 0.0500 time 0.2126 (0.2266) data time 
0.0009 (0.0076) model time 0.2117 (0.2332) loss 3.1878 (3.7828) grad_norm 0.8191 (1.2031) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:26:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][70/625] eta 0:02:04 lr 0.001999 wd 0.0500 time 0.2222 (0.2242) data time 0.0008 (0.0066) model time 0.2214 (0.2207) loss 4.6095 (3.8702) grad_norm 1.5400 (1.2292) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:26:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][80/625] eta 0:02:01 lr 0.001999 wd 0.0500 time 0.2083 (0.2225) data time 0.0008 (0.0059) model time 0.2076 (0.2168) loss 3.9957 (3.8773) grad_norm 1.4180 (1.2226) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:26:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][90/625] eta 0:01:58 lr 0.001999 wd 0.0500 time 0.2099 (0.2213) data time 0.0009 (0.0054) model time 0.2089 (0.2153) loss 3.0006 (3.8527) grad_norm 1.5065 (1.2396) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:26:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][100/625] eta 0:01:55 lr 0.001999 wd 0.0500 time 0.2117 (0.2205) data time 0.0009 (0.0050) model time 0.2108 (0.2147) loss 3.0862 (3.8245) grad_norm 2.1046 (1.2534) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:26:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][110/625] eta 0:01:53 lr 0.001999 wd 0.0500 time 0.2149 (0.2197) data time 0.0008 (0.0046) model time 0.2141 (0.2139) loss 3.9188 (3.8015) grad_norm 1.0456 (1.2866) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:26:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][120/625] eta 0:01:50 lr 0.001999 wd 0.0500 time 0.2070 (0.2190) data time 0.0009 (0.0043) model time 0.2062 (0.2135) loss 4.6597 (3.8173) grad_norm 1.5316 (1.2763) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:26:42 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][130/625] eta 0:01:48 lr 0.001999 wd 0.0500 time 0.2145 (0.2186) data time 0.0007 (0.0041) model time 0.2138 (0.2134) loss 3.2904 (3.8279) grad_norm 0.9190 (1.2702) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:26:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][140/625] eta 0:01:45 lr 0.001999 wd 0.0500 time 0.2148 (0.2185) data time 0.0007 (0.0039) model time 0.2141 (0.2137) loss 3.0857 (3.8209) grad_norm 0.8719 (1.2550) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:26:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][150/625] eta 0:01:43 lr 0.001999 wd 0.0500 time 0.2299 (0.2183) data time 0.0007 (0.0037) model time 0.2292 (0.2138) loss 4.7961 (3.8581) grad_norm 1.0980 (1.2511) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:26:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][160/625] eta 0:01:41 lr 0.001999 wd 0.0500 time 0.2116 (0.2180) data time 0.0011 (0.0035) model time 0.2105 (0.2136) loss 4.0252 (3.8451) grad_norm 1.7970 (1.2471) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:26:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][170/625] eta 0:01:39 lr 0.001999 wd 0.0500 time 0.2138 (0.2178) data time 0.0008 (0.0034) model time 0.2130 (0.2135) loss 4.3034 (3.8454) grad_norm 1.0812 (1.2401) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:26:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][180/625] eta 0:01:36 lr 0.001999 wd 0.0500 time 0.2037 (0.2173) data time 0.0012 (0.0033) model time 0.2025 (0.2131) loss 3.8905 (3.8432) grad_norm 1.5958 (1.2358) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:26:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][190/625] eta 0:01:34 lr 0.001999 wd 0.0500 time 0.2114 (0.2170) data time 0.0009 
(0.0032) model time 0.2105 (0.2129) loss 4.7582 (3.8476) grad_norm 0.8767 (1.2325) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:26:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][200/625] eta 0:01:32 lr 0.001999 wd 0.0500 time 0.2125 (0.2180) data time 0.0010 (0.0031) model time 0.2115 (0.2145) loss 3.8474 (3.8560) grad_norm 1.1984 (1.2319) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][210/625] eta 0:01:30 lr 0.001999 wd 0.0500 time 0.2107 (0.2176) data time 0.0009 (0.0030) model time 0.2098 (0.2141) loss 3.4854 (3.8560) grad_norm 0.8525 (1.2350) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][220/625] eta 0:01:28 lr 0.001999 wd 0.0500 time 0.2118 (0.2175) data time 0.0011 (0.0029) model time 0.2107 (0.2141) loss 4.0077 (3.8636) grad_norm 1.2420 (1.2366) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][230/625] eta 0:01:25 lr 0.001999 wd 0.0500 time 0.2062 (0.2173) data time 0.0011 (0.0028) model time 0.2051 (0.2140) loss 3.0517 (3.8665) grad_norm 1.8752 (1.2491) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][240/625] eta 0:01:23 lr 0.001999 wd 0.0500 time 0.2041 (0.2174) data time 0.0011 (0.0028) model time 0.2030 (0.2142) loss 2.8085 (3.8617) grad_norm 1.4784 (1.2589) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][250/625] eta 0:01:21 lr 0.001999 wd 0.0500 time 0.2060 (0.2172) data time 0.0012 (0.0027) model time 0.2048 (0.2141) loss 3.7368 (3.8626) grad_norm 1.0711 (1.2542) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:10 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][260/625] eta 0:01:19 lr 0.001999 wd 0.0500 time 0.2131 (0.2170) data time 0.0007 (0.0026) model time 0.2124 (0.2140) loss 4.7074 (3.8708) grad_norm 0.8410 (1.2613) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][270/625] eta 0:01:16 lr 0.001999 wd 0.0500 time 0.2119 (0.2168) data time 0.0009 (0.0026) model time 0.2110 (0.2138) loss 4.4878 (3.8731) grad_norm 0.8782 (1.2570) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][280/625] eta 0:01:14 lr 0.001999 wd 0.0500 time 0.2011 (0.2166) data time 0.0010 (0.0025) model time 0.2002 (0.2136) loss 4.5125 (3.8629) grad_norm 1.4724 (1.2574) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][290/625] eta 0:01:12 lr 0.001999 wd 0.0500 time 0.2144 (0.2166) data time 0.0010 (0.0025) model time 0.2134 (0.2136) loss 4.6970 (3.8569) grad_norm 1.9248 (1.2579) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][300/625] eta 0:01:10 lr 0.001999 wd 0.0500 time 0.2099 (0.2165) data time 0.0010 (0.0024) model time 0.2089 (0.2136) loss 3.5377 (3.8465) grad_norm 0.9996 (1.2597) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][310/625] eta 0:01:08 lr 0.001999 wd 0.0500 time 0.2103 (0.2163) data time 0.0008 (0.0024) model time 0.2095 (0.2135) loss 4.3514 (3.8543) grad_norm 1.3387 (1.2551) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][320/625] eta 0:01:05 lr 0.001999 wd 0.0500 time 0.2103 (0.2162) data time 0.0007 
(0.0023) model time 0.2095 (0.2134) loss 3.6714 (3.8548) grad_norm 1.4091 (1.2554) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][330/625] eta 0:01:03 lr 0.001999 wd 0.0500 time 0.2129 (0.2160) data time 0.0011 (0.0023) model time 0.2118 (0.2132) loss 4.0270 (3.8485) grad_norm 0.8083 (1.2521) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][340/625] eta 0:01:01 lr 0.001999 wd 0.0500 time 0.2053 (0.2159) data time 0.0009 (0.0023) model time 0.2044 (0.2132) loss 4.2094 (3.8444) grad_norm 0.8031 (1.2489) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][350/625] eta 0:00:59 lr 0.001999 wd 0.0500 time 0.2127 (0.2159) data time 0.0007 (0.0022) model time 0.2119 (0.2132) loss 3.9430 (3.8396) grad_norm 1.0235 (1.2528) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][360/625] eta 0:00:57 lr 0.001999 wd 0.0500 time 0.2059 (0.2158) data time 0.0009 (0.0022) model time 0.2050 (0.2131) loss 3.6951 (3.8355) grad_norm 1.1025 (1.2616) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][370/625] eta 0:00:54 lr 0.001999 wd 0.0500 time 0.2057 (0.2156) data time 0.0011 (0.0022) model time 0.2046 (0.2130) loss 3.5271 (3.8342) grad_norm 2.6904 (1.2644) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][380/625] eta 0:00:52 lr 0.001999 wd 0.0500 time 0.2097 (0.2155) data time 0.0011 (0.0021) model time 0.2086 (0.2129) loss 4.1353 (3.8415) grad_norm 0.7921 (1.2618) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:38 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][390/625] eta 0:00:50 lr 0.001999 wd 0.0500 time 0.2120 (0.2154) data time 0.0009 (0.0021) model time 0.2110 (0.2129) loss 3.0296 (3.8413) grad_norm 1.3773 (1.2629) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][400/625] eta 0:00:48 lr 0.001999 wd 0.0500 time 0.2101 (0.2154) data time 0.0007 (0.0021) model time 0.2093 (0.2128) loss 3.5292 (3.8444) grad_norm 0.9697 (1.2654) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][410/625] eta 0:00:46 lr 0.001999 wd 0.0500 time 0.2140 (0.2153) data time 0.0008 (0.0021) model time 0.2132 (0.2128) loss 4.9174 (3.8509) grad_norm 0.9929 (1.2612) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][420/625] eta 0:00:44 lr 0.001999 wd 0.0500 time 0.2153 (0.2152) data time 0.0010 (0.0020) model time 0.2144 (0.2128) loss 3.9576 (3.8473) grad_norm 1.3376 (1.2635) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][430/625] eta 0:00:41 lr 0.001999 wd 0.0500 time 0.2119 (0.2152) data time 0.0009 (0.0020) model time 0.2110 (0.2128) loss 3.8982 (3.8462) grad_norm 1.5049 (1.2622) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][440/625] eta 0:00:39 lr 0.001999 wd 0.0500 time 0.2157 (0.2152) data time 0.0011 (0.0020) model time 0.2147 (0.2127) loss 3.5080 (3.8460) grad_norm 0.9588 (1.2590) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][450/625] eta 0:00:37 lr 0.001999 wd 0.0500 time 0.2089 (0.2151) data time 0.0009 
(0.0020) model time 0.2081 (0.2127) loss 3.0000 (3.8332) grad_norm 1.3195 (1.2670) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][460/625] eta 0:00:35 lr 0.001999 wd 0.0500 time 0.2053 (0.2149) data time 0.0012 (0.0020) model time 0.2042 (0.2125) loss 3.2099 (3.8355) grad_norm 1.1476 (1.2783) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][470/625] eta 0:00:33 lr 0.001999 wd 0.0500 time 0.2102 (0.2149) data time 0.0009 (0.0019) model time 0.2093 (0.2126) loss 4.1455 (3.8308) grad_norm 0.8132 (1.2760) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][480/625] eta 0:00:31 lr 0.001999 wd 0.0500 time 0.2082 (0.2149) data time 0.0010 (0.0019) model time 0.2073 (0.2125) loss 4.4369 (3.8277) grad_norm 0.9072 (1.2708) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:27:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][490/625] eta 0:00:29 lr 0.001999 wd 0.0500 time 0.2167 (0.2149) data time 0.0007 (0.0019) model time 0.2159 (0.2126) loss 3.4205 (3.8252) grad_norm 1.6541 (1.2665) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:28:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][500/625] eta 0:00:26 lr 0.001999 wd 0.0500 time 0.2065 (0.2148) data time 0.0010 (0.0019) model time 0.2054 (0.2125) loss 2.7157 (3.8246) grad_norm 0.9483 (1.2628) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:28:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][510/625] eta 0:00:24 lr 0.001999 wd 0.0500 time 0.2058 (0.2148) data time 0.0009 (0.0019) model time 0.2049 (0.2125) loss 4.0708 (3.8296) grad_norm 0.7949 (1.2588) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:28:06 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][520/625] eta 0:00:22 lr 0.001999 wd 0.0500 time 0.2121 (0.2147) data time 0.0009 (0.0019) model time 0.2112 (0.2125) loss 4.0764 (3.8245) grad_norm 0.9195 (1.2556) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:28:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][530/625] eta 0:00:20 lr 0.001999 wd 0.0500 time 0.2181 (0.2147) data time 0.0009 (0.0018) model time 0.2172 (0.2125) loss 3.6995 (3.8252) grad_norm 0.8933 (1.2532) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:28:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][540/625] eta 0:00:18 lr 0.001999 wd 0.0500 time 0.2013 (0.2146) data time 0.0010 (0.0018) model time 0.2003 (0.2124) loss 4.3106 (3.8277) grad_norm 2.4268 (1.2599) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:28:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][550/625] eta 0:00:16 lr 0.001999 wd 0.0500 time 0.2078 (0.2146) data time 0.0008 (0.0018) model time 0.2071 (0.2124) loss 4.5960 (3.8261) grad_norm 1.3460 (1.2634) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:28:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][560/625] eta 0:00:13 lr 0.001999 wd 0.0500 time 0.2081 (0.2145) data time 0.0010 (0.0018) model time 0.2071 (0.2123) loss 3.7699 (3.8320) grad_norm 0.9834 (1.2589) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:28:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][570/625] eta 0:00:11 lr 0.001998 wd 0.0500 time 0.2117 (0.2144) data time 0.0008 (0.0018) model time 0.2108 (0.2123) loss 3.6275 (3.8364) grad_norm 1.2434 (1.2572) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:28:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][580/625] eta 0:00:09 lr 0.001998 wd 0.0500 time 0.2208 (0.2144) data time 0.0010 
(0.0018) model time 0.2198 (0.2122) loss 3.3795 (3.8297) grad_norm 1.2118 (1.2549) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:28:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][590/625] eta 0:00:07 lr 0.001998 wd 0.0500 time 0.2125 (0.2143) data time 0.0008 (0.0018) model time 0.2117 (0.2122) loss 3.9291 (3.8262) grad_norm 0.9247 (1.2552) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:28:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][600/625] eta 0:00:05 lr 0.001998 wd 0.0500 time 0.2173 (0.2147) data time 0.0011 (0.0018) model time 0.2163 (0.2126) loss 4.1224 (3.8270) grad_norm 1.0054 (1.2535) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:28:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][610/625] eta 0:00:03 lr 0.001998 wd 0.0500 time 0.2069 (0.2147) data time 0.0005 (0.0018) model time 0.2064 (0.2126) loss 3.9526 (3.8301) grad_norm 1.3497 (1.2498) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:28:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][620/625] eta 0:00:01 lr 0.001998 wd 0.0500 time 0.2080 (0.2145) data time 0.0005 (0.0017) model time 0.2075 (0.2125) loss 3.2189 (3.8293) grad_norm 1.1427 (1.2470) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:28:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 24 training takes 0:02:14 [2024-07-29 13:28:28 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 13:28:28 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 13:28:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.510 (0.510) Loss 0.9253 (0.9253) Acc@1 80.957 (80.957) Acc@5 96.289 (96.289) Mem 8975MB [2024-07-29 13:28:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.103) Loss 1.6113 (1.1798) Acc@1 64.111 (75.062) Acc@5 87.598 (93.568) Mem 8975MB [2024-07-29 13:28:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.081) Loss 1.8408 (1.4338) Acc@1 60.791 (69.885) Acc@5 84.814 (90.127) Mem 8975MB [2024-07-29 13:28:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 69.726 Acc@5 90.005 [2024-07-29 13:28:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 69.7% [2024-07-29 13:28:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 69.73% [2024-07-29 13:28:30 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 13:28:31 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-29 13:28:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.587 (0.587) Loss 6.3320 (6.3320) Acc@1 2.637 (2.637) Acc@5 7.324 (7.324) Mem 8975MB [2024-07-29 13:28:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.108) Loss 6.3672 (6.6602) Acc@1 2.783 (1.367) Acc@5 8.838 (4.554) Mem 8975MB [2024-07-29 13:28:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.083) Loss 6.5117 (6.5365) Acc@1 2.344 (1.797) Acc@5 7.031 (5.759) Mem 8975MB [2024-07-29 13:28:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 2.107 Acc@5 6.614 [2024-07-29 13:28:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 2.1% [2024-07-29 13:28:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 2.11% [2024-07-29 13:28:33 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 13:28:33 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-29 13:28:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][0/625] eta 0:07:35 lr 0.001998 wd 0.0500 time 0.7285 (0.7285) data time 0.5289 (0.5289) model time 0.0000 (0.0000) loss 2.8747 (2.8747) grad_norm 0.9494 (0.9494) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:28:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][10/625] eta 0:02:38 lr 0.001998 wd 0.0500 time 0.2048 (0.2577) data time 0.0010 (0.0491) model time 0.0000 (0.0000) loss 3.7196 (3.7951) grad_norm 1.0287 (1.4133) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:28:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][20/625] eta 0:02:23 lr 0.001998 wd 0.0500 time 0.2099 (0.2368) data time 0.0007 (0.0262) model time 0.0000 (0.0000) loss 4.8625 (3.7954) grad_norm 0.9832 (1.2715) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:28:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][30/625] eta 0:02:16 lr 0.001998 wd 0.0500 time 0.2072 (0.2288) data time 0.0009 (0.0181) model time 0.0000 (0.0000) loss 3.2626 (3.8094) grad_norm 1.7326 (1.3362) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:28:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][40/625] eta 0:02:11 lr 0.001998 wd 0.0500 time 0.2101 (0.2245) data time 0.0010 (0.0140) model time 0.0000 (0.0000) loss 3.1020 (3.7431) grad_norm 0.8845 (1.3389) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:28:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][50/625] eta 0:02:07 lr 0.001998 wd 0.0500 time 0.2083 (0.2217) data time 0.0008 (0.0114) model time 0.0000 (0.0000) loss 4.1510 (3.7850) grad_norm 0.9962 (1.2808) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:28:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 13:28:46 
vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 13:28:47 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 13:33:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 13:33:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 13:33:51 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 13:34:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 13:34:06 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... 
[2024-07-29 13:34:07 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 13:34:07 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 13:34:07 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 25) [2024-07-29 13:34:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 13:34:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][60/625] eta 0:49:41 lr 0.001998 wd 0.0500 time 0.9256 (5.2768) data time 0.0008 (0.4844) model time 0.9248 (4.7924) loss 4.7463 (4.7820) grad_norm 0.9972 (1.2825) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 13:34:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][70/625] eta 0:09:42 lr 0.001998 wd 0.0500 time 0.2083 (1.0504) data time 0.0006 (0.0815) model time 0.2077 (0.9689) loss 3.0712 (4.1875) grad_norm 0.8627 (1.1275) loss_scale 8192.0000 (8192.0000) mem 8974MB [2024-07-29 13:34:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][80/625] eta 0:06:02 lr 0.001998 wd 0.0500 time 0.2097 (0.6654) data time 0.0011 (0.0450) model time 0.2086 (0.6204) loss 4.2008 (4.1435) grad_norm 1.5478 (1.1245) loss_scale 8192.0000 (8192.0000) mem 8974MB [2024-07-29 13:34:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][90/625] eta 0:04:39 lr 0.001998 wd 0.0500 time 0.2033 (0.5215) data time 0.0007 (0.0313) model time 0.2026 (0.4902) loss 4.0336 (4.1504) grad_norm 1.2241 (1.1872) loss_scale 8192.0000 (8192.0000) mem 8974MB [2024-07-29 13:34:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][100/625] eta 0:03:53 lr 0.001998 wd 0.0500 time 0.1987 (0.4454) data time 0.0008 (0.0241) model time 0.1979 (0.4213) loss 4.1968 (4.0911) grad_norm 1.5016 (1.1753) loss_scale 8192.0000 (8192.0000) mem 8974MB [2024-07-29 
13:34:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][110/625] eta 0:03:25 lr 0.001998 wd 0.0500 time 0.2022 (0.3986) data time 0.0009 (0.0196) model time 0.2012 (0.3789) loss 4.1142 (4.0755) grad_norm 2.1693 (1.2204) loss_scale 16384.0000 (8979.6923) mem 8974MB [2024-07-29 13:34:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][120/625] eta 0:03:05 lr 0.001998 wd 0.0500 time 0.2049 (0.3673) data time 0.0007 (0.0166) model time 0.2041 (0.3507) loss 4.5588 (4.0557) grad_norm 0.9808 (1.2645) loss_scale 16384.0000 (10173.9355) mem 8974MB [2024-07-29 13:34:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][130/625] eta 0:02:50 lr 0.001998 wd 0.0500 time 0.2076 (0.3446) data time 0.0010 (0.0144) model time 0.2066 (0.3301) loss 3.5359 (3.9899) grad_norm 0.7509 (1.2260) loss_scale 16384.0000 (11036.4444) mem 8974MB [2024-07-29 13:34:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][140/625] eta 0:02:38 lr 0.001998 wd 0.0500 time 0.1985 (0.3273) data time 0.0010 (0.0128) model time 0.1975 (0.3145) loss 4.3143 (3.9680) grad_norm 0.8325 (1.1932) loss_scale 16384.0000 (11688.5854) mem 8974MB [2024-07-29 13:34:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][150/625] eta 0:02:29 lr 0.001998 wd 0.0500 time 0.1967 (0.3139) data time 0.0009 (0.0115) model time 0.1958 (0.3024) loss 2.3736 (3.9448) grad_norm 1.1632 (1.1934) loss_scale 16384.0000 (12198.9565) mem 8974MB [2024-07-29 13:34:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][160/625] eta 0:02:20 lr 0.001998 wd 0.0500 time 0.2027 (0.3031) data time 0.0007 (0.0105) model time 0.2020 (0.2926) loss 4.4897 (3.9669) grad_norm 1.0251 (1.1951) loss_scale 16384.0000 (12609.2549) mem 8974MB [2024-07-29 13:34:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][170/625] eta 0:02:13 lr 0.001998 wd 0.0500 time 0.2040 (0.2942) 
data time 0.0008 (0.0096) model time 0.2032 (0.2846) loss 3.9352 (3.9634) grad_norm 1.2141 (1.1943) loss_scale 16384.0000 (12946.2857) mem 8974MB [2024-07-29 13:34:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][180/625] eta 0:02:07 lr 0.001998 wd 0.0500 time 0.2008 (0.2869) data time 0.0006 (0.0089) model time 0.2002 (0.2780) loss 4.2832 (3.9550) grad_norm 1.6664 (1.2044) loss_scale 16384.0000 (13228.0656) mem 8974MB [2024-07-29 13:34:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][190/625] eta 0:02:02 lr 0.001998 wd 0.0500 time 0.2092 (0.2808) data time 0.0008 (0.0083) model time 0.2084 (0.2724) loss 4.0112 (3.9379) grad_norm 0.9810 (1.2031) loss_scale 16384.0000 (13467.1515) mem 8974MB [2024-07-29 13:34:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][200/625] eta 0:01:57 lr 0.001998 wd 0.0500 time 0.2036 (0.2758) data time 0.0009 (0.0079) model time 0.2027 (0.2679) loss 3.5641 (3.9219) grad_norm 1.0983 (1.1963) loss_scale 16384.0000 (13672.5634) mem 8974MB [2024-07-29 13:34:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][210/625] eta 0:01:52 lr 0.001998 wd 0.0500 time 0.2053 (0.2711) data time 0.0009 (0.0075) model time 0.2045 (0.2636) loss 3.8879 (3.9066) grad_norm 1.2894 (1.1952) loss_scale 16384.0000 (13850.9474) mem 8974MB [2024-07-29 13:34:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][220/625] eta 0:01:48 lr 0.001998 wd 0.0500 time 0.1998 (0.2670) data time 0.0010 (0.0071) model time 0.1988 (0.2599) loss 4.2377 (3.9123) grad_norm 0.6783 (1.1896) loss_scale 16384.0000 (14007.3086) mem 8974MB [2024-07-29 13:34:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][230/625] eta 0:01:43 lr 0.001998 wd 0.0500 time 0.2085 (0.2633) data time 0.0011 (0.0067) model time 0.2074 (0.2565) loss 3.5728 (3.9009) grad_norm 1.3407 (1.2019) loss_scale 16384.0000 (14145.4884) mem 8974MB 
[2024-07-29 13:34:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][240/625] eta 0:01:40 lr 0.001998 wd 0.0500 time 0.2052 (0.2600) data time 0.0011 (0.0064) model time 0.2041 (0.2536) loss 4.3008 (3.8879) grad_norm 0.9893 (1.2128) loss_scale 16384.0000 (14268.4835) mem 8974MB [2024-07-29 13:35:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][250/625] eta 0:01:36 lr 0.001998 wd 0.0500 time 0.1986 (0.2571) data time 0.0010 (0.0061) model time 0.1976 (0.2510) loss 4.6455 (3.8882) grad_norm 1.2951 (1.2091) loss_scale 16384.0000 (14378.6667) mem 8974MB [2024-07-29 13:35:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][260/625] eta 0:01:32 lr 0.001998 wd 0.0500 time 0.2062 (0.2546) data time 0.0008 (0.0059) model time 0.2054 (0.2487) loss 4.5802 (3.8739) grad_norm 0.8043 (1.2131) loss_scale 16384.0000 (14477.9406) mem 8974MB [2024-07-29 13:35:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][270/625] eta 0:01:29 lr 0.001998 wd 0.0500 time 0.2028 (0.2523) data time 0.0008 (0.0057) model time 0.2020 (0.2466) loss 4.3069 (3.8622) grad_norm 0.9327 (1.2152) loss_scale 16384.0000 (14567.8491) mem 8974MB [2024-07-29 13:35:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][280/625] eta 0:01:26 lr 0.001998 wd 0.0500 time 0.1991 (0.2502) data time 0.0009 (0.0054) model time 0.1982 (0.2447) loss 4.6803 (3.8572) grad_norm 1.0865 (1.2192) loss_scale 16384.0000 (14649.6577) mem 8974MB [2024-07-29 13:35:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][290/625] eta 0:01:23 lr 0.001998 wd 0.0500 time 0.1969 (0.2482) data time 0.0008 (0.0053) model time 0.1961 (0.2430) loss 3.7356 (3.8520) grad_norm 2.0799 (1.2323) loss_scale 16384.0000 (14724.4138) mem 8974MB [2024-07-29 13:35:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][300/625] eta 0:01:20 lr 0.001998 wd 0.0500 time 
0.2053 (0.2464) data time 0.0008 (0.0051) model time 0.2045 (0.2413) loss 4.2593 (3.8516) grad_norm 0.8449 (1.2333) loss_scale 16384.0000 (14792.9917) mem 8974MB [2024-07-29 13:35:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][310/625] eta 0:01:17 lr 0.001998 wd 0.0500 time 0.2033 (0.2447) data time 0.0006 (0.0049) model time 0.2027 (0.2398) loss 4.4503 (3.8343) grad_norm 0.6895 (1.2296) loss_scale 16384.0000 (14856.1270) mem 8974MB [2024-07-29 13:35:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][320/625] eta 0:01:14 lr 0.001998 wd 0.0500 time 0.1979 (0.2430) data time 0.0011 (0.0048) model time 0.1968 (0.2382) loss 4.3539 (3.8234) grad_norm 1.2566 (1.2380) loss_scale 16384.0000 (14914.4427) mem 8974MB [2024-07-29 13:35:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][330/625] eta 0:01:11 lr 0.001998 wd 0.0500 time 0.2103 (0.2416) data time 0.0008 (0.0046) model time 0.2095 (0.2370) loss 3.7455 (3.8149) grad_norm 1.4136 (1.2387) loss_scale 16384.0000 (14968.4706) mem 8974MB [2024-07-29 13:35:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][340/625] eta 0:01:08 lr 0.001998 wd 0.0500 time 0.2082 (0.2402) data time 0.0010 (0.0045) model time 0.2072 (0.2357) loss 3.0272 (3.8153) grad_norm 0.8306 (1.2336) loss_scale 16384.0000 (15018.6667) mem 8974MB [2024-07-29 13:35:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][350/625] eta 0:01:05 lr 0.001998 wd 0.0500 time 0.1987 (0.2391) data time 0.0012 (0.0044) model time 0.1975 (0.2347) loss 3.5165 (3.8109) grad_norm 1.4692 (1.2432) loss_scale 16384.0000 (15065.4247) mem 8974MB [2024-07-29 13:35:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][360/625] eta 0:01:03 lr 0.001998 wd 0.0500 time 0.2075 (0.2379) data time 0.0009 (0.0043) model time 0.2067 (0.2336) loss 4.2553 (3.8018) grad_norm 1.0338 (1.2463) loss_scale 16384.0000 (15109.0861) 
mem 8974MB [2024-07-29 13:35:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][370/625] eta 0:01:00 lr 0.001998 wd 0.0500 time 0.1971 (0.2368) data time 0.0010 (0.0042) model time 0.1961 (0.2326) loss 4.4681 (3.7988) grad_norm 1.7059 (1.2409) loss_scale 16384.0000 (15149.9487) mem 8974MB [2024-07-29 13:35:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][380/625] eta 0:00:57 lr 0.001998 wd 0.0500 time 0.2012 (0.2357) data time 0.0008 (0.0041) model time 0.2004 (0.2316) loss 3.7580 (3.8113) grad_norm 1.5858 (1.2415) loss_scale 16384.0000 (15188.2733) mem 8974MB [2024-07-29 13:35:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][390/625] eta 0:00:55 lr 0.001998 wd 0.0500 time 0.2022 (0.2346) data time 0.0009 (0.0040) model time 0.2014 (0.2306) loss 4.1030 (3.8088) grad_norm 1.3541 (1.2377) loss_scale 16384.0000 (15224.2892) mem 8974MB [2024-07-29 13:35:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][400/625] eta 0:00:52 lr 0.001998 wd 0.0500 time 0.1957 (0.2337) data time 0.0007 (0.0039) model time 0.1951 (0.2298) loss 4.3665 (3.8140) grad_norm 1.2639 (1.2354) loss_scale 16384.0000 (15258.1988) mem 8974MB [2024-07-29 13:35:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][410/625] eta 0:00:50 lr 0.001998 wd 0.0500 time 0.1994 (0.2328) data time 0.0008 (0.0038) model time 0.1986 (0.2290) loss 5.0309 (3.8177) grad_norm 1.0896 (1.2314) loss_scale 16384.0000 (15290.1818) mem 8974MB [2024-07-29 13:35:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][420/625] eta 0:00:47 lr 0.001998 wd 0.0500 time 0.2135 (0.2321) data time 0.0009 (0.0037) model time 0.2126 (0.2283) loss 4.9943 (3.8212) grad_norm 0.9438 (1.2258) loss_scale 16384.0000 (15320.3978) mem 8974MB [2024-07-29 13:35:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][430/625] eta 0:00:45 lr 0.001998 wd 0.0500 
time 0.2015 (0.2312) data time 0.0011 (0.0037) model time 0.2004 (0.2276) loss 3.6555 (3.8131) grad_norm 1.8728 (1.2255) loss_scale 16384.0000 (15348.9892) mem 8974MB [2024-07-29 13:35:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][440/625] eta 0:00:42 lr 0.001998 wd 0.0500 time 0.2044 (0.2305) data time 0.0007 (0.0036) model time 0.2037 (0.2269) loss 4.5201 (3.8111) grad_norm 1.0647 (1.2253) loss_scale 16384.0000 (15376.0838) mem 8974MB [2024-07-29 13:35:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][450/625] eta 0:00:40 lr 0.001998 wd 0.0500 time 0.2109 (0.2304) data time 0.0006 (0.0035) model time 0.2102 (0.2269) loss 3.3239 (3.8054) grad_norm 2.0534 (1.2221) loss_scale 16384.0000 (15401.7959) mem 8974MB [2024-07-29 13:35:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][460/625] eta 0:00:37 lr 0.001998 wd 0.0500 time 0.2008 (0.2298) data time 0.0008 (0.0035) model time 0.2000 (0.2263) loss 4.4240 (3.8140) grad_norm 0.9195 (1.2262) loss_scale 16384.0000 (15426.2289) mem 8974MB [2024-07-29 13:35:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][470/625] eta 0:00:35 lr 0.001998 wd 0.0500 time 0.1980 (0.2291) data time 0.0011 (0.0034) model time 0.1969 (0.2257) loss 3.8628 (3.8172) grad_norm 0.8978 (1.2256) loss_scale 16384.0000 (15449.4757) mem 8974MB [2024-07-29 13:35:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][480/625] eta 0:00:33 lr 0.001998 wd 0.0500 time 0.2016 (0.2291) data time 0.0008 (0.0033) model time 0.2008 (0.2258) loss 3.8680 (3.8151) grad_norm 0.8359 (1.2227) loss_scale 16384.0000 (15471.6209) mem 8974MB [2024-07-29 13:35:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][490/625] eta 0:00:30 lr 0.001998 wd 0.0500 time 0.2108 (0.2286) data time 0.0008 (0.0033) model time 0.2101 (0.2253) loss 3.3096 (3.8199) grad_norm 0.9879 (1.2271) loss_scale 16384.0000 
(15492.7407) mem 8974MB [2024-07-29 13:35:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][500/625] eta 0:00:28 lr 0.001998 wd 0.0500 time 0.2016 (0.2280) data time 0.0009 (0.0032) model time 0.2007 (0.2248) loss 3.8017 (3.8245) grad_norm 1.0139 (1.2314) loss_scale 16384.0000 (15512.9050) mem 8974MB [2024-07-29 13:35:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][510/625] eta 0:00:26 lr 0.001998 wd 0.0500 time 0.1966 (0.2275) data time 0.0009 (0.0032) model time 0.1957 (0.2243) loss 3.2949 (3.8234) grad_norm 1.1864 (1.2367) loss_scale 16384.0000 (15532.1770) mem 8974MB [2024-07-29 13:35:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][520/625] eta 0:00:23 lr 0.001998 wd 0.0500 time 0.2060 (0.2270) data time 0.0009 (0.0031) model time 0.2052 (0.2239) loss 3.0324 (3.8175) grad_norm 1.3984 (1.2366) loss_scale 16384.0000 (15550.6147) mem 8974MB [2024-07-29 13:35:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][530/625] eta 0:00:21 lr 0.001998 wd 0.0500 time 0.1949 (0.2265) data time 0.0009 (0.0031) model time 0.1940 (0.2234) loss 4.0100 (3.8076) grad_norm 0.8533 (1.2315) loss_scale 16384.0000 (15568.2712) mem 8974MB [2024-07-29 13:36:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][540/625] eta 0:00:19 lr 0.001998 wd 0.0500 time 0.2017 (0.2260) data time 0.0006 (0.0030) model time 0.2011 (0.2230) loss 4.6853 (3.8086) grad_norm 1.9759 (1.2347) loss_scale 16384.0000 (15585.1950) mem 8974MB [2024-07-29 13:36:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][550/625] eta 0:00:16 lr 0.001998 wd 0.0500 time 0.2081 (0.2255) data time 0.0009 (0.0030) model time 0.2072 (0.2225) loss 4.2922 (3.8105) grad_norm 1.0399 (1.2376) loss_scale 16384.0000 (15601.4309) mem 8974MB [2024-07-29 13:36:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][560/625] eta 0:00:14 lr 
0.001998 wd 0.0500 time 0.2003 (0.2251) data time 0.0010 (0.0030) model time 0.1993 (0.2221) loss 3.7114 (3.8087) grad_norm 1.5073 (1.2359) loss_scale 16384.0000 (15617.0199) mem 8974MB [2024-07-29 13:36:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][570/625] eta 0:00:12 lr 0.001998 wd 0.0500 time 0.2027 (0.2247) data time 0.0008 (0.0029) model time 0.2019 (0.2217) loss 4.5243 (3.8148) grad_norm 0.9348 (1.2348) loss_scale 16384.0000 (15632.0000) mem 8974MB [2024-07-29 13:36:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][580/625] eta 0:00:10 lr 0.001998 wd 0.0500 time 0.1985 (0.2243) data time 0.0006 (0.0029) model time 0.1979 (0.2214) loss 2.5495 (3.8108) grad_norm 0.7632 (1.2334) loss_scale 16384.0000 (15646.4061) mem 8974MB [2024-07-29 13:36:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][590/625] eta 0:00:07 lr 0.001998 wd 0.0500 time 0.2071 (0.2240) data time 0.0009 (0.0028) model time 0.2062 (0.2211) loss 4.3349 (3.8089) grad_norm 1.4798 (1.2302) loss_scale 16384.0000 (15660.2707) mem 8974MB [2024-07-29 13:36:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][600/625] eta 0:00:05 lr 0.001998 wd 0.0500 time 0.2143 (0.2237) data time 0.0008 (0.0028) model time 0.2134 (0.2208) loss 3.0623 (3.8062) grad_norm 1.4879 (1.2295) loss_scale 16384.0000 (15673.6236) mem 8974MB [2024-07-29 13:36:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][610/625] eta 0:00:03 lr 0.001998 wd 0.0500 time 0.1927 (0.2234) data time 0.0004 (0.0028) model time 0.1923 (0.2206) loss 4.3580 (3.8088) grad_norm 0.8960 (1.2301) loss_scale 16384.0000 (15686.4928) mem 8974MB [2024-07-29 13:36:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][620/625] eta 0:00:01 lr 0.001998 wd 0.0500 time 0.1946 (0.2229) data time 0.0004 (0.0027) model time 0.1942 (0.2201) loss 4.6036 (3.8141) grad_norm 1.5934 (1.2308) loss_scale 
16384.0000 (15698.9039) mem 8974MB [2024-07-29 13:36:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 25 training takes 0:02:06 [2024-07-29 13:36:17 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 13:36:18 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 13:36:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.545 (0.545) Loss 0.9243 (0.9243) Acc@1 82.520 (82.520) Acc@5 96.240 (96.240) Mem 8974MB [2024-07-29 13:36:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.105) Loss 1.6094 (1.1569) Acc@1 64.893 (75.959) Acc@5 87.500 (94.012) Mem 8974MB [2024-07-29 13:36:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.053 (0.081) Loss 1.7773 (1.3824) Acc@1 61.670 (70.826) Acc@5 84.717 (90.523) Mem 8974MB [2024-07-29 13:36:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 70.595 Acc@5 90.453 [2024-07-29 13:36:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 70.6% [2024-07-29 13:36:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 70.60% [2024-07-29 13:36:22 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 13:36:23 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-29 13:36:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.550 (0.550) Loss 6.0820 (6.0820) Acc@1 4.785 (4.785) Acc@5 12.744 (12.744) Mem 8974MB [2024-07-29 13:36:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.107) Loss 6.1211 (6.3768) Acc@1 4.346 (2.703) Acc@5 12.744 (8.332) Mem 8974MB [2024-07-29 13:36:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.053 (0.081) Loss 6.2734 (6.2630) Acc@1 3.906 (3.348) Acc@5 9.912 (9.677) Mem 8974MB [2024-07-29 13:36:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 3.761 Acc@5 10.635 [2024-07-29 13:36:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 3.8% [2024-07-29 13:36:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 3.76% [2024-07-29 13:36:25 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 13:36:26 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-29 13:36:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][0/625] eta 0:14:04 lr 0.001998 wd 0.0500 time 1.3513 (1.3513) data time 0.5071 (0.5071) model time 0.0000 (0.0000) loss 4.3529 (4.3529) grad_norm 1.4096 (1.4096) loss_scale 16384.0000 (16384.0000) mem 8971MB
[2024-07-29 13:36:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][10/625] eta 0:03:14 lr 0.001998 wd 0.0500 time 0.2186 (0.3155) data time 0.0007 (0.0470) model time 0.0000 (0.0000) loss 4.2816 (3.8735) grad_norm 1.1406 (1.1143) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:36:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][20/625] eta 0:02:39 lr 0.001998 wd 0.0500 time 0.2046 (0.2633) data time 0.0009 (0.0251) model time 0.0000 (0.0000) loss 3.9294 (3.9162) grad_norm 1.7704 (1.3968) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:36:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][30/625] eta 0:02:25 lr 0.001998 wd 0.0500 time 0.2110 (0.2445) data time 0.0008 (0.0173) model time 0.0000 (0.0000) loss 4.4760 (3.9157) grad_norm 0.9022 (1.3389) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:36:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][40/625] eta 0:02:17 lr 0.001998 wd 0.0500 time 0.2023 (0.2345) data time 0.0009 (0.0133) model time 0.0000 (0.0000) loss 4.2379 (3.8515) grad_norm 1.2222 (1.3530) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:36:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][50/625] eta 0:02:11 lr 0.001998 wd 0.0500 time 0.2070 (0.2293) data time 0.0011 (0.0109) model time 0.0000 (0.0000) loss 3.9102 (3.8537) grad_norm 1.1154 (1.3656) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:36:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][60/625] eta 0:02:07 lr 0.001998 wd 0.0500 time 0.2087 (0.2249) data time 0.0008 (0.0093) model time 0.2079 (0.2017) loss 3.6565 (3.8587) grad_norm 2.3363 (1.3940) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:36:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][70/625] eta 0:02:05 lr 0.001998 wd 0.0500 time 0.2014 (0.2258) data time 0.0008 (0.0081) model time 0.2006 (0.2161) loss 3.2603 (3.8508) grad_norm 0.7482 (1.3628) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:36:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][80/625] eta 0:02:01 lr 0.001998 wd 0.0500 time 0.2092 (0.2231) data time 0.0008 (0.0072) model time 0.2084 (0.2117) loss 3.7312 (3.8077) grad_norm 1.4723 (1.3427) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:36:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][90/625] eta 0:01:58 lr 0.001998 wd 0.0500 time 0.2017 (0.2208) data time 0.0009 (0.0065) model time 0.2009 (0.2091) loss 2.5994 (3.7821) grad_norm 1.3505 (1.3287) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:36:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][100/625] eta 0:01:55 lr 0.001998 wd 0.0500 time 0.2204 (0.2195) data time 0.0009 (0.0060) model time 0.2195 (0.2087) loss 4.2795 (3.7734) grad_norm 1.4382 (1.3223) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:36:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][110/625] eta 0:01:52 lr 0.001998 wd 0.0500 time 0.1979 (0.2179) data time 0.0008 (0.0055) model time 0.1971 (0.2073) loss 3.5618 (3.8034) grad_norm 1.2145 (1.3189) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:36:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][120/625] eta 0:01:49 lr 0.001998 wd 0.0500 time 0.2028 (0.2167) data time 0.0008 (0.0051) model time 0.2021 (0.2067) loss 4.4051 (3.8068) grad_norm 1.1037 (1.3230) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:36:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][130/625] eta 0:01:46 lr 0.001998 wd 0.0500 time 0.2050 (0.2159) data time 0.0009 (0.0048) model time 0.2041 (0.2065) loss 2.7206 (3.7758) grad_norm 1.4233 (1.3082) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:36:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][140/625] eta 0:01:44 lr 0.001998 wd 0.0500 time 0.2012 (0.2150) data time 0.0007 (0.0046) model time 0.2006 (0.2060) loss 3.1410 (3.7860) grad_norm 0.7518 (1.2916) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:36:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][150/625] eta 0:01:41 lr 0.001998 wd 0.0500 time 0.2043 (0.2144) data time 0.0009 (0.0043) model time 0.2034 (0.2059) loss 2.6853 (3.7599) grad_norm 1.2596 (1.2878) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][160/625] eta 0:01:39 lr 0.001998 wd 0.0500 time 0.2009 (0.2136) data time 0.0007 (0.0041) model time 0.2003 (0.2055) loss 2.9325 (3.7586) grad_norm 2.1275 (1.2860) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][170/625] eta 0:01:36 lr 0.001998 wd 0.0500 time 0.2068 (0.2131) data time 0.0010 (0.0039) model time 0.2058 (0.2052) loss 4.5555 (3.7806) grad_norm 1.7337 (1.2911) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][180/625] eta 0:01:34 lr 0.001998 wd 0.0500 time 0.2009 (0.2126) data time 0.0007 (0.0038) model time 0.2003 (0.2051) loss 4.5566 (3.7854) grad_norm 0.9139 (1.2866) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][190/625] eta 0:01:32 lr 0.001998 wd 0.0500 time 0.2010 (0.2120) data time 0.0009 (0.0036) model time 0.2001 (0.2048) loss 4.0684 (3.7720) grad_norm 0.9435 (1.2820) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][200/625] eta 0:01:29 lr 0.001998 wd 0.0500 time 0.2017 (0.2116) data time 0.0009 (0.0035) model time 0.2008 (0.2047) loss 3.1070 (3.7837) grad_norm 0.9034 (1.2732) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][210/625] eta 0:01:27 lr 0.001997 wd 0.0500 time 0.2041 (0.2113) data time 0.0008 (0.0034) model time 0.2033 (0.2046) loss 2.9892 (3.7844) grad_norm 0.9449 (1.2783) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][220/625] eta 0:01:25 lr 0.001997 wd 0.0500 time 0.2114 (0.2109) data time 0.0006 (0.0033) model time 0.2108 (0.2044) loss 4.6523 (3.7833) grad_norm 1.2481 (1.2711) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][230/625] eta 0:01:23 lr 0.001997 wd 0.0500 time 0.1950 (0.2105) data time 0.0008 (0.0032) model time 0.1942 (0.2042) loss 4.2656 (3.7831) grad_norm 1.0598 (1.2674) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][240/625] eta 0:01:20 lr 0.001997 wd 0.0500 time 0.2080 (0.2103) data time 0.0008 (0.0031) model time 0.2072 (0.2043) loss 2.7722 (3.7698) grad_norm 1.1993 (1.2593) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][250/625] eta 0:01:18 lr 0.001997 wd 0.0500 time 0.2127 (0.2101) data time 0.0009 (0.0030) model time 0.2118 (0.2043) loss 3.9517 (3.7661) grad_norm 1.4558 (1.2657) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][260/625] eta 0:01:16 lr 0.001997 wd 0.0500 time 0.2080 (0.2099) data time 0.0006 (0.0029) model time 0.2074 (0.2043) loss 3.3884 (3.7519) grad_norm 1.1721 (1.2638) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][270/625] eta 0:01:14 lr 0.001997 wd 0.0500 time 0.1996 (0.2097) data time 0.0010 (0.0028) model time 0.1986 (0.2042) loss 2.6424 (3.7456) grad_norm 0.9704 (1.2593) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][280/625] eta 0:01:12 lr 0.001997 wd 0.0500 time 0.2063 (0.2095) data time 0.0009 (0.0028) model time 0.2054 (0.2041) loss 4.3330 (3.7473) grad_norm 1.5502 (1.2751) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][290/625] eta 0:01:10 lr 0.001997 wd 0.0500 time 0.2092 (0.2093) data time 0.0010 (0.0027) model time 0.2081 (0.2041) loss 3.8654 (3.7500) grad_norm 0.9437 (1.2695) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][300/625] eta 0:01:07 lr 0.001997 wd 0.0500 time 0.2010 (0.2091) data time 0.0006 (0.0026) model time 0.2004 (0.2040) loss 3.3157 (3.7460) grad_norm 1.0648 (1.2613) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][310/625] eta 0:01:05 lr 0.001997 wd 0.0500 time 0.2004 (0.2088) data time 0.0007 (0.0026) model time 0.1998 (0.2039) loss 2.8669 (3.7478) grad_norm 0.8387 (1.2603) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][320/625] eta 0:01:03 lr 0.001997 wd 0.0500 time 0.2119 (0.2087) data time 0.0006 (0.0025) model time 0.2113 (0.2039) loss 3.6630 (3.7426) grad_norm 0.8518 (1.2612) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][330/625] eta 0:01:01 lr 0.001997 wd 0.0500 time 0.2009 (0.2085) data time 0.0011 (0.0025) model time 0.1998 (0.2038) loss 3.4591 (3.7389) grad_norm 1.6891 (1.2668) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][340/625] eta 0:00:59 lr 0.001997 wd 0.0500 time 0.2061 (0.2083) data time 0.0007 (0.0024) model time 0.2054 (0.2037) loss 2.7727 (3.7408) grad_norm 1.4956 (1.2627) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][350/625] eta 0:00:57 lr 0.001997 wd 0.0500 time 0.2016 (0.2082) data time 0.0010 (0.0024) model time 0.2006 (0.2036) loss 3.6714 (3.7425) grad_norm 1.0332 (1.2651) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][360/625] eta 0:00:55 lr 0.001997 wd 0.0500 time 0.2013 (0.2080) data time 0.0009 (0.0023) model time 0.2004 (0.2036) loss 3.9138 (3.7548) grad_norm 1.3416 (1.2621) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][370/625] eta 0:00:52 lr 0.001997 wd 0.0500 time 0.2004 (0.2078) data time 0.0009 (0.0023) model time 0.1995 (0.2035) loss 4.4975 (3.7556) grad_norm 1.1892 (1.2618) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][380/625] eta 0:00:50 lr 0.001997 wd 0.0500 time 0.1977 (0.2077) data time 0.0007 (0.0023) model time 0.1969 (0.2034) loss 2.9496 (3.7495) grad_norm 1.0597 (1.2582) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][390/625] eta 0:00:48 lr 0.001997 wd 0.0500 time 0.2021 (0.2076) data time 0.0008 (0.0022) model time 0.2013 (0.2034) loss 3.6399 (3.7419) grad_norm 1.2532 (1.2664) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][400/625] eta 0:00:46 lr 0.001997 wd 0.0500 time 0.2047 (0.2075) data time 0.0009 (0.0022) model time 0.2038 (0.2034) loss 3.8826 (3.7446) grad_norm 1.8185 (1.2652) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][410/625] eta 0:00:44 lr 0.001997 wd 0.0500 time 0.2089 (0.2075) data time 0.0009 (0.0022) model time 0.2080 (0.2034) loss 2.6900 (3.7478) grad_norm 1.0809 (1.2599) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][420/625] eta 0:00:42 lr 0.001997 wd 0.0500 time 0.2037 (0.2073) data time 0.0007 (0.0022) model time 0.2030 (0.2034) loss 4.8567 (3.7496) grad_norm 1.3371 (1.2621) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][430/625] eta 0:00:40 lr 0.001997 wd 0.0500 time 0.1983 (0.2073) data time 0.0007 (0.0021) model time 0.1976 (0.2034) loss 2.8309 (3.7491) grad_norm 1.0904 (1.2640) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][440/625] eta 0:00:38 lr 0.001997 wd 0.0500 time 0.2004 (0.2072) data time 0.0011 (0.0021) model time 0.1993 (0.2034) loss 3.6698 (3.7491) grad_norm 0.9074 (1.2662) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:37:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][450/625] eta 0:00:36 lr 0.001997 wd 0.0500 time 0.2045 (0.2071) data time 0.0012 (0.0021) model time 0.2033 (0.2034) loss 2.6521 (3.7511) grad_norm 0.8874 (1.2601) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:38:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][460/625] eta 0:00:34 lr 0.001997 wd 0.0500 time 0.2050 (0.2070) data time 0.0009 (0.0020) model time 0.2040 (0.2033) loss 3.7042 (3.7466) grad_norm 0.9001 (1.2597) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:38:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][470/625] eta 0:00:32 lr 0.001997 wd 0.0500 time 0.2066 (0.2070) data time 0.0009 (0.0020) model time 0.2057 (0.2034) loss 4.3491 (3.7472) grad_norm 1.4012 (1.2599) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:38:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][480/625] eta 0:00:30 lr 0.001997 wd 0.0500 time 0.1995 (0.2073) data time 0.0008 (0.0020) model time 0.1987 (0.2038) loss 3.4373 (3.7508) grad_norm 1.1607 (1.2611) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:38:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][490/625] eta 0:00:27 lr 0.001997 wd 0.0500 time 0.2003 (0.2073) data time 0.0009 (0.0020) model time 0.1995 (0.2038) loss 4.2558 (3.7549) grad_norm 1.5927 (1.2692) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:38:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][500/625] eta 0:00:25 lr 0.001997 wd 0.0500 time 0.2020 (0.2072) data time 0.0008 (0.0020) model time 0.2011 (0.2037) loss 3.3221 (3.7573) grad_norm 1.4324 (1.2705) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:38:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][510/625] eta 0:00:23 lr 0.001997 wd 0.0500 time 0.2017 (0.2071) data time 0.0011 (0.0019) model time 0.2006 (0.2037) loss 4.0946 (3.7527) grad_norm 0.7610 (1.2657) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:38:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][520/625] eta 0:00:21 lr 0.001997 wd 0.0500 time 0.1964 (0.2070) data time 0.0008 (0.0019) model time 0.1957 (0.2036) loss 3.7008 (3.7524) grad_norm 1.5307 (1.2619) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:38:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][530/625] eta 0:00:19 lr 0.001997 wd 0.0500 time 0.2088 (0.2070) data time 0.0006 (0.0019) model time 0.2081 (0.2036) loss 4.8200 (3.7533) grad_norm 1.4667 (1.2639) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:38:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][540/625] eta 0:00:17 lr 0.001997 wd 0.0500 time 0.1947 (0.2069) data time 0.0008 (0.0019) model time 0.1939 (0.2036) loss 4.0992 (3.7532) grad_norm 0.8355 (1.2678) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:38:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][550/625] eta 0:00:15 lr 0.001997 wd 0.0500 time 0.1973 (0.2068) data time 0.0008 (0.0019) model time 0.1965 (0.2036) loss 3.9761 (3.7581) grad_norm 1.1382 (1.2664) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:38:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][560/625] eta 0:00:13 lr 0.001997 wd 0.0500 time 0.2064 (0.2067) data time 0.0006 (0.0019) model time 0.2058 (0.2035) loss 3.9359 (3.7579) grad_norm 1.3748 (1.2636) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:38:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][570/625] eta 0:00:11 lr 0.001997 wd 0.0500 time 0.2002 (0.2067) data time 0.0008 (0.0018) model time 0.1993 (0.2035) loss 3.1860 (3.7563) grad_norm 1.6253 (1.2627) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:38:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][580/625] eta 0:00:09 lr 0.001997 wd 0.0500 time 0.1967 (0.2066) data time 0.0008 (0.0018) model time 0.1960 (0.2034) loss 4.4547 (3.7547) grad_norm 1.4177 (1.2629) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:38:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][590/625] eta 0:00:07 lr 0.001997 wd 0.0500 time 0.2025 (0.2065) data time 0.0009 (0.0018) model time 0.2016 (0.2034) loss 3.7292 (3.7542) grad_norm 1.0096 (1.2658) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:38:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][600/625] eta 0:00:05 lr 0.001997 wd 0.0500 time 0.1986 (0.2064) data time 0.0010 (0.0018) model time 0.1976 (0.2034) loss 3.7582 (3.7513) grad_norm 0.8658 (1.2620) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:38:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][610/625] eta 0:00:03 lr 0.001997 wd 0.0500 time 0.1934 (0.2063) data time 0.0004 (0.0018) model time 0.1930 (0.2033) loss 4.6743 (3.7589) grad_norm 1.2742 (1.2632) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:38:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][620/625] eta 0:00:01 lr 0.001997 wd 0.0500 time 0.1944 (0.2061) data time 0.0006 (0.0018) model time 0.1938 (0.2031) loss 4.2618 (3.7652) grad_norm 0.9967 (1.2620) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:38:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 26 training takes 0:02:08
[2024-07-29 13:38:34 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 13:38:35 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 13:38:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.636 (0.636) Loss 0.9663 (0.9663) Acc@1 81.299 (81.299) Acc@5 95.654 (95.654) Mem 8978MB
[2024-07-29 13:38:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.060 (0.116) Loss 1.5977 (1.1654) Acc@1 64.990 (75.395) Acc@5 88.867 (94.061) Mem 8978MB
[2024-07-29 13:38:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.053 (0.086) Loss 1.7832 (1.4044) Acc@1 62.451 (70.615) Acc@5 85.986 (90.623) Mem 8978MB
[2024-07-29 13:38:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 70.673 Acc@5 90.643
[2024-07-29 13:38:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 70.7%
[2024-07-29 13:38:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 70.67%
[2024-07-29 13:38:37 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 13:38:37 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 13:38:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.663 (0.663) Loss 5.7227 (5.7227) Acc@1 9.131 (9.131) Acc@5 20.166 (20.166) Mem 8978MB
[2024-07-29 13:38:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.118) Loss 5.8086 (5.9656) Acc@1 7.080 (5.513) Acc@5 18.750 (15.017) Mem 8978MB
[2024-07-29 13:38:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.053 (0.087) Loss 5.9766 (5.9016) Acc@1 5.518 (5.897) Acc@5 13.965 (15.560) Mem 8978MB
[2024-07-29 13:38:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 6.336 Acc@5 16.509
[2024-07-29 13:38:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 6.3%
[2024-07-29 13:38:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 6.34%
[2024-07-29 13:38:39 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 13:38:40 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 13:38:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][0/625] eta 0:09:06 lr 0.001997 wd 0.0500 time 0.8744 (0.8744) data time 0.6733 (0.6733) model time 0.0000 (0.0000) loss 3.1876 (3.1876) grad_norm 1.4190 (1.4190) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:38:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 13:38:41 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 13:38:42 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 13:43:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 13:43:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 13:43:39 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 13:43:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 13:43:50 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 13:43:50 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 13:43:50 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 13:43:50 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 27)
[2024-07-29 13:43:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 13:45:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 13:45:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 13:45:55 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 13:46:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 13:46:09 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 13:46:09 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 13:46:09 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 13:46:09 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 27)
[2024-07-29 13:46:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 13:46:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][10/625] eta 0:11:30 lr 0.001997 wd 0.0500 time 0.2076 (1.1229) data time 0.0009 (0.1293) model time 0.0000 (0.0000) loss 4.0421 (4.2395) grad_norm 1.1703 (1.3207) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:46:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][20/625] eta 0:06:40 lr 0.001997 wd 0.0500 time 0.2009 (0.6626) data time 0.0007 (0.0652) model time 0.0000 (0.0000) loss 4.1136 (4.0223) grad_norm 1.1094 (1.3379) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:46:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][30/625] eta 0:05:06 lr 0.001997 wd 0.0500 time 0.2019 (0.5153) data time 0.0011 (0.0439) model time 0.0000 (0.0000) loss 3.8475 (4.0316) grad_norm 1.0600 (1.2710) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:46:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][40/625] eta 0:04:15 lr 0.001997 wd 0.0500 time 0.1981 (0.4370) data time 0.0008 (0.0332) model time 0.0000 (0.0000) loss 2.8096 (3.9312) grad_norm 0.8955 (1.2103) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:46:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][50/625] eta 0:03:44 lr 0.001997 wd 0.0500 time 0.2007 (0.3904) data time 0.0009 (0.0267) model time 0.0000 (0.0000) loss 3.7440 (3.9142) grad_norm 0.8402 (1.1755) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:46:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][60/625] eta 0:03:23 lr 0.001997 wd 0.0500 time 0.2133 (0.3594) data time 0.0007 (0.0225) model time 0.2126 (0.2032) loss 4.0087 (3.8927) grad_norm 0.9302 (1.1692) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:46:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][70/625] eta 0:03:07 lr 0.001997 wd 0.0500 time 0.1966 (0.3385) data time 0.0008 (0.0194) model time 0.1958 (0.2078) loss 2.8171 (3.8517) grad_norm 1.7512 (1.2757) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:46:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][80/625] eta 0:02:56 lr 0.001997 wd 0.0500 time 0.1939 (0.3237) data time 0.0010 (0.0171) model time 0.1929 (0.2115) loss 4.2558 (3.8336) grad_norm 1.0081 (1.2598) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:46:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][90/625] eta 0:02:46 lr 0.001997 wd 0.0500 time 0.2008 (0.3105) data time 0.0012 (0.0153) model time 0.1996 (0.2095) loss 4.1756 (3.8011) grad_norm 1.0302 (1.2515) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:46:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][100/625] eta 0:02:38 lr 0.001997 wd 0.0500 time 0.2076 (0.3010) data time 0.0011 (0.0139) model time 0.2066 (0.2105) loss 4.2460 (3.8158) grad_norm 0.9798 (1.2554) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:46:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][110/625] eta 0:02:30 lr 0.001997 wd 0.0500 time 0.1966 (0.2923) data time 0.0013 (0.0127) model time 0.1953 (0.2095) loss 3.8889 (3.8317) grad_norm 1.3957 (1.2581) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:46:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][120/625] eta 0:02:24 lr 0.001997 wd 0.0500 time 0.2212 (0.2857) data time 0.0010 (0.0118) model time 0.2202 (0.2098) loss 4.4816 (3.8326) grad_norm 0.9839 (1.2493) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:46:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][130/625] eta 0:02:18 lr 0.001997 wd 0.0500 time 0.1955 (0.2800) data time 0.0009 (0.0110) model time 0.1947 (0.2099) loss 3.8089 (3.8123) grad_norm 0.8297 (1.2261) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:46:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][140/625] eta 0:02:13 lr 0.001997 wd 0.0500 time 0.1964 (0.2747) data time 0.0011 (0.0102) model time 0.1953 (0.2093) loss 2.7667 (3.8108) grad_norm 0.8646 (1.2351) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:46:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][150/625] eta 0:02:08 lr 0.001997 wd 0.0500 time 0.2080 (0.2701) data time 0.0009 (0.0096) model time 0.2071 (0.2089) loss 4.0583 (3.8128) grad_norm 0.9916 (1.2216) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:46:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][160/625] eta 0:02:03 lr 0.001997 wd 0.0500 time 0.1992 (0.2664) data time 0.0012 (0.0091) model time 0.1980 (0.2090) loss 3.7190 (3.8110) grad_norm 1.4877 (1.2129) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:46:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][170/625] eta 0:01:59 lr 0.001997 wd 0.0500 time 0.2066 (0.2634) data time 0.0007 (0.0086) model time 0.2059 (0.2094) loss 2.9000 (3.8067) grad_norm 1.0912 (1.2017) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][180/625] eta 0:01:55 lr 0.001997 wd 0.0500 time 0.2041 (0.2600) data time 0.0009 (0.0082) model time 0.2032 (0.2088) loss 2.7076 (3.7893) grad_norm 2.0023 (1.2216) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][190/625] eta 0:01:52 lr 0.001997 wd 0.0500 time 0.2524 (0.2576) data time 0.0008 (0.0078) model time 0.2516 (0.2091) loss 3.2736 (3.7964) grad_norm 1.2204 (1.2185) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][200/625] eta 0:01:48 lr 0.001997 wd 0.0500 time 0.1966 (0.2552) data time 0.0011 (0.0075) model time 0.1954 (0.2091) loss 3.7253 (3.7845) grad_norm 0.9550 (1.2136) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][210/625] eta 0:01:44 lr 0.001997 wd 0.0500 time 0.1971 (0.2529) data time 0.0013 (0.0072) model time 0.1958 (0.2088) loss 3.6184 (3.7809) grad_norm 1.4492 (1.2120) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][220/625] eta 0:01:41 lr 0.001997 wd 0.0500 time 0.2071 (0.2515) data time 0.0009 (0.0069) model time 0.2062 (0.2096) loss 3.5639 (3.7731) grad_norm 2.0175 (1.2186) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][230/625] eta 0:01:38 lr 0.001997 wd 0.0500 time 0.2033 (0.2496) data time 0.0014 (0.0067) model time 0.2019 (0.2094) loss 3.9923 (3.7751) grad_norm 1.4889 (1.2266) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][240/625] eta 0:01:35 lr 0.001997 wd 0.0500 time 0.2051 (0.2476) data time 0.0009 (0.0064) model time 0.2042 (0.2089) loss 3.9457 (3.7630) grad_norm 1.2313 (1.2190) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][250/625] eta 0:01:32 lr 0.001997 wd 0.0500 time 0.2044 (0.2458) data time 0.0009 (0.0062) model time 0.2035 (0.2086) loss 2.8587 (3.7529) grad_norm 0.9838 (1.2106) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][260/625] eta 0:01:29 lr 0.001997 wd 0.0500 time 0.2050 (0.2445) data time 0.0013 (0.0060) model time 0.2038 (0.2087) loss 3.3941 (3.7431) grad_norm 0.8165 (1.2147) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][270/625] eta 0:01:26 lr 0.001997 wd 0.0500 time 0.2042 (0.2432) data time 0.0007 (0.0058) model time 0.2036 (0.2087) loss 4.7977 (3.7413) grad_norm 1.4725 (1.2224) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][280/625] eta 0:01:23 lr 0.001997 wd 0.0500 time 0.2346 (0.2418) data time 0.0010 (0.0057) model time 0.2336 (0.2084) loss 3.9722 (3.7487) grad_norm 0.8738 (1.2260) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][290/625] eta 0:01:20 lr 0.001997 wd 0.0500 time 0.2031 (0.2408) data time 0.0010 (0.0055) model time 0.2021 (0.2086) loss 2.8117 (3.7424) grad_norm 0.9828 (1.2385) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][300/625] eta 0:01:17 lr 0.001997 wd 0.0500 time 0.2080 (0.2397) data time 0.0011 (0.0053) model time 0.2069 (0.2085) loss 3.1619 (3.7318) grad_norm 0.9251 (1.2382) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][310/625] eta 0:01:15 lr 0.001997 wd 0.0500 time 0.2481 (0.2387) data time 0.0009 (0.0052) model time 0.2472 (0.2085) loss 3.8367 (3.7321) grad_norm 0.9692 (1.2362) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][320/625] eta 0:01:12 lr 0.001996 wd 0.0500 time 0.2102 (0.2388) data time 0.0013 (0.0051) model time 0.2089 (0.2097) loss 3.5185 (3.7407) grad_norm 0.7542 (1.2304) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][330/625] eta 0:01:10 lr 0.001996 wd 0.0500 time 0.2064 (0.2377) data time 0.0008 (0.0050) model time 0.2055 (0.2094) loss 4.1961 (3.7466) grad_norm 0.7659 (1.2284) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][340/625] eta 0:01:07 lr 0.001996 wd 0.0500 time 0.2052 (0.2367) data time 0.0012 (0.0049) model time 0.2040 (0.2091) loss 4.1839 (3.7471) grad_norm 1.2234 (1.2358) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][350/625] eta 0:01:04 lr 0.001996 wd 0.0500 time 0.2036 (0.2359) data time 0.0010 (0.0047) model time 0.2026 (0.2091) loss 3.2213 (3.7524) grad_norm 1.5619 (1.2450) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][360/625] eta 0:01:02 lr 0.001996 wd 0.0500 time 0.1948 (0.2351) data time 0.0011 (0.0046) model time 0.1937 (0.2090) loss 4.4071 (3.7533) grad_norm 0.8879 (1.2468) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][370/625] eta 0:00:59 lr 0.001996 wd 0.0500 time 0.2013 (0.2342) data time 0.0009 (0.0045) model time 0.2004 (0.2087) loss 4.3122 (3.7532) grad_norm 1.1660 (1.2475) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][380/625] eta 0:00:57 lr 0.001996 wd 0.0500 time 0.1974 (0.2336) data time 0.0007 (0.0045) model time 0.1967 (0.2088) loss 3.4951 (3.7500) grad_norm 1.1998 (1.2423) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][390/625] eta 0:00:54 lr 0.001996 wd 0.0500 time 0.2019 (0.2333) data time 0.0009 (0.0044) model time 0.2009 (0.2092) loss 2.7057 (3.7412) grad_norm 1.9683 (1.2416) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][400/625] eta 0:00:52 lr 0.001996 wd 0.0500 time 0.2083 (0.2327) data time 0.0010 (0.0043) model time 0.2073 (0.2091) loss 3.7572 (3.7485) grad_norm 1.8579 (1.2580) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][410/625] eta 0:00:49 lr 0.001996 wd 0.0500 time 0.2184 (0.2325) data time 0.0006 (0.0042) model time 0.2177 (0.2095) loss 3.9004 (3.7538) grad_norm 1.0895 (1.2578) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][420/625] eta 0:00:47 lr 0.001996 wd 0.0500 time 0.1971 (0.2325) data time 0.0008 (0.0041) model time 0.1963 (0.2101) loss 4.1513 (3.7513) grad_norm 1.1775 (1.2541) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 13:47:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][430/625] eta 0:00:45 lr 0.001996 wd 0.0500 time 0.2016 (0.2318) data time 0.0012 (0.0041) model time
0.2004 (0.2099) loss 3.6097 (3.7592) grad_norm 1.0980 (1.2577) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:47:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][440/625] eta 0:00:42 lr 0.001996 wd 0.0500 time 0.2002 (0.2314) data time 0.0010 (0.0040) model time 0.1992 (0.2099) loss 3.5201 (3.7627) grad_norm 0.9859 (1.2535) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:47:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][450/625] eta 0:00:40 lr 0.001996 wd 0.0500 time 0.2192 (0.2307) data time 0.0007 (0.0039) model time 0.2185 (0.2097) loss 4.2071 (3.7654) grad_norm 1.2143 (1.2544) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:48:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][460/625] eta 0:00:37 lr 0.001996 wd 0.0500 time 0.2066 (0.2302) data time 0.0007 (0.0039) model time 0.2059 (0.2096) loss 3.7907 (3.7601) grad_norm 0.8715 (1.2595) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:48:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][470/625] eta 0:00:35 lr 0.001996 wd 0.0500 time 0.2074 (0.2300) data time 0.0012 (0.0038) model time 0.2062 (0.2098) loss 3.9056 (3.7506) grad_norm 1.2582 (1.2579) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:48:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][480/625] eta 0:00:33 lr 0.001996 wd 0.0500 time 0.2048 (0.2294) data time 0.0009 (0.0037) model time 0.2039 (0.2096) loss 3.0496 (3.7484) grad_norm 0.8697 (1.2519) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:48:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][490/625] eta 0:00:30 lr 0.001996 wd 0.0500 time 0.2064 (0.2289) data time 0.0008 (0.0037) model time 0.2055 (0.2095) loss 3.2061 (3.7526) grad_norm 0.9757 (1.2469) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:48:08 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][500/625] eta 0:00:28 lr 0.001996 wd 0.0500 time 0.2006 (0.2286) data time 0.0006 (0.0036) model time 0.2000 (0.2095) loss 2.4774 (3.7484) grad_norm 0.8319 (1.2408) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:48:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][510/625] eta 0:00:26 lr 0.001996 wd 0.0500 time 0.1969 (0.2283) data time 0.0009 (0.0036) model time 0.1960 (0.2096) loss 3.7635 (3.7538) grad_norm 1.3615 (1.2397) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:48:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][520/625] eta 0:00:23 lr 0.001996 wd 0.0500 time 0.2088 (0.2279) data time 0.0008 (0.0035) model time 0.2081 (0.2095) loss 4.0261 (3.7542) grad_norm 1.0604 (1.2398) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:48:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][530/625] eta 0:00:21 lr 0.001996 wd 0.0500 time 0.2067 (0.2275) data time 0.0009 (0.0035) model time 0.2058 (0.2095) loss 3.4601 (3.7492) grad_norm 1.3013 (1.2394) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:48:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][540/625] eta 0:00:19 lr 0.001996 wd 0.0500 time 0.1962 (0.2271) data time 0.0007 (0.0034) model time 0.1955 (0.2094) loss 3.9423 (3.7468) grad_norm 1.1906 (1.2378) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:48:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][550/625] eta 0:00:17 lr 0.001996 wd 0.0500 time 0.2481 (0.2269) data time 0.0016 (0.0034) model time 0.2465 (0.2095) loss 2.9842 (3.7465) grad_norm 2.1027 (1.2407) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:48:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][560/625] eta 0:00:14 lr 0.001996 wd 0.0500 time 0.2044 (0.2267) data time 
0.0010 (0.0034) model time 0.2034 (0.2096) loss 3.8424 (3.7541) grad_norm 1.3996 (1.2482) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:48:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][570/625] eta 0:00:12 lr 0.001996 wd 0.0500 time 0.2032 (0.2262) data time 0.0009 (0.0033) model time 0.2023 (0.2094) loss 3.9665 (3.7563) grad_norm 0.9301 (1.2508) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:48:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][580/625] eta 0:00:10 lr 0.001996 wd 0.0500 time 0.2003 (0.2258) data time 0.0010 (0.0033) model time 0.1992 (0.2092) loss 3.5082 (3.7582) grad_norm 1.0343 (1.2461) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:48:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][590/625] eta 0:00:07 lr 0.001996 wd 0.0500 time 0.2049 (0.2256) data time 0.0007 (0.0032) model time 0.2042 (0.2093) loss 4.3896 (3.7617) grad_norm 1.9348 (1.2491) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:48:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][600/625] eta 0:00:05 lr 0.001996 wd 0.0500 time 0.2032 (0.2254) data time 0.0009 (0.0032) model time 0.2023 (0.2094) loss 4.2071 (3.7634) grad_norm 0.9434 (1.2476) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:48:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][610/625] eta 0:00:03 lr 0.001996 wd 0.0500 time 0.1921 (0.2250) data time 0.0005 (0.0032) model time 0.1916 (0.2092) loss 3.6248 (3.7610) grad_norm 2.1682 (1.2481) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 13:48:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][620/625] eta 0:00:01 lr 0.001996 wd 0.0500 time 0.1934 (0.2248) data time 0.0007 (0.0031) model time 0.1927 (0.2092) loss 3.9738 (3.7618) grad_norm 0.7879 (1.2453) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 
13:48:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 27 training takes 0:02:20 [2024-07-29 13:48:34 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 13:48:36 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 13:48:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.504 (0.504) Loss 0.7959 (0.7959) Acc@1 83.691 (83.691) Acc@5 97.070 (97.070) Mem 8978MB [2024-07-29 13:48:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.102) Loss 1.4688 (1.0787) Acc@1 67.480 (76.598) Acc@5 89.209 (94.363) Mem 8978MB [2024-07-29 13:48:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.053 (0.079) Loss 1.6104 (1.3027) Acc@1 65.039 (71.791) Acc@5 87.109 (91.271) Mem 8978MB [2024-07-29 13:48:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 71.625 Acc@5 91.263 [2024-07-29 13:48:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 71.6% [2024-07-29 13:48:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 71.63% [2024-07-29 13:48:40 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 13:48:41 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-29 13:48:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.522 (0.522) Loss 5.3008 (5.3008) Acc@1 13.330 (13.330) Acc@5 28.906 (28.906) Mem 8978MB
[2024-07-29 13:48:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.104) Loss 5.4648 (5.4936) Acc@1 10.498 (9.783) Acc@5 24.658 (23.389) Mem 8978MB
[2024-07-29 13:48:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.053 (0.080) Loss 5.6602 (5.4970) Acc@1 8.203 (9.447) Acc@5 18.994 (22.710) Mem 8978MB
[2024-07-29 13:48:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 9.843 Acc@5 23.452
[2024-07-29 13:48:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 9.8%
[2024-07-29 13:48:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 9.84%
[2024-07-29 13:48:43 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 13:48:44 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 13:48:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][0/625] eta 0:08:52 lr 0.001996 wd 0.0500 time 0.8513 (0.8513) data time 0.5255 (0.5255) model time 0.0000 (0.0000) loss 3.6914 (3.6914) grad_norm 0.8760 (0.8760) loss_scale 16384.0000 (16384.0000) mem 8971MB
[2024-07-29 13:48:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][10/625] eta 0:02:46 lr 0.001996 wd 0.0500 time 0.2011 (0.2702) data time 0.0009 (0.0488) model time 0.0000 (0.0000) loss 3.3759 (3.7169) grad_norm 0.8295 (1.2302) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:48:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][20/625] eta 0:02:28 lr 0.001996 wd 0.0500 time 0.1966 (0.2458) data time 0.0010 (0.0260) model time 0.0000 (0.0000) loss 2.9328 (3.5916) grad_norm 1.1167 (1.1099) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:48:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][30/625] eta 0:02:23 lr 0.001996 wd 0.0500 time 0.4606 (0.2407) data time 0.0009 (0.0180) model time 0.0000 (0.0000) loss 3.3249 (3.6949) grad_norm 1.9872 (1.1711) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:48:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][40/625] eta 0:02:17 lr 0.001996 wd 0.0500 time 0.2069 (0.2353) data time 0.0010 (0.0138) model time 0.0000 (0.0000) loss 3.4134 (3.6211) grad_norm 1.9548 (1.2165) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:48:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][50/625] eta 0:02:11 lr 0.001996 wd 0.0500 time 0.1987 (0.2288) data time 0.0009 (0.0113) model time 0.0000 (0.0000) loss 4.3400 (3.7333) grad_norm 0.9189 (1.2320) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:48:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][60/625] eta 0:02:07 lr 0.001996 wd 0.0500 time 0.2500 (0.2263) data time 0.0013 (0.0096) model time 0.2486 (0.2126) loss 3.9699 (3.7249) grad_norm 1.4912 (1.2303) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:49:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][70/625] eta 0:02:05 lr 0.001996 wd 0.0500 time 0.2051 (0.2253) data time 0.0011 (0.0084) model time 0.2041 (0.2156) loss 3.2888 (3.7109) grad_norm 0.9060 (1.2428) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:49:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][80/625] eta 0:02:01 lr 0.001996 wd 0.0500 time 0.2102 (0.2230) data time 0.0009 (0.0075) model time 0.2094 (0.2123) loss 4.2495 (3.7051) grad_norm 1.5944 (1.2371) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:49:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][90/625] eta 0:01:58 lr 0.001996 wd 0.0500 time 0.1983 (0.2213) data time 0.0009 (0.0068) model time 0.1974 (0.2108) loss 4.0878 (3.6635) grad_norm 1.1653 (1.2443) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:49:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][100/625] eta 0:01:56 lr 0.001996 wd 0.0500 time 0.2456 (0.2210) data time 0.0010 (0.0062) model time 0.2446 (0.2121) loss 3.4180 (3.6541) grad_norm 1.2178 (1.2346) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:49:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][110/625] eta 0:01:53 lr 0.001996 wd 0.0500 time 0.2006 (0.2208) data time 0.0007 (0.0057) model time 0.1999 (0.2130) loss 3.6150 (3.6776) grad_norm 1.0257 (1.2164) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:49:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][120/625] eta 0:01:50 lr 0.001996 wd 0.0500 time 0.2027 (0.2195) data time 0.0011 (0.0053) model time 0.2016 (0.2117) loss 3.6078 (3.6856) grad_norm 0.7562 (1.2127) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:49:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][130/625] eta 0:01:48 lr 0.001996 wd 0.0500 time 0.2321 (0.2185) data time 0.0013 (0.0050) model time 0.2309 (0.2110) loss 2.9176 (3.6740) grad_norm 0.9923 (1.2077) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:49:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][140/625] eta 0:01:45 lr 0.001996 wd 0.0500 time 0.2120 (0.2180) data time 0.0008 (0.0047) model time 0.2113 (0.2109) loss 4.2305 (3.6936) grad_norm 1.2471 (1.2035) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:49:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][150/625] eta 0:01:43 lr 0.001996 wd 0.0500 time 0.2048 (0.2171) data time 0.0011 (0.0045) model time 0.2036 (0.2101) loss 2.8104 (3.7027) grad_norm 1.2026 (1.2003) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:49:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][160/625] eta 0:01:41 lr 0.001996 wd 0.0500 time 0.2270 (0.2178) data time 0.0014 (0.0043) model time 0.2255 (0.2116) loss 2.8167 (3.7051) grad_norm 1.3104 (1.2233) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:49:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][170/625] eta 0:01:38 lr 0.001996 wd 0.0500 time 0.1981 (0.2171) data time 0.0010 (0.0041) model time 0.1972 (0.2111) loss 3.4403 (3.7108) grad_norm 1.0982 (1.2165) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:49:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][180/625] eta 0:01:36 lr 0.001996 wd 0.0500 time 0.2029 (0.2163) data time 0.0008 (0.0039) model time 0.2021 (0.2103) loss 4.7599 (3.7079) grad_norm 1.0298 (1.2206) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:49:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][190/625] eta 0:01:33 lr 0.001996 wd 0.0500 time 0.2206 (0.2157) data time 0.0008 (0.0038) model time 0.2197 (0.2099) loss 4.1482 (3.6895) grad_norm 1.2633 (1.2229) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:49:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][200/625] eta 0:01:31 lr 0.001996 wd 0.0500 time 0.1979 (0.2156) data time 0.0007 (0.0036) model time 0.1972 (0.2101) loss 3.1469 (3.6917) grad_norm 1.0776 (1.2198) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:49:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][210/625] eta 0:01:29 lr 0.001996 wd 0.0500 time 0.2040 (0.2156) data time 0.0014 (0.0035) model time 0.2026 (0.2103) loss 2.4528 (3.6760) grad_norm 1.2213 (1.2198) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:49:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][220/625] eta 0:01:27 lr 0.001996 wd 0.0500 time 0.2296 (0.2151) data time 0.0011 (0.0034) model time 0.2284 (0.2100) loss 2.7097 (3.6759) grad_norm 2.1541 (1.2291) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:49:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][230/625] eta 0:01:25 lr 0.001996 wd 0.0500 time 0.2040 (0.2152) data time 0.0009 (0.0033) model time 0.2032 (0.2103) loss 4.3543 (3.6865) grad_norm 1.1006 (1.2387) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:49:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][240/625] eta 0:01:22 lr 0.001996 wd 0.0500 time 0.1994 (0.2147) data time 0.0010 (0.0032) model time 0.1984 (0.2099) loss 3.1188 (3.6848) grad_norm 0.9969 (1.2343) loss_scale 32768.0000 (17063.8340) mem 8975MB
[2024-07-29 13:49:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 13:49:37 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 13:49:37 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 13:52:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 13:52:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 13:52:48 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 13:52:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 13:52:59 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 13:52:59 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 13:52:59 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 13:52:59 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 28)
[2024-07-29 13:52:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 13:53:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][250/625] eta 0:06:55 lr 0.001996 wd 0.0500 time 0.2235 (1.1087) data time 0.0008 (0.0656) model time 0.2227 (1.0430) loss 3.8860 (4.2451) grad_norm 1.2256 (1.1330) loss_scale 32768.0000 (32768.0000) mem 8977MB
[2024-07-29 13:53:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][260/625] eta 0:03:53 lr 0.001996 wd 0.0500 time 0.2124 (0.6387) data time 0.0010 (0.0317) model time 0.2114 (0.6070) loss 4.0803 (4.0037) grad_norm 1.5771 (1.3331) loss_scale 32768.0000 (32768.0000) mem 8977MB
[2024-07-29 13:53:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][270/625] eta 0:02:55 lr 0.001996 wd 0.0500 time 0.2159 (0.4932) data time 0.0007 (0.0212) model time 0.2152 (0.4720) loss 4.3585 (4.0409) grad_norm 2.0490 (1.3155) loss_scale 32768.0000 (32768.0000) mem 8977MB
[2024-07-29 13:53:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][280/625] eta 0:02:25 lr 0.001996 wd 0.0500 time 0.2207 (0.4216) data time 0.0011 (0.0161) model time 0.2196 (0.4056) loss 4.0071 (3.9410) grad_norm 1.4563 (1.2646) loss_scale 32768.0000 (32768.0000) mem 8977MB
[2024-07-29 13:53:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][290/625] eta 0:02:07 lr 0.001996 wd 0.0500 time 0.2130 (0.3804) data time 0.0010 (0.0130) model time 0.2119 (0.3673) loss 3.5310 (3.9149) grad_norm 1.1385 (1.3125) loss_scale 32768.0000 (32768.0000) mem 8977MB
[2024-07-29 13:53:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][300/625] eta 0:01:54 lr 0.001996 wd 0.0500 time 0.2131 (0.3526) data time 0.0008 (0.0110) model time 0.2123 (0.3416) loss 2.7305 (3.8648) grad_norm 1.0429 (1.2706) loss_scale 32768.0000 (32768.0000) mem 8977MB
[2024-07-29 13:53:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][310/625] eta 0:01:44 lr 0.001996 wd 0.0500 time 0.2066 (0.3324) data time 0.0010 (0.0096) model time 0.2056 (0.3228) loss 3.8533 (3.8429) grad_norm 1.0964 (1.2440) loss_scale 32768.0000 (32768.0000) mem 8977MB
[2024-07-29 13:53:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][320/625] eta 0:01:36 lr 0.001995 wd 0.0500 time 0.2322 (0.3178) data time 0.0010 (0.0086) model time 0.2312 (0.3093) loss 3.8461 (3.8181) grad_norm 1.2950 (1.2429) loss_scale 32768.0000 (32768.0000) mem 8977MB
[2024-07-29 13:53:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][330/625] eta 0:01:30 lr 0.001995 wd 0.0500 time 0.2119 (0.3066) data time 0.0008 (0.0077) model time 0.2112 (0.2989) loss 3.6296 (3.7883) grad_norm 0.7594 (1.2224) loss_scale 32768.0000 (32768.0000) mem 8977MB
[2024-07-29 13:53:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][340/625] eta 0:01:24 lr 0.001995 wd 0.0500 time 0.2330 (0.2975) data time 0.0012 (0.0071) model time 0.2319 (0.2904) loss 3.9688 (3.8040) grad_norm 1.3794 (1.2144) loss_scale 32768.0000 (32768.0000) mem 8977MB
[2024-07-29 13:53:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][350/625] eta 0:01:19 lr 0.001995 wd 0.0500 time 0.2111 (0.2902) data time 0.0010 (0.0065) model time 0.2101 (0.2837) loss 4.6966 (3.8274) grad_norm 0.9494 (1.2030) loss_scale 32768.0000 (32768.0000) mem 8977MB
[2024-07-29 13:53:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][360/625] eta 0:01:15 lr 0.001995 wd 0.0500 time 0.2129 (0.2839) data time 0.0011 (0.0061) model time 0.2118 (0.2778) loss 4.2248 (3.8280) grad_norm 0.8667 (inf) loss_scale 16384.0000 (31391.1933) mem 8977MB
[2024-07-29 13:53:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][370/625] eta 0:01:11 lr 0.001995 wd 0.0500 time 0.2119 (0.2788) data time 0.0010 (0.0057) model time 0.2109 (0.2731) loss 3.8384 (3.8152) grad_norm 1.1325 (inf) loss_scale 16384.0000 (30227.8450) mem 8977MB
[2024-07-29 13:53:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][380/625] eta 0:01:07 lr 0.001995 wd 0.0500 time 0.2171 (0.2742) data time 0.0008 (0.0054) model time 0.2163 (0.2688) loss 4.2850 (3.8148) grad_norm 1.3490 (inf) loss_scale 16384.0000 (29231.8849) mem 8977MB
[2024-07-29 13:53:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][390/625] eta 0:01:03 lr 0.001995 wd 0.0500 time 0.2091 (0.2701) data time 0.0009 (0.0051) model time 0.2081 (0.2651) loss 3.4254 (3.8039) grad_norm 0.9385 (inf) loss_scale 16384.0000 (28369.6107) mem 8977MB
[2024-07-29 13:53:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][400/625] eta 0:01:00 lr 0.001995 wd 0.0500 time 0.2201 (0.2667) data time 0.0009 (0.0048) model time 0.2193 (0.2619) loss 4.8909 (3.8079) grad_norm 1.0732 (inf) loss_scale 16384.0000 (27615.7987) mem 8977MB
[2024-07-29 13:53:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][410/625] eta 0:00:56 lr 0.001995 wd 0.0500 time 0.2248 (0.2639) data time 0.0010 (0.0046) model time 0.2238 (0.2593) loss 4.1825 (3.8154) grad_norm 1.7091 (inf) loss_scale 16384.0000 (26951.1953) mem 8977MB
[2024-07-29 13:53:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][420/625] eta 0:00:53 lr 0.001995 wd 0.0500 time 0.2136 (0.2613) data time 0.0008 (0.0044) model time 0.2128 (0.2569) loss 3.6108 (3.7980) grad_norm 1.3245 (inf) loss_scale 16384.0000 (26360.8492) mem 8977MB
[2024-07-29 13:53:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][430/625] eta 0:00:50 lr 0.001995 wd 0.0500 time 0.2142 (0.2588) data time 0.0008 (0.0043) model time 0.2135 (0.2545) loss 4.6473 (3.7986) grad_norm 0.7852 (inf) loss_scale 16384.0000 (25832.9735) mem 8977MB
[2024-07-29 13:53:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][440/625] eta 0:00:47 lr 0.001995 wd 0.0500 time 0.2156 (0.2567) data time 0.0010 (0.0041) model time 0.2146 (0.2526) loss 2.7271 (3.7822) grad_norm 0.9296 (inf) loss_scale 16384.0000 (25358.1508) mem 8977MB
[2024-07-29 13:53:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][450/625] eta 0:00:44 lr 0.001995 wd 0.0500 time 0.2228 (0.2547) data time 0.0008 (0.0040) model time 0.2220 (0.2507) loss 3.5740 (3.7732) grad_norm 1.6543 (inf) loss_scale 16384.0000 (24928.7656) mem 8977MB
[2024-07-29 13:53:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][460/625] eta 0:00:41 lr 0.001995 wd 0.0500 time 0.2167 (0.2530) data time 0.0009 (0.0038) model time 0.2158 (0.2491) loss 4.2111 (3.7666) grad_norm 0.9812 (inf) loss_scale 16384.0000 (24538.5936) mem 8977MB
[2024-07-29 13:54:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][470/625] eta 0:00:38 lr 0.001995 wd 0.0500 time 0.2132 (0.2514) data time 0.0011 (0.0037) model time 0.2121 (0.2477) loss 2.7596 (3.7659) grad_norm 1.0721 (inf) loss_scale 16384.0000 (24182.4978) mem 8977MB
[2024-07-29 13:54:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][480/625] eta 0:00:36 lr 0.001995 wd 0.0500 time 0.2160 (0.2500) data time 0.0009 (0.0036) model time 0.2151 (0.2463) loss 2.7716 (3.7607) grad_norm 1.5526 (inf) loss_scale 16384.0000 (23856.2008) mem 8977MB
[2024-07-29 13:54:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][490/625] eta 0:00:33 lr 0.001995 wd 0.0500 time 0.2184 (0.2487) data time 0.0009 (0.0035) model time 0.2175 (0.2452) loss 3.3180 (3.7496) grad_norm 0.8334 (inf) loss_scale 16384.0000 (23556.1124) mem 8977MB
[2024-07-29 13:54:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][500/625] eta 0:00:30 lr 0.001995 wd 0.0500 time 0.2164 (0.2474) data time 0.0012 (0.0034) model time 0.2152 (0.2440) loss 4.4856 (3.7362) grad_norm 0.9507 (inf) loss_scale 16384.0000 (23279.1969) mem 8977MB
[2024-07-29 13:54:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][510/625] eta 0:00:28 lr 0.001995 wd 0.0500 time 0.2275 (0.2464) data time 0.0008 (0.0034) model time 0.2267 (0.2430) loss 2.4130 (3.7243) grad_norm 0.9339 (inf) loss_scale 16384.0000 (23022.8699) mem 8977MB
[2024-07-29 13:54:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][520/625] eta 0:00:25 lr 0.001995 wd 0.0500 time 0.2069 (0.2451) data time 0.0014 (0.0033) model time 0.2055 (0.2418) loss 4.2112 (3.7341) grad_norm 1.2048 (inf) loss_scale 16384.0000 (22784.9176) mem 8977MB
[2024-07-29 13:54:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][530/625] eta 0:00:23 lr 0.001995 wd 0.0500 time 0.2155 (0.2440) data time 0.0010 (0.0032) model time 0.2145 (0.2408) loss 3.6240 (3.7341) grad_norm 1.4247 (inf) loss_scale 16384.0000 (22563.4325) mem 8977MB
[2024-07-29 13:54:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][540/625] eta 0:00:20 lr 0.001995 wd 0.0500 time 0.2188 (0.2431) data time 0.0010 (0.0032) model time 0.2179 (0.2399) loss 3.6395 (3.7181) grad_norm 0.8218 (inf) loss_scale 16384.0000 (22356.7625) mem 8977MB
[2024-07-29 13:54:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][550/625] eta 0:00:18 lr 0.001995 wd 0.0500 time 0.2042 (0.2422) data time 0.0012 (0.0031) model time 0.2031 (0.2391) loss 4.0921 (3.7149) grad_norm 0.8947 (inf) loss_scale 16384.0000 (22163.4693) mem 8977MB
[2024-07-29 13:54:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][560/625] eta 0:00:15 lr 0.001995 wd 0.0500 time 0.2111 (0.2414) data time 0.0009 (0.0030) model time 0.2102 (0.2384) loss 4.3548 (3.7286) grad_norm 0.8914 (inf) loss_scale 16384.0000 (21982.2947) mem 8977MB
[2024-07-29 13:54:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][570/625] eta 0:00:13 lr 0.001995 wd 0.0500 time 0.2140 (0.2406) data time 0.0011 (0.0030) model time 0.2128 (0.2376) loss 2.4618 (3.7341) grad_norm 1.2180 (inf) loss_scale 16384.0000 (21812.1337) mem 8977MB
[2024-07-29 13:54:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][580/625] eta 0:00:10 lr 0.001995 wd 0.0500 time 0.2123 (0.2397) data time 0.0010 (0.0029) model time 0.2113 (0.2368) loss 3.7537 (3.7342) grad_norm 0.8342 (inf) loss_scale 16384.0000 (21652.0118) mem 8977MB
[2024-07-29 13:54:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][590/625] eta 0:00:08 lr 0.001995 wd 0.0500 time 0.2061 (0.2390) data time 0.0010 (0.0029) model time 0.2051 (0.2361) loss 4.1440 (3.7349) grad_norm 1.3210 (inf) loss_scale 16384.0000 (21501.0659) mem 8977MB
[2024-07-29 13:54:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][600/625] eta 0:00:05 lr 0.001995 wd 0.0500 time 0.2140 (0.2383) data time 0.0008 (0.0028) model time 0.2133 (0.2355) loss 3.4741 (3.7320) grad_norm 1.0208 (inf) loss_scale 16384.0000 (21358.5292) mem 8977MB
[2024-07-29 13:54:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][610/625] eta 0:00:03 lr 0.001995 wd 0.0500 time 0.2093 (0.2376) data time 0.0007 (0.0028) model time 0.2085 (0.2348) loss 4.1121 (3.7314) grad_norm 1.1932 (inf) loss_scale 16384.0000 (21223.7182) mem 8977MB
[2024-07-29 13:54:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][620/625] eta 0:00:01 lr 0.001995 wd 0.0500 time 0.2098 (0.2369) data time 0.0005 (0.0027) model time 0.2094 (0.2342) loss 4.1041 (3.7288) grad_norm 0.7718 (inf) loss_scale 16384.0000 (21096.0211) mem 8977MB
[2024-07-29 13:54:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 28 training takes 0:01:30
[2024-07-29 13:54:35 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 13:54:37 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 13:54:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.445 (0.445) Loss 0.8892 (0.8892) Acc@1 82.666 (82.666) Acc@5 96.240 (96.240) Mem 8977MB
[2024-07-29 13:54:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.060 (0.101) Loss 1.4775 (1.0961) Acc@1 66.895 (76.665) Acc@5 89.307 (94.141) Mem 8977MB
[2024-07-29 13:54:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.6152 (1.3163) Acc@1 66.016 (71.968) Acc@5 87.305 (91.164) Mem 8977MB
[2024-07-29 13:54:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 71.753 Acc@5 91.103
[2024-07-29 13:54:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 71.8%
[2024-07-29 13:54:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 71.75%
[2024-07-29 13:54:40 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 13:54:42 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 13:54:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.443 (0.443) Loss 4.8477 (4.8477) Acc@1 18.994 (18.994) Acc@5 36.328 (36.328) Mem 8977MB
[2024-07-29 13:54:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.096) Loss 5.0938 (5.0124) Acc@1 13.965 (14.551) Acc@5 30.371 (31.667) Mem 8977MB
[2024-07-29 13:54:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 5.3125 (5.0748) Acc@1 11.133 (13.707) Acc@5 23.584 (29.955) Mem 8977MB
[2024-07-29 13:54:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 13.958 Acc@5 30.556
[2024-07-29 13:54:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 14.0%
[2024-07-29 13:54:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 13.96%
[2024-07-29 13:54:44 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 13:54:46 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 13:54:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][0/625] eta 0:07:47 lr 0.001995 wd 0.0500 time 0.7475 (0.7475) data time 0.3951 (0.3951) model time 0.0000 (0.0000) loss 2.5665 (2.5665) grad_norm 0.9529 (0.9529) loss_scale 16384.0000 (16384.0000) mem 8971MB
[2024-07-29 13:54:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][10/625] eta 0:02:40 lr 0.001995 wd 0.0500 time 0.2227 (0.2618) data time 0.0009 (0.0369) model time 0.0000 (0.0000) loss 3.7624 (3.6138) grad_norm 0.9112 (1.0003) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:54:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][20/625] eta 0:02:24 lr 0.001995 wd 0.0500 time 0.2121 (0.2396) data time 0.0010 (0.0199) model time 0.0000 (0.0000) loss 3.9780 (3.8141) grad_norm 1.0259 (1.0529) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:54:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][30/625] eta 0:02:18 lr 0.001995 wd 0.0500 time 0.2137 (0.2329) data time 0.0013 (0.0139) model time 0.0000 (0.0000) loss 3.2009 (3.8270) grad_norm 2.3963 (1.1280) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:54:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][40/625] eta 0:02:16 lr 0.001995 wd 0.0500 time 0.2101 (0.2340) data time 0.0008 (0.0107) model time 0.0000 (0.0000) loss 4.3360 (3.7928) grad_norm 1.4110 (1.1826) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:54:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][50/625] eta 0:02:12 lr 0.001995 wd 0.0500 time 0.2105 (0.2307) data time 0.0013 (0.0088) model time 0.0000 (0.0000) loss 3.9954 (3.8444) grad_norm 1.8054 (1.1745) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][60/625] eta 0:02:09 lr 0.001995 wd 0.0500 time 0.2199 (0.2286) data time 0.0011 (0.0076) model time 0.2188 (0.2166) loss 3.8025 (3.8429) grad_norm 0.7765 (1.2083) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][70/625] eta 0:02:05 lr 0.001995 wd 0.0500 time 0.2156 (0.2269) data time 0.0009 (0.0067) model time 0.2147 (0.2161) loss 4.2890 (3.7860) grad_norm 1.6251 (1.1961) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][80/625] eta 0:02:02 lr 0.001995 wd 0.0500 time 0.2060 (0.2256) data time 0.0010 (0.0060) model time 0.2050 (0.2158) loss 4.2333 (3.7325) grad_norm 1.1283 (1.1773) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][90/625] eta 0:02:00 lr 0.001995 wd 0.0500 time 0.2178 (0.2246) data time 0.0012 (0.0055) model time 0.2166 (0.2156) loss 3.7191 (3.6890) grad_norm 1.0485 (1.1783) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][100/625] eta 0:01:57 lr 0.001995 wd 0.0500 time 0.2090 (0.2235) data time 0.0010 (0.0050) model time 0.2079 (0.2150) loss 3.5969 (3.7044) grad_norm 1.1596 (1.1912) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][110/625] eta 0:01:54 lr 0.001995 wd 0.0500 time 0.2109 (0.2225) data time 0.0013 (0.0047) model time 0.2096 (0.2144) loss 3.4952 (3.7093) grad_norm 1.4166 (1.1821) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][120/625] eta 0:01:51 lr 0.001995 wd 0.0500 time 0.2142 (0.2217) data time 0.0011 (0.0044) model time 0.2132 (0.2140) loss 3.6929 (3.7012) grad_norm 1.2462 (1.1890) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][130/625] eta 0:01:49 lr 0.001995 wd 0.0500 time 0.2211 (0.2213) data time 0.0009 (0.0041) model time 0.2201 (0.2141) loss 4.1725 (3.7340) grad_norm 1.4528 (1.1827) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][140/625] eta 0:01:47 lr 0.001995 wd 0.0500 time 0.2110 (0.2209) data time 0.0011 (0.0039) model time 0.2099 (0.2141) loss 2.7156 (3.7111) grad_norm 1.7420 (1.1866) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][150/625] eta 0:01:44 lr 0.001995 wd 0.0500 time 0.2146 (0.2205) data time 0.0007 (0.0037) model time 0.2138 (0.2141) loss 2.5126 (3.7012) grad_norm 0.7724 (1.1920) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][160/625] eta 0:01:42 lr 0.001995 wd 0.0500 time 0.2085 (0.2201) data time 0.0010 (0.0036) model time 0.2075 (0.2140) loss 4.2294 (3.7046) grad_norm 1.1398 (1.2035) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][170/625] eta 0:01:39 lr 0.001995 wd 0.0500 time 0.2183 (0.2197) data time 0.0010 (0.0034) model time 0.2173 (0.2139) loss 3.8714 (3.7181) grad_norm 2.1482 (1.2066) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][180/625] eta 0:01:38 lr 0.001995 wd 0.0500 time 0.2086 (0.2223) data time 0.0009 (0.0033) model time 0.2077 (0.2178) loss 4.2148 (3.7360) grad_norm 1.9410 (1.2090) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][190/625] eta 0:01:36 lr 0.001995 wd 0.0500 time 0.2116 (0.2219) data time 0.0007 (0.0032) model time 0.2109 (0.2175) loss 3.7077 (3.7380) grad_norm 1.5225 (1.2172) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][200/625] eta 0:01:34 lr 0.001995 wd 0.0500 time 0.2035 (0.2216) data time 0.0010 (0.0031) model time 0.2026 (0.2174) loss 2.7207 (3.7378) grad_norm 1.5519 (1.2137) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][210/625] eta 0:01:31 lr 0.001995 wd 0.0500 time 0.2170 (0.2213) data time 0.0007 (0.0030) model time 0.2163 (0.2172) loss 4.0689 (3.7437) grad_norm 1.2371 (1.2248) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][220/625] eta 0:01:29 lr 0.001995 wd 0.0500 time 0.2059 (0.2211) data time 0.0010 (0.0029) model time 0.2049 (0.2171) loss 4.0193 (3.7370) grad_norm 0.9794 (1.2191) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][230/625] eta 0:01:27 lr 0.001995 wd 0.0500 time 0.2105 (0.2210) data time 0.0008 (0.0028) model time 0.2097 (0.2171) loss 3.5923 (3.7388) grad_norm 1.4909 (1.2151) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][240/625] eta 0:01:24 lr 0.001995 wd 0.0500 time 0.2250 (0.2207) data time 0.0008 (0.0028) model time 0.2242 (0.2169) loss 4.0903 (3.7536) grad_norm 1.6692 (1.2316) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][250/625] eta 0:01:22 lr 0.001994 wd 0.0500 time 0.2245 (0.2206) data time 0.0011 (0.0027) model time 0.2233 (0.2169) loss 3.9980 (3.7557) grad_norm 0.9928 (1.2319) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][260/625] eta 0:01:20 lr 0.001994 wd 0.0500 time 0.2237 (0.2205) data time 0.0007 (0.0026) model time 0.2230 (0.2169) loss 3.6177 (3.7448) grad_norm 0.7914 (1.2294) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][270/625] eta 0:01:18 lr 0.001994 wd 0.0500 time 0.2184 (0.2203) data time 0.0007 (0.0026) model time 0.2177 (0.2168) loss 4.0020 (3.7489) grad_norm 1.1614 (1.2239) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][280/625] eta 0:01:15 lr 0.001994 wd 0.0500 time 0.2140 (0.2203) data time 0.0009 (0.0025) model time 0.2130 (0.2169) loss 4.3682 (3.7374) grad_norm 1.2441 (1.2187) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][290/625] eta 0:01:13 lr 0.001994 wd 0.0500 time 0.2083 (0.2201) data time 0.0008 (0.0025) model time 0.2075 (0.2167) loss 4.7628 (3.7506) grad_norm 1.3305 (1.2169) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][300/625] eta 0:01:11 lr 0.001994 wd 0.0500 time 0.2141 (0.2198) data time 0.0010 (0.0024) model time 0.2131 (0.2165) loss 4.1035 (3.7493) grad_norm 1.3552 (1.2211) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][310/625] eta 0:01:09 lr 0.001994 wd 0.0500 time 0.2176 (0.2196) data time 0.0011 (0.0024) model time 0.2165 (0.2163) loss 3.8438 (3.7444) grad_norm 0.9220 (1.2103) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][320/625] eta 0:01:06 lr 0.001994 wd 0.0500 time 0.2115 (0.2196) data time 0.0010 (0.0024) model time 0.2105 (0.2163) loss 4.1576 (3.7361) grad_norm 1.1234 (1.2043) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:55:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][330/625] eta 0:01:04 lr 0.001994 wd 0.0500 time 0.2124 (0.2194) data time 0.0007 (0.0024) model time 0.2116 (0.2162) loss 3.0224 (3.7288) grad_norm 0.8561 (1.2009) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:56:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][340/625] eta 0:01:02 lr 0.001994 wd 0.0500 time 0.2063 (0.2192) data time 0.0008 (0.0023) model time 0.2056 (0.2161) loss 3.4777 (3.7253) grad_norm 1.0375 (1.2004) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:56:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][350/625] eta 0:01:00 lr 0.001994 wd 0.0500 time 0.2091 (0.2191) data time 0.0013 (0.0023) model time 0.2079 (0.2159) loss 4.5763 (3.7305) grad_norm 1.7538 (1.2130) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:56:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][360/625] eta 0:00:58 lr 0.001994 wd 0.0500 time 0.2151 (0.2190) data time 0.0008 (0.0023) model time 0.2143 (0.2159) loss 2.5441 (3.7330) grad_norm 1.4455 (1.2121) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 13:56:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 13:56:07 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 13:56:07 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 14:29:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 14:29:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 14:29:15 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 14:29:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 14:29:26 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 14:29:27 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 14:29:27 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 14:29:27 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 29)
[2024-07-29 14:29:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 14:29:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][370/625] eta 0:08:20 lr 0.001994 wd 0.0500 time 0.1984 (1.9645) data time 0.0007 (0.1611) model time 0.1977 (1.8034) loss 4.3801 (4.1920) grad_norm 1.0988 (1.1099) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:29:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][380/625] eta 0:02:52 lr 0.001994 wd 0.0500 time 0.2000 (0.7038) data time 0.0006 (0.0467) model time 0.1995 (0.6572) loss 3.9685 (4.0092) grad_norm 1.3673 (1.2026) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:29:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][390/625] eta 0:01:56 lr 0.001994 wd 0.0500 time 0.1969 (0.4937) data time 0.0009 (0.0276) model time 0.1961 (0.4661) loss 3.6319 (4.0210) grad_norm 1.0340 (1.2186) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:29:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][400/625] eta 0:01:31 lr 0.001994 wd 0.0500 time 0.1974 (0.4072) data time 0.0006 (0.0197) model time 0.1968 (0.3875) loss 3.0318 (3.9671) grad_norm 1.6674 (1.2382) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:29:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][410/625] eta 0:01:17 lr 0.001994 wd 0.0500 time 0.2047 (0.3605) data time 0.0007 (0.0155) model time 0.2040 (0.3450) loss 3.4823 (3.9210) grad_norm 1.6519 (1.2508) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:29:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][420/625] eta 0:01:07 lr 0.001994 wd 0.0500 time 0.1995 (0.3306) data time 0.0007 (0.0128) model time 0.1988 (0.3179) loss 4.3989 (3.9198) grad_norm 1.7044 (1.2734) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:29:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][430/625] eta 0:01:00 lr 0.001994 wd 0.0500 time 0.1997 (0.3101) data time 0.0006 (0.0109) model time 0.1991 (0.2992) loss 4.2095 (3.8898) grad_norm 0.9266 (1.2441) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:29:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][440/625] eta 0:00:54 lr 0.001994 wd 0.0500 time 0.1999 (0.2953) data time 0.0006 (0.0095) model time 0.1993 (0.2857) loss 3.6190 (3.8553) grad_norm 2.0972 (1.2572) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:29:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][450/625] eta 0:00:49 lr 0.001994 wd 0.0500 time 0.1986 (0.2839) data time 0.0009 (0.0085) model time 0.1977 (0.2754) loss 3.6902 (3.8104) grad_norm 1.4168 (1.2583) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:29:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][460/625] eta 0:00:45 lr 0.001994 wd 0.0500 time 0.1972 (0.2751) data time 0.0009 (0.0077) model time 0.1963 (0.2674) loss 3.2726 (3.7982) grad_norm 1.0382 (1.2447) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:29:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][470/625] eta 0:00:41 lr 0.001994 wd 0.0500 time 0.2026 (0.2679) data time 0.0008 (0.0070) model time 0.2018 (0.2609) loss 3.5851 (3.8227) grad_norm 1.2175 (1.2324) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:30:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][480/625] eta 0:00:38 lr 0.001994 wd 0.0500 time 0.1994 (0.2621) data time 0.0009 (0.0065) model time 0.1985 (0.2556) loss 3.8366 (3.8102) grad_norm 1.4489 (1.2273) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:30:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][490/625] eta 0:00:34 lr 0.001994 wd 0.0500 time 0.2030 (0.2573) data time 0.0009 (0.0060) model time 0.2022 (0.2512) loss 3.2914 (3.8129) grad_norm 0.8848 (1.2259) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:30:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][500/625] eta 0:00:31 lr 0.001994 wd 0.0500 time 0.1994 (0.2532) data time 0.0008 (0.0057) model time 0.1986 (0.2475) loss 4.1736 (3.8084) grad_norm 1.6254 (1.2314) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:30:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][510/625] eta 0:00:28 lr 0.001994 wd 0.0500 time 0.2001 (0.2496) data time 0.0006 (0.0054) model time 0.1995 (0.2443) loss 3.5289 (3.7900) grad_norm 1.5355 (1.2501) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:30:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][520/625] eta 0:00:25 lr 0.001994 wd 0.0500 time 0.1992 (0.2465) data time 0.0007 (0.0051) model time 0.1985 (0.2414) loss 3.6099 (3.7784) grad_norm 1.4036 (1.2647) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:30:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][530/625] eta 0:00:23 lr 0.001994 wd 0.0500 time 0.1998 (0.2437) data time 0.0006 (0.0048) model time 0.1991 (0.2389) loss 3.8491 (3.7772) grad_norm 0.8578 (1.2618) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:30:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][540/625] eta 0:00:20 lr 0.001994 wd 0.0500 time 0.1993 (0.2413) data time 0.0008 (0.0046) model time 0.1985 (0.2367) loss 2.5887 (3.7734) grad_norm 1.1166 (1.2504) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:30:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][550/625] eta 0:00:17 lr 0.001994 wd 0.0500 time 0.2001 (0.2391) data time 0.0007 (0.0044) model time 0.1994 (0.2347) loss 3.1187 (3.7647) grad_norm 1.3309 (1.2644) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:30:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][560/625] eta 0:00:15 lr 0.001994 wd 0.0500 time 0.1986 (0.2372) data time 0.0006 (0.0042) model time 0.1979 (0.2330) loss 4.0817 (3.7649) grad_norm 1.8687 (1.2685) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:30:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][570/625] eta 0:00:12 lr 0.001994 wd 0.0500 time 0.2083 (0.2355) data time 0.0008 (0.0040) model time 0.2075 (0.2315) loss 3.7561 (3.7503) grad_norm 1.6211 (1.2818) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:30:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][580/625] eta 0:00:10 lr 0.001994 wd 0.0500 time 0.2047 (0.2339) data time 0.0007 (0.0039) model time 0.2040 (0.2300) loss 3.7015 (3.7406) grad_norm 1.1670 (1.2977) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:30:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][590/625] eta 0:00:08 lr 0.001994 wd 0.0500 time 0.2011 (0.2324) data time 0.0008 (0.0038) model time 0.2003 (0.2287) loss 4.0582 (3.7396) grad_norm 1.1070 (1.3030) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:30:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][600/625] eta 0:00:05 lr 0.001994 wd 0.0500 time 0.2006 (0.2311) data time 0.0007 (0.0036) model time 0.2000 (0.2275) loss 3.0206 (3.7296) grad_norm 0.6837 (1.3025) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:30:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][610/625] eta 0:00:03 lr 0.001994 wd 0.0500 time 0.2011 (0.2299) data time 0.0004 (0.0035) model time 0.2007 (0.2264) loss 2.3802 (3.7319) grad_norm 0.9562 (1.2990) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:30:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][620/625] eta 0:00:01 lr 0.001994 wd 0.0500 time 0.1988 (0.2288) data time 0.0004 (0.0034) model time 0.1985 (0.2253) loss 2.5815 (3.7196) grad_norm 0.8975 (1.2892) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:30:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 29 training takes 0:00:58
[2024-07-29 14:30:29 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 14:30:31 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 14:30:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.394 (0.394) Loss 0.8745 (0.8745) Acc@1 82.422 (82.422) Acc@5 96.387 (96.387) Mem 8977MB
[2024-07-29 14:30:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.087) Loss 1.5254 (1.0819) Acc@1 66.162 (76.669) Acc@5 89.062 (94.487) Mem 8977MB
[2024-07-29 14:30:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.072) Loss 1.6699 (1.3074) Acc@1 63.623 (72.061) Acc@5 86.279 (91.499) Mem 8977MB
[2024-07-29 14:30:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 71.995 Acc@5 91.421
[2024-07-29 14:30:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 72.0%
[2024-07-29 14:30:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 71.99%
[2024-07-29 14:30:34 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 14:30:35 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 14:30:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.627 (0.627) Loss 4.4141 (4.4141) Acc@1 24.170 (24.170) Acc@5 43.896 (43.896) Mem 8977MB
[2024-07-29 14:30:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.109) Loss 4.6914 (4.5561) Acc@1 17.041 (19.269) Acc@5 36.963 (39.626) Mem 8977MB
[2024-07-29 14:30:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.083) Loss 4.9688 (4.6639) Acc@1 13.965 (17.983) Acc@5 30.762 (37.307) Mem 8977MB
[2024-07-29 14:30:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 18.124 Acc@5 37.726
[2024-07-29 14:30:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 18.1%
[2024-07-29 14:30:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 18.12%
[2024-07-29 14:30:37 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 14:30:37 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 14:30:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][0/625] eta 0:10:18 lr 0.001994 wd 0.0500 time 0.9895 (0.9895) data time 0.5069 (0.5069) model time 0.0000 (0.0000) loss 4.0585 (4.0585) grad_norm 1.1719 (1.1719) loss_scale 16384.0000 (16384.0000) mem 8971MB
[2024-07-29 14:30:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][10/625] eta 0:02:48 lr 0.001994 wd 0.0500 time 0.2108 (0.2742) data time 0.0007 (0.0469) model time 0.0000 (0.0000) loss 2.4601 (3.4738) grad_norm 1.9109 (1.2569) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:30:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][20/625] eta 0:02:24 lr 0.001994 wd 0.0500 time 0.2003 (0.2391) data time 0.0009 (0.0250) model time 0.0000 (0.0000) loss 4.0411 (3.6661) grad_norm 1.2019 (1.3026) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:30:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][30/625] eta 0:02:15 lr 0.001994 wd 0.0500 time 0.2051 (0.2269) data time 0.0009 (0.0173) model time 0.0000 (0.0000) loss 3.1952 (3.6854) grad_norm 0.9742 (1.2479) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:30:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][40/625] eta 0:02:12 lr 0.001994 wd 0.0500 time 0.2062 (0.2264) data time 0.0007 (0.0133) model time 0.0000 (0.0000) loss 4.1471 (3.5696) grad_norm 0.8670 (1.2151) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:30:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][50/625] eta 0:02:07 lr 0.001994 wd 0.0500 time 0.2031 (0.2216) data time 0.0008 (0.0109) model time 0.0000 (0.0000) loss 4.0619 (3.5707) grad_norm 0.8481 (1.1785) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:30:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][60/625] eta 0:02:03 lr 0.001994 wd 0.0500 time 0.2019 (0.2183) data time 0.0007 (0.0092) model time 0.2013 (0.2010) loss 4.0982 (3.6508) grad_norm 1.3684 (1.1697) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:30:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][70/625] eta 0:02:00 lr 0.001994 wd 0.0500 time 0.2034 (0.2163) data time 0.0008 (0.0080) model time 0.2026 (0.2019) loss 2.6303 (3.6728) grad_norm 0.8243 (1.1488) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:30:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][80/625] eta 0:01:57 lr 0.001994 wd 0.0500 time 0.2036 (0.2153) data time 0.0008 (0.0072) model time 0.2028 (0.2039) loss 3.3018 (3.6720) grad_norm 0.7528 (1.1295) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:30:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][90/625] eta 0:01:54 lr 0.001994 wd 0.0500 time 0.2024 (0.2138) data time 0.0006 (0.0065) model time 0.2018 (0.2030) loss 4.5047 (3.6949) grad_norm 0.9877 (1.1487) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:30:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][100/625] eta 0:01:51 lr 0.001994 wd 0.0500 time 0.2097 (0.2131) data time 0.0006 (0.0059) model time 0.2091 (0.2036) loss 3.3424 (3.6911) grad_norm 1.6406 (1.1846) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:31:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][110/625] eta 0:01:49 lr 0.001994 wd 0.0500 time 0.2033 (0.2122) data time 0.0008 (0.0055) model time 0.2024 (0.2035) loss 3.9652 (3.6939) grad_norm 2.1396 (1.1898) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:31:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][120/625] eta 0:01:46 lr 0.001994 wd 0.0500 time 0.2025 (0.2114) data time 0.0006 (0.0051) model time 0.2019 (0.2031) loss 4.3272 (3.6897) grad_norm 0.9475 (1.1905) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:31:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][130/625] eta 0:01:44 lr 0.001994 wd 0.0500 time 0.2031 (0.2106) data time 0.0006 (0.0048) model time 0.2025 (0.2027) loss 3.9499 (3.6713) grad_norm 0.8419 (1.1978) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:31:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][140/625] eta 0:01:41 lr 0.001993 wd 0.0500 time 0.2045 (0.2100) data time 0.0006 (0.0045) model time 0.2038 (0.2026) loss 4.2520 (3.6876) grad_norm 1.3761 (1.2050) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:31:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][150/625] eta 0:01:40 lr 0.001993 wd 0.0500 time 0.1987 (0.2120) data time 0.0008 (0.0042) model time 0.1979 (0.2063) loss 3.6070 (3.6920) grad_norm 0.9320 (1.2158) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:31:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][160/625] eta 0:01:38 lr 0.001993 wd 0.0500 time 0.2007 (0.2115) data time 0.0006 (0.0040) model time 0.2001 (0.2059) loss 3.0816 (3.6878) grad_norm 0.9271 (1.2358) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:31:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][170/625] eta 0:01:36 lr 0.001993 wd 0.0500 time 0.1996 (0.2111) data time 0.0006 (0.0039) model time 0.1990 (0.2057) loss 3.9905 (3.7060) grad_norm 0.8836 (1.2315) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:31:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][180/625] eta 0:01:33 lr 0.001993 wd 0.0500 time 0.1970 (0.2106) data time 0.0009 (0.0037) model time 0.1961 (0.2054) loss 3.7530 (3.7147) grad_norm 0.8819 (1.2233) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:31:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][190/625] eta 0:01:31 lr 0.001993 wd 0.0500 time 0.1979 (0.2102) data time 0.0008 (0.0036) model time 0.1971 (0.2052) loss 3.8509 (3.7137) grad_norm 1.0057 (1.2149) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:31:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][200/625] eta 0:01:29 lr 0.001993 wd 0.0500 time 0.1988 (0.2099) data time 0.0010 (0.0034) model time 0.1979 (0.2050) loss 3.2892 (3.6931) grad_norm 0.7883 (1.2048) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:31:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][210/625] eta 0:01:26 lr 0.001993 wd 0.0500 time 0.1978 (0.2095) data time 0.0007 (0.0033) model time 0.1970 (0.2048) loss 2.6095 (3.6791) grad_norm 1.6478 (1.2126) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:31:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][220/625] eta 0:01:25 lr 0.001993 wd 0.0500 time 0.2011 (0.2113) data time 0.0008 (0.0032) model time 0.2002 (0.2073) loss 4.3743 (3.6774) grad_norm 0.9403 (1.2024) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:31:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][230/625] eta 0:01:23 lr 0.001993 wd 0.0500 time 0.2028 (0.2114) data time 0.0009 (0.0031) model time 0.2019 (0.2076) loss 3.5055 (3.6868) grad_norm 0.9609 (1.2066) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:31:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][240/625] eta 0:01:21 lr 0.001993 wd 0.0500 time 0.2002 (0.2111) data time 0.0007 (0.0031) model time 0.1995 (0.2072) loss 3.7974 (3.6868) grad_norm 1.2177 (1.2004) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:31:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][250/625] eta 0:01:19 lr 0.001993 wd 0.0500 time 0.2028 (0.2109) data time 0.0006 (0.0030) model time 0.2022 (0.2071) loss 4.2075 (3.6895) grad_norm 1.7821 (1.2068) loss_scale 16384.0000 (16384.0000)
mem 8975MB [2024-07-29 14:31:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][260/625] eta 0:01:17 lr 0.001993 wd 0.0500 time 0.2041 (0.2126) data time 0.0008 (0.0030) model time 0.2033 (0.2094) loss 2.7029 (3.6937) grad_norm 0.9745 (1.2016) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:31:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][270/625] eta 0:01:15 lr 0.001993 wd 0.0500 time 0.2030 (0.2122) data time 0.0006 (0.0029) model time 0.2023 (0.2090) loss 4.6295 (3.6834) grad_norm 0.8685 (1.1936) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:31:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][280/625] eta 0:01:13 lr 0.001993 wd 0.0500 time 0.2044 (0.2118) data time 0.0007 (0.0028) model time 0.2037 (0.2086) loss 4.3137 (3.6843) grad_norm 1.8607 (1.1932) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:31:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][290/625] eta 0:01:10 lr 0.001993 wd 0.0500 time 0.2018 (0.2116) data time 0.0006 (0.0027) model time 0.2012 (0.2084) loss 4.0518 (3.6874) grad_norm 1.8832 (1.1961) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:31:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][300/625] eta 0:01:08 lr 0.001993 wd 0.0500 time 0.2065 (0.2113) data time 0.0006 (0.0027) model time 0.2059 (0.2081) loss 4.6369 (3.6954) grad_norm 1.6421 (1.1955) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:31:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][310/625] eta 0:01:06 lr 0.001993 wd 0.0500 time 0.2107 (0.2112) data time 0.0008 (0.0026) model time 0.2099 (0.2082) loss 3.7047 (3.7029) grad_norm 0.8460 (1.1989) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:31:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][320/625] eta 0:01:04 lr 0.001993 wd 0.0500 
time 0.2090 (0.2112) data time 0.0005 (0.0026) model time 0.2084 (0.2081) loss 4.4881 (3.7065) grad_norm 0.8494 (1.1963) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:31:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][330/625] eta 0:01:02 lr 0.001993 wd 0.0500 time 0.2102 (0.2109) data time 0.0007 (0.0026) model time 0.2095 (0.2079) loss 3.0446 (3.7058) grad_norm 1.5837 (1.1945) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:31:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][340/625] eta 0:01:00 lr 0.001993 wd 0.0500 time 0.2043 (0.2107) data time 0.0008 (0.0025) model time 0.2035 (0.2077) loss 3.4785 (3.7079) grad_norm 1.4618 (1.1994) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:31:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][350/625] eta 0:00:57 lr 0.001993 wd 0.0500 time 0.2048 (0.2104) data time 0.0006 (0.0025) model time 0.2042 (0.2075) loss 4.4305 (3.7061) grad_norm 0.8931 (1.2048) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:31:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][360/625] eta 0:00:55 lr 0.001993 wd 0.0500 time 0.2047 (0.2102) data time 0.0006 (0.0024) model time 0.2041 (0.2073) loss 4.2856 (3.7090) grad_norm 0.8409 (1.2035) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:31:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][370/625] eta 0:00:53 lr 0.001993 wd 0.0500 time 0.2136 (0.2101) data time 0.0007 (0.0024) model time 0.2128 (0.2072) loss 3.1137 (3.7122) grad_norm 1.6783 (1.2064) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:31:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][380/625] eta 0:00:51 lr 0.001993 wd 0.0500 time 0.2169 (0.2099) data time 0.0006 (0.0024) model time 0.2163 (0.2071) loss 4.2973 (3.7210) grad_norm 1.3099 (1.2046) loss_scale 16384.0000 
(16384.0000) mem 8975MB [2024-07-29 14:31:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][390/625] eta 0:00:49 lr 0.001993 wd 0.0500 time 0.2006 (0.2097) data time 0.0006 (0.0023) model time 0.2000 (0.2068) loss 3.2315 (3.7157) grad_norm 0.8392 (1.2048) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][400/625] eta 0:00:47 lr 0.001993 wd 0.0500 time 0.2027 (0.2095) data time 0.0009 (0.0023) model time 0.2018 (0.2067) loss 3.7061 (3.7115) grad_norm 0.9942 (1.2009) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][410/625] eta 0:00:45 lr 0.001993 wd 0.0500 time 0.2059 (0.2093) data time 0.0007 (0.0023) model time 0.2053 (0.2065) loss 4.2536 (3.7177) grad_norm 1.0140 (1.2026) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][420/625] eta 0:00:42 lr 0.001993 wd 0.0500 time 0.2017 (0.2091) data time 0.0006 (0.0022) model time 0.2011 (0.2064) loss 3.4021 (3.7236) grad_norm 0.9278 (1.2001) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][430/625] eta 0:00:40 lr 0.001993 wd 0.0500 time 0.2030 (0.2090) data time 0.0008 (0.0022) model time 0.2022 (0.2063) loss 2.6840 (3.7225) grad_norm 1.0135 (1.2071) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][440/625] eta 0:00:38 lr 0.001993 wd 0.0500 time 0.2047 (0.2090) data time 0.0007 (0.0022) model time 0.2039 (0.2063) loss 3.1842 (3.7171) grad_norm 1.2595 (1.2013) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][450/625] eta 0:00:36 lr 
0.001993 wd 0.0500 time 0.2057 (0.2089) data time 0.0008 (0.0021) model time 0.2049 (0.2062) loss 3.1013 (3.7205) grad_norm 1.2070 (1.2017) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][460/625] eta 0:00:34 lr 0.001993 wd 0.0500 time 0.1976 (0.2087) data time 0.0007 (0.0021) model time 0.1970 (0.2061) loss 4.1110 (3.7156) grad_norm 2.0737 (1.2036) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][470/625] eta 0:00:32 lr 0.001993 wd 0.0500 time 0.2103 (0.2086) data time 0.0006 (0.0021) model time 0.2097 (0.2060) loss 4.3403 (3.7153) grad_norm 0.8738 (1.2050) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][480/625] eta 0:00:30 lr 0.001993 wd 0.0500 time 0.2181 (0.2085) data time 0.0008 (0.0021) model time 0.2173 (0.2059) loss 4.0418 (3.7222) grad_norm 1.0133 (1.2027) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][490/625] eta 0:00:28 lr 0.001993 wd 0.0500 time 0.2028 (0.2084) data time 0.0008 (0.0020) model time 0.2020 (0.2058) loss 3.3586 (3.7205) grad_norm 1.6982 (1.2012) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][500/625] eta 0:00:26 lr 0.001993 wd 0.0500 time 0.2030 (0.2082) data time 0.0008 (0.0020) model time 0.2023 (0.2057) loss 3.9698 (3.7209) grad_norm 2.6778 (1.2066) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][510/625] eta 0:00:23 lr 0.001993 wd 0.0500 time 0.2251 (0.2082) data time 0.0006 (0.0020) model time 0.2245 (0.2057) loss 3.4594 (3.7210) grad_norm 1.3288 (1.2133) loss_scale 
16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][520/625] eta 0:00:21 lr 0.001993 wd 0.0500 time 0.2013 (0.2082) data time 0.0006 (0.0020) model time 0.2007 (0.2057) loss 2.9267 (3.7246) grad_norm 0.8897 (1.2113) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][530/625] eta 0:00:19 lr 0.001993 wd 0.0500 time 0.1988 (0.2080) data time 0.0008 (0.0020) model time 0.1980 (0.2056) loss 3.6235 (3.7247) grad_norm 1.2193 (1.2136) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][540/625] eta 0:00:17 lr 0.001993 wd 0.0500 time 0.2123 (0.2079) data time 0.0007 (0.0019) model time 0.2115 (0.2055) loss 3.7469 (3.7245) grad_norm 0.9924 (1.2136) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][550/625] eta 0:00:15 lr 0.001993 wd 0.0500 time 0.2055 (0.2078) data time 0.0006 (0.0019) model time 0.2049 (0.2054) loss 3.0319 (3.7200) grad_norm 1.4169 (1.2123) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][560/625] eta 0:00:13 lr 0.001993 wd 0.0500 time 0.2013 (0.2077) data time 0.0008 (0.0019) model time 0.2006 (0.2053) loss 3.9045 (3.7186) grad_norm 1.1413 (1.2151) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][570/625] eta 0:00:11 lr 0.001993 wd 0.0500 time 0.2018 (0.2076) data time 0.0006 (0.0019) model time 0.2012 (0.2053) loss 3.6090 (3.7113) grad_norm 1.3409 (1.2151) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][580/625] eta 
0:00:09 lr 0.001993 wd 0.0500 time 0.1997 (0.2075) data time 0.0007 (0.0019) model time 0.1990 (0.2052) loss 4.3013 (3.7114) grad_norm 1.4196 (1.2227) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][590/625] eta 0:00:07 lr 0.001993 wd 0.0500 time 0.2010 (0.2074) data time 0.0008 (0.0019) model time 0.2002 (0.2051) loss 3.9275 (3.7076) grad_norm 1.2440 (1.2249) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][600/625] eta 0:00:05 lr 0.001993 wd 0.0500 time 0.2041 (0.2074) data time 0.0005 (0.0018) model time 0.2036 (0.2051) loss 3.9690 (3.7117) grad_norm 0.9781 (1.2242) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][610/625] eta 0:00:03 lr 0.001993 wd 0.0500 time 0.2016 (0.2074) data time 0.0006 (0.0018) model time 0.2010 (0.2051) loss 4.1604 (3.7112) grad_norm 1.0357 (1.2264) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][620/625] eta 0:00:01 lr 0.001992 wd 0.0500 time 0.2004 (0.2073) data time 0.0003 (0.0018) model time 0.2000 (0.2050) loss 3.7501 (3.7086) grad_norm 1.9981 (1.2324) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 30 training takes 0:02:09 [2024-07-29 14:32:47 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 14:32:47 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 14:32:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.500 (0.500) Loss 0.8418 (0.8418) Acc@1 83.057 (83.057) Acc@5 96.436 (96.436) Mem 8975MB [2024-07-29 14:32:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.096) Loss 1.4766 (1.0790) Acc@1 67.383 (76.709) Acc@5 89.355 (94.194) Mem 8975MB [2024-07-29 14:32:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.5977 (1.2916) Acc@1 64.258 (72.196) Acc@5 87.354 (91.353) Mem 8975MB [2024-07-29 14:32:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 72.023 Acc@5 91.351 [2024-07-29 14:32:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 72.0% [2024-07-29 14:32:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 72.02% [2024-07-29 14:32:49 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 14:32:50 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-29 14:32:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.430 (0.430) Loss 3.9746 (3.9746) Acc@1 29.443 (29.443) Acc@5 50.586 (50.586) Mem 8975MB [2024-07-29 14:32:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.091) Loss 4.3086 (4.1245) Acc@1 21.045 (24.059) Acc@5 42.920 (46.964) Mem 8975MB [2024-07-29 14:32:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.074) Loss 4.6133 (4.2668) Acc@1 18.115 (22.512) Acc@5 36.182 (44.259) Mem 8975MB [2024-07-29 14:32:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 22.631 Acc@5 44.560 [2024-07-29 14:32:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 22.6% [2024-07-29 14:32:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 22.63% [2024-07-29 14:32:52 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 14:32:52 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-07-29 14:32:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][0/625] eta 0:07:28 lr 0.001992 wd 0.0500 time 0.7171 (0.7171) data time 0.5255 (0.5255) model time 0.0000 (0.0000) loss 4.9989 (4.9989) grad_norm 1.3011 (1.3011) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 14:32:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 14:32:53 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... 
[2024-07-29 14:32:54 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 14:47:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 14:47:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 14:47:32 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 14:47:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 14:47:43 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-29 14:47:43 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 14:47:43 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 14:47:43 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 31) [2024-07-29 14:47:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 14:47:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][10/625] eta 0:12:36 lr 0.001992 wd 0.0500 time 0.2092 (1.2295) data time 0.0008 (0.0985) model time 0.0000 (0.0000) loss 4.1641 (4.2562) grad_norm 1.1099 (1.2017) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][20/625] eta 0:06:41 lr 0.001992 wd 0.0500 time 0.2125 (0.6637) data time 0.0010 (0.0444) model time 0.0000 (0.0000) loss 4.2196 (3.9969) 
grad_norm 1.2067 (1.2415) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][30/625] eta 0:04:58 lr 0.001992 wd 0.0500 time 0.2096 (0.5011) data time 0.0010 (0.0289) model time 0.0000 (0.0000) loss 4.0045 (3.9755) grad_norm 1.2666 (1.2294) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][40/625] eta 0:04:08 lr 0.001992 wd 0.0500 time 0.2063 (0.4247) data time 0.0010 (0.0215) model time 0.0000 (0.0000) loss 3.6566 (3.8911) grad_norm 1.3637 (1.2294) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][50/625] eta 0:03:38 lr 0.001992 wd 0.0500 time 0.2124 (0.3798) data time 0.0007 (0.0173) model time 0.0000 (0.0000) loss 4.1193 (3.8814) grad_norm 1.7326 (1.2539) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][60/625] eta 0:03:18 lr 0.001992 wd 0.0500 time 0.2115 (0.3506) data time 0.0008 (0.0145) model time 0.2108 (0.2092) loss 3.2311 (3.8729) grad_norm 0.7918 (1.2839) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][70/625] eta 0:03:03 lr 0.001992 wd 0.0500 time 0.2063 (0.3301) data time 0.0008 (0.0125) model time 0.2054 (0.2098) loss 2.8699 (3.8355) grad_norm 0.7129 (1.2564) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][80/625] eta 0:02:51 lr 0.001992 wd 0.0500 time 0.2030 (0.3151) data time 0.0007 (0.0110) model time 0.2022 (0.2104) loss 3.0974 (3.8002) grad_norm 1.5446 (1.2442) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO 
Train: [31/300][90/625] eta 0:02:42 lr 0.001992 wd 0.0500 time 0.2073 (0.3032) data time 0.0012 (0.0099) model time 0.2062 (0.2102) loss 4.0352 (3.7694) grad_norm 1.2086 (1.2571) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][100/625] eta 0:02:34 lr 0.001992 wd 0.0500 time 0.2106 (0.2938) data time 0.0008 (0.0090) model time 0.2098 (0.2101) loss 4.0699 (3.7800) grad_norm 0.8622 (1.2342) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][110/625] eta 0:02:27 lr 0.001992 wd 0.0500 time 0.2099 (0.2859) data time 0.0008 (0.0083) model time 0.2091 (0.2097) loss 3.2981 (3.8010) grad_norm 0.7950 (1.2092) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][120/625] eta 0:02:21 lr 0.001992 wd 0.0500 time 0.2104 (0.2798) data time 0.0011 (0.0077) model time 0.2093 (0.2100) loss 3.8010 (3.8055) grad_norm 1.4559 (1.1955) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][130/625] eta 0:02:15 lr 0.001992 wd 0.0500 time 0.2226 (0.2746) data time 0.0008 (0.0072) model time 0.2218 (0.2103) loss 3.7608 (3.8007) grad_norm 1.6851 (1.1950) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][140/625] eta 0:02:10 lr 0.001992 wd 0.0500 time 0.2116 (0.2700) data time 0.0010 (0.0068) model time 0.2106 (0.2103) loss 3.5200 (3.7977) grad_norm 1.2019 (1.1938) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][150/625] eta 0:02:06 lr 0.001992 wd 0.0500 time 0.2044 (0.2659) data time 0.0007 (0.0064) model time 0.2037 (0.2102) loss 4.1700 
(3.7912) grad_norm 1.8402 (1.2115) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][160/625] eta 0:02:02 lr 0.001992 wd 0.0500 time 0.2099 (0.2624) data time 0.0008 (0.0060) model time 0.2090 (0.2101) loss 2.9397 (3.7845) grad_norm 1.4572 (1.2227) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][170/625] eta 0:01:58 lr 0.001992 wd 0.0500 time 0.2292 (0.2596) data time 0.0010 (0.0058) model time 0.2282 (0.2102) loss 4.2299 (3.7889) grad_norm 1.2524 (1.2163) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][180/625] eta 0:01:54 lr 0.001992 wd 0.0500 time 0.2071 (0.2570) data time 0.0009 (0.0055) model time 0.2062 (0.2104) loss 3.0826 (3.7744) grad_norm 0.9864 (1.2054) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][190/625] eta 0:01:50 lr 0.001992 wd 0.0500 time 0.2103 (0.2544) data time 0.0007 (0.0053) model time 0.2096 (0.2102) loss 3.7641 (3.7654) grad_norm 1.1883 (1.1993) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][200/625] eta 0:01:47 lr 0.001992 wd 0.0500 time 0.2513 (0.2524) data time 0.0012 (0.0051) model time 0.2501 (0.2104) loss 2.8381 (3.7493) grad_norm 1.1292 (1.1951) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][210/625] eta 0:01:43 lr 0.001992 wd 0.0500 time 0.2065 (0.2503) data time 0.0010 (0.0049) model time 0.2055 (0.2103) loss 3.9318 (3.7361) grad_norm 1.5267 (1.1982) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 
367): INFO Train: [31/300][220/625] eta 0:01:40 lr 0.001992 wd 0.0500 time 0.2240 (0.2485) data time 0.0010 (0.0047) model time 0.2231 (0.2102) loss 3.3214 (3.7292) grad_norm 0.9773 (1.1981) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][230/625] eta 0:01:37 lr 0.001992 wd 0.0500 time 0.2106 (0.2468) data time 0.0010 (0.0045) model time 0.2096 (0.2101) loss 3.9509 (3.7327) grad_norm 1.5580 (1.1968) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][240/625] eta 0:01:34 lr 0.001992 wd 0.0500 time 0.2152 (0.2454) data time 0.0011 (0.0044) model time 0.2141 (0.2102) loss 3.9806 (3.7266) grad_norm 0.8928 (1.1881) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][250/625] eta 0:01:31 lr 0.001992 wd 0.0500 time 0.2077 (0.2439) data time 0.0010 (0.0043) model time 0.2067 (0.2102) loss 2.5550 (3.7162) grad_norm 1.0299 (1.1805) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][260/625] eta 0:01:28 lr 0.001992 wd 0.0500 time 0.2834 (0.2430) data time 0.0009 (0.0041) model time 0.2825 (0.2106) loss 3.8813 (3.7040) grad_norm 1.3223 (1.1825) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][270/625] eta 0:01:26 lr 0.001992 wd 0.0500 time 0.2144 (0.2424) data time 0.0007 (0.0040) model time 0.2137 (0.2113) loss 4.3152 (3.6988) grad_norm 1.5541 (1.1863) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][280/625] eta 0:01:23 lr 0.001992 wd 0.0500 time 0.2095 (0.2412) data time 0.0010 (0.0039) model time 0.2085 (0.2111) loss 
3.0094 (3.7023) grad_norm 1.0132 (1.1822) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][290/625] eta 0:01:20 lr 0.001992 wd 0.0500 time 0.2199 (0.2402) data time 0.0008 (0.0038) model time 0.2191 (0.2111) loss 4.3807 (3.7070) grad_norm 0.7967 (1.1750) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:48:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][300/625] eta 0:01:17 lr 0.001992 wd 0.0500 time 0.2138 (0.2392) data time 0.0009 (0.0037) model time 0.2129 (0.2110) loss 3.4965 (3.6885) grad_norm 0.8711 (1.1710) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:49:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][310/625] eta 0:01:15 lr 0.001992 wd 0.0500 time 0.2429 (0.2383) data time 0.0007 (0.0036) model time 0.2422 (0.2110) loss 3.3187 (3.6842) grad_norm 0.9347 (1.1675) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:49:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][320/625] eta 0:01:12 lr 0.001992 wd 0.0500 time 0.2221 (0.2375) data time 0.0010 (0.0036) model time 0.2211 (0.2111) loss 3.2157 (3.6921) grad_norm 1.1730 (1.1733) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:49:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][330/625] eta 0:01:09 lr 0.001992 wd 0.0500 time 0.2110 (0.2367) data time 0.0007 (0.0035) model time 0.2102 (0.2111) loss 3.9239 (3.7020) grad_norm 1.7767 (1.1785) loss_scale 16384.0000 (16384.0000) mem 8974MB [2024-07-29 14:49:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 14:49:07 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... 
[2024-07-29 14:49:08 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 14:50:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 14:50:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 14:51:01 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 14:51:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 14:51:14 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 14:51:14 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 14:51:14 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 14:51:14 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 31)
[2024-07-29 14:51:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 14:51:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][340/625] eta 0:07:37 lr 0.001992 wd 0.0500 time 0.2058 (1.6046) data time 0.0007 (0.1466) model time 0.2051 (1.4580) loss 4.4074 (4.3513) grad_norm 1.2925 (1.1962) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:51:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][350/625] eta 0:03:03 lr 0.001992 wd 0.0500 time 0.2006 (0.6689) data time 0.0008 (0.0495) model time 0.1997 (0.6195) loss 4.1848 (4.1089) grad_norm 1.7777 (1.4335) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:51:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][360/625] eta 0:02:07 lr 0.001992 wd 0.0500 time 0.2034 (0.4827) data time 0.0008 (0.0300) model time 0.2026 (0.4527) loss 4.0635 (4.0944) grad_norm 1.1775 (1.3841) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:51:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][370/625] eta 0:01:42 lr 0.001992 wd 0.0500 time 0.1979 (0.4020) data time 0.0009 (0.0217) model time 0.1970 (0.3803) loss 3.8558 (4.0304) grad_norm 1.1594 (1.2620) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:51:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][380/625] eta 0:01:27 lr 0.001992 wd 0.0500 time 0.1979 (0.3572) data time 0.0009 (0.0171) model time 0.1970 (0.3401) loss 4.0733 (3.9724) grad_norm 2.0938 (1.2524) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:51:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][390/625] eta 0:01:17 lr 0.001992 wd 0.0500 time 0.1938 (0.3285) data time 0.0007 (0.0141) model time 0.1931 (0.3143) loss 2.7566 (3.9279) grad_norm 0.8826 (1.2295) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:51:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][400/625] eta 0:01:09 lr 0.001992 wd 0.0500 time 0.1958 (0.3084) data time 0.0009 (0.0121) model time 0.1949 (0.2963) loss 4.2014 (3.8871) grad_norm 1.0396 (1.2151) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:51:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][410/625] eta 0:01:03 lr 0.001992 wd 0.0500 time 0.1978 (0.2938) data time 0.0007 (0.0106) model time 0.1971 (0.2832) loss 2.9789 (3.8431) grad_norm 1.2283 (1.2372) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:51:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][420/625] eta 0:00:58 lr 0.001992 wd 0.0500 time 0.1987 (0.2832) data time 0.0007 (0.0095) model time 0.1980 (0.2738) loss 3.4654 (3.8113) grad_norm 1.5660 (1.2739) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:51:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][430/625] eta 0:00:53 lr 0.001991 wd 0.0500 time 0.1980 (0.2747) data time 0.0009 (0.0085) model time 0.1971 (0.2661) loss 3.7499 (3.8056) grad_norm 1.2254 (1.2687) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:51:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][440/625] eta 0:00:49 lr 0.001991 wd 0.0500 time 0.2001 (0.2681) data time 0.0008 (0.0078) model time 0.1993 (0.2603) loss 3.1383 (3.8203) grad_norm 1.3036 (1.2664) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:51:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][450/625] eta 0:00:45 lr 0.001991 wd 0.0500 time 0.1986 (0.2621) data time 0.0007 (0.0072) model time 0.1979 (0.2549) loss 3.1399 (3.8040) grad_norm 1.5322 (1.2542) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:51:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][460/625] eta 0:00:42 lr 0.001991 wd 0.0500 time 0.2024 (0.2572) data time 0.0006 (0.0067) model time 0.2017 (0.2505) loss 3.8570 (3.8061) grad_norm 1.0462 (1.2650) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:51:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][470/625] eta 0:00:39 lr 0.001991 wd 0.0500 time 0.2061 (0.2530) data time 0.0006 (0.0063) model time 0.2054 (0.2467) loss 3.6173 (3.7959) grad_norm 1.7469 (1.2675) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 14:51:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][480/625] eta 0:00:36 lr 0.001991 wd 0.0500 time 0.2001 (0.2493) data time 0.0008 (0.0059) model time 0.1993 (0.2434) loss 3.8900 (3.7810) grad_norm 1.7568 (1.2730) loss_scale 32768.0000 (16948.9655) mem 8977MB
[2024-07-29 14:51:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][490/625] eta 0:00:33 lr 0.001991 wd 0.0500 time 0.1976 (0.2465) data time 0.0010 (0.0056) model time 0.1966 (0.2409) loss 4.2733 (3.7787) grad_norm 1.1547 (1.2679) loss_scale 32768.0000 (17969.5484) mem 8977MB
[2024-07-29 14:51:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][500/625] eta 0:00:30 lr 0.001991 wd 0.0500 time 0.1988 (0.2437) data time 0.0008 (0.0053) model time 0.1980 (0.2384) loss 3.6223 (3.7713) grad_norm 1.3365 (1.2656) loss_scale 32768.0000 (18866.4242) mem 8977MB
[2024-07-29 14:52:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][510/625] eta 0:00:27 lr 0.001991 wd 0.0500 time 0.2006 (0.2412) data time 0.0007 (0.0051) model time 0.1999 (0.2361) loss 4.2208 (3.7622) grad_norm 1.2361 (inf) loss_scale 16384.0000 (19005.4400) mem 8977MB
[2024-07-29 14:52:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][520/625] eta 0:00:25 lr 0.001991 wd 0.0500 time 0.1989 (0.2390) data time 0.0010 (0.0049) model time 0.1979 (0.2341) loss 3.9274 (3.7575) grad_norm 0.8509 (inf) loss_scale 16384.0000 (18863.7405) mem 8977MB
[2024-07-29 14:52:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][530/625] eta 0:00:22 lr 0.001991 wd 0.0500 time 0.2026 (0.2370) data time 0.0007 (0.0046) model time 0.2019 (0.2324) loss 2.6447 (3.7485) grad_norm 1.1719 (inf) loss_scale 16384.0000 (18736.5744) mem 8977MB
[2024-07-29 14:52:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][540/625] eta 0:00:20 lr 0.001991 wd 0.0500 time 0.2013 (0.2354) data time 0.0008 (0.0045) model time 0.2005 (0.2310) loss 3.1843 (3.7367) grad_norm 1.6477 (inf) loss_scale 16384.0000 (18621.8146) mem 8977MB
[2024-07-29 14:52:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][550/625] eta 0:00:17 lr 0.001991 wd 0.0500 time 0.2172 (0.2341) data time 0.0010 (0.0043) model time 0.2162 (0.2298) loss 3.5465 (3.7258) grad_norm 1.0415 (inf) loss_scale 16384.0000 (18517.7302) mem 8977MB
[2024-07-29 14:52:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][560/625] eta 0:00:15 lr 0.001991 wd 0.0500 time 0.1992 (0.2326) data time 0.0007 (0.0041) model time 0.1985 (0.2285) loss 3.5565 (3.7225) grad_norm 1.1808 (inf) loss_scale 16384.0000 (18422.8978) mem 8977MB
[2024-07-29 14:52:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][570/625] eta 0:00:12 lr 0.001991 wd 0.0500 time 0.2011 (0.2313) data time 0.0008 (0.0040) model time 0.2004 (0.2273) loss 4.2830 (3.7175) grad_norm 2.3857 (inf) loss_scale 16384.0000 (18336.1362) mem 8977MB
[2024-07-29 14:52:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][580/625] eta 0:00:10 lr 0.001991 wd 0.0500 time 0.1990 (0.2299) data time 0.0009 (0.0039) model time 0.1982 (0.2261) loss 3.6255 (3.7206) grad_norm 0.8872 (inf) loss_scale 16384.0000 (18256.4571) mem 8977MB
[2024-07-29 14:52:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][590/625] eta 0:00:08 lr 0.001991 wd 0.0500 time 0.2007 (0.2289) data time 0.0008 (0.0038) model time 0.2000 (0.2251) loss 3.4554 (3.7096) grad_norm 1.1355 (inf) loss_scale 16384.0000 (18183.0275) mem 8977MB
[2024-07-29 14:52:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][600/625] eta 0:00:05 lr 0.001991 wd 0.0500 time 0.1992 (0.2278) data time 0.0007 (0.0036) model time 0.1985 (0.2242) loss 2.9081 (3.6936) grad_norm 1.4432 (inf) loss_scale 16384.0000 (18115.1396) mem 8977MB
[2024-07-29 14:52:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][610/625] eta 0:00:03 lr 0.001991 wd 0.0500 time 0.2028 (0.2269) data time 0.0006 (0.0036) model time 0.2022 (0.2234) loss 3.8739 (3.6959) grad_norm 0.7977 (inf) loss_scale 16384.0000 (18052.1891) mem 8977MB
[2024-07-29 14:52:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][620/625] eta 0:00:01 lr 0.001991 wd 0.0500 time 0.2005 (0.2260) data time 0.0006 (0.0035) model time 0.2000 (0.2226) loss 3.8507 (3.6939) grad_norm 1.3259 (inf) loss_scale 16384.0000 (17993.6561) mem 8977MB
[2024-07-29 14:52:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 31 training takes 0:01:05
[2024-07-29 14:52:24 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 14:52:42 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 14:52:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.411 (0.411) Loss 0.8965 (0.8965) Acc@1 83.643 (83.643) Acc@5 96.289 (96.289) Mem 8977MB
[2024-07-29 14:52:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.093) Loss 1.4570 (1.0790) Acc@1 67.627 (77.597) Acc@5 89.893 (94.682) Mem 8977MB
[2024-07-29 14:52:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.077) Loss 1.6035 (1.2753) Acc@1 65.186 (72.954) Acc@5 86.719 (91.729) Mem 8977MB
[2024-07-29 14:52:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 72.717 Acc@5 91.669
[2024-07-29 14:52:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 72.7%
[2024-07-29 14:52:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 72.72%
[2024-07-29 14:52:46 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 14:52:48 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 14:52:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.391 (0.391) Loss 3.5547 (3.5547) Acc@1 35.156 (35.156) Acc@5 58.154 (58.154) Mem 8977MB
[2024-07-29 14:52:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.053 (0.088) Loss 3.9473 (3.7301) Acc@1 25.830 (29.142) Acc@5 49.219 (53.551) Mem 8977MB
[2024-07-29 14:52:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.072) Loss 4.2695 (3.8930) Acc@1 20.996 (27.330) Acc@5 42.090 (50.477) Mem 8977MB
[2024-07-29 14:52:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 27.361 Acc@5 50.712
[2024-07-29 14:52:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 27.4%
[2024-07-29 14:52:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 27.36%
[2024-07-29 14:52:50 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 14:52:52 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 14:52:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][0/625] eta 0:06:28 lr 0.001991 wd 0.0500 time 0.6215 (0.6215) data time 0.3563 (0.3563) model time 0.0000 (0.0000) loss 3.2729 (3.2729) grad_norm 0.8803 (0.8803) loss_scale 16384.0000 (16384.0000) mem 8971MB
[2024-07-29 14:52:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][10/625] eta 0:02:27 lr 0.001991 wd 0.0500 time 0.2006 (0.2397) data time 0.0007 (0.0332) model time 0.0000 (0.0000) loss 3.5717 (3.3182) grad_norm 0.9104 (1.0242) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:52:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][20/625] eta 0:02:14 lr 0.001991 wd 0.0500 time 0.2004 (0.2225) data time 0.0008 (0.0178) model time 0.0000 (0.0000) loss 3.0685 (3.3219) grad_norm 1.2912 (1.1502) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:52:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][30/625] eta 0:02:08 lr 0.001991 wd 0.0500 time 0.2018 (0.2154) data time 0.0008 (0.0124) model time 0.0000 (0.0000) loss 3.4041 (3.5111) grad_norm 1.3976 (1.1834) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:53:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][40/625] eta 0:02:04 lr 0.001991 wd 0.0500 time 0.1974 (0.2121) data time 0.0008 (0.0096) model time 0.0000 (0.0000) loss 4.1219 (3.5961) grad_norm 1.1476 (1.1763) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:53:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][50/625] eta 0:02:01 lr 0.001991 wd 0.0500 time 0.1943 (0.2113) data time 0.0009 (0.0079) model time 0.0000 (0.0000) loss 3.4191 (3.5940) grad_norm 0.8123 (1.2213) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:53:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][60/625] eta 0:01:58 lr 0.001991 wd 0.0500 time 0.1986 (0.2094) data time 0.0009 (0.0067) model time 0.1977 (0.1990) loss 3.1700 (3.6229) grad_norm 0.9715 (1.2234) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:53:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][70/625] eta 0:01:55 lr 0.001991 wd 0.0500 time 0.1999 (0.2082) data time 0.0006 (0.0059) model time 0.1993 (0.1996) loss 4.5498 (3.6415) grad_norm 2.1589 (1.2235) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:53:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][80/625] eta 0:01:52 lr 0.001991 wd 0.0500 time 0.1959 (0.2072) data time 0.0009 (0.0053) model time 0.1950 (0.1994) loss 4.0478 (3.6539) grad_norm 0.9498 (1.2570) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:53:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][90/625] eta 0:01:50 lr 0.001991 wd 0.0500 time 0.1954 (0.2064) data time 0.0008 (0.0048) model time 0.1947 (0.1994) loss 3.6542 (3.6494) grad_norm 0.9412 (1.2564) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:53:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][100/625] eta 0:01:48 lr 0.001991 wd 0.0500 time 0.2048 (0.2058) data time 0.0009 (0.0044) model time 0.2039 (0.1995) loss 3.3610 (3.6163) grad_norm 0.9483 (1.2483) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:53:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][110/625] eta 0:01:45 lr 0.001991 wd 0.0500 time 0.2038 (0.2054) data time 0.0008 (0.0041) model time 0.2030 (0.1995) loss 3.6794 (3.6408) grad_norm 1.4256 (1.2495) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:53:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][120/625] eta 0:01:44 lr 0.001991 wd 0.0500 time 0.1966 (0.2069) data time 0.0008 (0.0038) model time 0.1959 (0.2029) loss 3.4336 (3.6542) grad_norm 0.9035 (1.2389) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:53:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][130/625] eta 0:01:42 lr 0.001991 wd 0.0500 time 0.1957 (0.2072) data time 0.0007 (0.0037) model time 0.1951 (0.2035) loss 4.0362 (3.6488) grad_norm 0.8266 (1.2411) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:53:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][140/625] eta 0:01:40 lr 0.001991 wd 0.0500 time 0.2011 (0.2068) data time 0.0008 (0.0035) model time 0.2003 (0.2032) loss 4.1840 (3.6748) grad_norm 1.1358 (1.2286) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:53:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][150/625] eta 0:01:38 lr 0.001991 wd 0.0500 time 0.1995 (0.2063) data time 0.0008 (0.0033) model time 0.1987 (0.2028) loss 3.2862 (3.6838) grad_norm 0.8339 (1.2244) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:53:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][160/625] eta 0:01:36 lr 0.001991 wd 0.0500 time 0.1961 (0.2073) data time 0.0008 (0.0032) model time 0.1953 (0.2045) loss 3.5537 (3.6802) grad_norm 2.3193 (1.2364) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:53:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][170/625] eta 0:01:34 lr 0.001991 wd 0.0500 time 0.1970 (0.2069) data time 0.0007 (0.0031) model time 0.1963 (0.2041) loss 3.8592 (3.6715) grad_norm 1.5755 (1.2398) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:53:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][180/625] eta 0:01:31 lr 0.001991 wd 0.0500 time 0.2006 (0.2066) data time 0.0008 (0.0029) model time 0.1998 (0.2037) loss 3.6993 (3.6537) grad_norm 0.8631 (1.2474) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:53:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][190/625] eta 0:01:29 lr 0.001991 wd 0.0500 time 0.2006 (0.2066) data time 0.0009 (0.0028) model time 0.1997 (0.2040) loss 3.4132 (3.6549) grad_norm 1.6890 (1.2524) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:53:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][200/625] eta 0:01:27 lr 0.001991 wd 0.0500 time 0.2008 (0.2065) data time 0.0008 (0.0027) model time 0.2000 (0.2039) loss 3.1777 (3.6603) grad_norm 1.7424 (1.2650) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:53:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][210/625] eta 0:01:25 lr 0.001991 wd 0.0500 time 0.1940 (0.2067) data time 0.0007 (0.0026) model time 0.1933 (0.2043) loss 2.6891 (3.6563) grad_norm 0.8313 (1.2610) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 14:53:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 14:53:36 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 14:53:36 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 14:58:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 14:58:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 15:00:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 15:00:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 15:00:46 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 15:00:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 15:00:59 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 15:00:59 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 15:00:59 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 15:00:59 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 32)
[2024-07-29 15:00:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 15:01:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][220/625] eta 0:08:21 lr 0.001991 wd 0.0500 time 0.2107 (1.2390) data time 0.0008 (0.0987) model time 0.2099 (1.1403) loss 4.0414 (4.1772) grad_norm 0.8646 (1.0884) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:01:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][230/625] eta 0:04:24 lr 0.001990 wd 0.0500 time 0.2326 (0.6687) data time 0.0008 (0.0444) model time 0.2318 (0.6243) loss 4.3495 (3.9823) grad_norm 1.2062 (1.1363) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:01:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][240/625] eta 0:03:14 lr 0.001990 wd 0.0500 time 0.2188 (0.5064) data time 0.0010 (0.0289) model time 0.2178 (0.4775) loss 3.7544 (3.9784) grad_norm 1.1874 (1.1196) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:01:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][250/625] eta 0:02:41 lr 0.001990 wd 0.0500 time 0.2164 (0.4295) data time 0.0010 (0.0219) model time 0.2154 (0.4077) loss 3.6736 (3.9344) grad_norm 1.2528 (1.0847) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:01:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][260/625] eta 0:02:20 lr 0.001990 wd 0.0500 time 0.2101 (0.3844) data time 0.0008 (0.0177) model time 0.2093 (0.3667) loss 4.0130 (3.8898) grad_norm 0.9311 (1.0472) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:01:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][270/625] eta 0:02:05 lr 0.001990 wd 0.0500 time 0.2081 (0.3543) data time 0.0008 (0.0148) model time 0.2073 (0.3395) loss 3.2257 (3.8547) grad_norm 0.9678 (1.0556) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:01:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][280/625] eta 0:01:54 lr 0.001990 wd 0.0500 time 0.2034 (0.3330) data time 0.0008 (0.0128) model time 0.2026 (0.3202) loss 2.7059 (3.8201) grad_norm 2.0456 (1.0972) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:01:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][290/625] eta 0:01:46 lr 0.001990 wd 0.0500 time 0.2159 (0.3175) data time 0.0008 (0.0113) model time 0.2151 (0.3062) loss 3.2759 (3.7745) grad_norm 0.9790 (1.1231) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:01:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][300/625] eta 0:01:39 lr 0.001990 wd 0.0500 time 0.2251 (0.3057) data time 0.0010 (0.0101) model time 0.2241 (0.2955) loss 3.9385 (3.7528) grad_norm 0.8521 (1.1200) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:01:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][310/625] eta 0:01:33 lr 0.001990 wd 0.0500 time 0.2108 (0.2959) data time 0.0008 (0.0092) model time 0.2101 (0.2867) loss 4.4896 (3.7661) grad_norm 1.4714 (1.1529) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:01:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][320/625] eta 0:01:27 lr 0.001990 wd 0.0500 time 0.2063 (0.2880) data time 0.0009 (0.0084) model time 0.2054 (0.2796) loss 2.9661 (3.7838) grad_norm 0.7444 (1.1761) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:01:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][330/625] eta 0:01:23 lr 0.001990 wd 0.0500 time 0.2211 (0.2816) data time 0.0010 (0.0078) model time 0.2201 (0.2738) loss 3.7301 (3.7787) grad_norm 0.7719 (1.1955) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:01:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][340/625] eta 0:01:18 lr 0.001990 wd 0.0500 time 0.2097 (0.2761) data time 0.0007 (0.0073) model time 0.2089 (0.2688) loss 3.7541 (3.7713) grad_norm 1.7299 (1.2089) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:01:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][350/625] eta 0:01:14 lr 0.001990 wd 0.0500 time 0.2144 (0.2713) data time 0.0012 (0.0068) model time 0.2132 (0.2645) loss 3.6722 (3.7609) grad_norm 0.8892 (1.2125) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:01:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][360/625] eta 0:01:10 lr 0.001990 wd 0.0500 time 0.2018 (0.2671) data time 0.0008 (0.0064) model time 0.2010 (0.2606) loss 4.5117 (3.7557) grad_norm 1.5830 (1.2364) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:01:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][370/625] eta 0:01:07 lr 0.001990 wd 0.0500 time 0.2062 (0.2635) data time 0.0008 (0.0061) model time 0.2055 (0.2574) loss 2.8132 (3.7447) grad_norm 0.9628 (1.2415) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:01:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 15:01:46 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 15:01:47 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 15:04:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 15:04:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 15:04:32 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 15:04:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 15:04:50 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 15:04:51 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 15:04:51 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 15:04:51 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 32)
[2024-07-29 15:04:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 15:05:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][380/625] eta 0:04:36 lr 0.001990 wd 0.0500 time 0.2022 (1.1279) data time 0.0007 (0.1198) model time 0.2015 (1.0082) loss 4.1185 (4.2090) grad_norm 1.1101 (1.1044) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:05:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 15:05:04 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 15:05:07 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 15:09:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 15:09:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 15:09:14 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 15:09:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 15:09:24 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 15:09:25 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 15:09:25 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 15:09:25 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 32)
[2024-07-29 15:09:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 15:09:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][390/625] eta 0:08:12 lr 0.001990 wd 0.0500 time 0.1985 (2.0974) data time 0.0007 (0.1809) model time 0.1978 (1.9165) loss 4.1337 (4.0358) grad_norm 0.9992 (1.0366) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:09:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][400/625] eta 0:02:47 lr 0.001990 wd 0.0500 time 0.2026 (0.7427) data time 0.0006 (0.0524) model time 0.2020 (0.6903) loss 4.1873 (3.9618) grad_norm 1.3106 (1.1843) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:09:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][410/625] eta 0:01:51 lr 0.001990 wd 0.0500 time 0.2095 (0.5171) data time 0.0009 (0.0310) model time 0.2086 (0.4861) loss 3.8743 (4.0097) grad_norm 1.4411 (1.1961) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:09:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][420/625] eta 0:01:26 lr 0.001990 wd 0.0500 time 0.2012 (0.4243) data time 0.0007 (0.0221) model time 0.2005 (0.4022) loss 2.6786 (3.9626) grad_norm 0.8125 (1.1419) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:09:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][430/625] eta 0:01:12 lr 0.001990 wd 0.0500 time 0.1960 (0.3731) data time 0.0007 (0.0173) model time 0.1953 (0.3558) loss 3.5876 (3.9072) grad_norm 0.9015 (1.1691) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:09:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][440/625] eta 0:01:03 lr 0.001990 wd 0.0500 time 0.2012 (0.3411) data time 0.0007 (0.0143) model time 0.2005 (0.3268) loss 4.2077 (3.9022) grad_norm 0.9654 (1.1502) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:09:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][450/625] eta 0:00:55 lr 0.001990 wd 0.0500 time 0.1990 (0.3188) data time 0.0006 (0.0122) model time 0.1984 (0.3066) loss 4.0797 (3.8381) grad_norm 0.9093 (1.1694) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:09:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][460/625] eta 0:00:49 lr 0.001990 wd 0.0500 time 0.2004 (0.3028) data time 0.0007 (0.0107) model time 0.1997 (0.2922) loss 3.9182 (3.8120) grad_norm 0.8280 (1.1577) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:09:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][470/625] eta 0:00:45 lr 0.001990 wd 0.0500 time 0.2000 (0.2906) data time 0.0009 (0.0095) model time 0.1991 (0.2811) loss 3.9809 (3.7856) grad_norm 1.5947 (1.1734) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:09:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][480/625] eta 0:00:40 lr 0.001990 wd 0.0500 time 0.1982 (0.2811) data time 0.0009 (0.0086) model time 0.1973 (0.2725) loss 3.3797 (3.7706) grad_norm 0.9262 (1.1641) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:09:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][490/625] eta 0:00:36 lr 0.001990 wd 0.0500 time 0.2069 (0.2735) data time 0.0010 (0.0078) model time 0.2059 (0.2656) loss 3.9212 (3.8089) grad_norm 0.7866 (1.1454) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:09:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][500/625] eta 0:00:33 lr 0.001990 wd 0.0500 time 0.1980 (0.2671) data time 0.0009 (0.0072) model time 0.1971 (0.2599) loss 3.7216 (3.7883) grad_norm 1.1040 (1.1485) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:10:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][510/625] eta 0:00:30 lr 0.001990 wd 0.0500 time 0.2010 (0.2620) data time 0.0008 (0.0067) model time 0.2001 (0.2552) loss 3.7225 (3.7864) grad_norm 0.8570 (1.1538) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:10:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][520/625] eta 0:00:27 lr 0.001990 wd 0.0500 time 0.2072 (0.2575) data time 0.0009 (0.0063) model time 0.2064 (0.2512) loss 4.2672 (3.7824) grad_norm 1.4325 (1.1489) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:10:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][530/625] eta 0:00:24 lr 0.001990 wd 0.0500 time 0.2033 (0.2535) data time 0.0007 (0.0059) model time 0.2026 (0.2476) loss 3.5187 (3.7578) grad_norm 1.1323 (1.1361) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:10:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][540/625] eta 0:00:21 lr 0.001990 wd 0.0500 time 0.2035 (0.2501) data time 0.0006 (0.0056) model time 0.2029 (0.2445) loss 3.3342 (3.7446) grad_norm 1.7239 (1.1826) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:10:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][550/625] eta 0:00:18 lr 0.001990 wd 0.0500 time 0.2016 (0.2470) data time 0.0006 (0.0053) model time 0.2009 (0.2417) loss 3.7804 (3.7408) grad_norm 0.9590 (1.1835) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:10:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][560/625] eta 0:00:15 lr 0.001990 wd 0.0500 time 0.2005 (0.2445) data time 0.0009 (0.0051) model time 0.1996 (0.2394) loss 2.0895 (3.7250) grad_norm 1.1416 (1.1755) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:10:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][570/625] eta 0:00:13 lr 0.001990 wd 0.0500 time 0.2202 (0.2426) data time 0.0007 (0.0048) model time 0.2196 (0.2377) loss 3.4265 (3.7225) grad_norm 1.0569 (1.1805) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:10:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][580/625] eta 0:00:10 lr 0.001990 wd 0.0500 time 0.2460 (0.2407) data time 0.0007 (0.0046) model time 0.2452 (0.2360) loss 3.1338 (3.7196) grad_norm 1.4755 (1.1873) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:10:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][590/625] eta 0:00:08 lr 0.001990 wd 0.0500 time 0.2022 (0.2387) data time 0.0009 (0.0044) model time 0.2014 (0.2343) loss 3.9012 (3.7050) grad_norm 1.6459 (1.1966) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:10:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][600/625] eta 0:00:05 lr 0.001990 wd 0.0500 time 0.2101 (0.2370) data time 0.0005 (0.0043) model time 0.2095 (0.2327) loss 3.7637 (3.6989) grad_norm 2.5208 (1.2044) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:10:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][610/625] eta 0:00:03 lr 0.001990 wd 0.0500 time 0.2022 (0.2354) data time 0.0006 (0.0041) model time 0.2017 (0.2312) loss 3.9958 (3.6993) grad_norm 1.3309 (1.2081) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:10:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][620/625] eta 0:00:01 lr 0.001990 wd 0.0500 time 0.2043 (0.2340) data time 0.0003 (0.0040) model time 0.2039 (0.2300) loss 2.8470 (3.6859) grad_norm 0.8961 (1.2004) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:10:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 32 training takes 0:00:55
[2024-07-29 15:10:24 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 15:10:26 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 15:10:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.394 (0.394) Loss 0.8687 (0.8687) Acc@1 83.301 (83.301) Acc@5 96.436 (96.436) Mem 8977MB [2024-07-29 15:10:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.088) Loss 1.4629 (1.0734) Acc@1 68.701 (77.832) Acc@5 88.965 (94.647) Mem 8977MB [2024-07-29 15:10:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.072) Loss 1.6260 (1.2783) Acc@1 65.723 (73.112) Acc@5 86.963 (91.874) Mem 8977MB [2024-07-29 15:10:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 72.825 Acc@5 91.869 [2024-07-29 15:10:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 72.8% [2024-07-29 15:10:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 72.82% [2024-07-29 15:10:29 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 15:10:29 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! [2024-07-29 15:14:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 15:14:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 15:14:11 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-07-29 15:14:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth [2024-07-29 15:14:28 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth.................... [2024-07-29 15:14:29 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 15:14:29 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 15:14:29 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth' (epoch 32) [2024-07-29 15:14:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 15:14:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][0/625] eta 1:09:56 lr 0.001989 wd 0.0500 time 6.7145 (6.7145) data time 0.7256 (0.7256) model time 0.0000 (0.0000) loss 4.4627 (4.4627) grad_norm 1.1549 (1.1549) loss_scale 16384.0000 (16384.0000) mem 10976MB [2024-07-29 15:14:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][10/625] eta 0:08:40 lr 0.001989 wd 0.0500 time 0.1971 (0.8460) data time 0.0010 (0.0668) model time 0.0000 (0.0000) loss 2.9545 (3.9656) grad_norm 1.4209 (1.2590) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:14:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][20/625] eta 0:05:25 lr 0.001989 wd 0.0500 time 0.1983 (0.5382) data time 0.0009 (0.0354) model time 0.0000 (0.0000) loss 3.5092 (3.8869) grad_norm 1.0064 (1.3562) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:14:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][30/625] eta 0:04:14 lr 0.001989 wd 0.0500 time 0.1980 (0.4284) data time 0.0006 (0.0243) model time 0.0000 (0.0000) loss 2.7889 (3.8756) 
grad_norm 1.7419 (1.3448) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:14:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][40/625] eta 0:03:37 lr 0.001989 wd 0.0500 time 0.2012 (0.3725) data time 0.0009 (0.0186) model time 0.0000 (0.0000) loss 3.8248 (3.8298) grad_norm 1.0184 (1.3500) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:14:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][50/625] eta 0:03:14 lr 0.001989 wd 0.0500 time 0.2003 (0.3385) data time 0.0007 (0.0151) model time 0.0000 (0.0000) loss 4.0698 (3.8340) grad_norm 1.4412 (1.3796) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:14:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][60/625] eta 0:02:58 lr 0.001989 wd 0.0500 time 0.1986 (0.3163) data time 0.0009 (0.0128) model time 0.1977 (0.2020) loss 3.9606 (3.8085) grad_norm 1.4648 (1.4107) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:14:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][70/625] eta 0:02:46 lr 0.001989 wd 0.0500 time 0.1993 (0.2999) data time 0.0008 (0.0112) model time 0.1984 (0.2002) loss 3.3281 (3.7547) grad_norm 1.0266 (1.3839) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:14:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][80/625] eta 0:02:36 lr 0.001989 wd 0.0500 time 0.1997 (0.2875) data time 0.0009 (0.0099) model time 0.1988 (0.1996) loss 3.0072 (3.7399) grad_norm 1.4321 (1.3580) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:14:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][90/625] eta 0:02:28 lr 0.001989 wd 0.0500 time 0.2093 (0.2780) data time 0.0007 (0.0089) model time 0.2086 (0.1998) loss 4.2459 (3.7218) grad_norm 1.2109 (1.3543) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO 
Train: [33/300][100/625] eta 0:02:22 lr 0.001989 wd 0.0500 time 0.1955 (0.2714) data time 0.0007 (0.0081) model time 0.1947 (0.2020) loss 4.0815 (3.7289) grad_norm 0.8770 (1.3320) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][110/625] eta 0:02:16 lr 0.001989 wd 0.0500 time 0.2076 (0.2650) data time 0.0009 (0.0075) model time 0.2067 (0.2016) loss 3.1975 (3.7292) grad_norm 1.0465 (1.3245) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][120/625] eta 0:02:11 lr 0.001989 wd 0.0500 time 0.1989 (0.2596) data time 0.0007 (0.0069) model time 0.1982 (0.2012) loss 2.8901 (3.7356) grad_norm 1.7216 (1.3387) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][130/625] eta 0:02:06 lr 0.001989 wd 0.0500 time 0.1988 (0.2551) data time 0.0010 (0.0065) model time 0.1979 (0.2009) loss 3.9338 (3.7288) grad_norm 1.3166 (1.3261) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][140/625] eta 0:02:01 lr 0.001989 wd 0.0500 time 0.1978 (0.2511) data time 0.0008 (0.0061) model time 0.1970 (0.2006) loss 4.1398 (3.7197) grad_norm 1.6667 (1.3076) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][150/625] eta 0:01:57 lr 0.001989 wd 0.0500 time 0.1984 (0.2477) data time 0.0009 (0.0057) model time 0.1975 (0.2005) loss 3.1233 (3.7066) grad_norm 1.1420 (1.3101) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][160/625] eta 0:01:53 lr 0.001989 wd 0.0500 time 0.2009 (0.2448) data time 0.0008 (0.0054) model time 0.2000 (0.2004) loss 4.2252 
(3.7108) grad_norm 1.2923 (1.3066) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][170/625] eta 0:01:50 lr 0.001989 wd 0.0500 time 0.1979 (0.2422) data time 0.0009 (0.0052) model time 0.1970 (0.2003) loss 3.4994 (3.6947) grad_norm 1.0378 (1.2966) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][180/625] eta 0:01:46 lr 0.001989 wd 0.0500 time 0.2035 (0.2399) data time 0.0009 (0.0049) model time 0.2026 (0.2003) loss 4.1951 (3.6820) grad_norm 1.6928 (1.2879) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][190/625] eta 0:01:43 lr 0.001989 wd 0.0500 time 0.1979 (0.2380) data time 0.0009 (0.0047) model time 0.1969 (0.2005) loss 3.2846 (3.6843) grad_norm 0.9409 (1.2859) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][200/625] eta 0:01:40 lr 0.001989 wd 0.0500 time 0.2009 (0.2362) data time 0.0009 (0.0045) model time 0.2001 (0.2005) loss 3.2315 (3.6756) grad_norm 0.9921 (1.2871) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][210/625] eta 0:01:37 lr 0.001989 wd 0.0500 time 0.2036 (0.2344) data time 0.0009 (0.0044) model time 0.2027 (0.2004) loss 4.2565 (3.6711) grad_norm 2.2884 (1.2881) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][220/625] eta 0:01:34 lr 0.001989 wd 0.0500 time 0.2080 (0.2329) data time 0.0006 (0.0042) model time 0.2074 (0.2004) loss 3.8103 (3.6693) grad_norm 1.6311 (1.2928) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 
367): INFO Train: [33/300][230/625] eta 0:01:31 lr 0.001989 wd 0.0500 time 0.2003 (0.2315) data time 0.0007 (0.0041) model time 0.1996 (0.2003) loss 2.1251 (3.6618) grad_norm 0.9062 (1.2825) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][240/625] eta 0:01:28 lr 0.001989 wd 0.0500 time 0.1999 (0.2306) data time 0.0007 (0.0039) model time 0.1993 (0.2007) loss 3.8296 (3.6605) grad_norm 0.9065 (1.2679) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][250/625] eta 0:01:26 lr 0.001989 wd 0.0500 time 0.1987 (0.2295) data time 0.0007 (0.0038) model time 0.1979 (0.2008) loss 4.0055 (3.6534) grad_norm 1.2615 (1.2598) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][260/625] eta 0:01:23 lr 0.001989 wd 0.0500 time 0.2067 (0.2286) data time 0.0007 (0.0037) model time 0.2060 (0.2010) loss 3.4659 (3.6434) grad_norm 0.8586 (1.2577) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][270/625] eta 0:01:20 lr 0.001989 wd 0.0500 time 0.2025 (0.2277) data time 0.0007 (0.0036) model time 0.2018 (0.2012) loss 4.1729 (3.6383) grad_norm 0.8219 (1.2601) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][280/625] eta 0:01:18 lr 0.001989 wd 0.0500 time 0.1999 (0.2268) data time 0.0009 (0.0035) model time 0.1990 (0.2011) loss 3.3842 (3.6438) grad_norm 1.5915 (1.2589) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][290/625] eta 0:01:15 lr 0.001989 wd 0.0500 time 0.1995 (0.2260) data time 0.0007 (0.0034) model time 0.1988 (0.2012) loss 
2.3142 (3.6355) grad_norm 0.8622 (1.2498) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][300/625] eta 0:01:13 lr 0.001989 wd 0.0500 time 0.2016 (0.2252) data time 0.0008 (0.0033) model time 0.2008 (0.2012) loss 3.8170 (3.6282) grad_norm 1.2990 (1.2511) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][310/625] eta 0:01:10 lr 0.001989 wd 0.0500 time 0.2012 (0.2244) data time 0.0008 (0.0032) model time 0.2004 (0.2011) loss 4.0607 (3.6276) grad_norm 0.9890 (1.2512) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][320/625] eta 0:01:08 lr 0.001989 wd 0.0500 time 0.2034 (0.2239) data time 0.0007 (0.0032) model time 0.2027 (0.2013) loss 4.3792 (3.6386) grad_norm 0.8917 (1.2536) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][330/625] eta 0:01:05 lr 0.001989 wd 0.0500 time 0.1973 (0.2233) data time 0.0007 (0.0031) model time 0.1966 (0.2014) loss 2.5556 (3.6391) grad_norm 1.5870 (1.2536) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][340/625] eta 0:01:03 lr 0.001989 wd 0.0500 time 0.2016 (0.2226) data time 0.0008 (0.0030) model time 0.2008 (0.2014) loss 4.0068 (3.6418) grad_norm 0.9184 (1.2548) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][350/625] eta 0:01:01 lr 0.001989 wd 0.0500 time 0.2059 (0.2221) data time 0.0006 (0.0030) model time 0.2053 (0.2014) loss 4.2008 (3.6419) grad_norm 0.8932 (1.2532) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:52 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [33/300][360/625] eta 0:00:58 lr 0.001989 wd 0.0500 time 0.2009 (0.2215) data time 0.0007 (0.0029) model time 0.2003 (0.2014) loss 3.5287 (3.6454) grad_norm 0.9504 (1.2532) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][370/625] eta 0:00:56 lr 0.001989 wd 0.0500 time 0.2008 (0.2210) data time 0.0008 (0.0029) model time 0.2000 (0.2014) loss 2.8266 (3.6439) grad_norm 1.6779 (1.2575) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][380/625] eta 0:00:54 lr 0.001988 wd 0.0500 time 0.2064 (0.2205) data time 0.0006 (0.0028) model time 0.2057 (0.2014) loss 2.8952 (3.6413) grad_norm 1.1241 (1.2599) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:15:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][390/625] eta 0:00:51 lr 0.001988 wd 0.0500 time 0.2338 (0.2208) data time 0.0007 (0.0027) model time 0.2332 (0.2022) loss 4.5872 (3.6384) grad_norm 0.8487 (1.2562) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:16:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][400/625] eta 0:00:49 lr 0.001988 wd 0.0500 time 0.2003 (0.2203) data time 0.0008 (0.0027) model time 0.1995 (0.2022) loss 3.8205 (3.6447) grad_norm 1.4022 (1.2517) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:16:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][410/625] eta 0:00:47 lr 0.001988 wd 0.0500 time 0.2030 (0.2199) data time 0.0009 (0.0027) model time 0.2020 (0.2023) loss 3.9290 (3.6505) grad_norm 1.3998 (1.2527) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:16:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][420/625] eta 0:00:45 lr 0.001988 wd 0.0500 time 0.2020 (0.2201) data time 0.0007 (0.0026) model time 
0.2013 (0.2029) loss 4.3947 (3.6486) grad_norm 1.1838 (1.2581) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:16:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][430/625] eta 0:00:42 lr 0.001988 wd 0.0500 time 0.1954 (0.2197) data time 0.0008 (0.0026) model time 0.1947 (0.2029) loss 3.8495 (3.6554) grad_norm 1.5077 (1.2560) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 15:16:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 15:16:09 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 15:16:09 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 15:17:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 15:17:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 15:18:08 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 15:18:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 15:18:22 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... 
[2024-07-29 15:18:22 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 15:18:22 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 15:18:22 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 33) [2024-07-29 15:18:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 15:18:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][440/625] eta 0:09:09 lr 0.001988 wd 0.0500 time 0.2063 (2.9684) data time 0.0007 (0.2960) model time 0.2056 (2.6724) loss 3.3348 (3.9043) grad_norm 1.4262 (1.2794) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:18:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][450/625] eta 0:02:28 lr 0.001988 wd 0.0500 time 0.2487 (0.8509) data time 0.0009 (0.0692) model time 0.2478 (0.7817) loss 3.6015 (3.8995) grad_norm 1.5674 (1.1572) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:18:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][460/625] eta 0:01:34 lr 0.001988 wd 0.0500 time 0.2151 (0.5740) data time 0.0010 (0.0397) model time 0.2141 (0.5343) loss 4.5485 (3.9166) grad_norm 0.7916 (1.1428) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:18:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][470/625] eta 0:01:11 lr 0.001988 wd 0.0500 time 0.2138 (0.4643) data time 0.0010 (0.0280) model time 0.2128 (0.4363) loss 4.7427 (3.9555) grad_norm 1.0918 (1.1018) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:18:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][480/625] eta 0:00:58 lr 0.001988 wd 0.0500 time 0.2373 (0.4058) data time 0.0010 (0.0218) model time 0.2363 (0.3840) loss 3.7555 (3.8846) grad_norm 0.9228 (1.0932) loss_scale 16384.0000 (16384.0000) mem 8975MB 
[2024-07-29 15:18:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][490/625] eta 0:00:49 lr 0.001988 wd 0.0500 time 0.2102 (0.3696) data time 0.0011 (0.0179) model time 0.2091 (0.3516) loss 3.7449 (3.8612) grad_norm 1.4300 (1.0792) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:18:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][500/625] eta 0:00:43 lr 0.001988 wd 0.0500 time 0.2059 (0.3441) data time 0.0011 (0.0153) model time 0.2048 (0.3289) loss 3.3592 (3.8232) grad_norm 1.0432 (1.1063) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:18:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][510/625] eta 0:00:37 lr 0.001988 wd 0.0500 time 0.2137 (0.3259) data time 0.0011 (0.0134) model time 0.2125 (0.3125) loss 3.8084 (3.7722) grad_norm 0.9961 (1.1153) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:18:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][520/625] eta 0:00:32 lr 0.001988 wd 0.0500 time 0.2713 (0.3128) data time 0.0008 (0.0119) model time 0.2705 (0.3009) loss 2.9760 (3.7453) grad_norm 0.9367 (1.1123) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:18:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][530/625] eta 0:00:28 lr 0.001988 wd 0.0500 time 0.2123 (0.3036) data time 0.0008 (0.0110) model time 0.2115 (0.2926) loss 4.4324 (3.7441) grad_norm 0.9372 (1.1023) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:18:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][540/625] eta 0:00:25 lr 0.001988 wd 0.0500 time 0.2061 (0.2948) data time 0.0009 (0.0100) model time 0.2052 (0.2847) loss 4.5298 (3.7706) grad_norm 0.8910 (1.0976) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:18:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][550/625] eta 0:00:21 lr 0.001988 wd 0.0500 time 
0.2094 (0.2874) data time 0.0009 (0.0093) model time 0.2085 (0.2781) loss 3.4279 (3.7627) grad_norm 1.0804 (1.0952) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:19:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][560/625] eta 0:00:18 lr 0.001988 wd 0.0500 time 0.2172 (0.2811) data time 0.0012 (0.0086) model time 0.2160 (0.2725) loss 3.8337 (3.7624) grad_norm 1.1149 (1.1349) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:19:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][570/625] eta 0:00:15 lr 0.001988 wd 0.0500 time 0.2133 (0.2757) data time 0.0010 (0.0080) model time 0.2123 (0.2676) loss 3.9203 (3.7507) grad_norm 2.1524 (1.1701) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:19:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][580/625] eta 0:00:12 lr 0.001988 wd 0.0500 time 0.2087 (0.2709) data time 0.0009 (0.0075) model time 0.2078 (0.2634) loss 4.2195 (3.7330) grad_norm 1.0117 (1.1885) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:19:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][590/625] eta 0:00:09 lr 0.001988 wd 0.0500 time 0.2040 (0.2670) data time 0.0008 (0.0071) model time 0.2032 (0.2599) loss 3.5970 (3.7194) grad_norm 1.1607 (1.1880) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:19:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][600/625] eta 0:00:06 lr 0.001988 wd 0.0500 time 0.2211 (0.2636) data time 0.0009 (0.0068) model time 0.2202 (0.2569) loss 3.0723 (3.7201) grad_norm 1.8375 (1.2000) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:19:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][610/625] eta 0:00:03 lr 0.001988 wd 0.0500 time 0.2033 (0.2605) data time 0.0008 (0.0065) model time 0.2025 (0.2541) loss 3.9513 (3.7164) grad_norm 1.1142 (1.2012) loss_scale 16384.0000 (16384.0000) 
mem 8975MB [2024-07-29 15:19:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][620/625] eta 0:00:01 lr 0.001988 wd 0.0500 time 0.2127 (0.2576) data time 0.0007 (0.0061) model time 0.2120 (0.2514) loss 3.8948 (3.7019) grad_norm 1.1269 (1.1998) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:19:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 33 training takes 0:00:47 [2024-07-29 15:19:14 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 15:19:18 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 15:19:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.460 (0.460) Loss 0.8730 (0.8730) Acc@1 84.033 (84.033) Acc@5 96.826 (96.826) Mem 8975MB [2024-07-29 15:19:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.099) Loss 1.4287 (1.0659) Acc@1 69.678 (78.125) Acc@5 89.990 (94.802) Mem 8975MB [2024-07-29 15:19:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.078) Loss 1.5381 (1.2789) Acc@1 66.699 (73.224) Acc@5 88.428 (91.869) Mem 8975MB [2024-07-29 15:19:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 73.043 Acc@5 91.873 [2024-07-29 15:19:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 73.0% [2024-07-29 15:19:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 73.04% [2024-07-29 15:19:22 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... 
[2024-07-29 15:19:23 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! [2024-07-29 15:19:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.450 (0.450) Loss 2.8379 (2.8379) Acc@1 44.629 (44.629) Acc@5 68.457 (68.457) Mem 8975MB [2024-07-29 15:19:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 3.3301 (3.0659) Acc@1 34.570 (38.037) Acc@5 58.008 (64.040) Mem 8975MB [2024-07-29 15:19:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.077) Loss 3.6641 (3.2528) Acc@1 29.297 (35.914) Acc@5 53.027 (60.889) Mem 8975MB [2024-07-29 15:19:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 35.805 Acc@5 60.962 [2024-07-29 15:19:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 35.8% [2024-07-29 15:19:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 35.80% [2024-07-29 15:19:25 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 15:19:25 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-29 15:19:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][0/625] eta 0:12:36 lr 0.001988 wd 0.0500 time 1.2103 (1.2103) data time 0.5394 (0.5394) model time 0.0000 (0.0000) loss 4.1010 (4.1010) grad_norm 1.1207 (1.1207) loss_scale 16384.0000 (16384.0000) mem 8971MB [2024-07-29 15:19:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][10/625] eta 0:03:04 lr 0.001988 wd 0.0500 time 0.2086 (0.3004) data time 0.0010 (0.0500) model time 0.0000 (0.0000) loss 2.8773 (3.5816) grad_norm 1.8608 (1.2235) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:19:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][20/625] eta 0:02:36 lr 0.001988 wd 0.0500 time 0.2145 (0.2594) data time 0.0010 (0.0268) model time 0.0000 (0.0000) loss 3.9071 (3.4650) grad_norm 0.9953 (1.1485) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:19:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][30/625] eta 0:02:25 lr 0.001988 wd 0.0500 time 0.2223 (0.2437) data time 0.0010 (0.0185) model time 0.0000 (0.0000) loss 3.6070 (3.4787) grad_norm 1.0029 (1.1911) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:19:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][40/625] eta 0:02:18 lr 0.001988 wd 0.0500 time 0.2092 (0.2373) data time 0.0011 (0.0143) model time 0.0000 (0.0000) loss 4.2174 (3.5543) grad_norm 2.1375 (1.2347) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:19:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][50/625] eta 0:02:13 lr 0.001988 wd 0.0500 time 0.2096 (0.2322) data time 0.0009 (0.0117) model time 0.0000 (0.0000) loss 4.4061 (3.5634) grad_norm 0.9808 (1.2445) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:19:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][60/625] eta 0:02:11 lr 0.001988 wd 0.0500 time 0.1945 
(0.2330) data time 0.0012 (0.0100) model time 0.1933 (0.2357) loss 3.0344 (3.5483) grad_norm 0.7575 (1.2215) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:19:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][70/625] eta 0:02:07 lr 0.001988 wd 0.0500 time 0.2134 (0.2297) data time 0.0009 (0.0087) model time 0.2125 (0.2222) loss 3.7624 (3.5087) grad_norm 0.8312 (1.2164) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:19:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][80/625] eta 0:02:03 lr 0.001988 wd 0.0500 time 0.2146 (0.2274) data time 0.0007 (0.0078) model time 0.2138 (0.2181) loss 4.6380 (3.5127) grad_norm 1.0407 (1.2048) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:19:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][90/625] eta 0:02:00 lr 0.001988 wd 0.0500 time 0.2186 (0.2255) data time 0.0010 (0.0071) model time 0.2175 (0.2159) loss 2.4795 (3.5391) grad_norm 1.3844 (1.2336) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:19:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][100/625] eta 0:01:57 lr 0.001988 wd 0.0500 time 0.2095 (0.2239) data time 0.0007 (0.0065) model time 0.2088 (0.2144) loss 4.1855 (3.5519) grad_norm 1.4735 (1.2258) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:19:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][110/625] eta 0:01:54 lr 0.001988 wd 0.0500 time 0.2203 (0.2228) data time 0.0010 (0.0060) model time 0.2193 (0.2138) loss 3.3610 (3.5220) grad_norm 0.9076 (1.2098) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:19:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][120/625] eta 0:01:52 lr 0.001987 wd 0.0500 time 0.2137 (0.2218) data time 0.0008 (0.0056) model time 0.2129 (0.2132) loss 3.0673 (3.5250) grad_norm 1.2735 (1.2213) loss_scale 16384.0000 (16384.0000) mem 8978MB 
[2024-07-29 15:19:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][130/625] eta 0:01:49 lr 0.001987 wd 0.0500 time 0.2150 (0.2212) data time 0.0009 (0.0053) model time 0.2141 (0.2130) loss 3.8955 (3.5558) grad_norm 2.1750 (1.2327) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:19:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][140/625] eta 0:01:46 lr 0.001987 wd 0.0500 time 0.2144 (0.2204) data time 0.0009 (0.0050) model time 0.2134 (0.2125) loss 4.0232 (3.5811) grad_norm 1.3292 (1.2254) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:19:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][150/625] eta 0:01:44 lr 0.001987 wd 0.0500 time 0.2189 (0.2198) data time 0.0010 (0.0047) model time 0.2180 (0.2123) loss 4.2060 (3.5769) grad_norm 1.3431 (1.2316) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][160/625] eta 0:01:42 lr 0.001987 wd 0.0500 time 0.2063 (0.2194) data time 0.0008 (0.0045) model time 0.2055 (0.2123) loss 3.2153 (3.5820) grad_norm 0.8396 (1.2147) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][170/625] eta 0:01:39 lr 0.001987 wd 0.0500 time 0.2068 (0.2189) data time 0.0011 (0.0043) model time 0.2057 (0.2120) loss 3.4310 (3.5872) grad_norm 1.0546 (1.2045) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][180/625] eta 0:01:37 lr 0.001987 wd 0.0500 time 0.2146 (0.2184) data time 0.0009 (0.0041) model time 0.2137 (0.2118) loss 3.0637 (3.5888) grad_norm 1.2345 (1.2093) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][190/625] eta 0:01:34 lr 0.001987 wd 0.0500 time 0.2099 (0.2181) data time 0.0013 (0.0040) model time 0.2087 (0.2119) loss 3.3439 (3.5895) grad_norm 1.1505 (1.2032) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][200/625] eta 0:01:32 lr 0.001987 wd 0.0500 time 0.2074 (0.2179) data time 0.0008 (0.0038) model time 0.2067 (0.2120) loss 3.0871 (3.5827) grad_norm 1.5467 (1.2050) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][210/625] eta 0:01:30 lr 0.001987 wd 0.0500 time 0.2165 (0.2176) data time 0.0009 (0.0037) model time 0.2156 (0.2119) loss 3.5479 (3.5847) grad_norm 0.9521 (1.2060) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][220/625] eta 0:01:28 lr 0.001987 wd 0.0500 time 0.2056 (0.2184) data time 0.0010 (0.0036) model time 0.2046 (0.2131) loss 4.0746 (3.5989) grad_norm 1.7481 (1.2056) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][230/625] eta 0:01:26 lr 0.001987 wd 0.0500 time 0.2124 (0.2182) data time 0.0011 (0.0035) model time 0.2114 (0.2130) loss 4.3177 (3.5990) grad_norm 1.1788 (1.2289) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][240/625] eta 0:01:23 lr 0.001987 wd 0.0500 time 0.2138 (0.2179) data time 0.0007 (0.0034) model time 0.2131 (0.2129) loss 4.2613 (3.6073) grad_norm 0.8678 (1.2231) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][250/625] eta 0:01:21 lr 0.001987 wd 0.0500 time 0.2101 (0.2177) data time 0.0011 (0.0033) model time 0.2090 (0.2129) loss 3.4624 (3.6195) grad_norm 1.3483 (1.2158) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][260/625] eta 0:01:19 lr 0.001987 wd 0.0500 time 0.2184 (0.2175) data time 0.0010 (0.0032) model time 0.2174 (0.2128) loss 2.9423 (3.6207) grad_norm 1.1582 (1.2085) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][270/625] eta 0:01:17 lr 0.001987 wd 0.0500 time 0.2143 (0.2173) data time 0.0007 (0.0031) model time 0.2136 (0.2127) loss 3.5661 (3.6134) grad_norm 1.8429 (1.2059) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][280/625] eta 0:01:14 lr 0.001987 wd 0.0500 time 0.2152 (0.2174) data time 0.0010 (0.0032) model time 0.2142 (0.2128) loss 3.5816 (3.6037) grad_norm 0.9348 (1.2152) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][290/625] eta 0:01:12 lr 0.001987 wd 0.0500 time 0.2083 (0.2172) data time 0.0011 (0.0031) model time 0.2072 (0.2127) loss 2.5878 (3.5949) grad_norm 1.6909 (1.2109) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][300/625] eta 0:01:10 lr 0.001987 wd 0.0500 time 0.2110 (0.2181) data time 0.0011 (0.0031) model time 0.2099 (0.2139) loss 3.9874 (3.6065) grad_norm 1.0250 (1.2115) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][310/625] eta 0:01:08 lr 0.001987 wd 0.0500 time 0.2106 (0.2179) data time 0.0008 (0.0030) model time 0.2098 (0.2138) loss 3.8719 (3.6050) grad_norm 0.8082 (1.2089) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][320/625] eta 0:01:06 lr 0.001987 wd 0.0500 time 0.2191 (0.2177) data time 0.0010 (0.0030) model time 0.2181 (0.2137) loss 4.0692 (3.6049) grad_norm 0.7776 (1.2057) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][330/625] eta 0:01:04 lr 0.001987 wd 0.0500 time 0.2099 (0.2175) data time 0.0011 (0.0029) model time 0.2088 (0.2135) loss 3.8492 (3.6198) grad_norm 0.9016 (1.2060) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][340/625] eta 0:01:01 lr 0.001987 wd 0.0500 time 0.2073 (0.2173) data time 0.0009 (0.0028) model time 0.2064 (0.2134) loss 2.6179 (3.6056) grad_norm 1.3274 (1.2168) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][350/625] eta 0:00:59 lr 0.001987 wd 0.0500 time 0.2093 (0.2172) data time 0.0010 (0.0028) model time 0.2083 (0.2134) loss 4.1713 (3.6037) grad_norm 1.2733 (1.2176) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][360/625] eta 0:00:57 lr 0.001987 wd 0.0500 time 0.2120 (0.2172) data time 0.0007 (0.0028) model time 0.2113 (0.2134) loss 4.3159 (3.6052) grad_norm 1.3687 (1.2253) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][370/625] eta 0:00:55 lr 0.001987 wd 0.0500 time 0.2092 (0.2171) data time 0.0012 (0.0028) model time 0.2080 (0.2134) loss 4.0437 (3.6110) grad_norm 0.9055 (1.2322) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][380/625] eta 0:00:53 lr 0.001987 wd 0.0500 time 0.2059 (0.2169) data time 0.0009 (0.0027) model time 0.2050 (0.2132) loss 4.3921 (3.6170) grad_norm 1.0923 (1.2300) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][390/625] eta 0:00:50 lr 0.001987 wd 0.0500 time 0.2101 (0.2167) data time 0.0010 (0.0027) model time 0.2091 (0.2131) loss 3.8246 (3.6148) grad_norm 0.9073 (1.2257) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][400/625] eta 0:00:48 lr 0.001987 wd 0.0500 time 0.2147 (0.2166) data time 0.0011 (0.0026) model time 0.2136 (0.2130) loss 3.5987 (3.6171) grad_norm 0.7938 (1.2180) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][410/625] eta 0:00:46 lr 0.001987 wd 0.0500 time 0.2069 (0.2165) data time 0.0011 (0.0026) model time 0.2058 (0.2130) loss 3.3002 (3.6158) grad_norm 1.5430 (1.2160) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][420/625] eta 0:00:44 lr 0.001987 wd 0.0500 time 0.2045 (0.2164) data time 0.0010 (0.0026) model time 0.2035 (0.2129) loss 3.4100 (3.6146) grad_norm 1.0215 (1.2207) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:20:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][430/625] eta 0:00:42 lr 0.001987 wd 0.0500 time 0.2057 (0.2163) data time 0.0011 (0.0025) model time 0.2046 (0.2129) loss 3.4489 (3.6159) grad_norm 1.0560 (1.2188) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:21:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][440/625] eta 0:00:40 lr 0.001987 wd 0.0500 time 0.2119 (0.2163) data time 0.0008 (0.0025) model time 0.2111 (0.2129) loss 2.9965 (3.6182) grad_norm 0.7837 (1.2220) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:21:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][450/625] eta 0:00:37 lr 0.001987 wd 0.0500 time 0.2094 (0.2162) data time 0.0011 (0.0025) model time 0.2083 (0.2129) loss 3.8215 (3.6210) grad_norm 0.9809 (1.2200) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:21:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][460/625] eta 0:00:35 lr 0.001986 wd 0.0500 time 0.2146 (0.2162) data time 0.0011 (0.0024) model time 0.2135 (0.2129) loss 3.8348 (3.6158) grad_norm 1.5640 (1.2247) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:21:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][470/625] eta 0:00:33 lr 0.001986 wd 0.0500 time 0.2239 (0.2162) data time 0.0008 (0.0024) model time 0.2230 (0.2130) loss 3.1176 (3.6126) grad_norm 1.3578 (1.2260) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:21:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][480/625] eta 0:00:31 lr 0.001986 wd 0.0500 time 0.2118 (0.2161) data time 0.0009 (0.0024) model time 0.2109 (0.2130) loss 3.6430 (3.6126) grad_norm 0.9571 (1.2269) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:21:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][490/625] eta 0:00:29 lr 0.001986 wd 0.0500 time 0.2100 (0.2160) data time 0.0010 (0.0023) model time 0.2091 (0.2129) loss 4.1064 (3.6173) grad_norm 1.5360 (1.2274) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:21:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][500/625] eta 0:00:26 lr 0.001986 wd 0.0500 time 0.2139 (0.2159) data time 0.0007 (0.0023) model time 0.2132 (0.2128) loss 3.0322 (3.6179) grad_norm 0.9353 (1.2255) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:21:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][510/625] eta 0:00:24 lr 0.001986 wd 0.0500 time 0.2140 (0.2159) data time 0.0009 (0.0023) model time 0.2131 (0.2128) loss 4.0340 (3.6114) grad_norm 1.7427 (1.2249) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:21:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][520/625] eta 0:00:22 lr 0.001986 wd 0.0500 time 0.2082 (0.2158) data time 0.0013 (0.0023) model time 0.2069 (0.2127) loss 3.7350 (3.6116) grad_norm 1.7226 (1.2253) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:21:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][530/625] eta 0:00:20 lr 0.001986 wd 0.0500 time 0.2088 (0.2157) data time 0.0008 (0.0023) model time 0.2081 (0.2127) loss 4.1112 (3.6046) grad_norm 1.1433 (1.2286) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:21:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][540/625] eta 0:00:18 lr 0.001986 wd 0.0500 time 0.2168 (0.2156) data time 0.0010 (0.0022) model time 0.2158 (0.2126) loss 3.4767 (3.6031) grad_norm 0.8051 (1.2275) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:21:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][550/625] eta 0:00:16 lr 0.001986 wd 0.0500 time 0.2160 (0.2156) data time 0.0010 (0.0022) model time 0.2150 (0.2127) loss 3.6721 (3.6084) grad_norm 1.0667 (1.2269) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:21:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][560/625] eta 0:00:14 lr 0.001986 wd 0.0500 time 0.2099 (0.2155) data time 0.0007 (0.0022) model time 0.2091 (0.2126) loss 2.4044 (3.6087) grad_norm 1.4140 (1.2280) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:21:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][570/625] eta 0:00:11 lr 0.001986 wd 0.0500 time 0.2079 (0.2154) data time 0.0009 (0.0022) model time 0.2069 (0.2125) loss 4.2852 (3.6090) grad_norm 0.9981 (1.2294) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:21:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][580/625] eta 0:00:09 lr 0.001986 wd 0.0500 time 0.2159 (0.2154) data time 0.0010 (0.0022) model time 0.2149 (0.2125) loss 3.8343 (3.6117) grad_norm 1.1229 (1.2278) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:21:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][590/625] eta 0:00:07 lr 0.001986 wd 0.0500 time 0.2143 (0.2153) data time 0.0008 (0.0021) model time 0.2135 (0.2125) loss 4.4320 (3.6159) grad_norm 0.8817 (1.2296) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:21:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][600/625] eta 0:00:05 lr 0.001986 wd 0.0500 time 0.2054 (0.2152) data time 0.0009 (0.0021) model time 0.2045 (0.2124) loss 4.1291 (3.6161) grad_norm 1.1959 (1.2318) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:21:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][610/625] eta 0:00:03 lr 0.001986 wd 0.0500 time 0.2102 (0.2152) data time 0.0005 (0.0021) model time 0.2097 (0.2124) loss 3.8943 (3.6142) grad_norm 1.7013 (1.2359) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:21:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][620/625] eta 0:00:01 lr 0.001986 wd 0.0500 time 0.2096 (0.2151) data time 0.0007 (0.0021) model time 0.2089 (0.2124) loss 3.9727 (3.6103) grad_norm 0.8517 (1.2344) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:21:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 34 training takes 0:02:14
[2024-07-29 15:21:40 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 15:21:40 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 15:21:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.539 (0.539) Loss 0.8999 (0.8999) Acc@1 83.643 (83.643) Acc@5 96.631 (96.631) Mem 8978MB
[2024-07-29 15:21:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.060 (0.104) Loss 1.4619 (1.0672) Acc@1 68.896 (78.209) Acc@5 89.551 (94.882) Mem 8978MB
[2024-07-29 15:21:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.081) Loss 1.6465 (1.2745) Acc@1 64.648 (73.689) Acc@5 86.572 (92.097) Mem 8978MB
[2024-07-29 15:21:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 73.520 Acc@5 92.097
[2024-07-29 15:21:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 73.5%
[2024-07-29 15:21:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 73.52%
[2024-07-29 15:21:42 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 15:21:44 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 15:21:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.529 (0.529) Loss 2.5391 (2.5391) Acc@1 49.170 (49.170) Acc@5 72.998 (72.998) Mem 8978MB
[2024-07-29 15:21:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.102) Loss 3.0605 (2.7869) Acc@1 37.695 (42.081) Acc@5 63.477 (68.701) Mem 8978MB
[2024-07-29 15:21:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 3.3867 (2.9789) Acc@1 33.008 (39.823) Acc@5 57.275 (65.337) Mem 8978MB
[2024-07-29 15:21:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 39.679 Acc@5 65.355
[2024-07-29 15:21:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 39.7%
[2024-07-29 15:21:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 39.68%
[2024-07-29 15:21:46 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 15:21:46 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 15:21:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][0/625] eta 0:06:57 lr 0.001986 wd 0.0500 time 0.6677 (0.6677) data time 0.4499 (0.4499) model time 0.0000 (0.0000) loss 2.7200 (2.7200) grad_norm 1.1755 (1.1755) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:21:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][10/625] eta 0:02:35 lr 0.001986 wd 0.0500 time 0.2215 (0.2535) data time 0.0010 (0.0420) model time 0.0000 (0.0000) loss 2.8211 (3.5415) grad_norm 0.9667 (1.2309) loss_scale 32768.0000 (26810.1818) mem 8978MB
[2024-07-29 15:21:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][20/625] eta 0:02:21 lr 0.001986 wd 0.0500 time 0.2112 (0.2341) data time 0.0009 (0.0225) model time 0.0000 (0.0000) loss 3.6438 (3.5039) grad_norm 1.2518 (1.3480) loss_scale 32768.0000 (29647.2381) mem 8978MB
[2024-07-29 15:21:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][30/625] eta 0:02:14 lr 0.001986 wd 0.0500 time 0.2122 (0.2267) data time 0.0007 (0.0156) model time 0.0000 (0.0000) loss 4.5185 (3.5398) grad_norm 0.9584 (1.2681) loss_scale 32768.0000 (30653.9355) mem 8978MB
[2024-07-29 15:21:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][40/625] eta 0:02:10 lr 0.001986 wd 0.0500 time 0.2062 (0.2228) data time 0.0007 (0.0120) model time 0.0000 (0.0000) loss 3.1495 (3.5146) grad_norm 2.0780 (1.2426) loss_scale 32768.0000 (31169.5610) mem 8978MB
[2024-07-29 15:21:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][50/625] eta 0:02:06 lr 0.001986 wd 0.0500 time 0.2160 (0.2202) data time 0.0009 (0.0099) model time 0.0000 (0.0000) loss 2.6162 (3.5460) grad_norm 1.0587 (1.2089) loss_scale 32768.0000 (31482.9804) mem 8978MB
[2024-07-29 15:22:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][60/625] eta 0:02:04 lr 0.001986 wd 0.0500 time 0.2482 (0.2195) data time 0.0007 (0.0085) model time 0.2474 (0.2148) loss 4.0110 (3.5700) grad_norm 0.9205 (1.1828) loss_scale 32768.0000 (31693.6393) mem 8978MB
[2024-07-29 15:22:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][70/625] eta 0:02:01 lr 0.001986 wd 0.0500 time 0.2144 (0.2184) data time 0.0011 (0.0074) model time 0.2133 (0.2126) loss 3.9977 (3.5536) grad_norm 0.8548 (1.1692) loss_scale 32768.0000 (31844.9577) mem 8978MB
[2024-07-29 15:22:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][80/625] eta 0:01:58 lr 0.001986 wd 0.0500 time 0.2093 (0.2174) data time 0.0008 (0.0066) model time 0.2085 (0.2115) loss 3.4152 (3.5547) grad_norm 0.7860 (1.1664) loss_scale 32768.0000 (31958.9136) mem 8978MB
[2024-07-29 15:22:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][90/625] eta 0:01:56 lr 0.001986 wd 0.0500 time 0.2139 (0.2168) data time 0.0010 (0.0060) model time 0.2129 (0.2115) loss 3.8715 (3.5583) grad_norm 1.0947 (1.1670) loss_scale 32768.0000 (32047.8242) mem 8978MB
[2024-07-29 15:22:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][100/625] eta 0:01:53 lr 0.001986 wd 0.0500 time 0.2174 (0.2163) data time 0.0007 (0.0055) model time 0.2167 (0.2112) loss 4.1847 (3.5399) grad_norm 1.6408 (1.1620) loss_scale 32768.0000 (32119.1287) mem 8978MB
[2024-07-29 15:22:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][110/625] eta 0:01:51 lr 0.001986 wd 0.0500 time 0.2199 (0.2160) data time 0.0011 (0.0051) model time 0.2188 (0.2113) loss 3.5290 (3.5739) grad_norm 1.1592 (1.1990) loss_scale 32768.0000 (32177.5856) mem 8978MB
[2024-07-29 15:22:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][120/625] eta 0:01:48 lr 0.001986 wd 0.0500 time 0.2101 (0.2154) data time 0.0009 (0.0048) model time 0.2092 (0.2109) loss 3.4991 (3.6047) grad_norm 1.3180 (1.1990) loss_scale 32768.0000 (32226.3802) mem 8978MB
[2024-07-29 15:22:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][130/625] eta 0:01:46 lr 0.001986 wd 0.0500 time 0.2258 (0.2152) data time 0.0012 (0.0045) model time 0.2247 (0.2110) loss 3.2403 (3.5750) grad_norm 1.3758 (1.2040) loss_scale 32768.0000 (32267.7252) mem 8978MB
[2024-07-29 15:22:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][140/625] eta 0:01:44 lr 0.001986 wd 0.0500 time 0.2109 (0.2149) data time 0.0009 (0.0043) model time 0.2100 (0.2109) loss 4.0475 (3.5592) grad_norm 0.7995 (1.1927) loss_scale 32768.0000 (32303.2057) mem 8978MB
[2024-07-29 15:22:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][150/625] eta 0:01:41 lr 0.001986 wd 0.0500 time 0.2143 (0.2147) data time 0.0011 (0.0041) model time 0.2132 (0.2108) loss 3.8831 (3.5628) grad_norm 1.0771 (1.2018) loss_scale 32768.0000 (32333.9868) mem 8978MB
[2024-07-29 15:22:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][160/625] eta 0:01:39 lr 0.001986 wd 0.0500 time 0.2041 (0.2145) data time 0.0009 (0.0039) model time 0.2032 (0.2107) loss 3.7794 (3.5721) grad_norm 2.1072 (inf) loss_scale 16384.0000 (32157.4161) mem 8978MB
[2024-07-29 15:22:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][170/625] eta 0:01:37 lr 0.001986 wd 0.0500 time 0.2083 (0.2145) data time 0.0007 (0.0037) model time 0.2076 (0.2109) loss 2.7732 (3.5730) grad_norm 0.7839 (inf) loss_scale 16384.0000 (31234.9942) mem 8978MB
[2024-07-29 15:22:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][180/625] eta 0:01:35 lr 0.001985 wd 0.0500 time 0.2730 (0.2146) data time 0.0007 (0.0036) model time 0.2723 (0.2113) loss 3.2903 (3.5749) grad_norm 1.1122 (inf) loss_scale 16384.0000 (30414.4972) mem 8978MB
[2024-07-29 15:22:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][190/625] eta 0:01:33 lr 0.001985 wd 0.0500 time 0.2101 (0.2146) data time 0.0008 (0.0035) model time 0.2093 (0.2114) loss 3.6956 (3.5676) grad_norm 1.0968 (inf) loss_scale 16384.0000 (29679.9162) mem 8978MB
[2024-07-29 15:22:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][200/625] eta 0:01:31 lr 0.001985 wd 0.0500 time 0.2110 (0.2145) data time 0.0010 (0.0035) model time 0.2100 (0.2113) loss 3.9745 (3.5874) grad_norm 1.2773 (inf) loss_scale 16384.0000 (29018.4279) mem 8978MB
[2024-07-29 15:22:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][210/625] eta 0:01:29 lr 0.001985 wd 0.0500 time 0.2275 (0.2146) data time 0.0007 (0.0033) model time 0.2268 (0.2115) loss 2.4735 (3.5772) grad_norm 1.4465 (inf) loss_scale 16384.0000 (28419.6398) mem 8978MB
[2024-07-29 15:22:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][220/625] eta 0:01:26 lr 0.001985 wd 0.0500 time 0.2163 (0.2145) data time 0.0008 (0.0032) model time 0.2155 (0.2115) loss 3.1570 (3.5682) grad_norm 0.7444 (inf) loss_scale 16384.0000 (27875.0407) mem 8978MB
[2024-07-29 15:22:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][230/625] eta 0:01:24 lr 0.001985 wd 0.0500 time 0.2108 (0.2144) data time 0.0009 (0.0032) model time 0.2099 (0.2115) loss 4.1110 (3.5865) grad_norm 0.8611 (inf) loss_scale 16384.0000 (27377.5931) mem 8978MB
[2024-07-29 15:22:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][240/625] eta 0:01:22 lr 0.001985 wd 0.0500 time 0.2077 (0.2142) data time 0.0012 (0.0031) model time 0.2065 (0.2113) loss 3.8600 (3.5849) grad_norm 1.3590 (inf) loss_scale 16384.0000 (26921.4274) mem 8978MB
[2024-07-29 15:22:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][250/625] eta 0:01:20 lr 0.001985 wd 0.0500 time 0.2122 (0.2140) data time 0.0007 (0.0030) model time 0.2115 (0.2112) loss 3.5591 (3.5950) grad_norm 0.8065 (inf) loss_scale 16384.0000 (26501.6096) mem 8978MB
[2024-07-29 15:22:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][260/625] eta 0:01:18 lr 0.001985 wd 0.0500 time 0.2103 (0.2139) data time 0.0011 (0.0029) model time 0.2093 (0.2111) loss 3.2807 (3.5820) grad_norm 1.1948 (inf) loss_scale 16384.0000 (26113.9617) mem 8978MB
[2024-07-29 15:22:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][270/625] eta 0:01:15 lr 0.001985 wd 0.0500 time 0.2135 (0.2138) data time 0.0011 (0.0029) model time 0.2123 (0.2110) loss 4.0882 (3.5864) grad_norm 1.4505 (inf) loss_scale 16384.0000 (25754.9225) mem 8978MB
[2024-07-29 15:22:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][280/625] eta 0:01:13 lr 0.001985 wd 0.0500 time 0.2399 (0.2143) data time 0.0010 (0.0028) model time 0.2389 (0.2117) loss 3.4215 (3.5816) grad_norm 0.7999 (inf) loss_scale 16384.0000 (25421.4377) mem 8978MB
[2024-07-29 15:22:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][290/625] eta 0:01:11 lr 0.001985 wd 0.0500 time 0.3100 (0.2145) data time 0.0009 (0.0028) model time 0.3091 (0.2121) loss 3.9467 (3.5830) grad_norm 1.1548 (inf) loss_scale 16384.0000 (25110.8729) mem 8978MB
[2024-07-29 15:22:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][300/625] eta 0:01:09 lr 0.001985 wd 0.0500 time 0.2196 (0.2145) data time 0.0008 (0.0027) model time 0.2187 (0.2121) loss 3.2932 (3.5872) grad_norm 1.1224 (inf) loss_scale 16384.0000 (24820.9435) mem 8978MB
[2024-07-29 15:22:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][310/625] eta 0:01:07 lr 0.001985 wd 0.0500 time 0.2099 (0.2145) data time 0.0011 (0.0026) model time 0.2088 (0.2121) loss 3.5080 (3.5917) grad_norm 1.4030 (inf) loss_scale 16384.0000 (24549.6592) mem 8978MB
[2024-07-29 15:22:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][320/625] eta 0:01:05 lr 0.001985 wd 0.0500 time 0.2054 (0.2144) data time 0.0011 (0.0026) model time 0.2043 (0.2121) loss 3.4117 (3.5962) grad_norm 1.0194 (inf) loss_scale 16384.0000 (24295.2773) mem 8978MB
[2024-07-29 15:22:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][330/625] eta 0:01:03 lr 0.001985 wd 0.0500 time 0.2101 (0.2142) data time 0.0009 (0.0026) model time 0.2093 (0.2119) loss 4.4283 (3.5966) grad_norm 2.0720 (inf) loss_scale 16384.0000 (24056.2659) mem 8978MB
[2024-07-29 15:23:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][340/625] eta 0:01:01 lr 0.001985 wd 0.0500 time 0.2077 (0.2142) data time 0.0009 (0.0025) model time 0.2068 (0.2120) loss 3.9662 (3.5986) grad_norm 1.0614 (inf) loss_scale 16384.0000 (23831.2727) mem 8978MB
[2024-07-29 15:23:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][350/625] eta 0:00:58 lr 0.001985 wd 0.0500 time 0.2076 (0.2142) data time 0.0011 (0.0025) model time 0.2066 (0.2119) loss 3.2856 (3.5925) grad_norm 1.7122 (inf) loss_scale 16384.0000 (23619.0997) mem 8978MB
[2024-07-29 15:23:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][360/625] eta 0:00:56 lr 0.001985 wd 0.0500 time 0.2199 (0.2141) data time 0.0008 (0.0024) model time 0.2191 (0.2119) loss 4.2192 (3.5951) grad_norm 3.4677 (inf) loss_scale 16384.0000 (23418.6814) mem 8978MB
[2024-07-29 15:23:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][370/625] eta 0:00:54 lr 0.001985 wd 0.0500 time 0.2279 (0.2141) data time 0.0007 (0.0024) model time 0.2271 (0.2119) loss 4.0712 (3.6077) grad_norm 1.0833 (inf) loss_scale 16384.0000 (23229.0674) mem 8978MB
[2024-07-29 15:23:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][380/625] eta 0:00:52 lr 0.001985 wd 0.0500 time 0.2175 (0.2140) data time 0.0010 (0.0024) model time 0.2165 (0.2118) loss 3.9412 (3.6073) grad_norm 0.7664 (inf) loss_scale 16384.0000 (23049.4068) mem 8978MB
[2024-07-29 15:23:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][390/625] eta 0:00:50 lr 0.001985 wd 0.0500 time 0.2102 (0.2139) data time 0.0008 (0.0023) model time 0.2095 (0.2118) loss 4.4071 (3.6099) grad_norm 1.4869 (inf) loss_scale 16384.0000 (22878.9361) mem 8978MB
[2024-07-29 15:23:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][400/625] eta 0:00:48 lr 0.001985 wd 0.0500 time 0.2105 (0.2138) data time 0.0008 (0.0023) model time 0.2098 (0.2117) loss 3.7191 (3.6141) grad_norm 0.9456 (inf) loss_scale 16384.0000 (22716.9676) mem 8978MB
[2024-07-29 15:23:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][410/625] eta 0:00:45 lr 0.001985 wd 0.0500 time 0.2138 (0.2138) data time 0.0011 (0.0023) model time 0.2127 (0.2117) loss 3.8084 (3.6106) grad_norm 0.8182 (inf) loss_scale 16384.0000 (22562.8808) mem 8978MB
[2024-07-29 15:23:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][420/625] eta 0:00:43 lr 0.001985 wd 0.0500 time 0.2164 (0.2137) data time 0.0008 (0.0022) model time 0.2157 (0.2116) loss 4.1505 (3.6047) grad_norm 1.2113 (inf) loss_scale 16384.0000 (22416.1140) mem 8978MB
[2024-07-29 15:23:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][430/625] eta 0:00:41 lr 0.001985 wd 0.0500 time 0.2037 (0.2136) data time 0.0012 (0.0022) model time 0.2024 (0.2116) loss 4.2125 (3.6086) grad_norm 1.2231 (inf) loss_scale 16384.0000 (22276.1578) mem 8978MB
[2024-07-29 15:23:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][440/625] eta 0:00:39 lr 0.001985 wd 0.0500 time 0.2128 (0.2135) data time 0.0011 (0.0022) model time 0.2117 (0.2115) loss 4.0004 (3.6009) grad_norm 1.3577 (inf) loss_scale 16384.0000 (22142.5488) mem 8978MB
[2024-07-29 15:23:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][450/625] eta 0:00:37 lr 0.001985 wd 0.0500 time 0.2187 (0.2135) data time 0.0010 (0.0022) model time 0.2177 (0.2114) loss 3.6388 (3.6051) grad_norm 0.8794 (inf) loss_scale 16384.0000 (22014.8647) mem 8978MB
[2024-07-29 15:23:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][460/625] eta 0:00:35 lr 0.001985 wd 0.0500 time 0.2089 (0.2139) data time 0.0010 (0.0021) model time 0.2079 (0.2120) loss 4.2532 (3.6090) grad_norm 1.8334 (inf) loss_scale 16384.0000 (21892.7202) mem 8978MB
[2024-07-29 15:23:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][470/625] eta 0:00:33 lr 0.001985 wd 0.0500 time 0.2155 (0.2139) data time 0.0010 (0.0021) model time 0.2146 (0.2120) loss 3.7019 (3.6163) grad_norm 0.9819 (inf) loss_scale 16384.0000 (21775.7622) mem 8978MB
[2024-07-29 15:23:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][480/625] eta 0:00:31 lr 0.001985 wd 0.0500 time 0.2176 (0.2139) data time 0.0010 (0.0021) model time 0.2166 (0.2120) loss 3.8230 (3.6205) grad_norm 0.9900 (inf) loss_scale 16384.0000 (21663.6674) mem 8978MB
[2024-07-29 15:23:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][490/625] eta 0:00:28 lr 0.001985 wd 0.0500 time 0.2130 (0.2139) data time 0.0011 (0.0021) model time 0.2119 (0.2120) loss 4.0532 (3.6232) grad_norm 2.3822 (inf) loss_scale 16384.0000 (21556.1385) mem 8978MB
[2024-07-29 15:23:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][500/625] eta 0:00:26 lr 0.001984 wd 0.0500 time 0.2149 (0.2138) data time 0.0009 (0.0021) model time 0.2140 (0.2119) loss 2.9558 (3.6166) grad_norm 1.7111 (inf) loss_scale 16384.0000 (21452.9022) mem 8978MB
[2024-07-29 15:23:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][510/625] eta 0:00:24 lr 0.001984 wd 0.0500 time 0.2042 (0.2138) data time 0.0010 (0.0020) model time 0.2032 (0.2119) loss 4.2674 (3.6124) grad_norm 0.8738 (inf) loss_scale 16384.0000 (21353.7065) mem 8978MB
[2024-07-29 15:23:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][520/625] eta 0:00:22 lr 0.001984 wd 0.0500 time 0.2164 (0.2138) data time 0.0011 (0.0020) model time 0.2153 (0.2119) loss 3.7507 (3.6166) grad_norm 0.7603 (inf) loss_scale 16384.0000 (21258.3186) mem 8978MB
[2024-07-29 15:23:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][530/625] eta 0:00:20 lr 0.001984 wd 0.0500 time 0.2115 (0.2138) data time 0.0010 (0.0020) model time 0.2104 (0.2119) loss 3.3873 (3.6141) grad_norm 1.0971 (inf) loss_scale 16384.0000 (21166.5235) mem 8978MB
[2024-07-29 15:23:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][540/625] eta 0:00:18 lr 0.001984 wd 0.0500 time 0.2050 (0.2137) data time 0.0009 (0.0020) model time 0.2041 (0.2119) loss 4.0033 (3.6179) grad_norm 1.0196 (inf) loss_scale 16384.0000 (21078.1220) mem 8978MB
[2024-07-29 15:23:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][550/625] eta 0:00:16 lr 0.001984 wd 0.0500 time 0.2165 (0.2137) data time 0.0010 (0.0020) model time 0.2155 (0.2119) loss 4.4180 (3.6200) grad_norm 0.8166 (inf) loss_scale 16384.0000 (20992.9292) mem 8978MB
[2024-07-29 15:23:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][560/625] eta 0:00:13 lr 0.001984 wd 0.0500 time 0.2103 (0.2137) data time 0.0008 (0.0020) model time 0.2095 (0.2119) loss 4.1638 (3.6231) grad_norm 1.0911 (inf) loss_scale 16384.0000 (20910.7736) mem 8978MB
[2024-07-29 15:23:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][570/625] eta 0:00:11 lr 0.001984 wd 0.0500 time 0.2157 (0.2137) data time 0.0008 (0.0019) model time 0.2150 (0.2119) loss 4.5322 (3.6264) grad_norm 1.4157 (inf) loss_scale 16384.0000 (20831.4956) mem 8978MB
[2024-07-29 15:23:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][580/625] eta 0:00:09 lr 0.001984 wd 0.0500 time 0.2122 (0.2138) data time 0.0009 (0.0019) model time 0.2113 (0.2120) loss
3.3581 (3.6242) grad_norm 2.1385 (inf) loss_scale 16384.0000 (20754.9466) mem 8978MB [2024-07-29 15:23:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][590/625] eta 0:00:07 lr 0.001984 wd 0.0500 time 0.2122 (0.2137) data time 0.0008 (0.0019) model time 0.2114 (0.2120) loss 4.4972 (3.6292) grad_norm 1.0997 (inf) loss_scale 16384.0000 (20680.9882) mem 8978MB [2024-07-29 15:23:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][600/625] eta 0:00:05 lr 0.001984 wd 0.0500 time 0.2119 (0.2141) data time 0.0010 (0.0019) model time 0.2110 (0.2124) loss 4.0220 (3.6320) grad_norm 1.4557 (inf) loss_scale 16384.0000 (20609.4908) mem 8978MB [2024-07-29 15:23:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][610/625] eta 0:00:03 lr 0.001984 wd 0.0500 time 0.2044 (0.2141) data time 0.0005 (0.0019) model time 0.2039 (0.2124) loss 4.2179 (3.6333) grad_norm 2.4099 (inf) loss_scale 16384.0000 (20540.3339) mem 8978MB [2024-07-29 15:23:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][620/625] eta 0:00:01 lr 0.001984 wd 0.0500 time 0.2108 (0.2140) data time 0.0007 (0.0019) model time 0.2100 (0.2123) loss 4.0010 (3.6328) grad_norm 1.3935 (inf) loss_scale 16384.0000 (20473.4042) mem 8978MB [2024-07-29 15:24:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 35 training takes 0:02:13 [2024-07-29 15:24:00 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 15:24:01 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 15:24:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.604 (0.604) Loss 0.9043 (0.9043) Acc@1 83.496 (83.496) Acc@5 96.875 (96.875) Mem 8978MB
[2024-07-29 15:24:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.061 (0.112) Loss 1.5459 (1.1135) Acc@1 67.676 (77.535) Acc@5 88.330 (94.638) Mem 8978MB
[2024-07-29 15:24:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.085) Loss 1.6973 (1.3071) Acc@1 62.793 (73.012) Acc@5 87.158 (91.785) Mem 8978MB
[2024-07-29 15:24:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 73.033 Acc@5 91.843
[2024-07-29 15:24:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 73.0%
[2024-07-29 15:24:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.856 (0.856) Loss 2.2793 (2.2793) Acc@1 52.783 (52.783) Acc@5 76.953 (76.953) Mem 8978MB
[2024-07-29 15:24:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.058 (0.142) Loss 2.8145 (2.5353) Acc@1 41.406 (46.143) Acc@5 67.773 (72.563) Mem 8978MB
[2024-07-29 15:24:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.101) Loss 3.1426 (2.7311) Acc@1 36.475 (43.552) Acc@5 61.621 (69.217) Mem 8978MB
[2024-07-29 15:24:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 43.370 Acc@5 69.186
[2024-07-29 15:24:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 43.4%
[2024-07-29 15:24:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 43.37%
[2024-07-29 15:24:05 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 15:24:07 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 15:24:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][0/625] eta 0:06:51 lr 0.001984 wd 0.0500 time 0.6583 (0.6583) data time 0.4628 (0.4628) model time 0.0000 (0.0000) loss 3.3860 (3.3860) grad_norm 1.4507 (1.4507) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 15:24:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 15:24:09 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 15:24:09 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 15:29:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 15:29:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 15:29:44 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 15:30:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 15:30:01 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 15:30:01 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 15:30:01 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 15:30:01 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 36)
[2024-07-29 15:30:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 15:30:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][10/625] eta 0:15:15 lr 0.001984 wd 0.0500 time 0.2015 (1.4879) data time 0.0011 (0.1011) model time 0.0000 (0.0000) loss 3.9035 (4.2096) grad_norm 1.0458 (1.3455) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:30:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][20/625] eta 0:07:21 lr 0.001984 wd 0.0500 time 0.1969 (0.7302) data time 0.0010 (0.0422) model time 0.0000 (0.0000) loss 3.2813 (3.8495) grad_norm 1.0325 (1.2042) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:30:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][30/625] eta 0:05:18 lr 0.001984 wd 0.0500 time 0.1964 (0.5346) data time 0.0008 (0.0269) model time 0.0000 (0.0000) loss 4.3336 (3.8858) grad_norm 1.1365 (1.1601) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:30:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][40/625] eta 0:04:19 lr 0.001984 wd 0.0500 time 0.1966 (0.4441) data time 0.0009 (0.0199) model time 0.0000 (0.0000) loss 2.9883 (3.8380) grad_norm 1.2761 (1.1320) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:30:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][50/625] eta 0:03:45 lr 0.001984 wd 0.0500 time 0.1974 (0.3925) data time 0.0008 (0.0158) model time 0.0000 (0.0000) loss 4.1382 (3.7897) grad_norm 0.9211 (1.1532) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:30:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][60/625] eta 0:03:22 lr 0.001984 wd 0.0500 time 0.1986 (0.3585) data time 0.0008 (0.0132) model time 0.1978 (0.1983) loss 3.6512 (3.7874) grad_norm 1.4552 (1.2116) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:30:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][70/625] eta 0:03:06 lr 0.001984 wd 0.0500 time 0.1978 (0.3356) data time 0.0009 (0.0114) model time 0.1969 (0.2010) loss 3.8659 (3.7592) grad_norm 1.1907 (1.2135) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:30:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][80/625] eta 0:02:53 lr 0.001984 wd 0.0500 time 0.2002 (0.3183) data time 0.0008 (0.0101) model time 0.1994 (0.2010) loss 3.5908 (3.7070) grad_norm 0.8194 (1.1979) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:30:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][90/625] eta 0:02:42 lr 0.001984 wd 0.0500 time 0.2011 (0.3046) data time 0.0010 (0.0090) model time 0.2001 (0.2004) loss 3.6554 (3.6836) grad_norm 0.8821 (1.1750) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:30:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][100/625] eta 0:02:34 lr 0.001984 wd 0.0500 time 0.1975 (0.2938) data time 0.0010 (0.0082) model time 0.1965 (0.2001) loss 4.0126 (3.6949) grad_norm 1.2463 (1.1774) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:30:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][110/625] eta 0:02:26 lr 0.001984 wd 0.0500 time 0.2013 (0.2853) data time 0.0007 (0.0075) model time 0.2007 (0.2005) loss 3.6958 (3.7148) grad_norm 1.7721 (1.1931) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:30:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][120/625] eta 0:02:20 lr 0.001984 wd 0.0500 time 0.2050 (0.2781) data time 0.0009 (0.0069) model time 0.2042 (0.2004) loss 4.0392 (3.6989) grad_norm 1.6804 (1.2194) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:30:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 15:30:38 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 15:30:40 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 15:32:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 15:32:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 15:32:43 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 15:32:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 15:32:54 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 15:32:54 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 15:32:54 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 15:32:54 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 36)
[2024-07-29 15:32:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 15:33:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][130/625] eta 0:11:03 lr 0.001984 wd 0.0500 time 0.2149 (1.3407) data time 0.0009 (0.0728) model time 0.2140 (1.2679) loss 4.1412 (4.1493) grad_norm 2.6045 (1.4459) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:33:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][140/625] eta 0:06:16 lr 0.001984 wd 0.0500 time 0.2044 (0.7756) data time 0.0008 (0.0370) model time 0.2036 (0.7385) loss 4.2463 (3.9963) grad_norm 1.1930 (1.3638) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:33:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][150/625] eta 0:04:38 lr 0.001984 wd 0.0500 time 0.2026 (0.5873) data time 0.0009 (0.0250) model time 0.2017 (0.5623) loss 3.9493 (3.9735) grad_norm 0.9285 (1.2544) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:33:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][160/625] eta 0:03:49 lr 0.001984 wd 0.0500 time 0.2142 (0.4932) data time 0.0007 (0.0190) model time 0.2135 (0.4742) loss 2.6341 (3.8730) grad_norm 0.8271 (1.2383) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:33:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][170/625] eta 0:03:18 lr 0.001984 wd 0.0500 time 0.2118 (0.4370) data time 0.0010 (0.0154) model time 0.2108 (0.4216) loss 3.0230 (3.8311) grad_norm 0.9709 (1.2636) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:33:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][180/625] eta 0:02:57 lr 0.001984 wd 0.0500 time 0.2169 (0.3995) data time 0.0008 (0.0130) model time 0.2161 (0.3865) loss 3.6955 (3.8083) grad_norm 0.8555 (1.2322) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:33:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][190/625] eta 0:02:42 lr 0.001983 wd 0.0500 time 0.2086 (0.3732) data time 0.0007 (0.0113) model time 0.2079 (0.3619) loss 2.6476 (3.7602) grad_norm 0.8130 (1.1968) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:33:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][200/625] eta 0:02:30 lr 0.001983 wd 0.0500 time 0.2074 (0.3538) data time 0.0009 (0.0102) model time 0.2065 (0.3436) loss 3.8266 (3.7355) grad_norm 1.1360 (1.1990) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:33:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][210/625] eta 0:02:20 lr 0.001983 wd 0.0500 time 0.2085 (0.3380) data time 0.0007 (0.0092) model time 0.2078 (0.3288) loss 4.4077 (3.7085) grad_norm 1.1821 (1.2316) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:33:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][220/625] eta 0:02:11 lr 0.001983 wd 0.0500 time 0.2117 (0.3250) data time 0.0011 (0.0084) model time 0.2107 (0.3166) loss 3.5737 (3.7167) grad_norm 1.6732 (1.2254) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:33:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][230/625] eta 0:02:04 lr 0.001983 wd 0.0500 time 0.2094 (0.3146) data time 0.0009 (0.0077) model time 0.2085 (0.3069) loss 3.5250 (3.7225) grad_norm 1.3469 (1.2214) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:33:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][240/625] eta 0:01:57 lr 0.001983 wd 0.0500 time 0.2079 (0.3060) data time 0.0008 (0.0072) model time 0.2071 (0.2989) loss 4.2261 (3.7192) grad_norm 1.0128 (1.2209) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:33:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][250/625] eta 0:01:51 lr 0.001983 wd 0.0500 time 0.2126 (0.2986) data time 0.0008 (0.0067) model time 0.2118 (0.2919) loss 3.8721 (3.7023) grad_norm 1.1252 (1.2190) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:33:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][260/625] eta 0:01:46 lr 0.001983 wd 0.0500 time 0.2080 (0.2925) data time 0.0010 (0.0063) model time 0.2071 (0.2862) loss 2.1410 (3.6854) grad_norm 1.6999 (1.2300) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:33:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][270/625] eta 0:01:41 lr 0.001983 wd 0.0500 time 0.2070 (0.2870) data time 0.0010 (0.0059) model time 0.2060 (0.2811) loss 3.7969 (3.6843) grad_norm 1.1945 (1.2309) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:33:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][280/625] eta 0:01:37 lr 0.001983 wd 0.0500 time 0.2053 (0.2822) data time 0.0012 (0.0056) model time 0.2041 (0.2766) loss 3.9354 (3.6780) grad_norm 1.2174 (1.2254) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:33:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][290/625] eta 0:01:33 lr 0.001983 wd 0.0500 time 0.2189 (0.2779) data time 0.0008 (0.0054) model time 0.2180 (0.2726) loss 2.8107 (3.6800) grad_norm 1.0251 (1.2483) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:33:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][300/625] eta 0:01:29 lr 0.001983 wd 0.0500 time 0.2152 (0.2741) data time 0.0007 (0.0051) model time 0.2145 (0.2690) loss 2.7417 (3.6655) grad_norm 1.4184 (1.2761) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:33:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][310/625] eta 0:01:25 lr 0.001983 wd 0.0500 time 0.2115 (0.2707) data time 0.0007 (0.0049) model time 0.2108 (0.2658) loss 3.1784 (3.6638) grad_norm 1.1185 (1.2796) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:33:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][320/625] eta 0:01:21 lr 0.001983 wd 0.0500 time 0.2127 (0.2678) data time 0.0010 (0.0047) model time 0.2116 (0.2631) loss 3.6159 (3.6487) grad_norm 1.0731 (1.2723) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:33:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][330/625] eta 0:01:18 lr 0.001983 wd 0.0500 time 0.2047 (0.2651) data time 0.0007 (0.0045) model time 0.2039 (0.2606) loss 3.4378 (3.6424) grad_norm 0.9415 (1.2572) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:33:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][340/625] eta 0:01:14 lr 0.001983 wd 0.0500 time 0.2200 (0.2629) data time 0.0010 (0.0044) model time 0.2191 (0.2585) loss 3.6836 (3.6351) grad_norm 1.2660 (1.2459) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:33:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][350/625] eta 0:01:11 lr 0.001983 wd 0.0500 time 0.2071 (0.2606) data time 0.0010 (0.0043) model time 0.2062 (0.2564) loss 4.1939 (3.6366) grad_norm 0.9964 (1.2441) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][360/625] eta 0:01:08 lr 0.001983 wd 0.0500 time 0.2071 (0.2589) data time 0.0011 (0.0041) model time 0.2060 (0.2547) loss 3.9986 (3.6270) grad_norm 1.0855 (1.2369) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][370/625] eta 0:01:05 lr 0.001983 wd 0.0500 time 0.2075 (0.2570) data time 0.0009 (0.0040) model time 0.2066 (0.2530) loss 2.7802 (3.6173) grad_norm 1.7483 (1.2428) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][380/625] eta 0:01:02 lr 0.001983 wd 0.0500 time 0.2199 (0.2552) data time 0.0009 (0.0039) model time 0.2190 (0.2513) loss 2.8682 (3.6080) grad_norm 1.3269 (1.2466) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][390/625] eta 0:00:59 lr 0.001983 wd 0.0500 time 0.2103 (0.2536) data time 0.0007 (0.0038) model time 0.2096 (0.2498) loss 4.8285 (3.6020) grad_norm 1.0317 (1.2374) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][400/625] eta 0:00:56 lr 0.001983 wd 0.0500 time 0.2054 (0.2521) data time 0.0009 (0.0037) model time 0.2045 (0.2484) loss 4.1843 (3.6131) grad_norm 1.2026 (1.2322) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][410/625] eta 0:00:53 lr 0.001983 wd 0.0500 time 0.2091 (0.2508) data time 0.0010 (0.0036) model time 0.2081 (0.2472) loss 2.8657 (3.6078) grad_norm 1.0172 (1.2244) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][420/625] eta 0:00:51 lr 0.001983 wd 0.0500 time 0.2091 (0.2494) data time 0.0008 (0.0035) model time 0.2083 (0.2459) loss 3.2253 (3.5950) grad_norm 1.4745 (1.2182) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][430/625] eta 0:00:48 lr 0.001983 wd 0.0500 time 0.2030 (0.2482) data time 0.0010 (0.0034) model time 0.2020 (0.2447) loss 3.5655 (3.5927) grad_norm 1.6444 (1.2297) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][440/625] eta 0:00:45 lr 0.001983 wd 0.0500 time 0.2073 (0.2470) data time 0.0010 (0.0034) model time 0.2063 (0.2437) loss 3.5131 (3.6042) grad_norm 0.8208 (1.2284) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][450/625] eta 0:00:43 lr 0.001983 wd 0.0500 time 0.2086 (0.2459) data time 0.0008 (0.0033) model time 0.2079 (0.2426) loss 3.4938 (3.6097) grad_norm 0.9101 (1.2247) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][460/625] eta 0:00:40 lr 0.001983 wd 0.0500 time 0.2094 (0.2449) data time 0.0010 (0.0032) model time 0.2084 (0.2417) loss 3.5764 (3.6087) grad_norm 1.4612 (1.2222) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][470/625] eta 0:00:37 lr 0.001983 wd 0.0500 time 0.2115 (0.2440) data time 0.0011 (0.0032) model time 0.2104 (0.2409) loss 3.0318 (3.6135) grad_norm 1.2376 (1.2296) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][480/625] eta 0:00:35 lr 0.001983 wd 0.0500 time 0.2149 (0.2432) data time 0.0008 (0.0031) model time 0.2141 (0.2401) loss 4.3624 (3.6174) grad_norm 1.1746 (1.2303) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][490/625] eta 0:00:32 lr 0.001982 wd 0.0500 time 0.2112 (0.2424) data time 0.0011 (0.0031) model time 0.2101 (0.2393) loss 4.0667 (3.6143) grad_norm 0.9385 (1.2256) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][500/625] eta 0:00:30 lr 0.001982 wd 0.0500 time 0.2104 (0.2420) data time 0.0007 (0.0030) model time 0.2097 (0.2390) loss 3.6126 (3.6112) grad_norm 1.5356 (1.2283) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][510/625] eta 0:00:27 lr 0.001982 wd 0.0500 time 0.2146 (0.2417) data time 0.0010 (0.0030) model time 0.2136 (0.2388) loss 2.6852 (3.6036) grad_norm 0.8908 (1.2312) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][520/625] eta 0:00:25 lr 0.001982 wd 0.0500 time 0.2064 (0.2409) data time 0.0010 (0.0029) model time 0.2054 (0.2380) loss 4.1517 (3.6099) grad_norm 1.1341 (1.2284) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][530/625] eta 0:00:22 lr 0.001982 wd 0.0500 time 0.2120 (0.2402) data time 0.0007 (0.0029) model time 0.2114 (0.2374) loss 3.7198 (3.6165) grad_norm 1.2507 (1.2294) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][540/625] eta 0:00:20 lr 0.001982 wd 0.0500 time 0.2060 (0.2402) data time 0.0007 (0.0028) model time 0.2053 (0.2374) loss 3.8078 (3.6143) grad_norm 1.2009 (1.2306) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][550/625] eta 0:00:17 lr 0.001982 wd 0.0500 time 0.2252 (0.2396) data time 0.0010 (0.0028) model time 0.2242 (0.2368) loss 3.4165 (3.6217) grad_norm 1.0920 (1.2269) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][560/625] eta 0:00:15 lr 0.001982 wd 0.0500 time 0.2154 (0.2389) data time 0.0010 (0.0027) model time 0.2144 (0.2362) loss 3.2143 (3.6246) grad_norm 1.6424 (1.2253) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][570/625] eta 0:00:13 lr 0.001982 wd 0.0500 time 0.2169 (0.2383) data time 0.0008 (0.0027) model time 0.2161 (0.2356) loss 3.6841 (3.6250) grad_norm 1.1676 (1.2210) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][580/625] eta 0:00:10 lr 0.001982 wd 0.0500 time 0.2095 (0.2377) data time 0.0007 (0.0027) model time 0.2088 (0.2351) loss 3.7355 (3.6150) grad_norm 1.0607 (1.2190) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][590/625] eta 0:00:08 lr 0.001982 wd 0.0500 time 0.2109 (0.2371) data time 0.0010 (0.0026) model time 0.2099 (0.2345) loss 3.0853 (3.6080) grad_norm 1.0832 (1.2165) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][600/625] eta 0:00:05 lr 0.001982 wd 0.0500 time 0.2044 (0.2365) data time 0.0010 (0.0026) model time 0.2035 (0.2339) loss 2.8942 (3.6050) grad_norm 1.2288 (1.2170) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:34:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 15:34:54 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 15:34:56 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 15:36:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 15:36:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 15:37:05 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 15:37:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 15:37:21 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 15:37:21 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 15:37:21 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 15:37:21 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 36)
[2024-07-29 15:37:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 15:37:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][610/625] eta 0:00:48 lr 0.001982 wd 0.0500 time 0.2029 (3.2347) data time 0.0005 (0.4189) model time 0.2024 (2.8157) loss 3.6516 (4.2788) grad_norm 1.1759 (1.2136) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:37:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][620/625] eta 0:00:04 lr 0.001982 wd 0.0500 time 0.1924 (0.8978) data time 0.0007 (0.0972) model time 0.1917 (0.8006) loss 3.7384 (3.9121) grad_norm 1.1670 (1.1755) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 15:37:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 36 training takes 0:00:12
[2024-07-29 15:37:38 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 15:37:39 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 15:37:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.525 (0.525) Loss 0.8354 (0.8354) Acc@1 82.959 (82.959) Acc@5 96.973 (96.973) Mem 8977MB
[2024-07-29 15:37:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.103) Loss 1.4863 (1.0325) Acc@1 69.434 (78.622) Acc@5 89.697 (95.131) Mem 8977MB
[2024-07-29 15:37:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.053 (0.079) Loss 1.5420 (1.2470) Acc@1 66.943 (74.130) Acc@5 88.818 (92.380) Mem 8977MB
[2024-07-29 15:37:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 73.854 Acc@5 92.302
[2024-07-29 15:37:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 73.9%
[2024-07-29 15:37:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 73.85%
[2024-07-29 15:37:43 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 15:37:43 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 15:37:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.616 (0.616) Loss 2.0332 (2.0332) Acc@1 57.715 (57.715) Acc@5 80.957 (80.957) Mem 8977MB [2024-07-29 15:37:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.115) Loss 2.5840 (2.2972) Acc@1 44.580 (50.102) Acc@5 70.898 (76.017) Mem 8977MB [2024-07-29 15:37:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.053 (0.085) Loss 2.9121 (2.4980) Acc@1 39.453 (47.270) Acc@5 65.967 (72.768) Mem 8977MB [2024-07-29 15:37:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 47.063 Acc@5 72.753 [2024-07-29 15:37:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 47.1% [2024-07-29 15:37:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 47.06% [2024-07-29 15:37:45 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 15:37:46 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-29 15:37:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][0/625] eta 0:14:08 lr 0.001982 wd 0.0500 time 1.3578 (1.3578) data time 0.8003 (0.8003) model time 0.0000 (0.0000) loss 4.0939 (4.0939) grad_norm 0.9247 (0.9247) loss_scale 16384.0000 (16384.0000) mem 8971MB [2024-07-29 15:37:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][10/625] eta 0:03:08 lr 0.001982 wd 0.0500 time 0.2029 (0.3058) data time 0.0009 (0.0737) model time 0.0000 (0.0000) loss 3.7235 (3.9305) grad_norm 0.7348 (0.9924) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:37:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][20/625] eta 0:02:34 lr 0.001982 wd 0.0500 time 0.1994 (0.2554) data time 0.0012 (0.0391) model time 0.0000 (0.0000) loss 3.7453 (3.7934) grad_norm 1.7670 (1.0611) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:37:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][30/625] eta 0:02:21 lr 0.001982 wd 0.0500 time 0.1946 (0.2374) data time 0.0010 (0.0268) model time 0.0000 (0.0000) loss 3.9049 (3.7721) grad_norm 1.3449 (1.0863) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:37:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][40/625] eta 0:02:14 lr 0.001982 wd 0.0500 time 0.2022 (0.2293) data time 0.0007 (0.0205) model time 0.0000 (0.0000) loss 3.0728 (3.7505) grad_norm 1.1974 (1.1550) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:37:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][50/625] eta 0:02:08 lr 0.001982 wd 0.0500 time 0.2125 (0.2241) data time 0.0008 (0.0167) model time 0.0000 (0.0000) loss 2.4004 (3.7148) grad_norm 1.3609 (1.1508) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:37:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][60/625] eta 0:02:04 lr 0.001982 wd 0.0500 time 0.2061 
(0.2208) data time 0.0009 (0.0141) model time 0.2052 (0.2030) loss 3.0275 (3.6943) grad_norm 1.1399 (1.1728) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][70/625] eta 0:02:01 lr 0.001982 wd 0.0500 time 0.2092 (0.2183) data time 0.0009 (0.0123) model time 0.2083 (0.2026) loss 3.6524 (3.6526) grad_norm 1.0446 (1.1491) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][80/625] eta 0:01:57 lr 0.001982 wd 0.0500 time 0.2070 (0.2160) data time 0.0009 (0.0109) model time 0.2060 (0.2014) loss 4.0107 (3.6700) grad_norm 1.2618 (1.1897) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][90/625] eta 0:01:54 lr 0.001982 wd 0.0500 time 0.2114 (0.2146) data time 0.0007 (0.0098) model time 0.2108 (0.2015) loss 3.0966 (3.6867) grad_norm 1.1178 (1.1742) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][100/625] eta 0:01:51 lr 0.001982 wd 0.0500 time 0.2021 (0.2132) data time 0.0014 (0.0090) model time 0.2007 (0.2011) loss 3.3305 (3.6798) grad_norm 0.9337 (1.1952) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][110/625] eta 0:01:49 lr 0.001982 wd 0.0500 time 0.2139 (0.2124) data time 0.0009 (0.0082) model time 0.2130 (0.2014) loss 3.8287 (3.6658) grad_norm 1.0826 (1.2271) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][120/625] eta 0:01:46 lr 0.001982 wd 0.0500 time 0.2089 (0.2115) data time 0.0009 (0.0077) model time 0.2080 (0.2012) loss 3.6759 (3.6618) grad_norm 1.0513 (1.2202) loss_scale 16384.0000 (16384.0000) mem 8975MB 
[2024-07-29 15:38:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][130/625] eta 0:01:44 lr 0.001982 wd 0.0500 time 0.2009 (0.2106) data time 0.0007 (0.0072) model time 0.2002 (0.2009) loss 3.9455 (3.6618) grad_norm 1.1801 (1.2322) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][140/625] eta 0:01:41 lr 0.001982 wd 0.0500 time 0.2003 (0.2100) data time 0.0007 (0.0067) model time 0.1996 (0.2009) loss 3.0529 (3.6522) grad_norm 1.8509 (1.2371) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][150/625] eta 0:01:39 lr 0.001982 wd 0.0500 time 0.1962 (0.2095) data time 0.0011 (0.0063) model time 0.1951 (0.2010) loss 4.1997 (3.6577) grad_norm 1.0972 (1.2256) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][160/625] eta 0:01:37 lr 0.001982 wd 0.0500 time 0.2008 (0.2090) data time 0.0007 (0.0060) model time 0.2001 (0.2010) loss 3.1993 (3.6450) grad_norm 0.7802 (1.2139) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][170/625] eta 0:01:34 lr 0.001981 wd 0.0500 time 0.1970 (0.2086) data time 0.0008 (0.0057) model time 0.1962 (0.2010) loss 3.5036 (3.6404) grad_norm 1.4750 (1.2125) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][180/625] eta 0:01:32 lr 0.001981 wd 0.0500 time 0.2124 (0.2082) data time 0.0014 (0.0055) model time 0.2110 (0.2010) loss 3.0126 (3.6290) grad_norm 1.4752 (1.2235) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][190/625] eta 0:01:30 lr 0.001981 wd 0.0500 time 
0.2038 (0.2078) data time 0.0008 (0.0052) model time 0.2029 (0.2008) loss 3.7486 (3.6178) grad_norm 0.9392 (1.2260) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][200/625] eta 0:01:28 lr 0.001981 wd 0.0500 time 0.1939 (0.2075) data time 0.0010 (0.0050) model time 0.1929 (0.2008) loss 3.1256 (3.6077) grad_norm 1.3290 (1.2407) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][210/625] eta 0:01:26 lr 0.001981 wd 0.0500 time 0.2112 (0.2073) data time 0.0011 (0.0048) model time 0.2101 (0.2009) loss 3.9024 (3.6107) grad_norm 1.7940 (1.2559) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][220/625] eta 0:01:23 lr 0.001981 wd 0.0500 time 0.1965 (0.2072) data time 0.0008 (0.0046) model time 0.1957 (0.2012) loss 4.0846 (3.6117) grad_norm 1.0048 (1.2623) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][230/625] eta 0:01:21 lr 0.001981 wd 0.0500 time 0.2053 (0.2071) data time 0.0009 (0.0045) model time 0.2043 (0.2013) loss 2.7006 (3.6003) grad_norm 0.7657 (1.2607) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][240/625] eta 0:01:19 lr 0.001981 wd 0.0500 time 0.1974 (0.2070) data time 0.0010 (0.0043) model time 0.1964 (0.2014) loss 3.8631 (3.5896) grad_norm 0.9701 (1.2526) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][250/625] eta 0:01:17 lr 0.001981 wd 0.0500 time 0.2007 (0.2067) data time 0.0007 (0.0042) model time 0.2000 (0.2013) loss 3.8442 (3.5845) grad_norm 0.7375 (1.2561) loss_scale 16384.0000 (16384.0000) 
mem 8975MB [2024-07-29 15:38:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][260/625] eta 0:01:15 lr 0.001981 wd 0.0500 time 0.1999 (0.2066) data time 0.0010 (0.0041) model time 0.1989 (0.2014) loss 2.5302 (3.5815) grad_norm 1.5824 (1.2687) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][270/625] eta 0:01:13 lr 0.001981 wd 0.0500 time 0.1921 (0.2074) data time 0.0009 (0.0040) model time 0.1912 (0.2025) loss 4.5908 (3.5861) grad_norm 0.9343 (1.2730) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][280/625] eta 0:01:11 lr 0.001981 wd 0.0500 time 0.2085 (0.2072) data time 0.0009 (0.0039) model time 0.2076 (0.2025) loss 3.2928 (3.5707) grad_norm 1.0036 (1.2724) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][290/625] eta 0:01:09 lr 0.001981 wd 0.0500 time 0.2021 (0.2071) data time 0.0008 (0.0038) model time 0.2014 (0.2025) loss 2.9909 (3.5704) grad_norm 0.8626 (1.2752) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][300/625] eta 0:01:07 lr 0.001981 wd 0.0500 time 0.1966 (0.2071) data time 0.0013 (0.0037) model time 0.1953 (0.2026) loss 3.2447 (3.5793) grad_norm 1.2895 (1.2756) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][310/625] eta 0:01:05 lr 0.001981 wd 0.0500 time 0.1982 (0.2070) data time 0.0008 (0.0036) model time 0.1974 (0.2026) loss 3.7617 (3.5891) grad_norm 1.1339 (1.2727) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][320/625] eta 0:01:03 lr 0.001981 wd 0.0500 
time 0.1992 (0.2068) data time 0.0012 (0.0035) model time 0.1980 (0.2025) loss 4.1304 (3.5882) grad_norm 1.9756 (1.2790) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][330/625] eta 0:01:00 lr 0.001981 wd 0.0500 time 0.2039 (0.2067) data time 0.0009 (0.0035) model time 0.2031 (0.2025) loss 3.2638 (3.5846) grad_norm 0.8203 (1.2768) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][340/625] eta 0:00:58 lr 0.001981 wd 0.0500 time 0.1974 (0.2065) data time 0.0013 (0.0034) model time 0.1961 (0.2023) loss 3.5767 (3.5884) grad_norm 1.1039 (1.2700) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:38:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][350/625] eta 0:00:56 lr 0.001981 wd 0.0500 time 0.2043 (0.2065) data time 0.0007 (0.0033) model time 0.2036 (0.2024) loss 2.6180 (3.5841) grad_norm 1.8858 (1.2657) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][360/625] eta 0:00:54 lr 0.001981 wd 0.0500 time 0.1970 (0.2065) data time 0.0010 (0.0033) model time 0.1960 (0.2025) loss 3.7090 (3.5836) grad_norm 1.4222 (1.2660) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][370/625] eta 0:00:52 lr 0.001981 wd 0.0500 time 0.1940 (0.2076) data time 0.0007 (0.0032) model time 0.1933 (0.2039) loss 3.4136 (3.5783) grad_norm 0.8662 (1.2618) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][380/625] eta 0:00:50 lr 0.001981 wd 0.0500 time 0.2060 (0.2075) data time 0.0009 (0.0031) model time 0.2051 (0.2038) loss 3.6270 (3.5796) grad_norm 1.4578 (1.2600) loss_scale 16384.0000 
(16384.0000) mem 8975MB [2024-07-29 15:39:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][390/625] eta 0:00:48 lr 0.001981 wd 0.0500 time 0.1977 (0.2075) data time 0.0013 (0.0031) model time 0.1964 (0.2039) loss 3.8025 (3.5870) grad_norm 0.8594 (1.2575) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][400/625] eta 0:00:46 lr 0.001981 wd 0.0500 time 0.2039 (0.2084) data time 0.0009 (0.0030) model time 0.2031 (0.2051) loss 4.3296 (3.5855) grad_norm 1.1001 (1.2537) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][410/625] eta 0:00:44 lr 0.001981 wd 0.0500 time 0.2079 (0.2083) data time 0.0007 (0.0030) model time 0.2072 (0.2050) loss 4.2046 (3.5893) grad_norm 1.4722 (1.2549) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][420/625] eta 0:00:42 lr 0.001981 wd 0.0500 time 0.2083 (0.2081) data time 0.0010 (0.0029) model time 0.2073 (0.2048) loss 3.8958 (3.5957) grad_norm 1.1036 (1.2500) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][430/625] eta 0:00:40 lr 0.001981 wd 0.0500 time 0.2015 (0.2080) data time 0.0012 (0.0029) model time 0.2004 (0.2048) loss 3.2644 (3.5950) grad_norm 1.1002 (1.2465) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][440/625] eta 0:00:38 lr 0.001981 wd 0.0500 time 0.1973 (0.2079) data time 0.0009 (0.0029) model time 0.1964 (0.2047) loss 3.1969 (3.5895) grad_norm 1.2627 (1.2451) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][450/625] eta 0:00:36 lr 
0.001980 wd 0.0500 time 0.2069 (0.2078) data time 0.0009 (0.0028) model time 0.2060 (0.2047) loss 3.2234 (3.5829) grad_norm 1.0007 (1.2422) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][460/625] eta 0:00:34 lr 0.001980 wd 0.0500 time 0.2059 (0.2078) data time 0.0012 (0.0028) model time 0.2048 (0.2047) loss 2.5621 (3.5772) grad_norm 1.1208 (1.2432) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][470/625] eta 0:00:32 lr 0.001980 wd 0.0500 time 0.2049 (0.2076) data time 0.0009 (0.0027) model time 0.2040 (0.2046) loss 3.8496 (3.5842) grad_norm 1.9811 (1.2417) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][480/625] eta 0:00:30 lr 0.001980 wd 0.0500 time 0.2087 (0.2076) data time 0.0011 (0.0027) model time 0.2075 (0.2046) loss 4.0232 (3.5810) grad_norm 0.9040 (1.2385) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][490/625] eta 0:00:28 lr 0.001980 wd 0.0500 time 0.1948 (0.2075) data time 0.0008 (0.0027) model time 0.1940 (0.2046) loss 4.8774 (3.5843) grad_norm 1.7599 (1.2498) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][500/625] eta 0:00:25 lr 0.001980 wd 0.0500 time 0.2016 (0.2075) data time 0.0010 (0.0026) model time 0.2007 (0.2045) loss 3.9905 (3.5906) grad_norm 1.0380 (1.2469) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][510/625] eta 0:00:23 lr 0.001980 wd 0.0500 time 0.2039 (0.2075) data time 0.0008 (0.0026) model time 0.2031 (0.2046) loss 2.4495 (3.5833) grad_norm 1.0546 (1.2529) loss_scale 
16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][520/625] eta 0:00:21 lr 0.001980 wd 0.0500 time 0.1997 (0.2078) data time 0.0009 (0.0026) model time 0.1988 (0.2049) loss 4.3810 (3.5858) grad_norm 1.1994 (1.2486) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][530/625] eta 0:00:19 lr 0.001980 wd 0.0500 time 0.1943 (0.2077) data time 0.0009 (0.0026) model time 0.1933 (0.2049) loss 3.7988 (3.5882) grad_norm 1.0419 (1.2443) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][540/625] eta 0:00:17 lr 0.001980 wd 0.0500 time 0.2000 (0.2076) data time 0.0010 (0.0026) model time 0.1989 (0.2048) loss 3.9268 (3.5921) grad_norm 1.1254 (1.2412) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][550/625] eta 0:00:15 lr 0.001980 wd 0.0500 time 0.1973 (0.2076) data time 0.0008 (0.0025) model time 0.1965 (0.2049) loss 4.1107 (3.5975) grad_norm 1.1336 (1.2413) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][560/625] eta 0:00:13 lr 0.001980 wd 0.0500 time 0.2019 (0.2076) data time 0.0009 (0.0025) model time 0.2010 (0.2049) loss 3.9321 (3.5991) grad_norm 1.2949 (1.2410) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][570/625] eta 0:00:11 lr 0.001980 wd 0.0500 time 0.1958 (0.2075) data time 0.0010 (0.0025) model time 0.1948 (0.2048) loss 3.8135 (3.6016) grad_norm 1.0726 (1.2381) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][580/625] eta 
0:00:09 lr 0.001980 wd 0.0500 time 0.2089 (0.2075) data time 0.0018 (0.0025) model time 0.2071 (0.2048) loss 3.3466 (3.6024) grad_norm 1.6036 (1.2463) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][590/625] eta 0:00:07 lr 0.001980 wd 0.0500 time 0.2103 (0.2074) data time 0.0010 (0.0024) model time 0.2093 (0.2047) loss 3.3192 (3.6002) grad_norm 1.1178 (1.2486) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][600/625] eta 0:00:05 lr 0.001980 wd 0.0500 time 0.1969 (0.2073) data time 0.0010 (0.0024) model time 0.1960 (0.2047) loss 3.0761 (3.6016) grad_norm 0.9625 (1.2510) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][610/625] eta 0:00:03 lr 0.001980 wd 0.0500 time 0.1945 (0.2072) data time 0.0005 (0.0024) model time 0.1940 (0.2046) loss 3.4291 (3.6042) grad_norm 0.9232 (1.2559) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][620/625] eta 0:00:01 lr 0.001980 wd 0.0500 time 0.2354 (0.2072) data time 0.0007 (0.0024) model time 0.2347 (0.2046) loss 4.0485 (3.6077) grad_norm 1.0605 (1.2523) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:39:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 37 training takes 0:02:09 [2024-07-29 15:39:55 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 15:39:56 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 15:39:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.592 (0.592) Loss 0.8442 (0.8442) Acc@1 86.035 (86.035) Acc@5 96.973 (96.973) Mem 8975MB [2024-07-29 15:39:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.057 (0.108) Loss 1.3916 (1.0696) Acc@1 69.385 (78.689) Acc@5 89.990 (94.953) Mem 8975MB [2024-07-29 15:39:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.053 (0.083) Loss 1.5625 (1.2542) Acc@1 65.332 (74.007) Acc@5 87.500 (92.320) Mem 8975MB [2024-07-29 15:39:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 73.698 Acc@5 92.159 [2024-07-29 15:39:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 73.7% [2024-07-29 15:39:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.885 (0.885) Loss 1.8301 (1.8301) Acc@1 60.205 (60.205) Acc@5 83.057 (83.057) Mem 8975MB [2024-07-29 15:40:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.151) Loss 2.3828 (2.0985) Acc@1 48.145 (53.347) Acc@5 74.365 (78.986) Mem 8975MB [2024-07-29 15:40:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.053 (0.105) Loss 2.7148 (2.3005) Acc@1 42.139 (50.353) Acc@5 69.287 (75.739) Mem 8975MB [2024-07-29 15:40:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 50.124 Acc@5 75.702 [2024-07-29 15:40:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 50.1% [2024-07-29 15:40:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 50.12% [2024-07-29 15:40:00 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-07-29 15:40:01 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-07-29 15:40:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][0/625] eta 0:08:15 lr 0.001980 wd 0.0500 time 0.7923 (0.7923) data time 0.5634 (0.5634) model time 0.0000 (0.0000) loss 4.2311 (4.2311) grad_norm 0.9654 (0.9654) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:40:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][10/625] eta 0:02:38 lr 0.001980 wd 0.0500 time 0.2036 (0.2580) data time 0.0007 (0.0521) model time 0.0000 (0.0000) loss 4.0891 (3.5719) grad_norm 0.9234 (1.1923) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:40:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][20/625] eta 0:02:19 lr 0.001980 wd 0.0500 time 0.2027 (0.2303) data time 0.0009 (0.0277) model time 0.0000 (0.0000) loss 3.3295 (3.4230) grad_norm 1.5540 (1.2040) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:40:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][30/625] eta 0:02:11 lr 0.001980 wd 0.0500 time 0.2059 (0.2215) data time 0.0007 (0.0191) model time 0.0000 (0.0000) loss 4.2960 (3.5707) grad_norm 2.5148 (1.3921) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:40:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][40/625] eta 0:02:07 lr 0.001980 wd 0.0500 time 0.2123 (0.2179) data time 0.0009 (0.0147) model time 0.0000 (0.0000) loss 3.7550 (3.5856) grad_norm 1.2620 (1.4272) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:40:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][50/625] eta 0:02:03 lr 0.001980 wd 0.0500 time 0.2190 (0.2151) data time 0.0009 (0.0120) model time 0.0000 (0.0000) loss 3.9823 (3.5601) grad_norm 1.3184 (1.4322) loss_scale 16384.0000 (16384.0000) mem 
8975MB [2024-07-29 15:40:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 15:40:13 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 15:40:13 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 15:43:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 15:43:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 15:44:07 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 15:44:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 15:44:16 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... 
[2024-07-29 15:44:16 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 15:44:16 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 15:44:16 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 38) [2024-07-29 15:44:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 15:44:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][60/625] eta 0:14:12 lr 0.001980 wd 0.0500 time 0.1990 (1.5095) data time 0.0011 (0.1653) model time 0.1979 (1.3442) loss 3.8754 (4.0006) grad_norm 1.0600 (1.0221) loss_scale 16384.0000 (16384.0000) mem 8976MB [2024-07-29 15:44:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][70/625] eta 0:06:52 lr 0.001980 wd 0.0500 time 0.2004 (0.7424) data time 0.0010 (0.0701) model time 0.1994 (0.6723) loss 3.3385 (3.8214) grad_norm 1.9074 (1.0938) loss_scale 16384.0000 (16384.0000) mem 8976MB [2024-07-29 15:44:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][80/625] eta 0:04:55 lr 0.001980 wd 0.0500 time 0.2060 (0.5427) data time 0.0007 (0.0445) model time 0.2053 (0.4982) loss 4.3603 (3.8659) grad_norm 0.9234 (1.0797) loss_scale 16384.0000 (16384.0000) mem 8976MB [2024-07-29 15:44:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][90/625] eta 0:04:00 lr 0.001980 wd 0.0500 time 0.1931 (0.4504) data time 0.0011 (0.0328) model time 0.1920 (0.4177) loss 3.5309 (3.8418) grad_norm 1.0093 (1.0783) loss_scale 16384.0000 (16384.0000) mem 8976MB [2024-07-29 15:44:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][100/625] eta 0:03:28 lr 0.001980 wd 0.0500 time 0.2054 (0.3975) data time 0.0007 (0.0260) model time 0.2047 (0.3715) loss 4.0032 (3.8174) grad_norm 1.2760 (1.0904) loss_scale 16384.0000 (16384.0000) mem 8976MB 
[2024-07-29 15:44:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][110/625] eta 0:03:07 lr 0.001979 wd 0.0500 time 0.2039 (0.3635) data time 0.0008 (0.0216) model time 0.2031 (0.3419) loss 3.6865 (3.7863) grad_norm 0.8424 (1.1027) loss_scale 16384.0000 (16384.0000) mem 8976MB [2024-07-29 15:44:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 15:44:41 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 15:44:44 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 15:46:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 15:46:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 15:46:25 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 15:46:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 15:46:41 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... 
[2024-07-29 15:46:41 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 15:46:41 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 15:46:41 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 38) [2024-07-29 15:46:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 15:46:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][120/625] eta 0:07:41 lr 0.001979 wd 0.0500 time 0.2042 (0.9145) data time 0.0009 (0.0930) model time 0.2033 (0.8215) loss 4.1729 (4.1867) grad_norm 0.9258 (1.1433) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:46:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][130/625] eta 0:04:35 lr 0.001979 wd 0.0500 time 0.1978 (0.5574) data time 0.0007 (0.0469) model time 0.1971 (0.5105) loss 4.2691 (3.9714) grad_norm 0.9156 (1.3828) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:46:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][140/625] eta 0:03:32 lr 0.001979 wd 0.0500 time 0.1964 (0.4382) data time 0.0008 (0.0316) model time 0.1956 (0.4065) loss 4.1181 (3.9489) grad_norm 1.8198 (1.4448) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:47:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][150/625] eta 0:02:59 lr 0.001979 wd 0.0500 time 0.1967 (0.3787) data time 0.0007 (0.0239) model time 0.1960 (0.3548) loss 3.0897 (3.8468) grad_norm 1.2332 (1.3954) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:47:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][160/625] eta 0:02:39 lr 0.001979 wd 0.0500 time 0.1985 (0.3431) data time 0.0007 (0.0193) model time 0.1978 (0.3238) loss 4.0533 (3.8332) grad_norm 1.0076 (1.3948) loss_scale 16384.0000 (16384.0000) mem 8975MB 
[2024-07-29 15:47:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][170/625] eta 0:02:25 lr 0.001979 wd 0.0500 time 0.1976 (0.3194) data time 0.0008 (0.0163) model time 0.1968 (0.3031) loss 3.8406 (3.7948) grad_norm 1.3795 (1.3612) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:47:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][180/625] eta 0:02:14 lr 0.001979 wd 0.0500 time 0.2003 (0.3025) data time 0.0006 (0.0141) model time 0.1997 (0.2885) loss 2.7077 (3.7509) grad_norm 0.9913 (1.3552) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:47:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][190/625] eta 0:02:06 lr 0.001979 wd 0.0500 time 0.1978 (0.2900) data time 0.0009 (0.0124) model time 0.1969 (0.2775) loss 3.9149 (3.7326) grad_norm 1.2138 (1.3974) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:47:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][200/625] eta 0:01:58 lr 0.001979 wd 0.0500 time 0.2009 (0.2799) data time 0.0007 (0.0112) model time 0.2002 (0.2687) loss 4.3858 (3.7025) grad_norm 0.8387 (1.3760) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:47:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][210/625] eta 0:01:52 lr 0.001979 wd 0.0500 time 0.2009 (0.2719) data time 0.0008 (0.0101) model time 0.2000 (0.2618) loss 4.1065 (3.7055) grad_norm 1.2259 (1.3586) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:47:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][220/625] eta 0:01:47 lr 0.001979 wd 0.0500 time 0.1966 (0.2658) data time 0.0009 (0.0093) model time 0.1957 (0.2565) loss 3.7374 (3.7189) grad_norm 1.3352 (1.3443) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:47:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][230/625] eta 0:01:42 lr 0.001979 wd 0.0500 time 
0.1988 (0.2603) data time 0.0007 (0.0086) model time 0.1982 (0.2517) loss 4.1072 (3.7260) grad_norm 1.0379 (1.3249) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:47:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 15:47:17 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 15:47:19 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 15:49:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 15:49:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 15:49:23 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 15:49:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 15:49:38 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... 
[2024-07-29 15:49:38 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 15:49:38 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 15:49:38 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 38) [2024-07-29 15:49:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 15:49:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][240/625] eta 0:21:01 lr 0.001979 wd 0.0500 time 0.2026 (3.2760) data time 0.0008 (0.3138) model time 0.2018 (2.9621) loss 3.1761 (3.9766) grad_norm 0.9319 (1.5056) loss_scale 16384.0000 (16384.0000) mem 8976MB [2024-07-29 15:49:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][250/625] eta 0:05:41 lr 0.001979 wd 0.0500 time 0.2139 (0.9114) data time 0.0009 (0.0732) model time 0.2130 (0.8382) loss 3.8380 (3.9106) grad_norm 1.1216 (1.4587) loss_scale 16384.0000 (16384.0000) mem 8976MB [2024-07-29 15:49:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][260/625] eta 0:03:42 lr 0.001979 wd 0.0500 time 0.2069 (0.6102) data time 0.0010 (0.0418) model time 0.2059 (0.5684) loss 4.0229 (3.8417) grad_norm 1.0923 (1.3830) loss_scale 16384.0000 (16384.0000) mem 8976MB [2024-07-29 15:49:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][270/625] eta 0:02:54 lr 0.001979 wd 0.0500 time 0.2054 (0.4908) data time 0.0011 (0.0295) model time 0.2042 (0.4614) loss 4.3367 (3.8313) grad_norm 0.9695 (1.3151) loss_scale 16384.0000 (16384.0000) mem 8976MB [2024-07-29 15:50:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][280/625] eta 0:02:26 lr 0.001979 wd 0.0500 time 0.1940 (0.4233) data time 0.0009 (0.0229) model time 0.1930 (0.4005) loss 3.4844 (3.7677) grad_norm 0.9141 (1.2591) loss_scale 16384.0000 (16384.0000) mem 8976MB 
[2024-07-29 15:50:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][290/625] eta 0:02:08 lr 0.001979 wd 0.0500 time 0.2506 (0.3845) data time 0.0009 (0.0187) model time 0.2497 (0.3658) loss 3.9748 (3.7423) grad_norm 2.0350 (1.2484) loss_scale 32768.0000 (18547.9245) mem 8976MB [2024-07-29 15:50:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][300/625] eta 0:01:55 lr 0.001979 wd 0.0500 time 0.1996 (0.3556) data time 0.0010 (0.0159) model time 0.1985 (0.3397) loss 3.1615 (3.7190) grad_norm 1.1239 (1.2905) loss_scale 32768.0000 (20805.0794) mem 8976MB [2024-07-29 15:50:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][310/625] eta 0:01:45 lr 0.001979 wd 0.0500 time 0.1992 (0.3348) data time 0.0009 (0.0139) model time 0.1984 (0.3209) loss 4.2691 (3.6935) grad_norm 1.5531 (1.3268) loss_scale 32768.0000 (22443.8356) mem 8976MB [2024-07-29 15:50:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][320/625] eta 0:01:38 lr 0.001979 wd 0.0500 time 0.1997 (0.3214) data time 0.0009 (0.0123) model time 0.1988 (0.3091) loss 2.6672 (3.6805) grad_norm 1.5328 (1.3326) loss_scale 32768.0000 (23687.7108) mem 8976MB [2024-07-29 15:50:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][330/625] eta 0:01:31 lr 0.001979 wd 0.0500 time 0.2214 (0.3088) data time 0.0009 (0.0111) model time 0.2206 (0.2977) loss 4.2522 (3.6726) grad_norm 1.0051 (1.3255) loss_scale 32768.0000 (24664.0860) mem 8976MB [2024-07-29 15:50:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][340/625] eta 0:01:25 lr 0.001979 wd 0.0500 time 0.2019 (0.2986) data time 0.0008 (0.0101) model time 0.2012 (0.2885) loss 4.5977 (3.7037) grad_norm 1.0930 (1.3125) loss_scale 32768.0000 (25450.8738) mem 8976MB [2024-07-29 15:50:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][350/625] eta 0:01:19 lr 0.001979 wd 0.0500 time 
0.1986 (0.2908) data time 0.0009 (0.0093) model time 0.1977 (0.2815) loss 3.3696 (3.6875) grad_norm 1.5784 (inf) loss_scale 16384.0000 (25808.4248) mem 8976MB [2024-07-29 15:50:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 15:50:18 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 15:50:20 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 15:56:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 15:56:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 15:57:03 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 15:57:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 15:57:20 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... 
[2024-07-29 15:57:21 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 15:57:21 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 15:57:21 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 38) [2024-07-29 15:57:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 15:57:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][360/625] eta 0:13:20 lr 0.001979 wd 0.0500 time 0.2072 (3.0204) data time 0.0007 (0.3589) model time 0.2066 (2.6615) loss 3.4479 (4.0501) grad_norm 0.8914 (0.8913) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:57:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][370/625] eta 0:03:39 lr 0.001979 wd 0.0500 time 0.2098 (0.8604) data time 0.0010 (0.0837) model time 0.2089 (0.7768) loss 3.5294 (3.8636) grad_norm 0.8794 (1.1098) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:57:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][380/625] eta 0:02:21 lr 0.001979 wd 0.0500 time 0.2059 (0.5772) data time 0.0008 (0.0477) model time 0.2050 (0.5295) loss 3.9127 (3.8966) grad_norm 1.1525 (1.1005) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:57:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][390/625] eta 0:01:49 lr 0.001978 wd 0.0500 time 0.2114 (0.4657) data time 0.0009 (0.0336) model time 0.2105 (0.4321) loss 4.2322 (3.8774) grad_norm 1.4662 (1.1297) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:57:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][400/625] eta 0:01:31 lr 0.001978 wd 0.0500 time 0.2055 (0.4052) data time 0.0014 (0.0260) model time 0.2041 (0.3792) loss 3.4594 (3.7882) grad_norm 0.9751 (1.1226) loss_scale 16384.0000 (16384.0000) mem 8975MB 
[2024-07-29 15:57:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][410/625] eta 0:01:19 lr 0.001978 wd 0.0500 time 0.2106 (0.3680) data time 0.0009 (0.0213) model time 0.2097 (0.3466) loss 3.6897 (3.7480) grad_norm 2.1042 (1.1791) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:57:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][420/625] eta 0:01:10 lr 0.001978 wd 0.0500 time 0.2074 (0.3431) data time 0.0010 (0.0181) model time 0.2064 (0.3250) loss 3.4983 (3.7170) grad_norm 1.0530 (1.2328) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:57:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][430/625] eta 0:01:03 lr 0.001978 wd 0.0500 time 0.2149 (0.3254) data time 0.0009 (0.0158) model time 0.2139 (0.3096) loss 3.6401 (3.6706) grad_norm 0.9102 (1.2561) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:57:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][440/625] eta 0:00:57 lr 0.001978 wd 0.0500 time 0.2122 (0.3114) data time 0.0007 (0.0140) model time 0.2114 (0.2974) loss 2.3863 (3.6309) grad_norm 1.1742 (1.2398) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:57:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][450/625] eta 0:00:52 lr 0.001978 wd 0.0500 time 0.2072 (0.3006) data time 0.0008 (0.0126) model time 0.2064 (0.2880) loss 4.4686 (3.6306) grad_norm 1.1322 (1.2297) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:57:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][460/625] eta 0:00:48 lr 0.001978 wd 0.0500 time 0.2072 (0.2918) data time 0.0008 (0.0115) model time 0.2064 (0.2803) loss 4.0580 (3.6531) grad_norm 1.2845 (1.2497) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:57:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][470/625] eta 0:00:44 lr 0.001978 wd 0.0500 time 
0.2086 (0.2844) data time 0.0007 (0.0106) model time 0.2079 (0.2738) loss 3.8446 (3.6438) grad_norm 0.9263 (1.2720) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:57:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][480/625] eta 0:00:40 lr 0.001978 wd 0.0500 time 0.2086 (0.2783) data time 0.0010 (0.0098) model time 0.2076 (0.2685) loss 3.5008 (3.6442) grad_norm 0.7263 (1.2710) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:58:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][490/625] eta 0:00:36 lr 0.001978 wd 0.0500 time 0.2046 (0.2732) data time 0.0012 (0.0092) model time 0.2034 (0.2640) loss 3.9548 (3.6343) grad_norm 0.9768 (1.2419) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:58:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][500/625] eta 0:00:33 lr 0.001978 wd 0.0500 time 0.2042 (0.2688) data time 0.0009 (0.0086) model time 0.2033 (0.2602) loss 4.1109 (3.6200) grad_norm 1.0185 (1.2270) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:58:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][510/625] eta 0:00:30 lr 0.001978 wd 0.0500 time 0.2117 (0.2651) data time 0.0007 (0.0081) model time 0.2110 (0.2570) loss 3.6169 (3.6157) grad_norm 0.9013 (1.2113) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:58:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][520/625] eta 0:00:27 lr 0.001978 wd 0.0500 time 0.2049 (0.2616) data time 0.0007 (0.0077) model time 0.2042 (0.2539) loss 2.6290 (3.6194) grad_norm 1.7008 (1.2022) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:58:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][530/625] eta 0:00:24 lr 0.001978 wd 0.0500 time 0.2038 (0.2585) data time 0.0011 (0.0073) model time 0.2027 (0.2512) loss 3.8231 (3.6210) grad_norm 1.1965 (1.2240) loss_scale 16384.0000 (16384.0000) 
mem 8975MB [2024-07-29 15:58:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][540/625] eta 0:00:21 lr 0.001978 wd 0.0500 time 0.2073 (0.2559) data time 0.0009 (0.0069) model time 0.2064 (0.2490) loss 4.0372 (3.6093) grad_norm 1.2851 (1.2324) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:58:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][550/625] eta 0:00:19 lr 0.001978 wd 0.0500 time 0.2129 (0.2537) data time 0.0008 (0.0066) model time 0.2122 (0.2470) loss 3.5055 (3.6059) grad_norm 1.2246 (1.2370) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:58:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][560/625] eta 0:00:16 lr 0.001978 wd 0.0500 time 0.2115 (0.2515) data time 0.0008 (0.0064) model time 0.2107 (0.2451) loss 2.7097 (3.5862) grad_norm 0.9275 (1.2338) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:58:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][570/625] eta 0:00:13 lr 0.001978 wd 0.0500 time 0.2123 (0.2496) data time 0.0011 (0.0061) model time 0.2113 (0.2435) loss 2.4477 (3.5786) grad_norm 1.4624 (1.2271) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:58:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][580/625] eta 0:00:11 lr 0.001978 wd 0.0500 time 0.2082 (0.2479) data time 0.0008 (0.0059) model time 0.2074 (0.2420) loss 3.0335 (3.5836) grad_norm 1.4593 (1.2270) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:58:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][590/625] eta 0:00:08 lr 0.001978 wd 0.0500 time 0.2084 (0.2464) data time 0.0009 (0.0057) model time 0.2074 (0.2407) loss 2.7849 (3.5800) grad_norm 1.1574 (1.2255) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:58:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][600/625] eta 0:00:06 lr 0.001978 wd 0.0500 
time 0.2082 (0.2450) data time 0.0010 (0.0055) model time 0.2072 (0.2395) loss 3.8936 (3.5851) grad_norm 1.2448 (1.2232) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:58:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][610/625] eta 0:00:03 lr 0.001978 wd 0.0500 time 0.2062 (0.2436) data time 0.0007 (0.0053) model time 0.2055 (0.2383) loss 3.6153 (3.5733) grad_norm 1.0875 (1.2361) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:58:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][620/625] eta 0:00:01 lr 0.001978 wd 0.0500 time 0.2056 (0.2422) data time 0.0005 (0.0052) model time 0.2051 (0.2370) loss 3.4587 (3.5614) grad_norm 0.8063 (1.2327) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 15:58:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 38 training takes 0:01:04 [2024-07-29 15:58:29 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 15:58:31 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 15:58:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.607 (0.607) Loss 0.8135 (0.8135) Acc@1 83.887 (83.887) Acc@5 97.070 (97.070) Mem 8975MB [2024-07-29 15:58:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.111) Loss 1.4062 (1.0351) Acc@1 69.580 (78.560) Acc@5 90.527 (95.157) Mem 8975MB [2024-07-29 15:58:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.084) Loss 1.5674 (1.2302) Acc@1 65.918 (74.049) Acc@5 87.939 (92.455) Mem 8975MB [2024-07-29 15:58:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 73.756 Acc@5 92.392 [2024-07-29 15:58:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 73.8% [2024-07-29 15:58:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.783 (0.783) Loss 1.6592 (1.6592) Acc@1 63.770 (63.770) Acc@5 84.717 (84.717) Mem 8975MB [2024-07-29 15:58:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.133) Loss 2.2168 (1.9299) Acc@1 51.123 (56.472) Acc@5 77.441 (81.419) Mem 8975MB [2024-07-29 15:58:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.096) Loss 2.5469 (2.1341) Acc@1 44.873 (53.251) Acc@5 71.484 (78.202) Mem 8975MB [2024-07-29 15:58:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 52.987 Acc@5 78.159 [2024-07-29 15:58:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 53.0% [2024-07-29 15:58:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 52.99% [2024-07-29 15:58:37 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-07-29 15:58:39 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-07-29 15:58:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][0/625] eta 0:07:25 lr 0.001978 wd 0.0500 time 0.7132 (0.7132) data time 0.4209 (0.4209) model time 0.0000 (0.0000) loss 3.7643 (3.7643) grad_norm 1.3342 (1.3342) loss_scale 16384.0000 (16384.0000) mem 8971MB [2024-07-29 15:58:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][10/625] eta 0:02:37 lr 0.001978 wd 0.0500 time 0.2078 (0.2556) data time 0.0010 (0.0392) model time 0.0000 (0.0000) loss 2.4865 (3.6172) grad_norm 1.0510 (1.0570) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:58:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][20/625] eta 0:02:21 lr 0.001978 wd 0.0500 time 0.2080 (0.2339) data time 0.0007 (0.0211) model time 0.0000 (0.0000) loss 4.4448 (3.6203) grad_norm 1.0370 (1.0998) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:58:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][30/625] eta 0:02:14 lr 0.001977 wd 0.0500 time 0.2087 (0.2259) data time 0.0011 (0.0147) model time 0.0000 (0.0000) loss 3.3810 (3.4686) grad_norm 1.0175 (1.0640) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:58:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][40/625] eta 0:02:14 lr 0.001977 wd 0.0500 time 0.2147 (0.2299) data time 0.0007 (0.0113) model time 0.0000 (0.0000) loss 2.9660 (3.4944) grad_norm 1.2629 (1.1275) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:58:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][50/625] eta 0:02:09 lr 0.001977 wd 0.0500 time 0.2080 (0.2259) data time 0.0012 (0.0093) model time 0.0000 (0.0000) loss 3.2387 (3.5815) grad_norm 1.3489 (1.1411) loss_scale 16384.0000 (16384.0000) mem 
8978MB [2024-07-29 15:58:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][60/625] eta 0:02:06 lr 0.001977 wd 0.0500 time 0.2130 (0.2233) data time 0.0008 (0.0080) model time 0.2122 (0.2091) loss 3.8279 (3.6203) grad_norm 2.5225 (1.2361) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:58:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][70/625] eta 0:02:03 lr 0.001977 wd 0.0500 time 0.2094 (0.2220) data time 0.0009 (0.0070) model time 0.2085 (0.2109) loss 4.2080 (3.6092) grad_norm 0.7620 (1.2476) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:58:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][80/625] eta 0:02:00 lr 0.001977 wd 0.0500 time 0.2092 (0.2204) data time 0.0008 (0.0063) model time 0.2084 (0.2101) loss 3.7820 (3.6204) grad_norm 0.9881 (1.2309) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:58:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][90/625] eta 0:01:57 lr 0.001977 wd 0.0500 time 0.2137 (0.2192) data time 0.0009 (0.0057) model time 0.2127 (0.2097) loss 3.7424 (3.6207) grad_norm 1.0943 (1.2293) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][100/625] eta 0:01:54 lr 0.001977 wd 0.0500 time 0.2232 (0.2187) data time 0.0008 (0.0052) model time 0.2223 (0.2104) loss 2.8672 (3.6076) grad_norm 0.8815 (1.2228) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][110/625] eta 0:01:52 lr 0.001977 wd 0.0500 time 0.2037 (0.2181) data time 0.0010 (0.0048) model time 0.2027 (0.2104) loss 3.6309 (3.6156) grad_norm 0.8320 (1.2253) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][120/625] eta 0:01:49 lr 0.001977 wd 0.0500 time 
0.2052 (0.2177) data time 0.0007 (0.0046) model time 0.2044 (0.2106) loss 3.1202 (3.5898) grad_norm 1.4411 (1.2197) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][130/625] eta 0:01:47 lr 0.001977 wd 0.0500 time 0.2154 (0.2172) data time 0.0007 (0.0043) model time 0.2147 (0.2105) loss 3.6136 (3.5996) grad_norm 0.9228 (1.2089) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][140/625] eta 0:01:46 lr 0.001977 wd 0.0500 time 0.2061 (0.2201) data time 0.0010 (0.0041) model time 0.2051 (0.2157) loss 3.7092 (3.6094) grad_norm 0.9960 (1.1933) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][150/625] eta 0:01:44 lr 0.001977 wd 0.0500 time 0.2005 (0.2197) data time 0.0008 (0.0039) model time 0.1997 (0.2155) loss 4.1695 (3.6052) grad_norm 1.2964 (1.2008) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][160/625] eta 0:01:42 lr 0.001977 wd 0.0500 time 0.2036 (0.2194) data time 0.0008 (0.0037) model time 0.2028 (0.2152) loss 4.2554 (3.6144) grad_norm 0.9408 (1.1984) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][170/625] eta 0:01:39 lr 0.001977 wd 0.0500 time 0.2238 (0.2189) data time 0.0009 (0.0036) model time 0.2229 (0.2148) loss 3.8672 (3.6271) grad_norm 1.0098 (1.1972) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][180/625] eta 0:01:37 lr 0.001977 wd 0.0500 time 0.2295 (0.2185) data time 0.0013 (0.0034) model time 0.2283 (0.2145) loss 3.0722 (3.6265) grad_norm 1.1587 (1.2096) loss_scale 16384.0000 (16384.0000) 
mem 8978MB [2024-07-29 15:59:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][190/625] eta 0:01:34 lr 0.001977 wd 0.0500 time 0.2109 (0.2182) data time 0.0007 (0.0033) model time 0.2102 (0.2143) loss 3.1104 (3.6135) grad_norm 1.0332 (1.2022) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][200/625] eta 0:01:32 lr 0.001977 wd 0.0500 time 0.2033 (0.2178) data time 0.0009 (0.0032) model time 0.2024 (0.2140) loss 3.2649 (3.5997) grad_norm 1.0334 (1.2014) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][210/625] eta 0:01:30 lr 0.001977 wd 0.0500 time 0.2106 (0.2176) data time 0.0009 (0.0031) model time 0.2097 (0.2139) loss 2.8631 (3.5881) grad_norm 1.9376 (1.2069) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][220/625] eta 0:01:28 lr 0.001977 wd 0.0500 time 0.2053 (0.2175) data time 0.0012 (0.0030) model time 0.2041 (0.2138) loss 3.4592 (3.5916) grad_norm 1.7843 (1.2159) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][230/625] eta 0:01:25 lr 0.001977 wd 0.0500 time 0.2137 (0.2172) data time 0.0007 (0.0029) model time 0.2130 (0.2137) loss 3.9462 (3.5882) grad_norm 1.2549 (1.2159) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][240/625] eta 0:01:23 lr 0.001977 wd 0.0500 time 0.2115 (0.2169) data time 0.0007 (0.0028) model time 0.2108 (0.2134) loss 4.4211 (3.5880) grad_norm 1.0542 (1.2066) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][250/625] eta 0:01:21 lr 0.001977 wd 0.0500 
time 0.2071 (0.2178) data time 0.0010 (0.0028) model time 0.2060 (0.2147) loss 3.9238 (3.6052) grad_norm 1.7470 (1.2208) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][260/625] eta 0:01:19 lr 0.001977 wd 0.0500 time 0.2048 (0.2175) data time 0.0007 (0.0027) model time 0.2041 (0.2143) loss 2.4190 (3.5867) grad_norm 0.9311 (1.2138) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][270/625] eta 0:01:17 lr 0.001977 wd 0.0500 time 0.2048 (0.2172) data time 0.0009 (0.0026) model time 0.2039 (0.2141) loss 4.3967 (3.5822) grad_norm 1.0960 (1.2195) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][280/625] eta 0:01:14 lr 0.001977 wd 0.0500 time 0.2033 (0.2169) data time 0.0007 (0.0026) model time 0.2026 (0.2138) loss 4.0636 (3.5843) grad_norm 2.1085 (1.2268) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][290/625] eta 0:01:12 lr 0.001976 wd 0.0500 time 0.2176 (0.2169) data time 0.0009 (0.0025) model time 0.2167 (0.2139) loss 3.7852 (3.5908) grad_norm 0.9845 (1.2319) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][300/625] eta 0:01:10 lr 0.001976 wd 0.0500 time 0.2124 (0.2168) data time 0.0007 (0.0025) model time 0.2117 (0.2139) loss 4.4312 (3.6000) grad_norm 1.4208 (1.2286) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][310/625] eta 0:01:08 lr 0.001976 wd 0.0500 time 0.2101 (0.2166) data time 0.0009 (0.0024) model time 0.2092 (0.2137) loss 4.2158 (3.6037) grad_norm 0.8485 (1.2266) loss_scale 16384.0000 
(16384.0000) mem 8978MB [2024-07-29 15:59:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][320/625] eta 0:01:06 lr 0.001976 wd 0.0500 time 0.2029 (0.2164) data time 0.0009 (0.0024) model time 0.2021 (0.2135) loss 4.2679 (3.6051) grad_norm 0.8121 (1.2226) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][330/625] eta 0:01:03 lr 0.001976 wd 0.0500 time 0.2151 (0.2163) data time 0.0009 (0.0023) model time 0.2143 (0.2134) loss 3.4983 (3.6073) grad_norm 0.9194 (1.2177) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][340/625] eta 0:01:01 lr 0.001976 wd 0.0500 time 0.2088 (0.2163) data time 0.0012 (0.0023) model time 0.2075 (0.2136) loss 3.2824 (3.6030) grad_norm 0.8413 (1.2101) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][350/625] eta 0:00:59 lr 0.001976 wd 0.0500 time 0.2143 (0.2162) data time 0.0010 (0.0023) model time 0.2134 (0.2135) loss 3.4560 (3.6041) grad_norm 1.1557 (1.2076) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][360/625] eta 0:00:57 lr 0.001976 wd 0.0500 time 0.2075 (0.2160) data time 0.0007 (0.0022) model time 0.2068 (0.2133) loss 3.2384 (3.6061) grad_norm 2.0895 (1.2140) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 15:59:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][370/625] eta 0:00:55 lr 0.001976 wd 0.0500 time 0.2080 (0.2158) data time 0.0010 (0.0022) model time 0.2070 (0.2132) loss 3.9908 (3.6107) grad_norm 0.9499 (1.2170) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 16:00:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][380/625] eta 0:00:52 lr 
0.001976 wd 0.0500 time 0.2050 (0.2156) data time 0.0009 (0.0022) model time 0.2041 (0.2130) loss 3.6877 (3.6046) grad_norm 1.0634 (1.2139) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][390/625] eta 0:00:50 lr 0.001976 wd 0.0500 time 0.2094 (0.2155) data time 0.0008 (0.0021) model time 0.2086 (0.2129) loss 2.9895 (3.6009) grad_norm 0.9224 (1.2154) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][400/625] eta 0:00:48 lr 0.001976 wd 0.0500 time 0.2125 (0.2154) data time 0.0008 (0.0021) model time 0.2117 (0.2128) loss 3.7728 (3.6005) grad_norm 1.9847 (1.2212) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][410/625] eta 0:00:46 lr 0.001976 wd 0.0500 time 0.2050 (0.2153) data time 0.0011 (0.0021) model time 0.2039 (0.2128) loss 3.6260 (3.6052) grad_norm 1.2111 (1.2219) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][420/625] eta 0:00:44 lr 0.001976 wd 0.0500 time 0.2094 (0.2152) data time 0.0007 (0.0021) model time 0.2086 (0.2127) loss 3.4177 (3.6062) grad_norm 1.2022 (1.2264) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][430/625] eta 0:00:41 lr 0.001976 wd 0.0500 time 0.2071 (0.2151) data time 0.0007 (0.0020) model time 0.2064 (0.2126) loss 4.1277 (3.5981) grad_norm 0.8267 (1.2245) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][440/625] eta 0:00:39 lr 0.001976 wd 0.0500 time 0.2093 (0.2150) data time 0.0009 (0.0020) model time 0.2084 (0.2125) loss 3.4998 (3.5962) grad_norm 1.4674 (1.2205) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][450/625] eta 0:00:37 lr 0.001976 wd 0.0500 time 0.2102 (0.2149) data time 0.0008 (0.0020) model time 0.2094 (0.2125) loss 4.2644 (3.5879) grad_norm 0.8479 (1.2178) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][460/625] eta 0:00:35 lr 0.001976 wd 0.0500 time 0.2088 (0.2148) data time 0.0010 (0.0020) model time 0.2078 (0.2124) loss 3.7964 (3.5836) grad_norm 1.8549 (1.2188) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][470/625] eta 0:00:33 lr 0.001976 wd 0.0500 time 0.2058 (0.2148) data time 0.0011 (0.0020) model time 0.2047 (0.2124) loss 4.2059 (3.5917) grad_norm 0.9156 (1.2155) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][480/625] eta 0:00:31 lr 0.001976 wd 0.0500 time 0.2135 (0.2147) data time 0.0007 (0.0019) model time 0.2128 (0.2123) loss 2.5012 (3.5890) grad_norm 0.9028 (1.2101) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][490/625] eta 0:00:28 lr 0.001976 wd 0.0500 time 0.2092 (0.2146) data time 0.0008 (0.0019) model time 0.2084 (0.2123) loss 4.1598 (3.5885) grad_norm 0.8853 (1.2040) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][500/625] eta 0:00:26 lr 0.001976 wd 0.0500 time 0.2113 (0.2146) data time 0.0009 (0.0019) model time 0.2104 (0.2122) loss 3.7050 (3.5902) grad_norm 1.4422 (1.2055) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][510/625] eta 0:00:24 lr 0.001976 wd 0.0500 time 0.2125 (0.2145) data time 0.0007 (0.0019) model time 0.2119 (0.2122) loss 4.5818 (3.5933) grad_norm 1.3092 (1.2086) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][520/625] eta 0:00:22 lr 0.001976 wd 0.0500 time 0.2074 (0.2144) data time 0.0009 (0.0019) model time 0.2066 (0.2121) loss 3.6680 (3.5910) grad_norm 1.1634 (1.2086) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][530/625] eta 0:00:20 lr 0.001976 wd 0.0500 time 0.2140 (0.2144) data time 0.0009 (0.0019) model time 0.2131 (0.2121) loss 3.8067 (3.5890) grad_norm 2.0934 (1.2089) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][540/625] eta 0:00:18 lr 0.001976 wd 0.0500 time 0.2071 (0.2144) data time 0.0011 (0.0019) model time 0.2060 (0.2121) loss 4.0795 (3.5858) grad_norm 1.3728 (1.2082) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][550/625] eta 0:00:16 lr 0.001975 wd 0.0500 time 0.2061 (0.2144) data time 0.0012 (0.0019) model time 0.2049 (0.2121) loss 4.1301 (3.5837) grad_norm 2.8429 (1.2114) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][560/625] eta 0:00:13 lr 0.001975 wd 0.0500 time 0.2093 (0.2144) data time 0.0013 (0.0018) model time 0.2080 (0.2121) loss 3.5366 (3.5758) grad_norm 1.3133 (1.2111) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][570/625] eta 0:00:11 lr 0.001975 wd 0.0500 time 0.2032 (0.2143) data time 0.0009 (0.0018) model time 0.2023 (0.2121) loss 4.5201 (3.5765) grad_norm 1.7888 (1.2089) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][580/625] eta 0:00:09 lr 0.001975 wd 0.0500 time 0.2171 (0.2143) data time 0.0009 (0.0018) model time 0.2161 (0.2121) loss 2.4474 (3.5746) grad_norm 1.0566 (1.2112) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][590/625] eta 0:00:07 lr 0.001975 wd 0.0500 time 0.2062 (0.2142) data time 0.0007 (0.0018) model time 0.2055 (0.2120) loss 2.6780 (3.5773) grad_norm 1.0847 (1.2138) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][600/625] eta 0:00:05 lr 0.001975 wd 0.0500 time 0.2244 (0.2146) data time 0.0011 (0.0018) model time 0.2233 (0.2124) loss 4.2777 (3.5761) grad_norm 1.9937 (1.2165) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][610/625] eta 0:00:03 lr 0.001975 wd 0.0500 time 0.2071 (0.2145) data time 0.0007 (0.0018) model time 0.2064 (0.2124) loss 3.6429 (3.5766) grad_norm 1.0125 (1.2175) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][620/625] eta 0:00:01 lr 0.001975 wd 0.0500 time 0.2069 (0.2144) data time 0.0007 (0.0018) model time 0.2062 (0.2123) loss 3.4734 (3.5745) grad_norm 1.2476 (1.2163) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:00:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 39 training takes 0:02:13
[2024-07-29 16:00:53 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 16:00:54 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 16:00:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.603 (0.603) Loss 0.8135 (0.8135) Acc@1 83.887 (83.887) Acc@5 96.875 (96.875) Mem 8978MB
[2024-07-29 16:00:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.112) Loss 1.3721 (1.0133) Acc@1 69.727 (78.338) Acc@5 90.283 (94.931) Mem 8978MB
[2024-07-29 16:00:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.085) Loss 1.4863 (1.2117) Acc@1 66.943 (73.721) Acc@5 88.428 (92.260) Mem 8978MB
[2024-07-29 16:00:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 73.490 Acc@5 92.204
[2024-07-29 16:00:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 73.5%
[2024-07-29 16:00:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.894 (0.894) Loss 1.5117 (1.5117) Acc@1 66.064 (66.064) Acc@5 86.572 (86.572) Mem 8978MB
[2024-07-29 16:00:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.057 (0.137) Loss 2.0742 (1.7826) Acc@1 54.150 (59.273) Acc@5 79.785 (83.598) Mem 8978MB
[2024-07-29 16:00:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.056 (0.098) Loss 2.3984 (1.9883) Acc@1 47.217 (55.852) Acc@5 74.023 (80.269) Mem 8978MB
[2024-07-29 16:00:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 55.588 Acc@5 80.216
[2024-07-29 16:00:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 55.6%
[2024-07-29 16:00:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 55.59%
[2024-07-29 16:00:58 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 16:01:00 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 16:01:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][0/625] eta 0:06:54 lr 0.001975 wd 0.0500 time 0.6635 (0.6635) data time 0.4628 (0.4628) model time 0.0000 (0.0000) loss 3.0609 (3.0609) grad_norm 0.9743 (0.9743) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][10/625] eta 0:02:33 lr 0.001975 wd 0.0500 time 0.2085 (0.2494) data time 0.0009 (0.0430) model time 0.0000 (0.0000) loss 3.5631 (3.5553) grad_norm 1.2593 (1.2856) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][20/625] eta 0:02:20 lr 0.001975 wd 0.0500 time 0.2116 (0.2320) data time 0.0009 (0.0230) model time 0.0000 (0.0000) loss 4.0329 (3.4861) grad_norm 1.8460 (1.4524) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][30/625] eta 0:02:13 lr 0.001975 wd 0.0500 time 0.2087 (0.2248) data time 0.0011 (0.0160) model time 0.0000 (0.0000) loss 3.8343 (3.5660) grad_norm 1.7771 (1.4319) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][40/625] eta 0:02:09 lr 0.001975 wd 0.0500 time 0.2169 (0.2213) data time 0.0008 (0.0123) model time 0.0000 (0.0000) loss 3.7498 (3.6186) grad_norm 1.1176 (1.3638) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][50/625] eta 0:02:06 lr 0.001975 wd 0.0500 time 0.2084 (0.2193) data time 0.0010 (0.0101) model time 0.0000 (0.0000) loss 3.1091 (3.5357) grad_norm 0.9202 (1.2854) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][60/625] eta 0:02:03 lr 0.001975 wd 0.0500 time 0.2089 (0.2178) data time 0.0007 (0.0086) model time 0.2082 (0.2088) loss 3.7393 (3.4915) grad_norm 0.7837 (1.2805) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][70/625] eta 0:02:00 lr 0.001975 wd 0.0500 time 0.2108 (0.2171) data time 0.0009 (0.0076) model time 0.2099 (0.2103) loss 3.9548 (3.5089) grad_norm 1.2598 (1.2826) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][80/625] eta 0:01:57 lr 0.001975 wd 0.0500 time 0.2156 (0.2164) data time 0.0008 (0.0068) model time 0.2148 (0.2104) loss 3.6549 (3.5368) grad_norm 1.7305 (1.3000) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][90/625] eta 0:01:55 lr 0.001975 wd 0.0500 time 0.2163 (0.2158) data time 0.0007 (0.0062) model time 0.2155 (0.2102) loss 2.7623 (3.5336) grad_norm 2.3603 (1.3225) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][100/625] eta 0:01:52 lr 0.001975 wd 0.0500 time 0.2102 (0.2152) data time 0.0008 (0.0057) model time 0.2094 (0.2099) loss 3.0816 (3.5428) grad_norm 1.0161 (1.3266) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][110/625] eta 0:01:50 lr 0.001975 wd 0.0500 time 0.2105 (0.2154) data time 0.0008 (0.0052) model time 0.2098 (0.2110) loss 4.0344 (3.5348) grad_norm 1.1332 (1.3154) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][120/625] eta 0:01:48 lr 0.001975 wd 0.0500 time 0.2044 (0.2151) data time 0.0009 (0.0049) model time 0.2035 (0.2109) loss 3.7966 (3.5572) grad_norm 1.6326 (1.3105) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][130/625] eta 0:01:46 lr 0.001975 wd 0.0500 time 0.2130 (0.2148) data time 0.0007 (0.0046) model time 0.2122 (0.2108) loss 2.3958 (3.5454) grad_norm 1.5076 (1.3048) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][140/625] eta 0:01:44 lr 0.001975 wd 0.0500 time 0.2216 (0.2145) data time 0.0007 (0.0043) model time 0.2209 (0.2107) loss 2.5762 (3.5245) grad_norm 2.0340 (1.3003) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][150/625] eta 0:01:41 lr 0.001975 wd 0.0500 time 0.2116 (0.2143) data time 0.0008 (0.0041) model time 0.2108 (0.2106) loss 4.2653 (3.5512) grad_norm 1.0973 (1.3063) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][160/625] eta 0:01:39 lr 0.001975 wd 0.0500 time 0.2116 (0.2140) data time 0.0009 (0.0039) model time 0.2107 (0.2105) loss 3.5505 (3.5453) grad_norm 1.2629 (1.3026) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][170/625] eta 0:01:37 lr 0.001975 wd 0.0500 time 0.2052 (0.2137) data time 0.0007 (0.0037) model time 0.2045 (0.2102) loss 3.4780 (3.5556) grad_norm 1.1006 (1.3137) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][180/625] eta 0:01:35 lr 0.001974 wd 0.0500 time 0.2119 (0.2136) data time 0.0009 (0.0036) model time 0.2110 (0.2103) loss 3.3575 (3.5464) grad_norm 0.9786 (1.2922) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][190/625] eta 0:01:32 lr 0.001974 wd 0.0500 time 0.2093 (0.2135) data time 0.0009 (0.0035) model time 0.2084 (0.2104) loss 3.7260 (3.5539) grad_norm 0.9528 (1.2782) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][200/625] eta 0:01:30 lr 0.001974 wd 0.0500 time 0.2154 (0.2136) data time 0.0009 (0.0033) model time 0.2145 (0.2106) loss 3.4697 (3.5402) grad_norm 0.9291 (1.2716) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][210/625] eta 0:01:28 lr 0.001974 wd 0.0500 time 0.2110 (0.2134) data time 0.0010 (0.0032) model time 0.2099 (0.2104) loss 3.6717 (3.5373) grad_norm 1.1529 (1.2839) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][220/625] eta 0:01:26 lr 0.001974 wd 0.0500 time 0.2236 (0.2133) data time 0.0008 (0.0031) model time 0.2227 (0.2105) loss 3.4783 (3.5399) grad_norm 1.0279 (1.2743) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][230/625] eta 0:01:24 lr 0.001974 wd 0.0500 time 0.2128 (0.2131) data time 0.0009 (0.0030) model time 0.2120 (0.2104) loss 3.4074 (3.5542) grad_norm 1.2476 (1.2674) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][240/625] eta 0:01:22 lr 0.001974 wd 0.0500 time 0.2085 (0.2131) data time 0.0009 (0.0030) model time 0.2076 (0.2104) loss 3.1582 (3.5544) grad_norm 1.4531 (1.2606) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][250/625] eta 0:01:19 lr 0.001974 wd 0.0500 time 0.2096 (0.2131) data time 0.0007 (0.0029) model time 0.2089 (0.2104) loss 3.8567 (3.5554) grad_norm 0.8635 (1.2559) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][260/625] eta 0:01:17 lr 0.001974 wd 0.0500 time 0.2081 (0.2130) data time 0.0008 (0.0028) model time 0.2072 (0.2104) loss 4.1075 (3.5533) grad_norm 1.2383 (1.2665) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:01:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][270/625] eta 0:01:15 lr 0.001974 wd 0.0500 time 0.2095 (0.2130) data time 0.0011 (0.0027) model time 0.2084 (0.2105) loss 2.6783 (3.5469) grad_norm 1.1912 (1.2573) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:02:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][280/625] eta 0:01:13 lr 0.001974 wd 0.0500 time 0.2100 (0.2130) data time 0.0007 (0.0027) model time 0.2092 (0.2105) loss 3.7601 (3.5505) grad_norm 0.9229 (1.2532) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:02:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][290/625] eta 0:01:11 lr 0.001974 wd 0.0500 time 0.2193 (0.2130) data time 0.0007 (0.0026) model time 0.2186 (0.2106) loss 4.1801 (3.5655) grad_norm 1.6587 (1.2570) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:02:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][300/625] eta 0:01:09 lr 0.001974 wd 0.0500 time 0.2139 (0.2130) data time 0.0010 (0.0026) model time 0.2129 (0.2107) loss 3.3949 (3.5650) grad_norm 1.3454 (1.2592) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:02:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][310/625] eta 0:01:07 lr 0.001974 wd 0.0500 time 0.2062 (0.2130) data time 0.0008 (0.0025) model time 0.2054 (0.2107) loss 4.1666 (3.5713) grad_norm 1.6556 (1.2547) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:02:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][320/625] eta 0:01:04 lr 0.001974 wd 0.0500 time 0.2116 (0.2130) data time 0.0007 (0.0025) model time 0.2109 (0.2108) loss 3.2447 (3.5706) grad_norm 1.0921 (1.2586) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:02:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][330/625] eta 0:01:02 lr 0.001974 wd 0.0500 time 0.2241 (0.2132) data time 0.0010 (0.0024) model time 0.2231 (0.2111) loss 3.9074 (3.5684) grad_norm 1.2940 (1.2658) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:02:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][340/625] eta 0:01:00 lr 0.001974 wd 0.0500 time 0.2119 (0.2133) data time 0.0007 (0.0024) model time 0.2111 (0.2112) loss 4.2326 (3.5634) grad_norm 1.8622 (1.2675) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:02:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][350/625] eta 0:00:58 lr 0.001974 wd 0.0500 time 0.2097 (0.2133) data time 0.0010 (0.0024) model time 0.2086 (0.2112) loss 3.7435 (3.5643) grad_norm 0.8486 (1.2649) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:02:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][360/625] eta 0:00:56 lr 0.001974 wd 0.0500 time 0.2152 (0.2133) data time 0.0010 (0.0023) model time 0.2142 (0.2112) loss 3.5052 (3.5550) grad_norm 0.7689 (1.2587) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:02:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][370/625] eta 0:00:54 lr 0.001974 wd 0.0500 time 0.2138 (0.2133) data time 0.0009 (0.0023) model time 0.2128 (0.2113) loss 3.5678 (3.5599) grad_norm 0.7928 (1.2571) loss_scale 16384.0000 (16384.0000) mem 8978MB
[2024-07-29 16:02:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][380/625] eta 0:00:52 lr 0.001974 wd 0.0500 time 0.2082 (0.2132) data time 0.0012 (0.0023) model time 0.2070 (0.2113) loss 4.0252 (3.5658) grad_norm 0.8355 (inf) loss_scale 8192.0000 (16340.9974) mem 8978MB
[2024-07-29 16:02:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][390/625] eta 0:00:50 lr 0.001974 wd 0.0500 time 0.2097 (0.2132) data time 0.0011 (0.0022) model time 0.2086 (0.2112) loss 3.9159 (3.5744) grad_norm 0.8127 (inf) loss_scale 8192.0000 (16132.5831) mem 8978MB
[2024-07-29 16:02:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][400/625] eta 0:00:47 lr 0.001974 wd 0.0500 time 0.2165 (0.2131) data time 0.0009 (0.0022) model time 0.2156 (0.2112) loss 3.9535 (3.5763) grad_norm 1.1226 (inf) loss_scale 8192.0000 (15934.5636) mem 8978MB
[2024-07-29 16:02:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][410/625] eta 0:00:45 lr 0.001974 wd 0.0500 time 0.2119 (0.2137) data time 0.0010 (0.0022) model time 0.2109 (0.2118) loss 3.6070 (3.5768) grad_norm 0.7803 (inf) loss_scale 8192.0000 (15746.1800) mem 8978MB
[2024-07-29 16:02:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][420/625] eta 0:00:43 lr 0.001973 wd 0.0500 time 0.2074 (0.2136) data time 0.0010 (0.0022) model time 0.2064 (0.2117) loss 2.7786 (3.5691) grad_norm 1.0460 (inf) loss_scale 8192.0000 (15566.7458) mem 8978MB
[2024-07-29 16:02:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][430/625] eta 0:00:41 lr 0.001973 wd 0.0500 time 0.2104 (0.2135) data time 0.0010 (0.0021) model time 0.2094 (0.2117) loss 3.4414 (3.5628) grad_norm 1.7171 (inf) loss_scale 8192.0000 (15395.6381) mem 8978MB
[2024-07-29 16:02:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][440/625] eta 0:00:39 lr 0.001973 wd 0.0500 time 0.2096 (0.2134) data time 0.0010 (0.0021) model time 0.2086 (0.2116) loss 3.8902 (3.5636) grad_norm 0.9575 (inf) loss_scale 8192.0000 (15232.2902) mem 8978MB
[2024-07-29 16:02:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][450/625] eta 0:00:37 lr 0.001973 wd 0.0500 time 0.2111 (0.2134) data time 0.0010 (0.0021) model time 0.2101 (0.2116) loss 3.1424 (3.5631) grad_norm 1.0552 (inf) loss_scale 8192.0000 (15076.1863) mem 8978MB
[2024-07-29 16:02:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][460/625] eta 0:00:35 lr 0.001973 wd 0.0500 time 0.2053 (0.2133) data time 0.0009 (0.0021) model time 0.2044 (0.2115) loss 3.4826 (3.5677) grad_norm 1.0091 (inf) loss_scale 8192.0000 (14926.8547) mem 8978MB
[2024-07-29 16:02:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][470/625] eta 0:00:33 lr 0.001973 wd 0.0500 time 0.2047 (0.2133) data time 0.0010 (0.0020) model time 0.2037 (0.2115) loss 4.1033 (3.5732) grad_norm 1.1703 (inf) loss_scale 8192.0000 (14783.8641) mem 8978MB
[2024-07-29 16:02:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][480/625] eta 0:00:30 lr 0.001973 wd 0.0500 time 0.2106 (0.2132) data time 0.0008 (0.0020) model time 0.2098 (0.2115) loss 4.1991 (3.5768) grad_norm 1.4922 (inf) loss_scale 8192.0000 (14646.8191) mem 8978MB
[2024-07-29 16:02:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][490/625] eta 0:00:28 lr 0.001973 wd 0.0500 time 0.1999 (0.2132) data time 0.0008 (0.0020) model time 0.1991 (0.2114) loss 4.2874 (3.5785) grad_norm 1.2984 (inf) loss_scale 8192.0000 (14515.3564) mem 8978MB
[2024-07-29 16:02:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][500/625] eta 0:00:26 lr 0.001973 wd 0.0500 time 0.2143 (0.2132) data time 0.0008 (0.0020) model time 0.2135 (0.2114) loss 3.3847 (3.5754) grad_norm 0.9040 (inf) loss_scale 8192.0000 (14389.1417) mem 8978MB
[2024-07-29 16:02:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][510/625] eta 0:00:24 lr 0.001973 wd 0.0500 time 0.2126 (0.2136) data time 0.0007 (0.0020) model time 0.2119 (0.2119) loss 4.0974 (3.5812) grad_norm 0.7214 (inf) loss_scale 8192.0000 (14267.8669) mem 8978MB
[2024-07-29 16:02:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][520/625] eta 0:00:22 lr 0.001973 wd 0.0500 time 0.2127 (0.2135) data time 0.0010 (0.0019) model time 0.2117 (0.2119) loss 3.9441 (3.5837) grad_norm 1.0573 (inf) loss_scale 8192.0000 (14151.2476) mem 8978MB
[2024-07-29 16:02:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][530/625] eta 0:00:20 lr 0.001973 wd 0.0500 time 0.2145 (0.2135) data time 0.0009 (0.0019) model time 0.2136 (0.2118) loss 4.0081 (3.5849) grad_norm 0.9101 (inf) loss_scale 8192.0000 (14039.0207) mem 8978MB
[2024-07-29 16:02:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][540/625] eta 0:00:18 lr 0.001973 wd 0.0500 time 0.2073 (0.2134) data time 0.0010 (0.0019) model time 0.2063 (0.2118) loss 3.7076 (3.5844) grad_norm 1.4325 (inf) loss_scale 8192.0000 (13930.9427) mem 8978MB
[2024-07-29 16:02:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][550/625] eta 0:00:16 lr 0.001973 wd 0.0500 time 0.2238 (0.2134) data time 0.0014 (0.0019) model time 0.2224 (0.2117) loss 4.2634 (3.5856) grad_norm 0.9219 (inf) loss_scale 8192.0000 (13826.7877) mem 8978MB
[2024-07-29 16:02:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][560/625] eta 0:00:13 lr 0.001973 wd 0.0500 time 0.2107 (0.2134) data time 0.0009 (0.0019) model time 0.2098 (0.2117) loss 3.3699 (3.5849) grad_norm 1.0731 (inf) loss_scale 8192.0000 (13726.3458) mem 8978MB
[2024-07-29 16:03:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][570/625] eta 0:00:11 lr 0.001973 wd 0.0500 time 0.2245 (0.2134) data time 0.0009 (0.0019) model time 0.2236 (0.2117) loss 3.2943 (3.5900) grad_norm 0.8370 (inf) loss_scale 8192.0000 (13629.4221) mem 8978MB
[2024-07-29 16:03:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][580/625] eta 0:00:09 lr 0.001973 wd 0.0500 time 0.2135 (0.2133) data time 0.0009 (0.0018) model time 0.2125 (0.2116) loss 3.5974 (3.5910) grad_norm 1.1882 (inf) loss_scale 8192.0000 (13535.8348) mem 8978MB
[2024-07-29 16:03:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][590/625] eta 0:00:07 lr 0.001973 wd 0.0500 time 0.2160 (0.2133) data time 0.0007 (0.0018) model time 0.2153 (0.2116) loss 2.5479 (3.5829) grad_norm 0.8810 (inf) loss_scale 8192.0000 (13445.4146) mem 8978MB
[2024-07-29 16:03:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][600/625] eta 0:00:05 lr 0.001973 wd 0.0500 time 0.2156 (0.2132) data time 0.0010 (0.0018) model time 0.2147 (0.2116) loss 2.8153 (3.5763) grad_norm 0.9368 (inf) loss_scale 8192.0000 (13358.0033) mem 8978MB
[2024-07-29 16:03:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][610/625] eta 0:00:03 lr 0.001973 wd 0.0500 time 0.2086 (0.2131) data time 0.0007 (0.0018) model time 0.2079 (0.2115) loss 3.7112 (3.5757) grad_norm 0.8479 (inf) loss_scale 8192.0000 (13273.4534) mem 8978MB
[2024-07-29 16:03:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][620/625] eta 0:00:01 lr 0.001973 wd 0.0500 time 0.2075 (0.2130) data time 0.0005 (0.0018) model time 0.2070 (0.2114) loss 3.5053 (3.5796) grad_norm 1.1361 (inf) loss_scale 8192.0000 (13191.6264) mem 8978MB
[2024-07-29 16:03:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 40 training takes 0:02:13
[2024-07-29 16:03:13 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 16:03:14 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 16:03:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.466 (0.466) Loss 0.8687 (0.8687) Acc@1 83.301 (83.301) Acc@5 96.777 (96.777) Mem 8978MB
[2024-07-29 16:03:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 1.3906 (1.0603) Acc@1 70.996 (78.786) Acc@5 90.869 (95.202) Mem 8978MB
[2024-07-29 16:03:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.4678 (1.2493) Acc@1 67.822 (74.493) Acc@5 90.625 (92.511) Mem 8978MB
[2024-07-29 16:03:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 74.290 Acc@5 92.538
[2024-07-29 16:03:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 74.3%
[2024-07-29 16:03:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 74.29%
[2024-07-29 16:03:16 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 16:03:18 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 16:03:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.454 (0.454) Loss 1.3857 (1.3857) Acc@1 68.652 (68.652) Acc@5 88.184 (88.184) Mem 8978MB
[2024-07-29 16:03:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.093) Loss 1.9424 (1.6485) Acc@1 56.201 (61.799) Acc@5 81.641 (85.445) Mem 8978MB
[2024-07-29 16:03:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.075) Loss 2.2656 (1.8557) Acc@1 50.000 (58.261) Acc@5 75.928 (82.164) Mem 8978MB
[2024-07-29 16:03:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 57.925 Acc@5 82.102
[2024-07-29 16:03:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 57.9%
[2024-07-29 16:03:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 57.93%
[2024-07-29 16:03:20 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 16:03:20 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 16:03:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][0/625] eta 0:07:39 lr 0.001973 wd 0.0500 time 0.7359 (0.7359) data time 0.5110 (0.5110) model time 0.0000 (0.0000) loss 3.8215 (3.8215) grad_norm 2.4371 (2.4371) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 16:03:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][10/625] eta 0:02:41 lr 0.001973 wd 0.0500 time 0.2066 (0.2620) data time 0.0009 (0.0474) model time 0.0000 (0.0000) loss 2.4176 (3.6686) grad_norm 1.0060 (1.4668) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 16:03:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][20/625] eta 0:02:24 lr 0.001973 wd 0.0500 time 0.2123 (0.2394) data time 0.0010 (0.0253) model time 0.0000 (0.0000) loss 3.8205 (3.5816) grad_norm 2.5629 (1.4684) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 16:03:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][30/625] eta 0:02:16 lr 0.001973 wd 0.0500 time 0.2127 (0.2302) data time 0.0007 (0.0175) model time 0.0000 (0.0000) loss 4.0293 (3.5422) grad_norm 0.8268 (1.3462) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 16:03:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][40/625] eta 0:02:12 lr 0.001972 wd 0.0500 time 0.2048 (0.2257) data time 0.0008 (0.0135) model time 0.0000 (0.0000) loss 3.0871 (3.5011) grad_norm 1.5910 (1.3011) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 16:03:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][50/625] eta 0:02:08 lr 0.001972 wd 0.0500 time 0.2165 (0.2227) data time 0.0008 (0.0111) model time 0.0000 (0.0000) loss 2.6629 (3.4218) grad_norm 1.5508 (1.2975) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 16:03:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][60/625] eta 0:02:04 lr 0.001972 wd 0.0500 time 0.2102 (0.2204) data time 0.0010 (0.0094) model time 0.2092 (0.2079) loss 3.8468 (3.4408) grad_norm 0.8084 (1.2599) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 16:03:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][70/625] eta 0:02:01 lr 0.001972 wd 0.0500 time 0.2080 (0.2190) data time 0.0007 (0.0083) model time 0.2073 (0.2086) loss 3.8623 (3.4579) grad_norm 0.9945 (1.2649) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 16:03:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][80/625] eta 0:01:58 lr 0.001972 wd 0.0500 time 0.2129 (0.2177) data time 0.0009 (0.0074) model time 0.2119 (0.2082) loss 3.6437 (3.4859) grad_norm 1.1478 (1.2433) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 16:03:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][90/625] eta 0:01:56 lr 0.001972 wd 0.0500 time 0.2385 (0.2175) data time 0.0007 (0.0067) model time 0.2378 (0.2097) loss 2.7290 (3.5241) grad_norm 0.7450 (1.2321) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 16:03:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][100/625] eta 0:01:54 lr 0.001972 wd 0.0500 time 0.2072 (0.2175) data time 0.0008 (0.0066) model time 0.2064 (0.2102) loss 4.1135 (3.5373) grad_norm 0.7867 (1.2520) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 16:03:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][110/625] eta 0:01:51 lr 0.001972 wd 0.0500 time 0.2124 (0.2168) data time 0.0008 (0.0061) model time 0.2117 (0.2099) loss 3.0712 (3.5350) grad_norm 1.7387 (1.2581) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 16:03:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][120/625] eta 0:01:49 lr 0.001972 wd 0.0500 time 0.2180 (0.2165) data time 0.0007 (0.0056) model time 0.2173 (0.2102) loss 3.2558 (3.5585) grad_norm 0.7724 (1.2459) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 16:03:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][130/625] eta 0:01:46 lr 0.001972 wd 0.0500 time 0.2064 (0.2160) data time 0.0009 (0.0053) model time 0.2054 (0.2101) loss 2.6526 (3.5098) grad_norm 1.1741 (1.2296) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 16:03:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][140/625] eta 0:01:44 lr 0.001972 wd 0.0500 time 0.2072 (0.2156) data time 0.0009 (0.0050) model time 0.2063 (0.2100) loss 2.0729 (3.4836) grad_norm 1.3342 (1.2432) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 16:03:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][150/625] eta 0:01:42 lr 0.001972 wd 0.0500 time 0.2185 (0.2153) data time 0.0009 (0.0047) model time 0.2176 (0.2100) loss 2.6806 (3.4841) grad_norm 1.0437 (1.2550) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 16:03:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][160/625] eta 0:01:39 lr 0.001972 wd 0.0500 time 0.2106 (0.2149) data time 0.0009 (0.0045) model time 0.2097 (0.2098) loss 4.0310 (3.4953) grad_norm 1.1385 (1.2660) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 16:03:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][170/625] eta 0:01:37 lr 0.001972 wd 0.0500 time 0.2107 (0.2146) data time 0.0008 (0.0043) model time 0.2099 (0.2098) loss 3.5163 (3.5014) grad_norm 0.8144 (1.2547) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 16:03:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][180/625] eta 0:01:35 lr 0.001972 wd 0.0500 time 0.2138 (0.2146) data time 0.0007 (0.0041) model time 0.2131 (0.2100) loss 4.1796 (3.5146) grad_norm 1.3776 (1.2556) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 16:04:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][190/625] eta 0:01:33 lr 0.001972 wd 0.0500 time 0.2213 (0.2144) data time 0.0007 
(0.0040) model time 0.2206 (0.2101) loss 3.4632 (3.5246) grad_norm 2.0691 (1.2523) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 16:04:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][200/625] eta 0:01:31 lr 0.001972 wd 0.0500 time 0.2109 (0.2143) data time 0.0007 (0.0039) model time 0.2102 (0.2101) loss 2.8971 (3.5156) grad_norm 0.8881 (1.2443) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 16:04:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][210/625] eta 0:01:28 lr 0.001972 wd 0.0500 time 0.2090 (0.2142) data time 0.0009 (0.0037) model time 0.2081 (0.2101) loss 3.9207 (3.5197) grad_norm 1.2676 (1.2555) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 16:04:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][220/625] eta 0:01:26 lr 0.001972 wd 0.0500 time 0.2207 (0.2140) data time 0.0009 (0.0036) model time 0.2198 (0.2101) loss 3.7386 (3.5253) grad_norm 1.4248 (1.2513) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 16:04:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][230/625] eta 0:01:24 lr 0.001972 wd 0.0500 time 0.2107 (0.2139) data time 0.0007 (0.0035) model time 0.2100 (0.2101) loss 4.2955 (3.5243) grad_norm 0.9026 (1.2443) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 16:04:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][240/625] eta 0:01:22 lr 0.001972 wd 0.0500 time 0.2119 (0.2138) data time 0.0008 (0.0034) model time 0.2111 (0.2101) loss 2.8034 (3.5080) grad_norm 1.3822 (1.2480) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 16:04:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][250/625] eta 0:01:20 lr 0.001972 wd 0.0500 time 0.2235 (0.2139) data time 0.0007 (0.0033) model time 0.2228 (0.2103) loss 3.4074 (3.5178) grad_norm 0.7652 (1.2508) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 16:04:16 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][260/625] eta 0:01:18 lr 0.001972 wd 0.0500 time 0.2055 (0.2138) data time 0.0009 (0.0032) model time 0.2046 (0.2103) loss 2.3715 (3.5200) grad_norm 0.9319 (1.2591) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 16:04:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][270/625] eta 0:01:15 lr 0.001972 wd 0.0500 time 0.2080 (0.2137) data time 0.0009 (0.0031) model time 0.2070 (0.2103) loss 3.5499 (3.5183) grad_norm 1.0037 (1.2531) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 16:04:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][280/625] eta 0:01:13 lr 0.001971 wd 0.0500 time 0.2131 (0.2136) data time 0.0008 (0.0031) model time 0.2123 (0.2103) loss 3.4799 (3.5272) grad_norm 1.3119 (1.2494) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 16:04:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][290/625] eta 0:01:11 lr 0.001971 wd 0.0500 time 0.2136 (0.2135) data time 0.0009 (0.0030) model time 0.2127 (0.2103) loss 3.4116 (3.5250) grad_norm 1.0659 (1.2483) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 16:04:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 16:04:24 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 16:04:24 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 16:13:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 16:13:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 16:13:42 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 16:13:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 16:13:53 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-29 16:13:53 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 16:13:53 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 16:13:53 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 41) [2024-07-29 16:13:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 16:14:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][300/625] eta 0:12:23 lr 0.001971 wd 0.0500 time 0.2070 (2.2878) data time 0.0007 (0.2655) model time 0.2063 (2.0222) loss 4.1610 (3.9580) grad_norm 0.8180 (0.9924) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:14:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][310/625] eta 0:04:44 lr 0.001971 wd 0.0500 time 0.2054 (0.9036) data time 0.0012 (0.0892) model time 0.2042 (0.8144) loss 3.8495 (3.8114) grad_norm 0.8970 (1.1507) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:14:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[41/300][320/625] eta 0:03:10 lr 0.001971 wd 0.0500 time 0.2097 (0.6255) data time 0.0011 (0.0540) model time 0.2086 (0.5715) loss 3.9332 (3.8501) grad_norm 1.1361 (1.1915) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:14:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][330/625] eta 0:02:29 lr 0.001971 wd 0.0500 time 0.2075 (0.5066) data time 0.0010 (0.0389) model time 0.2065 (0.4677) loss 3.5179 (3.8155) grad_norm 1.2255 (1.2129) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:14:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][340/625] eta 0:02:05 lr 0.001971 wd 0.0500 time 0.2129 (0.4410) data time 0.0010 (0.0305) model time 0.2119 (0.4105) loss 3.7095 (3.7567) grad_norm 1.4134 (1.2212) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:14:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][350/625] eta 0:01:50 lr 0.001971 wd 0.0500 time 0.2035 (0.4003) data time 0.0008 (0.0252) model time 0.2027 (0.3751) loss 2.3860 (3.7353) grad_norm 0.8573 (1.1898) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:14:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][360/625] eta 0:01:38 lr 0.001971 wd 0.0500 time 0.2175 (0.3711) data time 0.0009 (0.0215) model time 0.2166 (0.3496) loss 3.9122 (3.7297) grad_norm 1.0724 (1.1648) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:14:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][370/625] eta 0:01:29 lr 0.001971 wd 0.0500 time 0.2094 (0.3494) data time 0.0010 (0.0187) model time 0.2084 (0.3306) loss 2.8147 (3.6802) grad_norm 0.8061 (1.1578) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:14:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][380/625] eta 0:01:21 lr 0.001971 wd 0.0500 time 0.2100 (0.3329) data time 0.0008 (0.0167) model time 0.2092 (0.3162) loss 3.5172 (3.6504) grad_norm 1.1616 
(1.1735) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:14:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][390/625] eta 0:01:15 lr 0.001971 wd 0.0500 time 0.2080 (0.3199) data time 0.0010 (0.0150) model time 0.2070 (0.3049) loss 3.8875 (3.6498) grad_norm 0.8904 (1.1784) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:14:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][400/625] eta 0:01:09 lr 0.001971 wd 0.0500 time 0.2049 (0.3094) data time 0.0011 (0.0137) model time 0.2038 (0.2957) loss 3.1606 (3.6702) grad_norm 1.0295 (1.1982) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:14:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][410/625] eta 0:01:04 lr 0.001971 wd 0.0500 time 0.2024 (0.3005) data time 0.0011 (0.0126) model time 0.2013 (0.2879) loss 2.8658 (3.6599) grad_norm 1.2624 (1.1995) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:14:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][420/625] eta 0:01:00 lr 0.001971 wd 0.0500 time 0.2101 (0.2931) data time 0.0009 (0.0117) model time 0.2092 (0.2814) loss 3.2872 (3.6542) grad_norm 1.0591 (1.2208) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:14:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][430/625] eta 0:00:55 lr 0.001971 wd 0.0500 time 0.2074 (0.2870) data time 0.0008 (0.0109) model time 0.2066 (0.2761) loss 3.4422 (3.6528) grad_norm 1.1161 (1.2325) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:14:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][440/625] eta 0:00:52 lr 0.001971 wd 0.0500 time 0.2116 (0.2820) data time 0.0011 (0.0102) model time 0.2105 (0.2718) loss 3.9264 (3.6390) grad_norm 0.9100 (1.2279) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:14:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][450/625] eta 
0:00:48 lr 0.001971 wd 0.0500 time 0.2118 (0.2774) data time 0.0009 (0.0096) model time 0.2109 (0.2678) loss 3.7923 (3.6282) grad_norm 1.1155 (1.2120) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:14:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][460/625] eta 0:00:45 lr 0.001971 wd 0.0500 time 0.2104 (0.2734) data time 0.0012 (0.0091) model time 0.2093 (0.2643) loss 3.5808 (3.6217) grad_norm 1.2907 (1.2056) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:14:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][470/625] eta 0:00:41 lr 0.001971 wd 0.0500 time 0.2100 (0.2698) data time 0.0007 (0.0087) model time 0.2093 (0.2611) loss 3.7189 (3.6144) grad_norm 2.0752 (1.2123) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:14:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][480/625] eta 0:00:38 lr 0.001971 wd 0.0500 time 0.2054 (0.2665) data time 0.0010 (0.0082) model time 0.2044 (0.2583) loss 3.3559 (3.6067) grad_norm 0.8979 (1.2126) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:14:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][490/625] eta 0:00:35 lr 0.001971 wd 0.0500 time 0.2204 (0.2637) data time 0.0010 (0.0079) model time 0.2194 (0.2559) loss 2.5348 (3.5950) grad_norm 0.9226 (1.2154) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:14:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][500/625] eta 0:00:32 lr 0.001971 wd 0.0500 time 0.2118 (0.2611) data time 0.0010 (0.0075) model time 0.2109 (0.2535) loss 2.8102 (3.5867) grad_norm 1.0049 (1.2061) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:14:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][510/625] eta 0:00:29 lr 0.001970 wd 0.0500 time 0.2107 (0.2588) data time 0.0011 (0.0072) model time 0.2096 (0.2515) loss 3.1594 (3.5782) grad_norm 0.7751 (1.1945) loss_scale 
8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:14:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][520/625] eta 0:00:26 lr 0.001970 wd 0.0500 time 0.2121 (0.2565) data time 0.0008 (0.0070) model time 0.2114 (0.2496) loss 3.3522 (3.5817) grad_norm 1.1043 (1.1998) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:14:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][530/625] eta 0:00:24 lr 0.001970 wd 0.0500 time 0.2109 (0.2545) data time 0.0009 (0.0067) model time 0.2100 (0.2478) loss 3.8972 (3.5739) grad_norm 1.1448 (1.1996) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:14:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][540/625] eta 0:00:21 lr 0.001970 wd 0.0500 time 0.2103 (0.2528) data time 0.0009 (0.0065) model time 0.2094 (0.2463) loss 3.5413 (3.5778) grad_norm 1.1574 (1.2100) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:15:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 16:15:00 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 16:15:02 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 16:38:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 16:38:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 16:38:11 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-07-29 16:43:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 16:43:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 16:43:12 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 16:43:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 16:43:25 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-29 16:43:26 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 16:43:26 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 16:43:26 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 41) [2024-07-29 16:43:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 16:43:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 16:43:39 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 16:43:41 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 16:45:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 16:45:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 16:45:57 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 16:46:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 16:46:12 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-29 16:46:12 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 16:46:12 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 16:46:12 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 41) [2024-07-29 16:46:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 16:46:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][550/625] eta 0:01:44 lr 0.001970 wd 0.0500 time 0.1991 (1.3933) data time 0.0006 (0.2416) model time 0.1985 (1.1516) loss 4.0103 (3.9513) grad_norm 1.8537 (1.6203) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:46:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][560/625] eta 0:00:59 lr 0.001970 wd 0.0500 time 3.6210 (0.9203) data time 0.0007 (0.1079) model time 3.6203 (0.8125) loss 3.9538 (3.7576) grad_norm 1.4678 (1.5497) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:46:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[41/300][570/625] eta 0:00:37 lr 0.001970 wd 0.0500 time 0.2025 (0.6812) data time 0.0009 (0.0697) model time 0.2016 (0.6115) loss 3.8654 (3.8066) grad_norm 0.9228 (1.4165) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:46:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][580/625] eta 0:00:24 lr 0.001970 wd 0.0500 time 0.1952 (0.5547) data time 0.0009 (0.0516) model time 0.1943 (0.5032) loss 3.5773 (3.7337) grad_norm 0.8184 (1.2966) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:46:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][590/625] eta 0:00:16 lr 0.001970 wd 0.0500 time 0.1952 (0.4817) data time 0.0007 (0.0410) model time 0.1945 (0.4406) loss 3.7584 (3.7374) grad_norm 1.0538 (1.2597) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:46:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][600/625] eta 0:00:10 lr 0.001970 wd 0.0500 time 0.2004 (0.4337) data time 0.0008 (0.0341) model time 0.1995 (0.3996) loss 3.0372 (3.7261) grad_norm 1.6782 (1.2582) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:46:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][610/625] eta 0:00:05 lr 0.001970 wd 0.0500 time 0.1987 (0.3997) data time 0.0005 (0.0293) model time 0.1982 (0.3704) loss 2.7636 (3.7026) grad_norm 1.0869 (1.2482) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:46:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][620/625] eta 0:00:01 lr 0.001970 wd 0.0500 time 0.1984 (0.3742) data time 0.0004 (0.0256) model time 0.1980 (0.3485) loss 2.9244 (3.6649) grad_norm 0.9966 (1.2439) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:46:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 41 training takes 0:00:29 [2024-07-29 16:46:48 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO 
./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 16:46:49 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 16:46:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.480 (1.480) Loss 0.8047 (0.8047) Acc@1 84.521 (84.521) Acc@5 96.924 (96.924) Mem 8977MB [2024-07-29 16:46:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.066 (0.194) Loss 1.3672 (0.9981) Acc@1 70.557 (79.186) Acc@5 90.332 (95.219) Mem 8977MB [2024-07-29 16:46:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.128) Loss 1.4561 (1.1852) Acc@1 67.773 (74.849) Acc@5 89.404 (92.785) Mem 8977MB [2024-07-29 16:46:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 74.560 Acc@5 92.710 [2024-07-29 16:46:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 74.6% [2024-07-29 16:46:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 74.56% [2024-07-29 16:46:53 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 16:46:53 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-29 16:46:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 2.142 (2.142) Loss 1.2734 (1.2734) Acc@1 70.850 (70.850) Acc@5 89.551 (89.551) Mem 8977MB [2024-07-29 16:46:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.247) Loss 1.8389 (1.5324) Acc@1 58.203 (64.102) Acc@5 83.105 (86.865) Mem 8977MB [2024-07-29 16:46:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.155) Loss 2.1504 (1.7406) Acc@1 52.002 (60.410) Acc@5 77.637 (83.596) Mem 8977MB [2024-07-29 16:46:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 60.075 Acc@5 83.537 [2024-07-29 16:46:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 60.1% [2024-07-29 16:46:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 60.07% [2024-07-29 16:46:57 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 16:46:59 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-29 16:47:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][0/625] eta 0:20:25 lr 0.001970 wd 0.0500 time 1.9614 (1.9614) data time 0.3570 (0.3570) model time 0.0000 (0.0000) loss 2.5276 (2.5276) grad_norm 1.0286 (1.0286) loss_scale 8192.0000 (8192.0000) mem 8971MB [2024-07-29 16:47:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][10/625] eta 0:03:43 lr 0.001970 wd 0.0500 time 0.2138 (0.3637) data time 0.0008 (0.0333) model time 0.0000 (0.0000) loss 4.2940 (3.4815) grad_norm 1.2910 (1.1414) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][20/625] eta 0:02:53 lr 0.001970 wd 0.0500 time 0.2059 (0.2866) data time 0.0007 (0.0179) model time 0.0000 (0.0000) loss 4.2998 (3.7067) grad_norm 0.8933 (1.2131) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][30/625] eta 0:02:34 lr 0.001970 wd 0.0500 time 0.2054 (0.2598) data time 0.0006 (0.0124) model time 0.0000 (0.0000) loss 3.2686 (3.6165) grad_norm 1.2114 (1.3328) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][40/625] eta 0:02:51 lr 0.001970 wd 0.0500 time 0.1973 (0.2930) data time 0.0009 (0.0096) model time 0.0000 (0.0000) loss 3.8034 (3.6241) grad_norm 0.9117 (1.2859) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][50/625] eta 0:02:39 lr 0.001970 wd 0.0500 time 0.2442 (0.2768) data time 0.0011 (0.0079) model time 0.0000 (0.0000) loss 3.8722 (3.6041) grad_norm 1.7180 (1.2728) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][60/625] eta 0:02:29 lr 0.001970 wd 0.0500 time 0.2020 (0.2652) data time 
0.0010 (0.0068) model time 0.2010 (0.2050) loss 3.6963 (3.5746) grad_norm 1.2629 (1.2706) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][70/625] eta 0:02:22 lr 0.001970 wd 0.0500 time 0.2001 (0.2562) data time 0.0007 (0.0060) model time 0.1994 (0.2024) loss 3.7128 (3.5584) grad_norm 0.8869 (1.2708) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][80/625] eta 0:02:18 lr 0.001970 wd 0.0500 time 0.2294 (0.2537) data time 0.0010 (0.0054) model time 0.2284 (0.2132) loss 2.9710 (3.5628) grad_norm 1.0936 (1.2539) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][90/625] eta 0:02:12 lr 0.001970 wd 0.0500 time 0.2063 (0.2484) data time 0.0008 (0.0049) model time 0.2055 (0.2111) loss 4.0130 (3.5730) grad_norm 1.2032 (1.2545) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][100/625] eta 0:02:08 lr 0.001970 wd 0.0500 time 0.1996 (0.2440) data time 0.0012 (0.0045) model time 0.1984 (0.2094) loss 3.9806 (3.5485) grad_norm 1.4101 (1.2623) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][110/625] eta 0:02:03 lr 0.001970 wd 0.0500 time 0.2125 (0.2404) data time 0.0007 (0.0042) model time 0.2119 (0.2084) loss 3.6658 (3.5469) grad_norm 0.8630 (1.2445) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][120/625] eta 0:01:59 lr 0.001969 wd 0.0500 time 0.2033 (0.2373) data time 0.0007 (0.0039) model time 0.2027 (0.2075) loss 2.9391 (3.5270) grad_norm 0.8736 (1.2287) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:30 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][130/625] eta 0:01:56 lr 0.001969 wd 0.0500 time 0.2004 (0.2347) data time 0.0011 (0.0037) model time 0.1993 (0.2068) loss 2.3475 (3.5233) grad_norm 0.9130 (1.2322) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][140/625] eta 0:01:52 lr 0.001969 wd 0.0500 time 0.2019 (0.2325) data time 0.0009 (0.0035) model time 0.2010 (0.2064) loss 2.6398 (3.5223) grad_norm 1.2529 (1.2182) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][150/625] eta 0:01:49 lr 0.001969 wd 0.0500 time 0.2184 (0.2307) data time 0.0008 (0.0033) model time 0.2176 (0.2062) loss 2.6938 (3.5182) grad_norm 1.1329 (1.2222) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][160/625] eta 0:01:46 lr 0.001969 wd 0.0500 time 0.2034 (0.2289) data time 0.0008 (0.0032) model time 0.2027 (0.2057) loss 3.8362 (3.5281) grad_norm 2.8857 (1.2300) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][170/625] eta 0:01:43 lr 0.001969 wd 0.0500 time 0.2077 (0.2274) data time 0.0008 (0.0030) model time 0.2069 (0.2054) loss 3.9084 (3.5087) grad_norm 1.8032 (1.2349) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][180/625] eta 0:01:40 lr 0.001969 wd 0.0500 time 0.1975 (0.2260) data time 0.0007 (0.0029) model time 0.1968 (0.2050) loss 3.4624 (3.4907) grad_norm 1.5415 (1.2278) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][190/625] eta 0:01:37 lr 0.001969 wd 0.0500 time 0.2178 (0.2253) data time 0.0008 
(0.0028) model time 0.2170 (0.2055) loss 4.6085 (3.4882) grad_norm 0.8986 (1.2249) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][200/625] eta 0:01:35 lr 0.001969 wd 0.0500 time 0.2074 (0.2243) data time 0.0008 (0.0027) model time 0.2066 (0.2055) loss 3.9049 (3.4901) grad_norm 0.9918 (1.2245) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][210/625] eta 0:01:32 lr 0.001969 wd 0.0500 time 0.2026 (0.2233) data time 0.0009 (0.0027) model time 0.2017 (0.2053) loss 3.1790 (3.4841) grad_norm 1.0053 (1.2236) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][220/625] eta 0:01:30 lr 0.001969 wd 0.0500 time 0.2020 (0.2224) data time 0.0009 (0.0026) model time 0.2011 (0.2051) loss 3.1970 (3.4775) grad_norm 1.7982 (1.2257) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][230/625] eta 0:01:27 lr 0.001969 wd 0.0500 time 0.2022 (0.2227) data time 0.0006 (0.0025) model time 0.2017 (0.2064) loss 4.0649 (3.4815) grad_norm 1.1451 (1.2211) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][240/625] eta 0:01:25 lr 0.001969 wd 0.0500 time 0.2042 (0.2219) data time 0.0011 (0.0024) model time 0.2032 (0.2062) loss 4.0057 (3.4992) grad_norm 0.8803 (1.2118) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][250/625] eta 0:01:22 lr 0.001969 wd 0.0500 time 0.1983 (0.2212) data time 0.0010 (0.0024) model time 0.1973 (0.2060) loss 3.8896 (3.5006) grad_norm 1.3668 (1.2232) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:57 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][260/625] eta 0:01:20 lr 0.001969 wd 0.0500 time 0.2059 (0.2205) data time 0.0008 (0.0023) model time 0.2051 (0.2059) loss 4.0007 (3.5079) grad_norm 1.9476 (1.2311) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:47:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][270/625] eta 0:01:18 lr 0.001969 wd 0.0500 time 0.2056 (0.2198) data time 0.0006 (0.0023) model time 0.2050 (0.2056) loss 3.9911 (3.5146) grad_norm 1.5492 (1.2384) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:48:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][280/625] eta 0:01:15 lr 0.001969 wd 0.0500 time 0.2027 (0.2192) data time 0.0009 (0.0022) model time 0.2018 (0.2055) loss 3.9431 (3.5150) grad_norm 0.9790 (1.2348) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:48:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][290/625] eta 0:01:13 lr 0.001969 wd 0.0500 time 0.2067 (0.2188) data time 0.0008 (0.0022) model time 0.2059 (0.2055) loss 3.6380 (3.5112) grad_norm 1.4290 (1.2340) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:48:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][300/625] eta 0:01:10 lr 0.001969 wd 0.0500 time 0.2020 (0.2183) data time 0.0009 (0.0021) model time 0.2011 (0.2053) loss 2.5242 (3.5081) grad_norm 1.3674 (1.2357) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:48:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][310/625] eta 0:01:09 lr 0.001969 wd 0.0500 time 0.2043 (0.2191) data time 0.0007 (0.0021) model time 0.2035 (0.2067) loss 3.5828 (3.5066) grad_norm 1.1013 (1.2298) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:48:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][320/625] eta 0:01:06 lr 0.001969 wd 0.0500 time 0.2001 (0.2186) data time 0.0010 
(0.0021) model time 0.1991 (0.2066) loss 3.5436 (3.5164) grad_norm 0.7887 (1.2245) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:48:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][330/625] eta 0:01:04 lr 0.001969 wd 0.0500 time 0.2020 (0.2181) data time 0.0009 (0.0020) model time 0.2011 (0.2064) loss 3.8906 (3.5222) grad_norm 2.8501 (1.2361) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:48:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][340/625] eta 0:01:02 lr 0.001968 wd 0.0500 time 0.2033 (0.2177) data time 0.0008 (0.0020) model time 0.2025 (0.2063) loss 2.7726 (3.5199) grad_norm 0.9157 (1.2423) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:48:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][350/625] eta 0:00:59 lr 0.001968 wd 0.0500 time 0.2125 (0.2175) data time 0.0008 (0.0020) model time 0.2118 (0.2064) loss 4.1492 (3.5328) grad_norm 1.0544 (1.2354) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:48:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][360/625] eta 0:00:57 lr 0.001968 wd 0.0500 time 0.4320 (0.2177) data time 0.0006 (0.0019) model time 0.4313 (0.2070) loss 3.5716 (3.5365) grad_norm 0.8902 (1.2290) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:48:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][370/625] eta 0:00:55 lr 0.001968 wd 0.0500 time 0.2324 (0.2180) data time 0.0008 (0.0019) model time 0.2316 (0.2077) loss 2.7731 (3.5305) grad_norm 1.7935 (1.2305) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:48:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][380/625] eta 0:00:53 lr 0.001968 wd 0.0500 time 0.1992 (0.2177) data time 0.0008 (0.0019) model time 0.1984 (0.2076) loss 3.3076 (3.5256) grad_norm 0.9069 (1.2300) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:48:24 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][390/625] eta 0:00:51 lr 0.001968 wd 0.0500 time 0.2023 (0.2173) data time 0.0006 (0.0019) model time 0.2017 (0.2074) loss 3.4892 (3.5190) grad_norm 1.1474 (1.2248) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:48:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][400/625] eta 0:00:48 lr 0.001968 wd 0.0500 time 0.1993 (0.2169) data time 0.0008 (0.0018) model time 0.1986 (0.2072) loss 4.0395 (3.5190) grad_norm 0.7084 (1.2280) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:48:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][410/625] eta 0:00:46 lr 0.001968 wd 0.0500 time 0.2041 (0.2166) data time 0.0009 (0.0018) model time 0.2032 (0.2071) loss 2.3945 (3.5207) grad_norm 0.9300 (1.2288) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:48:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][420/625] eta 0:00:44 lr 0.001968 wd 0.0500 time 0.2322 (0.2163) data time 0.0009 (0.0018) model time 0.2313 (0.2070) loss 3.0256 (3.5210) grad_norm 1.0600 (1.2311) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:48:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][430/625] eta 0:00:42 lr 0.001968 wd 0.0500 time 0.2079 (0.2160) data time 0.0007 (0.0018) model time 0.2073 (0.2069) loss 4.0643 (3.5310) grad_norm 1.8315 (1.2306) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:48:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][440/625] eta 0:00:39 lr 0.001968 wd 0.0500 time 0.1985 (0.2158) data time 0.0009 (0.0018) model time 0.1977 (0.2069) loss 3.9181 (3.5280) grad_norm 0.7213 (1.2345) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:48:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 16:48:34 
vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 16:48:35 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 16:50:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 16:50:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 16:50:51 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 16:51:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 16:51:01 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... 
[2024-07-29 16:51:01 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 16:51:01 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 16:51:01 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 42) [2024-07-29 16:51:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 16:51:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][450/625] eta 0:03:14 lr 0.001968 wd 0.0500 time 0.1975 (1.1123) data time 0.0009 (0.2203) model time 0.1966 (0.8920) loss 3.8378 (3.9688) grad_norm 1.4567 (1.1814) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:51:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][460/625] eta 0:01:48 lr 0.001968 wd 0.0500 time 0.2057 (0.6568) data time 0.0007 (0.1106) model time 0.2049 (0.5462) loss 4.1609 (3.8639) grad_norm 0.9087 (1.1662) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:51:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][470/625] eta 0:01:22 lr 0.001968 wd 0.0500 time 0.1999 (0.5294) data time 0.0012 (0.0741) model time 0.1988 (0.4553) loss 4.0153 (3.8661) grad_norm 0.7341 (1.1293) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:51:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][480/625] eta 0:01:08 lr 0.001968 wd 0.0500 time 0.1965 (0.4706) data time 0.0007 (0.0558) model time 0.1958 (0.4148) loss 2.9768 (3.7705) grad_norm 1.0465 (1.1566) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:51:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][490/625] eta 0:01:08 lr 0.001968 wd 0.0500 time 0.2016 (0.5055) data time 0.0011 (0.0448) model time 0.2005 (0.4607) loss 3.2063 (3.7433) grad_norm 1.2941 (1.2166) loss_scale 8192.0000 (8192.0000) mem 8977MB 
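In each Train record, the number in parentheses appears to be a running average over the iterations logged since the start of the epoch or the most recent restart, whichever is later. That explains the first record after this resume, `[42/300][450/625]` with `time 0.1975 (1.1123)`. It shows an inflated average and a long `eta`, and both fall back toward about 0.2 s per iteration within a few dozen steps. The `eta` column is consistent with the average iteration time multiplied by the iterations left in the epoch, truncated to whole seconds.

The sketch below is a reconstruction from the logged numbers, not the training script's own code; `eta_string` is a hypothetical helper.

```python
import datetime


def eta_string(avg_batch_time: float, idx: int, num_steps: int) -> str:
    """Remaining time in the epoch: average seconds per iteration times iterations left."""
    remaining = avg_batch_time * (num_steps - idx)
    # Truncate to whole seconds and format as H:MM:SS, like the log's eta field.
    return str(datetime.timedelta(seconds=int(remaining)))


# (average time, iteration index) pairs copied from Train records in this log.
for avg, idx in [(0.2347, 130), (1.1123, 450), (0.6568, 460), (1.9077, 100)]:
    print(f"[{idx}/625] avg {avg:.4f} -> eta {eta_string(avg, idx, 625)}")
```

Because the average is dominated by the slow first iteration right after a restart, the eta printed just after a resume overstates the real remaining time until the average settles.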
[2024-07-29 16:51:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][500/625] eta 0:00:56 lr 0.001968 wd 0.0500 time 0.2076 (0.4549) data time 0.0007 (0.0375) model time 0.2069 (0.4174) loss 4.0317 (3.7342) grad_norm 1.7661 (1.2275) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:51:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][510/625] eta 0:00:48 lr 0.001968 wd 0.0500 time 0.2005 (0.4188) data time 0.0007 (0.0323) model time 0.1997 (0.3865) loss 2.8862 (3.6849) grad_norm 0.9749 (1.2481) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:51:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][520/625] eta 0:00:41 lr 0.001968 wd 0.0500 time 0.2016 (0.3918) data time 0.0011 (0.0284) model time 0.2005 (0.3635) loss 3.9839 (3.6641) grad_norm 0.8522 (1.2533) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:51:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][530/625] eta 0:00:35 lr 0.001968 wd 0.0500 time 0.1965 (0.3708) data time 0.0008 (0.0253) model time 0.1957 (0.3455) loss 4.3503 (3.6449) grad_norm 1.0016 (1.2471) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:51:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][540/625] eta 0:00:30 lr 0.001968 wd 0.0500 time 0.2028 (0.3538) data time 0.0010 (0.0229) model time 0.2018 (0.3309) loss 3.6453 (3.6480) grad_norm 0.8205 (1.2439) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:51:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][550/625] eta 0:00:25 lr 0.001968 wd 0.0500 time 0.1985 (0.3400) data time 0.0010 (0.0209) model time 0.1975 (0.3192) loss 3.4598 (3.6614) grad_norm 1.0677 (1.2553) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:51:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][560/625] eta 0:00:21 lr 0.001968 wd 0.0500 time 0.1979 (0.3288) 
data time 0.0013 (0.0192) model time 0.1966 (0.3096) loss 4.3370 (3.6647) grad_norm 0.8611 (1.2458) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:51:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][570/625] eta 0:00:17 lr 0.001967 wd 0.0500 time 0.2030 (0.3191) data time 0.0008 (0.0178) model time 0.2022 (0.3013) loss 3.5687 (3.6411) grad_norm 0.7674 (1.2195) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:51:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][580/625] eta 0:00:13 lr 0.001967 wd 0.0500 time 0.1998 (0.3109) data time 0.0007 (0.0166) model time 0.1991 (0.2943) loss 2.2862 (3.6310) grad_norm 0.9971 (1.2137) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:51:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][590/625] eta 0:00:10 lr 0.001967 wd 0.0500 time 0.2007 (0.3038) data time 0.0009 (0.0156) model time 0.1998 (0.2882) loss 3.6413 (3.6245) grad_norm 1.5676 (1.2072) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:51:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][600/625] eta 0:00:07 lr 0.001967 wd 0.0500 time 0.2100 (0.2977) data time 0.0011 (0.0147) model time 0.2089 (0.2830) loss 3.6432 (3.6216) grad_norm 2.4253 (1.2332) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:52:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][610/625] eta 0:00:04 lr 0.001967 wd 0.0500 time 0.1976 (0.2921) data time 0.0005 (0.0139) model time 0.1970 (0.2782) loss 2.5341 (3.6243) grad_norm 0.7885 (1.2389) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 16:52:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][620/625] eta 0:00:01 lr 0.001967 wd 0.0500 time 0.1987 (0.2870) data time 0.0004 (0.0131) model time 0.1983 (0.2738) loss 2.8269 (3.6065) grad_norm 1.0349 (1.2236) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 
16:52:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 42 training takes 0:00:52 [2024-07-29 16:52:03 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 16:52:05 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 16:52:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.982 (1.982) Loss 0.7871 (0.7871) Acc@1 84.424 (84.424) Acc@5 96.924 (96.924) Mem 8977MB [2024-07-29 16:52:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.233) Loss 1.3906 (1.0030) Acc@1 71.875 (79.199) Acc@5 90.137 (95.206) Mem 8977MB [2024-07-29 16:52:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.148) Loss 1.4873 (1.1998) Acc@1 67.725 (74.681) Acc@5 90.088 (92.711) Mem 8977MB [2024-07-29 16:52:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 74.252 Acc@5 92.588 [2024-07-29 16:52:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 74.3% [2024-07-29 16:52:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 2.070 (2.070) Loss 1.1797 (1.1797) Acc@1 72.998 (72.998) Acc@5 90.869 (90.869) Mem 8977MB [2024-07-29 16:52:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.242) Loss 1.7432 (1.4316) Acc@1 59.717 (66.016) Acc@5 84.570 (88.312) Mem 8977MB [2024-07-29 16:52:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.056 (0.152) Loss 2.0508 (1.6403) Acc@1 54.297 (62.291) Acc@5 79.395 (85.093) Mem 8977MB [2024-07-29 16:52:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 61.954 Acc@5 85.029 [2024-07-29 16:52:13 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 62.0% [2024-07-29 16:52:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 61.95% [2024-07-29 16:52:13 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 16:52:15 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-07-29 16:52:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][0/625] eta 0:23:24 lr 0.001967 wd 0.0500 time 2.2474 (2.2474) data time 1.9751 (1.9751) model time 0.0000 (0.0000) loss 3.7026 (3.7026) grad_norm 1.2174 (1.2174) loss_scale 8192.0000 (8192.0000) mem 8971MB [2024-07-29 16:52:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][10/625] eta 0:04:00 lr 0.001967 wd 0.0500 time 0.2499 (0.3913) data time 0.0009 (0.1804) model time 0.0000 (0.0000) loss 2.4936 (3.4968) grad_norm 0.9935 (1.2656) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:52:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][20/625] eta 0:03:02 lr 0.001967 wd 0.0500 time 0.2131 (0.3022) data time 0.0009 (0.0950) model time 0.0000 (0.0000) loss 3.1436 (3.4017) grad_norm 0.8249 (1.1411) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:52:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][30/625] eta 0:02:40 lr 0.001967 wd 0.0500 time 0.1986 (0.2694) data time 0.0010 (0.0647) model time 0.0000 (0.0000) loss 3.7269 (3.4320) grad_norm 1.1711 (1.2151) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:52:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][40/625] eta 0:03:03 lr 0.001967 wd 0.0500 time 0.2090 (0.3144) data time 0.0008 (0.0491) model time 0.0000 (0.0000) loss 3.8005 (3.4700) grad_norm 
1.7212 (1.2938) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:52:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][50/625] eta 0:02:54 lr 0.001967 wd 0.0500 time 0.2046 (0.3034) data time 0.0011 (0.0397) model time 0.0000 (0.0000) loss 3.5763 (3.4330) grad_norm 1.4612 (1.3167) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:52:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][60/625] eta 0:02:41 lr 0.001967 wd 0.0500 time 0.1977 (0.2867) data time 0.0010 (0.0333) model time 0.1967 (0.2010) loss 3.8120 (3.4610) grad_norm 1.0857 (1.2900) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:52:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][70/625] eta 0:02:32 lr 0.001967 wd 0.0500 time 0.2012 (0.2746) data time 0.0007 (0.0287) model time 0.2006 (0.2005) loss 3.3205 (3.4294) grad_norm 1.1294 (1.2830) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:52:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][80/625] eta 0:02:25 lr 0.001967 wd 0.0500 time 0.2153 (0.2662) data time 0.0006 (0.0253) model time 0.2147 (0.2020) loss 2.7151 (3.4093) grad_norm 1.5418 (1.2817) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:52:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][90/625] eta 0:02:18 lr 0.001967 wd 0.0500 time 0.1991 (0.2592) data time 0.0009 (0.0226) model time 0.1981 (0.2020) loss 3.8678 (3.4370) grad_norm 1.5628 (1.2831) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 16:52:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 16:52:40 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... 
[2024-07-29 16:52:40 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 16:54:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 16:54:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 16:56:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 16:56:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 17:12:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 17:12:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 17:12:20 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 18:36:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 18:36:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 18:36:24 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
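At this point in the run the `lr` column decays very slowly, from 0.001969 early in epoch 42 to 0.001965 during epoch 43. The sketch below assumes the config dump at the top of the log drives a per-iteration schedule. The relevant values are BASE_LR 0.002, MIN_LR 2e-5, WARMUP_LR 2e-6, WARMUP_EPOCHS 20, EPOCHS 300 and LR_SCHEDULER NAME cosine with WARMUP_PREFIX true. The assumed shape is a linear warmup followed by a cosine decay over the post-warmup iterations, with 625 iterations per epoch as shown in the Train records.

This is an illustrative reimplementation of a standard warmup-then-cosine schedule, not the scheduler class the script actually calls; `cosine_lr` is a hypothetical name.

```python
import math

# Values assumed from the config dump at the top of the log.
BASE_LR = 2.0e-3
MIN_LR = 2.0e-5
WARMUP_LR = 2.0e-6
EPOCHS = 300
WARMUP_EPOCHS = 20
ITERS_PER_EPOCH = 625  # from the "[x/625]" counters in the Train records


def cosine_lr(epoch: int, idx: int) -> float:
    """Learning rate at iteration `idx` of `epoch` (both 0-based, as in the log)."""
    step = epoch * ITERS_PER_EPOCH + idx
    warmup_steps = WARMUP_EPOCHS * ITERS_PER_EPOCH
    if step < warmup_steps:
        # Linear warmup from WARMUP_LR up to BASE_LR.
        return WARMUP_LR + (BASE_LR - WARMUP_LR) * step / warmup_steps
    # Warmup as a prefix: the cosine period covers only the post-warmup steps.
    t = step - warmup_steps
    total = (EPOCHS - WARMUP_EPOCHS) * ITERS_PER_EPOCH
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1.0 + math.cos(math.pi * t / total))


for epoch, idx in [(42, 130), (43, 160), (43, 380)]:
    print(f"[{epoch}/{EPOCHS}][{idx}/{ITERS_PER_EPOCH}] lr {cosine_lr(epoch, idx):.6f}")
```

Under these assumptions the printed values agree with the logged `lr` to six decimals. The learning rate is still about 98% of its peak here, with roughly 86% of the run remaining.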
[2024-07-29 18:36:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 18:36:34 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-29 18:36:34 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 18:36:34 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 18:36:34 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 43) [2024-07-29 18:36:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 18:36:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][100/625] eta 0:16:41 lr 0.001967 wd 0.0500 time 0.2110 (1.9077) data time 0.0007 (0.1461) model time 0.2102 (1.7616) loss 4.2541 (4.0598) grad_norm 1.4560 (1.0073) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:36:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][110/625] eta 0:06:40 lr 0.001967 wd 0.0500 time 0.2195 (0.7783) data time 0.0011 (0.0494) model time 0.2184 (0.7289) loss 4.1952 (3.8201) grad_norm 1.3243 (1.6317) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:36:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][120/625] eta 0:04:38 lr 0.001967 wd 0.0500 time 0.2074 (0.5506) data time 0.0011 (0.0301) model time 0.2064 (0.5205) loss 3.8454 (3.8566) grad_norm 1.5581 (1.4707) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:36:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][130/625] eta 0:03:44 lr 0.001967 wd 0.0500 time 0.2141 (0.4540) data time 0.0010 (0.0218) model time 0.2131 (0.4322) loss 3.4299 (3.8116) 
grad_norm 0.9073 (1.3439) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:36:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][140/625] eta 0:03:14 lr 0.001967 wd 0.0500 time 0.2142 (0.4005) data time 0.0010 (0.0172) model time 0.2132 (0.3833) loss 3.5618 (3.7484) grad_norm 1.7018 (1.3718) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:36:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][150/625] eta 0:02:53 lr 0.001967 wd 0.0500 time 0.2073 (0.3659) data time 0.0008 (0.0143) model time 0.2065 (0.3517) loss 2.6891 (3.7243) grad_norm 0.7846 (1.3665) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:37:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][160/625] eta 0:02:39 lr 0.001966 wd 0.0500 time 0.2141 (0.3425) data time 0.0009 (0.0122) model time 0.2132 (0.3303) loss 4.0979 (3.6969) grad_norm 0.9830 (1.3241) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:37:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][170/625] eta 0:02:27 lr 0.001966 wd 0.0500 time 0.1991 (0.3253) data time 0.0011 (0.0107) model time 0.1980 (0.3145) loss 2.8459 (3.6336) grad_norm 1.1882 (1.2784) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:37:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][180/625] eta 0:02:18 lr 0.001966 wd 0.0500 time 0.2121 (0.3120) data time 0.0007 (0.0096) model time 0.2114 (0.3023) loss 3.6710 (3.6239) grad_norm 1.0246 (1.2433) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:37:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][190/625] eta 0:02:11 lr 0.001966 wd 0.0500 time 0.2039 (0.3015) data time 0.0010 (0.0087) model time 0.2029 (0.2928) loss 4.0005 (3.6289) grad_norm 1.0073 (1.2173) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:37:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[43/300][200/625] eta 0:02:04 lr 0.001966 wd 0.0500 time 0.2068 (0.2930) data time 0.0012 (0.0080) model time 0.2055 (0.2850) loss 3.3990 (3.6544) grad_norm 1.5541 (1.2189) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:37:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][210/625] eta 0:01:58 lr 0.001966 wd 0.0500 time 0.2063 (0.2861) data time 0.0010 (0.0074) model time 0.2053 (0.2787) loss 2.8568 (3.6391) grad_norm 1.4679 (1.2202) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:37:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][220/625] eta 0:01:53 lr 0.001966 wd 0.0500 time 0.2191 (0.2804) data time 0.0007 (0.0069) model time 0.2184 (0.2735) loss 3.2388 (3.6354) grad_norm 0.9360 (1.2159) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:37:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][230/625] eta 0:01:48 lr 0.001966 wd 0.0500 time 0.2048 (0.2755) data time 0.0010 (0.0065) model time 0.2038 (0.2690) loss 3.3699 (3.6241) grad_norm 1.1052 (1.2118) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:37:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][240/625] eta 0:01:44 lr 0.001966 wd 0.0500 time 0.2083 (0.2712) data time 0.0011 (0.0061) model time 0.2072 (0.2651) loss 3.5911 (3.6150) grad_norm 1.8982 (1.2187) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:37:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][250/625] eta 0:01:40 lr 0.001966 wd 0.0500 time 0.2124 (0.2676) data time 0.0010 (0.0058) model time 0.2114 (0.2618) loss 3.7069 (3.6038) grad_norm 1.3224 (1.2182) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:37:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][260/625] eta 0:01:36 lr 0.001966 wd 0.0500 time 0.2071 (0.2642) data time 0.0010 (0.0055) model time 0.2061 (0.2587) loss 3.9811 (3.6024) grad_norm 1.1582 
(1.2207) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:37:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][270/625] eta 0:01:32 lr 0.001966 wd 0.0500 time 0.2205 (0.2613) data time 0.0009 (0.0053) model time 0.2196 (0.2561) loss 3.4960 (3.5877) grad_norm 1.0180 (1.2130) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:37:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][280/625] eta 0:01:29 lr 0.001966 wd 0.0500 time 0.2105 (0.2586) data time 0.0012 (0.0050) model time 0.2093 (0.2536) loss 3.7851 (3.5864) grad_norm 1.0978 (1.2049) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:37:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][290/625] eta 0:01:25 lr 0.001966 wd 0.0500 time 0.2085 (0.2564) data time 0.0010 (0.0048) model time 0.2076 (0.2515) loss 2.5764 (3.5816) grad_norm 1.1491 (1.2031) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:37:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][300/625] eta 0:01:22 lr 0.001966 wd 0.0500 time 0.2140 (0.2544) data time 0.0010 (0.0046) model time 0.2130 (0.2497) loss 2.9693 (3.5682) grad_norm 3.6805 (1.2252) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:37:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][310/625] eta 0:01:19 lr 0.001966 wd 0.0500 time 0.2110 (0.2525) data time 0.0010 (0.0045) model time 0.2100 (0.2480) loss 3.2516 (3.5617) grad_norm 1.1694 (1.2270) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:37:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][320/625] eta 0:01:16 lr 0.001966 wd 0.0500 time 0.2283 (0.2510) data time 0.0008 (0.0043) model time 0.2274 (0.2466) loss 3.6301 (3.5637) grad_norm 1.4540 (1.2290) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:37:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][330/625] eta 
0:01:13 lr 0.001966 wd 0.0500 time 0.2154 (0.2493) data time 0.0009 (0.0042) model time 0.2145 (0.2452) loss 3.9762 (3.5565) grad_norm 2.4564 (1.2351) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:37:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][340/625] eta 0:01:10 lr 0.001966 wd 0.0500 time 0.2161 (0.2479) data time 0.0010 (0.0041) model time 0.2151 (0.2439) loss 3.6645 (3.5583) grad_norm 0.9233 (1.2457) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:37:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][350/625] eta 0:01:07 lr 0.001966 wd 0.0500 time 0.2048 (0.2467) data time 0.0008 (0.0040) model time 0.2040 (0.2426) loss 3.2792 (3.5470) grad_norm 1.3313 (1.2399) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:37:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][360/625] eta 0:01:05 lr 0.001966 wd 0.0500 time 0.2196 (0.2454) data time 0.0010 (0.0039) model time 0.2186 (0.2415) loss 2.8220 (3.5382) grad_norm 1.4114 (1.2307) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:37:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][370/625] eta 0:01:02 lr 0.001966 wd 0.0500 time 0.2037 (0.2442) data time 0.0011 (0.0038) model time 0.2026 (0.2403) loss 3.7004 (3.5416) grad_norm 0.9850 (1.2438) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:37:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][380/625] eta 0:00:59 lr 0.001965 wd 0.0500 time 0.2082 (0.2430) data time 0.0011 (0.0037) model time 0.2071 (0.2393) loss 3.6701 (3.5431) grad_norm 0.7885 (1.2441) loss_scale 8192.0000 (8192.0000) mem 8977MB [2024-07-29 18:37:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][390/625] eta 0:00:56 lr 0.001965 wd 0.0500 time 0.2220 (0.2421) data time 0.0011 (0.0036) model time 0.2210 (0.2384) loss 3.6342 (3.5351) grad_norm 1.4611 (1.2392) loss_scale 
8192.0000 (8192.0000) mem 8977MB
[2024-07-29 18:37:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][400/625] eta 0:00:54 lr 0.001965 wd 0.0500 time 0.2085 (0.2411) data time 0.0009 (0.0036) model time 0.2076 (0.2376) loss 2.8931 (3.5279) grad_norm 1.4525 (1.2425) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 18:37:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][410/625] eta 0:00:51 lr 0.001965 wd 0.0500 time 0.2135 (0.2403) data time 0.0009 (0.0035) model time 0.2126 (0.2368) loss 4.2495 (3.5345) grad_norm 2.5508 (1.2524) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 18:37:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][420/625] eta 0:00:49 lr 0.001965 wd 0.0500 time 0.2092 (0.2394) data time 0.0009 (0.0034) model time 0.2082 (0.2360) loss 4.1577 (3.5441) grad_norm 1.7340 (1.2605) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 18:37:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][430/625] eta 0:00:46 lr 0.001965 wd 0.0500 time 0.2167 (0.2388) data time 0.0007 (0.0034) model time 0.2160 (0.2355) loss 4.0899 (3.5372) grad_norm 1.1102 (1.2619) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 18:38:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][440/625] eta 0:00:44 lr 0.001965 wd 0.0500 time 0.2120 (0.2382) data time 0.0008 (0.0033) model time 0.2112 (0.2350) loss 3.1335 (3.5376) grad_norm 1.1547 (1.2607) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 18:38:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][450/625] eta 0:00:41 lr 0.001965 wd 0.0500 time 0.2101 (0.2376) data time 0.0010 (0.0032) model time 0.2091 (0.2344) loss 3.2053 (3.5418) grad_norm 0.9243 (1.2598) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 18:38:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 18:38:04 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 18:38:06 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 18:56:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 18:56:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 18:56:22 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 18:56:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 18:56:33 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 18:56:33 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 18:56:33 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 18:56:33 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 43)
[2024-07-29 18:56:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 18:56:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][460/625] eta 0:02:54 lr 0.001965 wd 0.0500 time 0.2130 (1.0598) data time 0.0010 (0.1222) model time 0.2120 (0.9375) loss 3.9649 (4.0545) grad_norm 1.3579 (1.1473) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 18:56:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][470/625] eta 0:01:38 lr 0.001965 wd 0.0500 time 0.2123 (0.6352) data time 0.0009 (0.0617) model time 0.2114 (0.5735) loss 4.1324 (3.8695) grad_norm 1.6069 (1.1453) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 18:56:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][480/625] eta 0:01:11 lr 0.001965 wd 0.0500 time 0.2199 (0.4945) data time 0.0011 (0.0415) model time 0.2188 (0.4530) loss 4.1234 (3.9020) grad_norm 0.9690 (1.1078) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 18:56:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][490/625] eta 0:00:57 lr 0.001965 wd 0.0500 time 0.2059 (0.4234) data time 0.0009 (0.0314) model time 0.2050 (0.3920) loss 2.7786 (3.7737) grad_norm 1.2095 (1.1443) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 18:56:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][500/625] eta 0:00:47 lr 0.001965 wd 0.0500 time 0.2104 (0.3812) data time 0.0012 (0.0254) model time 0.2093 (0.3558) loss 3.3373 (3.7362) grad_norm 1.4703 (1.1787) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 18:56:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][510/625] eta 0:00:40 lr 0.001965 wd 0.0500 time 0.2076 (0.3531) data time 0.0007 (0.0213) model time 0.2068 (0.3318) loss 3.7369 (3.7033) grad_norm 0.8745 (1.1778) loss_scale 16384.0000 (9147.7333) mem 8977MB
[2024-07-29 18:57:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 18:57:00 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 18:57:01 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 19:14:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 19:14:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 19:14:34 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 19:14:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 19:14:46 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 19:14:46 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 19:14:46 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 19:14:46 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 43)
[2024-07-29 19:14:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 19:15:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][520/625] eta 0:02:57 lr 0.001965 wd 0.0500 time 0.2150 (1.6867) data time 0.0009 (0.1599) model time 0.2141 (1.5268) loss 4.4883 (4.1678) grad_norm 1.0700 (1.3337) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 19:15:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][530/625] eta 0:01:12 lr 0.001965 wd 0.0500 time 0.2054 (0.7653) data time 0.0011 (0.0606) model time 0.2044 (0.7046) loss 3.9256 (3.8549) grad_norm 0.7876 (1.0787) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 19:15:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][540/625] eta 0:00:46 lr 0.001965 wd 0.0500 time 0.2018 (0.5519) data time 0.0008 (0.0377) model time 0.2010 (0.5141) loss 3.6023 (3.8731) grad_norm 1.0588 (1.0517) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 19:15:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][550/625] eta 0:00:34 lr 0.001965 wd 0.0500 time 0.2023 (0.4575) data time 0.0010 (0.0275) model time 0.2013 (0.4300) loss 3.8490 (3.8269) grad_norm 0.9102 (1.0506) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 19:15:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][560/625] eta 0:00:26 lr 0.001965 wd 0.0500 time 0.2089 (0.4038) data time 0.0009 (0.0218) model time 0.2080 (0.3820) loss 3.1521 (3.7605) grad_norm 0.8855 (1.0385) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 19:15:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][570/625] eta 0:00:20 lr 0.001965 wd 0.0500 time 0.2134 (0.3698) data time 0.0007 (0.0181) model time 0.2128 (0.3517) loss 4.1896 (3.7483) grad_norm 0.9399 (1.0700) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 19:15:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][580/625] eta 0:00:15 lr 0.001965 wd 0.0500 time 0.2110 (0.3459) data time 0.0007 (0.0155) model time 0.2103 (0.3304) loss 3.0548 (3.7182) grad_norm 1.0279 (1.1074) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 19:15:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][590/625] eta 0:00:11 lr 0.001964 wd 0.0500 time 0.2092 (0.3281) data time 0.0012 (0.0136) model time 0.2080 (0.3145) loss 3.8375 (3.6744) grad_norm 1.3311 (1.1269) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 19:15:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][600/625] eta 0:00:07 lr 0.001964 wd 0.0500 time 0.2131 (0.3146) data time 0.0008 (0.0121) model time 0.2122 (0.3025) loss 3.0125 (3.6397) grad_norm 0.9465 (1.1125) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 19:15:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][610/625] eta 0:00:04 lr 0.001964 wd 0.0500 time 0.2091 (0.3047) data time 0.0005 (0.0113) model time 0.2086 (0.2934) loss 3.8520 (3.6474) grad_norm 2.0663 (1.1681) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 19:15:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][620/625] eta 0:00:01 lr 0.001964 wd 0.0500 time 0.2081 (0.2958) data time 0.0007 (0.0103) model time 0.2073 (0.2855) loss 4.1322 (3.6712) grad_norm 0.8227 (1.1694) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 19:15:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 43 training takes 0:00:32
[2024-07-29 19:15:22 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 19:15:24 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 19:15:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.505 (0.505) Loss 0.8081 (0.8081) Acc@1 85.107 (85.107) Acc@5 97.217 (97.217) Mem 8977MB
[2024-07-29 19:15:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.064 (0.105) Loss 1.3984 (1.0103) Acc@1 70.459 (79.239) Acc@5 90.918 (95.215) Mem 8977MB
[2024-07-29 19:15:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.081) Loss 1.5439 (1.2162) Acc@1 66.064 (74.616) Acc@5 88.379 (92.664) Mem 8977MB
[2024-07-29 19:15:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 74.378 Acc@5 92.638
[2024-07-29 19:15:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 74.4%
[2024-07-29 19:15:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.005 (1.005) Loss 1.0957 (1.0957) Acc@1 74.756 (74.756) Acc@5 91.846 (91.846) Mem 8977MB
[2024-07-29 19:15:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.057 (0.148) Loss 1.6611 (1.3437) Acc@1 60.742 (67.707) Acc@5 85.645 (89.378) Mem 8977MB
[2024-07-29 19:15:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.104) Loss 1.9639 (1.5518) Acc@1 55.908 (63.937) Acc@5 81.006 (86.254) Mem 8977MB
[2024-07-29 19:15:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 63.572 Acc@5 86.168
[2024-07-29 19:15:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 63.6%
[2024-07-29 19:15:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 63.57%
[2024-07-29 19:15:29 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 19:15:31 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 19:15:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][0/625] eta 0:08:25 lr 0.001964 wd 0.0500 time 0.8093 (0.8093) data time 0.5089 (0.5089) model time 0.0000 (0.0000) loss 2.9091 (2.9091) grad_norm 1.0072 (1.0072) loss_scale 16384.0000 (16384.0000) mem 8971MB
[2024-07-29 19:15:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][10/625] eta 0:02:45 lr 0.001964 wd 0.0500 time 0.2097 (0.2691) data time 0.0008 (0.0474) model time 0.0000 (0.0000) loss 2.5195 (3.5614) grad_norm 1.5098 (1.2977) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:15:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][20/625] eta 0:02:27 lr 0.001964 wd 0.0500 time 0.2076 (0.2437) data time 0.0011 (0.0253) model time 0.0000 (0.0000) loss 3.3727 (3.4940) grad_norm 2.0922 (1.3437) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:15:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][30/625] eta 0:02:18 lr 0.001964 wd 0.0500 time 0.2104 (0.2330) data time 0.0009 (0.0175) model time 0.0000 (0.0000) loss 3.9500 (3.4947) grad_norm 1.1616 (1.2195) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:15:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][40/625] eta 0:02:13 lr 0.001964 wd 0.0500 time 0.2060 (0.2275) data time 0.0011 (0.0135) model time 0.0000 (0.0000) loss 3.2418 (3.4962) grad_norm 1.4916 (1.2483) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:15:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][50/625] eta 0:02:09 lr 0.001964 wd 0.0500 time 0.2066 (0.2245) data time 0.0011 (0.0110) model time 0.0000 (0.0000) loss 3.4087 (3.5205) grad_norm 1.4968 (1.2520) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:15:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][60/625] eta 0:02:06 lr 0.001964 wd 0.0500 time 0.2196 (0.2237) data time 0.0010 (0.0094) model time 0.2186 (0.2184) loss 3.6166 (3.4962) grad_norm 1.7153 (1.2382) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:15:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][70/625] eta 0:02:03 lr 0.001964 wd 0.0500 time 0.2129 (0.2223) data time 0.0012 (0.0082) model time 0.2116 (0.2156) loss 3.7985 (3.4629) grad_norm 1.4976 (1.2438) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:15:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][80/625] eta 0:02:01 lr 0.001964 wd 0.0500 time 0.2068 (0.2224) data time 0.0010 (0.0074) model time 0.2058 (0.2178) loss 3.4168 (3.4812) grad_norm 0.9329 (1.2270) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:15:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][90/625] eta 0:01:59 lr 0.001964 wd 0.0500 time 0.2261 (0.2238) data time 0.0010 (0.0068) model time 0.2252 (0.2215) loss 3.1698 (3.4635) grad_norm 1.5721 (1.2352) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:15:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][100/625] eta 0:01:56 lr 0.001964 wd 0.0500 time 0.2157 (0.2227) data time 0.0010 (0.0062) model time 0.2146 (0.2196) loss 3.7301 (3.4663) grad_norm 0.9972 (1.2305) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:15:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][110/625] eta 0:01:54 lr 0.001964 wd 0.0500 time 0.2131 (0.2216) data time 0.0009 (0.0058) model time 0.2123 (0.2180) loss 3.5687 (3.4655) grad_norm 0.8988 (1.2180) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:15:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][120/625] eta 0:01:51 lr 0.001964 wd 0.0500 time 0.2088 (0.2209) data time 0.0009 (0.0054) model time 0.2079 (0.2170) loss 2.2684 (3.4671) grad_norm 1.0965 (1.1999) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][130/625] eta 0:01:49 lr 0.001964 wd 0.0500 time 0.2157 (0.2210) data time 0.0008 (0.0051) model time 0.2149 (0.2174) loss 3.6751 (3.4686) grad_norm 0.8553 (1.1949) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][140/625] eta 0:01:46 lr 0.001964 wd 0.0500 time 0.2165 (0.2206) data time 0.0008 (0.0048) model time 0.2157 (0.2170) loss 3.5245 (3.4580) grad_norm 1.1619 (1.1848) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][150/625] eta 0:01:44 lr 0.001964 wd 0.0500 time 0.2119 (0.2201) data time 0.0009 (0.0046) model time 0.2111 (0.2166) loss 3.7945 (3.4491) grad_norm 1.4920 (1.1958) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][160/625] eta 0:01:42 lr 0.001964 wd 0.0500 time 0.2088 (0.2203) data time 0.0008 (0.0044) model time 0.2080 (0.2171) loss 3.8162 (3.4462) grad_norm 0.9499 (1.1900) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][170/625] eta 0:01:40 lr 0.001964 wd 0.0500 time 0.2168 (0.2200) data time 0.0009 (0.0042) model time 0.2159 (0.2169) loss 3.6261 (3.4599) grad_norm 1.1503 (1.1872) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][180/625] eta 0:01:37 lr 0.001963 wd 0.0500 time 0.2024 (0.2197) data time 0.0010 (0.0040) model time 0.2014 (0.2165) loss 2.4813 (3.4550) grad_norm 1.2977 (1.1797) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][190/625] eta 0:01:35 lr 0.001963 wd 0.0500 time 0.2190 (0.2194) data time 0.0010 (0.0039) model time 0.2180 (0.2163) loss 3.2671 (3.4457) grad_norm 2.3095 (1.1967) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][200/625] eta 0:01:33 lr 0.001963 wd 0.0500 time 0.2114 (0.2193) data time 0.0012 (0.0037) model time 0.2103 (0.2163) loss 4.1732 (3.4549) grad_norm 0.9087 (1.1950) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][210/625] eta 0:01:30 lr 0.001963 wd 0.0500 time 0.2090 (0.2190) data time 0.0009 (0.0036) model time 0.2081 (0.2161) loss 4.2860 (3.4790) grad_norm 1.7208 (1.2072) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][220/625] eta 0:01:28 lr 0.001963 wd 0.0500 time 0.2668 (0.2192) data time 0.0008 (0.0035) model time 0.2660 (0.2164) loss 2.4032 (3.4802) grad_norm 1.5851 (1.2151) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][230/625] eta 0:01:27 lr 0.001963 wd 0.0500 time 0.2188 (0.2205) data time 0.0010 (0.0034) model time 0.2178 (0.2181) loss 3.9196 (3.4889) grad_norm 0.9961 (1.2182) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][240/625] eta 0:01:24 lr 0.001963 wd 0.0500 time 0.2109 (0.2201) data time 0.0008 (0.0033) model time 0.2101 (0.2178) loss 3.7318 (3.4953) grad_norm 1.2870 (1.2315) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][250/625] eta 0:01:22 lr 0.001963 wd 0.0500 time 0.2483 (0.2200) data time 0.0008 (0.0032) model time 0.2475 (0.2177) loss 2.8766 (3.4958) grad_norm 1.5002 (1.2284) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][260/625] eta 0:01:20 lr 0.001963 wd 0.0500 time 0.2079 (0.2197) data time 0.0011 (0.0031) model time 0.2068 (0.2174) loss 2.7699 (3.4927) grad_norm 1.0653 (1.2358) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][270/625] eta 0:01:17 lr 0.001963 wd 0.0500 time 0.2110 (0.2195) data time 0.0008 (0.0030) model time 0.2102 (0.2171) loss 2.6118 (3.4929) grad_norm 1.2609 (1.2401) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][280/625] eta 0:01:15 lr 0.001963 wd 0.0500 time 0.4270 (0.2200) data time 0.0008 (0.0030) model time 0.4263 (0.2179) loss 4.2128 (3.4861) grad_norm 1.0526 (1.2452) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][290/625] eta 0:01:13 lr 0.001963 wd 0.0500 time 0.2188 (0.2199) data time 0.0010 (0.0029) model time 0.2178 (0.2178) loss 3.9174 (3.4952) grad_norm 1.2119 (1.2379) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][300/625] eta 0:01:11 lr 0.001963 wd 0.0500 time 0.2073 (0.2197) data time 0.0010 (0.0028) model time 0.2064 (0.2176) loss 3.9061 (3.5058) grad_norm 1.6985 (1.2439) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][310/625] eta 0:01:09 lr 0.001963 wd 0.0500 time 0.2072 (0.2195) data time 0.0008 (0.0028) model time 0.2063 (0.2173) loss 4.3665 (3.5058) grad_norm 0.8665 (1.2414) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][320/625] eta 0:01:06 lr 0.001963 wd 0.0500 time 0.2130 (0.2193) data time 0.0007 (0.0027) model time 0.2123 (0.2172) loss 3.7546 (3.5149) grad_norm 0.9695 (1.2366) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][330/625] eta 0:01:04 lr 0.001963 wd 0.0500 time 0.2202 (0.2192) data time 0.0007 (0.0027) model time 0.2195 (0.2171) loss 4.2593 (3.5165) grad_norm 0.8801 (1.2402) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][340/625] eta 0:01:02 lr 0.001963 wd 0.0500 time 0.2157 (0.2191) data time 0.0008 (0.0026) model time 0.2149 (0.2170) loss 3.2980 (3.5182) grad_norm 0.9595 (1.2432) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][350/625] eta 0:01:00 lr 0.001963 wd 0.0500 time 0.2010 (0.2198) data time 0.0010 (0.0026) model time 0.1999 (0.2179) loss 3.5452 (3.5129) grad_norm 1.4290 (1.2453) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][360/625] eta 0:00:58 lr 0.001963 wd 0.0500 time 0.2227 (0.2198) data time 0.0012 (0.0025) model time 0.2215 (0.2179) loss 3.4949 (3.5016) grad_norm 1.1226 (1.2461) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][370/625] eta 0:00:56 lr 0.001963 wd 0.0500 time 0.2240 (0.2197) data time 0.0007 (0.0025) model time 0.2233 (0.2178) loss 3.8325 (3.5000) grad_norm 1.6111 (1.2469) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][380/625] eta 0:00:53 lr 0.001963 wd 0.0500 time 0.2190 (0.2196) data time 0.0010 (0.0025) model time 0.2180 (0.2178) loss 3.9691 (3.5071) grad_norm 0.8936 (1.2468) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:16:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 19:16:56 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 19:16:56 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 19:21:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 19:21:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 19:22:11 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 19:22:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 19:22:23 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 19:22:23 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 19:22:23 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 19:22:23 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 44)
[2024-07-29 19:22:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 19:22:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][390/625] eta 0:06:41 lr 0.001962 wd 0.0500 time 0.2124 (1.7098) data time 0.0008 (0.1250) model time 0.2116 (1.5848) loss 4.5393 (4.0997) grad_norm 1.2036 (1.1634) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:22:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][400/625] eta 0:02:53 lr 0.001962 wd 0.0500 time 0.2129 (0.7729) data time 0.0011 (0.0475) model time 0.2118 (0.7254) loss 3.9592 (3.8362) grad_norm 1.0623 (1.0857) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:22:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][410/625] eta 0:02:00 lr 0.001962 wd 0.0500 time 0.2124 (0.5584) data time 0.0008 (0.0296) model time 0.2116 (0.5287) loss 3.3608 (3.8037) grad_norm 1.5751 (1.2083) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:22:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][420/625] eta 0:01:35 lr 0.001962 wd 0.0500 time 0.2107 (0.4641) data time 0.0010 (0.0217) model time 0.2097 (0.4424) loss 3.5225 (3.7563) grad_norm 0.7323 (1.2487) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:22:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][430/625] eta 0:01:19 lr 0.001962 wd 0.0500 time 0.2115 (0.4096) data time 0.0011 (0.0172) model time 0.2104 (0.3924) loss 3.0787 (3.6987) grad_norm 1.8153 (1.2899) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:22:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][440/625] eta 0:01:09 lr 0.001962 wd 0.0500 time 0.2143 (0.3748) data time 0.0008 (0.0143) model time 0.2135 (0.3605) loss 4.4880 (3.6840) grad_norm 1.0052 (1.2501) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:22:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][450/625] eta 0:01:01 lr 0.001962 wd 0.0500 time 0.2112 (0.3506) data time 0.0008 (0.0123) model time 0.2104 (0.3383) loss 2.9187 (3.6494) grad_norm 0.9315 (1.2297) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:22:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][460/625] eta 0:00:54 lr 0.001962 wd 0.0500 time 0.2096 (0.3325) data time 0.0012 (0.0108) model time 0.2084 (0.3217) loss 3.8274 (3.6130) grad_norm 1.4172 (1.2380) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:22:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][470/625] eta 0:00:49 lr 0.001962 wd 0.0500 time 0.2146 (0.3186) data time 0.0007 (0.0097) model time 0.2138 (0.3090) loss 2.6635 (3.5874) grad_norm 1.5387 (1.2653) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:22:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][480/625] eta 0:00:44 lr 0.001962 wd 0.0500 time 0.2075 (0.3077) data time 0.0007 (0.0088) model time 0.2068 (0.2989) loss 3.7064 (3.5891) grad_norm 0.7954 (1.2907) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:22:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][490/625] eta 0:00:40 lr 0.001962 wd 0.0500 time 0.2111 (0.2988) data time 0.0009 (0.0080) model time 0.2102 (0.2908) loss 3.7114 (3.6002) grad_norm 2.3528 (1.2893) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:23:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][500/625] eta 0:00:36 lr 0.001962 wd 0.0500 time 0.2160 (0.2917) data time 0.0008 (0.0074) model time 0.2153 (0.2842) loss 3.9907 (3.5885) grad_norm 1.5670 (1.3001) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:23:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][510/625] eta 0:00:32 lr 0.001962 wd 0.0500 time 0.2132 (0.2855) data time 0.0009 (0.0069) model time 0.2123 (0.2786) loss 2.2224 (3.5812) grad_norm 1.0345 (1.2913) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:23:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][520/625] eta 0:00:29 lr 0.001962 wd 0.0500 time 0.2132 (0.2805) data time 0.0009 (0.0065) model time 0.2122 (0.2740) loss 3.2945 (3.5902) grad_norm 1.3489 (1.2814) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:23:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][530/625] eta 0:00:26 lr 0.001962 wd 0.0500 time 0.2153 (0.2759) data time 0.0008 (0.0061) model time 0.2145 (0.2698) loss 2.6907 (3.5799) grad_norm 1.8108 (1.2758) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:23:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][540/625] eta 0:00:23 lr 0.001962 wd 0.0500 time 0.2105 (0.2719) data time 0.0010 (0.0058) model time 0.2095 (0.2661) loss 3.5941 (3.5817) grad_norm 1.5382 (1.2709) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:23:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][550/625] eta 0:00:20 lr 0.001962 wd 0.0500 time 0.2163 (0.2684) data time 0.0010 (0.0055) model time 0.2153 (0.2629) loss 3.6382 (3.5827) grad_norm 1.0613 (1.2616) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:23:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][560/625] eta 0:00:17 lr 0.001962 wd 0.0500 time 0.2130 (0.2654) data time 0.0008 (0.0053) model time 0.2122 (0.2601) loss 3.5367 (3.5695) grad_norm 1.7684 (1.2749) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:23:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][570/625] eta 0:00:14 lr 0.001962 wd 0.0500 time 0.2178 (0.2627) data time 0.0010 (0.0050) model time 0.2168 (0.2577) loss 2.8653 (3.5547) grad_norm 1.2908 (1.2964) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:23:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][580/625] eta 0:00:11 lr 0.001962 wd 0.0500 time 0.2101 (0.2604) data time 0.0007 (0.0048) model time 0.2094 (0.2555) loss 3.0468 (3.5480) grad_norm 0.9673 (1.2850) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:23:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][590/625] eta 0:00:09 lr 0.001961 wd 0.0500 time 0.2101 (0.2581) data time 0.0008 (0.0047) model time 0.2093 (0.2535) loss 2.9353 (3.5360) grad_norm 0.8636 (1.2720) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:23:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][600/625] eta 0:00:06 lr 0.001961 wd 0.0500 time 0.2136 (0.2561) data time 0.0007 (0.0045) model time 0.2129 (0.2516) loss 2.5398 (3.5274) grad_norm 1.0872 (1.2679) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:23:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][610/625] eta 0:00:03 lr 0.001961 wd 0.0500 time 0.2104 (0.2542) data time 0.0005 (0.0043) model time 0.2098 (0.2499) loss 3.1990 (3.5338) grad_norm 0.8805 (1.2672) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:23:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][620/625] eta 0:00:01 lr 0.001961 wd 0.0500 time 0.2089 (0.2525) data time 0.0005 (0.0042) model time 0.2084 (0.2483) loss 4.0307 (3.5261) grad_norm 1.3942 (1.2720) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 19:23:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 44 training takes 0:01:00
[2024-07-29 19:23:28 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 19:23:30 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 19:23:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.440 (0.440) Loss 0.8105 (0.8105) Acc@1 85.059 (85.059) Acc@5 97.070 (97.070) Mem 8975MB
[2024-07-29 19:23:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 1.3652 (1.0156) Acc@1 70.605 (79.279) Acc@5 91.797 (95.539) Mem 8975MB
[2024-07-29 19:23:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.4570 (1.1961) Acc@1 68.213 (74.963) Acc@5 90.234 (93.176) Mem 8975MB
[2024-07-29 19:23:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 74.684 Acc@5 93.038
[2024-07-29 19:23:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 74.7%
[2024-07-29 19:23:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 74.68%
[2024-07-29 19:23:33 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 19:23:33 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 19:23:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.524 (0.524) Loss 1.0234 (1.0234) Acc@1 75.879 (75.879) Acc@5 92.773 (92.773) Mem 8975MB [2024-07-29 19:23:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.102) Loss 1.5918 (1.2675) Acc@1 62.207 (69.416) Acc@5 86.963 (90.443) Mem 8975MB [2024-07-29 19:23:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 1.8828 (1.4752) Acc@1 57.178 (65.458) Acc@5 81.982 (87.267) Mem 8975MB [2024-07-29 19:23:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 65.065 Acc@5 87.166 [2024-07-29 19:23:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 65.1% [2024-07-29 19:23:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 65.07% [2024-07-29 19:23:35 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 19:23:36 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-29 19:23:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][0/625] eta 0:12:48 lr 0.001961 wd 0.0500 time 1.2301 (1.2301) data time 0.4847 (0.4847) model time 0.0000 (0.0000) loss 3.5803 (3.5803) grad_norm 1.2534 (1.2534) loss_scale 16384.0000 (16384.0000) mem 8971MB [2024-07-29 19:23:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][10/625] eta 0:03:09 lr 0.001961 wd 0.0500 time 0.2147 (0.3085) data time 0.0007 (0.0468) model time 0.0000 (0.0000) loss 3.3997 (3.2527) grad_norm 0.9840 (1.0396) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:23:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][20/625] eta 0:02:39 lr 0.001961 wd 0.0500 time 0.2131 (0.2636) data time 0.0008 (0.0250) model time 0.0000 (0.0000) loss 3.6852 (3.2622) grad_norm 2.5292 (1.2503) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:23:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][30/625] eta 0:02:27 lr 0.001961 wd 0.0500 time 0.2089 (0.2482) data time 0.0009 (0.0172) model time 0.0000 (0.0000) loss 4.2110 (3.2849) grad_norm 0.9632 (1.2380) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:23:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][40/625] eta 0:02:24 lr 0.001961 wd 0.0500 time 0.2109 (0.2470) data time 0.0010 (0.0133) model time 0.0000 (0.0000) loss 3.5849 (3.3524) grad_norm 1.2087 (1.2365) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:23:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][50/625] eta 0:02:18 lr 0.001961 wd 0.0500 time 0.2132 (0.2409) data time 0.0007 (0.0109) model time 0.0000 (0.0000) loss 2.5302 (3.3426) grad_norm 0.9193 (1.1935) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:23:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][60/625] eta 0:02:13 lr 0.001961 wd 0.0500 time 0.2137 
(0.2367) data time 0.0009 (0.0093) model time 0.2128 (0.2143) loss 3.6197 (3.3215) grad_norm 0.9329 (1.1700) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:23:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][70/625] eta 0:02:09 lr 0.001961 wd 0.0500 time 0.2156 (0.2336) data time 0.0010 (0.0081) model time 0.2146 (0.2140) loss 3.7012 (3.3426) grad_norm 1.5084 (1.1798) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:23:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][80/625] eta 0:02:06 lr 0.001961 wd 0.0500 time 0.2208 (0.2319) data time 0.0007 (0.0073) model time 0.2201 (0.2154) loss 4.3679 (3.3999) grad_norm 1.0201 (1.1701) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:23:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][90/625] eta 0:02:03 lr 0.001961 wd 0.0500 time 0.2225 (0.2301) data time 0.0007 (0.0066) model time 0.2217 (0.2152) loss 2.7824 (3.4187) grad_norm 1.1517 (1.1522) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:23:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][100/625] eta 0:02:00 lr 0.001961 wd 0.0500 time 0.2160 (0.2290) data time 0.0010 (0.0061) model time 0.2150 (0.2156) loss 3.9821 (3.4408) grad_norm 1.5459 (1.1542) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][110/625] eta 0:01:57 lr 0.001961 wd 0.0500 time 0.2284 (0.2281) data time 0.0008 (0.0056) model time 0.2276 (0.2161) loss 3.7450 (3.4501) grad_norm 2.1828 (1.1862) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][120/625] eta 0:01:54 lr 0.001961 wd 0.0500 time 0.2165 (0.2269) data time 0.0008 (0.0052) model time 0.2157 (0.2157) loss 3.1109 (3.4569) grad_norm 1.2650 (1.1897) loss_scale 16384.0000 (16384.0000) mem 8978MB 
[2024-07-29 19:24:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][130/625] eta 0:01:51 lr 0.001961 wd 0.0500 time 0.2144 (0.2261) data time 0.0009 (0.0049) model time 0.2135 (0.2156) loss 2.6562 (3.4514) grad_norm 0.8983 (1.1866) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][140/625] eta 0:01:49 lr 0.001961 wd 0.0500 time 0.2140 (0.2253) data time 0.0007 (0.0046) model time 0.2133 (0.2153) loss 2.2904 (3.4503) grad_norm 1.1907 (1.2008) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][150/625] eta 0:01:46 lr 0.001961 wd 0.0500 time 0.2185 (0.2246) data time 0.0008 (0.0044) model time 0.2178 (0.2152) loss 4.0667 (3.4428) grad_norm 0.9327 (1.2113) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][160/625] eta 0:01:44 lr 0.001961 wd 0.0500 time 0.2124 (0.2240) data time 0.0009 (0.0042) model time 0.2115 (0.2152) loss 4.0794 (3.4584) grad_norm 1.2746 (1.2188) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][170/625] eta 0:01:42 lr 0.001960 wd 0.0500 time 0.2157 (0.2250) data time 0.0010 (0.0040) model time 0.2147 (0.2172) loss 3.5981 (3.4704) grad_norm 1.5755 (1.2156) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][180/625] eta 0:01:39 lr 0.001960 wd 0.0500 time 0.2119 (0.2243) data time 0.0008 (0.0038) model time 0.2111 (0.2168) loss 4.2309 (3.4737) grad_norm 1.7501 (1.2228) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][190/625] eta 0:01:37 lr 0.001960 wd 0.0500 time 
0.2222 (0.2240) data time 0.0007 (0.0037) model time 0.2215 (0.2169) loss 3.9946 (3.4910) grad_norm 1.1358 (1.2194) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][200/625] eta 0:01:35 lr 0.001960 wd 0.0500 time 0.2141 (0.2237) data time 0.0008 (0.0035) model time 0.2133 (0.2168) loss 4.6457 (3.5051) grad_norm 0.8423 (1.2081) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][210/625] eta 0:01:32 lr 0.001960 wd 0.0500 time 0.2217 (0.2233) data time 0.0008 (0.0034) model time 0.2209 (0.2167) loss 3.2383 (3.5028) grad_norm 1.5125 (1.1992) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][220/625] eta 0:01:30 lr 0.001960 wd 0.0500 time 0.2146 (0.2229) data time 0.0010 (0.0033) model time 0.2136 (0.2165) loss 3.7347 (3.4891) grad_norm 1.1197 (1.1972) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][230/625] eta 0:01:27 lr 0.001960 wd 0.0500 time 0.2198 (0.2225) data time 0.0010 (0.0032) model time 0.2188 (0.2163) loss 3.4755 (3.4736) grad_norm 0.9933 (1.1964) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][240/625] eta 0:01:25 lr 0.001960 wd 0.0500 time 0.2180 (0.2223) data time 0.0008 (0.0031) model time 0.2172 (0.2162) loss 3.7183 (3.4740) grad_norm 1.1413 (1.2064) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][250/625] eta 0:01:23 lr 0.001960 wd 0.0500 time 0.2104 (0.2220) data time 0.0010 (0.0030) model time 0.2094 (0.2162) loss 3.7445 (3.4877) grad_norm 0.9574 (1.2027) loss_scale 16384.0000 (16384.0000) 
mem 8978MB [2024-07-29 19:24:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][260/625] eta 0:01:20 lr 0.001960 wd 0.0500 time 0.2140 (0.2218) data time 0.0012 (0.0030) model time 0.2129 (0.2161) loss 3.9974 (3.4903) grad_norm 0.8011 (1.1965) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][270/625] eta 0:01:19 lr 0.001960 wd 0.0500 time 0.2101 (0.2226) data time 0.0006 (0.0029) model time 0.2095 (0.2174) loss 3.9227 (3.4953) grad_norm 1.3057 (1.1975) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][280/625] eta 0:01:16 lr 0.001960 wd 0.0500 time 0.2296 (0.2224) data time 0.0008 (0.0028) model time 0.2288 (0.2173) loss 2.8956 (3.4984) grad_norm 1.3433 (1.2001) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][290/625] eta 0:01:14 lr 0.001960 wd 0.0500 time 0.2142 (0.2224) data time 0.0010 (0.0028) model time 0.2132 (0.2174) loss 3.2961 (3.4916) grad_norm 0.9612 (1.1962) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][300/625] eta 0:01:12 lr 0.001960 wd 0.0500 time 0.2162 (0.2221) data time 0.0009 (0.0027) model time 0.2153 (0.2173) loss 3.8785 (3.4973) grad_norm 0.9347 (1.1881) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][310/625] eta 0:01:09 lr 0.001960 wd 0.0500 time 0.2125 (0.2219) data time 0.0008 (0.0026) model time 0.2117 (0.2171) loss 3.7636 (3.4988) grad_norm 0.8351 (1.1851) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][320/625] eta 0:01:07 lr 0.001960 wd 0.0500 
time 0.2233 (0.2217) data time 0.0009 (0.0026) model time 0.2223 (0.2171) loss 3.4969 (3.5077) grad_norm 2.0887 (1.1939) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][330/625] eta 0:01:05 lr 0.001960 wd 0.0500 time 0.2123 (0.2215) data time 0.0012 (0.0026) model time 0.2111 (0.2170) loss 3.6992 (3.5129) grad_norm 1.1599 (1.2097) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][340/625] eta 0:01:03 lr 0.001960 wd 0.0500 time 0.2135 (0.2213) data time 0.0009 (0.0025) model time 0.2126 (0.2168) loss 3.9462 (3.5175) grad_norm 1.2440 (1.2109) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][350/625] eta 0:01:00 lr 0.001960 wd 0.0500 time 0.2106 (0.2213) data time 0.0010 (0.0025) model time 0.2097 (0.2169) loss 4.0408 (3.5201) grad_norm 1.1247 (1.2194) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][360/625] eta 0:00:58 lr 0.001960 wd 0.0500 time 0.2198 (0.2212) data time 0.0009 (0.0024) model time 0.2189 (0.2169) loss 2.5772 (3.5176) grad_norm 1.2399 (1.2239) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:24:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][370/625] eta 0:00:56 lr 0.001959 wd 0.0500 time 0.2345 (0.2211) data time 0.0010 (0.0024) model time 0.2335 (0.2169) loss 3.4301 (3.5183) grad_norm 0.9005 (1.2216) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:25:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][380/625] eta 0:00:54 lr 0.001959 wd 0.0500 time 0.2168 (0.2211) data time 0.0008 (0.0024) model time 0.2160 (0.2170) loss 4.1371 (3.5236) grad_norm 1.1859 (1.2246) loss_scale 16384.0000 
(16384.0000) mem 8978MB [2024-07-29 19:25:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][390/625] eta 0:00:51 lr 0.001959 wd 0.0500 time 0.2126 (0.2209) data time 0.0012 (0.0023) model time 0.2115 (0.2169) loss 3.8357 (3.5234) grad_norm 0.9900 (1.2255) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:25:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][400/625] eta 0:00:49 lr 0.001959 wd 0.0500 time 0.2234 (0.2208) data time 0.0008 (0.0023) model time 0.2226 (0.2169) loss 2.2693 (3.5206) grad_norm 1.3294 (1.2261) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:25:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][410/625] eta 0:00:47 lr 0.001959 wd 0.0500 time 0.2174 (0.2208) data time 0.0009 (0.0023) model time 0.2166 (0.2169) loss 4.1012 (3.5207) grad_norm 1.0339 (1.2224) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:25:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][420/625] eta 0:00:45 lr 0.001959 wd 0.0500 time 0.2125 (0.2206) data time 0.0007 (0.0022) model time 0.2118 (0.2168) loss 2.3987 (3.5095) grad_norm 1.0206 (1.2254) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:25:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][430/625] eta 0:00:43 lr 0.001959 wd 0.0500 time 0.2298 (0.2206) data time 0.0008 (0.0022) model time 0.2290 (0.2168) loss 4.7764 (3.5164) grad_norm 1.6365 (1.2295) loss_scale 16384.0000 (16384.0000) mem 8978MB [2024-07-29 19:25:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 19:25:11 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... 
[2024-07-29 19:25:12 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 19:37:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 19:37:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 19:37:36 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 19:37:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 19:37:45 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-29 19:37:45 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 19:37:45 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 19:37:45 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 45) [2024-07-29 19:37:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 19:38:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][440/625] eta 0:03:47 lr 0.001959 wd 0.0500 time 0.2007 (1.2318) data time 0.0009 (0.2035) model time 0.1999 (1.0283) loss 3.9058 (4.0027) grad_norm 0.8752 (1.0844) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:38:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][450/625] eta 0:01:55 lr 0.001959 wd 0.0500 time 0.2019 (0.6599) data time 0.0008 (0.0911) model time 0.2011 (0.5687) loss 4.3849 (3.8558) 
grad_norm 1.2837 (1.0681) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:38:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][460/625] eta 0:01:21 lr 0.001959 wd 0.0500 time 0.1997 (0.4957) data time 0.0010 (0.0590) model time 0.1987 (0.4368) loss 3.9444 (3.8602) grad_norm 1.5038 (1.1234) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:38:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][470/625] eta 0:01:04 lr 0.001959 wd 0.0500 time 0.2050 (0.4185) data time 0.0009 (0.0437) model time 0.2042 (0.3747) loss 3.4794 (3.7802) grad_norm 1.1144 (1.1876) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:38:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][480/625] eta 0:00:54 lr 0.001959 wd 0.0500 time 0.2027 (0.3731) data time 0.0007 (0.0348) model time 0.2020 (0.3383) loss 4.0053 (3.7369) grad_norm 0.9187 (1.1518) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:38:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][490/625] eta 0:00:46 lr 0.001959 wd 0.0500 time 0.2013 (0.3438) data time 0.0007 (0.0290) model time 0.2005 (0.3147) loss 2.9494 (3.7051) grad_norm 1.0974 (1.1551) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:38:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][500/625] eta 0:00:40 lr 0.001959 wd 0.0500 time 0.1983 (0.3224) data time 0.0008 (0.0249) model time 0.1975 (0.2975) loss 3.0353 (3.6825) grad_norm 1.6037 (1.1768) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:38:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][510/625] eta 0:00:35 lr 0.001959 wd 0.0500 time 0.2027 (0.3067) data time 0.0007 (0.0218) model time 0.2020 (0.2848) loss 2.9780 (3.6441) grad_norm 0.8630 (1.1920) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:38:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [45/300][520/625] eta 0:00:30 lr 0.001959 wd 0.0500 time 0.1933 (0.2944) data time 0.0009 (0.0195) model time 0.1924 (0.2750) loss 3.6759 (3.6072) grad_norm 0.8363 (1.1762) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:38:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][530/625] eta 0:00:27 lr 0.001959 wd 0.0500 time 0.2001 (0.2850) data time 0.0007 (0.0176) model time 0.1994 (0.2674) loss 4.1134 (3.6118) grad_norm 1.5708 (1.1918) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:38:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][540/625] eta 0:00:23 lr 0.001959 wd 0.0500 time 0.2006 (0.2774) data time 0.0009 (0.0160) model time 0.1996 (0.2614) loss 2.7645 (3.6166) grad_norm 0.9885 (1.1846) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:38:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][550/625] eta 0:00:20 lr 0.001959 wd 0.0500 time 0.2016 (0.2711) data time 0.0009 (0.0148) model time 0.2008 (0.2563) loss 3.6689 (3.6160) grad_norm 0.8981 (1.1800) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:38:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][560/625] eta 0:00:17 lr 0.001959 wd 0.0500 time 0.2038 (0.2656) data time 0.0008 (0.0137) model time 0.2030 (0.2520) loss 3.9203 (3.6053) grad_norm 1.7200 (1.1981) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:38:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][570/625] eta 0:00:14 lr 0.001958 wd 0.0500 time 0.2016 (0.2615) data time 0.0011 (0.0128) model time 0.2005 (0.2488) loss 3.7160 (3.5990) grad_norm 1.6080 (1.2193) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:38:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][580/625] eta 0:00:11 lr 0.001958 wd 0.0500 time 0.2040 (0.2576) data time 0.0006 (0.0120) model time 0.2035 (0.2456) loss 3.8767 
(3.5925) grad_norm 1.8382 (1.2271) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:38:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][590/625] eta 0:00:08 lr 0.001958 wd 0.0500 time 0.2016 (0.2541) data time 0.0010 (0.0113) model time 0.2006 (0.2428) loss 2.9033 (3.5782) grad_norm 1.0725 (1.2164) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:38:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][600/625] eta 0:00:06 lr 0.001958 wd 0.0500 time 0.2017 (0.2510) data time 0.0009 (0.0107) model time 0.2008 (0.2403) loss 3.6466 (3.5859) grad_norm 1.3232 (1.2132) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:38:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][610/625] eta 0:00:03 lr 0.001958 wd 0.0500 time 0.1991 (0.2486) data time 0.0005 (0.0102) model time 0.1985 (0.2384) loss 3.0621 (3.5721) grad_norm 1.2786 (1.2052) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:38:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][620/625] eta 0:00:01 lr 0.001958 wd 0.0500 time 0.2014 (0.2460) data time 0.0003 (0.0097) model time 0.2010 (0.2363) loss 3.9458 (3.5694) grad_norm 1.9262 (1.2193) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:38:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 45 training takes 0:00:47 [2024-07-29 19:38:37 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 19:38:39 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 19:38:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.398 (0.398) Loss 0.8145 (0.8145) Acc@1 84.863 (84.863) Acc@5 96.973 (96.973) Mem 8977MB [2024-07-29 19:38:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.088) Loss 1.3281 (0.9990) Acc@1 70.166 (79.537) Acc@5 92.090 (95.521) Mem 8977MB [2024-07-29 19:38:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.072) Loss 1.4570 (1.1962) Acc@1 69.287 (74.928) Acc@5 90.137 (92.992) Mem 8977MB [2024-07-29 19:38:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 74.710 Acc@5 92.948 [2024-07-29 19:38:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 74.7% [2024-07-29 19:38:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 74.71% [2024-07-29 19:38:42 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 19:38:43 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-29 19:38:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.402 (0.402) Loss 0.9624 (0.9624) Acc@1 76.953 (76.953) Acc@5 93.311 (93.311) Mem 8977MB [2024-07-29 19:38:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.088) Loss 1.5283 (1.2020) Acc@1 64.111 (70.814) Acc@5 87.695 (91.224) Mem 8977MB [2024-07-29 19:38:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.072) Loss 1.8125 (1.4092) Acc@1 58.545 (66.778) Acc@5 82.666 (88.116) Mem 8977MB [2024-07-29 19:38:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 66.395 Acc@5 88.028 [2024-07-29 19:38:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 66.4% [2024-07-29 19:38:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 66.39% [2024-07-29 19:38:44 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 19:38:47 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-29 19:38:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][0/625] eta 0:11:22 lr 0.001958 wd 0.0500 time 1.0915 (1.0915) data time 0.8356 (0.8356) model time 0.0000 (0.0000) loss 3.6824 (3.6824) grad_norm 0.7914 (0.7914) loss_scale 16384.0000 (16384.0000) mem 8971MB [2024-07-29 19:38:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][10/625] eta 0:02:53 lr 0.001958 wd 0.0500 time 0.2015 (0.2819) data time 0.0008 (0.0769) model time 0.0000 (0.0000) loss 2.4283 (3.2952) grad_norm 0.7754 (0.9820) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:38:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][20/625] eta 0:02:27 lr 0.001958 wd 0.0500 time 0.1999 (0.2439) data time 0.0009 (0.0408) model time 0.0000 (0.0000) loss 2.3292 (3.3396) grad_norm 1.9603 (1.1508) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:38:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][30/625] eta 0:02:17 lr 0.001958 wd 0.0500 time 0.2019 (0.2306) data time 0.0008 (0.0279) model time 0.0000 (0.0000) loss 2.6197 (3.3638) grad_norm 1.6127 (1.2665) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:38:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][40/625] eta 0:02:10 lr 0.001958 wd 0.0500 time 0.2020 (0.2234) data time 0.0011 (0.0214) model time 0.0000 (0.0000) loss 3.1058 (3.3710) grad_norm 0.9204 (1.2363) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:38:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][50/625] eta 0:02:05 lr 0.001958 wd 0.0500 time 0.2004 (0.2190) data time 0.0008 (0.0174) model time 0.0000 (0.0000) loss 3.9801 (3.4261) grad_norm 0.9296 (1.1841) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:39:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][60/625] eta 0:02:02 lr 0.001958 wd 0.0500 time 0.1992 
(0.2161) data time 0.0009 (0.0147) model time 0.1982 (0.2002) loss 3.6789 (3.3905) grad_norm 1.3394 (1.1601) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:39:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][70/625] eta 0:01:59 lr 0.001958 wd 0.0500 time 0.1982 (0.2147) data time 0.0007 (0.0128) model time 0.1975 (0.2028) loss 3.4954 (3.3541) grad_norm 1.6544 (1.1565) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:39:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][80/625] eta 0:01:56 lr 0.001958 wd 0.0500 time 0.1990 (0.2131) data time 0.0007 (0.0113) model time 0.1983 (0.2021) loss 3.9817 (3.3586) grad_norm 1.2078 (1.1800) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:39:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][90/625] eta 0:01:54 lr 0.001958 wd 0.0500 time 0.4493 (0.2146) data time 0.0009 (0.0102) model time 0.4484 (0.2079) loss 3.6738 (3.3836) grad_norm 0.7683 (1.2147) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:39:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][100/625] eta 0:01:51 lr 0.001958 wd 0.0500 time 0.2020 (0.2131) data time 0.0009 (0.0093) model time 0.2011 (0.2061) loss 3.3284 (3.3778) grad_norm 1.0853 (1.2288) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:39:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][110/625] eta 0:01:49 lr 0.001958 wd 0.0500 time 0.1900 (0.2124) data time 0.0008 (0.0085) model time 0.1892 (0.2057) loss 3.2683 (3.3702) grad_norm 1.4793 (1.2467) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:39:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][120/625] eta 0:01:46 lr 0.001958 wd 0.0500 time 0.1941 (0.2115) data time 0.0007 (0.0079) model time 0.1934 (0.2050) loss 4.1405 (3.3795) grad_norm 1.6572 (1.2356) loss_scale 16384.0000 (16384.0000) mem 8975MB 
[2024-07-29 19:39:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][130/625] eta 0:01:44 lr 0.001958 wd 0.0500 time 0.2004 (0.2108) data time 0.0009 (0.0074) model time 0.1995 (0.2045) loss 4.0725 (3.4206) grad_norm 2.4039 (1.2478) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:39:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][140/625] eta 0:01:41 lr 0.001957 wd 0.0500 time 0.1986 (0.2101) data time 0.0010 (0.0069) model time 0.1976 (0.2040) loss 3.7288 (3.4250) grad_norm 1.6093 (1.2581) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:39:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 19:39:17 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 19:39:18 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 19:41:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 19:41:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 19:41:17 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 19:41:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 19:41:27 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... 
[2024-07-29 19:41:27 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 19:41:27 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 19:41:27 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 46) [2024-07-29 19:41:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 19:41:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][150/625] eta 0:14:16 lr 0.001957 wd 0.0500 time 0.2031 (1.8038) data time 0.0008 (0.1315) model time 0.2022 (1.6724) loss 4.1812 (4.0612) grad_norm 0.9173 (1.0460) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:41:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][160/625] eta 0:05:45 lr 0.001957 wd 0.0500 time 0.2091 (0.7426) data time 0.0011 (0.0446) model time 0.2080 (0.6980) loss 3.7625 (3.7716) grad_norm 1.6240 (1.1009) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:41:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][170/625] eta 0:04:00 lr 0.001957 wd 0.0500 time 0.2069 (0.5288) data time 0.0010 (0.0272) model time 0.2059 (0.5016) loss 3.5644 (3.7490) grad_norm 1.6123 (1.2241) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:41:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][180/625] eta 0:03:14 lr 0.001957 wd 0.0500 time 0.2080 (0.4377) data time 0.0012 (0.0197) model time 0.2069 (0.4180) loss 3.7622 (3.7206) grad_norm 0.9255 (1.1789) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:41:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][190/625] eta 0:02:48 lr 0.001957 wd 0.0500 time 0.2056 (0.3873) data time 0.0012 (0.0156) model time 0.2044 (0.3717) loss 3.4947 (3.6698) grad_norm 1.3628 (1.2402) loss_scale 16384.0000 (16384.0000) mem 8977MB 
[2024-07-29 19:41:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][200/625] eta 0:02:30 lr 0.001957 wd 0.0500 time 0.2061 (0.3549) data time 0.0009 (0.0129) model time 0.2052 (0.3420) loss 2.9248 (3.6543) grad_norm 1.4364 (1.2775) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:41:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][210/625] eta 0:02:17 lr 0.001957 wd 0.0500 time 0.2028 (0.3323) data time 0.0010 (0.0111) model time 0.2019 (0.3212) loss 3.8545 (3.6276) grad_norm 1.4387 (1.2845) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:41:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][220/625] eta 0:02:07 lr 0.001957 wd 0.0500 time 0.2052 (0.3155) data time 0.0010 (0.0098) model time 0.2041 (0.3058) loss 2.6549 (3.5724) grad_norm 0.8363 (1.2466) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:41:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][230/625] eta 0:01:59 lr 0.001957 wd 0.0500 time 0.2103 (0.3030) data time 0.0007 (0.0088) model time 0.2096 (0.2942) loss 3.4240 (3.5691) grad_norm 0.9151 (1.2295) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:41:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][240/625] eta 0:01:52 lr 0.001957 wd 0.0500 time 0.2067 (0.2930) data time 0.0010 (0.0080) model time 0.2057 (0.2850) loss 4.0111 (3.5661) grad_norm 1.2227 (1.2084) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:42:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][250/625] eta 0:01:46 lr 0.001957 wd 0.0500 time 0.2059 (0.2848) data time 0.0011 (0.0073) model time 0.2048 (0.2775) loss 3.0314 (3.5923) grad_norm 1.4702 (1.1929) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:42:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][260/625] eta 0:01:41 lr 0.001957 wd 0.0500 time 
0.2023 (0.2781) data time 0.0008 (0.0068) model time 0.2015 (0.2713) loss 2.8747 (3.5768) grad_norm 1.1543 (1.1936) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:42:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][270/625] eta 0:01:36 lr 0.001957 wd 0.0500 time 0.2078 (0.2724) data time 0.0008 (0.0063) model time 0.2070 (0.2661) loss 2.9546 (3.5856) grad_norm 1.3439 (1.1890) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:42:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][280/625] eta 0:01:32 lr 0.001957 wd 0.0500 time 0.2166 (0.2680) data time 0.0008 (0.0060) model time 0.2158 (0.2620) loss 3.2160 (3.5818) grad_norm 1.0767 (1.1872) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 19:42:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 19:42:10 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 19:42:12 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 19:52:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 19:52:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 19:52:42 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-07-29 19:52:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 19:52:53 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-29 19:52:53 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 19:52:53 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 19:52:53 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 46) [2024-07-29 19:52:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 19:53:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][290/625] eta 0:42:10 lr 0.001957 wd 0.0500 time 7.5537 (7.5537) data time 0.5815 (0.5815) model time 6.9722 (6.9722) loss 4.8154 (4.8154) grad_norm 1.2263 (1.2263) loss_scale 16384.0000 (16384.0000) mem 10976MB [2024-07-29 19:53:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][300/625] eta 0:04:57 lr 0.001957 wd 0.0500 time 0.1983 (0.9150) data time 0.0008 (0.0536) model time 0.1975 (0.8613) loss 3.0489 (3.9044) grad_norm 1.3685 (1.4566) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:53:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][310/625] eta 0:03:00 lr 0.001957 wd 0.0500 time 0.1957 (0.5739) data time 0.0010 (0.0285) model time 0.1948 (0.5454) loss 3.6861 (3.7460) grad_norm 1.0676 (1.3576) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:53:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][320/625] eta 0:02:18 lr 0.001957 wd 0.0500 time 0.1987 (0.4531) data time 0.0006 (0.0196) model time 0.1981 (0.4336) loss 2.6741 
(3.7476) grad_norm 0.9280 (1.2955) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:53:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][330/625] eta 0:01:56 lr 0.001956 wd 0.0500 time 0.1952 (0.3933) data time 0.0010 (0.0150) model time 0.1942 (0.3783) loss 3.5033 (3.6863) grad_norm 1.2735 (1.3276) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:53:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][340/625] eta 0:01:41 lr 0.001956 wd 0.0500 time 0.2085 (0.3558) data time 0.0006 (0.0122) model time 0.2078 (0.3435) loss 4.0529 (3.6746) grad_norm 0.9203 (1.3209) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:53:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][350/625] eta 0:01:30 lr 0.001956 wd 0.0500 time 0.1981 (0.3303) data time 0.0009 (0.0104) model time 0.1972 (0.3199) loss 3.3657 (3.6441) grad_norm 1.5780 (1.2692) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:53:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][360/625] eta 0:01:22 lr 0.001956 wd 0.0500 time 0.1966 (0.3118) data time 0.0008 (0.0090) model time 0.1958 (0.3028) loss 3.4141 (3.6243) grad_norm 1.2042 (1.3374) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:53:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][370/625] eta 0:01:16 lr 0.001956 wd 0.0500 time 0.1968 (0.2987) data time 0.0008 (0.0080) model time 0.1959 (0.2907) loss 3.1961 (3.6140) grad_norm 0.9205 (1.2971) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:53:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][380/625] eta 0:01:10 lr 0.001956 wd 0.0500 time 0.2000 (0.2882) data time 0.0007 (0.0072) model time 0.1994 (0.2809) loss 4.2621 (3.5940) grad_norm 0.9501 (1.2669) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:53:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 
367): INFO Train: [46/300][390/625] eta 0:01:05 lr 0.001956 wd 0.0500 time 0.1954 (0.2794) data time 0.0007 (0.0066) model time 0.1947 (0.2728) loss 3.6500 (3.5988) grad_norm 1.1228 (1.2481) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:53:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][400/625] eta 0:01:01 lr 0.001956 wd 0.0500 time 0.1972 (0.2723) data time 0.0009 (0.0061) model time 0.1963 (0.2662) loss 3.0397 (3.5991) grad_norm 0.9808 (1.2680) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:53:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][410/625] eta 0:00:57 lr 0.001956 wd 0.0500 time 0.1957 (0.2666) data time 0.0007 (0.0057) model time 0.1950 (0.2609) loss 2.7829 (3.6001) grad_norm 1.0786 (1.2705) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:53:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][420/625] eta 0:00:53 lr 0.001956 wd 0.0500 time 0.2007 (0.2616) data time 0.0008 (0.0054) model time 0.1999 (0.2562) loss 3.7284 (3.5844) grad_norm 0.8011 (1.2486) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:53:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][430/625] eta 0:00:50 lr 0.001956 wd 0.0500 time 0.1989 (0.2572) data time 0.0007 (0.0051) model time 0.1981 (0.2521) loss 3.9747 (3.5815) grad_norm 1.1569 (1.2349) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:53:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][440/625] eta 0:00:46 lr 0.001956 wd 0.0500 time 0.2047 (0.2537) data time 0.0009 (0.0048) model time 0.2038 (0.2489) loss 2.6299 (3.5707) grad_norm 2.2348 (1.2615) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:53:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][450/625] eta 0:00:43 lr 0.001956 wd 0.0500 time 0.2015 (0.2507) data time 0.0009 (0.0046) model time 0.2006 (0.2461) loss 
3.7547 (3.5754) grad_norm 1.1669 (1.2691) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:53:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][460/625] eta 0:00:40 lr 0.001956 wd 0.0500 time 0.2029 (0.2478) data time 0.0008 (0.0043) model time 0.2021 (0.2434) loss 3.3468 (3.5718) grad_norm 1.4830 (1.2651) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:53:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][470/625] eta 0:00:37 lr 0.001956 wd 0.0500 time 0.1977 (0.2451) data time 0.0008 (0.0041) model time 0.1969 (0.2409) loss 3.7729 (3.5633) grad_norm 1.6026 (1.2603) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 19:53:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 19:53:43 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 19:53:46 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 20:23:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 20:23:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 20:23:13 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 20:23:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 20:23:35 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... 
[2024-07-29 20:23:35 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 20:23:35 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 20:23:35 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 46) [2024-07-29 20:23:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 20:23:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][480/625] eta 0:04:32 lr 0.001956 wd 0.0500 time 0.2043 (1.8770) data time 0.0008 (0.1615) model time 0.2036 (1.7154) loss 4.0014 (3.7827) grad_norm 1.0193 (1.3706) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 20:23:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][490/625] eta 0:01:43 lr 0.001956 wd 0.0500 time 0.2137 (0.7650) data time 0.0011 (0.0546) model time 0.2126 (0.7104) loss 3.7159 (3.7127) grad_norm 1.3136 (1.4208) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 20:23:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][500/625] eta 0:01:07 lr 0.001956 wd 0.0500 time 0.2084 (0.5434) data time 0.0011 (0.0333) model time 0.2073 (0.5102) loss 4.0163 (3.7444) grad_norm 1.7604 (1.3362) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 20:23:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][510/625] eta 0:00:51 lr 0.001956 wd 0.0500 time 0.2152 (0.4482) data time 0.0011 (0.0240) model time 0.2142 (0.4242) loss 3.7620 (3.7523) grad_norm 1.0692 (1.2795) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 20:23:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][520/625] eta 0:00:41 lr 0.001955 wd 0.0500 time 0.2057 (0.3952) data time 0.0011 (0.0191) model time 0.2046 (0.3761) loss 3.7232 (3.7135) grad_norm 1.2449 (1.2599) loss_scale 16384.0000 (16384.0000) mem 8975MB 
[2024-07-29 20:24:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][530/625] eta 0:00:34 lr 0.001955 wd 0.0500 time 0.2068 (0.3619) data time 0.0008 (0.0159) model time 0.2060 (0.3460) loss 2.2937 (3.6836) grad_norm 1.3071 (1.2728) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 20:24:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][540/625] eta 0:00:28 lr 0.001955 wd 0.0500 time 0.2117 (0.3387) data time 0.0010 (0.0137) model time 0.2107 (0.3251) loss 4.0148 (3.6717) grad_norm 1.0704 (1.2787) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 20:24:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][550/625] eta 0:00:24 lr 0.001955 wd 0.0500 time 0.2127 (0.3220) data time 0.0010 (0.0120) model time 0.2117 (0.3100) loss 2.7939 (3.6288) grad_norm 1.1297 (1.2945) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 20:24:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][560/625] eta 0:00:20 lr 0.001955 wd 0.0500 time 0.2040 (0.3088) data time 0.0010 (0.0107) model time 0.2030 (0.2980) loss 3.5173 (3.6123) grad_norm 1.0584 (1.2807) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 20:24:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][570/625] eta 0:00:16 lr 0.001955 wd 0.0500 time 0.2081 (0.2983) data time 0.0012 (0.0097) model time 0.2070 (0.2886) loss 3.8114 (3.6142) grad_norm 1.3420 (1.2617) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 20:24:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][580/625] eta 0:00:13 lr 0.001955 wd 0.0500 time 0.2026 (0.2904) data time 0.0011 (0.0089) model time 0.2016 (0.2814) loss 3.4879 (3.6459) grad_norm 2.6033 (1.3018) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 20:24:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][590/625] eta 0:00:09 lr 0.001955 wd 0.0500 time 
0.2162 (0.2835) data time 0.0007 (0.0082) model time 0.2155 (0.2753) loss 2.7531 (3.6319) grad_norm 0.9999 (1.2995) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 20:24:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][600/625] eta 0:00:06 lr 0.001955 wd 0.0500 time 0.2109 (0.2778) data time 0.0008 (0.0077) model time 0.2101 (0.2702) loss 3.3662 (3.6285) grad_norm 1.6659 (1.3065) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 20:24:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][610/625] eta 0:00:04 lr 0.001955 wd 0.0500 time 0.2077 (0.2729) data time 0.0006 (0.0072) model time 0.2071 (0.2657) loss 3.5779 (3.6265) grad_norm 0.8837 (1.3067) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 20:24:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][620/625] eta 0:00:01 lr 0.001955 wd 0.0500 time 0.2078 (0.2683) data time 0.0008 (0.0068) model time 0.2071 (0.2615) loss 3.8478 (3.6132) grad_norm 0.9086 (1.3029) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 20:24:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 46 training takes 0:00:39 [2024-07-29 20:24:19 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 20:24:21 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 20:24:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.452 (0.452) Loss 0.8467 (0.8467) Acc@1 83.984 (83.984) Acc@5 96.826 (96.826) Mem 8975MB [2024-07-29 20:24:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.060 (0.097) Loss 1.3213 (0.9978) Acc@1 71.387 (79.803) Acc@5 91.895 (95.641) Mem 8975MB [2024-07-29 20:24:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.077) Loss 1.4785 (1.2015) Acc@1 68.213 (75.086) Acc@5 89.697 (92.941) Mem 8975MB [2024-07-29 20:24:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 74.872 Acc@5 92.920 [2024-07-29 20:24:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 74.9% [2024-07-29 20:24:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 74.87% [2024-07-29 20:24:24 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 20:24:25 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-29 20:24:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.617 (0.617) Loss 0.9121 (0.9121) Acc@1 78.125 (78.125) Acc@5 94.043 (94.043) Mem 8975MB [2024-07-29 20:24:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.113) Loss 1.4697 (1.1436) Acc@1 65.088 (71.942) Acc@5 88.574 (91.903) Mem 8975MB [2024-07-29 20:24:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.085) Loss 1.7490 (1.3503) Acc@1 59.912 (67.962) Acc@5 83.252 (88.800) Mem 8975MB [2024-07-29 20:24:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 67.590 Acc@5 88.738 [2024-07-29 20:24:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 67.6% [2024-07-29 20:24:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 67.59% [2024-07-29 20:24:27 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 20:24:28 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-29 20:24:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][0/625] eta 0:07:49 lr 0.001955 wd 0.0500 time 0.7513 (0.7513) data time 0.4644 (0.4644) model time 0.0000 (0.0000) loss 3.7746 (3.7746) grad_norm 1.0084 (1.0084) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:24:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][10/625] eta 0:02:40 lr 0.001955 wd 0.0500 time 0.2106 (0.2609) data time 0.0009 (0.0433) model time 0.0000 (0.0000) loss 3.7936 (3.6275) grad_norm 1.1690 (1.0314) loss_scale 32768.0000 (26810.1818) mem 8969MB [2024-07-29 20:24:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][20/625] eta 0:02:23 lr 0.001955 wd 0.0500 time 0.2048 (0.2374) data time 0.0009 (0.0232) model time 0.0000 (0.0000) loss 2.7385 (3.5692) grad_norm 1.6675 (1.1752) loss_scale 32768.0000 (29647.2381) mem 8969MB [2024-07-29 20:24:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][30/625] eta 0:02:16 lr 0.001955 wd 0.0500 time 0.2128 (0.2295) data time 0.0008 (0.0162) model time 0.0000 (0.0000) loss 3.1050 (3.4498) grad_norm 1.9418 (1.2395) loss_scale 32768.0000 (30653.9355) mem 8969MB [2024-07-29 20:24:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][40/625] eta 0:02:11 lr 0.001955 wd 0.0500 time 0.2170 (0.2247) data time 0.0008 (0.0125) model time 0.0000 (0.0000) loss 2.8696 (3.4671) grad_norm 1.3684 (1.2275) loss_scale 32768.0000 (31169.5610) mem 8969MB [2024-07-29 20:24:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][50/625] eta 0:02:07 lr 0.001955 wd 0.0500 time 0.2074 (0.2216) data time 0.0010 (0.0103) model time 0.0000 (0.0000) loss 3.8979 (3.4519) grad_norm 0.9284 (1.1970) loss_scale 32768.0000 (31482.9804) mem 8969MB [2024-07-29 20:24:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][60/625] eta 0:02:04 lr 0.001955 wd 0.0500 time 0.2136 
(0.2205) data time 0.0008 (0.0088) model time 0.2127 (0.2138) loss 3.5613 (3.4460) grad_norm 1.1074 (1.2055) loss_scale 32768.0000 (31693.6393) mem 8969MB [2024-07-29 20:24:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][70/625] eta 0:02:01 lr 0.001955 wd 0.0500 time 0.2011 (0.2197) data time 0.0012 (0.0078) model time 0.1999 (0.2134) loss 3.2043 (3.4396) grad_norm 0.8567 (1.2057) loss_scale 32768.0000 (31844.9577) mem 8969MB [2024-07-29 20:24:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][80/625] eta 0:01:59 lr 0.001954 wd 0.0500 time 0.2059 (0.2198) data time 0.0012 (0.0071) model time 0.2047 (0.2152) loss 4.0527 (3.4614) grad_norm 1.3855 (1.2236) loss_scale 32768.0000 (31958.9136) mem 8969MB [2024-07-29 20:24:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][90/625] eta 0:01:57 lr 0.001954 wd 0.0500 time 0.2140 (0.2189) data time 0.0011 (0.0064) model time 0.2129 (0.2139) loss 4.1452 (3.4482) grad_norm 1.9431 (1.2183) loss_scale 32768.0000 (32047.8242) mem 8969MB [2024-07-29 20:24:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][100/625] eta 0:01:54 lr 0.001954 wd 0.0500 time 0.2091 (0.2181) data time 0.0008 (0.0059) model time 0.2082 (0.2132) loss 2.3986 (3.4329) grad_norm 1.1668 (1.2092) loss_scale 32768.0000 (32119.1287) mem 8969MB [2024-07-29 20:24:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][110/625] eta 0:01:52 lr 0.001954 wd 0.0500 time 0.2136 (0.2178) data time 0.0011 (0.0055) model time 0.2125 (0.2131) loss 3.1606 (3.4071) grad_norm 1.6112 (1.2431) loss_scale 32768.0000 (32177.5856) mem 8969MB [2024-07-29 20:24:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][120/625] eta 0:01:51 lr 0.001954 wd 0.0500 time 0.2554 (0.2201) data time 0.0008 (0.0051) model time 0.2546 (0.2177) loss 4.0575 (3.3932) grad_norm 1.8533 (1.2499) loss_scale 32768.0000 (32226.3802) mem 8969MB 
[2024-07-29 20:24:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][130/625] eta 0:01:48 lr 0.001954 wd 0.0500 time 0.2115 (0.2197) data time 0.0012 (0.0048) model time 0.2103 (0.2172) loss 4.0814 (3.4130) grad_norm 1.7606 (1.2360) loss_scale 32768.0000 (32267.7252) mem 8969MB [2024-07-29 20:24:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][140/625] eta 0:01:46 lr 0.001954 wd 0.0500 time 0.2165 (0.2195) data time 0.0011 (0.0045) model time 0.2154 (0.2170) loss 2.7787 (3.4111) grad_norm 1.4875 (1.2310) loss_scale 32768.0000 (32303.2057) mem 8969MB [2024-07-29 20:25:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][150/625] eta 0:01:44 lr 0.001954 wd 0.0500 time 0.2391 (0.2192) data time 0.0009 (0.0043) model time 0.2383 (0.2168) loss 3.5485 (3.3939) grad_norm 1.4885 (inf) loss_scale 16384.0000 (31899.9735) mem 8969MB [2024-07-29 20:25:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][160/625] eta 0:01:41 lr 0.001954 wd 0.0500 time 0.2180 (0.2188) data time 0.0009 (0.0041) model time 0.2171 (0.2163) loss 3.5942 (3.3914) grad_norm 1.3462 (inf) loss_scale 16384.0000 (30936.2484) mem 8969MB [2024-07-29 20:25:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][170/625] eta 0:01:39 lr 0.001954 wd 0.0500 time 0.2104 (0.2186) data time 0.0009 (0.0039) model time 0.2095 (0.2160) loss 3.6617 (3.4185) grad_norm 1.8496 (inf) loss_scale 16384.0000 (30085.2398) mem 8969MB [2024-07-29 20:25:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][180/625] eta 0:01:37 lr 0.001954 wd 0.0500 time 0.2030 (0.2183) data time 0.0010 (0.0038) model time 0.2021 (0.2157) loss 4.4126 (3.4369) grad_norm 1.0318 (inf) loss_scale 16384.0000 (29328.2652) mem 8969MB [2024-07-29 20:25:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][190/625] eta 0:01:34 lr 0.001954 wd 0.0500 time 0.2098 (0.2179) 
data time 0.0010 (0.0036) model time 0.2087 (0.2153) loss 3.8510 (3.4412) grad_norm 1.0232 (inf) loss_scale 16384.0000 (28650.5550) mem 8969MB [2024-07-29 20:25:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][200/625] eta 0:01:32 lr 0.001954 wd 0.0500 time 0.2071 (0.2175) data time 0.0011 (0.0035) model time 0.2059 (0.2149) loss 2.7388 (3.4370) grad_norm 0.8420 (inf) loss_scale 16384.0000 (28040.2786) mem 8969MB [2024-07-29 20:25:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][210/625] eta 0:01:30 lr 0.001954 wd 0.0500 time 0.2141 (0.2176) data time 0.0009 (0.0034) model time 0.2132 (0.2151) loss 4.1678 (3.4444) grad_norm 2.4709 (inf) loss_scale 16384.0000 (27487.8483) mem 8969MB [2024-07-29 20:25:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][220/625] eta 0:01:27 lr 0.001954 wd 0.0500 time 0.2132 (0.2172) data time 0.0010 (0.0033) model time 0.2121 (0.2147) loss 4.1727 (3.4489) grad_norm 1.1689 (inf) loss_scale 16384.0000 (26985.4118) mem 8969MB [2024-07-29 20:25:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][230/625] eta 0:01:25 lr 0.001954 wd 0.0500 time 0.2124 (0.2169) data time 0.0007 (0.0032) model time 0.2117 (0.2144) loss 3.0003 (3.4483) grad_norm 0.9480 (inf) loss_scale 16384.0000 (26526.4762) mem 8969MB [2024-07-29 20:25:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][240/625] eta 0:01:23 lr 0.001954 wd 0.0500 time 0.2101 (0.2166) data time 0.0011 (0.0031) model time 0.2090 (0.2141) loss 2.6377 (3.4379) grad_norm 0.8794 (inf) loss_scale 16384.0000 (26105.6266) mem 8969MB [2024-07-29 20:25:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][250/625] eta 0:01:21 lr 0.001954 wd 0.0500 time 0.2097 (0.2173) data time 0.0010 (0.0030) model time 0.2087 (0.2151) loss 3.4939 (3.4487) grad_norm 1.0607 (inf) loss_scale 16384.0000 (25718.3108) mem 8969MB [2024-07-29 20:25:24 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][260/625] eta 0:01:19 lr 0.001954 wd 0.0500 time 0.2134 (0.2171) data time 0.0010 (0.0030) model time 0.2124 (0.2149) loss 3.4651 (3.4605) grad_norm 0.9002 (inf) loss_scale 16384.0000 (25360.6743) mem 8969MB [2024-07-29 20:25:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][270/625] eta 0:01:17 lr 0.001953 wd 0.0500 time 0.2061 (0.2169) data time 0.0009 (0.0029) model time 0.2053 (0.2147) loss 3.5366 (3.4567) grad_norm 1.2554 (inf) loss_scale 16384.0000 (25029.4317) mem 8969MB [2024-07-29 20:25:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][280/625] eta 0:01:14 lr 0.001953 wd 0.0500 time 0.2130 (0.2167) data time 0.0011 (0.0028) model time 0.2119 (0.2144) loss 3.8908 (3.4674) grad_norm 0.7860 (inf) loss_scale 16384.0000 (24721.7651) mem 8969MB [2024-07-29 20:25:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][290/625] eta 0:01:12 lr 0.001953 wd 0.0500 time 0.2460 (0.2165) data time 0.0013 (0.0028) model time 0.2447 (0.2143) loss 2.9018 (3.4725) grad_norm 1.0764 (inf) loss_scale 16384.0000 (24435.2440) mem 8969MB [2024-07-29 20:25:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][300/625] eta 0:01:10 lr 0.001953 wd 0.0500 time 0.2070 (0.2165) data time 0.0010 (0.0027) model time 0.2061 (0.2143) loss 3.5348 (3.4739) grad_norm 1.1805 (inf) loss_scale 16384.0000 (24167.7608) mem 8969MB [2024-07-29 20:25:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][310/625] eta 0:01:08 lr 0.001953 wd 0.0500 time 0.2135 (0.2163) data time 0.0009 (0.0027) model time 0.2126 (0.2141) loss 3.1778 (3.4622) grad_norm 1.3122 (inf) loss_scale 16384.0000 (23917.4791) mem 8969MB [2024-07-29 20:25:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][320/625] eta 0:01:05 lr 0.001953 wd 0.0500 time 0.2124 (0.2162) data time 0.0010 (0.0026) 
model time 0.2114 (0.2141) loss 3.4742 (3.4510) grad_norm 1.2910 (inf) loss_scale 16384.0000 (23682.7913) mem 8969MB [2024-07-29 20:25:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][330/625] eta 0:01:03 lr 0.001953 wd 0.0500 time 0.2029 (0.2169) data time 0.0011 (0.0026) model time 0.2017 (0.2149) loss 3.0578 (3.4476) grad_norm 0.9436 (inf) loss_scale 16384.0000 (23462.2840) mem 8969MB [2024-07-29 20:25:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][340/625] eta 0:01:01 lr 0.001953 wd 0.0500 time 0.2158 (0.2168) data time 0.0009 (0.0025) model time 0.2149 (0.2148) loss 3.2855 (3.4561) grad_norm 2.3425 (inf) loss_scale 16384.0000 (23254.7097) mem 8969MB [2024-07-29 20:25:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][350/625] eta 0:00:59 lr 0.001953 wd 0.0500 time 0.2022 (0.2167) data time 0.0014 (0.0025) model time 0.2008 (0.2147) loss 2.6805 (3.4581) grad_norm 1.2196 (inf) loss_scale 16384.0000 (23058.9630) mem 8969MB [2024-07-29 20:25:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][360/625] eta 0:00:57 lr 0.001953 wd 0.0500 time 0.2086 (0.2166) data time 0.0011 (0.0025) model time 0.2075 (0.2146) loss 3.6941 (3.4659) grad_norm 1.1943 (inf) loss_scale 16384.0000 (22874.0609) mem 8969MB [2024-07-29 20:25:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][370/625] eta 0:00:55 lr 0.001953 wd 0.0500 time 0.2103 (0.2164) data time 0.0009 (0.0024) model time 0.2094 (0.2145) loss 3.7463 (3.4696) grad_norm 1.0933 (inf) loss_scale 16384.0000 (22699.1267) mem 8969MB [2024-07-29 20:25:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][380/625] eta 0:00:53 lr 0.001953 wd 0.0500 time 0.2013 (0.2163) data time 0.0012 (0.0024) model time 0.2001 (0.2144) loss 3.7120 (3.4654) grad_norm 1.4509 (inf) loss_scale 16384.0000 (22533.3753) mem 8969MB [2024-07-29 20:25:52 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [47/300][390/625] eta 0:00:50 lr 0.001953 wd 0.0500 time 0.2060 (0.2163) data time 0.0010 (0.0024) model time 0.2050 (0.2144) loss 4.0575 (3.4661) grad_norm 0.9709 (inf) loss_scale 16384.0000 (22376.1023) mem 8969MB [2024-07-29 20:25:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][400/625] eta 0:00:48 lr 0.001953 wd 0.0500 time 0.2055 (0.2162) data time 0.0010 (0.0023) model time 0.2045 (0.2142) loss 2.7826 (3.4648) grad_norm 1.5391 (inf) loss_scale 16384.0000 (22226.6733) mem 8969MB [2024-07-29 20:25:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][410/625] eta 0:00:46 lr 0.001953 wd 0.0500 time 0.2110 (0.2161) data time 0.0012 (0.0023) model time 0.2098 (0.2141) loss 3.9377 (3.4753) grad_norm 0.9125 (inf) loss_scale 16384.0000 (22084.5158) mem 8969MB [2024-07-29 20:25:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][420/625] eta 0:00:44 lr 0.001953 wd 0.0500 time 0.2044 (0.2160) data time 0.0010 (0.0023) model time 0.2034 (0.2140) loss 3.4501 (3.4803) grad_norm 0.9906 (inf) loss_scale 16384.0000 (21949.1116) mem 8969MB [2024-07-29 20:26:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][430/625] eta 0:00:42 lr 0.001953 wd 0.0500 time 0.2057 (0.2158) data time 0.0008 (0.0022) model time 0.2049 (0.2139) loss 3.5359 (3.4847) grad_norm 0.9442 (inf) loss_scale 16384.0000 (21819.9907) mem 8969MB [2024-07-29 20:26:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][440/625] eta 0:00:39 lr 0.001953 wd 0.0500 time 0.2101 (0.2157) data time 0.0008 (0.0022) model time 0.2093 (0.2138) loss 3.8568 (3.4880) grad_norm 1.2327 (inf) loss_scale 16384.0000 (21696.7256) mem 8969MB [2024-07-29 20:26:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][450/625] eta 0:00:37 lr 0.001953 wd 0.0500 time 0.2089 (0.2157) data time 0.0014 (0.0022) model time 0.2075 (0.2138) loss 
4.0430 (3.4888) grad_norm 0.9772 (inf) loss_scale 16384.0000 (21578.9268) mem 8969MB [2024-07-29 20:26:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][460/625] eta 0:00:35 lr 0.001952 wd 0.0500 time 0.2095 (0.2156) data time 0.0010 (0.0022) model time 0.2086 (0.2137) loss 3.3345 (3.4860) grad_norm 1.1683 (inf) loss_scale 16384.0000 (21466.2386) mem 8969MB [2024-07-29 20:26:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][470/625] eta 0:00:33 lr 0.001952 wd 0.0500 time 0.2078 (0.2155) data time 0.0010 (0.0021) model time 0.2069 (0.2137) loss 3.9062 (3.4866) grad_norm 0.8603 (inf) loss_scale 16384.0000 (21358.3355) mem 8969MB [2024-07-29 20:26:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][480/625] eta 0:00:31 lr 0.001952 wd 0.0500 time 0.2119 (0.2155) data time 0.0008 (0.0021) model time 0.2111 (0.2136) loss 4.3547 (3.4889) grad_norm 0.9640 (inf) loss_scale 16384.0000 (21254.9189) mem 8969MB [2024-07-29 20:26:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][490/625] eta 0:00:29 lr 0.001952 wd 0.0500 time 0.2069 (0.2154) data time 0.0007 (0.0021) model time 0.2062 (0.2136) loss 2.1275 (3.4886) grad_norm 2.4912 (inf) loss_scale 16384.0000 (21155.7149) mem 8969MB [2024-07-29 20:26:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][500/625] eta 0:00:26 lr 0.001952 wd 0.0500 time 0.2070 (0.2153) data time 0.0011 (0.0021) model time 0.2058 (0.2135) loss 3.9005 (3.4865) grad_norm 0.8667 (inf) loss_scale 16384.0000 (21060.4711) mem 8969MB [2024-07-29 20:26:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][510/625] eta 0:00:24 lr 0.001952 wd 0.0500 time 0.2063 (0.2153) data time 0.0007 (0.0021) model time 0.2055 (0.2134) loss 3.3123 (3.4853) grad_norm 1.1834 (inf) loss_scale 16384.0000 (20968.9550) mem 8969MB [2024-07-29 20:26:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO 
Train: [47/300][520/625] eta 0:00:22 lr 0.001952 wd 0.0500 time 0.2051 (0.2152) data time 0.0008 (0.0020) model time 0.2043 (0.2134) loss 3.8302 (3.4887) grad_norm 1.0072 (inf) loss_scale 16384.0000 (20880.9520) mem 8969MB [2024-07-29 20:26:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][530/625] eta 0:00:20 lr 0.001952 wd 0.0500 time 0.2096 (0.2152) data time 0.0010 (0.0020) model time 0.2087 (0.2134) loss 3.3591 (3.4931) grad_norm 1.6950 (inf) loss_scale 16384.0000 (20796.2637) mem 8969MB [2024-07-29 20:26:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][540/625] eta 0:00:18 lr 0.001952 wd 0.0500 time 0.2054 (0.2152) data time 0.0007 (0.0020) model time 0.2047 (0.2133) loss 2.4579 (3.4918) grad_norm 1.2265 (inf) loss_scale 16384.0000 (20714.7061) mem 8969MB [2024-07-29 20:26:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][550/625] eta 0:00:16 lr 0.001952 wd 0.0500 time 0.2104 (0.2151) data time 0.0008 (0.0020) model time 0.2096 (0.2133) loss 2.7197 (3.4869) grad_norm 1.1279 (inf) loss_scale 16384.0000 (20636.1089) mem 8969MB [2024-07-29 20:26:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][560/625] eta 0:00:13 lr 0.001952 wd 0.0500 time 0.2120 (0.2151) data time 0.0010 (0.0020) model time 0.2110 (0.2133) loss 3.8565 (3.4867) grad_norm 1.2271 (inf) loss_scale 16384.0000 (20560.3137) mem 8969MB [2024-07-29 20:26:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][570/625] eta 0:00:11 lr 0.001952 wd 0.0500 time 0.2056 (0.2151) data time 0.0010 (0.0020) model time 0.2046 (0.2133) loss 2.7324 (3.4817) grad_norm 1.2957 (inf) loss_scale 16384.0000 (20487.1734) mem 8969MB [2024-07-29 20:26:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][580/625] eta 0:00:09 lr 0.001952 wd 0.0500 time 0.2062 (0.2151) data time 0.0012 (0.0019) model time 0.2050 (0.2133) loss 3.7476 (3.4814) grad_norm 
0.9131 (inf) loss_scale 16384.0000 (20416.5508) mem 8969MB [2024-07-29 20:26:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][590/625] eta 0:00:07 lr 0.001952 wd 0.0500 time 0.2087 (0.2150) data time 0.0010 (0.0019) model time 0.2077 (0.2133) loss 4.1792 (3.4886) grad_norm 1.1195 (inf) loss_scale 16384.0000 (20348.3181) mem 8969MB [2024-07-29 20:26:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][600/625] eta 0:00:05 lr 0.001952 wd 0.0500 time 0.2111 (0.2153) data time 0.0009 (0.0019) model time 0.2102 (0.2135) loss 3.1805 (3.4883) grad_norm 1.1921 (inf) loss_scale 16384.0000 (20282.3561) mem 8969MB [2024-07-29 20:26:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][610/625] eta 0:00:03 lr 0.001952 wd 0.0500 time 0.2057 (0.2152) data time 0.0008 (0.0019) model time 0.2049 (0.2135) loss 3.8789 (3.4888) grad_norm 0.9469 (inf) loss_scale 16384.0000 (20218.5532) mem 8969MB [2024-07-29 20:26:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][620/625] eta 0:00:01 lr 0.001952 wd 0.0500 time 0.2058 (0.2151) data time 0.0007 (0.0019) model time 0.2051 (0.2134) loss 3.8387 (3.4898) grad_norm 1.2242 (inf) loss_scale 16384.0000 (20156.8052) mem 8969MB [2024-07-29 20:26:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 47 training takes 0:02:14 [2024-07-29 20:26:42 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 20:26:43 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 20:26:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.558 (0.558) Loss 0.7866 (0.7866) Acc@1 85.205 (85.205) Acc@5 97.412 (97.412) Mem 8969MB
[2024-07-29 20:26:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.060 (0.106) Loss 1.3740 (1.0255) Acc@1 71.875 (79.807) Acc@5 90.771 (95.397) Mem 8969MB
[2024-07-29 20:26:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.082) Loss 1.4990 (1.2194) Acc@1 67.871 (75.246) Acc@5 89.307 (92.936) Mem 8969MB
[2024-07-29 20:26:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 75.024 Acc@5 92.904
[2024-07-29 20:26:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 75.0%
[2024-07-29 20:26:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 75.02%
[2024-07-29 20:26:45 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 20:26:45 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 20:26:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.461 (0.461) Loss 0.8643 (0.8643) Acc@1 79.150 (79.150) Acc@5 94.873 (94.873) Mem 8969MB
[2024-07-29 20:26:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.099) Loss 1.4189 (1.0924) Acc@1 65.869 (73.113) Acc@5 89.355 (92.565) Mem 8969MB
[2024-07-29 20:26:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.6904 (1.2979) Acc@1 61.182 (69.157) Acc@5 84.375 (89.476) Mem 8969MB
[2024-07-29 20:26:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 68.812 Acc@5 89.407
[2024-07-29 20:26:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 68.8%
[2024-07-29 20:26:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 68.81%
[2024-07-29 20:26:47 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 20:26:48 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 20:26:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][0/625] eta 0:08:33 lr 0.001952 wd 0.0500 time 0.8219 (0.8219) data time 0.6117 (0.6117) model time 0.0000 (0.0000) loss 2.6511 (2.6511) grad_norm 1.5819 (1.5819) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:26:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][10/625] eta 0:02:44 lr 0.001951 wd 0.0500 time 0.2228 (0.2673) data time 0.0010 (0.0566) model time 0.0000 (0.0000) loss 2.4340 (3.2583) grad_norm 1.6103 (1.2340) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:26:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][20/625] eta 0:02:25 lr 0.001951 wd 0.0500 time 0.2044 (0.2404) data time 0.0009 (0.0302) model time 0.0000 (0.0000) loss 3.1982 (3.4108) grad_norm 1.1022 (1.1924) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:26:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][30/625] eta 0:02:17 lr 0.001951 wd 0.0500 time 0.2110 (0.2315) data time 0.0009 (0.0208) model time 0.0000 (0.0000) loss 4.3014 (3.4037) grad_norm 0.8268 (1.0985) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:26:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][40/625] eta 0:02:12 lr 0.001951 wd 0.0500 time 0.2087 (0.2264) data time 0.0009 (0.0161) model time 0.0000 (0.0000) loss 4.0696 (3.3407) grad_norm 0.8444 (1.0884) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:26:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][50/625] eta 0:02:08 lr 0.001951 wd 0.0500 time 0.2502 (0.2239) data time 0.0011 (0.0131) model time 0.0000 (0.0000) loss 2.3571 (3.3688) grad_norm 1.4366 (1.1053) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][60/625] eta 0:02:07 lr 0.001951 wd 0.0500 time 0.2093 
(0.2260) data time 0.0010 (0.0112) model time 0.2082 (0.2353) loss 2.7837 (3.3451) grad_norm 1.7259 (1.1672) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][70/625] eta 0:02:04 lr 0.001951 wd 0.0500 time 0.2070 (0.2238) data time 0.0008 (0.0098) model time 0.2062 (0.2222) loss 2.2923 (3.3570) grad_norm 1.7234 (1.1850) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][80/625] eta 0:02:01 lr 0.001951 wd 0.0500 time 0.2181 (0.2221) data time 0.0008 (0.0087) model time 0.2173 (0.2177) loss 3.8398 (3.3901) grad_norm 1.0993 (1.1908) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][90/625] eta 0:01:58 lr 0.001951 wd 0.0500 time 0.2256 (0.2211) data time 0.0012 (0.0079) model time 0.2244 (0.2163) loss 3.3077 (3.3939) grad_norm 1.5816 (1.1912) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][100/625] eta 0:01:55 lr 0.001951 wd 0.0500 time 0.2172 (0.2203) data time 0.0007 (0.0072) model time 0.2165 (0.2155) loss 2.9405 (3.4076) grad_norm 1.5183 (1.1913) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][110/625] eta 0:01:53 lr 0.001951 wd 0.0500 time 0.2100 (0.2198) data time 0.0008 (0.0067) model time 0.2092 (0.2151) loss 2.8068 (3.3925) grad_norm 0.9775 (1.1810) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][120/625] eta 0:01:50 lr 0.001951 wd 0.0500 time 0.2195 (0.2193) data time 0.0010 (0.0062) model time 0.2186 (0.2147) loss 3.6834 (3.3963) grad_norm 1.0158 (1.1670) loss_scale 16384.0000 (16384.0000) mem 8969MB 
[2024-07-29 20:27:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][130/625] eta 0:01:48 lr 0.001951 wd 0.0500 time 0.2139 (0.2188) data time 0.0008 (0.0058) model time 0.2132 (0.2144) loss 3.6585 (3.4077) grad_norm 0.8315 (1.1743) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][140/625] eta 0:01:46 lr 0.001951 wd 0.0500 time 0.2078 (0.2189) data time 0.0010 (0.0055) model time 0.2068 (0.2150) loss 4.3521 (3.4020) grad_norm 1.3684 (1.1831) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][150/625] eta 0:01:43 lr 0.001951 wd 0.0500 time 0.2181 (0.2184) data time 0.0009 (0.0052) model time 0.2172 (0.2145) loss 4.2184 (3.4330) grad_norm 0.9108 (1.1754) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][160/625] eta 0:01:41 lr 0.001951 wd 0.0500 time 0.2348 (0.2182) data time 0.0007 (0.0049) model time 0.2341 (0.2145) loss 2.4303 (3.4329) grad_norm 0.7900 (1.1694) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][170/625] eta 0:01:39 lr 0.001951 wd 0.0500 time 0.2133 (0.2178) data time 0.0008 (0.0047) model time 0.2126 (0.2140) loss 4.1240 (3.4317) grad_norm 1.7992 (1.1752) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][180/625] eta 0:01:36 lr 0.001951 wd 0.0500 time 0.2074 (0.2174) data time 0.0012 (0.0045) model time 0.2061 (0.2136) loss 3.8465 (3.4189) grad_norm 1.0035 (1.1775) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][190/625] eta 0:01:34 lr 0.001950 wd 0.0500 time 
0.2143 (0.2171) data time 0.0010 (0.0043) model time 0.2132 (0.2135) loss 3.3109 (3.4231) grad_norm 1.5663 (1.1859) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][200/625] eta 0:01:32 lr 0.001950 wd 0.0500 time 0.2128 (0.2171) data time 0.0007 (0.0042) model time 0.2120 (0.2137) loss 3.9923 (3.4403) grad_norm 1.2882 (1.1979) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][210/625] eta 0:01:30 lr 0.001950 wd 0.0500 time 0.2196 (0.2169) data time 0.0012 (0.0040) model time 0.2184 (0.2135) loss 3.7356 (3.4409) grad_norm 1.8845 (1.1981) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][220/625] eta 0:01:27 lr 0.001950 wd 0.0500 time 0.2245 (0.2167) data time 0.0008 (0.0040) model time 0.2237 (0.2134) loss 4.2850 (3.4422) grad_norm 1.1468 (1.1966) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][230/625] eta 0:01:25 lr 0.001950 wd 0.0500 time 0.2154 (0.2166) data time 0.0009 (0.0039) model time 0.2145 (0.2133) loss 2.3681 (3.4378) grad_norm 1.8341 (1.1927) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][240/625] eta 0:01:23 lr 0.001950 wd 0.0500 time 0.2055 (0.2163) data time 0.0007 (0.0038) model time 0.2048 (0.2130) loss 3.9362 (3.4574) grad_norm 1.3123 (1.1903) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][250/625] eta 0:01:21 lr 0.001950 wd 0.0500 time 0.2223 (0.2166) data time 0.0010 (0.0037) model time 0.2212 (0.2134) loss 2.9170 (3.4428) grad_norm 1.1041 (1.1976) loss_scale 16384.0000 (16384.0000) 
mem 8969MB [2024-07-29 20:27:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][260/625] eta 0:01:18 lr 0.001950 wd 0.0500 time 0.2020 (0.2163) data time 0.0009 (0.0036) model time 0.2011 (0.2131) loss 4.3135 (3.4441) grad_norm 1.1986 (1.1932) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][270/625] eta 0:01:16 lr 0.001950 wd 0.0500 time 0.2257 (0.2162) data time 0.0010 (0.0035) model time 0.2247 (0.2131) loss 3.0977 (3.4550) grad_norm 0.9558 (1.1975) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][280/625] eta 0:01:14 lr 0.001950 wd 0.0500 time 0.2139 (0.2161) data time 0.0008 (0.0035) model time 0.2131 (0.2131) loss 4.2184 (3.4608) grad_norm 0.8089 (1.1975) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][290/625] eta 0:01:12 lr 0.001950 wd 0.0500 time 0.2135 (0.2161) data time 0.0010 (0.0034) model time 0.2125 (0.2131) loss 4.0765 (3.4684) grad_norm 0.9722 (1.1981) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][300/625] eta 0:01:10 lr 0.001950 wd 0.0500 time 0.2027 (0.2159) data time 0.0011 (0.0033) model time 0.2017 (0.2129) loss 3.5003 (3.4643) grad_norm 0.9589 (1.2060) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][310/625] eta 0:01:07 lr 0.001950 wd 0.0500 time 0.2252 (0.2159) data time 0.0007 (0.0032) model time 0.2244 (0.2130) loss 2.4149 (3.4635) grad_norm 1.7500 (1.2137) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][320/625] eta 0:01:05 lr 0.001950 wd 0.0500 
time 0.2046 (0.2157) data time 0.0008 (0.0032) model time 0.2038 (0.2128) loss 3.0381 (3.4582) grad_norm 1.1356 (1.2064) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:27:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][330/625] eta 0:01:03 lr 0.001950 wd 0.0500 time 0.2065 (0.2155) data time 0.0008 (0.0031) model time 0.2057 (0.2126) loss 3.3857 (3.4603) grad_norm 1.0180 (1.2084) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][340/625] eta 0:01:01 lr 0.001950 wd 0.0500 time 0.2072 (0.2154) data time 0.0010 (0.0030) model time 0.2062 (0.2126) loss 4.3442 (3.4695) grad_norm 1.1444 (1.2101) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][350/625] eta 0:00:59 lr 0.001950 wd 0.0500 time 0.2241 (0.2153) data time 0.0008 (0.0030) model time 0.2233 (0.2126) loss 4.0257 (3.4702) grad_norm 0.8631 (1.2114) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][360/625] eta 0:00:57 lr 0.001950 wd 0.0500 time 0.2112 (0.2154) data time 0.0010 (0.0029) model time 0.2102 (0.2128) loss 2.9042 (3.4662) grad_norm 1.2261 (1.2111) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][370/625] eta 0:00:54 lr 0.001949 wd 0.0500 time 0.2077 (0.2154) data time 0.0010 (0.0029) model time 0.2067 (0.2127) loss 3.2638 (3.4716) grad_norm 1.0686 (1.2086) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][380/625] eta 0:00:52 lr 0.001949 wd 0.0500 time 0.2127 (0.2153) data time 0.0010 (0.0029) model time 0.2117 (0.2127) loss 3.6588 (3.4683) grad_norm 3.0127 (1.2136) loss_scale 16384.0000 
(16384.0000) mem 8969MB [2024-07-29 20:28:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][390/625] eta 0:00:50 lr 0.001949 wd 0.0500 time 0.2072 (0.2152) data time 0.0010 (0.0028) model time 0.2061 (0.2126) loss 3.2895 (3.4617) grad_norm 1.7751 (1.2193) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][400/625] eta 0:00:48 lr 0.001949 wd 0.0500 time 0.2067 (0.2152) data time 0.0008 (0.0028) model time 0.2058 (0.2126) loss 3.8351 (3.4709) grad_norm 1.1479 (1.2250) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][410/625] eta 0:00:46 lr 0.001949 wd 0.0500 time 0.2117 (0.2156) data time 0.0008 (0.0028) model time 0.2109 (0.2131) loss 4.0833 (3.4805) grad_norm 1.4305 (1.2243) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][420/625] eta 0:00:44 lr 0.001949 wd 0.0500 time 0.2190 (0.2156) data time 0.0009 (0.0028) model time 0.2181 (0.2131) loss 3.1131 (3.4794) grad_norm 1.1488 (1.2304) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][430/625] eta 0:00:42 lr 0.001949 wd 0.0500 time 0.2136 (0.2157) data time 0.0009 (0.0028) model time 0.2127 (0.2132) loss 4.2570 (3.4833) grad_norm 0.8465 (1.2278) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][440/625] eta 0:00:39 lr 0.001949 wd 0.0500 time 0.2179 (0.2157) data time 0.0008 (0.0027) model time 0.2171 (0.2133) loss 4.4337 (3.4881) grad_norm 1.2048 (1.2258) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][450/625] eta 0:00:37 lr 
0.001949 wd 0.0500 time 0.2057 (0.2157) data time 0.0011 (0.0027) model time 0.2046 (0.2132) loss 3.9628 (3.4854) grad_norm 1.1135 (1.2227) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][460/625] eta 0:00:35 lr 0.001949 wd 0.0500 time 0.2183 (0.2156) data time 0.0009 (0.0027) model time 0.2173 (0.2133) loss 4.0350 (3.4827) grad_norm 0.9644 (1.2181) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][470/625] eta 0:00:33 lr 0.001949 wd 0.0500 time 0.2046 (0.2156) data time 0.0010 (0.0026) model time 0.2036 (0.2132) loss 3.0571 (3.4848) grad_norm 1.3223 (1.2157) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][480/625] eta 0:00:31 lr 0.001949 wd 0.0500 time 0.2088 (0.2160) data time 0.0010 (0.0026) model time 0.2078 (0.2138) loss 3.5843 (3.4809) grad_norm 1.1034 (1.2220) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][490/625] eta 0:00:29 lr 0.001949 wd 0.0500 time 0.2084 (0.2161) data time 0.0008 (0.0026) model time 0.2076 (0.2138) loss 3.9115 (3.4855) grad_norm 1.7143 (1.2257) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][500/625] eta 0:00:26 lr 0.001949 wd 0.0500 time 0.2117 (0.2160) data time 0.0008 (0.0026) model time 0.2109 (0.2137) loss 3.7933 (3.4868) grad_norm 1.0702 (1.2223) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][510/625] eta 0:00:24 lr 0.001949 wd 0.0500 time 0.2049 (0.2160) data time 0.0012 (0.0025) model time 0.2038 (0.2137) loss 3.6498 (3.4926) grad_norm 1.5446 (1.2231) loss_scale 
16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][520/625] eta 0:00:22 lr 0.001949 wd 0.0500 time 0.2090 (0.2159) data time 0.0008 (0.0025) model time 0.2082 (0.2136) loss 4.0634 (3.4976) grad_norm 1.3420 (1.2239) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][530/625] eta 0:00:20 lr 0.001949 wd 0.0500 time 0.2110 (0.2158) data time 0.0010 (0.0025) model time 0.2100 (0.2135) loss 2.3679 (3.4960) grad_norm 1.1264 (1.2226) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][540/625] eta 0:00:18 lr 0.001949 wd 0.0500 time 0.2110 (0.2157) data time 0.0008 (0.0025) model time 0.2102 (0.2135) loss 2.9273 (3.4904) grad_norm 1.1769 (1.2200) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][550/625] eta 0:00:16 lr 0.001948 wd 0.0500 time 0.2187 (0.2157) data time 0.0009 (0.0024) model time 0.2178 (0.2135) loss 4.3083 (3.4862) grad_norm 1.5971 (1.2197) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][560/625] eta 0:00:14 lr 0.001948 wd 0.0500 time 0.2114 (0.2158) data time 0.0010 (0.0024) model time 0.2103 (0.2137) loss 2.1874 (3.4863) grad_norm 1.1943 (1.2211) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][570/625] eta 0:00:11 lr 0.001948 wd 0.0500 time 0.2126 (0.2158) data time 0.0009 (0.0024) model time 0.2117 (0.2137) loss 3.6840 (3.4885) grad_norm 0.9169 (1.2208) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][580/625] eta 
0:00:09 lr 0.001948 wd 0.0500 time 0.2121 (0.2157) data time 0.0009 (0.0024) model time 0.2112 (0.2136) loss 3.0044 (3.4918) grad_norm 1.2871 (1.2181) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][590/625] eta 0:00:07 lr 0.001948 wd 0.0500 time 0.2119 (0.2156) data time 0.0010 (0.0024) model time 0.2110 (0.2135) loss 3.6930 (3.4968) grad_norm 1.4055 (1.2214) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:28:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][600/625] eta 0:00:05 lr 0.001948 wd 0.0500 time 0.2058 (0.2156) data time 0.0008 (0.0023) model time 0.2050 (0.2135) loss 3.9444 (3.5019) grad_norm 1.3282 (1.2234) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][610/625] eta 0:00:03 lr 0.001948 wd 0.0500 time 0.2085 (0.2155) data time 0.0005 (0.0023) model time 0.2080 (0.2134) loss 4.3654 (3.5015) grad_norm 1.2962 (1.2244) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][620/625] eta 0:00:01 lr 0.001948 wd 0.0500 time 0.2059 (0.2155) data time 0.0004 (0.0023) model time 0.2055 (0.2134) loss 4.0943 (3.5025) grad_norm 1.4625 (1.2224) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 48 training takes 0:02:14 [2024-07-29 20:29:03 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 20:29:04 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 20:29:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.506 (0.506) Loss 0.7461 (0.7461) Acc@1 85.596 (85.596) Acc@5 97.070 (97.070) Mem 8969MB
[2024-07-29 20:29:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.102) Loss 1.2764 (0.9209) Acc@1 72.119 (80.189) Acc@5 91.406 (95.663) Mem 8969MB
[2024-07-29 20:29:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 1.4492 (1.1098) Acc@1 66.943 (75.688) Acc@5 89.062 (93.211) Mem 8969MB
[2024-07-29 20:29:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 75.410 Acc@5 93.176
[2024-07-29 20:29:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 75.4%
[2024-07-29 20:29:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 75.41%
[2024-07-29 20:29:06 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 20:29:06 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 20:29:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.587 (0.587) Loss 0.8228 (0.8228) Acc@1 79.980 (79.980) Acc@5 95.361 (95.361) Mem 8969MB
[2024-07-29 20:29:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.109) Loss 1.3730 (1.0471) Acc@1 66.797 (74.134) Acc@5 89.795 (93.066) Mem 8969MB
[2024-07-29 20:29:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.083) Loss 1.6357 (1.2511) Acc@1 61.963 (70.085) Acc@5 84.961 (90.060) Mem 8969MB
[2024-07-29 20:29:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 69.746 Acc@5 89.987
[2024-07-29 20:29:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 69.7%
[2024-07-29 20:29:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 69.75%
[2024-07-29 20:29:08 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 20:29:09 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 20:29:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][0/625] eta 0:06:56 lr 0.001948 wd 0.0500 time 0.6669 (0.6669) data time 0.4684 (0.4684) model time 0.0000 (0.0000) loss 3.7344 (3.7344) grad_norm 1.1883 (1.1883) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][10/625] eta 0:02:47 lr 0.001948 wd 0.0500 time 0.2235 (0.2728) data time 0.0010 (0.0436) model time 0.0000 (0.0000) loss 3.8010 (3.6933) grad_norm 1.6473 (1.2351) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][20/625] eta 0:02:27 lr 0.001948 wd 0.0500 time 0.2109 (0.2436) data time 0.0011 (0.0234) model time 0.0000 (0.0000) loss 3.8837 (3.6166) grad_norm 0.9865 (1.2328) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][30/625] eta 0:02:19 lr 0.001948 wd 0.0500 time 0.2084 (0.2337) data time 0.0011 (0.0162) model time 0.0000 (0.0000) loss 4.1300 (3.5877) grad_norm 1.2132 (1.2193) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][40/625] eta 0:02:14 lr 0.001948 wd 0.0500 time 0.2186 (0.2302) data time 0.0010 (0.0127) model time 0.0000 (0.0000) loss 3.9165 (3.5733) grad_norm 1.2102 (1.1907) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][50/625] eta 0:02:10 lr 0.001948 wd 0.0500 time 0.2144 (0.2264) data time 0.0009 (0.0105) model time 0.0000 (0.0000) loss 3.2142 (3.5433) grad_norm 1.2867 (1.2028) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][60/625] eta 0:02:06 lr 0.001948 wd 0.0500 time 0.2147 
(0.2245) data time 0.0009 (0.0090) model time 0.2138 (0.2135) loss 4.1593 (3.5919) grad_norm 1.3475 (1.2330) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][70/625] eta 0:02:03 lr 0.001948 wd 0.0500 time 0.2101 (0.2233) data time 0.0010 (0.0079) model time 0.2092 (0.2143) loss 3.6442 (3.5786) grad_norm 1.7270 (1.2791) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][80/625] eta 0:02:01 lr 0.001948 wd 0.0500 time 0.2065 (0.2224) data time 0.0008 (0.0070) model time 0.2056 (0.2144) loss 2.7040 (3.5008) grad_norm 0.9021 (1.2586) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][90/625] eta 0:01:58 lr 0.001948 wd 0.0500 time 0.2254 (0.2214) data time 0.0011 (0.0064) model time 0.2242 (0.2140) loss 2.2884 (3.4489) grad_norm 1.2439 (1.2465) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][100/625] eta 0:01:55 lr 0.001947 wd 0.0500 time 0.2063 (0.2203) data time 0.0014 (0.0059) model time 0.2049 (0.2130) loss 4.1281 (3.4415) grad_norm 1.3212 (1.2681) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][110/625] eta 0:01:53 lr 0.001947 wd 0.0500 time 0.2093 (0.2196) data time 0.0011 (0.0054) model time 0.2082 (0.2127) loss 2.6445 (3.4510) grad_norm 2.2122 (1.2972) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][120/625] eta 0:01:50 lr 0.001947 wd 0.0500 time 0.2234 (0.2190) data time 0.0012 (0.0051) model time 0.2223 (0.2124) loss 3.5298 (3.4702) grad_norm 1.0084 (1.2868) loss_scale 16384.0000 (16384.0000) mem 8969MB 
[2024-07-29 20:29:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][130/625] eta 0:01:48 lr 0.001947 wd 0.0500 time 0.2111 (0.2184) data time 0.0012 (0.0048) model time 0.2099 (0.2122) loss 3.3175 (3.4786) grad_norm 1.0741 (1.2787) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][140/625] eta 0:01:45 lr 0.001947 wd 0.0500 time 0.2034 (0.2182) data time 0.0010 (0.0045) model time 0.2024 (0.2123) loss 4.3056 (3.4819) grad_norm 1.8083 (1.2806) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][150/625] eta 0:01:43 lr 0.001947 wd 0.0500 time 0.2118 (0.2176) data time 0.0009 (0.0043) model time 0.2109 (0.2119) loss 3.2294 (3.4599) grad_norm 1.2701 (1.2911) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][160/625] eta 0:01:40 lr 0.001947 wd 0.0500 time 0.2099 (0.2172) data time 0.0011 (0.0041) model time 0.2088 (0.2117) loss 2.7296 (3.4700) grad_norm 1.3521 (1.3022) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][170/625] eta 0:01:38 lr 0.001947 wd 0.0500 time 0.2112 (0.2171) data time 0.0009 (0.0039) model time 0.2103 (0.2119) loss 2.7657 (3.4456) grad_norm 0.9559 (1.2870) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][180/625] eta 0:01:36 lr 0.001947 wd 0.0500 time 0.2177 (0.2169) data time 0.0009 (0.0038) model time 0.2168 (0.2120) loss 3.1079 (3.4473) grad_norm 1.1886 (1.2775) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][190/625] eta 0:01:34 lr 0.001947 wd 0.0500 time 
0.2073 (0.2169) data time 0.0008 (0.0036) model time 0.2065 (0.2122) loss 3.8819 (3.4560) grad_norm 0.9990 (1.2737) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][200/625] eta 0:01:32 lr 0.001947 wd 0.0500 time 0.2082 (0.2165) data time 0.0007 (0.0035) model time 0.2074 (0.2120) loss 4.0673 (3.4706) grad_norm 2.0021 (1.2726) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][210/625] eta 0:01:29 lr 0.001947 wd 0.0500 time 0.2119 (0.2163) data time 0.0010 (0.0034) model time 0.2109 (0.2120) loss 3.4087 (3.4789) grad_norm 0.9712 (1.2710) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][220/625] eta 0:01:27 lr 0.001947 wd 0.0500 time 0.2077 (0.2160) data time 0.0012 (0.0033) model time 0.2065 (0.2118) loss 3.3817 (3.4824) grad_norm 0.9761 (1.2609) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:29:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][230/625] eta 0:01:25 lr 0.001947 wd 0.0500 time 0.2072 (0.2158) data time 0.0008 (0.0032) model time 0.2064 (0.2117) loss 3.4718 (3.4821) grad_norm 1.0054 (1.2504) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][240/625] eta 0:01:23 lr 0.001947 wd 0.0500 time 0.2175 (0.2158) data time 0.0010 (0.0031) model time 0.2165 (0.2117) loss 2.2116 (3.4810) grad_norm 0.9324 (1.2548) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][250/625] eta 0:01:20 lr 0.001947 wd 0.0500 time 0.2157 (0.2158) data time 0.0010 (0.0031) model time 0.2147 (0.2120) loss 3.7158 (3.4629) grad_norm 1.6442 (1.2511) loss_scale 16384.0000 (16384.0000) 
mem 8969MB [2024-07-29 20:30:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][260/625] eta 0:01:18 lr 0.001947 wd 0.0500 time 0.2111 (0.2157) data time 0.0009 (0.0030) model time 0.2101 (0.2120) loss 4.1389 (3.4485) grad_norm 1.0154 (1.2604) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][270/625] eta 0:01:16 lr 0.001947 wd 0.0500 time 0.2102 (0.2156) data time 0.0007 (0.0029) model time 0.2095 (0.2120) loss 3.5866 (3.4507) grad_norm 0.9460 (1.2526) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][280/625] eta 0:01:14 lr 0.001946 wd 0.0500 time 0.2169 (0.2156) data time 0.0009 (0.0028) model time 0.2160 (0.2120) loss 4.0223 (3.4476) grad_norm 1.2735 (1.2467) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][290/625] eta 0:01:12 lr 0.001946 wd 0.0500 time 0.2097 (0.2155) data time 0.0007 (0.0028) model time 0.2090 (0.2120) loss 3.2144 (3.4504) grad_norm 1.6697 (1.2465) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][300/625] eta 0:01:10 lr 0.001946 wd 0.0500 time 0.2134 (0.2155) data time 0.0010 (0.0027) model time 0.2124 (0.2121) loss 4.4860 (3.4600) grad_norm 2.0395 (1.2473) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][310/625] eta 0:01:07 lr 0.001946 wd 0.0500 time 0.2066 (0.2153) data time 0.0011 (0.0027) model time 0.2055 (0.2120) loss 2.9686 (3.4612) grad_norm 0.8627 (1.2425) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][320/625] eta 0:01:05 lr 0.001946 wd 0.0500 
time 0.2170 (0.2152) data time 0.0011 (0.0026) model time 0.2159 (0.2120) loss 3.0645 (3.4585) grad_norm 1.1726 (1.2459) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][330/625] eta 0:01:03 lr 0.001946 wd 0.0500 time 0.2116 (0.2151) data time 0.0008 (0.0026) model time 0.2108 (0.2119) loss 3.8822 (3.4616) grad_norm 0.8939 (1.2402) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][340/625] eta 0:01:01 lr 0.001946 wd 0.0500 time 0.2114 (0.2149) data time 0.0012 (0.0025) model time 0.2101 (0.2118) loss 3.7854 (3.4613) grad_norm 0.8740 (1.2416) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][350/625] eta 0:00:59 lr 0.001946 wd 0.0500 time 0.2088 (0.2148) data time 0.0012 (0.0025) model time 0.2076 (0.2116) loss 2.6110 (3.4556) grad_norm 0.9537 (1.2370) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][360/625] eta 0:00:56 lr 0.001946 wd 0.0500 time 0.2215 (0.2147) data time 0.0009 (0.0025) model time 0.2206 (0.2116) loss 3.9900 (3.4528) grad_norm 1.4996 (1.2433) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][370/625] eta 0:00:54 lr 0.001946 wd 0.0500 time 0.2183 (0.2146) data time 0.0008 (0.0024) model time 0.2175 (0.2116) loss 3.5646 (3.4614) grad_norm 1.2170 (1.2536) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][380/625] eta 0:00:52 lr 0.001946 wd 0.0500 time 0.2120 (0.2145) data time 0.0012 (0.0024) model time 0.2108 (0.2115) loss 3.6155 (3.4625) grad_norm 1.3768 (1.2576) loss_scale 16384.0000 
(16384.0000) mem 8969MB [2024-07-29 20:30:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][390/625] eta 0:00:50 lr 0.001946 wd 0.0500 time 0.2082 (0.2144) data time 0.0012 (0.0024) model time 0.2070 (0.2115) loss 4.0210 (3.4604) grad_norm 1.2716 (1.2524) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][400/625] eta 0:00:48 lr 0.001946 wd 0.0500 time 0.2081 (0.2143) data time 0.0008 (0.0023) model time 0.2072 (0.2114) loss 3.9004 (3.4647) grad_norm 0.7758 (1.2505) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][410/625] eta 0:00:46 lr 0.001946 wd 0.0500 time 0.2057 (0.2143) data time 0.0011 (0.0023) model time 0.2046 (0.2114) loss 2.8011 (3.4625) grad_norm 1.2563 (1.2503) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][420/625] eta 0:00:43 lr 0.001946 wd 0.0500 time 0.2102 (0.2142) data time 0.0011 (0.0023) model time 0.2090 (0.2114) loss 3.7399 (3.4627) grad_norm 1.3170 (1.2524) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][430/625] eta 0:00:41 lr 0.001946 wd 0.0500 time 0.2093 (0.2142) data time 0.0007 (0.0022) model time 0.2086 (0.2114) loss 3.8848 (3.4685) grad_norm 1.7985 (1.2540) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][440/625] eta 0:00:39 lr 0.001946 wd 0.0500 time 0.2129 (0.2141) data time 0.0008 (0.0022) model time 0.2120 (0.2114) loss 4.2107 (3.4714) grad_norm 1.9775 (1.2614) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][450/625] eta 0:00:37 lr 
0.001945 wd 0.0500 time 0.2083 (0.2141) data time 0.0011 (0.0022) model time 0.2071 (0.2114) loss 3.5418 (3.4683) grad_norm 0.8759 (1.2561) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][460/625] eta 0:00:35 lr 0.001945 wd 0.0500 time 0.2059 (0.2140) data time 0.0010 (0.0022) model time 0.2048 (0.2113) loss 3.3399 (3.4642) grad_norm 1.0768 (1.2504) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][470/625] eta 0:00:33 lr 0.001945 wd 0.0500 time 0.2182 (0.2140) data time 0.0009 (0.0022) model time 0.2173 (0.2113) loss 4.1001 (3.4601) grad_norm 1.0716 (1.2460) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][480/625] eta 0:00:31 lr 0.001945 wd 0.0500 time 0.2154 (0.2140) data time 0.0010 (0.0021) model time 0.2144 (0.2114) loss 3.8648 (3.4682) grad_norm 1.2561 (1.2457) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][490/625] eta 0:00:28 lr 0.001945 wd 0.0500 time 0.2082 (0.2140) data time 0.0007 (0.0021) model time 0.2075 (0.2115) loss 3.4031 (3.4669) grad_norm 1.7372 (1.2530) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][500/625] eta 0:00:26 lr 0.001945 wd 0.0500 time 0.2240 (0.2141) data time 0.0008 (0.0021) model time 0.2232 (0.2115) loss 3.1651 (3.4631) grad_norm 1.1886 (1.2587) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:30:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][510/625] eta 0:00:24 lr 0.001945 wd 0.0500 time 0.2113 (0.2140) data time 0.0011 (0.0021) model time 0.2102 (0.2115) loss 3.6947 (3.4668) grad_norm 0.9100 (1.2537) loss_scale 
16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:31:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][520/625] eta 0:00:22 lr 0.001945 wd 0.0500 time 0.2203 (0.2139) data time 0.0008 (0.0021) model time 0.2195 (0.2114) loss 3.0181 (3.4595) grad_norm 0.8031 (1.2547) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:31:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][530/625] eta 0:00:20 lr 0.001945 wd 0.0500 time 0.2140 (0.2139) data time 0.0008 (0.0020) model time 0.2133 (0.2114) loss 3.4512 (3.4588) grad_norm 0.9902 (1.2597) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:31:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][540/625] eta 0:00:18 lr 0.001945 wd 0.0500 time 0.2122 (0.2139) data time 0.0010 (0.0020) model time 0.2112 (0.2114) loss 2.9212 (3.4565) grad_norm 1.3450 (1.2586) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:31:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][550/625] eta 0:00:16 lr 0.001945 wd 0.0500 time 0.2102 (0.2139) data time 0.0010 (0.0020) model time 0.2092 (0.2115) loss 3.9488 (3.4551) grad_norm 1.4261 (1.2593) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:31:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][560/625] eta 0:00:13 lr 0.001945 wd 0.0500 time 0.2067 (0.2138) data time 0.0009 (0.0020) model time 0.2058 (0.2114) loss 3.0660 (3.4547) grad_norm 1.4575 (1.2613) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:31:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][570/625] eta 0:00:11 lr 0.001945 wd 0.0500 time 0.2130 (0.2138) data time 0.0013 (0.0020) model time 0.2116 (0.2114) loss 3.3362 (3.4507) grad_norm 0.8940 (1.2639) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:31:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][580/625] eta 
0:00:09 lr 0.001945 wd 0.0500 time 0.2036 (0.2138) data time 0.0009 (0.0020) model time 0.2028 (0.2114) loss 2.7583 (3.4513) grad_norm 1.2530 (1.2605) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:31:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][590/625] eta 0:00:07 lr 0.001945 wd 0.0500 time 0.2194 (0.2137) data time 0.0010 (0.0020) model time 0.2184 (0.2114) loss 3.6985 (3.4471) grad_norm 1.0771 (1.2569) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:31:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][600/625] eta 0:00:05 lr 0.001945 wd 0.0500 time 0.2185 (0.2137) data time 0.0007 (0.0019) model time 0.2178 (0.2114) loss 4.1257 (3.4477) grad_norm 1.0376 (1.2560) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:31:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][610/625] eta 0:00:03 lr 0.001945 wd 0.0500 time 0.2094 (0.2138) data time 0.0007 (0.0019) model time 0.2086 (0.2114) loss 3.3417 (3.4537) grad_norm 1.0893 (1.2521) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:31:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][620/625] eta 0:00:01 lr 0.001944 wd 0.0500 time 0.2091 (0.2137) data time 0.0007 (0.0019) model time 0.2084 (0.2114) loss 3.9969 (3.4553) grad_norm 1.1962 (1.2507) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:31:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 49 training takes 0:02:13 [2024-07-29 20:31:23 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 20:31:23 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 20:31:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.554 (0.554) Loss 0.8579 (0.8579) Acc@1 84.082 (84.082) Acc@5 96.826 (96.826) Mem 8969MB [2024-07-29 20:31:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.105) Loss 1.3340 (1.0202) Acc@1 73.047 (79.705) Acc@5 92.334 (95.552) Mem 8969MB [2024-07-29 20:31:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.082) Loss 1.4990 (1.2066) Acc@1 68.213 (75.467) Acc@5 89.258 (93.034) Mem 8969MB [2024-07-29 20:31:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 75.282 Acc@5 92.992 [2024-07-29 20:31:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 75.3% [2024-07-29 20:31:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.804 (0.804) Loss 0.7852 (0.7852) Acc@1 81.006 (81.006) Acc@5 95.410 (95.410) Mem 8969MB [2024-07-29 20:31:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.057 (0.133) Loss 1.3311 (1.0071) Acc@1 67.822 (75.133) Acc@5 90.283 (93.541) Mem 8969MB [2024-07-29 20:31:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.096) Loss 1.5889 (1.2094) Acc@1 62.744 (70.938) Acc@5 85.693 (90.602) Mem 8969MB [2024-07-29 20:31:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 70.591 Acc@5 90.525 [2024-07-29 20:31:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 70.6% [2024-07-29 20:31:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 70.59% [2024-07-29 20:31:28 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-07-29 20:31:29 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-07-29 20:31:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][0/625] eta 0:06:14 lr 0.001944 wd 0.0500 time 0.5994 (0.5994) data time 0.4048 (0.4048) model time 0.0000 (0.0000) loss 3.8853 (3.8853) grad_norm 1.0378 (1.0378) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:31:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][10/625] eta 0:02:32 lr 0.001944 wd 0.0500 time 0.2061 (0.2478) data time 0.0010 (0.0378) model time 0.0000 (0.0000) loss 2.4315 (3.6919) grad_norm 1.3610 (1.3649) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:31:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][20/625] eta 0:02:20 lr 0.001944 wd 0.0500 time 0.2189 (0.2323) data time 0.0010 (0.0204) model time 0.0000 (0.0000) loss 3.8820 (3.6220) grad_norm 0.8595 (1.1930) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:31:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][30/625] eta 0:02:14 lr 0.001944 wd 0.0500 time 0.2103 (0.2257) data time 0.0009 (0.0141) model time 0.0000 (0.0000) loss 2.7068 (3.5625) grad_norm 0.9156 (1.2003) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:31:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][40/625] eta 0:02:10 lr 0.001944 wd 0.0500 time 0.2044 (0.2227) data time 0.0008 (0.0109) model time 0.0000 (0.0000) loss 3.9086 (3.4958) grad_norm 0.9295 (1.1317) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:31:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][50/625] eta 0:02:06 lr 0.001944 wd 0.0500 time 0.2127 (0.2203) data time 0.0008 (0.0090) model time 0.0000 (0.0000) loss 3.3356 (3.5041) grad_norm 0.9190 (1.0936) loss_scale 16384.0000 (16384.0000) mem 
8969MB [2024-07-29 20:31:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][60/625] eta 0:02:03 lr 0.001944 wd 0.0500 time 0.2056 (0.2186) data time 0.0010 (0.0077) model time 0.2046 (0.2087) loss 3.7832 (3.5086) grad_norm 1.2341 (1.1230) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:31:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][70/625] eta 0:02:02 lr 0.001944 wd 0.0500 time 0.2072 (0.2209) data time 0.0009 (0.0068) model time 0.2063 (0.2215) loss 4.1477 (3.5286) grad_norm 1.1385 (1.1611) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:31:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][80/625] eta 0:01:59 lr 0.001944 wd 0.0500 time 0.2252 (0.2197) data time 0.0012 (0.0061) model time 0.2241 (0.2177) loss 3.8687 (3.5142) grad_norm 1.3641 (1.1653) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:31:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][90/625] eta 0:01:57 lr 0.001944 wd 0.0500 time 0.3169 (0.2200) data time 0.0010 (0.0055) model time 0.3159 (0.2184) loss 4.0021 (3.5148) grad_norm 1.0092 (1.1804) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:31:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][100/625] eta 0:01:55 lr 0.001944 wd 0.0500 time 0.2098 (0.2192) data time 0.0009 (0.0051) model time 0.2089 (0.2171) loss 3.5714 (3.5117) grad_norm 0.8896 (1.2061) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:31:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][110/625] eta 0:01:52 lr 0.001944 wd 0.0500 time 0.2174 (0.2186) data time 0.0010 (0.0047) model time 0.2164 (0.2160) loss 3.7617 (3.5301) grad_norm 0.8620 (1.1961) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:31:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][120/625] eta 0:01:50 lr 0.001944 wd 0.0500 time 
0.2188 (0.2183) data time 0.0011 (0.0044) model time 0.2177 (0.2158) loss 3.6826 (3.5201) grad_norm 0.9752 (1.1963) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:31:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][130/625] eta 0:01:47 lr 0.001944 wd 0.0500 time 0.2163 (0.2179) data time 0.0012 (0.0042) model time 0.2151 (0.2152) loss 3.5702 (3.5184) grad_norm 1.4221 (1.2119) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:32:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][140/625] eta 0:01:45 lr 0.001944 wd 0.0500 time 0.2169 (0.2178) data time 0.0009 (0.0040) model time 0.2161 (0.2153) loss 3.4349 (3.5191) grad_norm 1.3431 (1.2126) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:32:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][150/625] eta 0:01:43 lr 0.001944 wd 0.0500 time 0.2144 (0.2173) data time 0.0008 (0.0038) model time 0.2137 (0.2146) loss 2.9571 (3.5188) grad_norm 2.0218 (1.2180) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:32:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][160/625] eta 0:01:40 lr 0.001944 wd 0.0500 time 0.2193 (0.2170) data time 0.0011 (0.0036) model time 0.2182 (0.2143) loss 3.8853 (3.5167) grad_norm 1.1297 (1.2171) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:32:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][170/625] eta 0:01:38 lr 0.001943 wd 0.0500 time 0.2168 (0.2171) data time 0.0010 (0.0035) model time 0.2158 (0.2146) loss 3.3255 (3.5268) grad_norm 1.4490 (1.2288) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:32:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][180/625] eta 0:01:37 lr 0.001943 wd 0.0500 time 0.2076 (0.2181) data time 0.0010 (0.0033) model time 0.2066 (0.2161) loss 2.9279 (3.5091) grad_norm 1.1200 (1.2446) loss_scale 16384.0000 (16384.0000) 
mem 8969MB [2024-07-29 20:32:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][190/625] eta 0:01:34 lr 0.001943 wd 0.0500 time 0.2116 (0.2181) data time 0.0011 (0.0032) model time 0.2105 (0.2161) loss 4.0862 (3.5091) grad_norm 0.8662 (1.2433) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:32:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][200/625] eta 0:01:32 lr 0.001943 wd 0.0500 time 0.2102 (0.2177) data time 0.0008 (0.0031) model time 0.2094 (0.2157) loss 2.4032 (3.4897) grad_norm 1.0531 (1.2399) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:32:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][210/625] eta 0:01:30 lr 0.001943 wd 0.0500 time 0.2086 (0.2175) data time 0.0011 (0.0030) model time 0.2075 (0.2155) loss 3.8276 (3.4845) grad_norm 0.9636 (1.2377) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:32:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][220/625] eta 0:01:28 lr 0.001943 wd 0.0500 time 0.2069 (0.2173) data time 0.0010 (0.0029) model time 0.2059 (0.2153) loss 3.0847 (3.4815) grad_norm 1.4493 (1.2339) loss_scale 16384.0000 (16384.0000) mem 8969MB [2024-07-29 20:32:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][230/625] eta 0:01:25 lr 0.001943 wd 0.0500 time 0.2177 (0.2171) data time 0.0009 (0.0029) model time 0.2168 (0.2151) loss 3.0312 (3.4763) grad_norm 2.0265 (inf) loss_scale 8192.0000 (16171.2208) mem 8969MB [2024-07-29 20:32:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][240/625] eta 0:01:23 lr 0.001943 wd 0.0500 time 0.2104 (0.2174) data time 0.0009 (0.0028) model time 0.2095 (0.2155) loss 4.3592 (3.4758) grad_norm 1.4119 (inf) loss_scale 8192.0000 (15840.1328) mem 8969MB [2024-07-29 20:32:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][250/625] eta 0:01:21 lr 0.001943 wd 0.0500 time 
0.2156 (0.2173) data time 0.0010 (0.0027) model time 0.2146 (0.2154) loss 3.6382 (3.4832) grad_norm 1.1519 (inf) loss_scale 8192.0000 (15535.4263) mem 8969MB [2024-07-29 20:32:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][260/625] eta 0:01:19 lr 0.001943 wd 0.0500 time 0.2104 (0.2170) data time 0.0007 (0.0027) model time 0.2097 (0.2151) loss 2.3480 (3.4758) grad_norm 1.2647 (inf) loss_scale 8192.0000 (15254.0690) mem 8969MB [2024-07-29 20:32:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][270/625] eta 0:01:17 lr 0.001943 wd 0.0500 time 0.2156 (0.2172) data time 0.0008 (0.0026) model time 0.2148 (0.2153) loss 2.6917 (3.4701) grad_norm 0.9772 (inf) loss_scale 8192.0000 (14993.4760) mem 8969MB [2024-07-29 20:32:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][280/625] eta 0:01:14 lr 0.001943 wd 0.0500 time 0.2207 (0.2172) data time 0.0008 (0.0026) model time 0.2199 (0.2153) loss 2.5746 (3.4671) grad_norm 1.0586 (inf) loss_scale 8192.0000 (14751.4306) mem 8969MB [2024-07-29 20:32:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][290/625] eta 0:01:12 lr 0.001943 wd 0.0500 time 0.2159 (0.2171) data time 0.0010 (0.0025) model time 0.2150 (0.2152) loss 3.6514 (3.4704) grad_norm 1.9099 (inf) loss_scale 8192.0000 (14526.0206) mem 8969MB [2024-07-29 20:32:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][300/625] eta 0:01:10 lr 0.001943 wd 0.0500 time 0.2083 (0.2170) data time 0.0010 (0.0025) model time 0.2073 (0.2152) loss 3.8241 (3.4767) grad_norm 0.7979 (inf) loss_scale 8192.0000 (14315.5880) mem 8969MB [2024-07-29 20:32:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][310/625] eta 0:01:08 lr 0.001943 wd 0.0500 time 0.2156 (0.2171) data time 0.0010 (0.0024) model time 0.2146 (0.2154) loss 4.3065 (3.4801) grad_norm 1.2773 (inf) loss_scale 8192.0000 (14118.6881) mem 8969MB [2024-07-29 
20:32:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][320/625] eta 0:01:06 lr 0.001943 wd 0.0500 time 0.2090 (0.2171) data time 0.0012 (0.0024) model time 0.2078 (0.2154) loss 3.3139 (3.4765) grad_norm 1.1820 (inf) loss_scale 8192.0000 (13934.0561) mem 8969MB [2024-07-29 20:32:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][330/625] eta 0:01:04 lr 0.001942 wd 0.0500 time 0.2221 (0.2172) data time 0.0011 (0.0024) model time 0.2210 (0.2155) loss 3.0449 (3.4758) grad_norm 2.1098 (inf) loss_scale 8192.0000 (13760.5801) mem 8969MB [2024-07-29 20:32:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][340/625] eta 0:01:01 lr 0.001942 wd 0.0500 time 0.2115 (0.2171) data time 0.0011 (0.0023) model time 0.2104 (0.2154) loss 3.8218 (3.4720) grad_norm 0.8331 (inf) loss_scale 8192.0000 (13597.2786) mem 8969MB [2024-07-29 20:32:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][350/625] eta 0:00:59 lr 0.001942 wd 0.0500 time 0.2082 (0.2170) data time 0.0008 (0.0023) model time 0.2074 (0.2153) loss 2.6331 (3.4696) grad_norm 0.9747 (inf) loss_scale 8192.0000 (13443.2821) mem 8969MB [2024-07-29 20:32:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][360/625] eta 0:00:57 lr 0.001942 wd 0.0500 time 0.2274 (0.2169) data time 0.0009 (0.0023) model time 0.2265 (0.2152) loss 4.3483 (3.4731) grad_norm 1.0847 (inf) loss_scale 8192.0000 (13297.8172) mem 8969MB [2024-07-29 20:32:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][370/625] eta 0:00:55 lr 0.001942 wd 0.0500 time 0.2151 (0.2168) data time 0.0010 (0.0022) model time 0.2141 (0.2151) loss 2.8352 (3.4691) grad_norm 1.3013 (inf) loss_scale 8192.0000 (13160.1941) mem 8969MB [2024-07-29 20:32:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][380/625] eta 0:00:53 lr 0.001942 wd 0.0500 time 0.2093 (0.2166) data time 0.0014 (0.0022) 
model time 0.2079 (0.2149) loss 3.6220 (3.4716) grad_norm 1.1553 (inf) loss_scale 8192.0000 (13029.7953) mem 8969MB [2024-07-29 20:32:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][390/625] eta 0:00:50 lr 0.001942 wd 0.0500 time 0.2111 (0.2165) data time 0.0010 (0.0022) model time 0.2100 (0.2147) loss 3.8401 (3.4752) grad_norm 1.0158 (inf) loss_scale 8192.0000 (12906.0665) mem 8969MB [2024-07-29 20:32:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][400/625] eta 0:00:48 lr 0.001942 wd 0.0500 time 0.2113 (0.2164) data time 0.0009 (0.0021) model time 0.2105 (0.2147) loss 3.2994 (3.4705) grad_norm 1.0842 (inf) loss_scale 8192.0000 (12788.5087) mem 8969MB [2024-07-29 20:32:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][410/625] eta 0:00:46 lr 0.001942 wd 0.0500 time 0.2224 (0.2164) data time 0.0010 (0.0021) model time 0.2214 (0.2147) loss 3.5464 (3.4741) grad_norm 1.2248 (inf) loss_scale 8192.0000 (12676.6715) mem 8969MB [2024-07-29 20:33:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][420/625] eta 0:00:44 lr 0.001942 wd 0.0500 time 0.2148 (0.2164) data time 0.0007 (0.0021) model time 0.2141 (0.2147) loss 3.9103 (3.4770) grad_norm 0.9579 (inf) loss_scale 8192.0000 (12570.1473) mem 8969MB [2024-07-29 20:33:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][430/625] eta 0:00:42 lr 0.001942 wd 0.0500 time 0.2154 (0.2163) data time 0.0010 (0.0021) model time 0.2144 (0.2146) loss 3.7512 (3.4861) grad_norm 1.0222 (inf) loss_scale 8192.0000 (12468.5661) mem 8969MB [2024-07-29 20:33:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][440/625] eta 0:00:39 lr 0.001942 wd 0.0500 time 0.2160 (0.2162) data time 0.0010 (0.0021) model time 0.2151 (0.2145) loss 3.5890 (3.4766) grad_norm 0.9653 (inf) loss_scale 8192.0000 (12371.5918) mem 8969MB [2024-07-29 20:33:07 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [50/300][450/625] eta 0:00:37 lr 0.001942 wd 0.0500 time 0.2180 (0.2162) data time 0.0010 (0.0020) model time 0.2170 (0.2145) loss 3.4097 (3.4761) grad_norm 1.4280 (inf) loss_scale 8192.0000 (12278.9180) mem 8969MB [2024-07-29 20:33:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][460/625] eta 0:00:35 lr 0.001942 wd 0.0500 time 0.2139 (0.2161) data time 0.0009 (0.0020) model time 0.2129 (0.2144) loss 3.6140 (3.4743) grad_norm 1.4677 (inf) loss_scale 8192.0000 (12190.2646) mem 8969MB [2024-07-29 20:33:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][470/625] eta 0:00:33 lr 0.001942 wd 0.0500 time 0.2082 (0.2160) data time 0.0011 (0.0020) model time 0.2071 (0.2143) loss 3.3135 (3.4771) grad_norm 0.9497 (inf) loss_scale 8192.0000 (12105.3758) mem 8969MB [2024-07-29 20:33:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][480/625] eta 0:00:31 lr 0.001942 wd 0.0500 time 0.2162 (0.2159) data time 0.0009 (0.0020) model time 0.2152 (0.2142) loss 4.2851 (3.4754) grad_norm 1.0460 (inf) loss_scale 8192.0000 (12024.0166) mem 8969MB [2024-07-29 20:33:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][490/625] eta 0:00:29 lr 0.001942 wd 0.0500 time 0.2048 (0.2158) data time 0.0011 (0.0020) model time 0.2036 (0.2141) loss 3.7638 (3.4748) grad_norm 1.5178 (inf) loss_scale 8192.0000 (11945.9715) mem 8969MB [2024-07-29 20:33:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][500/625] eta 0:00:26 lr 0.001941 wd 0.0500 time 0.2048 (0.2157) data time 0.0012 (0.0020) model time 0.2037 (0.2140) loss 3.8880 (3.4780) grad_norm 1.2806 (inf) loss_scale 8192.0000 (11871.0419) mem 8969MB [2024-07-29 20:33:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][510/625] eta 0:00:24 lr 0.001941 wd 0.0500 time 0.2096 (0.2156) data time 0.0010 (0.0019) model time 0.2086 (0.2139) loss 
4.0722 (3.4795) grad_norm 1.0188 (inf) loss_scale 8192.0000 (11799.0450) mem 8969MB [2024-07-29 20:33:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][520/625] eta 0:00:22 lr 0.001941 wd 0.0500 time 0.2165 (0.2155) data time 0.0010 (0.0019) model time 0.2155 (0.2139) loss 3.5023 (3.4752) grad_norm 1.7005 (inf) loss_scale 8192.0000 (11729.8119) mem 8969MB [2024-07-29 20:33:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][530/625] eta 0:00:20 lr 0.001941 wd 0.0500 time 0.2059 (0.2154) data time 0.0011 (0.0019) model time 0.2048 (0.2138) loss 3.8556 (3.4801) grad_norm 0.8128 (inf) loss_scale 8192.0000 (11663.1864) mem 8969MB [2024-07-29 20:33:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][540/625] eta 0:00:18 lr 0.001941 wd 0.0500 time 0.2187 (0.2154) data time 0.0009 (0.0019) model time 0.2177 (0.2137) loss 4.0751 (3.4862) grad_norm 1.4584 (inf) loss_scale 8192.0000 (11599.0240) mem 8969MB [2024-07-29 20:33:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][550/625] eta 0:00:16 lr 0.001941 wd 0.0500 time 0.2092 (0.2153) data time 0.0010 (0.0019) model time 0.2082 (0.2137) loss 3.1589 (3.4879) grad_norm 0.9325 (inf) loss_scale 8192.0000 (11537.1906) mem 8969MB [2024-07-29 20:33:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][560/625] eta 0:00:13 lr 0.001941 wd 0.0500 time 0.2086 (0.2152) data time 0.0008 (0.0019) model time 0.2078 (0.2136) loss 3.4943 (3.4883) grad_norm 1.6808 (inf) loss_scale 8192.0000 (11477.5615) mem 8969MB [2024-07-29 20:33:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][570/625] eta 0:00:11 lr 0.001941 wd 0.0500 time 0.2094 (0.2152) data time 0.0007 (0.0019) model time 0.2086 (0.2136) loss 3.0716 (3.4871) grad_norm 0.8180 (inf) loss_scale 8192.0000 (11420.0210) mem 8969MB [2024-07-29 20:33:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[50/300][580/625] eta 0:00:09 lr 0.001941 wd 0.0500 time 0.2057 (0.2152) data time 0.0009 (0.0018) model time 0.2048 (0.2135) loss 3.4499 (3.4894) grad_norm 1.7885 (inf) loss_scale 8192.0000 (11364.4613) mem 8969MB
[2024-07-29 20:33:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][590/625] eta 0:00:07 lr 0.001941 wd 0.0500 time 0.2065 (0.2151) data time 0.0009 (0.0018) model time 0.2055 (0.2134) loss 4.2217 (3.4889) grad_norm 1.9499 (inf) loss_scale 8192.0000 (11310.7817) mem 8969MB
[2024-07-29 20:33:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][600/625] eta 0:00:05 lr 0.001941 wd 0.0500 time 0.2143 (0.2150) data time 0.0010 (0.0018) model time 0.2133 (0.2134) loss 3.5029 (3.4880) grad_norm 0.9040 (inf) loss_scale 8192.0000 (11258.8885) mem 8969MB
[2024-07-29 20:33:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 20:33:40 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 20:33:40 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 20:53:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 20:53:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 20:53:32 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 20:53:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 20:53:39 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 20:53:39 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 20:53:39 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 20:53:40 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 50)
[2024-07-29 20:53:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 20:53:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][610/625] eta 0:00:29 lr 0.001941 wd 0.0500 time 0.1982 (1.9803) data time 0.0004 (0.2157) model time 0.1978 (1.7646) loss 3.8551 (3.7528) grad_norm 1.2166 (0.9988) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 20:53:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][620/625] eta 0:00:03 lr 0.001941 wd 0.0500 time 0.1988 (0.7084) data time 0.0003 (0.0621) model time 0.1985 (0.6463) loss 4.3914 (3.7294) grad_norm 1.8057 (1.1642) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 20:53:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 50 training takes 0:00:10
[2024-07-29 20:53:54 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 20:53:56 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 20:53:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.400 (0.400) Loss 0.8550 (0.8550) Acc@1 85.205 (85.205) Acc@5 97.168 (97.168) Mem 8977MB
[2024-07-29 20:53:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.087) Loss 1.3701 (1.0547) Acc@1 72.949 (79.603) Acc@5 91.504 (95.579) Mem 8977MB
[2024-07-29 20:53:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.071) Loss 1.5352 (1.2385) Acc@1 68.457 (75.472) Acc@5 89.697 (93.285) Mem 8977MB
[2024-07-29 20:53:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 75.202 Acc@5 93.294
[2024-07-29 20:53:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 75.2%
[2024-07-29 20:54:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.785 (0.785) Loss 0.7520 (0.7520) Acc@1 82.080 (82.080) Acc@5 95.557 (95.557) Mem 8977MB
[2024-07-29 20:54:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.053 (0.128) Loss 1.2930 (0.9707) Acc@1 68.506 (75.968) Acc@5 90.283 (93.848) Mem 8977MB
[2024-07-29 20:54:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.092) Loss 1.5410 (1.1710) Acc@1 63.428 (71.724) Acc@5 86.035 (91.016) Mem 8977MB
[2024-07-29 20:54:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 71.381 Acc@5 90.937
[2024-07-29 20:54:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 71.4%
[2024-07-29 20:54:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 71.38%
[2024-07-29 20:54:01 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 20:54:02 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 20:54:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][0/625] eta 0:20:28 lr 0.001941 wd 0.0500 time 1.9659 (1.9659) data time 0.4630 (0.4630) model time 0.0000 (0.0000) loss 3.5722 (3.5722) grad_norm 1.7391 (1.7391) loss_scale 8192.0000 (8192.0000) mem 8971MB
[2024-07-29 20:54:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][10/625] eta 0:03:57 lr 0.001941 wd 0.0500 time 0.2000 (0.3869) data time 0.0007 (0.0430) model time 0.0000 (0.0000) loss 4.1059 (3.8143) grad_norm 1.0894 (1.2694) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:54:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][20/625] eta 0:03:00 lr 0.001941 wd 0.0500 time 0.2017 (0.2977) data time 0.0008 (0.0230) model time 0.0000 (0.0000) loss 3.4794 (3.7159) grad_norm 0.8757 (1.2375) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:54:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][30/625] eta 0:02:38 lr 0.001941 wd 0.0500 time 0.2015 (0.2662) data time 0.0010 (0.0159) model time 0.0000 (0.0000) loss 3.4078 (3.6549) grad_norm 1.6886 (1.1971) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:54:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][40/625] eta 0:02:26 lr 0.001940 wd 0.0500 time 0.1988 (0.2501) data time 0.0007 (0.0123) model time 0.0000 (0.0000) loss 2.4185 (3.6053) grad_norm 1.1956 (1.3280) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:54:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][50/625] eta 0:02:18 lr 0.001940 wd 0.0500 time 0.1999 (0.2403) data time 0.0009 (0.0100) model time 0.0000 (0.0000) loss 3.7059 (3.5819) grad_norm 1.3000 (1.3141) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:54:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][60/625] eta 0:02:12 lr 0.001940 wd 0.0500 time 0.2020 (0.2338) data time 0.0008 (0.0085) model time 0.2012 (0.1994) loss 3.3837 (3.5437) grad_norm 1.6277 (1.3171) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:54:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][70/625] eta 0:02:07 lr 0.001940 wd 0.0500 time 0.2024 (0.2292) data time 0.0007 (0.0075) model time 0.2016 (0.1999) loss 3.6452 (3.5318) grad_norm 1.0066 (1.3259) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:54:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][80/625] eta 0:02:03 lr 0.001940 wd 0.0500 time 0.2194 (0.2260) data time 0.0010 (0.0067) model time 0.2184 (0.2006) loss 3.3985 (3.5451) grad_norm 1.4613 (1.3410) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:54:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][90/625] eta 0:01:59 lr 0.001940 wd 0.0500 time 0.1993 (0.2232) data time 0.0008 (0.0061) model time 0.1986 (0.2005) loss 3.9984 (3.5779) grad_norm 1.6471 (1.3716) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:54:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][100/625] eta 0:01:56 lr 0.001940 wd 0.0500 time 0.2049 (0.2210) data time 0.0009 (0.0056) model time 0.2040 (0.2002) loss 3.5893 (3.5810) grad_norm 1.4097 (1.3756) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:54:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][110/625] eta 0:01:52 lr 0.001940 wd 0.0500 time 0.2009 (0.2191) data time 0.0007 (0.0051) model time 0.2002 (0.2000) loss 3.4967 (3.5606) grad_norm 1.5022 (1.3602) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:54:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][120/625] eta 0:01:49 lr 0.001940 wd 0.0500 time 0.1997 (0.2174) data 
time 0.0007 (0.0048) model time 0.1990 (0.1998) loss 4.2509 (3.5623) grad_norm 1.4427 (1.3579) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:54:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][130/625] eta 0:01:47 lr 0.001940 wd 0.0500 time 0.1965 (0.2162) data time 0.0007 (0.0045) model time 0.1958 (0.1999) loss 3.0832 (3.5456) grad_norm 1.1491 (1.3511) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:54:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][140/625] eta 0:01:44 lr 0.001940 wd 0.0500 time 0.2004 (0.2152) data time 0.0008 (0.0043) model time 0.1997 (0.2000) loss 3.9272 (3.5407) grad_norm 1.4871 (1.3419) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:54:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][150/625] eta 0:01:41 lr 0.001940 wd 0.0500 time 0.1997 (0.2143) data time 0.0009 (0.0040) model time 0.1988 (0.2001) loss 3.7009 (3.5449) grad_norm 1.0290 (1.3326) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:54:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][160/625] eta 0:01:39 lr 0.001940 wd 0.0500 time 0.1975 (0.2137) data time 0.0007 (0.0038) model time 0.1969 (0.2004) loss 3.2759 (3.5217) grad_norm 1.1879 (1.3208) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:54:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][170/625] eta 0:01:37 lr 0.001940 wd 0.0500 time 0.1987 (0.2132) data time 0.0007 (0.0038) model time 0.1980 (0.2005) loss 3.9284 (3.5200) grad_norm 1.2116 (1.3192) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:54:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][180/625] eta 0:01:34 lr 0.001940 wd 0.0500 time 0.1957 (0.2125) data time 0.0008 (0.0037) model time 0.1949 (0.2004) loss 2.6473 (3.5008) grad_norm 1.4201 (1.3338) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:54:42 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][190/625] eta 0:01:32 lr 0.001940 wd 0.0500 time 0.2005 (0.2120) data time 0.0007 (0.0035) model time 0.1998 (0.2005) loss 3.6448 (3.4997) grad_norm 1.4109 (1.3191) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:54:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][200/625] eta 0:01:29 lr 0.001939 wd 0.0500 time 0.1979 (0.2116) data time 0.0006 (0.0034) model time 0.1973 (0.2006) loss 4.2150 (3.4944) grad_norm 1.0133 (1.3076) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:54:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][210/625] eta 0:01:27 lr 0.001939 wd 0.0500 time 0.1981 (0.2112) data time 0.0008 (0.0033) model time 0.1973 (0.2007) loss 3.2901 (3.5000) grad_norm 1.1302 (1.2972) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:54:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][220/625] eta 0:01:25 lr 0.001939 wd 0.0500 time 0.1988 (0.2110) data time 0.0007 (0.0032) model time 0.1981 (0.2010) loss 2.5620 (3.4928) grad_norm 0.8622 (1.2816) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:54:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][230/625] eta 0:01:23 lr 0.001939 wd 0.0500 time 0.1995 (0.2106) data time 0.0008 (0.0031) model time 0.1987 (0.2010) loss 3.3207 (3.4849) grad_norm 1.3216 (1.2774) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:54:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][240/625] eta 0:01:20 lr 0.001939 wd 0.0500 time 0.2021 (0.2101) data time 0.0008 (0.0030) model time 0.2012 (0.2009) loss 3.9667 (3.4745) grad_norm 1.0283 (1.2732) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:54:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][250/625] eta 0:01:18 lr 0.001939 wd 0.0500 time 0.2005 (0.2098) data time 0.0007 
(0.0029) model time 0.1998 (0.2010) loss 2.5159 (3.4650) grad_norm 1.4270 (1.2716) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:54:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][260/625] eta 0:01:16 lr 0.001939 wd 0.0500 time 0.1988 (0.2103) data time 0.0008 (0.0029) model time 0.1980 (0.2018) loss 3.5573 (3.4736) grad_norm 1.0595 (1.2649) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:54:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][270/625] eta 0:01:14 lr 0.001939 wd 0.0500 time 0.1998 (0.2100) data time 0.0009 (0.0029) model time 0.1989 (0.2018) loss 3.4255 (3.4740) grad_norm 0.8783 (1.2633) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][280/625] eta 0:01:12 lr 0.001939 wd 0.0500 time 0.2009 (0.2097) data time 0.0009 (0.0028) model time 0.2000 (0.2018) loss 3.5905 (3.4592) grad_norm 0.9228 (1.2614) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][290/625] eta 0:01:10 lr 0.001939 wd 0.0500 time 0.1994 (0.2094) data time 0.0009 (0.0027) model time 0.1986 (0.2016) loss 3.8422 (3.4584) grad_norm 1.2413 (1.2554) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][300/625] eta 0:01:07 lr 0.001939 wd 0.0500 time 0.1999 (0.2091) data time 0.0008 (0.0027) model time 0.1991 (0.2015) loss 4.6396 (3.4681) grad_norm 1.4375 (1.2520) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][310/625] eta 0:01:05 lr 0.001939 wd 0.0500 time 0.2056 (0.2089) data time 0.0008 (0.0026) model time 0.2048 (0.2016) loss 2.1732 (3.4739) grad_norm 2.0198 (1.2541) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:09 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][320/625] eta 0:01:03 lr 0.001939 wd 0.0500 time 0.1983 (0.2086) data time 0.0009 (0.0026) model time 0.1975 (0.2015) loss 3.2419 (3.4710) grad_norm 1.1109 (1.2577) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][330/625] eta 0:01:01 lr 0.001939 wd 0.0500 time 0.1978 (0.2094) data time 0.0007 (0.0025) model time 0.1971 (0.2027) loss 3.8277 (3.4728) grad_norm 0.9427 (1.2544) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][340/625] eta 0:00:59 lr 0.001939 wd 0.0500 time 0.1932 (0.2095) data time 0.0008 (0.0025) model time 0.1923 (0.2029) loss 2.9094 (3.4740) grad_norm 1.1989 (1.2557) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][350/625] eta 0:00:57 lr 0.001939 wd 0.0500 time 0.2005 (0.2092) data time 0.0009 (0.0024) model time 0.1997 (0.2028) loss 3.6017 (3.4735) grad_norm 1.0087 (1.2578) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][360/625] eta 0:00:55 lr 0.001939 wd 0.0500 time 0.2005 (0.2091) data time 0.0008 (0.0024) model time 0.1997 (0.2028) loss 3.8556 (3.4766) grad_norm 0.7509 (1.2548) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][370/625] eta 0:00:53 lr 0.001938 wd 0.0500 time 0.2028 (0.2095) data time 0.0008 (0.0024) model time 0.2020 (0.2035) loss 3.3591 (3.4668) grad_norm 0.9562 (1.2522) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][380/625] eta 0:00:51 lr 0.001938 wd 0.0500 time 0.2003 (0.2093) data time 0.0008 
(0.0023) model time 0.1995 (0.2033) loss 3.9478 (3.4702) grad_norm 1.2721 (1.2620) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][390/625] eta 0:00:49 lr 0.001938 wd 0.0500 time 0.2024 (0.2091) data time 0.0008 (0.0023) model time 0.2016 (0.2033) loss 3.4949 (3.4745) grad_norm 1.6887 (1.2642) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][400/625] eta 0:00:47 lr 0.001938 wd 0.0500 time 0.2017 (0.2089) data time 0.0008 (0.0022) model time 0.2009 (0.2033) loss 2.3028 (3.4694) grad_norm 1.1364 (1.2596) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][410/625] eta 0:00:44 lr 0.001938 wd 0.0500 time 0.2253 (0.2089) data time 0.0007 (0.0022) model time 0.2245 (0.2033) loss 3.5402 (3.4776) grad_norm 1.1441 (1.2551) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][420/625] eta 0:00:42 lr 0.001938 wd 0.0500 time 0.2008 (0.2087) data time 0.0009 (0.0022) model time 0.1999 (0.2032) loss 3.5461 (3.4837) grad_norm 0.9403 (1.2492) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][430/625] eta 0:00:40 lr 0.001938 wd 0.0500 time 0.2005 (0.2085) data time 0.0008 (0.0022) model time 0.1997 (0.2032) loss 3.4038 (3.4831) grad_norm 0.8934 (1.2431) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][440/625] eta 0:00:38 lr 0.001938 wd 0.0500 time 0.2192 (0.2084) data time 0.0009 (0.0021) model time 0.2183 (0.2032) loss 3.1422 (3.4799) grad_norm 0.8863 (1.2380) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:36 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][450/625] eta 0:00:36 lr 0.001938 wd 0.0500 time 0.2026 (0.2083) data time 0.0008 (0.0021) model time 0.2018 (0.2031) loss 2.2667 (3.4720) grad_norm 1.1103 (1.2409) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][460/625] eta 0:00:34 lr 0.001938 wd 0.0500 time 0.2110 (0.2082) data time 0.0009 (0.0021) model time 0.2101 (0.2031) loss 3.9262 (3.4721) grad_norm 1.0350 (1.2366) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][470/625] eta 0:00:32 lr 0.001938 wd 0.0500 time 0.2498 (0.2083) data time 0.0008 (0.0021) model time 0.2490 (0.2033) loss 2.9239 (3.4777) grad_norm 0.9358 (1.2340) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][480/625] eta 0:00:30 lr 0.001938 wd 0.0500 time 0.2007 (0.2082) data time 0.0007 (0.0021) model time 0.2000 (0.2033) loss 3.4388 (3.4784) grad_norm 2.1217 (1.2329) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][490/625] eta 0:00:28 lr 0.001938 wd 0.0500 time 0.2138 (0.2082) data time 0.0007 (0.0020) model time 0.2131 (0.2033) loss 4.2072 (3.4802) grad_norm 1.1391 (1.2339) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][500/625] eta 0:00:26 lr 0.001938 wd 0.0500 time 0.2019 (0.2080) data time 0.0009 (0.0020) model time 0.2011 (0.2033) loss 2.8795 (3.4822) grad_norm 1.2324 (1.2347) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][510/625] eta 0:00:23 lr 0.001938 wd 0.0500 time 0.2002 (0.2079) data time 0.0007 
(0.0020) model time 0.1994 (0.2032) loss 4.0938 (3.4769) grad_norm 1.1955 (1.2377) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][520/625] eta 0:00:21 lr 0.001938 wd 0.0500 time 0.2040 (0.2078) data time 0.0008 (0.0020) model time 0.2032 (0.2032) loss 3.3422 (3.4758) grad_norm 0.8689 (1.2330) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][530/625] eta 0:00:19 lr 0.001937 wd 0.0500 time 0.1985 (0.2077) data time 0.0009 (0.0019) model time 0.1976 (0.2031) loss 4.4369 (3.4765) grad_norm 1.2774 (1.2304) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][540/625] eta 0:00:17 lr 0.001937 wd 0.0500 time 0.2144 (0.2076) data time 0.0007 (0.0019) model time 0.2137 (0.2031) loss 4.2578 (3.4791) grad_norm 1.0889 (1.2320) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][550/625] eta 0:00:15 lr 0.001937 wd 0.0500 time 0.2014 (0.2077) data time 0.0009 (0.0019) model time 0.2006 (0.2033) loss 3.1920 (3.4820) grad_norm 1.2497 (1.2281) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:55:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][560/625] eta 0:00:13 lr 0.001937 wd 0.0500 time 0.2018 (0.2076) data time 0.0007 (0.0019) model time 0.2011 (0.2032) loss 4.0999 (3.4839) grad_norm 1.7091 (1.2263) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:56:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][570/625] eta 0:00:11 lr 0.001937 wd 0.0500 time 0.2005 (0.2075) data time 0.0010 (0.0019) model time 0.1995 (0.2032) loss 2.6959 (3.4835) grad_norm 1.8154 (1.2306) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 20:56:02 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][580/625] eta 0:00:09 lr 0.001937 wd 0.0500 time 0.2034 (0.2075) data time 0.0008 (0.0019) model time 0.2026 (0.2032) loss 3.7265 (3.4866) grad_norm 1.4198 (1.2308) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][590/625] eta 0:00:07 lr 0.001937 wd 0.0500 time 0.1996 (0.2073) data time 0.0008 (0.0019) model time 0.1987 (0.2031) loss 3.9667 (3.4856) grad_norm 0.8306 (1.2328) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][600/625] eta 0:00:05 lr 0.001937 wd 0.0500 time 0.2065 (0.2072) data time 0.0007 (0.0018) model time 0.2058 (0.2031) loss 4.0990 (3.4870) grad_norm 1.5630 (1.2339) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][610/625] eta 0:00:03 lr 0.001937 wd 0.0500 time 0.2178 (0.2072) data time 0.0006 (0.0018) model time 0.2173 (0.2031) loss 3.3242 (3.4869) grad_norm 1.1317 (1.2321) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][620/625] eta 0:00:01 lr 0.001937 wd 0.0500 time 0.2020 (0.2071) data time 0.0003 (0.0018) model time 0.2017 (0.2031) loss 3.8939 (3.4910) grad_norm 0.9401 (1.2327) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 51 training takes 0:02:09
[2024-07-29 20:56:11 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 20:56:12 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 20:56:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.508 (0.508) Loss 0.8154 (0.8154) Acc@1 83.984 (83.984) Acc@5 97.412 (97.412) Mem 8975MB
[2024-07-29 20:56:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.099) Loss 1.3174 (1.0036) Acc@1 72.412 (79.492) Acc@5 91.602 (95.557) Mem 8975MB
[2024-07-29 20:56:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.4355 (1.1951) Acc@1 68.896 (75.144) Acc@5 89.551 (92.939) Mem 8975MB
[2024-07-29 20:56:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 74.832 Acc@5 92.930
[2024-07-29 20:56:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 74.8%
[2024-07-29 20:56:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.716 (0.716) Loss 0.7231 (0.7231) Acc@1 82.617 (82.617) Acc@5 95.996 (95.996) Mem 8975MB
[2024-07-29 20:56:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.127) Loss 1.2578 (0.9387) Acc@1 69.287 (76.629) Acc@5 90.576 (94.198) Mem 8975MB
[2024-07-29 20:56:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 1.5010 (1.1367) Acc@1 64.307 (72.426) Acc@5 86.719 (91.406) Mem 8975MB
[2024-07-29 20:56:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 72.085 Acc@5 91.349
[2024-07-29 20:56:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 72.1%
[2024-07-29 20:56:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 72.08%
[2024-07-29 20:56:16 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 20:56:17 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 20:56:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][0/625] eta 0:05:32 lr 0.001937 wd 0.0500 time 0.5328 (0.5328) data time 0.3399 (0.3399) model time 0.0000 (0.0000) loss 3.3532 (3.3532) grad_norm 1.8007 (1.8007) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][10/625] eta 0:02:22 lr 0.001937 wd 0.0500 time 0.2006 (0.2322) data time 0.0007 (0.0318) model time 0.0000 (0.0000) loss 3.7930 (3.4838) grad_norm 0.9213 (1.6591) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][20/625] eta 0:02:11 lr 0.001937 wd 0.0500 time 0.1984 (0.2181) data time 0.0010 (0.0171) model time 0.0000 (0.0000) loss 3.8595 (3.3214) grad_norm 1.5652 (1.5166) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][30/625] eta 0:02:06 lr 0.001937 wd 0.0500 time 0.1999 (0.2129) data time 0.0008 (0.0119) model time 0.0000 (0.0000) loss 3.9272 (3.4844) grad_norm 0.8796 (1.3719) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][40/625] eta 0:02:03 lr 0.001937 wd 0.0500 time 0.1972 (0.2108) data time 0.0007 (0.0092) model time 0.0000 (0.0000) loss 3.7705 (3.4460) grad_norm 1.2207 (1.3654) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][50/625] eta 0:02:00 lr 0.001937 wd 0.0500 time 0.2076 (0.2090) data time 0.0009 (0.0076) model time 0.0000 (0.0000) loss 3.7885 (3.4262) grad_norm 1.1692 (1.3366) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][60/625] eta 0:01:57 lr 0.001936 wd 0.0500 time 0.1981 (0.2078) data time 0.0010 (0.0065) model time 0.1972 (0.2006) loss 3.8513 (3.3982) grad_norm 1.1736 (1.3249) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][70/625] eta 0:01:54 lr 0.001936 wd 0.0500 time 0.2034 (0.2069) data time 0.0007 (0.0057) model time 0.2028 (0.2005) loss 2.9809 (3.3806) grad_norm 0.8476 (1.2811) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][80/625] eta 0:01:52 lr 0.001936 wd 0.0500 time 0.2015 (0.2062) data time 0.0006 (0.0051) model time 0.2009 (0.2004) loss 3.5314 (3.4028) grad_norm 1.1885 (1.2780) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][90/625] eta 0:01:50 lr 0.001936 wd 0.0500 time 0.1993 (0.2056) data time 0.0009 (0.0047) model time 0.1983 (0.2004) loss 3.6296 (3.4219) grad_norm 1.5153 (1.2820) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][100/625] eta 0:01:47 lr 0.001936 wd 0.0500 time 0.2007 (0.2051) data time 0.0006 (0.0043) model time 0.2001 (0.2003) loss 2.0351 (3.4420) grad_norm 1.1816 (1.2757) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][110/625] eta 0:01:45 lr 0.001936 wd 0.0500 time 0.1985 (0.2048) data time 0.0009 (0.0040) model time 0.1976 (0.2003) loss 2.4943 (3.4419) grad_norm 1.1203 (1.2984) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][120/625] eta 0:01:43 lr 0.001936 wd 0.0500 time 0.1978 (0.2047) data time 0.0009 (0.0037) model time 0.1969 (0.2005) loss 3.1926 (3.4591) grad_norm 1.1748 (1.2805) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][130/625] eta 0:01:41 lr 0.001936 wd 0.0500 time 0.1986 (0.2044) data time 0.0009 (0.0035) model time 0.1977 (0.2006) loss 3.4311 (3.4812) grad_norm 1.0308 (1.2666) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][140/625] eta 0:01:39 lr 0.001936 wd 0.0500 time 0.2057 (0.2043) data time 0.0009 (0.0034) model time 0.2048 (0.2007) loss 2.7826 (3.4779) grad_norm 1.2892 (1.2767) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][150/625] eta 0:01:36 lr 0.001936 wd 0.0500 time 0.1989 (0.2040) data time 0.0009 (0.0032) model time 0.1980 (0.2005) loss 4.2280 (3.4848) grad_norm 1.3022 (1.2737) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][160/625] eta 0:01:34 lr 0.001936 wd 0.0500 time 0.2015 (0.2038) data time 0.0008 (0.0030) model time 0.2007 (0.2005) loss 3.5064 (3.4685) grad_norm 0.9400 (1.2565) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][170/625] eta 0:01:32 lr 0.001936 wd 0.0500 time 0.2018 (0.2036) data time 0.0008 (0.0029) model time 0.2010 (0.2004) loss 3.9514 (3.4509) grad_norm 1.8244 (1.2519) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][180/625] eta 0:01:30 lr 0.001936 wd 0.0500 time 0.2005 (0.2036) data time 0.0011 (0.0028) model time 0.1994 (0.2005) loss 3.7688 (3.4546) grad_norm 1.1839 (1.2592) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][190/625] eta 0:01:28 lr 0.001936 wd 0.0500 time 0.2002 (0.2034) data time 0.0009 (0.0027) model time 0.1993 (0.2005) loss 3.1177 (3.4372) grad_norm 1.7555 (1.2758) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:56:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][200/625] eta 0:01:26 lr 0.001936 wd 0.0500 time 0.2006 (0.2033) data time 0.0006 (0.0026) model time 0.2000 (0.2004) loss 2.2502 (3.4308) grad_norm 1.6655 (1.2998) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][210/625] eta 0:01:24 lr 0.001936 wd 0.0500 time 0.2006 (0.2032) data time 0.0008 (0.0025) model time 0.1997 (0.2004) loss 4.0680 (3.4337) grad_norm 1.2981 (1.2974) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][220/625] eta 0:01:22 lr 0.001935 wd 0.0500 time 0.2024 (0.2031) data time 0.0009 (0.0025) model time 0.2016 (0.2004) loss 3.5281 (3.4290) grad_norm 0.9248 (1.2872) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][230/625] eta 0:01:20 lr 0.001935 wd 0.0500 time 0.2159 (0.2031) data time 0.0009 (0.0024) model time 0.2151 (0.2005) loss 3.8974 (3.4367) grad_norm 0.8036 (1.2829) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][240/625] eta 0:01:18 lr 0.001935 wd 0.0500 time 0.2021 (0.2030) data time 0.0007 (0.0023) model time 0.2014 (0.2005) loss 4.1203 (3.4301) grad_norm 1.3475 (1.2874) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][250/625] eta 0:01:16 lr 0.001935 wd 0.0500 time 0.2017 (0.2031) data time 0.0007 (0.0023) model time 0.2010 (0.2006) loss 3.7373 (3.4270) grad_norm 0.7220 (1.2802) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][260/625] eta 0:01:14 lr 0.001935 wd 0.0500 time 0.1992 (0.2030) data time 0.0007 (0.0022) model time 0.1985 (0.2007) loss 2.8644 (3.4326) grad_norm 1.5084 (1.2799) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][270/625] eta 0:01:12 lr 0.001935 wd 0.0500 time 0.1993 (0.2031) data time 0.0009 (0.0022) model time 0.1984 (0.2008) loss 3.2885 (3.4343) grad_norm 1.1435 (1.2799) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][280/625] eta 0:01:10 lr 0.001935 wd 0.0500 time 0.2011 (0.2030) data time 0.0006 (0.0021) model time 0.2006 (0.2008) loss 4.7103 (3.4504) grad_norm 1.8623 (1.2775) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][290/625] eta 0:01:08 lr 0.001935 wd 0.0500 time 0.2047 (0.2030) data time 0.0008 (0.0021) model time 0.2039 (0.2008) loss 3.1662 (3.4641) grad_norm 0.9528 (1.2721) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][300/625] eta 0:01:06 lr 0.001935 wd 0.0500 time 0.1999 (0.2037) data time 0.0008 (0.0021) model time 0.1991 (0.2017) loss 3.2689 (3.4521) grad_norm 0.8863 (1.2609) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][310/625] eta 0:01:04 lr 0.001935 wd 0.0500 time 0.2080 (0.2036) data time 0.0007 (0.0020) model time 0.2073 (0.2017) loss 2.7130 (3.4406) grad_norm 0.9904 (1.2588) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][320/625] eta 0:01:02 lr 0.001935 wd 0.0500 time 0.2055 (0.2036) data time 0.0006 (0.0020) model time 0.2049 (0.2017) loss 2.9457 (3.4470) grad_norm 1.5711 (1.2592) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][330/625] eta 0:01:00 lr 0.001935 wd 0.0500 time 0.2007 (0.2035) data time 0.0008 (0.0020) model time 0.1999 (0.2016) loss 3.3912 (3.4516) grad_norm 1.0689 (1.2562) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][340/625] eta 0:00:57 lr 0.001935 wd 0.0500 time 0.2027 (0.2034) data time 0.0009 (0.0019) model time 0.2019 (0.2015) loss 4.0922 (3.4503) grad_norm 1.5347 (1.2630) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][350/625] eta 0:00:55 lr 0.001935 wd 0.0500 time 0.2018 (0.2034) data time 0.0009 (0.0019) model time 0.2009 (0.2016) loss 2.7625 (3.4509) grad_norm 1.0234 (1.2600) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][360/625] eta 0:00:53 lr 0.001935 wd 0.0500 time 0.2018 (0.2034) data time 0.0006 (0.0019) model time 0.2012 (0.2015) loss 3.9954 (3.4536) grad_norm 1.0298 (1.2531) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][370/625] eta 0:00:51 lr 0.001935 wd 0.0500 time 0.1995 (0.2033) data time 0.0006 (0.0019) model time 0.1989 (0.2015) loss 3.6207 (3.4621) grad_norm 2.1201 (1.2515) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][380/625] eta 0:00:49 lr 0.001934 wd 0.0500 time 0.2020 (0.2034) data time 0.0009 (0.0018) model time 0.2011 (0.2016) loss 2.4897 (3.4549) grad_norm 1.1287 (1.2502) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][390/625] eta 0:00:47 lr 0.001934 wd 0.0500 time 0.2013 (0.2035) data time 0.0006 (0.0018) model time 0.2006 (0.2017) loss 3.8928 (3.4519) grad_norm 0.9546 (1.2485) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][400/625] eta 0:00:45 lr 0.001934 wd 0.0500 time 0.2007 (0.2035) data time 0.0009 (0.0018) model time 0.1998 (0.2017) loss 2.4838 (3.4584) grad_norm 1.1037 (1.2466) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][410/625] eta 0:00:43 lr 0.001934 wd 0.0500 time 0.2009 (0.2034) data time 0.0009 (0.0018) model time 0.2000 (0.2017) loss 3.8184 (3.4620) grad_norm 1.4603 (1.2490) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][420/625] eta 0:00:41 lr 0.001934 wd 0.0500 time 0.2046 (0.2034) data time 0.0009 (0.0018) model time 0.2037 (0.2017) loss 3.6981 (3.4650) grad_norm 0.9625 (1.2535) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][430/625] eta 0:00:39 lr 0.001934 wd 0.0500 time 0.2014 (0.2034) data time 0.0008 (0.0017) model time 0.2006 (0.2017) loss 3.7638 (3.4582) grad_norm 1.4629 (1.2494) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][440/625] eta 0:00:37 lr 0.001934 wd 0.0500 time 0.2010 (0.2034) data time 0.0009 (0.0017) model time 0.2002 (0.2018) loss 3.5435 (3.4589) grad_norm 0.8590 (1.2427) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][450/625] eta 0:00:35 lr 0.001934 wd 0.0500 time 0.1984 (0.2035) data time 0.0009 (0.0017) model time 0.1975 (0.2018) loss 2.8967 (3.4540) grad_norm 1.0690 (1.2416) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][460/625] eta 0:00:33 lr 0.001934 wd 0.0500 time 0.2031 (0.2034) data time 0.0008 (0.0017) model time 0.2022 (0.2018) loss 4.1842 (3.4584) grad_norm 1.3148 (1.2431) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][470/625] eta 0:00:31 lr 0.001934 wd 0.0500 time 0.2037 (0.2036) data time 0.0008 (0.0017) model time 0.2028 (0.2019) loss 3.7267 (3.4609) grad_norm 0.9118 (1.2424) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][480/625] eta 0:00:29 lr 0.001934 wd 0.0500 time 0.1969 (0.2036) data time 0.0010 (0.0017) model time 0.1960 (0.2020) loss 2.8812 (3.4652) grad_norm 0.8913 (1.2426) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][490/625] eta 0:00:27 lr 0.001934 wd 0.0500 time 0.2009 (0.2036) data time 0.0007 (0.0017) model time 0.2002 (0.2020) loss 3.6029 (3.4688) grad_norm 1.0235 (1.2417) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:57:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][500/625] eta 0:00:25 lr 0.001934 wd 0.0500 time 0.1993 (0.2036) data time 0.0009 (0.0017) model time 0.1983 (0.2020) loss 3.9541 (3.4702) grad_norm 0.7759 (1.2367) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][510/625] eta 0:00:23 lr 0.001934 wd 0.0500 time 0.2055 (0.2036) data time 0.0007 (0.0017) model time 0.2048 (0.2020) loss 2.7044 (3.4696) grad_norm 0.9320 (1.2364) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][520/625] eta 0:00:21 lr 0.001934 wd 0.0500 time 0.2017 (0.2036) data time 0.0007 (0.0016) model time 0.2010 (0.2020) loss 3.1829 (3.4666) grad_norm 1.1026 (1.2334) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][530/625] eta 0:00:19 lr 0.001934 wd 0.0500 time 0.2030 (0.2035) data time 0.0007 (0.0016) model time 0.2023 (0.2020) loss 4.4121 (3.4731) grad_norm 0.8764 (1.2321) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][540/625] eta 0:00:17 lr 0.001933 wd 0.0500 time 0.2006 (0.2035) data time 0.0009 (0.0016) model time 0.1997 (0.2019) loss 4.1661 (3.4817) grad_norm 0.8762 (1.2344) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][550/625] eta 0:00:15 lr 0.001933 wd 0.0500 time 0.4182 (0.2039) data time 0.0008 (0.0016) model time 0.4173 (0.2023) loss 3.4708 (3.4808) grad_norm 0.9046 (1.2306) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][560/625] eta 0:00:13 lr 0.001933 wd 0.0500 time 0.1987 (0.2038) data time 0.0009 (0.0016) model time 0.1978 (0.2023) loss 3.8288 (3.4837) grad_norm 1.9071 (1.2309) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][570/625] eta 0:00:11 lr 0.001933 wd 0.0500 time 0.2064 (0.2038) data time 0.0009 (0.0016) model time 0.2055 (0.2023) loss 3.4693 (3.4855) grad_norm 1.3933 (1.2319) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][580/625] eta 0:00:09 lr 0.001933 wd 0.0500 time 0.2026 (0.2037) data time 0.0008 (0.0016) model time 0.2018 (0.2022) loss 3.6557 (3.4842) grad_norm 1.7912 (1.2314) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][590/625] eta 0:00:07 lr 0.001933 wd 0.0500 time 0.1981 (0.2037) data time 0.0009 (0.0016) model time 0.1973 (0.2022) loss 3.1724 (3.4821) grad_norm 0.7803 (1.2359) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][600/625] eta 0:00:05 lr 0.001933 wd 0.0500 time 0.2015 (0.2037) data time 0.0009 (0.0015) model time 0.2006 (0.2022) loss 3.1370 (3.4839) grad_norm 0.7823 (1.2340) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][610/625] eta 0:00:03 lr 0.001933 wd 0.0500 time 0.2020 (0.2037) data time 0.0006 (0.0015) model time 0.2014 (0.2022) loss 3.5265 (3.4806) grad_norm 1.3252 (1.2353) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][620/625] eta 0:00:01 lr 0.001933 wd 0.0500 time 0.1985 (0.2037) data time 0.0007 (0.0015) model time 0.1978 (0.2022) loss 3.5992 (3.4826) grad_norm 1.8511 (1.2388) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 52 training takes 0:02:07
[2024-07-29 20:58:24 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 20:58:26 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
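The `lr` column in the epoch 52 records above can be cross-checked against the scheduler settings in the config dump at the top of this log (`BASE_LR: 0.002`, `MIN_LR: 2.0e-05`, `WARMUP_LR: 2.0e-06`, `WARMUP_EPOCHS: 20`, `EPOCHS: 300`, `LR_SCHEDULER.NAME: cosine`, `WARMUP_PREFIX: true`). The sketch below rests on several assumptions that the log does not state outright. It assumes the schedule advances once per iteration with 625 iterations per epoch, taken from the `[i/625]` counters. It assumes a linear warmup followed by a cosine decay over the post-warmup iterations only, which is one plausible reading of `WARMUP_PREFIX`. It also assumes the effective peak LR equals `BASE_LR`, meaning any batch-size scaling nets out to 1. The function `lr_at` is hypothetical and is not the training script's code. Under these assumptions it gives 0.001937 at `[52/300][0/625]` and 0.001936 at iteration 210 of epoch 52. It gives 0.001935 at iteration 220, matching where the logged value changes.

```python
import math

# Values copied from the config dump; ITERS_PER_EPOCH comes from the "[i/625]" counters.
BASE_LR = 0.002
MIN_LR = 2.0e-05
WARMUP_LR = 2.0e-06
EPOCHS = 300
WARMUP_EPOCHS = 20
ITERS_PER_EPOCH = 625


def lr_at(step: int) -> float:
    """Hypothetical per-iteration LR: linear warmup, then cosine decay to MIN_LR."""
    warmup_steps = WARMUP_EPOCHS * ITERS_PER_EPOCH
    if step < warmup_steps:
        # Linear ramp from WARMUP_LR towards BASE_LR.
        return WARMUP_LR + (BASE_LR - WARMUP_LR) * step / warmup_steps
    # Assumed "warmup prefix": the cosine period excludes the warmup iterations.
    t = step - warmup_steps
    t_total = (EPOCHS - WARMUP_EPOCHS) * ITERS_PER_EPOCH
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1.0 + math.cos(math.pi * t / t_total))


for epoch, it in [(52, 0), (52, 210), (52, 220)]:
    print(f"[{epoch}/300][{it}/625] lr {lr_at(epoch * ITERS_PER_EPOCH + it):.6f}")
```

The step index is `epoch * 625 + iteration`, with the 0-based epoch number exactly as printed in the log.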
[2024-07-29 20:58:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.409 (0.409) Loss 0.7969 (0.7969) Acc@1 85.645 (85.645) Acc@5 97.559 (97.559) Mem 8975MB
[2024-07-29 20:58:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.089) Loss 1.3252 (0.9820) Acc@1 71.240 (80.034) Acc@5 91.943 (95.690) Mem 8975MB
[2024-07-29 20:58:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.073) Loss 1.4473 (1.1674) Acc@1 68.750 (75.616) Acc@5 89.795 (93.162) Mem 8975MB
[2024-07-29 20:58:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 75.358 Acc@5 93.070
[2024-07-29 20:58:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 75.4%
[2024-07-29 20:58:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.736 (0.736) Loss 0.6987 (0.6987) Acc@1 82.959 (82.959) Acc@5 96.240 (96.240) Mem 8975MB
[2024-07-29 20:58:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.127) Loss 1.2246 (0.9106) Acc@1 69.922 (77.264) Acc@5 90.967 (94.491) Mem 8975MB
[2024-07-29 20:58:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.4678 (1.1064) Acc@1 64.746 (73.093) Acc@5 87.109 (91.778) Mem 8975MB
[2024-07-29 20:58:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 72.751 Acc@5 91.737
[2024-07-29 20:58:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 72.8%
[2024-07-29 20:58:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 72.75%
[2024-07-29 20:58:30 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 20:58:31 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 20:58:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][0/625] eta 0:06:38 lr 0.001933 wd 0.0500 time 0.6376 (0.6376) data time 0.4405 (0.4405) model time 0.0000 (0.0000) loss 4.0486 (4.0486) grad_norm 1.1408 (1.1408) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][10/625] eta 0:02:28 lr 0.001933 wd 0.0500 time 0.2025 (0.2408) data time 0.0007 (0.0409) model time 0.0000 (0.0000) loss 3.4162 (3.7343) grad_norm 1.6257 (1.1292) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][20/625] eta 0:02:14 lr 0.001933 wd 0.0500 time 0.1989 (0.2216) data time 0.0009 (0.0219) model time 0.0000 (0.0000) loss 3.3457 (3.6599) grad_norm 1.1799 (1.2006) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][30/625] eta 0:02:08 lr 0.001933 wd 0.0500 time 0.2003 (0.2152) data time 0.0010 (0.0151) model time 0.0000 (0.0000) loss 3.6308 (3.6041) grad_norm 0.7802 (1.1434) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][40/625] eta 0:02:04 lr 0.001933 wd 0.0500 time 0.2034 (0.2120) data time 0.0006 (0.0117) model time 0.0000 (0.0000) loss 2.8135 (3.5420) grad_norm 0.9043 (1.1342) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][50/625] eta 0:02:00 lr 0.001933 wd 0.0500 time 0.2005 (0.2100) data time 0.0008 (0.0096) model time 0.0000 (0.0000) loss 3.3228 (3.4813) grad_norm 0.9897 (1.1179) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][60/625] eta 0:01:57 lr 0.001933 wd 0.0500 time 0.2093 (0.2087) data time 0.0009 (0.0082) model time 0.2084 (0.2010) loss 3.5487 (3.4957) grad_norm 1.1635 (1.1656) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][70/625] eta 0:01:55 lr 0.001932 wd 0.0500 time 0.1929 (0.2077) data time 0.0008 (0.0072) model time 0.1921 (0.2010) loss 4.0236 (3.5012) grad_norm 1.2326 (1.1825) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][80/625] eta 0:01:52 lr 0.001932 wd 0.0500 time 0.2017 (0.2069) data time 0.0008 (0.0064) model time 0.2009 (0.2008) loss 3.8389 (3.4942) grad_norm 1.1218 (1.1787) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][90/625] eta 0:01:50 lr 0.001932 wd 0.0500 time 0.2029 (0.2063) data time 0.0009 (0.0058) model time 0.2020 (0.2007) loss 3.1797 (3.5117) grad_norm 1.2865 (1.1695) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][100/625] eta 0:01:48 lr 0.001932 wd 0.0500 time 0.2046 (0.2058) data time 0.0009 (0.0053) model time 0.2037 (0.2005) loss 2.8307 (3.5276) grad_norm 2.5023 (1.1852) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][110/625] eta 0:01:45 lr 0.001932 wd 0.0500 time 0.2052 (0.2054) data time 0.0007 (0.0049) model time 0.2045 (0.2006) loss 4.0000 (3.5451) grad_norm 1.0219 (1.1861) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][120/625] eta 0:01:43 lr 0.001932 wd 0.0500 time 0.2096 (0.2051) data time 0.0007 (0.0046) model time 0.2089 (0.2006) loss 3.9801 (3.5463) grad_norm 1.1238 (1.1705) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:58:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][130/625] eta 0:01:41 lr 0.001932 wd 0.0500 time 0.2078 (0.2049) data time 0.0007 (0.0044) model time 0.2070 (0.2006) loss 2.9525 (3.5348) grad_norm 0.9337 (1.1788) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:59:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][140/625] eta 0:01:39 lr 0.001932 wd 0.0500 time 0.2008 (0.2047) data time 0.0007 (0.0041) model time 0.2000 (0.2008) loss 3.3843 (3.5441) grad_norm 1.1009 (1.1885) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:59:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][150/625] eta 0:01:37 lr 0.001932 wd 0.0500 time 0.2236 (0.2048) data time 0.0008 (0.0039) model time 0.2228 (0.2011) loss 3.6122 (3.5412) grad_norm 1.1388 (1.1897) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:59:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][160/625] eta 0:01:35 lr 0.001932 wd 0.0500 time 0.2059 (0.2046) data time 0.0007 (0.0037) model time 0.2053 (0.2011) loss 3.4920 (3.5311) grad_norm 1.1895 (1.1910) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:59:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][170/625] eta 0:01:33 lr 0.001932 wd 0.0500 time 0.2263 (0.2045) data time 0.0011 (0.0036) model time 0.2252 (0.2012) loss 2.5235 (3.5218) grad_norm 0.8589 (1.1999) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:59:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][180/625] eta 0:01:30 lr 0.001932 wd 0.0500 time 0.2147 (0.2044) data time 0.0007 (0.0034) model time 0.2140 (0.2012) loss 2.9170 (3.5248) grad_norm 1.4408 (1.2178) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 20:59:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 20:59:10 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 20:59:10 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 21:00:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 21:00:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 21:01:05 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 21:01:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 21:01:19 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 21:01:19 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 21:01:19 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 21:01:19 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 53)
[2024-07-29 21:01:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 21:01:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][190/625] eta 0:56:10 lr 0.001932 wd 0.0500 time 7.7482 (7.7482) data time 0.7487 (0.7487) model time 6.9995 (6.9995) loss 4.3543 (4.3543) grad_norm 0.9937 (0.9937) loss_scale 8192.0000 (8192.0000) mem 10976MB
[2024-07-29 21:01:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][200/625] eta 0:06:44 lr 0.001932 wd 0.0500 time 0.2083 (0.9519) data time 0.0010 (0.0690) model time 0.2074 (0.8829) loss 3.0146 (3.9048) grad_norm 1.0974 (1.2180) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:01:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][210/625] eta 0:04:08 lr 0.001932 wd 0.0500 time 0.2110 (0.5991) data time 0.0010 (0.0366) model time 0.2100 (0.5625) loss 3.4077 (3.7388) grad_norm 0.7965 (1.1656) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:01:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][220/625] eta 0:03:11 lr 0.001931 wd 0.0500 time 0.2095 (0.4739) data time 0.0009 (0.0251) model time 0.2086 (0.4488) loss 2.8835 (3.7303) grad_norm 1.1124 (1.1389) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:01:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][230/625] eta 0:02:42 lr 0.001931 wd 0.0500 time 0.2110 (0.4102) data time 0.0010 (0.0193) model time 0.2100 (0.3910) loss 3.5215 (3.6869) grad_norm 1.3395 (1.1747) loss_scale 8192.0000 (8192.0000) mem 8977MB
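Each metric in these records is printed as `current (running average)`. The first record after the resume, `time 7.7482 (7.7482)` at `[53/300][190/625]`, suggests that the averages restart with the new process. The `loss_scale` figures later in this log are consistent with that reading. This is a minimal sketch, assuming a plain running-mean meter that is fresh at the resumed iteration 190. The `AverageMeter` class is a hypothetical re-implementation, not the script's own code. Feeding it a loss scale of 8192 for iterations 190 to 349 and 16384 from iteration 350 onward reproduces the logged averages exactly. Those values are 8242.8820 at iteration 350 and 8718.9708 at iteration 360, which indicates the AMP loss scale doubled at iteration 350.

```python
class AverageMeter:
    """Hypothetical running-mean meter: `val` is the latest value, `avg` the mean since creation."""

    def __init__(self) -> None:
        self.val = 0.0
        self.sum = 0.0
        self.count = 0
        self.avg = 0.0

    def update(self, val: float, n: int = 1) -> None:
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


# Assumption: a fresh meter when training resumes at iteration 190 of epoch 53.
loss_scale_meter = AverageMeter()
for it in range(190, 361):
    # Assumed loss-scale history: 8192 until it doubles at iteration 350.
    scale = 16384.0 if it >= 350 else 8192.0
    loss_scale_meter.update(scale)
    if it in (350, 360):
        print(f"[53/300][{it}/625] loss_scale {loss_scale_meter.val:.4f} ({loss_scale_meter.avg:.4f})")
```

The same arithmetic also matches the later averages (9142.4530 at iteration 370, 9521.5916 at 380), so a single doubling at iteration 350 accounts for all of them.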
[2024-07-29 21:01:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][240/625] eta 0:02:22 lr 0.001931 wd 0.0500 time 0.2080 (0.3709) data time 0.0007 (0.0157) model time 0.2073 (0.3552) loss 3.9306 (3.6794) grad_norm 1.4594 (1.2023) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:01:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][250/625] eta 0:02:09 lr 0.001931 wd 0.0500 time 0.2064 (0.3446) data time 0.0010 (0.0133) model time 0.2054 (0.3313) loss 3.5892 (3.6672) grad_norm 1.3160 (1.2245) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:01:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][260/625] eta 0:01:58 lr 0.001931 wd 0.0500 time 0.2122 (0.3257) data time 0.0010 (0.0116) model time 0.2112 (0.3141) loss 3.1359 (3.6061) grad_norm 1.2873 (1.2357) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:01:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][270/625] eta 0:01:50 lr 0.001931 wd 0.0500 time 0.2157 (0.3116) data time 0.0011 (0.0103) model time 0.2147 (0.3014) loss 3.0723 (3.5906) grad_norm 1.0583 (1.2149) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:01:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][280/625] eta 0:01:43 lr 0.001931 wd 0.0500 time 0.2144 (0.3008) data time 0.0007 (0.0092) model time 0.2137 (0.2915) loss 4.2237 (3.5897) grad_norm 1.4510 (1.2163) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:01:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][290/625] eta 0:01:37 lr 0.001931 wd 0.0500 time 0.2121 (0.2919) data time 0.0010 (0.0084) model time 0.2112 (0.2835) loss 3.7635 (3.5942) grad_norm 0.8986 (1.2111) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:01:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][300/625] eta 0:01:32 lr 0.001931 wd 0.0500 time 0.2157 (0.2847) data time 0.0009 (0.0078) model time 0.2149 (0.2770) loss 3.1102 (3.5895) grad_norm 1.0694 (1.2138) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:01:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][310/625] eta 0:01:27 lr 0.001931 wd 0.0500 time 0.2083 (0.2786) data time 0.0007 (0.0072) model time 0.2076 (0.2714) loss 2.4338 (3.5811) grad_norm 1.0539 (1.2309) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:01:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][320/625] eta 0:01:23 lr 0.001931 wd 0.0500 time 0.2122 (0.2737) data time 0.0011 (0.0067) model time 0.2111 (0.2669) loss 3.9490 (3.5743) grad_norm 1.4349 (1.2303) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:02:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][330/625] eta 0:01:19 lr 0.001931 wd 0.0500 time 0.2107 (0.2694) data time 0.0010 (0.0063) model time 0.2097 (0.2631) loss 3.8305 (3.5556) grad_norm 1.8242 (1.2608) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:02:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][340/625] eta 0:01:15 lr 0.001931 wd 0.0500 time 0.2270 (0.2658) data time 0.0012 (0.0060) model time 0.2259 (0.2598) loss 2.6569 (3.5433) grad_norm 1.2238 (1.2595) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:02:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][350/625] eta 0:01:12 lr 0.001931 wd 0.0500 time 0.2127 (0.2625) data time 0.0010 (0.0057) model time 0.2117 (0.2569) loss 3.7397 (3.5412) grad_norm 1.4984 (1.2630) loss_scale 16384.0000 (8242.8820) mem 8977MB
[2024-07-29 21:02:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][360/625] eta 0:01:08 lr 0.001931 wd 0.0500 time 0.2158 (0.2596) data time 0.0011 (0.0054) model time 0.2147 (0.2542) loss 3.4560 (3.5335) grad_norm 0.9585 (1.2587) loss_scale 16384.0000 (8718.9708) mem 8977MB
[2024-07-29 21:02:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][370/625] eta 0:01:05 lr 0.001931 wd 0.0500 time 0.2126 (0.2570) data time 0.0011 (0.0052) model time 0.2115 (0.2519) loss 3.6940 (3.5203) grad_norm 1.0544 (1.2622) loss_scale 16384.0000 (9142.4530) mem 8977MB
[2024-07-29 21:02:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][380/625] eta 0:01:02 lr 0.001930 wd 0.0500 time 0.2157 (0.2547) data time 0.0010 (0.0049) model time 0.2148 (0.2497) loss 3.0639 (3.5186) grad_norm 0.9141 (1.2678) loss_scale 16384.0000 (9521.5916) mem 8977MB
[2024-07-29 21:02:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][390/625] eta 0:00:59 lr 0.001930 wd 0.0500 time 0.2069 (0.2525) data time 0.0011 (0.0048) model time 0.2059 (0.2477) loss 3.3203 (3.5059) grad_norm 1.1839 (1.2549) loss_scale 16384.0000 (9863.0050) mem 8977MB
[2024-07-29 21:02:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][400/625] eta 0:00:56 lr 0.001930 wd 0.0500 time 0.2237 (0.2505) data time 0.0009 (0.0046) model time 0.2227 (0.2459) loss 3.7569 (3.4968) grad_norm 1.3375 (1.2548) loss_scale 16384.0000 (10172.0569) mem 8977MB
[2024-07-29 21:02:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][410/625] eta 0:00:53 lr 0.001930 wd 0.0500 time 0.2031 (0.2489) data time 0.0009 (0.0044) model time 0.2022 (0.2445) loss 3.7420 (3.4888) grad_norm 1.7113 (1.2586) loss_scale 16384.0000 (10453.1403) mem 8977MB
[2024-07-29 21:02:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][420/625] eta 0:00:50 lr 0.001930 wd 0.0500 time 0.2157 (0.2474) data time 0.0008 (0.0043) model time 0.2149 (0.2431) loss 2.2954 (3.4869) grad_norm 2.6779 (1.2775) loss_scale 16384.0000 (10709.8874) mem 8977MB
[2024-07-29 21:02:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][430/625] eta 0:00:47 lr 0.001930 wd 0.0500 time 0.2095 (0.2460) data time 0.0009 (0.0041) model time 0.2086 (0.2419) loss 3.6892 (3.4905) grad_norm 0.8386 (1.2738) loss_scale 16384.0000 (10945.3278) mem 8977MB
[2024-07-29 21:02:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][440/625] eta 0:00:45 lr 0.001930 wd 0.0500 time 0.2088 (0.2446) data time 0.0008 (0.0040) model time 0.2080 (0.2406) loss 3.4624 (3.4793) grad_norm 0.9321 (1.2742) loss_scale 16384.0000 (11162.0080) mem 8977MB
[2024-07-29 21:02:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][450/625] eta 0:00:42 lr 0.001930 wd 0.0500 time 0.2325 (0.2435) data time 0.0009 (0.0039) model time 0.2316 (0.2396) loss 3.2068 (3.4686) grad_norm 1.1406 (1.2650) loss_scale 16384.0000 (11362.0843) mem 8977MB
[2024-07-29 21:02:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][460/625] eta 0:00:40 lr 0.001930 wd 0.0500 time 0.2380 (0.2426) data time 0.0008 (0.0038) model time 0.2372 (0.2388) loss 3.9177 (3.4637) grad_norm 1.1153 (1.2610) loss_scale 16384.0000 (11547.3948) mem 8977MB
[2024-07-29 21:02:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][470/625] eta 0:00:37 lr 0.001930 wd 0.0500 time 0.2228 (0.2417) data time 0.0010 (0.0037) model time 0.2218 (0.2380) loss 3.4878 (3.4627) grad_norm 0.8473 (1.2567) loss_scale 16384.0000 (11719.5160) mem 8977MB
[2024-07-29 21:02:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][480/625] eta 0:00:34 lr 0.001930 wd 0.0500 time 0.2154 (0.2407) data time 0.0008 (0.0036) model time 0.2146 (0.2371) loss 2.5538 (3.4557) grad_norm 1.1279 (1.2519) loss_scale 16384.0000 (11879.8076) mem 8977MB
[2024-07-29 21:02:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][490/625] eta 0:00:32 lr 0.001930 wd 0.0500 time 0.2144 (0.2398) data time 0.0010 (0.0035) model time 0.2133 (0.2363) loss 3.0090 (3.4451) grad_norm 0.9301 (1.2550) loss_scale 16384.0000 (12029.4485) mem 8977MB
[2024-07-29 21:02:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][500/625] eta 0:00:29 lr 0.001930 wd 0.0500 time 0.2154 (0.2391) data time 0.0009 (0.0034) model time 0.2145 (0.2356) loss 3.5059 (3.4417) grad_norm 1.1514 (1.2563) loss_scale 16384.0000 (12169.4662) mem 8977MB
[2024-07-29 21:02:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][510/625] eta 0:00:27 lr 0.001930 wd 0.0500 time 0.2087 (0.2382) data time 0.0008 (0.0034) model time 0.2079 (0.2348) loss 4.5836 (3.4554) grad_norm 1.2619 (1.2567) loss_scale 16384.0000 (12300.7601) mem 8977MB
[2024-07-29 21:02:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][520/625] eta 0:00:24 lr 0.001930 wd 0.0500 time 0.2094 (0.2374) data time 0.0010 (0.0033) model time 0.2084 (0.2341) loss 2.4696 (3.4569) grad_norm 1.3707 (1.2605) loss_scale 16384.0000 (12424.1208) mem 8977MB
[2024-07-29 21:02:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][530/625] eta 0:00:22 lr 0.001929 wd 0.0500 time 0.2109 (0.2367) data time 0.0009 (0.0032) model time 0.2100 (0.2335) loss 3.7696 (3.4612) grad_norm 1.3533 (1.2622) loss_scale 16384.0000 (12540.2463) mem 8977MB
[2024-07-29 21:02:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][540/625] eta 0:00:20 lr 0.001929 wd 0.0500 time 0.2133 (0.2360) data time 0.0007 (0.0032) model time 0.2126 (0.2329) loss 3.9892 (3.4615) grad_norm 1.1883 (1.2582) loss_scale 16384.0000 (12649.7550) mem 8977MB
[2024-07-29 21:02:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][550/625] eta 0:00:17 lr 0.001929 wd 0.0500 time 0.2120 (0.2353) data time 0.0008 (0.0031) model time 0.2112 (0.2322) loss 3.2342 (3.4637) grad_norm 1.6889 (1.2580) loss_scale 16384.0000 (12753.1967) mem 8977MB
[2024-07-29 21:02:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][560/625] eta 0:00:15 lr 0.001929 wd 0.0500 time 0.2128 (0.2347) data time 0.0009 (0.0030) model time 0.2119 (0.2317) loss 2.7146 (3.4632) grad_norm 1.5071 (1.2613) loss_scale 16384.0000 (12851.0620) mem 8977MB
[2024-07-29 21:02:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][570/625] eta 0:00:12 lr 0.001929 wd 0.0500 time 0.2073 (0.2342) data time 0.0009 (0.0030) model time 0.2064 (0.2312) loss 2.2635 (3.4606) grad_norm 0.9631 (1.2537) loss_scale 16384.0000 (12943.7900) mem 8977MB
[2024-07-29 21:02:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][580/625] eta 0:00:10 lr 0.001929 wd 0.0500 time 0.2171 (0.2343) data time 0.0007 (0.0029) model time 0.2164 (0.2313) loss 4.2414 (3.4574) grad_norm 1.0210 (1.2534) loss_scale 16384.0000 (13031.7749) mem 8977MB
[2024-07-29 21:02:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][590/625] eta 0:00:08 lr 0.001929 wd 0.0500 time 0.2206 (0.2338) data time 0.0010 (0.0029) model time 0.2196 (0.2309) loss 4.0555 (3.4641) grad_norm 0.9454 (1.2470) loss_scale 16384.0000 (13115.3716) mem 8977MB
[2024-07-29 21:02:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][600/625] eta 0:00:05 lr 0.001929 wd 0.0500 time 0.2156 (0.2332) data time 0.0011 (0.0028) model time 0.2145 (0.2304) loss 3.9754 (3.4697) grad_norm 1.7881 (1.2586) loss_scale 16384.0000 (13194.9002) mem 8977MB
[2024-07-29 21:03:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][610/625] eta 0:00:03 lr 0.001929 wd 0.0500 time 0.2090 (0.2333) data time 0.0005 (0.0028) model time 0.2085 (0.2305) loss 4.0778 (3.4671) grad_norm 1.6295 (1.2586) loss_scale 16384.0000 (13270.6508) mem 8977MB
[2024-07-29 21:03:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][620/625] eta 0:00:01 lr 0.001929 wd 0.0500 time 0.2030 (0.2327) data time 0.0005 (0.0028) model time 0.2024 (0.2299) loss 3.7960 (3.4721) grad_norm 0.9275 (1.2566) loss_scale 16384.0000 (13342.8863) mem 8977MB
[2024-07-29 21:03:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 53 training takes 0:01:41
[2024-07-29 21:03:04 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 21:03:06 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 21:03:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.447 (0.447) Loss 0.8350 (0.8350) Acc@1 85.010 (85.010) Acc@5 97.070 (97.070) Mem 8977MB
[2024-07-29 21:03:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.094) Loss 1.3672 (1.0404) Acc@1 70.947 (79.550) Acc@5 91.504 (95.508) Mem 8977MB
[2024-07-29 21:03:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.4688 (1.2126) Acc@1 68.555 (75.567) Acc@5 90.234 (93.190) Mem 8977MB
[2024-07-29 21:03:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 75.382 Acc@5 93.176
[2024-07-29 21:03:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 75.4%
[2024-07-29 21:03:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.687 (0.687) Loss 0.6777 (0.6777) Acc@1 83.252 (83.252) Acc@5 96.533 (96.533) Mem 8977MB
[2024-07-29 21:03:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.059 (0.127) Loss 1.1963 (0.8880) Acc@1 70.508 (77.863) Acc@5 91.357 (94.735) Mem 8977MB
[2024-07-29 21:03:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 1.4365 (1.0811) Acc@1 65.234 (73.658) Acc@5 87.305 (92.062) Mem 8977MB
[2024-07-29 21:03:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 73.349 Acc@5 92.019
[2024-07-29 21:03:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 73.3%
[2024-07-29 21:03:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 73.35%
[2024-07-29 21:03:13 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 21:03:14 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 21:03:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][0/625] eta 0:20:39 lr 0.001929 wd 0.0500 time 1.9829 (1.9829) data time 0.4115 (0.4115) model time 0.0000 (0.0000) loss 3.1229 (3.1229) grad_norm 1.0078 (1.0078) loss_scale 16384.0000 (16384.0000) mem 8971MB
[2024-07-29 21:03:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][10/625] eta 0:03:48 lr 0.001929 wd 0.0500 time 0.2085 (0.3718) data time 0.0008 (0.0384) model time 0.0000 (0.0000) loss 3.1622 (3.4665) grad_norm 2.3541 (1.0906) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:03:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][20/625] eta 0:02:59 lr 0.001929 wd 0.0500 time 0.2107 (0.2966) data time 0.0007 (0.0206) model time 0.0000 (0.0000) loss 3.2626 (3.3851) grad_norm 1.1160 (1.0606) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:03:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][30/625] eta 0:02:40 lr 0.001929 wd 0.0500 time 0.2179 (0.2703) data time 0.0007 (0.0143) model time 0.0000 (0.0000) loss 2.6212 (3.3022) grad_norm 1.3802 (1.2030) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:03:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][40/625] eta 0:02:30 lr 0.001929 wd 0.0500 time 0.2141 (0.2565) data time 0.0009 (0.0112) model time 0.0000 (0.0000) loss 2.9241 (3.2689) grad_norm 1.3416 (1.2393) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:03:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][50/625] eta 0:02:22 lr 0.001929 wd 0.0500 time 0.2143 (0.2485) data time 0.0010 (0.0092) model time 0.0000 (0.0000) loss 3.4054 (3.3394) grad_norm 1.5806 (1.2600) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:03:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][60/625] eta 0:02:16 lr 0.001928 wd 0.0500 time 0.2114 (0.2423) data time 0.0009 (0.0079) model time 0.2105 (0.2096) loss 3.1867 (3.3472) grad_norm 1.2475 (1.2396) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:03:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][70/625] eta 0:02:12 lr 0.001928 wd 0.0500 time 0.2091 (0.2381) data time 0.0009 (0.0069) model time 0.2082 (0.2106) loss 3.6982 (3.3764) grad_norm 0.9863 (1.2151) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:03:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][80/625] eta 0:02:07 lr 0.001928 wd 0.0500 time 0.2102 (0.2348) data time 0.0010 (0.0062) model time 0.2091 (0.2105) loss 2.8041 (3.4175) grad_norm 1.6536 (1.1971) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:03:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][90/625] eta 0:02:04 lr 0.001928 wd 0.0500 time 0.2095 (0.2322) data time 0.0007 (0.0056) model time 0.2088 (0.2105) loss 1.9756 (3.3774) grad_norm 1.5407 (1.1890) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:03:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][100/625] eta 0:02:00 lr 0.001928 wd 0.0500 time 0.2221 (0.2304) data time 0.0010 (0.0052) model time 0.2211 (0.2109) loss 3.4018 (3.3807) grad_norm 1.5127 (1.2180) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:03:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][110/625] eta 0:01:57 lr 0.001928 wd 0.0500 time 0.2101 (0.2288) data time 0.0008 (0.0048) model time 0.2092 (0.2110) loss 2.7210 (3.3808) grad_norm 1.9483 (1.2113) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:03:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][120/625] eta 0:01:55 lr 0.001928 wd 0.0500 time 0.2241 (0.2278) data time 0.0010 (0.0045) model time 0.2231 (0.2118) loss 3.7850 (3.4066) grad_norm 1.1445 (1.2040) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:03:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][130/625] eta 0:01:52 lr 0.001928 wd 0.0500 time 0.2154 (0.2267) data time 0.0009 (0.0042) model time 0.2146 (0.2118) loss 2.7525 (3.4067) grad_norm 0.8560 (1.2045) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:03:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][140/625] eta 0:01:49 lr 0.001928 wd 0.0500 time 0.2061 (0.2256) data time 0.0011 (0.0040) model time 0.2050 (0.2116) loss 2.6280 (3.4091) grad_norm 1.1646 (1.2013) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:03:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][150/625] eta 0:01:47 lr 0.001928 wd 0.0500 time 0.2291 (0.2270) data time 0.0011 (0.0038) model time 0.2280 (0.2150) loss 3.5492 (3.4245) grad_norm 1.0590 (1.2123) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:03:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][160/625] eta 0:01:45 lr 0.001928 wd 0.0500 time 0.2093 (0.2261) data time 0.0008 (0.0036) model time 0.2086 (0.2147) loss 2.2611 (3.4276) grad_norm 1.1900 (1.2109) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:03:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][170/625] eta 0:01:42 lr 0.001928 wd 0.0500 time 0.2062 (0.2253) data time 0.0009 (0.0035) model time 0.2054 (0.2144) loss 3.0436 (3.4260) grad_norm 0.7774 (1.2097) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:03:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][180/625] eta 0:01:39 lr 0.001928 wd 0.0500 time 0.2090 (0.2245) data time 0.0010 (0.0034) model time 0.2080 (0.2141) loss 3.5159 (3.4305) grad_norm 1.1187 (1.2010) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:03:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][190/625] eta 0:01:37 lr 0.001928 wd 0.0500 time 0.2107 (0.2238) data time 0.0009 (0.0032) model time 0.2098 (0.2138) loss 2.7407 (3.4449) grad_norm 0.7992 (1.1916) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:03:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][200/625] eta 0:01:34 lr 0.001928 wd 0.0500 time 0.2156 (0.2232) data time 0.0007 (0.0031) model time 0.2149 (0.2136) loss 3.3261 (3.4528) grad_norm 0.9577 (1.1974) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][210/625] eta 0:01:32 lr 0.001927 wd 0.0500 time 0.2113 (0.2228) data time 0.0011 (0.0030) model time 0.2102 (0.2136) loss 3.6775 (3.4413) grad_norm 1.0386 (1.1938) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][220/625] eta 0:01:30 lr 0.001927 wd 0.0500 time 0.2113 (0.2223) data time 0.0009 (0.0029) model time 0.2103 (0.2134) loss 2.5266 (3.4400) grad_norm 1.2316 (1.1900) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][230/625] eta 0:01:27 lr 0.001927 wd 0.0500 time 0.2095 (0.2220) data time 0.0007 (0.0028) model time 0.2088 (0.2134) loss 4.1448 (3.4336) grad_norm 0.8295 (1.1863) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][240/625] eta 0:01:25 lr 0.001927 wd 0.0500 time 0.2142 (0.2216) data time 0.0009 (0.0028) model time 0.2133 (0.2133) loss 2.7040 (3.4466) grad_norm 0.8026 (1.1837) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][250/625] eta 0:01:22 lr 0.001927 wd 0.0500 time 0.2148 (0.2213) data time 0.0008 (0.0027) model time 0.2140 (0.2133) loss 3.4632 (3.4499) grad_norm 0.8367 (1.1808) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][260/625] eta 0:01:20 lr 0.001927 wd 0.0500 time 0.2136 (0.2210) data time 0.0008 (0.0026) model time 0.2128 (0.2132) loss 3.2532 (3.4435) grad_norm 1.6400 (1.1936) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][270/625] eta 0:01:18 lr 0.001927 wd 0.0500 time 0.2062 (0.2205) data time 0.0009 (0.0026) model time 0.2053 (0.2130) loss 4.2821 (3.4452) grad_norm 1.0576 (1.1974) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][280/625] eta 0:01:15 lr 0.001927 wd 0.0500 time 0.2129 (0.2202) data time 0.0009 (0.0025) model time 0.2119 (0.2129) loss 3.7375 (3.4334) grad_norm 1.1334 (1.1934) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][290/625] eta 0:01:13 lr 0.001927 wd 0.0500 time 0.2081 (0.2199) data time 0.0009 (0.0025) model time 0.2072 (0.2128) loss 3.9222 (3.4311) grad_norm 1.5765 (1.2083) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][300/625] eta 0:01:11 lr 0.001927 wd 0.0500 time 0.2194 (0.2198) data time 0.0008 (0.0025) model time 0.2186 (0.2128) loss 4.3185 (3.4402) grad_norm 1.0458 (1.2062) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][310/625] eta 0:01:09 lr 0.001927 wd 0.0500 time 0.2167 (0.2196) data time 0.0009 (0.0024) model time 0.2158 (0.2128) loss 4.1615 (3.4463) grad_norm 1.0030 (1.2099) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][320/625] eta 0:01:06 lr 0.001927 wd 0.0500 time 0.2141 (0.2193) data time 0.0007 (0.0024) model time 0.2133 (0.2127) loss 3.8262 (3.4404) grad_norm 0.9101 (1.2061) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][330/625] eta 0:01:04 lr 0.001927 wd 0.0500 time 0.2163 (0.2192) data time 0.0012 (0.0024) model time 0.2151 (0.2128) loss 4.0658 (3.4487) grad_norm 0.9740 (1.2048) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][340/625] eta 0:01:02 lr 0.001927 wd 0.0500 time 0.2123 (0.2191) data time 0.0010 (0.0023) model time 0.2113 (0.2128) loss 3.6481 (3.4533) grad_norm 1.5366 (1.2165) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][350/625] eta 0:01:00 lr 0.001926 wd 0.0500 time 0.2188 (0.2190) data time 0.0011 (0.0024) model time 0.2177 (0.2128) loss 3.2133 (3.4506) grad_norm 0.8646 (1.2138) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][360/625] eta 0:00:57 lr 0.001926 wd 0.0500 time 0.2061 (0.2188) data time 0.0008 (0.0024) model time 0.2053 (0.2127) loss 2.4272 (3.4478) grad_norm 0.9643 (1.2092) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][370/625] eta 0:00:55 lr 0.001926 wd 0.0500 time 0.2081 (0.2189) data time 0.0009 (0.0023) model time 0.2072 (0.2129) loss 2.2417 (3.4462) grad_norm 1.1620 (1.2142) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][380/625] eta 0:00:53 lr 0.001926 wd 0.0500 time 0.2129 (0.2187) data time 0.0008 (0.0023) model time 0.2121 (0.2128) loss 4.0694 (3.4417) grad_norm 0.9564 (1.2138) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][390/625] eta 0:00:51 lr 0.001926 wd 0.0500 time 0.2087 (0.2185) data time 0.0010 (0.0023) model time 0.2077 (0.2128) loss 2.3185 (3.4354) grad_norm 1.4352 (1.2161) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][400/625] eta 0:00:49 lr 0.001926 wd 0.0500 time 0.2188 (0.2184) data time 0.0007 (0.0022) model time 0.2181 (0.2128) loss 4.0873 (3.4368) grad_norm 0.9425 (1.2149) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][410/625] eta 0:00:46 lr 0.001926 wd 0.0500 time 0.2098 (0.2183) data time 0.0008 (0.0022) model time 0.2090 (0.2128) loss 4.1093 (3.4333) grad_norm 1.7474 (1.2172) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][420/625] eta 0:00:44 lr 0.001926 wd 0.0500 time 0.2118 (0.2182) data time 0.0009 (0.0022) model time 0.2109 (0.2128) loss 3.8359 (3.4347) grad_norm 1.3342 (1.2166) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][430/625] eta 0:00:42 lr 0.001926 wd 0.0500 time 0.2185 (0.2181) data time 0.0009 (0.0021) model time 0.2176 (0.2129) loss 3.6643 (3.4308) grad_norm 1.0585 (1.2168) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][440/625] eta 0:00:40 lr 0.001926 wd 0.0500 time 0.2093 (0.2181) data time 0.0011 (0.0021) model time 0.2083 (0.2129) loss 3.3271 (3.4339) grad_norm 0.8860 (1.2161) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][450/625] eta 0:00:38 lr 0.001926 wd 0.0500 time 0.2129 (0.2179) data time 0.0008 (0.0021) model time 0.2121 (0.2129) loss 4.1084 (3.4322) grad_norm 1.0852 (1.2130) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][460/625] eta 0:00:35 lr 0.001926 wd 0.0500 time 0.2278 (0.2179) data time 0.0010 (0.0021) model time 0.2268 (0.2129) loss 3.9229 (3.4311) grad_norm 1.8492 (1.2179) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][470/625] eta 0:00:33 lr 0.001926 wd 0.0500 time 0.2058 (0.2177) data time 0.0010 (0.0020) model time 0.2048 (0.2128) loss 3.6777 (3.4341) grad_norm 1.7953 (1.2201) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:04:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][480/625] eta 0:00:31 lr 0.001926 wd 0.0500 time 0.2097 (0.2176) data time 0.0009 (0.0020) model time 0.2088 (0.2128) loss 4.0147 (3.4322) grad_norm 1.0067 (1.2233) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:05:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][490/625] eta 0:00:29 lr 0.001926 wd 0.0500 time 0.2154 (0.2175) data time 0.0010 (0.0020) model time 0.2145 (0.2128) loss 3.9504 (3.4383) grad_norm 0.9347 (1.2233) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:05:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][500/625] eta 0:00:27 lr 0.001925 wd 0.0500 time 0.2086 (0.2174) data time 0.0009 (0.0020) model time 0.2077 (0.2127) loss 2.2328 (3.4374) grad_norm 1.0816 (1.2237) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:05:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][510/625] eta 0:00:25 lr 0.001925 wd 0.0500 time 0.2118 (0.2175) data time 0.0009 (0.0020) model time 0.2109 (0.2128) loss 3.5142 (3.4350) grad_norm 0.8631 (1.2239) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:05:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][520/625] eta 0:00:22 lr 0.001925 wd 0.0500 time 0.2139 (0.2174) data time 0.0007 (0.0020) model time 0.2132 (0.2128) loss 3.7015 (3.4294) grad_norm 2.5474 (1.2275) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:05:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][530/625] eta 0:00:20 lr 0.001925 wd 0.0500 time 0.2106 (0.2173) data time 0.0010 (0.0020) model time 0.2095 (0.2128) loss 4.1540 (3.4333) grad_norm 1.1953 (1.2268) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:05:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][540/625] eta 0:00:18 lr 0.001925 wd 0.0500 time 0.2239 (0.2172) data time 0.0011 (0.0020) model time 0.2229 (0.2128) loss 4.0035 (3.4418) grad_norm 1.1136 (1.2252) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:05:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][550/625] eta 0:00:16 lr 0.001925 wd 0.0500 time 0.2089 (0.2171) data time 0.0010 (0.0020) model time 0.2080 (0.2127) loss 2.4133 (3.4377) grad_norm 1.4682 (1.2230) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:05:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][560/625] eta 0:00:14 lr 0.001925 wd 0.0500 time 0.2077 (0.2171) data time 0.0010 (0.0020) model time 0.2066 (0.2127) loss 3.9667 (3.4394) grad_norm 0.8523 (1.2224) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:05:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][570/625] eta 0:00:11 lr 0.001925 wd 0.0500 time 0.2159 (0.2171) data time 0.0010 (0.0019) model time 0.2149 (0.2127) loss 3.8879 (3.4376) grad_norm 1.9671 (1.2294) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:05:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][580/625] eta 0:00:09 lr 0.001925 wd 0.0500 time 0.2168 (0.2170) data time 0.0007 (0.0019) model time 0.2161 (0.2127) loss 3.1517 (3.4442) grad_norm 0.7904 (1.2273) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:05:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][590/625] eta 0:00:07 lr 0.001925 wd 0.0500 time 0.2106 (0.2169) data time 0.0010 (0.0019) model time 0.2096 (0.2127) loss 3.0862 (3.4370) grad_norm 0.9309 (1.2257) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:05:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][600/625] eta 0:00:05 lr 0.001925 wd 0.0500 time 0.2159 (0.2168) data time 0.0007 (0.0019) model time 0.2151 (0.2127) loss 3.8681 (3.4369) grad_norm 0.9487 (1.2241) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:05:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][610/625] eta 0:00:03 lr 0.001925 wd 0.0500 time 0.2088 (0.2167) data time 0.0007 (0.0019) model time 0.2081 (0.2126) loss 3.6862 (3.4396) grad_norm 0.9825 (1.2204) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:05:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][620/625] eta 0:00:01 lr 0.001925 wd 0.0500 time 0.2069 (0.2166) data time 0.0007 (0.0019) model time 0.2062 (0.2125) loss 3.6054 (3.4406) grad_norm 1.2495 (1.2168) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:05:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 54 training takes 0:02:15
[2024-07-29 21:05:30 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 21:05:30 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 21:05:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.578 (0.578) Loss 0.8877 (0.8877) Acc@1 83.887 (83.887) Acc@5 97.021 (97.021) Mem 8975MB
[2024-07-29 21:05:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.110) Loss 1.3320 (1.0243) Acc@1 70.361 (79.750) Acc@5 91.846 (95.552) Mem 8975MB
[2024-07-29 21:05:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.084) Loss 1.4883 (1.1996) Acc@1 67.871 (75.556) Acc@5 89.941 (93.180) Mem 8975MB
[2024-07-29 21:05:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 75.368 Acc@5 93.148
[2024-07-29 21:05:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 75.4%
[2024-07-29 21:05:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.763 (0.763) Loss 0.6597 (0.6597) Acc@1 83.545 (83.545) Acc@5 96.777 (96.777) Mem 8975MB
[2024-07-29 21:05:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.132) Loss 1.1719 (0.8666) Acc@1 70.996 (78.338) Acc@5 91.602 (94.895) Mem 8975MB
[2024-07-29 21:05:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.095) Loss 1.4072 (1.0573) Acc@1 65.674 (74.154) Acc@5 87.598 (92.336) Mem 8975MB
[2024-07-29 21:05:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 73.860 Acc@5 92.312
[2024-07-29 21:05:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 73.9%
[2024-07-29 21:05:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 73.86%
[2024-07-29 21:05:34 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 21:05:36 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 21:05:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][0/625] eta 0:06:14 lr 0.001925 wd 0.0500 time 0.5997 (0.5997) data time 0.3920 (0.3920) model time 0.0000 (0.0000) loss 2.6863 (2.6863) grad_norm 1.0328 (1.0328) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:05:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][10/625] eta 0:02:33 lr 0.001925 wd 0.0500 time 0.2298 (0.2497) data time 0.0011 (0.0366) model time 0.0000 (0.0000) loss 4.3995 (3.2968) grad_norm 2.1704 (1.3040) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:05:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][20/625] eta 0:02:21 lr 0.001925 wd 0.0500 time 0.2172 (0.2334) data time 0.0010 (0.0197) model time 0.0000 (0.0000) loss 2.0938 (3.2952) grad_norm 1.2248 (1.2448) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:05:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][30/625] eta 0:02:18 lr 0.001924 wd 0.0500 time 0.2097 (0.2334) data time 0.0010 (0.0136) model time 0.0000 (0.0000) loss 3.6359 (3.2740) grad_norm 0.9652 (1.1917) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:05:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][40/625] eta 0:02:13 lr 0.001924 wd 0.0500 time 0.2085 (0.2279) data time 0.0009 (0.0105) model time 0.0000 (0.0000)
loss 3.6157 (3.2904) grad_norm 1.6280 (1.1987) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:05:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][50/625] eta 0:02:09 lr 0.001924 wd 0.0500 time 0.2144 (0.2249) data time 0.0007 (0.0087) model time 0.0000 (0.0000) loss 3.9425 (3.3530) grad_norm 1.0651 (1.1851) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:05:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][60/625] eta 0:02:05 lr 0.001924 wd 0.0500 time 0.2164 (0.2227) data time 0.0009 (0.0074) model time 0.2155 (0.2100) loss 2.4130 (3.3974) grad_norm 1.3890 (1.1885) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:05:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][70/625] eta 0:02:02 lr 0.001924 wd 0.0500 time 0.2159 (0.2209) data time 0.0007 (0.0065) model time 0.2152 (0.2096) loss 4.0070 (3.4149) grad_norm 1.1090 (1.2049) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:05:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][80/625] eta 0:01:59 lr 0.001924 wd 0.0500 time 0.2120 (0.2197) data time 0.0010 (0.0059) model time 0.2110 (0.2097) loss 2.5175 (3.3967) grad_norm 1.5214 (1.2212) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:05:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][90/625] eta 0:01:57 lr 0.001924 wd 0.0500 time 0.2060 (0.2188) data time 0.0010 (0.0053) model time 0.2050 (0.2098) loss 3.5237 (3.3970) grad_norm 1.4841 (1.2240) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:05:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][100/625] eta 0:01:54 lr 0.001924 wd 0.0500 time 0.2114 (0.2181) data time 0.0007 (0.0049) model time 0.2106 (0.2101) loss 4.2620 (3.4087) grad_norm 0.9687 (1.2306) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:06:00 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [55/300][110/625] eta 0:01:52 lr 0.001924 wd 0.0500 time 0.2139 (0.2176) data time 0.0010 (0.0046) model time 0.2129 (0.2103) loss 3.5402 (3.4187) grad_norm 1.6446 (1.2218) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:06:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][120/625] eta 0:01:49 lr 0.001924 wd 0.0500 time 0.2147 (0.2174) data time 0.0010 (0.0043) model time 0.2137 (0.2109) loss 3.6254 (3.4656) grad_norm 1.1683 (1.2345) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:06:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][130/625] eta 0:01:47 lr 0.001924 wd 0.0500 time 0.2095 (0.2168) data time 0.0011 (0.0040) model time 0.2084 (0.2106) loss 2.5895 (3.4802) grad_norm 1.1479 (1.2623) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:06:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][140/625] eta 0:01:44 lr 0.001924 wd 0.0500 time 0.2085 (0.2165) data time 0.0009 (0.0038) model time 0.2076 (0.2106) loss 3.8401 (3.4860) grad_norm 0.8906 (1.2645) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:06:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][150/625] eta 0:01:42 lr 0.001924 wd 0.0500 time 0.2422 (0.2163) data time 0.0008 (0.0036) model time 0.2414 (0.2108) loss 2.6745 (3.4779) grad_norm 0.9062 (1.2509) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:06:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][160/625] eta 0:01:41 lr 0.001924 wd 0.0500 time 0.2213 (0.2176) data time 0.0009 (0.0035) model time 0.2204 (0.2131) loss 2.6227 (3.4726) grad_norm 1.1950 (1.2496) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:06:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][170/625] eta 0:01:38 lr 0.001923 wd 0.0500 time 0.2217 (0.2172) data time 0.0010 (0.0033) model time 
0.2207 (0.2129) loss 3.6910 (3.4758) grad_norm 0.9174 (1.2447) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:06:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][180/625] eta 0:01:36 lr 0.001923 wd 0.0500 time 0.2172 (0.2170) data time 0.0009 (0.0032) model time 0.2163 (0.2128) loss 3.9762 (3.4677) grad_norm 1.7607 (1.2635) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:06:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][190/625] eta 0:01:34 lr 0.001923 wd 0.0500 time 0.2098 (0.2166) data time 0.0007 (0.0031) model time 0.2091 (0.2126) loss 3.6555 (3.4563) grad_norm 1.0435 (1.2674) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:06:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][200/625] eta 0:01:31 lr 0.001923 wd 0.0500 time 0.2151 (0.2164) data time 0.0009 (0.0030) model time 0.2142 (0.2125) loss 3.3366 (3.4658) grad_norm 0.7813 (1.2517) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:06:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][210/625] eta 0:01:29 lr 0.001923 wd 0.0500 time 0.2094 (0.2161) data time 0.0011 (0.0029) model time 0.2083 (0.2123) loss 2.8685 (3.4767) grad_norm 1.2699 (1.2451) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:06:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][220/625] eta 0:01:27 lr 0.001923 wd 0.0500 time 0.2150 (0.2159) data time 0.0009 (0.0028) model time 0.2141 (0.2122) loss 3.8514 (3.4887) grad_norm 0.7525 (1.2392) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:06:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][230/625] eta 0:01:25 lr 0.001923 wd 0.0500 time 0.2184 (0.2158) data time 0.0008 (0.0027) model time 0.2176 (0.2122) loss 4.1315 (3.4991) grad_norm 1.1041 (1.2358) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:06:28 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][240/625] eta 0:01:22 lr 0.001923 wd 0.0500 time 0.2101 (0.2156) data time 0.0008 (0.0027) model time 0.2093 (0.2120) loss 3.8878 (3.4944) grad_norm 1.1224 (1.2340) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:06:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 21:06:30 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 21:06:30 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 21:10:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 21:10:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 21:10:38 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 21:10:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 21:10:50 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... 
[2024-07-29 21:10:50 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 21:10:50 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 21:10:50 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 55) [2024-07-29 21:10:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 21:11:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][250/625] eta 0:25:15 lr 0.001923 wd 0.0500 time 0.7415 (4.0426) data time 0.0007 (0.3945) model time 0.7408 (3.6481) loss 3.9586 (4.1506) grad_norm 0.7983 (0.9590) loss_scale 16384.0000 (16384.0000) mem 8976MB [2024-07-29 21:11:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][260/625] eta 0:05:06 lr 0.001923 wd 0.0500 time 0.1973 (0.8411) data time 0.0006 (0.0665) model time 0.1968 (0.7746) loss 2.6525 (3.7539) grad_norm 0.9369 (1.0398) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 21:11:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][270/625] eta 0:03:15 lr 0.001923 wd 0.0500 time 0.1985 (0.5498) data time 0.0008 (0.0368) model time 0.1977 (0.5130) loss 3.8044 (3.6968) grad_norm 0.9858 (1.0671) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 21:11:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][280/625] eta 0:02:31 lr 0.001923 wd 0.0500 time 0.1961 (0.4404) data time 0.0007 (0.0256) model time 0.1954 (0.4148) loss 3.7618 (3.7202) grad_norm 1.1251 (1.1197) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 21:11:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][290/625] eta 0:02:08 lr 0.001923 wd 0.0500 time 0.2055 (0.3833) data time 0.0009 (0.0197) model time 0.2046 (0.3636) loss 3.6125 (3.6668) grad_norm 0.7627 (1.1673) loss_scale 16384.0000 (16384.0000) mem 8977MB 
[2024-07-29 21:11:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][300/625] eta 0:01:53 lr 0.001923 wd 0.0500 time 0.2040 (0.3485) data time 0.0006 (0.0161) model time 0.2033 (0.3325) loss 3.6484 (3.6670) grad_norm 1.1784 (1.1648) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 21:11:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][310/625] eta 0:01:42 lr 0.001923 wd 0.0500 time 0.1994 (0.3247) data time 0.0006 (0.0136) model time 0.1988 (0.3111) loss 4.3096 (3.6201) grad_norm 1.2053 (1.1655) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 21:11:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][320/625] eta 0:01:33 lr 0.001922 wd 0.0500 time 0.1942 (0.3072) data time 0.0011 (0.0119) model time 0.1931 (0.2954) loss 3.1007 (3.5589) grad_norm 1.5056 (1.1773) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 21:11:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][330/625] eta 0:01:26 lr 0.001922 wd 0.0500 time 0.1973 (0.2940) data time 0.0009 (0.0105) model time 0.1963 (0.2835) loss 3.3721 (3.5470) grad_norm 1.3873 (1.2527) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 21:11:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][340/625] eta 0:01:20 lr 0.001922 wd 0.0500 time 0.1967 (0.2837) data time 0.0006 (0.0095) model time 0.1960 (0.2742) loss 2.4469 (3.5283) grad_norm 1.1702 (1.2511) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 21:11:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][350/625] eta 0:01:15 lr 0.001922 wd 0.0500 time 0.1978 (0.2755) data time 0.0006 (0.0086) model time 0.1972 (0.2669) loss 4.1082 (3.5549) grad_norm 0.8717 (1.2541) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 21:11:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][360/625] eta 0:01:11 lr 0.001922 wd 0.0500 time 
0.1974 (0.2687) data time 0.0009 (0.0080) model time 0.1965 (0.2608) loss 3.9635 (3.5514) grad_norm 1.0394 (1.2537) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 21:11:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][370/625] eta 0:01:07 lr 0.001922 wd 0.0500 time 0.1987 (0.2633) data time 0.0007 (0.0074) model time 0.1980 (0.2560) loss 4.0024 (3.5527) grad_norm 0.8186 (1.2339) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 21:11:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][380/625] eta 0:01:03 lr 0.001922 wd 0.0500 time 0.1963 (0.2584) data time 0.0008 (0.0069) model time 0.1955 (0.2515) loss 3.6030 (3.5384) grad_norm 0.9672 (1.2309) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 21:11:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][390/625] eta 0:00:59 lr 0.001922 wd 0.0500 time 0.1994 (0.2542) data time 0.0007 (0.0065) model time 0.1987 (0.2477) loss 3.3522 (3.5253) grad_norm 1.1135 (1.2266) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 21:11:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][400/625] eta 0:00:56 lr 0.001922 wd 0.0500 time 0.1989 (0.2506) data time 0.0008 (0.0061) model time 0.1981 (0.2444) loss 3.7656 (3.5234) grad_norm 1.5778 (1.2289) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 21:11:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][410/625] eta 0:00:53 lr 0.001922 wd 0.0500 time 0.2005 (0.2475) data time 0.0008 (0.0058) model time 0.1997 (0.2417) loss 3.5405 (3.5270) grad_norm 1.7340 (1.2273) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 21:11:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][420/625] eta 0:00:50 lr 0.001922 wd 0.0500 time 0.1998 (0.2448) data time 0.0008 (0.0055) model time 0.1990 (0.2392) loss 3.3932 (3.5188) grad_norm 1.1871 (1.2347) loss_scale 16384.0000 (16384.0000) 
mem 8977MB [2024-07-29 21:11:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][430/625] eta 0:00:47 lr 0.001922 wd 0.0500 time 0.1968 (0.2424) data time 0.0009 (0.0053) model time 0.1958 (0.2371) loss 3.9003 (3.5116) grad_norm 2.4410 (1.2419) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 21:11:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][440/625] eta 0:00:44 lr 0.001922 wd 0.0500 time 0.2027 (0.2402) data time 0.0008 (0.0051) model time 0.2019 (0.2351) loss 4.2112 (3.5110) grad_norm 1.3296 (1.2426) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 21:11:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][450/625] eta 0:00:41 lr 0.001922 wd 0.0500 time 0.2027 (0.2382) data time 0.0006 (0.0048) model time 0.2021 (0.2334) loss 3.6935 (3.4977) grad_norm 1.3135 (1.2411) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 21:11:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][460/625] eta 0:00:39 lr 0.001921 wd 0.0500 time 0.1994 (0.2364) data time 0.0009 (0.0047) model time 0.1985 (0.2318) loss 3.1556 (3.4856) grad_norm 1.1238 (1.2355) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 21:11:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][470/625] eta 0:00:36 lr 0.001921 wd 0.0500 time 0.1970 (0.2348) data time 0.0007 (0.0045) model time 0.1963 (0.2303) loss 4.1937 (3.4837) grad_norm 1.5062 (1.2372) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 21:11:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][480/625] eta 0:00:33 lr 0.001921 wd 0.0500 time 0.2022 (0.2333) data time 0.0008 (0.0044) model time 0.2014 (0.2289) loss 3.3627 (3.4795) grad_norm 1.0667 (1.2295) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 21:11:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][490/625] eta 0:00:31 lr 0.001921 wd 0.0500 
time 0.2006 (0.2319) data time 0.0006 (0.0042) model time 0.2000 (0.2277) loss 4.2743 (3.4771) grad_norm 0.9035 (1.2314) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 21:11:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][500/625] eta 0:00:28 lr 0.001921 wd 0.0500 time 0.1977 (0.2307) data time 0.0006 (0.0041) model time 0.1970 (0.2266) loss 4.0134 (3.4665) grad_norm 1.1163 (1.2315) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 21:11:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][510/625] eta 0:00:26 lr 0.001921 wd 0.0500 time 0.1984 (0.2295) data time 0.0009 (0.0040) model time 0.1976 (0.2255) loss 3.8467 (3.4548) grad_norm 1.8599 (1.2383) loss_scale 16384.0000 (16384.0000) mem 8977MB [2024-07-29 21:11:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][520/625] eta 0:00:23 lr 0.001921 wd 0.0500 time 0.2007 (0.2284) data time 0.0009 (0.0038) model time 0.1998 (0.2246) loss 3.1190 (3.4494) grad_norm 1.9507 (inf) loss_scale 8192.0000 (16173.1765) mem 8977MB [2024-07-29 21:11:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][530/625] eta 0:00:21 lr 0.001921 wd 0.0500 time 0.1996 (0.2274) data time 0.0009 (0.0037) model time 0.1987 (0.2237) loss 3.1216 (3.4543) grad_norm 0.9563 (inf) loss_scale 8192.0000 (15890.1560) mem 8977MB [2024-07-29 21:11:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][540/625] eta 0:00:19 lr 0.001921 wd 0.0500 time 0.1990 (0.2265) data time 0.0009 (0.0037) model time 0.1980 (0.2228) loss 3.5088 (3.4509) grad_norm 1.0397 (inf) loss_scale 8192.0000 (15626.5205) mem 8977MB [2024-07-29 21:12:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][550/625] eta 0:00:16 lr 0.001921 wd 0.0500 time 0.1963 (0.2256) data time 0.0009 (0.0036) model time 0.1955 (0.2220) loss 3.6829 (3.4463) grad_norm 1.5216 (inf) loss_scale 8192.0000 (15380.3444) mem 8977MB 
[2024-07-29 21:12:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][560/625] eta 0:00:14 lr 0.001921 wd 0.0500 time 0.2028 (0.2248) data time 0.0009 (0.0035) model time 0.2019 (0.2213) loss 3.3375 (3.4423) grad_norm 1.0306 (inf) loss_scale 8192.0000 (15149.9487) mem 8977MB [2024-07-29 21:12:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][570/625] eta 0:00:12 lr 0.001921 wd 0.0500 time 0.2002 (0.2240) data time 0.0007 (0.0034) model time 0.1995 (0.2206) loss 3.5226 (3.4545) grad_norm 1.2223 (inf) loss_scale 8192.0000 (14933.8634) mem 8977MB [2024-07-29 21:12:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][580/625] eta 0:00:10 lr 0.001921 wd 0.0500 time 0.1981 (0.2233) data time 0.0008 (0.0033) model time 0.1973 (0.2200) loss 3.7964 (3.4545) grad_norm 1.7523 (inf) loss_scale 8192.0000 (14730.7952) mem 8977MB [2024-07-29 21:12:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][590/625] eta 0:00:07 lr 0.001921 wd 0.0500 time 0.1997 (0.2226) data time 0.0007 (0.0033) model time 0.1990 (0.2194) loss 3.6206 (3.4565) grad_norm 1.2619 (inf) loss_scale 8192.0000 (14539.6023) mem 8977MB [2024-07-29 21:12:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][600/625] eta 0:00:05 lr 0.001921 wd 0.0500 time 0.1985 (0.2221) data time 0.0007 (0.0032) model time 0.1978 (0.2189) loss 4.0220 (3.4591) grad_norm 0.9693 (inf) loss_scale 8192.0000 (14359.2727) mem 8977MB [2024-07-29 21:12:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][610/625] eta 0:00:03 lr 0.001920 wd 0.0500 time 0.1976 (0.2216) data time 0.0004 (0.0031) model time 0.1972 (0.2184) loss 4.1107 (3.4608) grad_norm 1.7799 (inf) loss_scale 8192.0000 (14188.9061) mem 8977MB [2024-07-29 21:12:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][620/625] eta 0:00:01 lr 0.001920 wd 0.0500 time 0.1995 (0.2210) data time 
0.0005 (0.0031) model time 0.1990 (0.2179) loss 3.7280 (3.4573) grad_norm 0.9044 (inf) loss_scale 8192.0000 (14027.6989) mem 8977MB [2024-07-29 21:12:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 55 training takes 0:01:23 [2024-07-29 21:12:16 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 21:12:18 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 21:12:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.426 (0.426) Loss 0.8657 (0.8657) Acc@1 84.473 (84.473) Acc@5 97.412 (97.412) Mem 8977MB [2024-07-29 21:12:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.091) Loss 1.4043 (1.0330) Acc@1 70.459 (80.158) Acc@5 91.895 (95.854) Mem 8977MB [2024-07-29 21:12:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.074) Loss 1.4307 (1.2156) Acc@1 70.166 (76.077) Acc@5 91.650 (93.497) Mem 8977MB [2024-07-29 21:12:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 75.754 Acc@5 93.364 [2024-07-29 21:12:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 75.8% [2024-07-29 21:12:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 75.75% [2024-07-29 21:12:21 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 21:12:23 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-29 21:12:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.387 (0.387) Loss 0.6440 (0.6440) Acc@1 84.082 (84.082) Acc@5 96.826 (96.826) Mem 8977MB [2024-07-29 21:12:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.085) Loss 1.1484 (0.8473) Acc@1 71.680 (78.893) Acc@5 91.748 (95.095) Mem 8977MB [2024-07-29 21:12:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.071) Loss 1.3818 (1.0358) Acc@1 66.162 (74.630) Acc@5 88.086 (92.590) Mem 8977MB [2024-07-29 21:12:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 74.324 Acc@5 92.580 [2024-07-29 21:12:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 74.3% [2024-07-29 21:12:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 74.32% [2024-07-29 21:12:25 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 21:12:26 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-29 21:12:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][0/625] eta 0:10:23 lr 0.001920 wd 0.0500 time 0.9984 (0.9984) data time 0.5834 (0.5834) model time 0.0000 (0.0000) loss 3.4486 (3.4486) grad_norm 1.0243 (1.0243) loss_scale 8192.0000 (8192.0000) mem 8971MB [2024-07-29 21:12:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][10/625] eta 0:02:47 lr 0.001920 wd 0.0500 time 0.2009 (0.2721) data time 0.0006 (0.0538) model time 0.0000 (0.0000) loss 3.8810 (3.3675) grad_norm 1.0399 (1.5454) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:12:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][20/625] eta 0:02:23 lr 0.001920 wd 0.0500 time 0.1986 (0.2376) data time 0.0006 (0.0286) model time 0.0000 (0.0000) loss 3.2553 (3.4278) grad_norm 1.4958 (1.3540) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:12:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][30/625] eta 0:02:14 lr 0.001920 wd 0.0500 time 0.2007 (0.2258) data time 0.0008 (0.0197) model time 0.0000 (0.0000) loss 3.3156 (3.4804) grad_norm 1.7267 (1.3232) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:12:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][40/625] eta 0:02:11 lr 0.001920 wd 0.0500 time 0.1964 (0.2243) data time 0.0007 (0.0151) model time 0.0000 (0.0000) loss 2.2116 (3.4501) grad_norm 0.9361 (1.2675) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:12:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][50/625] eta 0:02:06 lr 0.001920 wd 0.0500 time 0.2047 (0.2198) data time 0.0008 (0.0123) model time 0.0000 (0.0000) loss 3.9445 (3.4895) grad_norm 0.8556 (1.2632) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:12:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][60/625] eta 0:02:02 lr 0.001920 wd 0.0500 time 0.2000 (0.2164) data time 
0.0008 (0.0105) model time 0.1992 (0.1983) loss 3.9812 (3.5397) grad_norm 0.9439 (1.2400) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:12:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][70/625] eta 0:01:58 lr 0.001920 wd 0.0500 time 0.1977 (0.2142) data time 0.0007 (0.0091) model time 0.1969 (0.1991) loss 2.6195 (3.5234) grad_norm 0.7872 (1.2340) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:12:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][80/625] eta 0:01:55 lr 0.001920 wd 0.0500 time 0.2034 (0.2124) data time 0.0008 (0.0081) model time 0.2026 (0.1991) loss 2.8899 (3.4896) grad_norm 1.2234 (1.2085) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:12:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][90/625] eta 0:01:52 lr 0.001920 wd 0.0500 time 0.2019 (0.2110) data time 0.0008 (0.0073) model time 0.2011 (0.1990) loss 3.0105 (3.4743) grad_norm 1.3658 (1.2154) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:12:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][100/625] eta 0:01:50 lr 0.001920 wd 0.0500 time 0.2087 (0.2101) data time 0.0006 (0.0067) model time 0.2081 (0.1994) loss 4.4962 (3.4585) grad_norm 1.7593 (1.2037) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:12:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][110/625] eta 0:01:47 lr 0.001920 wd 0.0500 time 0.1998 (0.2092) data time 0.0007 (0.0061) model time 0.1991 (0.1992) loss 3.4539 (3.4661) grad_norm 0.9535 (1.1977) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:12:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][120/625] eta 0:01:45 lr 0.001919 wd 0.0500 time 0.2004 (0.2086) data time 0.0006 (0.0057) model time 0.1999 (0.1996) loss 3.5839 (3.4660) grad_norm 1.4241 (1.1980) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:12:53 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][130/625] eta 0:01:42 lr 0.001919 wd 0.0500 time 0.1988 (0.2078) data time 0.0008 (0.0053) model time 0.1980 (0.1994) loss 3.6773 (3.4592) grad_norm 1.1120 (1.1924) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:12:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][140/625] eta 0:01:40 lr 0.001919 wd 0.0500 time 0.1982 (0.2074) data time 0.0008 (0.0050) model time 0.1973 (0.1995) loss 3.5158 (3.4875) grad_norm 2.0089 (1.1980) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:12:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][150/625] eta 0:01:38 lr 0.001919 wd 0.0500 time 0.2078 (0.2073) data time 0.0007 (0.0047) model time 0.2071 (0.2000) loss 3.9104 (3.4561) grad_norm 0.9393 (1.2197) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:12:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][160/625] eta 0:01:36 lr 0.001919 wd 0.0500 time 0.2011 (0.2069) data time 0.0006 (0.0045) model time 0.2005 (0.2000) loss 2.3299 (3.4412) grad_norm 1.4888 (1.2141) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][170/625] eta 0:01:33 lr 0.001919 wd 0.0500 time 0.1986 (0.2065) data time 0.0006 (0.0043) model time 0.1980 (0.2000) loss 3.7414 (3.4421) grad_norm 1.2393 (1.2106) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][180/625] eta 0:01:31 lr 0.001919 wd 0.0500 time 0.1984 (0.2064) data time 0.0006 (0.0041) model time 0.1978 (0.2002) loss 2.4487 (3.4528) grad_norm 1.7108 (1.2133) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][190/625] eta 0:01:30 lr 0.001919 wd 0.0500 time 0.1995 (0.2074) data time 0.0008 
(0.0039) model time 0.1986 (0.2020) loss 3.6333 (3.4610) grad_norm 0.7819 (1.2123) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][200/625] eta 0:01:28 lr 0.001919 wd 0.0500 time 0.2017 (0.2073) data time 0.0006 (0.0039) model time 0.2011 (0.2021) loss 4.0198 (3.4595) grad_norm 1.2211 (1.2220) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][210/625] eta 0:01:25 lr 0.001919 wd 0.0500 time 0.1962 (0.2071) data time 0.0006 (0.0037) model time 0.1955 (0.2020) loss 4.0485 (3.4680) grad_norm 0.9626 (1.2214) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][220/625] eta 0:01:23 lr 0.001919 wd 0.0500 time 0.1987 (0.2068) data time 0.0006 (0.0036) model time 0.1981 (0.2019) loss 4.0564 (3.4694) grad_norm 1.1454 (1.2276) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][230/625] eta 0:01:21 lr 0.001919 wd 0.0500 time 0.1976 (0.2065) data time 0.0007 (0.0035) model time 0.1969 (0.2017) loss 3.9481 (3.4644) grad_norm 1.0103 (1.2258) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][240/625] eta 0:01:19 lr 0.001919 wd 0.0500 time 0.1977 (0.2063) data time 0.0009 (0.0034) model time 0.1968 (0.2016) loss 4.0407 (3.4654) grad_norm 1.3042 (1.2229) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][250/625] eta 0:01:17 lr 0.001919 wd 0.0500 time 0.2008 (0.2060) data time 0.0008 (0.0033) model time 0.2000 (0.2016) loss 3.3704 (3.4715) grad_norm 0.9213 (1.2178) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:19 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][260/625] eta 0:01:15 lr 0.001919 wd 0.0500 time 0.2162 (0.2059) data time 0.0006 (0.0032) model time 0.2156 (0.2016) loss 3.5634 (3.4714) grad_norm 0.9911 (1.2116) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][270/625] eta 0:01:13 lr 0.001918 wd 0.0500 time 0.1970 (0.2059) data time 0.0008 (0.0031) model time 0.1962 (0.2017) loss 3.6277 (3.4644) grad_norm 1.0237 (1.2187) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][280/625] eta 0:01:11 lr 0.001918 wd 0.0500 time 0.2024 (0.2063) data time 0.0008 (0.0032) model time 0.2016 (0.2021) loss 2.0461 (3.4588) grad_norm 1.4415 (1.2197) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][290/625] eta 0:01:09 lr 0.001918 wd 0.0500 time 0.1980 (0.2061) data time 0.0009 (0.0031) model time 0.1972 (0.2021) loss 3.7542 (3.4563) grad_norm 3.0776 (1.2278) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][300/625] eta 0:01:06 lr 0.001918 wd 0.0500 time 0.1984 (0.2059) data time 0.0007 (0.0030) model time 0.1978 (0.2019) loss 3.4129 (3.4666) grad_norm 2.0534 (1.2376) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][310/625] eta 0:01:04 lr 0.001918 wd 0.0500 time 0.1983 (0.2057) data time 0.0007 (0.0030) model time 0.1976 (0.2018) loss 3.6821 (3.4667) grad_norm 1.8977 (1.2462) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][320/625] eta 0:01:02 lr 0.001918 wd 0.0500 time 0.2048 (0.2056) data time 0.0011 
(0.0029) model time 0.2036 (0.2018) loss 2.5440 (3.4525) grad_norm 1.1951 (1.2443) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][330/625] eta 0:01:00 lr 0.001918 wd 0.0500 time 0.1976 (0.2054) data time 0.0006 (0.0028) model time 0.1969 (0.2017) loss 2.5065 (3.4513) grad_norm 1.4115 (1.2370) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][340/625] eta 0:00:58 lr 0.001918 wd 0.0500 time 0.1964 (0.2053) data time 0.0008 (0.0028) model time 0.1956 (0.2017) loss 3.0049 (3.4439) grad_norm 0.9925 (1.2320) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][350/625] eta 0:00:56 lr 0.001918 wd 0.0500 time 0.2003 (0.2054) data time 0.0006 (0.0028) model time 0.1997 (0.2019) loss 2.6274 (3.4422) grad_norm 2.1313 (1.2367) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][360/625] eta 0:00:54 lr 0.001918 wd 0.0500 time 0.1973 (0.2053) data time 0.0007 (0.0027) model time 0.1966 (0.2018) loss 3.9556 (3.4536) grad_norm 1.2342 (1.2321) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][370/625] eta 0:00:52 lr 0.001918 wd 0.0500 time 0.2070 (0.2052) data time 0.0006 (0.0027) model time 0.2063 (0.2018) loss 3.9374 (3.4524) grad_norm 1.4987 (1.2306) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][380/625] eta 0:00:50 lr 0.001918 wd 0.0500 time 0.2118 (0.2051) data time 0.0008 (0.0026) model time 0.2109 (0.2017) loss 3.5136 (3.4456) grad_norm 0.8930 (1.2286) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:46 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][390/625] eta 0:00:48 lr 0.001918 wd 0.0500 time 0.2004 (0.2051) data time 0.0009 (0.0026) model time 0.1995 (0.2017) loss 2.8460 (3.4529) grad_norm 1.1942 (1.2360) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][400/625] eta 0:00:46 lr 0.001918 wd 0.0500 time 0.1982 (0.2049) data time 0.0008 (0.0026) model time 0.1973 (0.2016) loss 2.6621 (3.4544) grad_norm 2.1180 (1.2401) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][410/625] eta 0:00:44 lr 0.001917 wd 0.0500 time 0.2107 (0.2048) data time 0.0009 (0.0025) model time 0.2099 (0.2016) loss 4.1584 (3.4540) grad_norm 0.9773 (1.2436) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][420/625] eta 0:00:42 lr 0.001917 wd 0.0500 time 0.2072 (0.2049) data time 0.0006 (0.0025) model time 0.2067 (0.2017) loss 3.8414 (3.4540) grad_norm 1.0417 (1.2384) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][430/625] eta 0:00:39 lr 0.001917 wd 0.0500 time 0.2009 (0.2050) data time 0.0008 (0.0024) model time 0.2001 (0.2018) loss 2.9448 (3.4492) grad_norm 1.5646 (1.2405) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][440/625] eta 0:00:37 lr 0.001917 wd 0.0500 time 0.1998 (0.2048) data time 0.0008 (0.0024) model time 0.1990 (0.2018) loss 3.5187 (3.4448) grad_norm 1.2745 (1.2390) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:13:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][450/625] eta 0:00:35 lr 0.001917 wd 0.0500 time 0.2004 (0.2047) data time 0.0006 
(0.0024) model time 0.1998 (0.2017) loss 2.8936 (3.4389) grad_norm 1.1463 (1.2380) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][460/625] eta 0:00:33 lr 0.001917 wd 0.0500 time 0.2009 (0.2046) data time 0.0008 (0.0023) model time 0.2001 (0.2017) loss 2.4723 (3.4342) grad_norm 0.8214 (1.2322) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][470/625] eta 0:00:31 lr 0.001917 wd 0.0500 time 0.1959 (0.2046) data time 0.0008 (0.0023) model time 0.1950 (0.2016) loss 3.5389 (3.4344) grad_norm 0.9300 (1.2319) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][480/625] eta 0:00:29 lr 0.001917 wd 0.0500 time 0.2084 (0.2045) data time 0.0010 (0.0023) model time 0.2074 (0.2016) loss 3.3076 (3.4365) grad_norm 1.5977 (1.2336) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][490/625] eta 0:00:27 lr 0.001917 wd 0.0500 time 0.2038 (0.2044) data time 0.0007 (0.0023) model time 0.2030 (0.2016) loss 2.8525 (3.4336) grad_norm 0.8922 (1.2316) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][500/625] eta 0:00:25 lr 0.001917 wd 0.0500 time 0.1949 (0.2044) data time 0.0008 (0.0022) model time 0.1941 (0.2015) loss 2.4793 (3.4340) grad_norm 1.0976 (1.2302) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][510/625] eta 0:00:23 lr 0.001917 wd 0.0500 time 0.2055 (0.2043) data time 0.0006 (0.0022) model time 0.2050 (0.2015) loss 3.0328 (3.4338) grad_norm 1.9066 (1.2362) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:12 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][520/625] eta 0:00:21 lr 0.001917 wd 0.0500 time 0.1964 (0.2042) data time 0.0007 (0.0022) model time 0.1957 (0.2015) loss 3.3607 (3.4331) grad_norm 1.5498 (1.2413) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][530/625] eta 0:00:19 lr 0.001917 wd 0.0500 time 0.1961 (0.2042) data time 0.0008 (0.0022) model time 0.1953 (0.2014) loss 2.0230 (3.4319) grad_norm 1.1729 (1.2403) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][540/625] eta 0:00:17 lr 0.001917 wd 0.0500 time 0.1962 (0.2041) data time 0.0009 (0.0021) model time 0.1953 (0.2014) loss 3.5648 (3.4360) grad_norm 0.9868 (1.2376) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][550/625] eta 0:00:15 lr 0.001916 wd 0.0500 time 0.2006 (0.2040) data time 0.0007 (0.0021) model time 0.1999 (0.2013) loss 3.6305 (3.4431) grad_norm 0.9008 (1.2370) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][560/625] eta 0:00:13 lr 0.001916 wd 0.0500 time 0.2009 (0.2039) data time 0.0009 (0.0021) model time 0.2000 (0.2013) loss 3.7887 (3.4423) grad_norm 0.8849 (1.2398) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][570/625] eta 0:00:11 lr 0.001916 wd 0.0500 time 0.1980 (0.2038) data time 0.0007 (0.0021) model time 0.1973 (0.2012) loss 2.7852 (3.4405) grad_norm 1.1967 (1.2370) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][580/625] eta 0:00:09 lr 0.001916 wd 0.0500 time 0.1988 (0.2038) data time 0.0008 
(0.0021) model time 0.1980 (0.2012) loss 3.2788 (3.4360) grad_norm 1.4099 (1.2380) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][590/625] eta 0:00:07 lr 0.001916 wd 0.0500 time 0.1988 (0.2037) data time 0.0008 (0.0020) model time 0.1980 (0.2012) loss 3.6567 (3.4381) grad_norm 0.8077 (1.2344) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][600/625] eta 0:00:05 lr 0.001916 wd 0.0500 time 0.1968 (0.2037) data time 0.0008 (0.0020) model time 0.1959 (0.2011) loss 2.3939 (3.4399) grad_norm 1.0799 (1.2308) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][610/625] eta 0:00:03 lr 0.001916 wd 0.0500 time 0.1988 (0.2036) data time 0.0004 (0.0020) model time 0.1984 (0.2011) loss 4.2587 (3.4399) grad_norm 2.7309 (1.2379) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][620/625] eta 0:00:01 lr 0.001916 wd 0.0500 time 0.1999 (0.2036) data time 0.0004 (0.0020) model time 0.1996 (0.2011) loss 2.5516 (3.4406) grad_norm 0.9978 (1.2407) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 56 training takes 0:02:07 [2024-07-29 21:14:33 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 21:14:33 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 21:14:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.430 (0.430) Loss 0.8242 (0.8242) Acc@1 85.205 (85.205) Acc@5 97.217 (97.217) Mem 8975MB [2024-07-29 21:14:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.091) Loss 1.3965 (0.9935) Acc@1 71.973 (80.362) Acc@5 91.162 (95.770) Mem 8975MB [2024-07-29 21:14:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.073) Loss 1.4531 (1.1928) Acc@1 69.727 (76.051) Acc@5 90.527 (93.385) Mem 8975MB [2024-07-29 21:14:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 75.860 Acc@5 93.340 [2024-07-29 21:14:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 75.9% [2024-07-29 21:14:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 75.86% [2024-07-29 21:14:35 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 21:14:36 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-29 21:14:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.521 (0.521) Loss 0.6294 (0.6294) Acc@1 84.521 (84.521) Acc@5 96.973 (96.973) Mem 8975MB [2024-07-29 21:14:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.099) Loss 1.1289 (0.8298) Acc@1 72.021 (79.341) Acc@5 91.943 (95.215) Mem 8975MB [2024-07-29 21:14:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.078) Loss 1.3574 (1.0168) Acc@1 66.504 (75.016) Acc@5 88.428 (92.780) Mem 8975MB [2024-07-29 21:14:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 74.748 Acc@5 92.770 [2024-07-29 21:14:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 74.7% [2024-07-29 21:14:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 74.75% [2024-07-29 21:14:38 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 21:14:38 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-29 21:14:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][0/625] eta 0:07:23 lr 0.001916 wd 0.0500 time 0.7103 (0.7103) data time 0.5219 (0.5219) model time 0.0000 (0.0000) loss 3.4821 (3.4821) grad_norm 0.8217 (0.8217) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][10/625] eta 0:02:30 lr 0.001916 wd 0.0500 time 0.1953 (0.2451) data time 0.0006 (0.0483) model time 0.0000 (0.0000) loss 3.8127 (3.5341) grad_norm 1.1131 (1.1160) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][20/625] eta 0:02:15 lr 0.001916 wd 0.0500 time 0.2077 (0.2235) data time 0.0008 (0.0257) model time 0.0000 (0.0000) loss 3.8521 (3.4996) grad_norm 0.8750 (1.0966) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][30/625] eta 0:02:09 lr 0.001916 wd 0.0500 time 0.2104 (0.2174) data time 0.0007 (0.0177) model time 0.0000 (0.0000) loss 3.5228 (3.3748) grad_norm 0.9748 (1.1543) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][40/625] eta 0:02:05 lr 0.001916 wd 0.0500 time 0.2010 (0.2139) data time 0.0006 (0.0136) model time 0.0000 (0.0000) loss 4.0159 (3.4497) grad_norm 1.2412 (1.1432) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][50/625] eta 0:02:01 lr 0.001916 wd 0.0500 time 0.1952 (0.2110) data time 0.0009 (0.0111) model time 0.0000 (0.0000) loss 3.6093 (3.4352) grad_norm 1.4154 (1.1445) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][60/625] eta 0:01:58 lr 0.001915 wd 0.0500 time 0.1982 (0.2091) data time 
0.0006 (0.0095) model time 0.1976 (0.1989) loss 3.5377 (3.4558) grad_norm 1.0059 (1.1627) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][70/625] eta 0:01:55 lr 0.001915 wd 0.0500 time 0.2060 (0.2079) data time 0.0008 (0.0083) model time 0.2052 (0.1992) loss 2.3463 (3.4100) grad_norm 1.6844 (1.1857) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][80/625] eta 0:01:52 lr 0.001915 wd 0.0500 time 0.2085 (0.2070) data time 0.0008 (0.0074) model time 0.2077 (0.1994) loss 3.6492 (3.4167) grad_norm 1.3227 (1.1723) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][90/625] eta 0:01:50 lr 0.001915 wd 0.0500 time 0.1998 (0.2063) data time 0.0007 (0.0066) model time 0.1991 (0.1993) loss 2.8356 (3.3983) grad_norm 0.9651 (1.1689) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:14:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][100/625] eta 0:01:47 lr 0.001915 wd 0.0500 time 0.2019 (0.2057) data time 0.0007 (0.0061) model time 0.2012 (0.1994) loss 3.2466 (3.4059) grad_norm 1.6363 (1.1673) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][110/625] eta 0:01:45 lr 0.001915 wd 0.0500 time 0.2010 (0.2053) data time 0.0009 (0.0056) model time 0.2000 (0.1996) loss 3.5661 (3.4234) grad_norm 1.3873 (1.1747) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][120/625] eta 0:01:43 lr 0.001915 wd 0.0500 time 0.1980 (0.2047) data time 0.0007 (0.0052) model time 0.1973 (0.1993) loss 3.4234 (3.4330) grad_norm 1.8192 (1.1883) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:05 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][130/625] eta 0:01:41 lr 0.001915 wd 0.0500 time 0.2040 (0.2044) data time 0.0007 (0.0049) model time 0.2033 (0.1993) loss 2.4332 (3.4437) grad_norm 1.1842 (1.1912) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][140/625] eta 0:01:38 lr 0.001915 wd 0.0500 time 0.1972 (0.2041) data time 0.0008 (0.0046) model time 0.1964 (0.1992) loss 3.9464 (3.4399) grad_norm 1.1728 (1.2001) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][150/625] eta 0:01:36 lr 0.001915 wd 0.0500 time 0.2033 (0.2038) data time 0.0009 (0.0044) model time 0.2024 (0.1993) loss 3.5324 (3.4352) grad_norm 0.8640 (1.1998) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][160/625] eta 0:01:34 lr 0.001915 wd 0.0500 time 0.2004 (0.2036) data time 0.0006 (0.0042) model time 0.1999 (0.1992) loss 2.6579 (3.4316) grad_norm 1.4882 (1.1944) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][170/625] eta 0:01:32 lr 0.001915 wd 0.0500 time 0.2096 (0.2034) data time 0.0007 (0.0040) model time 0.2089 (0.1993) loss 3.4019 (3.4320) grad_norm 0.8685 (1.1904) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][180/625] eta 0:01:30 lr 0.001915 wd 0.0500 time 0.1986 (0.2033) data time 0.0007 (0.0038) model time 0.1979 (0.1993) loss 4.2636 (3.4541) grad_norm 1.4615 (1.2120) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][190/625] eta 0:01:28 lr 0.001915 wd 0.0500 time 0.2004 (0.2031) data time 0.0008 
(0.0036) model time 0.1997 (0.1993) loss 3.1196 (3.4546) grad_norm 0.8232 (1.2240) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][200/625] eta 0:01:26 lr 0.001914 wd 0.0500 time 0.2016 (0.2029) data time 0.0008 (0.0035) model time 0.2009 (0.1992) loss 2.0809 (3.4537) grad_norm 1.6047 (1.2316) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][210/625] eta 0:01:24 lr 0.001914 wd 0.0500 time 0.2001 (0.2028) data time 0.0006 (0.0034) model time 0.1995 (0.1992) loss 4.3158 (3.4625) grad_norm 1.1526 (1.2230) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][220/625] eta 0:01:22 lr 0.001914 wd 0.0500 time 0.1982 (0.2026) data time 0.0009 (0.0033) model time 0.1973 (0.1992) loss 3.7644 (3.4546) grad_norm 1.0605 (1.2153) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][230/625] eta 0:01:20 lr 0.001914 wd 0.0500 time 0.1992 (0.2025) data time 0.0006 (0.0032) model time 0.1986 (0.1992) loss 2.4907 (3.4471) grad_norm 0.8110 (1.2212) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][240/625] eta 0:01:17 lr 0.001914 wd 0.0500 time 0.1988 (0.2024) data time 0.0007 (0.0031) model time 0.1981 (0.1992) loss 4.2575 (3.4540) grad_norm 1.0839 (1.2237) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][250/625] eta 0:01:15 lr 0.001914 wd 0.0500 time 0.1984 (0.2023) data time 0.0008 (0.0030) model time 0.1975 (0.1992) loss 3.6186 (3.4493) grad_norm 1.0582 (1.2339) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:31 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][260/625] eta 0:01:13 lr 0.001914 wd 0.0500 time 0.2046 (0.2023) data time 0.0006 (0.0029) model time 0.2040 (0.1992) loss 3.5218 (3.4550) grad_norm 1.0567 (1.2262) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][270/625] eta 0:01:11 lr 0.001914 wd 0.0500 time 0.1995 (0.2022) data time 0.0007 (0.0028) model time 0.1988 (0.1992) loss 2.9957 (3.4600) grad_norm 1.1768 (1.2155) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][280/625] eta 0:01:09 lr 0.001914 wd 0.0500 time 0.1957 (0.2021) data time 0.0008 (0.0028) model time 0.1949 (0.1992) loss 3.7726 (3.4692) grad_norm 1.1137 (1.2157) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][290/625] eta 0:01:07 lr 0.001914 wd 0.0500 time 0.1984 (0.2020) data time 0.0008 (0.0027) model time 0.1977 (0.1992) loss 2.5743 (3.4688) grad_norm 1.2370 (1.2184) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][300/625] eta 0:01:05 lr 0.001914 wd 0.0500 time 0.2009 (0.2019) data time 0.0007 (0.0026) model time 0.2002 (0.1992) loss 3.2504 (3.4732) grad_norm 0.9923 (1.2192) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][310/625] eta 0:01:03 lr 0.001914 wd 0.0500 time 0.1978 (0.2019) data time 0.0009 (0.0026) model time 0.1969 (0.1992) loss 3.4698 (3.4711) grad_norm 1.3047 (1.2205) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][320/625] eta 0:01:01 lr 0.001914 wd 0.0500 time 0.1969 (0.2018) data time 0.0007 
(0.0025) model time 0.1962 (0.1992) loss 2.9297 (3.4609) grad_norm 1.6904 (1.2257) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][330/625] eta 0:00:59 lr 0.001914 wd 0.0500 time 0.1999 (0.2018) data time 0.0006 (0.0025) model time 0.1993 (0.1992) loss 3.3053 (3.4660) grad_norm 1.2464 (1.2379) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][340/625] eta 0:00:57 lr 0.001913 wd 0.0500 time 0.1982 (0.2017) data time 0.0008 (0.0024) model time 0.1974 (0.1992) loss 2.3359 (3.4668) grad_norm 1.3301 (1.2406) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][350/625] eta 0:00:55 lr 0.001913 wd 0.0500 time 0.1990 (0.2023) data time 0.0006 (0.0024) model time 0.1983 (0.2000) loss 4.0889 (3.4698) grad_norm 1.1020 (1.2439) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][360/625] eta 0:00:53 lr 0.001913 wd 0.0500 time 0.2003 (0.2030) data time 0.0009 (0.0024) model time 0.1995 (0.2007) loss 3.4642 (3.4710) grad_norm 0.9092 (1.2412) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][370/625] eta 0:00:51 lr 0.001913 wd 0.0500 time 0.1967 (0.2029) data time 0.0008 (0.0023) model time 0.1959 (0.2007) loss 3.7685 (3.4740) grad_norm 1.2144 (1.2412) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][380/625] eta 0:00:49 lr 0.001913 wd 0.0500 time 0.1960 (0.2029) data time 0.0008 (0.0023) model time 0.1952 (0.2007) loss 3.4512 (3.4787) grad_norm 1.0296 (1.2494) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:15:58 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][390/625] eta 0:00:47 lr 0.001913 wd 0.0500 time 0.1964 (0.2033) data time 0.0006 (0.0023) model time 0.1958 (0.2013) loss 2.6828 (3.4785) grad_norm 0.8593 (1.2487) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:16:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][400/625] eta 0:00:45 lr 0.001913 wd 0.0500 time 0.1989 (0.2033) data time 0.0007 (0.0022) model time 0.1982 (0.2012) loss 3.9996 (3.4818) grad_norm 1.7436 (1.2545) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:16:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][410/625] eta 0:00:43 lr 0.001913 wd 0.0500 time 0.2029 (0.2032) data time 0.0007 (0.0022) model time 0.2022 (0.2012) loss 3.5275 (3.4884) grad_norm 1.3522 (1.2579) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:16:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][420/625] eta 0:00:41 lr 0.001913 wd 0.0500 time 0.2018 (0.2032) data time 0.0008 (0.0022) model time 0.2009 (0.2011) loss 3.5432 (3.4859) grad_norm 0.9807 (1.2569) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:16:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][430/625] eta 0:00:39 lr 0.001913 wd 0.0500 time 0.1956 (0.2031) data time 0.0008 (0.0022) model time 0.1948 (0.2011) loss 4.0980 (3.4878) grad_norm 1.1986 (1.2547) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:16:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][440/625] eta 0:00:37 lr 0.001913 wd 0.0500 time 0.1983 (0.2030) data time 0.0007 (0.0021) model time 0.1975 (0.2010) loss 3.8502 (3.4868) grad_norm 1.3228 (1.2515) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:16:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][450/625] eta 0:00:35 lr 0.001913 wd 0.0500 time 0.2030 (0.2029) data time 0.0009 
(0.0021) model time 0.2021 (0.2010) loss 3.7976 (3.4876) grad_norm 0.7873 (1.2515) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:16:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][460/625] eta 0:00:33 lr 0.001913 wd 0.0500 time 0.2115 (0.2029) data time 0.0007 (0.0021) model time 0.2108 (0.2009) loss 4.2862 (3.4919) grad_norm 1.4792 (1.2510) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:16:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][470/625] eta 0:00:31 lr 0.001913 wd 0.0500 time 0.1946 (0.2028) data time 0.0009 (0.0020) model time 0.1937 (0.2008) loss 3.8813 (3.4885) grad_norm 1.0285 (1.2519) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:16:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][480/625] eta 0:00:29 lr 0.001912 wd 0.0500 time 0.1993 (0.2027) data time 0.0007 (0.0020) model time 0.1986 (0.2008) loss 2.6827 (3.4806) grad_norm 1.6564 (1.2521) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:16:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][490/625] eta 0:00:27 lr 0.001912 wd 0.0500 time 0.1967 (0.2027) data time 0.0007 (0.0020) model time 0.1960 (0.2008) loss 3.8612 (3.4736) grad_norm 0.8864 (1.2497) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:16:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][500/625] eta 0:00:25 lr 0.001912 wd 0.0500 time 0.2014 (0.2026) data time 0.0007 (0.0020) model time 0.2006 (0.2008) loss 3.5914 (3.4738) grad_norm 1.1926 (1.2474) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:16:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][510/625] eta 0:00:23 lr 0.001912 wd 0.0500 time 0.1944 (0.2027) data time 0.0007 (0.0020) model time 0.1937 (0.2008) loss 4.1018 (3.4780) grad_norm 1.0142 (1.2462) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:16:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][520/625] eta 0:00:21 lr 0.001912 wd 0.0500 time 0.1960 (0.2026) data time 0.0008 (0.0019) model time 0.1952 (0.2008) loss 3.5210 (3.4795) grad_norm 0.9112 (1.2430) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:16:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][530/625] eta 0:00:19 lr 0.001912 wd 0.0500 time 0.2006 (0.2026) data time 0.0008 (0.0019) model time 0.1998 (0.2007) loss 3.8068 (3.4774) grad_norm 1.0325 (1.2386) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:16:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][540/625] eta 0:00:17 lr 0.001912 wd 0.0500 time 0.1986 (0.2025) data time 0.0008 (0.0019) model time 0.1978 (0.2007) loss 3.8456 (3.4780) grad_norm 1.0233 (1.2360) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:16:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][550/625] eta 0:00:15 lr 0.001912 wd 0.0500 time 0.2004 (0.2025) data time 0.0005 (0.0019) model time 0.1998 (0.2007) loss 2.6007 (3.4723) grad_norm 1.6371 (1.2366) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:16:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][560/625] eta 0:00:13 lr 0.001912 wd 0.0500 time 0.2040 (0.2025) data time 0.0008 (0.0019) model time 0.2033 (0.2007) loss 2.6529 (3.4717) grad_norm 1.2308 (1.2435) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:16:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][570/625] eta 0:00:11 lr 0.001912 wd 0.0500 time 0.2014 (0.2025) data time 0.0007 (0.0018) model time 0.2007 (0.2007) loss 3.6977 (3.4683) grad_norm 1.2711 (1.2476) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:16:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][580/625] eta 0:00:09 lr 0.001912 wd 0.0500 time 0.1982 (0.2025) data time 0.0007 (0.0018) model time 0.1976 (0.2007) loss 3.6594 (3.4683) grad_norm 0.9658 (1.2526) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:16:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][590/625] eta 0:00:07 lr 0.001912 wd 0.0500 time 0.2360 (0.2025) data time 0.0009 (0.0018) model time 0.2351 (0.2008) loss 3.9513 (3.4730) grad_norm 1.5902 (1.2493) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:16:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][600/625] eta 0:00:05 lr 0.001912 wd 0.0500 time 0.2042 (0.2025) data time 0.0008 (0.0018) model time 0.2035 (0.2008) loss 3.4261 (3.4779) grad_norm 1.1925 (1.2513) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:16:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][610/625] eta 0:00:03 lr 0.001911 wd 0.0500 time 0.2005 (0.2025) data time 0.0004 (0.0018) model time 0.2002 (0.2008) loss 3.6461 (3.4790) grad_norm 0.8411 (1.2508) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:16:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][620/625] eta 0:00:01 lr 0.001911 wd 0.0500 time 0.1952 (0.2025) data time 0.0006 (0.0018) model time 0.1947 (0.2008) loss 3.6259 (3.4785) grad_norm 0.9453 (1.2485) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:16:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 57 training takes 0:02:06
[2024-07-29 21:16:45 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 21:16:45 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 21:16:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.532 (0.532) Loss 0.8853 (0.8853) Acc@1 83.594 (83.594) Acc@5 97.168 (97.168) Mem 8975MB
[2024-07-29 21:16:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.100) Loss 1.3096 (1.0235) Acc@1 72.461 (80.300) Acc@5 92.188 (95.716) Mem 8975MB
[2024-07-29 21:16:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.078) Loss 1.4785 (1.1898) Acc@1 68.018 (76.163) Acc@5 90.332 (93.497) Mem 8975MB
[2024-07-29 21:16:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 75.880 Acc@5 93.480
[2024-07-29 21:16:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 75.9%
[2024-07-29 21:16:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 75.88%
[2024-07-29 21:16:47 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 21:16:47 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 21:16:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.526 (0.526) Loss 0.6177 (0.6177) Acc@1 84.766 (84.766) Acc@5 97.070 (97.070) Mem 8975MB
[2024-07-29 21:16:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.099) Loss 1.1113 (0.8150) Acc@1 72.412 (79.594) Acc@5 92.090 (95.304) Mem 8975MB
[2024-07-29 21:16:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.078) Loss 1.3350 (0.9997) Acc@1 67.041 (75.388) Acc@5 89.160 (92.955) Mem 8975MB
[2024-07-29 21:16:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 75.134 Acc@5 92.946
[2024-07-29 21:16:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 75.1%
[2024-07-29 21:16:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 75.13%
[2024-07-29 21:16:49 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 21:16:50 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 21:16:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][0/625] eta 0:07:24 lr 0.001911 wd 0.0500 time 0.7113 (0.7113) data time 0.5192 (0.5192) model time 0.0000 (0.0000) loss 3.6255 (3.6255) grad_norm 1.1977 (1.1977) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:16:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][10/625] eta 0:02:33 lr 0.001911 wd 0.0500 time 0.1984 (0.2501) data time 0.0009 (0.0481) model time 0.0000 (0.0000) loss 3.2561 (3.6205) grad_norm 1.0672 (1.1030) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:16:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][20/625] eta 0:02:16 lr 0.001911 wd 0.0500 time 0.1979 (0.2260) data time 0.0006 (0.0256) model time 0.0000 (0.0000) loss 2.2218 (3.2494) grad_norm 1.5179 (1.1042) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:16:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][30/625] eta 0:02:09 lr 0.001911 wd 0.0500 time 0.1980 (0.2173) data time 0.0010 (0.0176) model time 0.0000 (0.0000) loss 3.5782 (3.2141) grad_norm 1.3454 (1.1129) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:16:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][40/625] eta 0:02:04 lr 0.001911 wd 0.0500 time 0.1998 (0.2130) data time 0.0006 (0.0135) model time 0.0000 (0.0000) loss 4.0692 (3.2573) grad_norm 1.2079 (1.1042) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][50/625] eta 0:02:01 lr 0.001911 wd 0.0500 time 0.1995 (0.2104) data time 0.0007 (0.0110) model time 0.0000 (0.0000) loss 2.3024 (3.2603) grad_norm 0.9959 (1.0865) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][60/625] eta 0:01:57 lr 0.001911 wd 0.0500 time 0.1989 (0.2087) data time 0.0008 (0.0094) model time 0.1980 (0.1991) loss 3.7004 (3.2994) grad_norm 1.1232 (1.1000) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][70/625] eta 0:01:55 lr 0.001911 wd 0.0500 time 0.2000 (0.2079) data time 0.0009 (0.0082) model time 0.1992 (0.2005) loss 3.3275 (3.3206) grad_norm 1.0374 (1.1262) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][80/625] eta 0:01:52 lr 0.001911 wd 0.0500 time 0.1991 (0.2070) data time 0.0007 (0.0073) model time 0.1983 (0.2002) loss 3.0981 (3.3599) grad_norm 1.0801 (1.1389) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][90/625] eta 0:01:50 lr 0.001911 wd 0.0500 time 0.2066 (0.2062) data time 0.0008 (0.0066) model time 0.2058 (0.2000) loss 3.6789 (3.3592) grad_norm 0.9845 (1.1337) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][100/625] eta 0:01:47 lr 0.001911 wd 0.0500 time 0.1967 (0.2056) data time 0.0008 (0.0060) model time 0.1959 (0.1998) loss 2.7437 (3.3525) grad_norm 0.9766 (1.1649) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][110/625] eta 0:01:45 lr 0.001911 wd 0.0500 time 0.2027 (0.2050) data time 0.0007 (0.0056) model time 0.2020 (0.1995) loss 3.6816 (3.3585) grad_norm 0.9689 (1.1970) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][120/625] eta 0:01:43 lr 0.001910 wd 0.0500 time 0.2000 (0.2047) data time 0.0008 (0.0052) model time 0.1992 (0.1996) loss 3.0333 (3.3517) grad_norm 0.8973 (1.1901) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][130/625] eta 0:01:41 lr 0.001910 wd 0.0500 time 0.2155 (0.2045) data time 0.0008 (0.0049) model time 0.2147 (0.1999) loss 3.8706 (3.3455) grad_norm 1.9929 (1.2037) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][140/625] eta 0:01:39 lr 0.001910 wd 0.0500 time 0.2042 (0.2042) data time 0.0007 (0.0046) model time 0.2036 (0.1998) loss 2.8307 (3.3594) grad_norm 1.4332 (1.2177) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][150/625] eta 0:01:37 lr 0.001910 wd 0.0500 time 0.2004 (0.2051) data time 0.0008 (0.0043) model time 0.1996 (0.2015) loss 3.5389 (3.3768) grad_norm 1.0500 (1.2081) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][160/625] eta 0:01:35 lr 0.001910 wd 0.0500 time 0.1968 (0.2048) data time 0.0007 (0.0041) model time 0.1961 (0.2013) loss 3.6505 (3.3732) grad_norm 1.2089 (1.1938) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][170/625] eta 0:01:33 lr 0.001910 wd 0.0500 time 0.1975 (0.2047) data time 0.0007 (0.0039) model time 0.1968 (0.2014) loss 3.7737 (3.3945) grad_norm 1.0389 (1.1964) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][180/625] eta 0:01:30 lr 0.001910 wd 0.0500 time 0.1997 (0.2044) data time 0.0008 (0.0038) model time 0.1989 (0.2011) loss 3.4616 (3.3949) grad_norm 1.1087 (1.2223) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][190/625] eta 0:01:28 lr 0.001910 wd 0.0500 time 0.2008 (0.2043) data time 0.0006 (0.0036) model time 0.2002 (0.2012) loss 4.1438 (3.3933) grad_norm 1.0543 (1.2331) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][200/625] eta 0:01:26 lr 0.001910 wd 0.0500 time 0.2048 (0.2042) data time 0.0007 (0.0035) model time 0.2041 (0.2012) loss 2.4816 (3.3966) grad_norm 0.9844 (1.2414) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][210/625] eta 0:01:24 lr 0.001910 wd 0.0500 time 0.2018 (0.2040) data time 0.0009 (0.0034) model time 0.2009 (0.2011) loss 3.6635 (3.4048) grad_norm 1.2226 (1.2345) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][220/625] eta 0:01:22 lr 0.001910 wd 0.0500 time 0.2041 (0.2039) data time 0.0006 (0.0032) model time 0.2035 (0.2011) loss 2.6127 (3.3960) grad_norm 1.5464 (1.2315) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][230/625] eta 0:01:20 lr 0.001910 wd 0.0500 time 0.2051 (0.2038) data time 0.0006 (0.0031) model time 0.2046 (0.2011) loss 3.1830 (3.3955) grad_norm 0.8153 (1.2428) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][240/625] eta 0:01:18 lr 0.001910 wd 0.0500 time 0.1902 (0.2037) data time 0.0008 (0.0030) model time 0.1895 (0.2010) loss 2.2098 (3.3910) grad_norm 0.9181 (1.2395) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][250/625] eta 0:01:16 lr 0.001910 wd 0.0500 time 0.1993 (0.2035) data time 0.0006 (0.0030) model time 0.1987 (0.2008) loss 3.8857 (3.3990) grad_norm 1.4863 (1.2349) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][260/625] eta 0:01:14 lr 0.001909 wd 0.0500 time 0.2016 (0.2034) data time 0.0008 (0.0029) model time 0.2008 (0.2008) loss 3.6121 (3.4023) grad_norm 1.0507 (1.2397) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][270/625] eta 0:01:12 lr 0.001909 wd 0.0500 time 0.2035 (0.2036) data time 0.0005 (0.0028) model time 0.2029 (0.2011) loss 2.6257 (3.3942) grad_norm 1.4351 (1.2415) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][280/625] eta 0:01:10 lr 0.001909 wd 0.0500 time 0.1996 (0.2035) data time 0.0009 (0.0027) model time 0.1987 (0.2011) loss 2.9226 (3.3992) grad_norm 1.2951 (1.2388) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][290/625] eta 0:01:08 lr 0.001909 wd 0.0500 time 0.1998 (0.2034) data time 0.0008 (0.0027) model time 0.1990 (0.2010) loss 2.9729 (3.3896) grad_norm 1.8533 (1.2383) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][300/625] eta 0:01:06 lr 0.001909 wd 0.0500 time 0.2080 (0.2034) data time 0.0006 (0.0026) model time 0.2073 (0.2011) loss 2.8495 (3.3783) grad_norm 1.5130 (1.2357) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][310/625] eta 0:01:04 lr 0.001909 wd 0.0500 time 0.2011 (0.2033) data time 0.0006 (0.0026) model time 0.2006 (0.2011) loss 4.1323 (3.3838) grad_norm 1.4973 (1.2360) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][320/625] eta 0:01:01 lr 0.001909 wd 0.0500 time 0.2026 (0.2032) data time 0.0008 (0.0025) model time 0.2018 (0.2010) loss 2.7039 (3.3785) grad_norm 1.5504 (1.2385) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][330/625] eta 0:00:59 lr 0.001909 wd 0.0500 time 0.1987 (0.2031) data time 0.0009 (0.0025) model time 0.1978 (0.2009) loss 3.4625 (3.3722) grad_norm 0.8390 (1.2374) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:17:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][340/625] eta 0:00:57 lr 0.001909 wd 0.0500 time 0.1997 (0.2034) data time 0.0008 (0.0024) model time 0.1989 (0.2013) loss 3.5572 (3.3689) grad_norm 1.0937 (1.2398) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][350/625] eta 0:00:55 lr 0.001909 wd 0.0500 time 0.2016 (0.2034) data time 0.0009 (0.0024) model time 0.2007 (0.2013) loss 4.0329 (3.3695) grad_norm 1.1516 (1.2416) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][360/625] eta 0:00:53 lr 0.001909 wd 0.0500 time 0.1967 (0.2033) data time 0.0008 (0.0023) model time 0.1959 (0.2012) loss 2.8111 (3.3590) grad_norm 1.1078 (1.2457) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][370/625] eta 0:00:51 lr 0.001909 wd 0.0500 time 0.2014 (0.2033) data time 0.0007 (0.0023) model time 0.2006 (0.2012) loss 4.0469 (3.3651) grad_norm 1.0605 (1.2391) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][380/625] eta 0:00:49 lr 0.001909 wd 0.0500 time 0.2603 (0.2035) data time 0.0009 (0.0023) model time 0.2594 (0.2015) loss 3.6788 (3.3701) grad_norm 0.8059 (1.2354) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][390/625] eta 0:00:47 lr 0.001908 wd 0.0500 time 0.1968 (0.2034) data time 0.0009 (0.0022) model time 0.1959 (0.2015) loss 3.4748 (3.3690) grad_norm 1.4943 (1.2380) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][400/625] eta 0:00:45 lr 0.001908 wd 0.0500 time 0.2033 (0.2033) data time 0.0009 (0.0022) model time 0.2024 (0.2014) loss 3.9086 (3.3715) grad_norm 1.1739 (1.2405) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][410/625] eta 0:00:43 lr 0.001908 wd 0.0500 time 0.1992 (0.2032) data time 0.0006 (0.0022) model time 0.1986 (0.2013) loss 3.9688 (3.3745) grad_norm 1.4151 (1.2485) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][420/625] eta 0:00:41 lr 0.001908 wd 0.0500 time 0.2004 (0.2034) data time 0.0009 (0.0021) model time 0.1995 (0.2015) loss 3.3256 (3.3758) grad_norm 1.0997 (1.2592) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][430/625] eta 0:00:39 lr 0.001908 wd 0.0500 time 0.1957 (0.2033) data time 0.0007 (0.0021) model time 0.1950 (0.2015) loss 2.9353 (3.3766) grad_norm 1.0605 (1.2600) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][440/625] eta 0:00:37 lr 0.001908 wd 0.0500 time 0.2114 (0.2040) data time 0.0007 (0.0021) model time 0.2107 (0.2022) loss 2.9777 (3.3761) grad_norm 1.0485 (1.2579) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][450/625] eta 0:00:35 lr 0.001908 wd 0.0500 time 0.1981 (0.2040) data time 0.0007 (0.0020) model time 0.1975 (0.2023) loss 3.2212 (3.3807) grad_norm 0.9602 (1.2543) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][460/625] eta 0:00:33 lr 0.001908 wd 0.0500 time 0.2044 (0.2042) data time 0.0008 (0.0020) model time 0.2035 (0.2025) loss 3.8730 (3.3808) grad_norm 1.7934 (1.2647) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][470/625] eta 0:00:31 lr 0.001908 wd 0.0500 time 0.2025 (0.2041) data time 0.0006 (0.0020) model time 0.2019 (0.2024) loss 2.5560 (3.3816) grad_norm 1.0470 (1.2683) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][480/625] eta 0:00:29 lr 0.001908 wd 0.0500 time 0.1999 (0.2040) data time 0.0006 (0.0020) model time 0.1993 (0.2024) loss 3.8344 (3.3842) grad_norm 1.5102 (1.2649) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][490/625] eta 0:00:27 lr 0.001908 wd 0.0500 time 0.1962 (0.2040) data time 0.0008 (0.0020) model time 0.1954 (0.2023) loss 3.5541 (3.3834) grad_norm 0.9808 (1.2639) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][500/625] eta 0:00:25 lr 0.001908 wd 0.0500 time 0.2000 (0.2041) data time 0.0009 (0.0019) model time 0.1992 (0.2025) loss 3.4176 (3.3855) grad_norm 1.6721 (1.2692) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][510/625] eta 0:00:23 lr 0.001908 wd 0.0500 time 0.1999 (0.2041) data time 0.0006 (0.0019) model time 0.1993 (0.2025) loss 4.1884 (3.3919) grad_norm 1.6057 (1.2702) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][520/625] eta 0:00:21 lr 0.001908 wd 0.0500 time 0.1964 (0.2041) data time 0.0008 (0.0019) model time 0.1955 (0.2025) loss 3.5573 (3.3922) grad_norm 1.2198 (1.2746) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][530/625] eta 0:00:19 lr 0.001907 wd 0.0500 time 0.1969 (0.2041) data time 0.0006 (0.0019) model time 0.1963 (0.2025) loss 4.1481 (3.3942) grad_norm 1.1775 (1.2729) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][540/625] eta 0:00:17 lr 0.001907 wd 0.0500 time 0.1995 (0.2040) data time 0.0007 (0.0019) model time 0.1988 (0.2024) loss 4.2324 (3.3971) grad_norm 1.1848 (1.2699) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][550/625] eta 0:00:15 lr 0.001907 wd 0.0500 time 0.2023 (0.2039) data time 0.0008 (0.0018) model time 0.2016 (0.2023) loss 3.4739 (3.3964) grad_norm 0.7892 (1.2672) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][560/625] eta 0:00:13 lr 0.001907 wd 0.0500 time 0.2000 (0.2038) data time 0.0008 (0.0018) model time 0.1992 (0.2023) loss 3.8109 (3.4008) grad_norm 1.5215 (1.2638) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][570/625] eta 0:00:11 lr 0.001907 wd 0.0500 time 0.2040 (0.2038) data time 0.0008 (0.0018) model time 0.2032 (0.2022) loss 3.2510 (3.4012) grad_norm 1.1136 (1.2608) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][580/625] eta 0:00:09 lr 0.001907 wd 0.0500 time 0.2008 (0.2037) data time 0.0008 (0.0018) model time 0.2000 (0.2022) loss 3.8494 (3.3988) grad_norm 1.7162 (1.2594) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][590/625] eta 0:00:07 lr 0.001907 wd 0.0500 time 0.1969 (0.2037) data time 0.0008 (0.0018) model time 0.1962 (0.2021) loss 3.6891 (3.3983) grad_norm 0.7926 (1.2561) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][600/625] eta 0:00:05 lr 0.001907 wd 0.0500 time 0.2003 (0.2036) data time 0.0008 (0.0018) model time 0.1995 (0.2021) loss 3.2460 (3.3944) grad_norm 2.2524 (1.2574) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][610/625] eta 0:00:03 lr 0.001907 wd 0.0500 time 0.2018 (0.2035) data time 0.0004 (0.0018) model time 0.2014 (0.2020) loss 3.2940 (3.3932) grad_norm 1.2773 (1.2585) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][620/625] eta 0:00:01 lr 0.001907 wd 0.0500 time 0.1981 (0.2035) data time 0.0004 (0.0017) model time 0.1978 (0.2019) loss 4.0236 (3.3939) grad_norm 0.8541 (1.2632) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:18:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 58 training takes 0:02:07
[2024-07-29 21:18:57 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 21:18:58 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 21:18:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.549 (0.549) Loss 0.8374 (0.8374) Acc@1 85.889 (85.889) Acc@5 97.266 (97.266) Mem 8975MB
[2024-07-29 21:18:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.101) Loss 1.3867 (1.0070) Acc@1 69.238 (79.887) Acc@5 91.602 (95.712) Mem 8975MB
[2024-07-29 21:18:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.079) Loss 1.4307 (1.1700) Acc@1 69.678 (75.944) Acc@5 90.527 (93.513) Mem 8975MB
[2024-07-29 21:18:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 75.630 Acc@5 93.384
[2024-07-29 21:18:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 75.6%
[2024-07-29 21:19:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.714 (0.714) Loss 0.6084 (0.6084) Acc@1 85.059 (85.059) Acc@5 97.266 (97.266) Mem 8975MB
[2024-07-29 21:19:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.059 (0.125) Loss 1.0947 (0.8022) Acc@1 72.656 (79.927) Acc@5 92.334 (95.481) Mem 8975MB
[2024-07-29 21:19:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.092) Loss 1.3154 (0.9847) Acc@1 67.383 (75.784) Acc@5 89.453 (93.159) Mem 8975MB
[2024-07-29 21:19:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 75.498 Acc@5 93.132
[2024-07-29 21:19:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 75.5%
[2024-07-29 21:19:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 75.50%
[2024-07-29 21:19:02 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 21:19:02 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 21:19:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][0/625] eta 0:06:59 lr 0.001907 wd 0.0500 time 0.6713 (0.6713) data time 0.4778 (0.4778) model time 0.0000 (0.0000) loss 4.4083 (4.4083) grad_norm 1.0466 (1.0466) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:19:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][10/625] eta 0:02:29 lr 0.001907 wd 0.0500 time 0.2004 (0.2423) data time 0.0008 (0.0443) model time 0.0000 (0.0000) loss 3.4577 (3.3476) grad_norm 0.8182 (1.1127) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:19:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][20/625] eta 0:02:14 lr 0.001907 wd 0.0500 time 0.1971 (0.2221) data time 0.0006 (0.0236) model time 0.0000 (0.0000) loss 4.1023 (3.5098) grad_norm 0.9947 (1.1035) loss_scale 16384.0000 (10922.6667) mem 8975MB
[2024-07-29 21:19:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][30/625] eta 0:02:07 lr 0.001906 wd 0.0500 time 0.2001 (0.2149) data time 0.0008 (0.0163) model time 0.0000 (0.0000) loss 3.6825 (3.5070) grad_norm 0.9992 (1.1979) loss_scale 16384.0000 (12684.3871) mem 8975MB
[2024-07-29 21:19:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][40/625] eta 0:02:09 lr 0.001906 wd 0.0500 time 0.2097 (0.2221) data time 0.0006 (0.0125) model time 0.0000 (0.0000) loss 3.4288 (3.4529) grad_norm 1.4616 (1.3486) loss_scale 16384.0000 (13586.7317) mem 8975MB
[2024-07-29 21:19:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][50/625] eta 0:02:07 lr 0.001906 wd 0.0500 time 0.1977 (0.2220) data time 0.0006 (0.0103) model time 0.0000 (0.0000) loss 2.1111 (3.3995) grad_norm 0.9213 (1.3346) loss_scale 16384.0000 (14135.2157) mem 8975MB
[2024-07-29 21:19:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][60/625] eta 0:02:03 lr 0.001906 wd 0.0500 time 0.1997 (0.2184) data time 0.0008 (0.0087) model time 0.1990 (0.1992) loss 3.4692 (3.4110) grad_norm 0.8958 (1.2798) loss_scale 16384.0000 (14503.8689) mem 8975MB
[2024-07-29 21:19:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][70/625] eta 0:01:59 lr 0.001906 wd 0.0500 time 0.1988 (0.2157) data time 0.0008 (0.0076) model time 0.1979 (0.1989) loss 4.0879 (3.4482) grad_norm 1.1018 (1.2582) loss_scale 16384.0000 (14768.6761) mem 8975MB
[2024-07-29 21:19:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][80/625] eta 0:01:56 lr 0.001906 wd 0.0500 time 0.1953 (0.2136) data time 0.0007 (0.0068) model time 0.1946 (0.1983) loss 4.1614 (3.4710) grad_norm 1.3391 (1.2754) loss_scale 16384.0000 (14968.0988) mem 8975MB
[2024-07-29 21:19:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][90/625] eta 0:01:53 lr 0.001906 wd 0.0500 time 0.1980 (0.2120) data time 0.0009 (0.0061) model time 0.1971 (0.1983) loss 3.8031 (3.4582) grad_norm 1.1405 (1.2795) loss_scale 16384.0000 (15123.6923) mem 8975MB
[2024-07-29 21:19:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][100/625] eta 0:01:50 lr 0.001906 wd 0.0500 time 0.2004 (0.2107) data time 0.0006 (0.0056) model time 0.1997 (0.1984) loss 3.2242 (3.4555) grad_norm 0.7788 (1.2794) loss_scale 16384.0000 (15248.4752) mem 8975MB
[2024-07-29 21:19:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][110/625] eta 0:01:48 lr 0.001906 wd 0.0500 time 0.2102 (0.2099) data time 0.0006 (0.0052) model time 0.2096 (0.1988) loss 3.6459 (3.4467) grad_norm 0.8490 (1.2765) loss_scale 16384.0000 (15350.7748) mem 8975MB
[2024-07-29 21:19:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][120/625] eta 0:01:45 lr 0.001906 wd 0.0500 time 0.1996 (0.2091) data time 0.0007 (0.0048) model time 0.1990 (0.1988) loss 3.3476 (3.4341) grad_norm 1.3571 (1.2686) loss_scale 16384.0000 (15436.1653) mem 8975MB
[2024-07-29 21:19:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][130/625] eta 0:01:43 lr 0.001906 wd 0.0500 time 0.2001 (0.2083) data time 0.0008 (0.0045) model time 0.1993 (0.1988) loss 3.5552 (3.4354) grad_norm 0.9653 (1.2710) loss_scale 16384.0000 (15508.5191) mem 8975MB
[2024-07-29 21:19:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][140/625] eta 0:01:40 lr 0.001906 wd 0.0500 time 0.1983 (0.2077) data time 0.0010 (0.0043) model time 0.1974 (0.1988) loss 3.2781 (3.4134) grad_norm 1.0320 (1.2575) loss_scale 16384.0000 (15570.6099) mem 8975MB
[2024-07-29 21:19:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][150/625] eta 0:01:38 lr 0.001906 wd 0.0500 time 0.1984 (0.2074) data time 0.0009 (0.0041) model time 0.1975 (0.1990) loss 4.1286 (3.4190) grad_norm 1.0417 (1.2575) loss_scale 16384.0000 (15624.4768) mem 8975MB
[2024-07-29 21:19:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][160/625] eta 0:01:36 lr 0.001906 wd 0.0500 time 0.1991 (0.2070) data time 0.0007 (0.0039) model time 0.1983 (0.1991) loss 2.7084 (3.4172) grad_norm 1.2520 (1.2541) loss_scale 16384.0000 (15671.6522) mem 8975MB
[2024-07-29 21:19:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][170/625] eta 0:01:33 lr 0.001905 wd 0.0500 time 0.1966 (0.2066) data time 0.0007 (0.0037) model time 0.1959 (0.1991) loss 2.5385 (3.4016) grad_norm 0.8802 (1.2458) loss_scale 16384.0000 (15713.3099) mem 8975MB
[2024-07-29 21:19:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][180/625] eta 0:01:31 lr 0.001905 wd 0.0500 time 0.1984 (0.2063) data time 0.0010 (0.0035) model time 0.1975 (0.1993) loss 3.3787 (3.4091) grad_norm 0.8580 (1.2504) loss_scale 16384.0000 (15750.3646) mem 8975MB
[2024-07-29 21:19:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][190/625] eta 0:01:29 lr 0.001905 wd 0.0500 time 0.2006 (0.2060) data time 0.0008 (0.0034) model time 0.1998 (0.1993) loss 3.4388 (3.4154) grad_norm 1.8449 (1.2502) loss_scale 16384.0000 (15783.5393) mem 8975MB
[2024-07-29 21:19:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][200/625] eta 0:01:27 lr 0.001905 wd 0.0500 time 0.2017 (0.2057) data time 0.0007 (0.0033) model time 0.2010 (0.1993) loss 4.2918 (3.4262) grad_norm 1.8893 (1.2649) loss_scale 16384.0000 (15813.4129) mem 8975MB
[2024-07-29 21:19:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][210/625] eta 0:01:25 lr 0.001905 wd 0.0500 time 0.1976 (0.2054) data time 0.0007 (0.0032) model time 0.1968 (0.1992) loss 2.1555 (3.4242) grad_norm 1.1378 (1.2720) loss_scale 16384.0000 (15840.4550) mem 8975MB
[2024-07-29 21:19:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][220/625] eta 0:01:23 lr 0.001905 wd 0.0500 time 0.2025 (0.2052) data time 0.0010 (0.0031) model time 0.2015 (0.1992) loss 3.2027 (3.4256) grad_norm 1.5495 (1.2780) loss_scale 16384.0000 (15865.0498) mem 8975MB
[2024-07-29 21:19:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][230/625] eta 0:01:20 lr 0.001905 wd 0.0500 time 0.1966 (0.2051) data time 0.0007 (0.0030) model time 0.1959 (0.1994) loss 3.0612 (3.4194) grad_norm 1.2489 (1.2844) loss_scale 16384.0000 (15887.5152) mem 8975MB
[2024-07-29 21:19:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][240/625] eta 0:01:18 lr 0.001905 wd 0.0500 time 0.1986 (0.2049) data time 0.0008 (0.0029) model time 0.1978 (0.1994) loss 3.5655 (3.4284) grad_norm 1.3549 (1.2953) loss_scale 16384.0000 (15908.1162) mem 8975MB
[2024-07-29 21:19:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][250/625] eta 0:01:16 lr 0.001905 wd 0.0500 time 0.1977 (0.2047) data time 0.0007 (0.0028) model time 0.1971 (0.1994) loss 2.2869 (3.4165) grad_norm 1.1076 (1.3065) loss_scale 16384.0000 (15927.0757) mem 8975MB
[2024-07-29 21:19:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][260/625] eta 0:01:14 lr 0.001905 wd 0.0500 time 0.1964 (0.2053) data time 0.0007 (0.0027) model time 0.1957 (0.2004) loss 3.4439 (3.4177) grad_norm 0.8642 (1.3062) loss_scale 16384.0000 (15944.5824) mem 8975MB
[2024-07-29 21:19:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][270/625] eta 0:01:12 lr 0.001905 wd 0.0500 time 0.1998 (0.2051) data time 0.0008 (0.0027) model time 0.1990 (0.2003) loss 4.1478 (3.4254) grad_norm 1.2916 (1.3021) loss_scale 16384.0000 (15960.7970) mem 8975MB
[2024-07-29 21:20:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][280/625] eta 0:01:10 lr 0.001905 wd 0.0500 time 0.2081 (0.2050) data time 0.0008 (0.0026) model time 0.2072 (0.2003) loss 3.3147 (3.4334) grad_norm 1.1105 (1.2983) loss_scale 16384.0000 (15975.8577) mem 8975MB
[2024-07-29 21:20:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][290/625] eta 0:01:08 lr 0.001905 wd 0.0500 time 0.1975 (0.2048) data time 0.0007 (0.0025) model time 0.1967 (0.2002) loss 2.9603 (3.4269) grad_norm 1.8801 (1.2921) loss_scale 16384.0000 (15989.8832) mem 8975MB
[2024-07-29 21:20:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][300/625] eta 0:01:06 lr 0.001904 wd 0.0500 time 0.1985 (0.2046) data time 0.0007 (0.0025) model time 0.1979 (0.2002) loss 3.6477 (3.4353) grad_norm 2.0451 (1.3048) loss_scale 16384.0000 (16002.9767) mem 8975MB
[2024-07-29 21:20:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][310/625] eta 0:01:04 lr 0.001904 wd 0.0500 time 0.1972 (0.2045) data time 0.0008 (0.0024) model time 0.1964 (0.2002) loss 3.1946 (3.4457) grad_norm 1.3903 (1.3086) loss_scale 16384.0000 (16015.2283) mem 8975MB
[2024-07-29 21:20:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][320/625] eta 0:01:02 lr 0.001904 wd 0.0500 time 0.1985 (0.2044) data time 0.0008 (0.0024) model time 0.1977 (0.2001) loss 2.7750 (3.4500) grad_norm 1.7665 (1.3077) loss_scale 16384.0000 (16026.7165) mem 8975MB
[2024-07-29 21:20:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][330/625] eta 0:01:00 lr 0.001904 wd 0.0500 time 0.1955 (0.2042) data time 0.0010 (0.0023) model time 0.1945 (0.2000) loss 2.8250 (3.4478) grad_norm 1.0331 (1.2996) loss_scale 16384.0000 (16037.5106) mem 8975MB
[2024-07-29 21:20:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][340/625] eta 0:00:58 lr 0.001904 wd 0.0500 time 0.1966 (0.2041) data time 0.0008 (0.0023) model time 0.1958 (0.2000) loss 3.4168 (3.4441) grad_norm 1.0466 (1.2942) loss_scale 16384.0000 (16047.6716) mem 8975MB
[2024-07-29 21:20:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][350/625] eta 0:00:56 lr 0.001904 wd 0.0500 time 0.1954 (0.2040) data time 0.0007 (0.0023) model time 0.1947 (0.2000) loss 3.5762 (3.4485) grad_norm 1.0590 (1.2938) loss_scale 16384.0000 (16057.2536) mem 8975MB
[2024-07-29 21:20:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][360/625] eta 0:00:54 lr 0.001904 wd 0.0500 time 0.1989 (0.2039) data time 0.0009 (0.0022) model time 0.1981 (0.2000) loss 3.9023 (3.4408) grad_norm 1.3224 (1.2936) loss_scale 16384.0000 (16066.3047) mem 8975MB
[2024-07-29 21:20:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][370/625] eta 0:00:51 lr 0.001904 wd 0.0500 time 0.1975 (0.2038) data time 0.0008 (0.0022) model time 0.1967 (0.2000) loss 3.5569 (3.4421) grad_norm 1.1112 (1.2937) loss_scale 16384.0000 (16074.8679) mem 8975MB
[2024-07-29 21:20:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][380/625] eta 0:00:49 lr
0.001904 wd 0.0500 time 0.1988 (0.2037) data time 0.0008 (0.0021) model time 0.1979 (0.2000) loss 3.5412 (3.4418) grad_norm 0.9666 (1.2895) loss_scale 16384.0000 (16082.9816) mem 8975MB [2024-07-29 21:20:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][390/625] eta 0:00:47 lr 0.001904 wd 0.0500 time 0.2083 (0.2036) data time 0.0007 (0.0021) model time 0.2076 (0.2000) loss 4.3959 (3.4517) grad_norm 0.9655 (1.2867) loss_scale 16384.0000 (16090.6803) mem 8975MB [2024-07-29 21:20:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][400/625] eta 0:00:45 lr 0.001904 wd 0.0500 time 0.2022 (0.2036) data time 0.0009 (0.0021) model time 0.2013 (0.2000) loss 2.4985 (3.4484) grad_norm 1.3176 (1.2883) loss_scale 16384.0000 (16097.9950) mem 8975MB [2024-07-29 21:20:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][410/625] eta 0:00:43 lr 0.001904 wd 0.0500 time 0.2011 (0.2035) data time 0.0007 (0.0020) model time 0.2004 (0.2000) loss 3.2115 (3.4528) grad_norm 1.5696 (1.2835) loss_scale 16384.0000 (16104.9538) mem 8975MB [2024-07-29 21:20:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][420/625] eta 0:00:41 lr 0.001904 wd 0.0500 time 0.2023 (0.2035) data time 0.0009 (0.0020) model time 0.2014 (0.2000) loss 2.6157 (3.4553) grad_norm 1.1619 (1.2812) loss_scale 16384.0000 (16111.5819) mem 8975MB [2024-07-29 21:20:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][430/625] eta 0:00:39 lr 0.001903 wd 0.0500 time 0.2050 (0.2034) data time 0.0006 (0.0020) model time 0.2044 (0.2000) loss 2.6134 (3.4605) grad_norm 1.2151 (1.2797) loss_scale 16384.0000 (16117.9026) mem 8975MB [2024-07-29 21:20:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][440/625] eta 0:00:37 lr 0.001903 wd 0.0500 time 0.1993 (0.2033) data time 0.0007 (0.0020) model time 0.1986 (0.2000) loss 3.5405 (3.4643) grad_norm 1.0445 (1.2795) loss_scale 
16384.0000 (16123.9365) mem 8975MB [2024-07-29 21:20:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][450/625] eta 0:00:35 lr 0.001903 wd 0.0500 time 0.1987 (0.2033) data time 0.0010 (0.0019) model time 0.1977 (0.2000) loss 3.8598 (3.4687) grad_norm 1.2003 (1.2780) loss_scale 16384.0000 (16129.7029) mem 8975MB [2024-07-29 21:20:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][460/625] eta 0:00:33 lr 0.001903 wd 0.0500 time 0.2003 (0.2032) data time 0.0006 (0.0019) model time 0.1997 (0.2000) loss 4.4136 (3.4766) grad_norm 1.5389 (1.2827) loss_scale 16384.0000 (16135.2191) mem 8975MB [2024-07-29 21:20:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][470/625] eta 0:00:31 lr 0.001903 wd 0.0500 time 0.1989 (0.2032) data time 0.0008 (0.0019) model time 0.1981 (0.2001) loss 2.9633 (3.4744) grad_norm 1.7977 (1.2876) loss_scale 16384.0000 (16140.5011) mem 8975MB [2024-07-29 21:20:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][480/625] eta 0:00:29 lr 0.001903 wd 0.0500 time 0.1998 (0.2032) data time 0.0009 (0.0019) model time 0.1989 (0.2001) loss 3.5391 (3.4804) grad_norm 0.9919 (1.2888) loss_scale 16384.0000 (16145.5634) mem 8975MB [2024-07-29 21:20:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][490/625] eta 0:00:27 lr 0.001903 wd 0.0500 time 0.1990 (0.2031) data time 0.0008 (0.0019) model time 0.1982 (0.2001) loss 3.7091 (3.4707) grad_norm 0.7283 (1.2874) loss_scale 16384.0000 (16150.4196) mem 8975MB [2024-07-29 21:20:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][500/625] eta 0:00:25 lr 0.001903 wd 0.0500 time 0.2009 (0.2031) data time 0.0007 (0.0019) model time 0.2002 (0.2000) loss 3.8483 (3.4708) grad_norm 1.0195 (1.2824) loss_scale 16384.0000 (16155.0818) mem 8975MB [2024-07-29 21:20:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][510/625] eta 
0:00:23 lr 0.001903 wd 0.0500 time 0.2025 (0.2030) data time 0.0006 (0.0018) model time 0.2019 (0.2000) loss 3.5493 (3.4687) grad_norm 1.6137 (1.2820) loss_scale 16384.0000 (16159.5616) mem 8975MB [2024-07-29 21:20:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][520/625] eta 0:00:21 lr 0.001903 wd 0.0500 time 0.1988 (0.2029) data time 0.0006 (0.0018) model time 0.1982 (0.2000) loss 3.0739 (3.4755) grad_norm 1.3047 (1.2792) loss_scale 16384.0000 (16163.8695) mem 8975MB [2024-07-29 21:20:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][530/625] eta 0:00:19 lr 0.001903 wd 0.0500 time 0.1966 (0.2029) data time 0.0010 (0.0018) model time 0.1956 (0.1999) loss 3.0878 (3.4757) grad_norm 0.9480 (1.2759) loss_scale 16384.0000 (16168.0151) mem 8975MB [2024-07-29 21:20:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][540/625] eta 0:00:17 lr 0.001903 wd 0.0500 time 0.1979 (0.2029) data time 0.0007 (0.0018) model time 0.1971 (0.2000) loss 2.2094 (3.4704) grad_norm 0.9953 (1.2745) loss_scale 16384.0000 (16172.0074) mem 8975MB [2024-07-29 21:20:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][550/625] eta 0:00:15 lr 0.001903 wd 0.0500 time 0.1969 (0.2028) data time 0.0007 (0.0018) model time 0.1962 (0.1999) loss 2.5218 (3.4679) grad_norm 0.8150 (1.2779) loss_scale 16384.0000 (16175.8548) mem 8975MB [2024-07-29 21:20:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][560/625] eta 0:00:13 lr 0.001902 wd 0.0500 time 0.1993 (0.2028) data time 0.0006 (0.0018) model time 0.1986 (0.1999) loss 2.1807 (3.4663) grad_norm 1.3159 (1.2793) loss_scale 16384.0000 (16179.5651) mem 8975MB [2024-07-29 21:20:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][570/625] eta 0:00:11 lr 0.001902 wd 0.0500 time 0.2011 (0.2028) data time 0.0008 (0.0018) model time 0.2003 (0.2000) loss 2.8148 (3.4674) grad_norm 1.3795 (1.2814) 
loss_scale 16384.0000 (16183.1454) mem 8975MB [2024-07-29 21:21:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][580/625] eta 0:00:09 lr 0.001902 wd 0.0500 time 0.2024 (0.2028) data time 0.0006 (0.0017) model time 0.2018 (0.2000) loss 2.8513 (3.4642) grad_norm 0.9571 (1.2810) loss_scale 16384.0000 (16186.6024) mem 8975MB [2024-07-29 21:21:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][590/625] eta 0:00:07 lr 0.001902 wd 0.0500 time 0.2022 (0.2031) data time 0.0006 (0.0017) model time 0.2016 (0.2004) loss 3.5373 (3.4644) grad_norm 0.8885 (1.2797) loss_scale 16384.0000 (16189.9425) mem 8975MB [2024-07-29 21:21:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][600/625] eta 0:00:05 lr 0.001902 wd 0.0500 time 0.2000 (0.2034) data time 0.0008 (0.0017) model time 0.1991 (0.2008) loss 3.6456 (3.4621) grad_norm 1.1355 (1.2787) loss_scale 16384.0000 (16193.1714) mem 8975MB [2024-07-29 21:21:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][610/625] eta 0:00:03 lr 0.001902 wd 0.0500 time 0.1985 (0.2035) data time 0.0005 (0.0017) model time 0.1980 (0.2008) loss 3.1175 (3.4521) grad_norm 1.0230 (1.2750) loss_scale 16384.0000 (16196.2946) mem 8975MB [2024-07-29 21:21:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][620/625] eta 0:00:01 lr 0.001902 wd 0.0500 time 0.1989 (0.2034) data time 0.0005 (0.0017) model time 0.1984 (0.2008) loss 2.9198 (3.4505) grad_norm 0.8321 (1.2742) loss_scale 16384.0000 (16199.3172) mem 8975MB [2024-07-29 21:21:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 59 training takes 0:02:07 [2024-07-29 21:21:09 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... 
[2024-07-29 21:21:10 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 21:21:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.475 (0.475) Loss 0.7563 (0.7563) Acc@1 84.912 (84.912) Acc@5 97.168 (97.168) Mem 8975MB
[2024-07-29 21:21:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.094) Loss 1.3057 (0.9502) Acc@1 71.631 (80.180) Acc@5 92.383 (95.894) Mem 8975MB
[2024-07-29 21:21:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.075) Loss 1.4229 (1.1404) Acc@1 69.727 (76.325) Acc@5 90.723 (93.550) Mem 8975MB
[2024-07-29 21:21:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.108 Acc@5 93.540
[2024-07-29 21:21:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 76.1%
[2024-07-29 21:21:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 76.11%
[2024-07-29 21:21:12 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 21:21:12 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 21:21:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.558 (0.558) Loss 0.5996 (0.5996) Acc@1 85.303 (85.303) Acc@5 97.217 (97.217) Mem 8975MB
[2024-07-29 21:21:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.102) Loss 1.0781 (0.7908) Acc@1 73.291 (80.273) Acc@5 92.578 (95.579) Mem 8975MB
[2024-07-29 21:21:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.079) Loss 1.2959 (0.9709) Acc@1 67.773 (76.172) Acc@5 89.648 (93.306) Mem 8975MB
[2024-07-29 21:21:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 75.902 Acc@5 93.276
[2024-07-29 21:21:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 75.9%
[2024-07-29 21:21:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 75.90%
[2024-07-29 21:21:14 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 21:21:15 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 21:21:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][0/625] eta 0:06:02 lr 0.001902 wd 0.0500 time 0.5802 (0.5802) data time 0.3893 (0.3893) model time 0.0000 (0.0000) loss 3.7594 (3.7594) grad_norm 1.0952 (1.0952) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:21:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][10/625] eta 0:02:24 lr 0.001902 wd 0.0500 time 0.1992 (0.2353) data time 0.0010 (0.0364) model time 0.0000 (0.0000) loss 2.4782 (3.1575) grad_norm 1.5839 (1.2630) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:21:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][20/625] eta 0:02:12 lr 0.001902 wd 0.0500 time 0.1972 (0.2183) data time 0.0006 (0.0195) model time 0.0000 (0.0000) loss 4.0368 (3.4014) grad_norm 1.9780 (1.3593) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:21:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][30/625] eta 0:02:06 lr 0.001902 wd 0.0500 time 0.1956 (0.2127) data time 0.0008 (0.0135) model time 0.0000 (0.0000) loss 3.5363 (3.3757) grad_norm 1.0151 (1.4104) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:21:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][40/625] eta 0:02:03 lr 0.001902 wd 0.0500 time 0.2235 (0.2103) data time 0.0007 (0.0104) model time 0.0000 (0.0000) loss 3.6550 (3.3830) grad_norm 1.7462 (1.4035) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:21:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][50/625] eta 0:02:00 lr 0.001902 wd 0.0500 time 0.1980 (0.2096) data time 0.0011 (0.0089) model time 0.0000 (0.0000) loss 3.5248 (3.3814) grad_norm 1.6730 (1.3918) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:21:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][60/625] eta 0:01:57 lr 0.001901 wd 0.0500 time 0.1971 (0.2081) data time 0.0010 (0.0075) model time 0.1961 (0.1998) loss 3.2374 (3.4362) grad_norm 0.8473 (1.3758) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:21:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][70/625] eta 0:01:54 lr 0.001901 wd 0.0500 time 0.2019 (0.2070) data time 0.0006 (0.0066) model time 0.2013 (0.1996) loss 2.7400 (3.4212) grad_norm 1.1486 (1.3296) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:21:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][80/625] eta 0:01:52 lr 0.001901 wd 0.0500 time 0.1989 (0.2063) data time 0.0006 (0.0059) model time 0.1983 (0.1998) loss 2.9670 (3.4083) grad_norm 1.4709 (1.3142) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:21:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][90/625] eta 0:01:50 lr 0.001901 wd 0.0500 time 0.1998 (0.2057) data time 0.0008 (0.0053) model time 0.1989 (0.1999) loss 3.9285 (3.4129) grad_norm 0.9868 (1.3137) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:21:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][100/625] eta 0:01:47 lr 0.001901 wd 0.0500 time 0.2041 (0.2052) data time 0.0008 (0.0049) model time 0.2033 (0.1999) loss 4.0913 (3.4426) grad_norm 0.9428 (1.2992) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:21:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][110/625] eta 0:01:45 lr 0.001901 wd 0.0500 time 0.1972 (0.2049) data time 0.0009 (0.0045) model time 0.1963 (0.2001) loss 3.5521 (3.4160) grad_norm 1.2170 (1.2927) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:21:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][120/625] eta 0:01:44 lr 0.001901 wd 0.0500 time 0.1987 (0.2063) data time 0.0006 (0.0042) model time 0.1981 (0.2031) loss 4.1471 (3.4352) grad_norm 0.9478 (1.3182) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 21:21:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][130/625] eta 0:01:42 lr 0.001901 wd 0.0500 time 0.1958 (0.2075) data time 0.0007 (0.0040) model time 0.1951 (0.2053) loss 3.3227 (3.4231) grad_norm 1.6637 (1.3311) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:21:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][140/625] eta 0:01:41 lr 0.001901 wd 0.0500 time 0.1995 (0.2086) data time 0.0008 (0.0038) model time 0.1987 (0.2072) loss 3.1558 (3.4378) grad_norm 1.2259 (1.3288) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:21:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][150/625] eta 0:01:38 lr 0.001901 wd 0.0500 time 0.1977 (0.2081) data time 0.0008 (0.0036) model time 0.1969 (0.2065) loss 2.9240 (3.4266) grad_norm 1.6309 (1.3246) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:21:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][160/625] eta 0:01:36 lr 0.001901 wd 0.0500 time 0.2009 (0.2083) data time 0.0010 (0.0034) model time 0.1999 (0.2069) loss 3.6960 (3.4386) grad_norm 1.4056 (1.3145) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:21:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][170/625] eta 0:01:34 lr 0.001901 wd 0.0500 time 0.1963 (0.2080) data time 0.0009 (0.0033) model time 0.1954 (0.2064) loss 3.0544 (3.4337) grad_norm 1.4340 (1.3053) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:21:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][180/625] eta 0:01:32 lr 0.001901 wd 0.0500 time 0.2011 (0.2078) data time 0.0008 (0.0031) model time 0.2003 (0.2062) loss 3.9257 (3.4326) grad_norm 1.0479 (1.2994) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:21:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][190/625] eta 0:01:30 lr 0.001900 wd 0.0500 time 
0.1981 (0.2075) data time 0.0007 (0.0030) model time 0.1974 (0.2059) loss 2.8036 (3.4219) grad_norm 0.9781 (1.2924) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:21:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][200/625] eta 0:01:28 lr 0.001900 wd 0.0500 time 0.1987 (0.2072) data time 0.0006 (0.0029) model time 0.1981 (0.2055) loss 2.2024 (3.4205) grad_norm 1.0430 (1.2866) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:21:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][210/625] eta 0:01:25 lr 0.001900 wd 0.0500 time 0.1986 (0.2069) data time 0.0006 (0.0028) model time 0.1980 (0.2051) loss 3.6486 (3.4128) grad_norm 1.2331 (1.2817) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][220/625] eta 0:01:23 lr 0.001900 wd 0.0500 time 0.2114 (0.2069) data time 0.0008 (0.0027) model time 0.2105 (0.2052) loss 3.6776 (3.4190) grad_norm 1.1674 (1.2715) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][230/625] eta 0:01:21 lr 0.001900 wd 0.0500 time 0.1987 (0.2066) data time 0.0008 (0.0027) model time 0.1979 (0.2049) loss 3.9221 (3.4158) grad_norm 1.1484 (1.2827) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][240/625] eta 0:01:19 lr 0.001900 wd 0.0500 time 0.2042 (0.2066) data time 0.0006 (0.0026) model time 0.2036 (0.2048) loss 3.5590 (3.4196) grad_norm 1.0338 (1.2981) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][250/625] eta 0:01:17 lr 0.001900 wd 0.0500 time 0.2098 (0.2064) data time 0.0008 (0.0025) model time 0.2090 (0.2047) loss 3.3035 (3.4249) grad_norm 0.9818 (1.2921) loss_scale 16384.0000 (16384.0000) 
mem 8975MB [2024-07-29 21:22:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][260/625] eta 0:01:15 lr 0.001900 wd 0.0500 time 0.2033 (0.2064) data time 0.0008 (0.0025) model time 0.2025 (0.2046) loss 2.6827 (3.4287) grad_norm 1.2688 (1.2975) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][270/625] eta 0:01:13 lr 0.001900 wd 0.0500 time 0.2008 (0.2062) data time 0.0006 (0.0025) model time 0.2001 (0.2044) loss 3.4898 (3.4172) grad_norm 1.3664 (1.2990) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][280/625] eta 0:01:11 lr 0.001900 wd 0.0500 time 0.2005 (0.2060) data time 0.0006 (0.0024) model time 0.1999 (0.2042) loss 3.4337 (3.4195) grad_norm 1.3891 (1.3004) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][290/625] eta 0:01:09 lr 0.001900 wd 0.0500 time 0.1961 (0.2065) data time 0.0007 (0.0024) model time 0.1954 (0.2048) loss 2.4696 (3.4107) grad_norm 2.0778 (1.3056) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][300/625] eta 0:01:07 lr 0.001900 wd 0.0500 time 0.1978 (0.2063) data time 0.0009 (0.0023) model time 0.1969 (0.2046) loss 3.9831 (3.4142) grad_norm 1.0284 (1.3076) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][310/625] eta 0:01:04 lr 0.001900 wd 0.0500 time 0.2006 (0.2061) data time 0.0008 (0.0023) model time 0.1998 (0.2044) loss 2.3791 (3.4106) grad_norm 1.3197 (1.3095) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][320/625] eta 0:01:02 lr 0.001899 wd 0.0500 
time 0.2020 (0.2059) data time 0.0006 (0.0022) model time 0.2014 (0.2043) loss 3.2714 (3.4129) grad_norm 0.9848 (1.3027) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][330/625] eta 0:01:00 lr 0.001899 wd 0.0500 time 0.3171 (0.2062) data time 0.0008 (0.0022) model time 0.3163 (0.2046) loss 3.3118 (3.4233) grad_norm 1.1595 (1.2987) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][340/625] eta 0:00:58 lr 0.001899 wd 0.0500 time 0.2030 (0.2063) data time 0.0009 (0.0022) model time 0.2021 (0.2047) loss 2.8355 (3.4221) grad_norm 1.2068 (1.2915) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][350/625] eta 0:00:56 lr 0.001899 wd 0.0500 time 0.1997 (0.2061) data time 0.0006 (0.0021) model time 0.1992 (0.2046) loss 3.3784 (3.4265) grad_norm 1.2893 (1.2893) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][360/625] eta 0:00:54 lr 0.001899 wd 0.0500 time 0.2003 (0.2060) data time 0.0008 (0.0021) model time 0.1996 (0.2044) loss 3.7994 (3.4366) grad_norm 0.8272 (1.2861) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][370/625] eta 0:00:52 lr 0.001899 wd 0.0500 time 0.1997 (0.2059) data time 0.0006 (0.0020) model time 0.1990 (0.2043) loss 3.6077 (3.4377) grad_norm 1.5419 (1.2870) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][380/625] eta 0:00:50 lr 0.001899 wd 0.0500 time 0.2001 (0.2057) data time 0.0007 (0.0020) model time 0.1994 (0.2042) loss 2.8880 (3.4394) grad_norm 1.4690 (1.2902) loss_scale 16384.0000 
(16384.0000) mem 8975MB [2024-07-29 21:22:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][390/625] eta 0:00:48 lr 0.001899 wd 0.0500 time 0.1984 (0.2056) data time 0.0007 (0.0020) model time 0.1977 (0.2041) loss 2.9942 (3.4428) grad_norm 1.5843 (1.2870) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][400/625] eta 0:00:46 lr 0.001899 wd 0.0500 time 0.1981 (0.2055) data time 0.0008 (0.0020) model time 0.1973 (0.2040) loss 3.8306 (3.4373) grad_norm 1.0545 (1.2892) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][410/625] eta 0:00:44 lr 0.001899 wd 0.0500 time 0.1994 (0.2055) data time 0.0006 (0.0019) model time 0.1987 (0.2039) loss 2.9479 (3.4325) grad_norm 1.2550 (1.2872) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][420/625] eta 0:00:42 lr 0.001899 wd 0.0500 time 0.2007 (0.2053) data time 0.0006 (0.0019) model time 0.2001 (0.2038) loss 3.4758 (3.4415) grad_norm 1.4265 (1.2822) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][430/625] eta 0:00:40 lr 0.001899 wd 0.0500 time 0.1983 (0.2052) data time 0.0007 (0.0019) model time 0.1976 (0.2037) loss 3.6820 (3.4422) grad_norm 1.0028 (1.2806) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][440/625] eta 0:00:37 lr 0.001899 wd 0.0500 time 0.1996 (0.2051) data time 0.0007 (0.0019) model time 0.1989 (0.2036) loss 3.7302 (3.4405) grad_norm 1.1163 (1.2743) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][450/625] eta 0:00:35 lr 
0.001898 wd 0.0500 time 0.1995 (0.2050) data time 0.0007 (0.0018) model time 0.1988 (0.2034) loss 2.3918 (3.4345) grad_norm 1.1592 (1.2743) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][460/625] eta 0:00:33 lr 0.001898 wd 0.0500 time 0.2030 (0.2049) data time 0.0007 (0.0018) model time 0.2023 (0.2034) loss 2.0773 (3.4301) grad_norm 1.2241 (1.2722) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][470/625] eta 0:00:31 lr 0.001898 wd 0.0500 time 0.2012 (0.2048) data time 0.0006 (0.0018) model time 0.2006 (0.2033) loss 3.6611 (3.4294) grad_norm 1.1088 (1.2786) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][480/625] eta 0:00:29 lr 0.001898 wd 0.0500 time 0.2023 (0.2048) data time 0.0006 (0.0018) model time 0.2017 (0.2033) loss 4.2964 (3.4296) grad_norm 1.0156 (1.2756) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][490/625] eta 0:00:27 lr 0.001898 wd 0.0500 time 0.2035 (0.2049) data time 0.0007 (0.0018) model time 0.2029 (0.2034) loss 3.8129 (3.4291) grad_norm 1.2652 (1.2796) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:22:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][500/625] eta 0:00:25 lr 0.001898 wd 0.0500 time 0.2012 (0.2048) data time 0.0006 (0.0018) model time 0.2006 (0.2033) loss 3.9255 (3.4317) grad_norm 0.8984 (1.2777) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][510/625] eta 0:00:23 lr 0.001898 wd 0.0500 time 0.2008 (0.2047) data time 0.0008 (0.0017) model time 0.2001 (0.2032) loss 4.0016 (3.4361) grad_norm 1.7774 (1.2780) loss_scale 
16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][520/625] eta 0:00:21 lr 0.001898 wd 0.0500 time 0.1995 (0.2047) data time 0.0008 (0.0017) model time 0.1987 (0.2032) loss 2.6654 (3.4336) grad_norm 0.9064 (1.2844) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][530/625] eta 0:00:19 lr 0.001898 wd 0.0500 time 0.2007 (0.2046) data time 0.0006 (0.0017) model time 0.2002 (0.2031) loss 3.8528 (3.4345) grad_norm 0.8702 (1.2832) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][540/625] eta 0:00:17 lr 0.001898 wd 0.0500 time 0.1991 (0.2045) data time 0.0007 (0.0017) model time 0.1984 (0.2031) loss 3.8331 (3.4340) grad_norm 1.6600 (1.2842) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][550/625] eta 0:00:15 lr 0.001898 wd 0.0500 time 0.2001 (0.2046) data time 0.0006 (0.0017) model time 0.1995 (0.2031) loss 3.3465 (3.4352) grad_norm 1.2683 (1.2808) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][560/625] eta 0:00:13 lr 0.001898 wd 0.0500 time 0.2020 (0.2045) data time 0.0009 (0.0017) model time 0.2011 (0.2030) loss 2.9023 (3.4312) grad_norm 1.5300 (1.2819) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][570/625] eta 0:00:11 lr 0.001898 wd 0.0500 time 0.1976 (0.2044) data time 0.0008 (0.0017) model time 0.1968 (0.2030) loss 2.9300 (3.4289) grad_norm 1.4256 (1.2815) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][580/625] eta 
0:00:09 lr 0.001897 wd 0.0500 time 0.2000 (0.2043) data time 0.0006 (0.0017) model time 0.1994 (0.2029) loss 4.3481 (3.4309) grad_norm 1.4520 (1.2849) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][590/625] eta 0:00:07 lr 0.001897 wd 0.0500 time 0.1976 (0.2042) data time 0.0006 (0.0016) model time 0.1970 (0.2028) loss 3.8685 (3.4271) grad_norm 1.2230 (1.2856) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][600/625] eta 0:00:05 lr 0.001897 wd 0.0500 time 0.1993 (0.2042) data time 0.0006 (0.0016) model time 0.1987 (0.2027) loss 2.3035 (3.4242) grad_norm 1.4299 (1.2842) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][610/625] eta 0:00:03 lr 0.001897 wd 0.0500 time 0.2006 (0.2041) data time 0.0004 (0.0016) model time 0.2002 (0.2026) loss 4.0655 (3.4238) grad_norm 0.9162 (1.2796) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][620/625] eta 0:00:01 lr 0.001897 wd 0.0500 time 0.2049 (0.2041) data time 0.0006 (0.0016) model time 0.2044 (0.2026) loss 2.9214 (3.4247) grad_norm 2.0539 (1.2804) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 60 training takes 0:02:07 [2024-07-29 21:23:23 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 21:23:23 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
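The `lr` field drifts within and across epochs (0.001898 in epoch 60 down to 0.001890 in epoch 62), which suggests the rate is updated every iteration rather than once per epoch. The sketch below rests on several assumptions:

- The schedule is a cosine decay from `BASE_LR` to `MIN_LR`.
- The decay starts after the 20 warmup epochs, as implied by `WARMUP_PREFIX: true` in the config dump at the top of this log.
- The schedule is stepped once per iteration.

The function name `cosine_lr` is hypothetical, and this is not the training script's own scheduler code. Under these assumptions it reproduces the six-decimal values printed at `[60/300][460/625]`, `[61/300][300/625]` and `[62/300][100/625]`.

```python
import math

# Values assumed from the config dump at the top of this log.
BASE_LR = 0.002          # TRAIN.BASE_LR
MIN_LR = 2.0e-05         # TRAIN.MIN_LR
EPOCHS = 300             # TRAIN.EPOCHS
WARMUP_EPOCHS = 20       # TRAIN.WARMUP_EPOCHS
STEPS_PER_EPOCH = 625    # the "[idx/625]" counter in the Train records


def cosine_lr(epoch: int, idx: int) -> float:
    """Hypothetical per-iteration cosine decay that starts after a prefixed warmup."""
    step = epoch * STEPS_PER_EPOCH + idx
    warmup_steps = WARMUP_EPOCHS * STEPS_PER_EPOCH
    if step < warmup_steps:
        raise ValueError("the warmup phase is not modelled in this sketch")
    decay_steps = (EPOCHS - WARMUP_EPOCHS) * STEPS_PER_EPOCH
    t = (step - warmup_steps) / decay_steps  # progress through the cosine phase, 0..1
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1.0 + math.cos(math.pi * t))


# The log prints lr with six decimals.
for epoch, idx in [(60, 460), (61, 300), (62, 100)]:
    print(f"[{epoch}/300][{idx}/625] lr {cosine_lr(epoch, idx):.6f}")
```

If the printed values stopped matching later in the run, that would point to a different schedule (for example one that includes the warmup inside the cosine period) rather than to a problem with training.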
[2024-07-29 21:23:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.441 (0.441) Loss 0.7896 (0.7896) Acc@1 84.814 (84.814) Acc@5 97.803 (97.803) Mem 8975MB [2024-07-29 21:23:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.091) Loss 1.2656 (0.9630) Acc@1 72.412 (80.420) Acc@5 91.797 (95.748) Mem 8975MB [2024-07-29 21:23:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.074) Loss 1.4170 (1.1286) Acc@1 68.652 (76.300) Acc@5 90.332 (93.497) Mem 8975MB [2024-07-29 21:23:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 75.952 Acc@5 93.388 [2024-07-29 21:23:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 76.0% [2024-07-29 21:23:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.767 (0.767) Loss 0.5923 (0.5923) Acc@1 85.352 (85.352) Acc@5 97.314 (97.314) Mem 8975MB [2024-07-29 21:23:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.127) Loss 1.0674 (0.7811) Acc@1 73.389 (80.540) Acc@5 92.773 (95.694) Mem 8975MB [2024-07-29 21:23:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.092) Loss 1.2773 (0.9590) Acc@1 68.164 (76.449) Acc@5 89.990 (93.452) Mem 8975MB [2024-07-29 21:23:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.188 Acc@5 93.430 [2024-07-29 21:23:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 76.2% [2024-07-29 21:23:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 76.19% [2024-07-29 21:23:27 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-07-29 21:23:28 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-07-29 21:23:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][0/625] eta 0:07:06 lr 0.001897 wd 0.0500 time 0.6828 (0.6828) data time 0.4911 (0.4911) model time 0.0000 (0.0000) loss 3.6644 (3.6644) grad_norm 0.9457 (0.9457) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][10/625] eta 0:02:29 lr 0.001897 wd 0.0500 time 0.1968 (0.2433) data time 0.0009 (0.0455) model time 0.0000 (0.0000) loss 3.4412 (3.4814) grad_norm 0.9493 (1.1009) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][20/625] eta 0:02:14 lr 0.001897 wd 0.0500 time 0.1983 (0.2220) data time 0.0008 (0.0243) model time 0.0000 (0.0000) loss 3.3917 (3.3638) grad_norm 0.8778 (1.0674) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][30/625] eta 0:02:07 lr 0.001897 wd 0.0500 time 0.1989 (0.2146) data time 0.0008 (0.0167) model time 0.0000 (0.0000) loss 3.6230 (3.3641) grad_norm 0.8988 (1.1292) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][40/625] eta 0:02:03 lr 0.001897 wd 0.0500 time 0.2037 (0.2110) data time 0.0008 (0.0129) model time 0.0000 (0.0000) loss 3.1762 (3.3916) grad_norm 1.6307 (1.2245) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][50/625] eta 0:02:02 lr 0.001897 wd 0.0500 time 0.3828 (0.2123) data time 0.0008 (0.0105) model time 0.0000 (0.0000) loss 3.7860 (3.3740) grad_norm 1.4511 (1.2825) loss_scale 16384.0000 (16384.0000) mem 
8975MB [2024-07-29 21:23:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][60/625] eta 0:02:00 lr 0.001897 wd 0.0500 time 0.1984 (0.2125) data time 0.0008 (0.0089) model time 0.1977 (0.2125) loss 3.5730 (3.4370) grad_norm 0.9180 (1.2437) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][70/625] eta 0:01:56 lr 0.001897 wd 0.0500 time 0.1966 (0.2107) data time 0.0009 (0.0078) model time 0.1957 (0.2055) loss 3.9443 (3.4591) grad_norm 1.6832 (1.2607) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][80/625] eta 0:01:54 lr 0.001896 wd 0.0500 time 0.1966 (0.2093) data time 0.0009 (0.0072) model time 0.1957 (0.2027) loss 2.9618 (3.4717) grad_norm 1.2288 (1.3014) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][90/625] eta 0:01:51 lr 0.001896 wd 0.0500 time 0.1995 (0.2083) data time 0.0006 (0.0065) model time 0.1989 (0.2018) loss 3.4035 (3.4832) grad_norm 1.6674 (1.2882) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][100/625] eta 0:01:48 lr 0.001896 wd 0.0500 time 0.1992 (0.2075) data time 0.0007 (0.0059) model time 0.1986 (0.2014) loss 2.9025 (3.4631) grad_norm 1.1822 (1.3050) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][110/625] eta 0:01:46 lr 0.001896 wd 0.0500 time 0.1954 (0.2069) data time 0.0007 (0.0055) model time 0.1947 (0.2010) loss 4.0833 (3.4733) grad_norm 1.6322 (1.3301) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][120/625] eta 0:01:44 lr 0.001896 wd 0.0500 time 
0.2065 (0.2063) data time 0.0006 (0.0051) model time 0.2059 (0.2008) loss 3.4525 (3.4728) grad_norm 1.2382 (1.3175) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][130/625] eta 0:01:42 lr 0.001896 wd 0.0500 time 0.2125 (0.2061) data time 0.0008 (0.0048) model time 0.2117 (0.2010) loss 3.6585 (3.4906) grad_norm 0.8724 (1.2969) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][140/625] eta 0:01:39 lr 0.001896 wd 0.0500 time 0.1975 (0.2056) data time 0.0009 (0.0045) model time 0.1966 (0.2008) loss 3.6886 (3.4900) grad_norm 1.3104 (1.2855) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:23:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][150/625] eta 0:01:37 lr 0.001896 wd 0.0500 time 0.2057 (0.2053) data time 0.0008 (0.0043) model time 0.2048 (0.2006) loss 3.8721 (3.4957) grad_norm 0.8953 (1.2753) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][160/625] eta 0:01:35 lr 0.001896 wd 0.0500 time 0.1987 (0.2050) data time 0.0007 (0.0041) model time 0.1980 (0.2005) loss 3.8025 (3.4908) grad_norm 1.1626 (1.2649) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][170/625] eta 0:01:33 lr 0.001896 wd 0.0500 time 0.1983 (0.2049) data time 0.0008 (0.0039) model time 0.1975 (0.2007) loss 3.8156 (3.4941) grad_norm 1.1419 (1.2562) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][180/625] eta 0:01:31 lr 0.001896 wd 0.0500 time 0.1984 (0.2047) data time 0.0007 (0.0037) model time 0.1977 (0.2006) loss 4.0925 (3.4907) grad_norm 1.6687 (1.2835) loss_scale 16384.0000 (16384.0000) 
mem 8975MB [2024-07-29 21:24:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][190/625] eta 0:01:29 lr 0.001896 wd 0.0500 time 0.3485 (0.2052) data time 0.0008 (0.0036) model time 0.3477 (0.2016) loss 3.9132 (3.4922) grad_norm 1.0719 (1.2849) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][200/625] eta 0:01:27 lr 0.001895 wd 0.0500 time 0.2001 (0.2055) data time 0.0008 (0.0035) model time 0.1994 (0.2021) loss 3.2854 (3.4786) grad_norm 1.1647 (1.2768) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][210/625] eta 0:01:25 lr 0.001895 wd 0.0500 time 0.1998 (0.2060) data time 0.0008 (0.0033) model time 0.1990 (0.2030) loss 2.5146 (3.4696) grad_norm 0.9953 (1.2689) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][220/625] eta 0:01:23 lr 0.001895 wd 0.0500 time 0.1971 (0.2058) data time 0.0009 (0.0032) model time 0.1962 (0.2028) loss 3.6045 (3.4782) grad_norm 1.8127 (1.2832) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][230/625] eta 0:01:21 lr 0.001895 wd 0.0500 time 0.2001 (0.2055) data time 0.0006 (0.0031) model time 0.1994 (0.2026) loss 4.7791 (3.4862) grad_norm 1.6406 (1.2990) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][240/625] eta 0:01:19 lr 0.001895 wd 0.0500 time 0.2001 (0.2054) data time 0.0007 (0.0030) model time 0.1994 (0.2025) loss 3.2817 (3.4843) grad_norm 1.3986 (1.2905) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][250/625] eta 0:01:16 lr 0.001895 wd 0.0500 
time 0.2027 (0.2052) data time 0.0008 (0.0029) model time 0.2019 (0.2024) loss 3.3379 (3.4722) grad_norm 1.6884 (1.2863) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][260/625] eta 0:01:14 lr 0.001895 wd 0.0500 time 0.1991 (0.2050) data time 0.0006 (0.0029) model time 0.1984 (0.2022) loss 3.6962 (3.4757) grad_norm 1.0260 (1.2826) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][270/625] eta 0:01:12 lr 0.001895 wd 0.0500 time 0.1992 (0.2048) data time 0.0006 (0.0028) model time 0.1986 (0.2021) loss 3.2828 (3.4753) grad_norm 1.3649 (1.2871) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][280/625] eta 0:01:10 lr 0.001895 wd 0.0500 time 0.2003 (0.2046) data time 0.0009 (0.0027) model time 0.1994 (0.2019) loss 3.5961 (3.4663) grad_norm 1.0036 (1.2858) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][290/625] eta 0:01:08 lr 0.001895 wd 0.0500 time 0.1981 (0.2045) data time 0.0006 (0.0027) model time 0.1975 (0.2018) loss 3.9708 (3.4723) grad_norm 0.9071 (1.2833) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][300/625] eta 0:01:06 lr 0.001895 wd 0.0500 time 0.2038 (0.2044) data time 0.0006 (0.0026) model time 0.2032 (0.2017) loss 3.2979 (3.4760) grad_norm 0.9207 (1.2804) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][310/625] eta 0:01:04 lr 0.001895 wd 0.0500 time 0.2006 (0.2042) data time 0.0008 (0.0026) model time 0.1999 (0.2016) loss 3.3484 (3.4718) grad_norm 1.7733 (1.2847) loss_scale 16384.0000 
(16384.0000) mem 8975MB [2024-07-29 21:24:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][320/625] eta 0:01:02 lr 0.001895 wd 0.0500 time 0.2018 (0.2041) data time 0.0008 (0.0025) model time 0.2010 (0.2015) loss 2.7132 (3.4650) grad_norm 1.5674 (1.2834) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][330/625] eta 0:01:00 lr 0.001894 wd 0.0500 time 0.2009 (0.2040) data time 0.0008 (0.0025) model time 0.2001 (0.2015) loss 2.6859 (3.4699) grad_norm 0.7685 (1.2745) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][340/625] eta 0:00:58 lr 0.001894 wd 0.0500 time 0.2019 (0.2039) data time 0.0007 (0.0024) model time 0.2011 (0.2014) loss 2.2085 (3.4618) grad_norm 0.8661 (1.2683) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][350/625] eta 0:00:56 lr 0.001894 wd 0.0500 time 0.2038 (0.2038) data time 0.0008 (0.0024) model time 0.2030 (0.2014) loss 3.9744 (3.4698) grad_norm 1.3027 (1.2640) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][360/625] eta 0:00:53 lr 0.001894 wd 0.0500 time 0.1998 (0.2037) data time 0.0007 (0.0023) model time 0.1991 (0.2013) loss 3.4841 (3.4577) grad_norm 1.0653 (1.2630) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][370/625] eta 0:00:51 lr 0.001894 wd 0.0500 time 0.1998 (0.2036) data time 0.0008 (0.0023) model time 0.1990 (0.2012) loss 3.6109 (3.4589) grad_norm 1.0204 (1.2677) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][380/625] eta 0:00:49 lr 
0.001894 wd 0.0500 time 0.2013 (0.2035) data time 0.0007 (0.0023) model time 0.2006 (0.2012) loss 3.6387 (3.4519) grad_norm 1.0579 (1.2656) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][390/625] eta 0:00:47 lr 0.001894 wd 0.0500 time 0.2001 (0.2034) data time 0.0008 (0.0022) model time 0.1993 (0.2011) loss 2.1956 (3.4462) grad_norm 0.8335 (1.2631) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][400/625] eta 0:00:45 lr 0.001894 wd 0.0500 time 0.2002 (0.2034) data time 0.0007 (0.0022) model time 0.1995 (0.2011) loss 2.9777 (3.4401) grad_norm 0.9726 (1.2600) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][410/625] eta 0:00:43 lr 0.001894 wd 0.0500 time 0.2025 (0.2033) data time 0.0009 (0.0022) model time 0.2017 (0.2011) loss 2.8229 (3.4407) grad_norm 1.3055 (1.2638) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][420/625] eta 0:00:41 lr 0.001894 wd 0.0500 time 0.2052 (0.2033) data time 0.0009 (0.0021) model time 0.2043 (0.2011) loss 3.0223 (3.4434) grad_norm 1.2987 (1.2654) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][430/625] eta 0:00:39 lr 0.001894 wd 0.0500 time 0.1973 (0.2033) data time 0.0009 (0.0021) model time 0.1964 (0.2011) loss 3.7862 (3.4402) grad_norm 1.6099 (1.2685) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][440/625] eta 0:00:37 lr 0.001894 wd 0.0500 time 0.1985 (0.2032) data time 0.0008 (0.0021) model time 0.1977 (0.2011) loss 3.3123 (3.4373) grad_norm 0.8919 (1.2730) loss_scale 
16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:24:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][450/625] eta 0:00:35 lr 0.001894 wd 0.0500 time 0.2042 (0.2032) data time 0.0006 (0.0020) model time 0.2036 (0.2011) loss 2.9466 (3.4338) grad_norm 0.8054 (1.2699) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:25:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][460/625] eta 0:00:33 lr 0.001893 wd 0.0500 time 0.1976 (0.2031) data time 0.0009 (0.0020) model time 0.1967 (0.2010) loss 3.4945 (3.4387) grad_norm 1.0973 (1.2717) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:25:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][470/625] eta 0:00:31 lr 0.001893 wd 0.0500 time 0.1980 (0.2030) data time 0.0006 (0.0020) model time 0.1973 (0.2010) loss 2.8034 (3.4446) grad_norm 0.9050 (1.2732) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:25:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][480/625] eta 0:00:29 lr 0.001893 wd 0.0500 time 0.1988 (0.2034) data time 0.0008 (0.0020) model time 0.1980 (0.2014) loss 3.1643 (3.4452) grad_norm 0.8622 (1.2713) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:25:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][490/625] eta 0:00:27 lr 0.001893 wd 0.0500 time 0.1949 (0.2033) data time 0.0009 (0.0019) model time 0.1940 (0.2013) loss 2.8233 (3.4438) grad_norm 0.9381 (1.2686) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:25:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][500/625] eta 0:00:25 lr 0.001893 wd 0.0500 time 0.2024 (0.2033) data time 0.0007 (0.0019) model time 0.2018 (0.2013) loss 4.0153 (3.4439) grad_norm 1.7851 (1.2681) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:25:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][510/625] eta 
0:00:23 lr 0.001893 wd 0.0500 time 0.1983 (0.2032) data time 0.0008 (0.0019) model time 0.1975 (0.2012) loss 2.6083 (3.4400) grad_norm 0.8704 (1.2633) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:25:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][520/625] eta 0:00:21 lr 0.001893 wd 0.0500 time 0.1988 (0.2032) data time 0.0009 (0.0019) model time 0.1979 (0.2012) loss 3.2612 (3.4409) grad_norm 1.0079 (1.2641) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:25:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][530/625] eta 0:00:19 lr 0.001893 wd 0.0500 time 0.2009 (0.2031) data time 0.0008 (0.0019) model time 0.2001 (0.2012) loss 3.0255 (3.4412) grad_norm 1.7939 (1.2706) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:25:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][540/625] eta 0:00:17 lr 0.001893 wd 0.0500 time 0.2011 (0.2031) data time 0.0007 (0.0019) model time 0.2003 (0.2012) loss 3.2839 (3.4444) grad_norm 1.0480 (1.2670) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:25:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][550/625] eta 0:00:15 lr 0.001893 wd 0.0500 time 0.1987 (0.2030) data time 0.0008 (0.0018) model time 0.1978 (0.2011) loss 3.3030 (3.4502) grad_norm 0.9479 (1.2625) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:25:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][560/625] eta 0:00:13 lr 0.001893 wd 0.0500 time 0.1983 (0.2030) data time 0.0006 (0.0018) model time 0.1977 (0.2011) loss 4.1970 (3.4492) grad_norm 1.8197 (1.2616) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:25:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][570/625] eta 0:00:11 lr 0.001893 wd 0.0500 time 0.1992 (0.2029) data time 0.0007 (0.0018) model time 0.1985 (0.2011) loss 4.1778 (3.4446) grad_norm 1.0754 (1.2628) 
loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:25:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][580/625] eta 0:00:09 lr 0.001892 wd 0.0500 time 0.2274 (0.2030) data time 0.0006 (0.0018) model time 0.2268 (0.2011) loss 3.3911 (3.4416) grad_norm 1.1158 (1.2610) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:25:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][590/625] eta 0:00:07 lr 0.001892 wd 0.0500 time 0.2065 (0.2029) data time 0.0008 (0.0018) model time 0.2058 (0.2011) loss 3.2647 (3.4392) grad_norm 0.9898 (1.2628) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:25:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][600/625] eta 0:00:05 lr 0.001892 wd 0.0500 time 0.1998 (0.2029) data time 0.0007 (0.0018) model time 0.1991 (0.2011) loss 2.0816 (3.4342) grad_norm 0.8897 (1.2619) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:25:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][610/625] eta 0:00:03 lr 0.001892 wd 0.0500 time 0.1985 (0.2031) data time 0.0005 (0.0018) model time 0.1979 (0.2013) loss 3.3145 (3.4345) grad_norm 0.9132 (1.2619) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:25:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][620/625] eta 0:00:01 lr 0.001892 wd 0.0500 time 0.1982 (0.2030) data time 0.0003 (0.0017) model time 0.1979 (0.2012) loss 4.1298 (3.4335) grad_norm 1.5040 (1.2597) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:25:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 61 training takes 0:02:06 [2024-07-29 21:25:35 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... 
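The `eta` field and the per-epoch duration both appear to be simple products of the running-mean iteration time, which is the parenthesized `time` value. The sketch below assumes ETA = mean iteration time × remaining iterations, truncated to whole seconds. The helper name `eta` is hypothetical. Because the log rounds the mean to four decimals, a recomputed ETA could occasionally differ from the printed one by a second.

```python
import datetime


def eta(avg_batch_time: float, idx: int, num_steps: int = 625) -> str:
    """Hypothetical ETA: running-mean iteration time x iterations left, truncated to seconds."""
    remaining = avg_batch_time * (num_steps - idx)
    return str(datetime.timedelta(seconds=int(remaining)))


print(eta(0.6828, 0))    # epoch 61, [0/625]
print(eta(0.2433, 10))   # epoch 61, [10/625]
print(eta(0.2052, 250))  # epoch 61, [250/625]
print(eta(0.2049, 460))  # epoch 60, [460/625]
```

With `idx=0` and the final mean of an epoch, the same helper gives `0:02:06` for 0.2030 and `0:02:07` for 0.2041. These match the `EPOCH 61 training takes 0:02:06` and `EPOCH 60 training takes 0:02:07` lines, so the epoch duration is close to 625 × the final mean iteration time.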
[2024-07-29 21:25:35 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 21:25:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.473 (0.473) Loss 0.7637 (0.7637) Acc@1 85.645 (85.645) Acc@5 97.217 (97.217) Mem 8975MB [2024-07-29 21:25:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 1.2803 (0.9449) Acc@1 72.998 (80.877) Acc@5 91.943 (95.827) Mem 8975MB [2024-07-29 21:25:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.076) Loss 1.4131 (1.1343) Acc@1 70.459 (76.723) Acc@5 90.625 (93.576) Mem 8975MB [2024-07-29 21:25:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.428 Acc@5 93.494 [2024-07-29 21:25:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 76.4% [2024-07-29 21:25:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 76.43% [2024-07-29 21:25:37 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 21:25:38 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-29 21:25:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.530 (0.530) Loss 0.5859 (0.5859) Acc@1 85.498 (85.498) Acc@5 97.314 (97.314) Mem 8975MB [2024-07-29 21:25:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 1.0527 (0.7726) Acc@1 73.535 (80.824) Acc@5 92.920 (95.832) Mem 8975MB [2024-07-29 21:25:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.078) Loss 1.2588 (0.9484) Acc@1 68.750 (76.739) Acc@5 90.332 (93.610) Mem 8975MB [2024-07-29 21:25:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.506 Acc@5 93.598 [2024-07-29 21:25:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 76.5% [2024-07-29 21:25:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 76.51% [2024-07-29 21:25:40 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 21:25:40 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
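In the epoch 62 records that follow, two fields change from `[40/625]` on:

- `grad_norm` keeps a finite current value, but its running mean becomes `(inf)`.
- `loss_scale` drops from 16384 to 8192.

Reading each field as `current (running mean)`, a single overflowing iteration is enough to explain both. With mixed precision enabled (`AMP_ENABLE: true`), and assuming the run uses PyTorch's `torch.cuda.amp.GradScaler`, the scaler by default multiplies the scale by `backoff_factor=0.5` and skips the optimizer step when it finds inf/NaN gradients. Separately, one inf entering a running mean keeps that mean at inf for the rest of the epoch, so `(inf)` does not mean every step overflowed.

The sketch below is a toy model, not the training script:

- `AverageMeter` and `simulate_epoch` are illustrative names.
- The gradient norms are placeholders.
- The overflow position (idx 37) is a hypothetical choice.

The sketch assumes only that 37 iterations ran at scale 16384 before the drop. That assumption is what the logged means 15584.7805, 14135.2157 and 13160.9180 imply.

```python
import math


class AverageMeter:
    """Tracks the latest value and the running mean (printed as `val (avg)`)."""

    def __init__(self) -> None:
        self.val = 0.0
        self.sum = 0.0
        self.count = 0
        self.avg = 0.0

    def update(self, val: float) -> None:
        self.val = val
        self.sum += val
        self.count += 1
        self.avg = self.sum / self.count


def simulate_epoch(num_iters: int, overflow_idx: int, init_scale: float = 16384.0):
    """Toy dynamic loss scaler: halve the scale at the iteration whose gradient
    norm is inf, and feed both values into running-mean meters."""
    scale = init_scale
    norm_meter, scale_meter = AverageMeter(), AverageMeter()
    logged = {}
    for idx in range(num_iters):
        grad_norm = math.inf if idx == overflow_idx else 1.0  # placeholder norms
        if math.isinf(grad_norm):
            scale *= 0.5  # backoff: halve the scale (the real scaler also skips the step)
        norm_meter.update(grad_norm)
        scale_meter.update(scale)  # assumed: the post-update scale is what gets logged
        if idx % 10 == 0:  # records are printed every 10 iterations (PRINT_FREQ: 10)
            logged[idx] = (norm_meter.avg, scale_meter.val, scale_meter.avg)
    return logged


for idx, (norm_avg, scale_val, scale_avg) in simulate_epoch(61, overflow_idx=37).items():
    print(f"[{idx}/625] grad_norm avg {norm_avg} loss_scale {scale_val:.4f} ({scale_avg:.4f})")
```

The decreasing `loss_scale` means in the log (15584.7805, 14135.2157, 13160.9180, ...) are therefore just the running average being diluted by iterations at the new scale of 8192. They do not indicate repeated overflows.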
[2024-07-29 21:25:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][0/625] eta 0:06:51 lr 0.001892 wd 0.0500 time 0.6582 (0.6582) data time 0.4708 (0.4708) model time 0.0000 (0.0000) loss 3.9437 (3.9437) grad_norm 1.0348 (1.0348) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:25:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][10/625] eta 0:02:29 lr 0.001892 wd 0.0500 time 0.1960 (0.2423) data time 0.0009 (0.0437) model time 0.0000 (0.0000) loss 3.5061 (3.3688) grad_norm 2.4561 (1.5575) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:25:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][20/625] eta 0:02:14 lr 0.001892 wd 0.0500 time 0.2187 (0.2229) data time 0.0007 (0.0233) model time 0.0000 (0.0000) loss 3.4655 (3.3101) grad_norm 1.1319 (1.4825) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:25:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][30/625] eta 0:02:08 lr 0.001892 wd 0.0500 time 0.1969 (0.2156) data time 0.0009 (0.0160) model time 0.0000 (0.0000) loss 3.8364 (3.4056) grad_norm 1.5533 (1.4468) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 21:25:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][40/625] eta 0:02:03 lr 0.001892 wd 0.0500 time 0.2015 (0.2116) data time 0.0009 (0.0123) model time 0.0000 (0.0000) loss 3.2514 (3.3828) grad_norm 1.0717 (inf) loss_scale 8192.0000 (15584.7805) mem 8975MB [2024-07-29 21:25:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][50/625] eta 0:02:00 lr 0.001892 wd 0.0500 time 0.1983 (0.2092) data time 0.0007 (0.0101) model time 0.0000 (0.0000) loss 4.0063 (3.4304) grad_norm 0.9884 (inf) loss_scale 8192.0000 (14135.2157) mem 8975MB [2024-07-29 21:25:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][60/625] eta 0:01:57 lr 0.001892 wd 0.0500 time 0.2007 (0.2078) data 
time 0.0009 (0.0086) model time 0.1998 (0.1997) loss 2.9060 (3.4195) grad_norm 0.8160 (inf) loss_scale 8192.0000 (13160.9180) mem 8975MB [2024-07-29 21:25:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][70/625] eta 0:01:54 lr 0.001892 wd 0.0500 time 0.2325 (0.2072) data time 0.0006 (0.0075) model time 0.2319 (0.2010) loss 4.3414 (3.4321) grad_norm 1.8073 (inf) loss_scale 8192.0000 (12461.0704) mem 8975MB [2024-07-29 21:25:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][80/625] eta 0:01:52 lr 0.001891 wd 0.0500 time 0.2001 (0.2064) data time 0.0008 (0.0067) model time 0.1993 (0.2006) loss 3.7042 (3.4309) grad_norm 1.9911 (inf) loss_scale 8192.0000 (11934.0247) mem 8975MB [2024-07-29 21:25:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][90/625] eta 0:01:50 lr 0.001891 wd 0.0500 time 0.2055 (0.2058) data time 0.0009 (0.0061) model time 0.2046 (0.2006) loss 3.7804 (3.4526) grad_norm 0.8915 (inf) loss_scale 8192.0000 (11522.8132) mem 8975MB [2024-07-29 21:26:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][100/625] eta 0:01:47 lr 0.001891 wd 0.0500 time 0.1983 (0.2052) data time 0.0009 (0.0056) model time 0.1975 (0.2002) loss 3.4059 (3.4412) grad_norm 1.0909 (inf) loss_scale 8192.0000 (11193.0297) mem 8975MB [2024-07-29 21:26:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][110/625] eta 0:01:45 lr 0.001891 wd 0.0500 time 0.2044 (0.2048) data time 0.0008 (0.0051) model time 0.2036 (0.2001) loss 3.8682 (3.4234) grad_norm 1.3717 (inf) loss_scale 8192.0000 (10922.6667) mem 8975MB [2024-07-29 21:26:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][120/625] eta 0:01:43 lr 0.001891 wd 0.0500 time 0.2001 (0.2045) data time 0.0007 (0.0048) model time 0.1994 (0.2002) loss 3.9123 (3.4228) grad_norm 0.9814 (inf) loss_scale 8192.0000 (10696.9917) mem 8975MB [2024-07-29 21:26:07 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][130/625] eta 0:01:41 lr 0.001891 wd 0.0500 time 0.2078 (0.2042) data time 0.0008 (0.0045) model time 0.2070 (0.2001) loss 3.5854 (3.4350) grad_norm 1.0789 (inf) loss_scale 8192.0000 (10505.7710) mem 8975MB
[2024-07-29 21:26:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][140/625] eta 0:01:38 lr 0.001891 wd 0.0500 time 0.2020 (0.2040) data time 0.0010 (0.0042) model time 0.2009 (0.2001) loss 3.1794 (3.4194) grad_norm 1.1854 (inf) loss_scale 8192.0000 (10341.6738) mem 8975MB
[2024-07-29 21:26:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][150/625] eta 0:01:36 lr 0.001891 wd 0.0500 time 0.1977 (0.2037) data time 0.0008 (0.0040) model time 0.1969 (0.1999) loss 2.4882 (3.4133) grad_norm 0.9056 (inf) loss_scale 8192.0000 (10199.3113) mem 8975MB
[2024-07-29 21:26:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][160/625] eta 0:01:34 lr 0.001891 wd 0.0500 time 0.1994 (0.2037) data time 0.0006 (0.0038) model time 0.1988 (0.2003) loss 4.2540 (3.4235) grad_norm 1.7793 (inf) loss_scale 8192.0000 (10074.6335) mem 8975MB
[2024-07-29 21:26:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][170/625] eta 0:01:32 lr 0.001891 wd 0.0500 time 0.1981 (0.2037) data time 0.0007 (0.0037) model time 0.1975 (0.2004) loss 3.6791 (3.4185) grad_norm 1.9323 (inf) loss_scale 8192.0000 (9964.5380) mem 8975MB
[2024-07-29 21:26:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][180/625] eta 0:01:30 lr 0.001891 wd 0.0500 time 0.1987 (0.2035) data time 0.0011 (0.0035) model time 0.1976 (0.2003) loss 3.1905 (3.4000) grad_norm 0.9745 (inf) loss_scale 8192.0000 (9866.6077) mem 8975MB
[2024-07-29 21:26:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][190/625] eta 0:01:28 lr 0.001891 wd 0.0500 time 0.1998 (0.2036) data time 0.0008 (0.0034) model time 0.1990 (0.2006) loss 3.7929 (3.4168) grad_norm 0.9712 (inf) loss_scale 8192.0000 (9778.9319) mem 8975MB
[2024-07-29 21:26:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][200/625] eta 0:01:26 lr 0.001890 wd 0.0500 time 0.2068 (0.2036) data time 0.0007 (0.0032) model time 0.2062 (0.2007) loss 2.4232 (3.4212) grad_norm 0.8181 (inf) loss_scale 8192.0000 (9699.9801) mem 8975MB
[2024-07-29 21:26:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][210/625] eta 0:01:24 lr 0.001890 wd 0.0500 time 0.2006 (0.2036) data time 0.0006 (0.0031) model time 0.2000 (0.2008) loss 2.2094 (3.4135) grad_norm 0.8274 (inf) loss_scale 8192.0000 (9628.5118) mem 8975MB
[2024-07-29 21:26:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][220/625] eta 0:01:22 lr 0.001890 wd 0.0500 time 0.2007 (0.2035) data time 0.0009 (0.0030) model time 0.1998 (0.2009) loss 3.8210 (3.4030) grad_norm 0.9056 (inf) loss_scale 8192.0000 (9563.5113) mem 8975MB
[2024-07-29 21:26:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][230/625] eta 0:01:20 lr 0.001890 wd 0.0500 time 0.2011 (0.2035) data time 0.0007 (0.0030) model time 0.2005 (0.2009) loss 4.0006 (3.4025) grad_norm 1.3463 (inf) loss_scale 8192.0000 (9504.1385) mem 8975MB
[2024-07-29 21:26:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][240/625] eta 0:01:18 lr 0.001890 wd 0.0500 time 0.1977 (0.2034) data time 0.0009 (0.0029) model time 0.1968 (0.2009) loss 3.4340 (3.3974) grad_norm 1.2741 (inf) loss_scale 8192.0000 (9449.6929) mem 8975MB
[2024-07-29 21:26:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][250/625] eta 0:01:16 lr 0.001890 wd 0.0500 time 0.2027 (0.2033) data time 0.0006 (0.0028) model time 0.2021 (0.2009) loss 3.2515 (3.4080) grad_norm 1.2076 (inf) loss_scale 8192.0000 (9399.5857) mem 8975MB
[2024-07-29 21:26:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][260/625] eta 0:01:14 lr 0.001890 wd 0.0500 time 0.2138 (0.2034) data time 0.0008 (0.0027) model time 0.2130 (0.2011) loss 3.4225 (3.4003) grad_norm 1.6990 (inf) loss_scale 8192.0000 (9353.3180) mem 8975MB
[2024-07-29 21:26:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][270/625] eta 0:01:12 lr 0.001890 wd 0.0500 time 0.2012 (0.2034) data time 0.0006 (0.0027) model time 0.2006 (0.2011) loss 3.0770 (3.4096) grad_norm 1.9945 (inf) loss_scale 8192.0000 (9310.4649) mem 8975MB
[2024-07-29 21:26:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][280/625] eta 0:01:10 lr 0.001890 wd 0.0500 time 0.1965 (0.2033) data time 0.0009 (0.0026) model time 0.1957 (0.2011) loss 3.5831 (3.4169) grad_norm 1.0275 (inf) loss_scale 8192.0000 (9270.6619) mem 8975MB
[2024-07-29 21:26:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][290/625] eta 0:01:08 lr 0.001890 wd 0.0500 time 0.1981 (0.2032) data time 0.0006 (0.0026) model time 0.1975 (0.2010) loss 4.1918 (3.4192) grad_norm 2.3462 (inf) loss_scale 8192.0000 (9233.5945) mem 8975MB
[2024-07-29 21:26:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][300/625] eta 0:01:06 lr 0.001890 wd 0.0500 time 0.2013 (0.2032) data time 0.0006 (0.0025) model time 0.2007 (0.2010) loss 3.0845 (3.4164) grad_norm 1.4953 (inf) loss_scale 8192.0000 (9198.9900) mem 8975MB
[2024-07-29 21:26:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][310/625] eta 0:01:03 lr 0.001890 wd 0.0500 time 0.1976 (0.2032) data time 0.0007 (0.0024) model time 0.1969 (0.2010) loss 4.1395 (3.4129) grad_norm 1.3334 (inf) loss_scale 8192.0000 (9166.6109) mem 8975MB
[2024-07-29 21:26:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][320/625] eta 0:01:01 lr 0.001889 wd 0.0500 time 0.1975 (0.2032) data time 0.0009 (0.0024) model time 0.1966 (0.2012) loss 2.9249 (3.4178) grad_norm 0.9937 (inf) loss_scale 8192.0000 (9136.2492) mem 8975MB
[2024-07-29 21:26:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][330/625] eta 0:00:59 lr 0.001889 wd 0.0500 time 0.2011 (0.2032) data time 0.0008 (0.0024) model time 0.2004 (0.2011) loss 2.6556 (3.4116) grad_norm 1.2705 (inf) loss_scale 8192.0000 (9107.7221) mem 8975MB
[2024-07-29 21:26:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][340/625] eta 0:00:57 lr 0.001889 wd 0.0500 time 0.2020 (0.2031) data time 0.0006 (0.0023) model time 0.2014 (0.2010) loss 3.9414 (3.4194) grad_norm 0.9718 (inf) loss_scale 8192.0000 (9080.8680) mem 8975MB
[2024-07-29 21:26:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][350/625] eta 0:00:55 lr 0.001889 wd 0.0500 time 0.1999 (0.2030) data time 0.0006 (0.0023) model time 0.1993 (0.2010) loss 3.9078 (3.4245) grad_norm 1.6536 (inf) loss_scale 8192.0000 (9055.5442) mem 8975MB
[2024-07-29 21:26:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][360/625] eta 0:00:54 lr 0.001889 wd 0.0500 time 0.4189 (0.2041) data time 0.0008 (0.0022) model time 0.4181 (0.2023) loss 2.3268 (3.4277) grad_norm 1.3745 (inf) loss_scale 8192.0000 (9031.6233) mem 8975MB
[2024-07-29 21:26:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][370/625] eta 0:00:52 lr 0.001889 wd 0.0500 time 0.2011 (0.2040) data time 0.0009 (0.0022) model time 0.2002 (0.2022) loss 3.4233 (3.4219) grad_norm 1.1726 (inf) loss_scale 8192.0000 (9008.9919) mem 8975MB
[2024-07-29 21:26:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][380/625] eta 0:00:49 lr 0.001889 wd 0.0500 time 0.1971 (0.2039) data time 0.0009 (0.0022) model time 0.1963 (0.2021) loss 3.6798 (3.4235) grad_norm 1.2910 (inf) loss_scale 8192.0000 (8987.5486) mem 8975MB
[2024-07-29 21:27:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][390/625] eta 0:00:47 lr 0.001889 wd 0.0500 time 0.1985 (0.2038) data time 0.0008 (0.0021) model time 0.1977 (0.2020) loss 3.4716 (3.4202) grad_norm 1.1020 (inf) loss_scale 8192.0000 (8967.2020) mem 8975MB
[2024-07-29 21:27:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][400/625] eta 0:00:45 lr 0.001889 wd 0.0500 time 0.1998 (0.2037) data time 0.0010 (0.0021) model time 0.1989 (0.2020) loss 3.7534 (3.4255) grad_norm 0.9295 (inf) loss_scale 8192.0000 (8947.8703) mem 8975MB
[2024-07-29 21:27:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][410/625] eta 0:00:43 lr 0.001889 wd 0.0500 time 0.1991 (0.2036) data time 0.0006 (0.0021) model time 0.1985 (0.2019) loss 3.0506 (3.4272) grad_norm 1.3471 (inf) loss_scale 8192.0000 (8929.4793) mem 8975MB
[2024-07-29 21:27:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][420/625] eta 0:00:41 lr 0.001889 wd 0.0500 time 0.2012 (0.2038) data time 0.0006 (0.0020) model time 0.2006 (0.2021) loss 3.9777 (3.4286) grad_norm 0.8853 (inf) loss_scale 8192.0000 (8911.9620) mem 8975MB
[2024-07-29 21:27:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][430/625] eta 0:00:39 lr 0.001889 wd 0.0500 time 0.2001 (0.2037) data time 0.0006 (0.0020) model time 0.1995 (0.2020) loss 3.6887 (3.4314) grad_norm 1.1884 (inf) loss_scale 8192.0000 (8895.2575) mem 8975MB
[2024-07-29 21:27:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][440/625] eta 0:00:37 lr 0.001889 wd 0.0500 time 0.1977 (0.2036) data time 0.0010 (0.0020) model time 0.1967 (0.2019) loss 2.0474 (3.4306) grad_norm 0.9351 (inf) loss_scale 8192.0000 (8879.3107) mem 8975MB
[2024-07-29 21:27:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][450/625] eta 0:00:35 lr 0.001888 wd 0.0500 time 0.2024 (0.2045) data time 0.0006 (0.0020) model time 0.2018 (0.2030) loss 3.8922 (3.4318) grad_norm 1.3158 (inf) loss_scale 8192.0000 (8864.0710) mem 8975MB
[2024-07-29 21:27:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][460/625] eta 0:00:33 lr 0.001888 wd 0.0500 time 0.2147 (0.2051) data time 0.0006 (0.0020) model time 0.2141 (0.2036) loss 4.3549 (3.4399) grad_norm 1.2094 (inf) loss_scale 8192.0000 (8849.4924) mem 8975MB
[2024-07-29 21:27:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][470/625] eta 0:00:31 lr 0.001888 wd 0.0500 time 0.2075 (0.2050) data time 0.0006 (0.0019) model time 0.2069 (0.2036) loss 4.2143 (3.4367) grad_norm 0.9520 (inf) loss_scale 8192.0000 (8835.5329) mem 8975MB
[2024-07-29 21:27:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][480/625] eta 0:00:29 lr 0.001888 wd 0.0500 time 0.1959 (0.2049) data time 0.0008 (0.0019) model time 0.1951 (0.2035) loss 3.3272 (3.4392) grad_norm 1.0402 (inf) loss_scale 8192.0000 (8822.1538) mem 8975MB
[2024-07-29 21:27:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][490/625] eta 0:00:27 lr 0.001888 wd 0.0500 time 0.1987 (0.2050) data time 0.0007 (0.0019) model time 0.1980 (0.2035) loss 4.3034 (3.4391) grad_norm 1.3217 (inf) loss_scale 8192.0000 (8809.3198) mem 8975MB
[2024-07-29 21:27:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][500/625] eta 0:00:25 lr 0.001888 wd 0.0500 time 0.1992 (0.2050) data time 0.0008 (0.0019) model time 0.1984 (0.2036) loss 3.4169 (3.4384) grad_norm 1.1486 (inf) loss_scale 8192.0000 (8796.9980) mem 8975MB
[2024-07-29 21:27:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][510/625] eta 0:00:23 lr 0.001888 wd 0.0500 time 0.1983 (0.2050) data time 0.0007 (0.0019) model time 0.1976 (0.2035) loss 3.3010 (3.4394) grad_norm 1.7948 (inf) loss_scale 8192.0000 (8785.1585) mem 8975MB
[2024-07-29 21:27:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][520/625] eta 0:00:21 lr 0.001888 wd 0.0500 time 0.1976 (0.2049) data time 0.0008 (0.0018) model time 0.1969 (0.2035) loss 2.8692 (3.4409) grad_norm 1.4539 (inf) loss_scale 8192.0000 (8773.7735) mem 8975MB
[2024-07-29 21:27:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][530/625] eta 0:00:19 lr 0.001888 wd 0.0500 time 0.1977 (0.2050) data time 0.0008 (0.0018) model time 0.1969 (0.2036) loss 3.9568 (3.4440) grad_norm 2.3474 (inf) loss_scale 8192.0000 (8762.8173) mem 8975MB
[2024-07-29 21:27:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][540/625] eta 0:00:17 lr 0.001888 wd 0.0500 time 0.1977 (0.2049) data time 0.0007 (0.0018) model time 0.1970 (0.2035) loss 4.0975 (3.4470) grad_norm 0.8308 (inf) loss_scale 8192.0000 (8752.2662) mem 8975MB
[2024-07-29 21:27:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][550/625] eta 0:00:15 lr 0.001888 wd 0.0500 time 0.1995 (0.2049) data time 0.0008 (0.0018) model time 0.1987 (0.2035) loss 4.5043 (3.4480) grad_norm 1.6536 (inf) loss_scale 8192.0000 (8742.0980) mem 8975MB
[2024-07-29 21:27:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][560/625] eta 0:00:13 lr 0.001888 wd 0.0500 time 0.1965 (0.2048) data time 0.0009 (0.0018) model time 0.1956 (0.2034) loss 3.8111 (3.4490) grad_norm 1.1377 (inf) loss_scale 8192.0000 (8732.2923) mem 8975MB
[2024-07-29 21:27:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][570/625] eta 0:00:11 lr 0.001887 wd 0.0500 time 0.2036 (0.2049) data time 0.0008 (0.0018) model time 0.2027 (0.2036) loss 4.2925 (3.4538) grad_norm 0.9614 (inf) loss_scale 8192.0000 (8722.8301) mem 8975MB
[2024-07-29 21:27:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][580/625] eta 0:00:09 lr 0.001887 wd 0.0500 time 0.1980 (0.2048) data time 0.0006 (0.0017) model time 0.1974 (0.2035) loss 3.6649 (3.4565) grad_norm 1.1317 (inf) loss_scale 8192.0000 (8713.6936) mem 8975MB
[2024-07-29 21:27:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][590/625] eta 0:00:07 lr 0.001887 wd 0.0500 time 0.1982 (0.2048) data time 0.0008 (0.0017) model time 0.1974 (0.2034) loss 3.7718 (3.4562) grad_norm 1.1143 (inf) loss_scale 8192.0000 (8704.8663) mem 8975MB
[2024-07-29 21:27:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][600/625] eta 0:00:05 lr 0.001887 wd 0.0500 time 0.1992 (0.2047) data time 0.0007 (0.0017) model time 0.1985 (0.2033) loss 4.0094 (3.4592) grad_norm 1.0797 (inf) loss_scale 8192.0000 (8696.3328) mem 8975MB
[2024-07-29 21:27:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][610/625] eta 0:00:03 lr 0.001887 wd 0.0500 time 0.1991 (0.2047) data time 0.0004 (0.0017) model time 0.1987 (0.2034) loss 4.0310 (3.4582) grad_norm 2.4048 (inf) loss_scale 8192.0000 (8688.0786) mem 8975MB
[2024-07-29 21:27:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][620/625] eta 0:00:01 lr 0.001887 wd 0.0500 time 0.1994 (0.2047) data time 0.0004 (0.0017) model time 0.1990 (0.2033) loss 3.7387 (3.4570) grad_norm 1.3722 (inf) loss_scale 8192.0000 (8680.0902) mem 8975MB
[2024-07-29 21:27:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 62 training takes 0:02:07
[2024-07-29 21:27:48 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 21:27:50 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 21:27:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.429 (0.429) Loss 0.8042 (0.8042) Acc@1 85.352 (85.352) Acc@5 97.168 (97.168) Mem 8975MB
[2024-07-29 21:27:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.089) Loss 1.3262 (0.9909) Acc@1 72.314 (80.540) Acc@5 92.041 (95.854) Mem 8975MB
[2024-07-29 21:27:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.073) Loss 1.4961 (1.1730) Acc@1 68.018 (76.335) Acc@5 90.088 (93.529) Mem 8975MB
[2024-07-29 21:27:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 76.038 Acc@5 93.486
[2024-07-29 21:27:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 76.0%
[2024-07-29 21:27:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.740 (0.740) Loss 0.5811 (0.5811) Acc@1 85.596 (85.596) Acc@5 97.314 (97.314) Mem 8975MB
[2024-07-29 21:27:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.125) Loss 1.0439 (0.7652) Acc@1 73.730 (81.104) Acc@5 92.871 (95.890) Mem 8975MB
[2024-07-29 21:27:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.094) Loss 1.2451 (0.9389) Acc@1 68.799 (76.983) Acc@5 90.576 (93.710) Mem 8975MB
[2024-07-29 21:27:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 76.755 Acc@5 93.710
[2024-07-29 21:27:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 76.8%
[2024-07-29 21:27:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 76.76%
[2024-07-29 21:27:54 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 21:27:54 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 21:27:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][0/625] eta 0:06:55 lr 0.001887 wd 0.0500 time 0.6647 (0.6647) data time 0.4786 (0.4786) model time 0.0000 (0.0000) loss 2.6477 (2.6477) grad_norm 1.5880 (1.5880) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:27:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][10/625] eta 0:02:28 lr 0.001887 wd 0.0500 time 0.1967 (0.2415) data time 0.0007 (0.0443) model time 0.0000 (0.0000) loss 3.9621 (3.3185) grad_norm 1.2205 (1.6058) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:27:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][20/625] eta 0:02:14 lr 0.001887 wd 0.0500 time 0.1987 (0.2217) data time 0.0008 (0.0236) model time 0.0000 (0.0000) loss 3.8282 (3.2975) grad_norm 0.8796 (1.5972) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:28:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][30/625] eta 0:02:07 lr 0.001887 wd 0.0500 time 0.1997 (0.2147) data time 0.0008 (0.0163) model time 0.0000 (0.0000) loss 3.7421 (3.3139) grad_norm 1.0502 (1.4065) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:28:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][40/625] eta 0:02:03 lr 0.001887 wd 0.0500 time 0.1991 (0.2111) data time 0.0009 (0.0126) model time 0.0000 (0.0000) loss 2.5656 (3.3142) grad_norm 0.8495 (1.3198) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:28:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][50/625] eta 0:02:00 lr 0.001887 wd 0.0500 time 0.1976 (0.2089) data time 0.0008 (0.0103) model time 0.0000 (0.0000) loss 4.0271 (3.3736) grad_norm 1.1362 (1.2838) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:28:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][60/625] eta 0:01:57 lr 0.001887 wd 0.0500 time 0.1991 (0.2075) data time 0.0008 (0.0087) model time 0.1983 (0.1999) loss 3.5715 (3.3962) grad_norm 0.8589 (1.2626) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:28:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][70/625] eta 0:01:54 lr 0.001886 wd 0.0500 time 0.1986 (0.2064) data time 0.0008 (0.0076) model time 0.1978 (0.1994) loss 2.5198 (3.3954) grad_norm 1.0385 (1.2384) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:28:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][80/625] eta 0:01:52 lr 0.001886 wd 0.0500 time 0.2007 (0.2059) data time 0.0006 (0.0068) model time 0.2001 (0.1999) loss 3.7329 (3.4067) grad_norm 1.1202 (1.2277) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:28:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][90/625] eta 0:01:49 lr 0.001886 wd 0.0500 time 0.1999 (0.2053) data time 0.0006 (0.0062) model time 0.1992 (0.1999) loss 3.9266 (3.4103) grad_norm 1.0610 (1.2079) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:28:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][100/625] eta 0:01:47 lr 0.001886 wd 0.0500 time 0.2039 (0.2050) data time 0.0006 (0.0057) model time 0.2033 (0.2001) loss 2.5737 (3.3971) grad_norm 1.6692 (1.2158) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:28:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][110/625] eta 0:01:45 lr 0.001886 wd 0.0500 time 0.2010 (0.2047) data time 0.0006 (0.0052) model time 0.2004 (0.2002) loss 2.3324 (3.3676) grad_norm 0.9847 (1.2144) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:28:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][120/625] eta 0:01:43 lr 0.001886 wd 0.0500 time 0.1985 (0.2043) data time 0.0007 (0.0049) model time 0.1977 (0.2001) loss 3.1185 (3.3397) grad_norm 1.0193 (1.2363) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:28:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][130/625] eta 0:01:40 lr 0.001886 wd 0.0500 time 0.1985 (0.2039) data time 0.0008 (0.0046) model time 0.1977 (0.1999) loss 3.6633 (3.3428) grad_norm 1.1137 (1.2337) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:28:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][140/625] eta 0:01:38 lr 0.001886 wd 0.0500 time 0.1997 (0.2037) data time 0.0007 (0.0043) model time 0.1990 (0.1998) loss 4.0561 (3.3450) grad_norm 1.9037 (1.2501) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:28:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][150/625] eta 0:01:36 lr 0.001886 wd 0.0500 time 0.1985 (0.2034) data time 0.0008 (0.0041) model time 0.1978 (0.1997) loss 4.1373 (3.3431) grad_norm 1.0684 (1.2658) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:28:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][160/625] eta 0:01:34 lr 0.001886 wd 0.0500 time 0.1973 (0.2032) data time 0.0008 (0.0039) model time 0.1965 (0.1997) loss 3.2774 (3.3360) grad_norm 1.0782 (1.2758) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:28:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][170/625] eta 0:01:32 lr 0.001886 wd 0.0500 time 0.2017 (0.2030) data time 0.0006 (0.0037) model time 0.2010 (0.1996) loss 3.5238 (3.3472) grad_norm 1.0383 (1.2760) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:28:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][180/625] eta 0:01:30 lr 0.001886 wd 0.0500 time 0.1988 (0.2028) data time 0.0007 (0.0035) model time 0.1982 (0.1995) loss 3.4171 (3.3499) grad_norm 0.7551 (1.2830) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:28:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][190/625] eta 0:01:28 lr 0.001885 wd 0.0500 time 0.2006 (0.2027) data time 0.0010 (0.0034) model time 0.1996 (0.1996) loss 3.2582 (3.3648) grad_norm 0.9525 (1.2895) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:28:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][200/625] eta 0:01:26 lr 0.001885 wd 0.0500 time 0.1993 (0.2026) data time 0.0007 (0.0033) model time 0.1986 (0.1995) loss 3.8365 (3.3700) grad_norm 1.2423 (1.2817) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:28:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 21:28:36 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 21:28:36 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 21:30:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 21:30:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 21:30:34 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 21:30:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 21:30:51 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 21:30:51 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 21:30:51 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 21:30:51 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 63)
[2024-07-29 21:30:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 21:31:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][210/625] eta 0:09:15 lr 0.001885 wd 0.0500 time 0.2007 (1.3387) data time 0.0011 (0.1108) model time 0.1996 (1.2279) loss 3.5484 (3.9750) grad_norm 0.8136 (1.1729) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][220/625] eta 0:04:30 lr 0.001885 wd 0.0500 time 0.1971 (0.6685) data time 0.0009 (0.0462) model time 0.1962 (0.6223) loss 3.2776 (3.6366) grad_norm 1.1168 (1.1237) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][230/625] eta 0:03:15 lr 0.001885 wd 0.0500 time 0.1986 (0.4941) data time 0.0006 (0.0294) model time 0.1980 (0.4647) loss 3.7175 (3.6943) grad_norm 1.0818 (1.1348) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][240/625] eta 0:02:39 lr 0.001885 wd 0.0500 time 0.1959 (0.4139) data time 0.0008 (0.0217) model time 0.1950 (0.3923) loss 3.2216 (3.6121) grad_norm 1.5564 (1.1745) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][250/625] eta 0:02:18 lr 0.001885 wd 0.0500 time 0.1962 (0.3680) data time 0.0007 (0.0173) model time 0.1955 (0.3508) loss 3.6498 (3.6053) grad_norm 1.0606 (1.1634) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][260/625] eta 0:02:03 lr 0.001885 wd 0.0500 time 0.1978 (0.3383) data time 0.0008 (0.0144) model time 0.1970 (0.3239) loss 3.5994 (3.5855) grad_norm 0.7870 (1.1567) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][270/625] eta 0:01:52 lr 0.001885 wd 0.0500 time 0.1956 (0.3173) data time 0.0009 (0.0124) model time 0.1946 (0.3049) loss 3.5161 (3.5595) grad_norm 0.9679 (1.1568) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][280/625] eta 0:01:44 lr 0.001885 wd 0.0500 time 0.1969 (0.3018) data time 0.0009 (0.0109) model time 0.1960 (0.2909) loss 3.2845 (3.5187) grad_norm 1.1544 (1.1809) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][290/625] eta 0:01:37 lr 0.001885 wd 0.0500 time 0.1958 (0.2902) data time 0.0011 (0.0097) model time 0.1947 (0.2805) loss 3.5096 (3.4946) grad_norm 1.2350 (1.1915) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][300/625] eta 0:01:31 lr 0.001885 wd 0.0500 time 0.2025 (0.2810) data time 0.0009 (0.0088) model time 0.2016 (0.2722) loss 3.6873 (3.5000) grad_norm 1.1845 (1.1732) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][310/625] eta 0:01:26 lr 0.001884 wd 0.0500 time 0.2128 (0.2735) data time 0.0007 (0.0081) model time 0.2121 (0.2654) loss 3.5517 (3.5319) grad_norm 1.0830 (1.1773) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][320/625] eta 0:01:21 lr 0.001884 wd 0.0500 time 0.1982 (0.2671) data time 0.0009 (0.0075) model time 0.1974 (0.2596) loss 3.3982 (3.5210) grad_norm 0.9442 (1.1664) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][330/625] eta 0:01:17 lr 0.001884 wd 0.0500 time 0.1988 (0.2621) data time 0.0009 (0.0070) model time 0.1979 (0.2551) loss 3.8653 (3.5089) grad_norm 1.0195 (1.1578) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][340/625] eta 0:01:13 lr 0.001884 wd 0.0500 time 0.2089 (0.2578) data time 0.0007 (0.0066) model time 0.2083 (0.2513) loss 2.8573 (3.4998) grad_norm 0.7842 (1.1486) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][350/625] eta 0:01:09 lr 0.001884 wd 0.0500 time 0.1987 (0.2540) data time 0.0010 (0.0062) model time 0.1978 (0.2478) loss 3.4008 (3.4899) grad_norm 0.9391 (1.1417) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][360/625] eta 0:01:06 lr 0.001884 wd 0.0500 time 0.1996 (0.2507) data time 0.0009 (0.0060) model time 0.1987 (0.2447) loss 3.0773 (3.4819) grad_norm 1.5942 (1.1509) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][370/625] eta 0:01:03 lr 0.001884 wd 0.0500 time 0.1979 (0.2476) data time 0.0009 (0.0057) model time 0.1970 (0.2419) loss 3.3884 (3.4826) grad_norm 1.2145 (1.1483) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][380/625] eta 0:00:59 lr 0.001884 wd 0.0500 time 0.1982 (0.2448) data time 0.0009 (0.0054) model time 0.1973 (0.2394) loss 3.6711 (3.4708) grad_norm 0.9001 (1.1383) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][390/625] eta 0:00:56 lr 0.001884 wd 0.0500 time 0.1985 (0.2423) data time 0.0008 (0.0052) model time 0.1978 (0.2372) loss 3.5578 (3.4645) grad_norm 2.7208 (1.1582) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][400/625] eta 0:00:54 lr 0.001884 wd 0.0500 time 0.1978 (0.2401) data time 0.0007 (0.0050) model time 0.1971 (0.2352) loss 3.8038 (3.4621) grad_norm 1.0173 (1.1644) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][410/625] eta 0:00:51 lr 0.001884 wd 0.0500 time 0.2001 (0.2382) data time 0.0008 (0.0048) model time 0.1993 (0.2335) loss 3.7777 (3.4440) grad_norm 1.0242 (1.1647) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][420/625] eta 0:00:48 lr 0.001884 wd 0.0500 time 0.1960 (0.2365) data time 0.0008 (0.0046) model time 0.1953 (0.2319) loss 3.6513 (3.4451) grad_norm 1.4443 (1.1770) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][430/625] eta 0:00:45 lr 0.001883 wd 0.0500 time 0.2068 (0.2349) data time 0.0008 (0.0044) model time 0.2060 (0.2305) loss 3.3172 (3.4468) grad_norm 2.3088 (1.1839) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][440/625] eta 0:00:43 lr 0.001883 wd 0.0500 time 0.2018 (0.2334) data time 0.0008 (0.0043) model time 0.2010 (0.2291) loss 3.6898 (3.4429) grad_norm 0.8475 (1.1736) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][450/625] eta 0:00:40 lr 0.001883 wd 0.0500 time 0.1979 (0.2320) data time 0.0009 (0.0041) model time 0.1970 (0.2279) loss 3.7219 (3.4397) grad_norm 1.2780 (1.1772) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][460/625] eta 0:00:38 lr 0.001883 wd 0.0500 time 0.2044 (0.2307) data time 0.0007 (0.0040) model time 0.2036 (0.2267) loss 2.1590 (3.4207) grad_norm 1.0979 (1.1757) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][470/625] eta 0:00:35 lr 0.001883 wd 0.0500 time 0.1993 (0.2297) data time 0.0008 (0.0039) model time 0.1985 (0.2258) loss 2.3999 (3.4138) grad_norm 1.1932 (1.1795) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:31:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][480/625] eta 0:00:33 lr 0.001883 wd 0.0500 time 0.2009 (0.2287) data time 0.0008 (0.0038) model time 0.2000 (0.2249) loss 3.3759 (3.4237) grad_norm 1.0458 (1.1810) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:32:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][490/625] eta 0:00:30 lr 0.001883 wd 0.0500 time 0.2002 (0.2278) data time 0.0006 (0.0037) model time 0.1996 (0.2241) loss 4.0508 (3.4202) grad_norm 0.7882 (1.1780) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:32:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][500/625] eta 0:00:28 lr 0.001883 wd 0.0500 time 0.1993 (0.2270) data time 0.0009 (0.0036) model time 0.1983 (0.2234) loss 2.4398 (3.4075) grad_norm 0.9447 (1.1870) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:32:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][510/625] eta 0:00:26 lr 0.001883 wd 0.0500 time 0.1977 (0.2261) data time 0.0008 (0.0036) model time 0.1970 (0.2226) loss 3.9768 (3.4037) grad_norm 1.7333 (1.1888) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:32:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][520/625] eta 0:00:23 lr 0.001883 wd 0.0500 time 0.2007 (0.2253) data time 0.0009 (0.0035) model time 0.1999 (0.2218) loss 4.1860 (3.4141) grad_norm 2.5803 (1.1959) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:32:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][530/625] eta 0:00:21 lr 0.001883 wd 0.0500 time 0.2050 (0.2246) data time 0.0006 (0.0034) model time 0.2044 (0.2212) loss 2.8102 (3.4187) grad_norm 1.1408 (1.2007) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:32:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][540/625] eta 0:00:19 lr 0.001883 wd 0.0500 time 0.1959 (0.2238) data time 0.0009 (0.0033) model time 0.1950 (0.2205) loss 2.8429 (3.4165) grad_norm 1.4497 (1.2004) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:32:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][550/625] eta 0:00:16 lr 0.001882 wd 0.0500 time 0.1998 (0.2232) data time 0.0007 (0.0033) model time 0.1990 (0.2199) loss 2.9892 (3.4169) grad_norm 1.0347 (1.1971) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:32:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][560/625] eta 0:00:14 lr 0.001882 wd 0.0500 time 0.2043 (0.2225) data time 0.0009 (0.0032) model time 0.2033 (0.2193) loss 2.8495 (3.4143) grad_norm 1.5390 (1.2024) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:32:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][570/625] eta 0:00:12 lr 0.001882 wd 0.0500 time 0.2097 (0.2220) data time 0.0006 (0.0031) model time 0.2091 (0.2188) loss 2.9815 (3.4142) grad_norm 1.2110 (1.2087) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:32:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][580/625] eta 0:00:09 lr 0.001882 wd 0.0500 time 0.1975 (0.2215) data time 0.0009 (0.0031) model time 0.1966 (0.2184) loss 3.1100 (3.4082) grad_norm 1.4472 (1.2107) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:32:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][590/625] eta 0:00:07 lr 0.001882 wd 0.0500 time 0.1876 (0.2215) data time 0.0006 (0.0030) model time 0.1869 (0.2185) loss 3.8823 (3.4043) grad_norm 0.9891 (1.2091) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:32:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][600/625] eta 0:00:05 lr 0.001882 wd 0.0500 time 0.2024 (0.2211) data time 0.0006 (0.0030) model time 0.2018 (0.2181) loss 3.2354 (3.4069) grad_norm 1.6563 (1.2045) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:32:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][610/625] eta 0:00:03 lr 0.001882 wd 0.0500 time 0.1981 (0.2206) data time 0.0007 (0.0029) model time 0.1974 (0.2176) loss 3.2133 (3.4101) grad_norm 1.2435 (1.2078) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:32:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][620/625] eta 0:00:01 lr 0.001882 wd 0.0500 time 0.1988 (0.2201) data time 0.0004 (0.0029) model time 0.1984 (0.2172) loss 2.5503 (3.4130) grad_norm 1.3369 (1.2085) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 21:32:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 63 training takes 0:01:32
[2024-07-29 21:32:28 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 21:32:30 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 21:32:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.390 (0.390) Loss 0.8120 (0.8120) Acc@1 84.131 (84.131) Acc@5 97.168 (97.168) Mem 8977MB
[2024-07-29 21:32:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.088) Loss 1.2314 (0.9721) Acc@1 74.365 (80.637) Acc@5 92.139 (95.845) Mem 8977MB
[2024-07-29 21:32:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.072) Loss 1.4355 (1.1483) Acc@1 67.480 (76.423) Acc@5 90.723 (93.601) Mem 8977MB
[2024-07-29 21:32:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.286 Acc@5 93.612
[2024-07-29 21:32:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 76.3%
[2024-07-29 21:32:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.753 (0.753) Loss 0.5762 (0.5762) Acc@1 85.742 (85.742) Acc@5 97.314 (97.314) Mem 8977MB
[2024-07-29 21:32:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.127) Loss 1.0332 (0.7587) Acc@1 73.926 (81.259) Acc@5 92.822 (95.930) Mem 8977MB
[2024-07-29 21:32:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.2295 (0.9304) Acc@1 69.043 (77.218) Acc@5 90.967 (93.824) Mem 8977MB
[2024-07-29 21:32:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.997 Acc@5 93.826
[2024-07-29 21:32:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 77.0%
[2024-07-29 21:32:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 77.00%
[2024-07-29 21:32:35 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 21:32:36 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 21:32:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][0/625] eta 0:07:40 lr 0.001882 wd 0.0500 time 0.7365 (0.7365) data time 0.3870 (0.3870) model time 0.0000 (0.0000) loss 3.8535 (3.8535) grad_norm 0.9339 (0.9339) loss_scale 8192.0000 (8192.0000) mem 8971MB
[2024-07-29 21:32:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][10/625] eta 0:02:32 lr 0.001882 wd 0.0500 time 0.1999 (0.2483) data time 0.0008 (0.0360) model time 0.0000 (0.0000) loss 3.4122 (3.7531) grad_norm 0.9188 (1.1772) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:32:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][20/625] eta 0:02:16 lr 0.001882 wd 0.0500 time 0.1974 (0.2251) data time 0.0007 (0.0193) model time 0.0000 (0.0000) loss 3.3958 (3.6735) grad_norm 1.2259 (1.2809) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:32:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][30/625] eta 0:02:09 lr 0.001882 wd 0.0500 time 0.1997 (0.2172) data time 0.0010 (0.0134) model time 0.0000 (0.0000) loss 2.5837 (3.5669) grad_norm 1.1177 (1.2734) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:32:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][40/625] eta 0:02:04 lr 0.001881 wd 0.0500 time 0.2011 (0.2130) data time 0.0006 (0.0103) model time 0.0000 (0.0000) loss 2.7630 (3.5143) grad_norm 1.0561 (1.3135) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:32:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][50/625] eta 0:02:01 lr 0.001881 wd 0.0500 time 0.2071 (0.2110) data time 0.0008 (0.0085) model time 0.0000 (0.0000) loss 3.3536 (3.4213) grad_norm 1.3819 (1.2946) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:32:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][60/625] eta 0:01:58 lr 0.001881 wd 0.0500 time 0.1976 (0.2099) data time 0.0007 (0.0072) model time 0.1969 (0.2038) loss 3.9575 (3.4061) grad_norm 0.9042 (1.2623) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:32:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][70/625] eta 0:01:55 lr 0.001881 wd 0.0500 time 0.1972 (0.2086) data time 0.0010 (0.0064) model time 0.1962 (0.2015) loss 4.2152 (3.4462) grad_norm 0.8081 (1.2601) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:32:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][80/625] eta 0:01:53 lr 0.001881 wd 0.0500 time 0.2002 (0.2075) data time 0.0008 (0.0057) model time 0.1994 (0.2006) loss 3.7758 (3.4407) grad_norm 0.9957 (1.2568) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:32:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][90/625] eta 0:01:50 lr 0.001881 wd 0.0500 time 0.2027 (0.2066) data time 0.0007 (0.0052) model time 0.2020 (0.2001) loss 3.7425 (3.4679) grad_norm 1.3712 (1.2461) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:32:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][100/625] eta 0:01:48 lr 0.001881 wd 0.0500 time 0.2004 (0.2060) data time 0.0006 (0.0047) model time 0.1998 (0.2001) loss 2.6356 (3.4487) grad_norm 1.2204 (1.2712) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:32:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][110/625] eta 0:01:45 lr 0.001881 wd 0.0500 time 0.2123 (0.2057) data time 0.0009 (0.0044) model time 0.2114 (0.2003) loss 3.3877 (3.4315) grad_norm 1.3451 (1.2576) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][120/625] eta 0:01:43 lr 0.001881 wd 0.0500 time 0.1983 (0.2051) data time 0.0007 (0.0041) model time 0.1977 (0.2000) loss 2.5933 (3.4206) grad_norm 1.2279 (1.2546) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][130/625] eta 0:01:41 lr 0.001881 wd 0.0500 time 0.2019 (0.2050) data time 0.0007 (0.0039) model time 0.2012 (0.2003) loss 3.8537 (3.4362) grad_norm 1.0180 (1.2513) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][140/625] eta 0:01:39 lr 0.001881 wd 0.0500 time 0.2018 (0.2048) data time 0.0007 (0.0037) model time 0.2011 (0.2003) loss 3.7427 (3.4451) grad_norm 1.4497 (1.2670) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][150/625] eta 0:01:37 lr 0.001881 wd 0.0500 time 0.1968 (0.2046) data time 0.0009 (0.0035) model time 0.1959 (0.2003) loss 2.8470 (3.4437) grad_norm 2.4080 (1.2891) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][160/625] eta 0:01:35 lr 0.001880 wd 0.0500 time 0.2022 (0.2059) data time 0.0008 (0.0033) model time 0.2014 (0.2027) loss 3.7448 (3.4518) grad_norm 1.0174 (1.3083) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][170/625] eta 0:01:33 lr 0.001880 wd 0.0500 time 0.2014 (0.2056) data time 0.0008 (0.0032) model time 0.2006 (0.2023) loss 3.8076 (3.4606) grad_norm 0.7635 (1.3049) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][180/625] eta 0:01:31 lr 0.001880 wd 0.0500 time 0.1986 (0.2052) data time 0.0007 (0.0031) model time 0.1979 (0.2020) loss 3.5968 (3.4528) grad_norm 0.8791 (1.2893) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][190/625] eta 0:01:29 lr 0.001880 wd 0.0500 time 0.1987 (0.2050) data time 0.0009 (0.0030) model time 0.1978 (0.2018) loss 2.7499 (3.4489) grad_norm 1.0480 (1.2752) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][200/625] eta 0:01:27 lr 0.001880 wd 0.0500 time 0.1995 (0.2047) data time 0.0009 (0.0029) model time 0.1986 (0.2016) loss 3.4234 (3.4544) grad_norm 0.9114 (1.2747) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][210/625] eta 0:01:24 lr 0.001880 wd 0.0500 time 0.2020 (0.2046) data time 0.0009 (0.0028) model time 0.2011 (0.2016) loss 3.6183 (3.4613) grad_norm 0.9169 (1.2692) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][220/625] eta 0:01:22 lr 0.001880 wd 0.0500 time 0.1981 (0.2044) data time 0.0008 (0.0027) model time 0.1973 (0.2015) loss 2.3499 (3.4476) grad_norm 0.8672 (1.2619) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][230/625] eta 0:01:20 lr 0.001880 wd 0.0500 time 0.2005 (0.2043) data time 0.0008 (0.0026) model time 0.1998 (0.2014) loss 2.6510 (3.4452) grad_norm 0.9792 (1.2575) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][240/625] eta 0:01:18 lr 0.001880 wd 0.0500 time 0.1987 (0.2042) data time 0.0009 (0.0026) model time 0.1979 (0.2014) loss 4.2541 (3.4366) grad_norm 2.1333 (1.2861) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][250/625] eta 0:01:16 lr 0.001880 wd 0.0500 time 0.1993 (0.2040) data time 0.0009 (0.0025) model time 0.1984 (0.2013) loss 2.9679 (3.4447) grad_norm 1.2150 (1.2911) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][260/625] eta 0:01:14 lr 0.001880 wd 0.0500 time 0.1994 (0.2039) data time 0.0008 (0.0024) model time 0.1986 (0.2012) loss 3.2911 (3.4455) grad_norm 1.9185 (1.2834) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][270/625] eta 0:01:12 lr 0.001880 wd 0.0500 time 0.2018 (0.2037) data time 0.0008 (0.0024) model time 0.2010 (0.2011) loss 3.8181 (3.4412) grad_norm 1.0772 (1.2835) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][280/625] eta 0:01:10 lr 0.001879 wd 0.0500 time 0.2045 (0.2037) data time 0.0006 (0.0023) model time 0.2039 (0.2011) loss 2.7129 (3.4323) grad_norm 1.1411 (1.2753) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][290/625] eta 0:01:08 lr 0.001879 wd 0.0500 time 0.2003 (0.2036) data time 0.0007 (0.0023) model time 0.1996 (0.2011) loss 2.4926 (3.4263) grad_norm 1.4282 (1.2683) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][300/625] eta 0:01:06 lr 0.001879 wd 0.0500 time 0.1985 (0.2036) data time 0.0007 (0.0022) model time 0.1978 (0.2011) loss 3.3140 (3.4237) grad_norm 0.9991 (1.2693) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][310/625] eta 0:01:04 lr 0.001879 wd 0.0500 time 0.2060 (0.2035) data time 0.0007 (0.0022) model time 0.2053 (0.2011) loss 3.6800 (3.4271) grad_norm 0.9272 (1.2650) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][320/625] eta 0:01:02 lr 0.001879 wd 0.0500 time 0.2092 (0.2035) data time 0.0008 (0.0021) model time 0.2083 (0.2011) loss 2.5653 (3.4327) grad_norm 1.4336 (1.2632) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][330/625] eta 0:01:00 lr 0.001879 wd 0.0500 time 0.1967 (0.2035) data time 0.0007 (0.0021) model time 0.1960 (0.2011) loss 4.2693 (3.4371) grad_norm 0.9384 (1.2572) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][340/625] eta 0:00:57 lr 0.001879 wd 0.0500 time 0.2053 (0.2034) data time 0.0007 (0.0021) model time 0.2046 (0.2012) loss 4.2632 (3.4380) grad_norm 1.2280 (1.2633) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][350/625] eta 0:00:55 lr 0.001879 wd 0.0500 time 0.2010 (0.2035) data time 0.0007 (0.0020) model time 0.2003 (0.2012) loss 3.4582 (3.4423) grad_norm 1.1074 (1.2738) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][360/625] eta 0:00:53 lr 0.001879 wd 0.0500 time 0.2060 (0.2035) data time 0.0007 (0.0020) model time 0.2053 (0.2013) loss 3.3033 (3.4427) grad_norm 1.3335 (1.2740) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][370/625] eta 0:00:51 lr 0.001879 wd 0.0500 time 0.2063 (0.2037) data time 0.0009 (0.0020) model time 0.2054 (0.2016) loss 3.9671 (3.4407) grad_norm 1.2354 (1.2691) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][380/625] eta 0:00:49 lr 0.001879 wd 0.0500 time 0.2119 (0.2037) data time 0.0008 (0.0020) model time 0.2111 (0.2017) loss 2.3343 (3.4373) grad_norm 1.5264 (1.2650) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][390/625] eta 0:00:47 lr 0.001878 wd 0.0500 time 0.1990 (0.2037) data time 0.0008 (0.0019) model time 0.1982 (0.2016) loss 3.8740 (3.4246) grad_norm 1.9164 (1.2646) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:33:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][400/625] eta 0:00:45 lr 0.001878 wd 0.0500 time 0.2006 (0.2036) data time 0.0008 (0.0019) model time 0.1998 (0.2015) loss 2.7863 (3.4228) grad_norm 1.5257 (1.2673) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][410/625] eta 0:00:43 lr 0.001878 wd 0.0500 time 0.1969 (0.2035) data time 0.0008 (0.0019) model time 0.1961 (0.2015) loss 4.0344 (3.4170) grad_norm 1.1066 (1.2650) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][420/625] eta 0:00:41 lr 0.001878 wd 0.0500 time 0.1996 (0.2034) data time 0.0009 (0.0019) model time 0.1987 (0.2014) loss 2.6086 (3.4163) grad_norm 0.9719 (1.2594) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][430/625] eta 0:00:39 lr 0.001878 wd 0.0500 time 0.2010 (0.2033) data time 0.0006 (0.0018) model time 0.2004 (0.2014) loss 3.2955 (3.4135) grad_norm 1.1346 (1.2612) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][440/625] eta 0:00:37 lr 0.001878 wd 0.0500 time 0.2001 (0.2032) data time 0.0008 (0.0018) model time 0.1993 (0.2013) loss 3.0422 (3.4140) grad_norm 1.8307 (1.2712) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][450/625] eta 0:00:35 lr 0.001878 wd 0.0500 time 0.2024 (0.2032) data time 0.0007 (0.0018) model time 0.2017 (0.2012) loss 2.2458 (3.4107) grad_norm 0.9478 (1.2690) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][460/625] eta 0:00:33 lr 0.001878 wd 0.0500 time 0.1992 (0.2031) data time 0.0007 (0.0018) model time 0.1985 (0.2012) loss 2.4301 (3.4079) grad_norm 0.8535 (1.2632) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][470/625] eta 0:00:31 lr 0.001878 wd 0.0500 time 0.1991 (0.2030) data time 0.0006 (0.0018) model time 0.1985 (0.2011) loss 3.9751 (3.4097) grad_norm 1.7967 (1.2610) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][480/625] eta 0:00:29 lr 0.001878 wd 0.0500 time 0.1993 (0.2030) data time 0.0006 (0.0017) model time 0.1987 (0.2011) loss 3.8380 (3.4089) grad_norm 1.1769 (1.2665) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][490/625] eta 0:00:27 lr 0.001878 wd 0.0500 time 0.1998 (0.2029) data time 0.0007 (0.0017) model time 0.1990 (0.2011) loss 2.4193 (3.4058) grad_norm 0.9927 (1.2646) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][500/625] eta 0:00:25 lr 0.001878 wd 0.0500 time 0.1995 (0.2029) data time 0.0009 (0.0017) model time 0.1986 (0.2010) loss 4.1622 (3.4112) grad_norm 0.8116 (1.2630) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][510/625] eta 0:00:23 lr 0.001877 wd 0.0500 time 0.1992 (0.2028) data time 0.0008 (0.0017) model time 0.1984 (0.2010) loss 4.1648 (3.4169) grad_norm 0.9880 (1.2610) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][520/625] eta 0:00:21 lr 0.001877 wd 0.0500 time 0.1966 (0.2028) data time 0.0009 (0.0017) model time 0.1957 (0.2010) loss 2.1854 (3.4095) grad_norm 1.8017 (1.2629) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][530/625] eta 0:00:19 lr 0.001877 wd 0.0500 time 0.1981 (0.2028) data time 0.0009 (0.0017) model time 0.1972 (0.2010) loss 3.4713 (3.4029) grad_norm 1.4303 (1.2665) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][540/625] eta 0:00:17 lr 0.001877 wd 0.0500 time 0.1958 (0.2027) data time 0.0008 (0.0016) model time 0.1950 (0.2009) loss 3.3778 (3.4028) grad_norm 0.7595 (1.2659) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][550/625] eta 0:00:15 lr 0.001877 wd 0.0500 time 0.1947 (0.2027) data time 0.0008 (0.0016) model time 0.1939 (0.2009) loss 4.3148 (3.4080) grad_norm 1.2167 (1.2611) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][560/625] eta 0:00:13 lr 0.001877 wd 0.0500 time 0.2004 (0.2026) data time 0.0008 (0.0016) model time 0.1996 (0.2009) loss 4.2619 (3.4093) grad_norm 0.8878 (1.2626) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][570/625] eta 0:00:11 lr 0.001877 wd 0.0500 time 0.1998 (0.2026) data time 0.0009 (0.0016) model time 0.1989 (0.2009) loss 3.9827 (3.4122) grad_norm 0.8772 (1.2590) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][580/625] eta 0:00:09 lr 0.001877 wd 0.0500 time 0.1968 (0.2026) data time 0.0008 (0.0016) model time 0.1960 (0.2008) loss 3.5962 (3.4115) grad_norm 1.3325 (1.2597) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][590/625] eta 0:00:07 lr 0.001877 wd 0.0500 time 0.1952 (0.2025) data time 0.0007 (0.0016) model time 0.1945 (0.2008) loss 3.4604 (3.4153) grad_norm 1.1337 (1.2637) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][600/625] eta 0:00:05 lr 0.001877 wd 0.0500 time 0.1976 (0.2025) data time 0.0009 (0.0016) model time 0.1967 (0.2008) loss 3.6663 (3.4143) grad_norm 1.6278 (1.2650) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][610/625] eta 0:00:03 lr 0.001877 wd 0.0500 time 0.1993 (0.2024) data time 0.0004 (0.0016) model time 0.1989 (0.2007) loss 3.4600 (3.4085) grad_norm 1.1174 (1.2657) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][620/625] eta 0:00:01 lr 0.001877 wd 0.0500 time 0.1979 (0.2024) data time 0.0004 (0.0016) model time 0.1975 (0.2007) loss 3.5265 (3.4130) grad_norm 0.8667 (1.2670) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 64 training takes 0:02:06
[2024-07-29 21:34:43 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 21:34:43 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 21:34:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.483 (0.483) Loss 0.7256 (0.7256) Acc@1 85.742 (85.742) Acc@5 97.070 (97.070) Mem 8975MB
[2024-07-29 21:34:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 1.1982 (0.8986) Acc@1 72.852 (80.753) Acc@5 91.699 (95.810) Mem 8975MB
[2024-07-29 21:34:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.3418 (1.0771) Acc@1 69.678 (76.349) Acc@5 89.795 (93.517) Mem 8975MB
[2024-07-29 21:34:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.152 Acc@5 93.552
[2024-07-29 21:34:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 76.2%
[2024-07-29 21:34:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.798 (0.798) Loss 0.5728 (0.5728) Acc@1 85.986 (85.986) Acc@5 97.510 (97.510) Mem 8975MB
[2024-07-29 21:34:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.130) Loss 1.0244 (0.7536) Acc@1 74.268 (81.507) Acc@5 92.871 (95.978) Mem 8975MB
[2024-07-29 21:34:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.094) Loss 1.2168 (0.9229) Acc@1 69.336 (77.481) Acc@5 91.162 (93.908) Mem 8975MB
[2024-07-29 21:34:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.229 Acc@5 93.908
[2024-07-29 21:34:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 77.2%
[2024-07-29 21:34:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 77.23%
[2024-07-29 21:34:47 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 21:34:48 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 21:34:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][0/625] eta 0:06:14 lr 0.001876 wd 0.0500 time 0.5992 (0.5992) data time 0.4049 (0.4049) model time 0.0000 (0.0000) loss 3.1750 (3.1750) grad_norm 1.0587 (1.0587) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][10/625] eta 0:02:25 lr 0.001876 wd 0.0500 time 0.1999 (0.2360) data time 0.0009 (0.0377) model time 0.0000 (0.0000) loss 3.6363 (3.4319) grad_norm 0.8852 (1.0517) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][20/625] eta 0:02:13 lr 0.001876 wd 0.0500 time 0.1998 (0.2209) data time 0.0009 (0.0202) model time 0.0000 (0.0000) loss 2.6663 (3.4249) grad_norm 0.8410 (1.1543) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][30/625] eta 0:02:07 lr 0.001876 wd 0.0500 time 0.2005 (0.2148) data time 0.0008 (0.0140) model time 0.0000 (0.0000) loss 3.6846 (3.3464) grad_norm 0.9483 (1.1430) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][40/625] eta 0:02:03 lr 0.001876 wd 0.0500 time 0.1982 (0.2109) data time 0.0007 (0.0108) model time 0.0000 (0.0000) loss 3.7800 (3.3628) grad_norm 2.2752 (1.1771) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:34:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][50/625] eta 0:02:02 lr 0.001876 wd 0.0500 time 0.1981 (0.2128) data time 0.0007 (0.0089) model time 0.0000 (0.0000) loss 4.0362 (3.3712) grad_norm 1.0922 (1.1714) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:35:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][60/625] eta 0:01:59 lr 0.001876 wd 0.0500 time 0.1985 (0.2106) data time 0.0007 (0.0076) model time 0.1978 (0.1986) loss 3.1653 (3.3531) grad_norm 1.6920 (1.1764) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:35:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][70/625] eta 0:01:56 lr 0.001876 wd 0.0500 time 0.2019 (0.2092) data time 0.0008 (0.0067) model time 0.2011 (0.1990) loss 3.7946 (3.3785) grad_norm 1.2059 (1.1791) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:35:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][80/625] eta 0:01:53 lr 0.001876 wd 0.0500 time 0.1968 (0.2082) data time 0.0008 (0.0060) model time 0.1959 (0.1994) loss 3.6498 (3.3806) grad_norm 1.1969 (1.1780) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:35:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][90/625] eta 0:01:50 lr 0.001876 wd 0.0500 time 0.1984 (0.2073) data time 0.0010 (0.0054) model time 0.1974 (0.1993) loss 3.0231 (3.3800) grad_norm 1.0291 (1.2061) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:35:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 21:35:08 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 21:35:09 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 21:36:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 21:36:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 21:37:06 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 21:37:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 21:37:18 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 21:37:18 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 21:37:18 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 21:37:18 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 65)
[2024-07-29 21:37:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 21:37:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][100/625] eta 0:35:48 lr 0.001876 wd 0.0500 time 0.8063 (4.0928) data time 0.0009 (0.4186) model time 0.8054 (3.6742) loss 4.2387 (4.2643) grad_norm 1.0046 (1.4934) loss_scale 8192.0000 (8192.0000) mem 8973MB
[2024-07-29 21:37:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][110/625] eta 0:07:21 lr 0.001876 wd 0.0500 time 0.2103 (0.8579) data time 0.0007 (0.0708) model time 0.2096 (0.7872) loss 2.7444 (3.7225) grad_norm 1.3713 (1.3421) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:37:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][120/625] eta 0:04:45 lr 0.001875 wd 0.0500 time 0.2056 (0.5650) data time 0.0010 (0.0391) model time 0.2046 (0.5259) loss 3.3489 (3.6856) grad_norm 1.4699 (1.2655) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:37:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][130/625] eta 0:03:45 lr 0.001875 wd 0.0500 time 0.2044 (0.4546) data time 0.0009 (0.0272) model time 0.2036 (0.4274) loss 3.6805 (3.6634) grad_norm 1.6747 (1.2427) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:37:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][140/625] eta 0:03:12 lr 0.001875 wd 0.0500 time 0.2095 (0.3964) data time 0.0011 (0.0209) model time 0.2084 (0.3754) loss 3.3481 (3.5925) grad_norm 1.8071 (1.2621) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:37:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][150/625] eta 0:02:51 lr 0.001875 wd 0.0500 time 0.2098 (0.3608) data time 0.0009 (0.0171) model time 0.2089 (0.3437) loss 3.3524 (3.5637) grad_norm 1.8207 (1.2633) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:37:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][160/625] eta 0:02:36 lr 0.001875 wd 0.0500 time 0.2148 (0.3369) data time 0.0007 (0.0145) model time 0.2141 (0.3224) loss 4.1128 (3.5501) grad_norm 1.2778 (1.2631) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 21:37:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][170/625] eta 0:02:25 lr 0.001875 wd 0.0500 time 0.2139 (0.3199) data time 0.0010 (0.0127) model time 0.2128 (0.3073) loss 3.3354 (3.5018) grad_norm 1.5459 (1.2932) loss_scale 16384.0000 (9216.0000) mem 8975MB
[2024-07-29 21:37:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][180/625] eta 0:02:16 lr 0.001875 wd 0.0500 time 0.2073 (0.3070) data time 0.0010 (0.0112) model time 0.2062 (0.2958) loss 3.2568 (3.4790) grad_norm 1.4463 (1.3052) loss_scale 16384.0000 (10090.1463) mem 8975MB
[2024-07-29 21:37:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][190/625] eta 0:02:09 lr 0.001875 wd 0.0500 time 0.2058 (0.2966) data time 0.0009 (0.0101) model time 0.2049 (0.2864) loss 2.4104 (3.4613) grad_norm 1.4282 (1.3202) loss_scale 16384.0000 (10774.2609) mem 8975MB
[2024-07-29 21:37:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][200/625] eta 0:02:02 lr 0.001875 wd 0.0500 time 0.2118 (0.2881) data time 0.0009 (0.0092) model time 0.2109 (0.2788) loss 4.4878 (3.4999) grad_norm 0.8917 (1.3020) loss_scale 16384.0000 (11324.2353) mem 8975MB
[2024-07-29 21:37:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][210/625] eta 0:01:56 lr 0.001875 wd 0.0500 time 0.2066 (0.2813) data time 0.0011 (0.0085) model time 0.2054 (0.2728) loss 3.6496 (3.5048) grad_norm 1.4153 (1.2954) loss_scale 16384.0000 (11776.0000) mem 8975MB
[2024-07-29 21:37:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][220/625] eta 0:01:51 lr 0.001875 wd 0.0500 time 0.2108 (0.2754) data time 0.0008 (0.0079) model time 0.2100 (0.2675) loss 3.9341 (3.5001) grad_norm 1.2936 (1.2903) loss_scale 16384.0000 (12153.7049) mem 8975MB
[2024-07-29 21:37:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 21:37:58 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 21:37:59 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 21:39:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 21:40:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 21:40:10 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 21:40:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 21:40:29 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-29 21:40:29 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 21:40:29 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 21:40:29 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 65) [2024-07-29 21:40:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 21:43:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 21:43:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 21:43:10 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-07-29 21:43:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 21:43:32 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-29 21:43:32 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 21:43:32 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 21:43:33 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 65) [2024-07-29 21:43:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 21:43:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][230/625] eta 0:15:13 lr 0.001874 wd 0.0500 time 0.2097 (2.3137) data time 0.0007 (0.1719) model time 0.2089 (2.1418) loss 4.3822 (3.8591) grad_norm 1.2232 (1.5428) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:43:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][240/625] eta 0:05:12 lr 0.001874 wd 0.0500 time 0.2036 (0.8115) data time 0.0007 (0.0499) model time 0.2029 (0.7617) loss 4.0664 (3.6330) grad_norm 0.8753 (1.1783) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:43:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][250/625] eta 0:03:31 lr 0.001874 wd 0.0500 time 0.2115 (0.5629) data time 0.0010 (0.0295) model time 0.2105 (0.5334) loss 3.4266 (3.6828) grad_norm 1.1288 (1.1628) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:43:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][260/625] eta 0:02:48 lr 0.001874 wd 0.0500 time 0.2115 (0.4603) data time 0.0007 (0.0211) model time 0.2108 (0.4392) loss 2.6995 (3.6366) 
grad_norm 2.2548 (1.1867) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:43:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][270/625] eta 0:02:23 lr 0.001874 wd 0.0500 time 0.2051 (0.4034) data time 0.0008 (0.0166) model time 0.2043 (0.3868) loss 3.2078 (3.5909) grad_norm 1.2014 (1.1803) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:43:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][280/625] eta 0:02:06 lr 0.001874 wd 0.0500 time 0.2086 (0.3676) data time 0.0008 (0.0137) model time 0.2079 (0.3540) loss 3.8406 (3.5708) grad_norm 0.9740 (1.2012) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:43:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][290/625] eta 0:01:55 lr 0.001874 wd 0.0500 time 0.2121 (0.3433) data time 0.0008 (0.0117) model time 0.2114 (0.3316) loss 3.9998 (3.5365) grad_norm 1.3701 (1.2462) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][300/625] eta 0:01:45 lr 0.001874 wd 0.0500 time 0.2052 (0.3258) data time 0.0007 (0.0102) model time 0.2045 (0.3155) loss 3.4720 (3.5018) grad_norm 0.8652 (1.2469) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][310/625] eta 0:01:38 lr 0.001874 wd 0.0500 time 0.2131 (0.3123) data time 0.0010 (0.0093) model time 0.2120 (0.3030) loss 3.8204 (3.4613) grad_norm 1.0910 (1.2342) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][320/625] eta 0:01:32 lr 0.001874 wd 0.0500 time 0.2080 (0.3018) data time 0.0009 (0.0084) model time 0.2071 (0.2935) loss 3.5283 (3.4549) grad_norm 1.8642 (1.2708) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[65/300][330/625] eta 0:01:26 lr 0.001874 wd 0.0500 time 0.2132 (0.2935) data time 0.0009 (0.0077) model time 0.2124 (0.2858) loss 3.5742 (3.4874) grad_norm 0.9479 (1.2702) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][340/625] eta 0:01:21 lr 0.001874 wd 0.0500 time 0.2116 (0.2864) data time 0.0008 (0.0071) model time 0.2107 (0.2793) loss 3.4999 (3.4849) grad_norm 1.7941 (1.2768) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][350/625] eta 0:01:17 lr 0.001873 wd 0.0500 time 0.2073 (0.2804) data time 0.0010 (0.0066) model time 0.2063 (0.2738) loss 3.2672 (3.4869) grad_norm 1.0244 (1.2858) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][360/625] eta 0:01:13 lr 0.001873 wd 0.0500 time 0.2133 (0.2758) data time 0.0009 (0.0062) model time 0.2124 (0.2696) loss 3.9252 (3.4885) grad_norm 0.8162 (1.2714) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][370/625] eta 0:01:09 lr 0.001873 wd 0.0500 time 0.2122 (0.2715) data time 0.0008 (0.0059) model time 0.2114 (0.2656) loss 2.9802 (3.4684) grad_norm 0.9544 (1.2542) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][380/625] eta 0:01:05 lr 0.001873 wd 0.0500 time 0.2123 (0.2676) data time 0.0009 (0.0056) model time 0.2114 (0.2621) loss 3.4918 (3.4607) grad_norm 0.9582 (1.2493) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][390/625] eta 0:01:02 lr 0.001873 wd 0.0500 time 0.2163 (0.2645) data time 0.0009 (0.0053) model time 0.2155 (0.2592) loss 3.2294 (3.4589) grad_norm 1.2193 
(1.2718) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][400/625] eta 0:00:58 lr 0.001873 wd 0.0500 time 0.2138 (0.2616) data time 0.0010 (0.0051) model time 0.2128 (0.2565) loss 1.9886 (3.4493) grad_norm 1.7633 (1.3024) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][410/625] eta 0:00:55 lr 0.001873 wd 0.0500 time 0.2146 (0.2590) data time 0.0008 (0.0048) model time 0.2138 (0.2541) loss 2.7798 (3.4483) grad_norm 1.7194 (1.3175) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][420/625] eta 0:00:52 lr 0.001873 wd 0.0500 time 0.2138 (0.2566) data time 0.0007 (0.0046) model time 0.2131 (0.2520) loss 3.3778 (3.4493) grad_norm 1.3240 (1.3134) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][430/625] eta 0:00:49 lr 0.001873 wd 0.0500 time 0.2120 (0.2547) data time 0.0011 (0.0045) model time 0.2109 (0.2502) loss 3.7456 (3.4355) grad_norm 1.5910 (1.3150) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][440/625] eta 0:00:46 lr 0.001873 wd 0.0500 time 0.2071 (0.2527) data time 0.0007 (0.0043) model time 0.2064 (0.2484) loss 3.5531 (3.4258) grad_norm 1.0271 (1.3080) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][450/625] eta 0:00:43 lr 0.001873 wd 0.0500 time 0.2136 (0.2508) data time 0.0009 (0.0041) model time 0.2127 (0.2467) loss 4.1026 (3.4241) grad_norm 1.2628 (1.3006) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][460/625] eta 
0:00:41 lr 0.001872 wd 0.0500 time 0.2131 (0.2495) data time 0.0007 (0.0043) model time 0.2124 (0.2452) loss 2.9068 (3.4178) grad_norm 0.7586 (1.2907) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][470/625] eta 0:00:38 lr 0.001872 wd 0.0500 time 0.2120 (0.2480) data time 0.0009 (0.0041) model time 0.2111 (0.2439) loss 2.6004 (3.4213) grad_norm 1.3242 (1.2848) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][480/625] eta 0:00:35 lr 0.001872 wd 0.0500 time 0.2175 (0.2467) data time 0.0007 (0.0040) model time 0.2168 (0.2427) loss 2.2517 (3.4121) grad_norm 0.9699 (1.2758) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][490/625] eta 0:00:33 lr 0.001872 wd 0.0500 time 0.2161 (0.2454) data time 0.0007 (0.0039) model time 0.2154 (0.2415) loss 3.9283 (3.4044) grad_norm 1.0729 (1.2694) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][500/625] eta 0:00:30 lr 0.001872 wd 0.0500 time 0.2106 (0.2443) data time 0.0010 (0.0038) model time 0.2097 (0.2405) loss 3.2414 (3.4013) grad_norm 1.7072 (1.2681) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][510/625] eta 0:00:27 lr 0.001872 wd 0.0500 time 0.2081 (0.2433) data time 0.0008 (0.0037) model time 0.2073 (0.2396) loss 2.5869 (3.4000) grad_norm 1.8254 (1.2728) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][520/625] eta 0:00:25 lr 0.001872 wd 0.0500 time 0.2130 (0.2423) data time 0.0007 (0.0036) model time 0.2123 (0.2387) loss 3.2384 (3.3989) grad_norm 1.0120 (1.2697) loss_scale 
8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][530/625] eta 0:00:22 lr 0.001872 wd 0.0500 time 0.2122 (0.2413) data time 0.0010 (0.0035) model time 0.2112 (0.2378) loss 3.5764 (3.3924) grad_norm 0.9460 (1.2666) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][540/625] eta 0:00:20 lr 0.001872 wd 0.0500 time 0.2106 (0.2405) data time 0.0009 (0.0034) model time 0.2097 (0.2370) loss 3.9924 (3.3928) grad_norm 1.3695 (1.2696) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][550/625] eta 0:00:17 lr 0.001872 wd 0.0500 time 0.2116 (0.2396) data time 0.0008 (0.0034) model time 0.2108 (0.2363) loss 4.1210 (3.4073) grad_norm 1.3165 (1.2767) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][560/625] eta 0:00:15 lr 0.001872 wd 0.0500 time 0.2147 (0.2389) data time 0.0007 (0.0033) model time 0.2139 (0.2356) loss 3.4920 (3.4046) grad_norm 0.8643 (1.2684) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:44:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][570/625] eta 0:00:13 lr 0.001872 wd 0.0500 time 0.2092 (0.2382) data time 0.0009 (0.0032) model time 0.2084 (0.2350) loss 3.7470 (3.4098) grad_norm 1.2519 (1.2625) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:45:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][580/625] eta 0:00:10 lr 0.001871 wd 0.0500 time 0.2097 (0.2376) data time 0.0008 (0.0032) model time 0.2089 (0.2344) loss 3.8740 (3.4130) grad_norm 1.6041 (1.2600) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:45:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][590/625] eta 0:00:08 lr 0.001871 
wd 0.0500 time 0.2099 (0.2371) data time 0.0007 (0.0032) model time 0.2092 (0.2339) loss 2.5967 (3.4100) grad_norm 1.1904 (1.2599) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:45:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][600/625] eta 0:00:05 lr 0.001871 wd 0.0500 time 0.2185 (0.2365) data time 0.0009 (0.0031) model time 0.2176 (0.2334) loss 3.8171 (3.4104) grad_norm 0.9115 (1.2550) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:45:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][610/625] eta 0:00:03 lr 0.001871 wd 0.0500 time 0.2098 (0.2365) data time 0.0004 (0.0031) model time 0.2094 (0.2335) loss 2.5242 (3.4030) grad_norm 0.8898 (1.2482) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:45:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][620/625] eta 0:00:01 lr 0.001871 wd 0.0500 time 0.2098 (0.2365) data time 0.0006 (0.0030) model time 0.2092 (0.2335) loss 3.5074 (3.4013) grad_norm 1.2756 (1.2473) loss_scale 8192.0000 (8192.0000) mem 8978MB [2024-07-29 21:45:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 65 training takes 0:01:34 [2024-07-29 21:45:11 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 21:45:12 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 21:45:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.419 (0.419) Loss 0.7837 (0.7837) Acc@1 84.570 (84.570) Acc@5 97.412 (97.412) Mem 8978MB [2024-07-29 21:45:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 1.2754 (0.9596) Acc@1 72.314 (80.682) Acc@5 91.211 (95.894) Mem 8978MB [2024-07-29 21:45:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.4170 (1.1327) Acc@1 68.652 (76.367) Acc@5 90.576 (93.620) Mem 8978MB [2024-07-29 21:45:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.106 Acc@5 93.626 [2024-07-29 21:45:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 76.1% [2024-07-29 21:45:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.783 (0.783) Loss 0.5698 (0.5698) Acc@1 86.182 (86.182) Acc@5 97.559 (97.559) Mem 8978MB [2024-07-29 21:45:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.132) Loss 1.0176 (0.7487) Acc@1 74.561 (81.685) Acc@5 92.822 (96.076) Mem 8978MB [2024-07-29 21:45:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.095) Loss 1.2031 (0.9161) Acc@1 69.434 (77.693) Acc@5 91.260 (94.017) Mem 8978MB [2024-07-29 21:45:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.439 Acc@5 94.012 [2024-07-29 21:45:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 77.4% [2024-07-29 21:45:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 77.44% [2024-07-29 21:45:17 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-07-29 21:45:19 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-07-29 21:45:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][0/625] eta 0:06:35 lr 0.001871 wd 0.0500 time 0.6335 (0.6335) data time 0.3505 (0.3505) model time 0.0000 (0.0000) loss 3.7606 (3.7606) grad_norm 1.2024 (1.2024) loss_scale 8192.0000 (8192.0000) mem 8971MB [2024-07-29 21:45:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][10/625] eta 0:02:34 lr 0.001871 wd 0.0500 time 0.2171 (0.2510) data time 0.0009 (0.0328) model time 0.0000 (0.0000) loss 3.2516 (3.6319) grad_norm 1.0590 (1.2069) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:45:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][20/625] eta 0:02:34 lr 0.001871 wd 0.0500 time 0.4258 (0.2556) data time 0.0008 (0.0177) model time 0.0000 (0.0000) loss 2.1546 (3.4659) grad_norm 1.5527 (1.2700) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:45:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][30/625] eta 0:02:24 lr 0.001871 wd 0.0500 time 0.2123 (0.2424) data time 0.0009 (0.0123) model time 0.0000 (0.0000) loss 3.9389 (3.5597) grad_norm 1.0487 (1.2169) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:45:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][40/625] eta 0:02:18 lr 0.001871 wd 0.0500 time 0.2211 (0.2364) data time 0.0008 (0.0095) model time 0.0000 (0.0000) loss 3.5873 (3.5685) grad_norm 0.9455 (1.1934) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:45:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][50/625] eta 0:02:15 lr 0.001871 wd 0.0500 time 0.2200 (0.2353) data time 0.0008 (0.0078) model time 0.0000 (0.0000) loss 3.4415 (3.5197) grad_norm 1.6933 (1.1943) loss_scale 8192.0000 (8192.0000) mem 8975MB 
[2024-07-29 21:45:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][60/625] eta 0:02:11 lr 0.001871 wd 0.0500 time 0.2175 (0.2332) data time 0.0008 (0.0075) model time 0.2166 (0.2167) loss 2.5644 (3.4459) grad_norm 1.5549 (1.2079) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:45:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][70/625] eta 0:02:07 lr 0.001870 wd 0.0500 time 0.2051 (0.2302) data time 0.0007 (0.0066) model time 0.2043 (0.2138) loss 2.2341 (3.3963) grad_norm 0.9855 (1.2075) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:45:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][80/625] eta 0:02:04 lr 0.001870 wd 0.0500 time 0.2103 (0.2283) data time 0.0009 (0.0059) model time 0.2094 (0.2139) loss 4.0287 (3.3973) grad_norm 1.2736 (1.2257) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:45:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][90/625] eta 0:02:01 lr 0.001870 wd 0.0500 time 0.2083 (0.2268) data time 0.0010 (0.0053) model time 0.2074 (0.2137) loss 3.5609 (3.4298) grad_norm 1.2011 (1.2303) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:45:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][100/625] eta 0:01:58 lr 0.001870 wd 0.0500 time 0.2154 (0.2255) data time 0.0006 (0.0049) model time 0.2148 (0.2135) loss 3.4157 (3.4308) grad_norm 1.5220 (1.2375) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:45:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][110/625] eta 0:01:55 lr 0.001870 wd 0.0500 time 0.2153 (0.2246) data time 0.0008 (0.0046) model time 0.2146 (0.2136) loss 3.7598 (3.4368) grad_norm 1.0061 (1.2311) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:45:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][120/625] eta 0:01:53 lr 0.001870 wd 0.0500 time 0.2221 (0.2240) data 
time 0.0008 (0.0043) model time 0.2212 (0.2140) loss 2.4268 (3.4421) grad_norm 1.5769 (1.2230) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:45:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][130/625] eta 0:01:50 lr 0.001870 wd 0.0500 time 0.2188 (0.2233) data time 0.0006 (0.0040) model time 0.2181 (0.2139) loss 3.8111 (3.4115) grad_norm 1.3874 (1.2319) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:45:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][140/625] eta 0:01:48 lr 0.001870 wd 0.0500 time 0.2171 (0.2228) data time 0.0010 (0.0038) model time 0.2161 (0.2142) loss 3.7766 (3.4135) grad_norm 1.0331 (1.2400) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:45:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][150/625] eta 0:01:45 lr 0.001870 wd 0.0500 time 0.2174 (0.2221) data time 0.0009 (0.0037) model time 0.2166 (0.2137) loss 4.2764 (3.4200) grad_norm 1.1277 (1.2338) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:45:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][160/625] eta 0:01:42 lr 0.001870 wd 0.0500 time 0.2109 (0.2214) data time 0.0008 (0.0035) model time 0.2101 (0.2134) loss 4.2778 (3.4296) grad_norm 0.8398 (1.2347) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:45:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][170/625] eta 0:01:41 lr 0.001870 wd 0.0500 time 0.2126 (0.2226) data time 0.0008 (0.0034) model time 0.2117 (0.2157) loss 3.4938 (3.4473) grad_norm 1.2686 (1.2248) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:46:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][180/625] eta 0:01:38 lr 0.001869 wd 0.0500 time 0.2165 (0.2222) data time 0.0008 (0.0033) model time 0.2157 (0.2155) loss 4.1478 (3.4543) grad_norm 1.1834 (1.2340) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:46:02 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][190/625] eta 0:01:36 lr 0.001869 wd 0.0500 time 0.2159 (0.2218) data time 0.0010 (0.0032) model time 0.2150 (0.2154) loss 2.5434 (3.4521) grad_norm 1.4130 (1.2435) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:46:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][200/625] eta 0:01:34 lr 0.001869 wd 0.0500 time 0.2087 (0.2213) data time 0.0008 (0.0030) model time 0.2079 (0.2151) loss 3.3812 (3.4538) grad_norm 1.4117 (1.2393) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:46:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][210/625] eta 0:01:31 lr 0.001869 wd 0.0500 time 0.2093 (0.2209) data time 0.0008 (0.0030) model time 0.2086 (0.2149) loss 3.8692 (3.4485) grad_norm 1.2372 (1.2373) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:46:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][220/625] eta 0:01:29 lr 0.001869 wd 0.0500 time 0.2119 (0.2206) data time 0.0008 (0.0029) model time 0.2111 (0.2148) loss 3.5784 (3.4475) grad_norm 1.4771 (1.2358) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:46:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][230/625] eta 0:01:27 lr 0.001869 wd 0.0500 time 0.2103 (0.2203) data time 0.0009 (0.0028) model time 0.2094 (0.2146) loss 2.6859 (3.4460) grad_norm 0.8368 (1.2348) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:46:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][240/625] eta 0:01:24 lr 0.001869 wd 0.0500 time 0.2112 (0.2200) data time 0.0007 (0.0027) model time 0.2105 (0.2146) loss 3.8519 (3.4518) grad_norm 1.1687 (1.2247) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:46:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][250/625] eta 0:01:22 lr 0.001869 wd 0.0500 time 0.2091 (0.2198) data time 0.0008 
(0.0026) model time 0.2083 (0.2145) loss 2.9663 (3.4385) grad_norm 1.0716 (1.2205) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:46:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][260/625] eta 0:01:20 lr 0.001869 wd 0.0500 time 0.2193 (0.2197) data time 0.0008 (0.0026) model time 0.2184 (0.2146) loss 3.3549 (3.4304) grad_norm 1.2699 (1.2218) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:46:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][270/625] eta 0:01:17 lr 0.001869 wd 0.0500 time 0.2098 (0.2194) data time 0.0008 (0.0025) model time 0.2091 (0.2145) loss 4.0904 (3.4316) grad_norm 1.2640 (1.2279) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:46:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][280/625] eta 0:01:15 lr 0.001869 wd 0.0500 time 0.2104 (0.2192) data time 0.0007 (0.0025) model time 0.2097 (0.2144) loss 3.1345 (3.4378) grad_norm 0.9295 (1.2313) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:46:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][290/625] eta 0:01:13 lr 0.001868 wd 0.0500 time 0.2126 (0.2190) data time 0.0009 (0.0024) model time 0.2117 (0.2143) loss 2.5270 (3.4352) grad_norm 1.2564 (1.2317) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:46:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][300/625] eta 0:01:11 lr 0.001868 wd 0.0500 time 0.2103 (0.2189) data time 0.0009 (0.0024) model time 0.2094 (0.2143) loss 3.2517 (3.4295) grad_norm 1.0466 (1.2373) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:46:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][310/625] eta 0:01:08 lr 0.001868 wd 0.0500 time 0.2192 (0.2187) data time 0.0009 (0.0023) model time 0.2184 (0.2142) loss 2.9294 (3.4282) grad_norm 1.0254 (1.2338) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:46:30 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][320/625] eta 0:01:06 lr 0.001868 wd 0.0500 time 0.2180 (0.2186) data time 0.0007 (0.0023) model time 0.2172 (0.2142) loss 3.7670 (3.4215) grad_norm 1.6242 (1.2370) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 21:46:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 21:46:32 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 21:46:32 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 22:02:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 22:02:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 22:02:40 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 22:03:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 22:03:02 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... 
[2024-07-29 22:03:02 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 22:03:03 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 22:03:03 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 66)
[2024-07-29 22:03:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 22:03:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][330/625] eta 0:46:23 lr 0.001868 wd 0.0500 time 9.4369 (9.4369) data time 0.9253 (0.9253) model time 8.5116 (8.5116) loss 3.9532 (3.9532) grad_norm 1.6347 (1.6347) loss_scale 8192.0000 (8192.0000) mem 10976MB
[2024-07-29 22:03:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][340/625] eta 0:05:07 lr 0.001868 wd 0.0500 time 0.1946 (1.0787) data time 0.0009 (0.0849) model time 0.1936 (0.9938) loss 2.9754 (3.6485) grad_norm 1.8302 (1.4216) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:03:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][350/625] eta 0:03:01 lr 0.001868 wd 0.0500 time 0.2026 (0.6601) data time 0.0009 (0.0449) model time 0.2018 (0.6152) loss 3.5947 (3.5759) grad_norm 1.3854 (1.3409) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:03:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][360/625] eta 0:02:15 lr 0.001868 wd 0.0500 time 0.2016 (0.5117) data time 0.0006 (0.0307) model time 0.2010 (0.4810) loss 2.8184 (3.6343) grad_norm 1.0083 (1.2606) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:03:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][370/625] eta 0:01:51 lr 0.001868 wd 0.0500 time 0.2016 (0.4370) data time 0.0009 (0.0234) model time 0.2007 (0.4136) loss 3.5891 (3.6052) grad_norm 0.9616 (1.2629) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:03:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][380/625] eta 0:01:35 lr 0.001868 wd 0.0500 time 0.1987 (0.3908) data time 0.0007 (0.0190) model time 0.1980 (0.3717) loss 4.0681 (3.6147) grad_norm 0.8343 (1.2224) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:03:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][390/625] eta 0:01:24 lr 0.001868 wd 0.0500 time 0.1984 (0.3595) data time 0.0009 (0.0160) model time 0.1975 (0.3434) loss 2.9094 (3.5570) grad_norm 1.0705 (1.2389) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:03:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][400/625] eta 0:01:15 lr 0.001868 wd 0.0500 time 0.2017 (0.3372) data time 0.0008 (0.0139) model time 0.2010 (0.3233) loss 3.2234 (3.5241) grad_norm 0.9585 (1.3050) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:03:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][410/625] eta 0:01:08 lr 0.001867 wd 0.0500 time 0.2020 (0.3205) data time 0.0008 (0.0123) model time 0.2012 (0.3082) loss 2.8947 (3.5037) grad_norm 1.1509 (1.3179) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:03:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][420/625] eta 0:01:02 lr 0.001867 wd 0.0500 time 0.1995 (0.3072) data time 0.0007 (0.0110) model time 0.1988 (0.2961) loss 3.4555 (3.4933) grad_norm 1.0764 (1.3183) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:03:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][430/625] eta 0:00:57 lr 0.001867 wd 0.0500 time 0.1995 (0.2965) data time 0.0006 (0.0100) model time 0.1988 (0.2865) loss 3.6794 (3.5043) grad_norm 2.2898 (1.3040) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:03:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][440/625] eta 0:00:53 lr 0.001867 wd 0.0500 time 0.1966 (0.2883) data time 0.0008 (0.0092) model time 0.1958 (0.2790) loss 3.5817 (3.5079) grad_norm 1.0835 (1.2844) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:03:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][450/625] eta 0:00:49 lr 0.001867 wd 0.0500 time 0.2001 (0.2813) data time 0.0006 (0.0085) model time 0.1995 (0.2728) loss 2.8746 (3.5080) grad_norm 1.1501 (1.2895) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:03:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][460/625] eta 0:00:45 lr 0.001867 wd 0.0500 time 0.2040 (0.2752) data time 0.0009 (0.0079) model time 0.2031 (0.2672) loss 3.5014 (3.4927) grad_norm 1.2127 (1.2996) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:03:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][470/625] eta 0:00:41 lr 0.001867 wd 0.0500 time 0.1992 (0.2698) data time 0.0008 (0.0074) model time 0.1984 (0.2624) loss 3.7605 (3.4870) grad_norm 1.0812 (1.3131) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:03:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][480/625] eta 0:00:38 lr 0.001867 wd 0.0500 time 0.1985 (0.2652) data time 0.0008 (0.0070) model time 0.1977 (0.2582) loss 2.9824 (3.4815) grad_norm 1.1565 (1.3034) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:03:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][490/625] eta 0:00:35 lr 0.001867 wd 0.0500 time 0.2008 (0.2611) data time 0.0009 (0.0066) model time 0.1999 (0.2545) loss 3.9362 (3.4885) grad_norm 0.8054 (1.3100) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:03:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][500/625] eta 0:00:32 lr 0.001867 wd 0.0500 time 0.2116 (0.2576) data time 0.0008 (0.0063) model time 0.2108 (0.2513) loss 3.7257 (3.4853) grad_norm 1.6422 (1.3114) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:03:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][510/625] eta 0:00:29 lr 0.001867 wd 0.0500 time 0.2054 (0.2545) data time 0.0008 (0.0060) model time 0.2046 (0.2485) loss 3.9645 (3.4709) grad_norm 1.1606 (1.3047) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:03:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][520/625] eta 0:00:26 lr 0.001866 wd 0.0500 time 0.2013 (0.2517) data time 0.0008 (0.0057) model time 0.2005 (0.2460) loss 3.1348 (3.4660) grad_norm 1.4880 (1.3032) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:03:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][530/625] eta 0:00:23 lr 0.001866 wd 0.0500 time 0.2016 (0.2492) data time 0.0009 (0.0055) model time 0.2007 (0.2437) loss 3.0449 (3.4486) grad_norm 1.7176 (1.2995) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:03:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][540/625] eta 0:00:20 lr 0.001866 wd 0.0500 time 0.2005 (0.2468) data time 0.0008 (0.0053) model time 0.1998 (0.2415) loss 3.7062 (3.4461) grad_norm 1.1467 (1.3038) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:04:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][550/625] eta 0:00:18 lr 0.001866 wd 0.0500 time 0.2056 (0.2447) data time 0.0006 (0.0051) model time 0.2049 (0.2396) loss 3.2412 (3.4387) grad_norm 0.9456 (1.2877) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:04:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][560/625] eta 0:00:15 lr 0.001866 wd 0.0500 time 0.1945 (0.2427) data time 0.0006 (0.0049) model time 0.1939 (0.2378) loss 2.5301 (3.4389) grad_norm 0.8363 (1.2811) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:04:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][570/625] eta 0:00:13 lr 0.001866 wd 0.0500 time 0.2030 (0.2410) data time 0.0006 (0.0047) model time 0.2024 (0.2362) loss 3.6202 (3.4406) grad_norm 0.9338 (1.2748) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:04:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][580/625] eta 0:00:10 lr 0.001866 wd 0.0500 time 0.2001 (0.2394) data time 0.0007 (0.0046) model time 0.1994 (0.2348) loss 3.5966 (3.4302) grad_norm 1.5516 (1.2718) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:04:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][590/625] eta 0:00:08 lr 0.001866 wd 0.0500 time 0.2120 (0.2379) data time 0.0006 (0.0044) model time 0.2114 (0.2335) loss 3.1538 (3.4197) grad_norm 1.3175 (1.2766) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:04:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][600/625] eta 0:00:05 lr 0.001866 wd 0.0500 time 0.1981 (0.2366) data time 0.0006 (0.0043) model time 0.1974 (0.2322) loss 4.2960 (3.4173) grad_norm 1.0621 (1.2729) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:04:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][610/625] eta 0:00:03 lr 0.001866 wd 0.0500 time 0.1961 (0.2353) data time 0.0006 (0.0042) model time 0.1954 (0.2311) loss 3.4091 (3.4233) grad_norm 1.3800 (1.2744) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:04:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][620/625] eta 0:00:01 lr 0.001866 wd 0.0500 time 0.1993 (0.2341) data time 0.0004 (0.0041) model time 0.1989 (0.2300) loss 2.3652 (3.4175) grad_norm 0.9217 (1.2709) loss_scale 8192.0000 (8192.0000) mem 8977MB
[2024-07-29 22:04:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 66 training takes 0:01:08
[2024-07-29 22:04:15 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 22:04:17 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 22:04:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.403 (0.403) Loss 0.7959 (0.7959) Acc@1 84.570 (84.570) Acc@5 97.266 (97.266) Mem 8977MB
[2024-07-29 22:04:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.088) Loss 1.2666 (0.9894) Acc@1 73.389 (80.740) Acc@5 92.920 (95.983) Mem 8977MB
[2024-07-29 22:04:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.073) Loss 1.4961 (1.1612) Acc@1 68.066 (76.600) Acc@5 89.990 (93.787) Mem 8977MB
[2024-07-29 22:04:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.352 Acc@5 93.732
[2024-07-29 22:04:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 76.4%
[2024-07-29 22:04:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.762 (0.762) Loss 0.5669 (0.5669) Acc@1 86.328 (86.328) Acc@5 97.559 (97.559) Mem 8977MB
[2024-07-29 22:04:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.128) Loss 1.0088 (0.7440) Acc@1 74.658 (81.836) Acc@5 92.920 (96.160) Mem 8977MB
[2024-07-29 22:04:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.093) Loss 1.1953 (0.9097) Acc@1 69.678 (77.913) Acc@5 91.455 (94.152) Mem 8977MB
[2024-07-29 22:04:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.685 Acc@5 94.132
[2024-07-29 22:04:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 77.7%
[2024-07-29 22:04:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 77.69%
[2024-07-29 22:04:23 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 22:04:23 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 22:04:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][0/625] eta 0:10:48 lr 0.001866 wd 0.0500 time 1.0378 (1.0378) data time 0.5154 (0.5154) model time 0.0000 (0.0000) loss 2.9192 (2.9192) grad_norm 0.9477 (0.9477) loss_scale 8192.0000 (8192.0000) mem 8971MB
[2024-07-29 22:04:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][10/625] eta 0:03:04 lr 0.001865 wd 0.0500 time 0.1982 (0.2997) data time 0.0006 (0.0477) model time 0.0000 (0.0000) loss 2.3769 (3.1595) grad_norm 1.9735 (1.3631) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:04:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][20/625] eta 0:02:32 lr 0.001865 wd 0.0500 time 0.2027 (0.2524) data time 0.0006 (0.0254) model time 0.0000 (0.0000) loss 4.2965 (3.4101) grad_norm 1.0262 (1.2658) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:04:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][30/625] eta 0:02:20 lr 0.001865 wd 0.0500 time 0.1975 (0.2359) data time 0.0010 (0.0181) model time 0.0000 (0.0000) loss 3.8863 (3.5211) grad_norm 0.9423 (1.2760) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:04:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][40/625] eta 0:02:13 lr 0.001865 wd 0.0500 time 0.2091 (0.2278) data time 0.0008 (0.0139) model time 0.0000 (0.0000) loss 3.4793 (3.4380) grad_norm 1.1136 (1.2817) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:04:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][50/625] eta 0:02:07 lr 0.001865 wd 0.0500 time 0.2023 (0.2224) data time 0.0008 (0.0114) model time 0.0000 (0.0000) loss 3.8076 (3.4408) grad_norm 1.6402 (1.2781) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:04:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][60/625] eta 0:02:03 lr 0.001865 wd 0.0500 time 0.2019 (0.2187) data time 0.0006 (0.0096) model time 0.2014 (0.1991) loss 2.7761 (3.4624) grad_norm 1.2238 (1.2829) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:04:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][70/625] eta 0:01:59 lr 0.001865 wd 0.0500 time 0.2012 (0.2160) data time 0.0008 (0.0084) model time 0.2005 (0.1988) loss 3.5826 (3.4401) grad_norm 1.7134 (1.3114) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:04:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][80/625] eta 0:01:56 lr 0.001865 wd 0.0500 time 0.2000 (0.2141) data time 0.0006 (0.0075) model time 0.1994 (0.1990) loss 2.3322 (3.4005) grad_norm 1.3018 (1.3052) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:04:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][90/625] eta 0:01:53 lr 0.001865 wd 0.0500 time 0.2048 (0.2126) data time 0.0008 (0.0068) model time 0.2040 (0.1991) loss 3.6943 (3.3815) grad_norm 1.0739 (1.2933) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:04:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][100/625] eta 0:01:50 lr 0.001865 wd 0.0500 time 0.2028 (0.2114) data time 0.0008 (0.0062) model time 0.2020 (0.1992) loss 3.9915 (3.3908) grad_norm 1.3976 (1.2883) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:04:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][110/625] eta 0:01:48 lr 0.001865 wd 0.0500 time 0.2018 (0.2105) data time 0.0008 (0.0057) model time 0.2010 (0.1995) loss 3.7513 (3.4048) grad_norm 0.8585 (1.3141) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:04:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][120/625] eta 0:01:46 lr 0.001864 wd 0.0500 time 0.1997 (0.2114) data time 0.0007 (0.0053) model time 0.1990 (0.2024) loss 3.1293 (3.4010) grad_norm 0.9223 (1.3206) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:04:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][130/625] eta 0:01:44 lr 0.001864 wd 0.0500 time 0.1972 (0.2105) data time 0.0009 (0.0050) model time 0.1963 (0.2020) loss 3.7352 (3.4057) grad_norm 1.4251 (1.3359) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:04:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][140/625] eta 0:01:41 lr 0.001864 wd 0.0500 time 0.1988 (0.2098) data time 0.0008 (0.0047) model time 0.1980 (0.2018) loss 3.0278 (3.4351) grad_norm 2.1175 (1.3455) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:04:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][150/625] eta 0:01:39 lr 0.001864 wd 0.0500 time 0.2088 (0.2093) data time 0.0006 (0.0045) model time 0.2082 (0.2017) loss 3.5261 (3.4449) grad_norm 0.8389 (1.3385) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:04:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][160/625] eta 0:01:37 lr 0.001864 wd 0.0500 time 0.1988 (0.2087) data time 0.0007 (0.0042) model time 0.1981 (0.2015) loss 3.2489 (3.4334) grad_norm 1.2185 (1.3200) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:04:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][170/625] eta 0:01:34 lr 0.001864 wd 0.0500 time 0.1989 (0.2083) data time 0.0007 (0.0040) model time 0.1982 (0.2014) loss 3.0167 (3.4147) grad_norm 1.3294 (1.3255) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][180/625] eta 0:01:32 lr 0.001864 wd 0.0500 time 0.2093 (0.2079) data time 0.0008 (0.0039) model time 0.2085 (0.2013) loss 2.8098 (3.3990) grad_norm 1.1459 (1.3105) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][190/625] eta 0:01:30 lr 0.001864 wd 0.0500 time 0.2037 (0.2076) data time 0.0008 (0.0037) model time 0.2030 (0.2012) loss 3.5810 (3.4047) grad_norm 1.5180 (1.3121) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][200/625] eta 0:01:28 lr 0.001864 wd 0.0500 time 0.2069 (0.2073) data time 0.0008 (0.0036) model time 0.2061 (0.2012) loss 2.7959 (3.4007) grad_norm 1.6169 (1.3171) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][210/625] eta 0:01:25 lr 0.001864 wd 0.0500 time 0.2107 (0.2070) data time 0.0008 (0.0035) model time 0.2100 (0.2011) loss 3.6219 (3.4019) grad_norm 0.9150 (1.3140) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][220/625] eta 0:01:23 lr 0.001864 wd 0.0500 time 0.2005 (0.2067) data time 0.0008 (0.0033) model time 0.1997 (0.2010) loss 2.7664 (3.4166) grad_norm 2.0028 (1.3123) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][230/625] eta 0:01:21 lr 0.001863 wd 0.0500 time 0.2067 (0.2065) data time 0.0006 (0.0032) model time 0.2061 (0.2010) loss 2.4262 (3.4028) grad_norm 0.9481 (1.3031) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][240/625] eta 0:01:19 lr 0.001863 wd 0.0500 time 0.2018 (0.2075) data time 0.0008 (0.0031) model time 0.2010 (0.2026) loss 3.0202 (3.3967) grad_norm 1.1964 (1.3062) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][250/625] eta 0:01:17 lr 0.001863 wd 0.0500 time 0.2017 (0.2073) data time 0.0006 (0.0030) model time 0.2011 (0.2025) loss 2.7014 (3.3911) grad_norm 1.4826 (1.3018) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][260/625] eta 0:01:15 lr 0.001863 wd 0.0500 time 0.2042 (0.2071) data time 0.0008 (0.0030) model time 0.2035 (0.2025) loss 4.0497 (3.4065) grad_norm 0.9474 (1.2948) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][270/625] eta 0:01:13 lr 0.001863 wd 0.0500 time 0.2017 (0.2069) data time 0.0007 (0.0029) model time 0.2011 (0.2024) loss 2.4852 (3.4122) grad_norm 0.9943 (1.2889) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][280/625] eta 0:01:11 lr 0.001863 wd 0.0500 time 0.2028 (0.2066) data time 0.0008 (0.0028) model time 0.2020 (0.2022) loss 2.3189 (3.4147) grad_norm 1.1846 (1.2839) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][290/625] eta 0:01:09 lr 0.001863 wd 0.0500 time 0.2020 (0.2065) data time 0.0008 (0.0028) model time 0.2012 (0.2023) loss 3.6361 (3.4228) grad_norm 1.5125 (1.2848) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][300/625] eta 0:01:07 lr 0.001863 wd 0.0500 time 0.2016 (0.2064) data time 0.0007 (0.0027) model time 0.2009 (0.2022) loss 2.9357 (3.4238) grad_norm 1.0841 (1.2816) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][310/625] eta 0:01:04 lr 0.001863 wd 0.0500 time 0.1987 (0.2062) data time 0.0007 (0.0026) model time 0.1981 (0.2021) loss 3.5824 (3.4218) grad_norm 2.3488 (1.2897) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][320/625] eta 0:01:02 lr 0.001863 wd 0.0500 time 0.2008 (0.2060) data time 0.0009 (0.0026) model time 0.1999 (0.2020) loss 3.5649 (3.4250) grad_norm 1.2703 (1.2923) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][330/625] eta 0:01:00 lr 0.001863 wd 0.0500 time 0.2042 (0.2059) data time 0.0008 (0.0025) model time 0.2035 (0.2019) loss 2.8463 (3.4319) grad_norm 0.8246 (1.2914) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][340/625] eta 0:00:58 lr 0.001862 wd 0.0500 time 0.2048 (0.2057) data time 0.0006 (0.0025) model time 0.2042 (0.2019) loss 3.3698 (3.4327) grad_norm 1.4934 (1.2945) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][350/625] eta 0:00:56 lr 0.001862 wd 0.0500 time 0.1978 (0.2057) data time 0.0010 (0.0024) model time 0.1969 (0.2019) loss 3.9570 (3.4240) grad_norm 0.8694 (1.2898) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][360/625] eta 0:00:54 lr 0.001862 wd 0.0500 time 0.1969 (0.2055) data time 0.0010 (0.0024) model time 0.1960 (0.2018) loss 2.6610 (3.4229) grad_norm 1.1740 (1.2852) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][370/625] eta 0:00:52 lr 0.001862 wd 0.0500 time 0.1996 (0.2054) data time 0.0007 (0.0024) model time 0.1989 (0.2018) loss 3.8351 (3.4172) grad_norm 1.1220 (1.2919) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][380/625] eta 0:00:50 lr 0.001862 wd 0.0500 time 0.2023 (0.2053) data time 0.0006 (0.0023) model time 0.2017 (0.2017) loss 2.8790 (3.4236) grad_norm 1.0675 (1.2907) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][390/625] eta 0:00:48 lr 0.001862 wd 0.0500 time 0.1993 (0.2052) data time 0.0007 (0.0023) model time 0.1986 (0.2017) loss 2.9087 (3.4209) grad_norm 1.0548 (1.2898) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][400/625] eta 0:00:46 lr 0.001862 wd 0.0500 time 0.2002 (0.2051) data time 0.0006 (0.0022) model time 0.1996 (0.2017) loss 3.1079 (3.4166) grad_norm 1.8386 (1.2932) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][410/625] eta 0:00:44 lr 0.001862 wd 0.0500 time 0.1988 (0.2050) data time 0.0007 (0.0022) model time 0.1981 (0.2016) loss 3.8389 (3.4179) grad_norm 1.7500 (1.2963) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][420/625] eta 0:00:42 lr 0.001862 wd 0.0500 time 0.1967 (0.2049) data time 0.0010 (0.0022) model time 0.1957 (0.2016) loss 3.8168 (3.4122) grad_norm 1.7261 (1.2975) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][430/625] eta 0:00:39 lr 0.001862 wd 0.0500 time 0.2013 (0.2049) data time 0.0006 (0.0022) model time 0.2007 (0.2016) loss 3.3963 (3.4090) grad_norm 0.9948 (1.2937) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][440/625] eta 0:00:37 lr 0.001862 wd 0.0500 time 0.2043 (0.2048) data time 0.0008 (0.0021) model time 0.2036 (0.2016) loss 4.2415 (3.4125) grad_norm 0.9017 (1.2897) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][450/625] eta 0:00:35 lr 0.001861 wd 0.0500 time 0.2028 (0.2047) data time 0.0006 (0.0021) model time 0.2022 (0.2016) loss 4.1111 (3.4144) grad_norm 1.2590 (1.2834) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][460/625] eta 0:00:33 lr 0.001861 wd 0.0500 time 0.2019 (0.2047) data time 0.0008 (0.0021) model time 0.2011 (0.2016) loss 4.2192 (3.4114) grad_norm 0.9524 (1.2829) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:05:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][470/625] eta 0:00:31 lr 0.001861 wd 0.0500 time 0.2008 (0.2046) data time 0.0008 (0.0021) model time 0.2001 (0.2016) loss 3.6096 (3.4164) grad_norm 2.1586 (1.2885) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:06:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][480/625] eta 0:00:29 lr 0.001861 wd 0.0500 time 0.2004 (0.2046) data time 0.0009 (0.0020) model time 0.1995 (0.2016) loss 3.4740 (3.4171) grad_norm 0.9217 (1.2862) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:06:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][490/625] eta 0:00:27 lr 0.001861 wd 0.0500 time 0.1987 (0.2046) data time 0.0008 (0.0020) model time 0.1979 (0.2016) loss 3.5011 (3.4142) grad_norm 1.2418 (1.2905) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:06:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][500/625] eta 0:00:25 lr 0.001861 wd 0.0500 time 0.1995 (0.2046) data time 0.0007 (0.0020) model time 0.1988 (0.2016) loss 1.9469 (3.4121) grad_norm 0.8308 (1.2908) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:06:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][510/625] eta 0:00:23 lr 0.001861 wd 0.0500 time 0.2027 (0.2045) data time 0.0006 (0.0020) model time 0.2020 (0.2016) loss 2.4141 (3.4111) grad_norm 1.3634 (1.2905) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:06:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][520/625] eta 0:00:21 lr 0.001861 wd 0.0500 time 0.2000 (0.2045) data time 0.0007 (0.0020) model time 0.1994 (0.2016) loss 4.5337 (3.4084) grad_norm 1.9068 (1.2902) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:06:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][530/625] eta 0:00:19 lr 0.001861 wd 0.0500 time 0.1975 (0.2045) data time 0.0010 (0.0019) model time 0.1965 (0.2016) loss 2.1325 (3.4046) grad_norm 1.0450 (1.2935) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:06:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][540/625] eta 0:00:17 lr 0.001861 wd 0.0500 time 0.1991 (0.2044) data time 0.0006 (0.0019) model time 0.1984 (0.2016) loss 4.2177 (3.4037) grad_norm 1.0909 (1.2907) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:06:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][550/625] eta 0:00:15 lr 0.001861 wd 0.0500 time 0.1991 (0.2043) data time 0.0007 (0.0019) model time 0.1985 (0.2015) loss 4.1306 (3.4017) grad_norm 1.1986 (1.2879) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:06:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][560/625] eta 0:00:13 lr 0.001860 wd 0.0500 time 0.2083 (0.2043) data time 0.0008 (0.0019) model time 0.2075 (0.2015) loss 3.2337 (3.4015) grad_norm 1.0173 (1.2907) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:06:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][570/625] eta 0:00:11 lr 0.001860 wd 0.0500 time 0.2051 (0.2042) data time 0.0008 (0.0019) model time 0.2044 (0.2015) loss 3.6939 (3.4023) grad_norm 0.8267 (1.2897) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:06:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][580/625] eta 0:00:09 lr 0.001860 wd 0.0500 time 0.2003 (0.2042) data time 0.0008 (0.0019) model time 0.1995 (0.2015) loss 3.1133 (3.4034) grad_norm 0.9802 (1.2872) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:06:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][590/625] eta 0:00:07 lr 0.001860 wd 0.0500 time 0.2001 (0.2042) data time 0.0007 (0.0018) model time 0.1994 (0.2015) loss 4.0524 (3.4017) grad_norm 1.3305 (1.2858) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:06:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][600/625] eta 0:00:05 lr 0.001860 wd 0.0500 time 0.2060 (0.2041) data time 0.0008 (0.0018) model time 0.2052 (0.2015) loss 3.8094 (3.3995) grad_norm 2.0931 (1.2899) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:06:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][610/625] eta 0:00:03 lr 0.001860 wd 0.0500 time 0.1964 (0.2041) data time 0.0006 (0.0018) model time 0.1958 (0.2015) loss 3.7732 (3.4019) grad_norm 0.9238 (1.2909) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:06:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][620/625] eta 0:00:01 lr 0.001860 wd 0.0500 time 0.2002 (0.2040) data time 0.0004 (0.0018) model time 0.1998 (0.2014) loss 4.0521 (3.3994) grad_norm 1.5474 (1.2943) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:06:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 67 training takes 0:02:07
[2024-07-29 22:06:31 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 22:06:31 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 22:06:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.422 (0.422) Loss 0.8394 (0.8394) Acc@1 85.303 (85.303) Acc@5 97.314 (97.314) Mem 8975MB
[2024-07-29 22:06:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.090) Loss 1.3252 (1.0089) Acc@1 73.193 (80.908) Acc@5 92.822 (95.956) Mem 8975MB
[2024-07-29 22:06:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.073) Loss 1.5010 (1.1863) Acc@1 69.629 (76.897) Acc@5 90.479 (93.745) Mem 8975MB
[2024-07-29 22:06:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.649 Acc@5 93.704
[2024-07-29 22:06:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 76.6%
[2024-07-29 22:06:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 76.65%
[2024-07-29 22:06:33 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 22:06:34 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 22:06:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.400 (0.400) Loss 0.5649 (0.5649) Acc@1 86.572 (86.572) Acc@5 97.559 (97.559) Mem 8975MB [2024-07-29 22:06:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.087) Loss 1.0029 (0.7401) Acc@1 74.951 (82.098) Acc@5 93.018 (96.227) Mem 8975MB [2024-07-29 22:06:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.072) Loss 1.1855 (0.9043) Acc@1 70.020 (78.165) Acc@5 91.455 (94.236) Mem 8975MB [2024-07-29 22:06:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.927 Acc@5 94.230 [2024-07-29 22:06:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 77.9% [2024-07-29 22:06:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 77.93% [2024-07-29 22:06:36 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 22:06:37 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-29 22:06:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][0/625] eta 0:06:39 lr 0.001860 wd 0.0500 time 0.6390 (0.6390) data time 0.4444 (0.4444) model time 0.0000 (0.0000) loss 3.7240 (3.7240) grad_norm 1.2523 (1.2523) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:06:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][10/625] eta 0:02:27 lr 0.001860 wd 0.0500 time 0.2014 (0.2400) data time 0.0008 (0.0414) model time 0.0000 (0.0000) loss 3.5696 (3.6788) grad_norm 0.9906 (1.0980) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:06:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][20/625] eta 0:02:13 lr 0.001860 wd 0.0500 time 0.2006 (0.2210) data time 0.0008 (0.0221) model time 0.0000 (0.0000) loss 3.4902 (3.5011) grad_norm 1.5203 (1.1927) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:06:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][30/625] eta 0:02:07 lr 0.001860 wd 0.0500 time 0.2016 (0.2147) data time 0.0009 (0.0153) model time 0.0000 (0.0000) loss 2.3962 (3.2996) grad_norm 0.9074 (1.1612) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:06:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][40/625] eta 0:02:03 lr 0.001859 wd 0.0500 time 0.2037 (0.2115) data time 0.0007 (0.0118) model time 0.0000 (0.0000) loss 3.9377 (3.3379) grad_norm 1.5357 (1.2142) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:06:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][50/625] eta 0:02:00 lr 0.001859 wd 0.0500 time 0.2010 (0.2091) data time 0.0008 (0.0097) model time 0.0000 (0.0000) loss 3.2633 (3.3407) grad_norm 1.5835 (1.2204) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:06:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][60/625] eta 0:01:57 lr 0.001859 wd 0.0500 time 0.2040 (0.2077) data time 
0.0008 (0.0082) model time 0.2032 (0.1998) loss 3.6346 (3.3529) grad_norm 1.6842 (1.2200) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:06:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][70/625] eta 0:01:54 lr 0.001859 wd 0.0500 time 0.1935 (0.2069) data time 0.0009 (0.0073) model time 0.1926 (0.2000) loss 3.7772 (3.3465) grad_norm 0.9344 (1.2065) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:06:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][80/625] eta 0:01:52 lr 0.001859 wd 0.0500 time 0.1958 (0.2060) data time 0.0009 (0.0065) model time 0.1949 (0.1997) loss 3.2604 (3.3579) grad_norm 0.7714 (1.2038) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:06:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][90/625] eta 0:01:49 lr 0.001859 wd 0.0500 time 0.2002 (0.2056) data time 0.0009 (0.0059) model time 0.1993 (0.2000) loss 3.6604 (3.3780) grad_norm 1.5401 (1.2204) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:06:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][100/625] eta 0:01:47 lr 0.001859 wd 0.0500 time 0.1956 (0.2052) data time 0.0006 (0.0054) model time 0.1950 (0.2002) loss 3.7395 (3.3761) grad_norm 0.9921 (1.2132) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][110/625] eta 0:01:45 lr 0.001859 wd 0.0500 time 0.2190 (0.2053) data time 0.0008 (0.0050) model time 0.2182 (0.2010) loss 3.5503 (3.3439) grad_norm 0.9822 (1.2086) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][120/625] eta 0:01:43 lr 0.001859 wd 0.0500 time 0.1967 (0.2048) data time 0.0006 (0.0046) model time 0.1961 (0.2008) loss 3.9405 (3.3679) grad_norm 1.2296 (1.2083) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:04 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][130/625] eta 0:01:41 lr 0.001859 wd 0.0500 time 0.1996 (0.2045) data time 0.0008 (0.0043) model time 0.1988 (0.2005) loss 2.6992 (3.3620) grad_norm 1.0488 (1.2204) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][140/625] eta 0:01:39 lr 0.001859 wd 0.0500 time 0.1982 (0.2041) data time 0.0006 (0.0041) model time 0.1976 (0.2004) loss 2.5403 (3.3668) grad_norm 1.2030 (1.2119) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][150/625] eta 0:01:36 lr 0.001858 wd 0.0500 time 0.1985 (0.2040) data time 0.0008 (0.0039) model time 0.1978 (0.2004) loss 3.7940 (3.3652) grad_norm 1.0861 (1.2119) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][160/625] eta 0:01:34 lr 0.001858 wd 0.0500 time 0.2079 (0.2038) data time 0.0007 (0.0037) model time 0.2071 (0.2004) loss 2.3096 (3.3615) grad_norm 1.2641 (1.2109) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][170/625] eta 0:01:32 lr 0.001858 wd 0.0500 time 0.2012 (0.2037) data time 0.0009 (0.0035) model time 0.2003 (0.2005) loss 3.7035 (3.3584) grad_norm 2.6048 (1.2177) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][180/625] eta 0:01:30 lr 0.001858 wd 0.0500 time 0.2014 (0.2036) data time 0.0008 (0.0034) model time 0.2006 (0.2004) loss 3.8348 (3.3521) grad_norm 1.6324 (1.2355) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][190/625] eta 0:01:28 lr 0.001858 wd 0.0500 time 0.1984 (0.2035) data time 0.0006 
(0.0033) model time 0.1978 (0.2006) loss 4.1925 (3.3611) grad_norm 0.9818 (1.2241) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][200/625] eta 0:01:26 lr 0.001858 wd 0.0500 time 0.1983 (0.2035) data time 0.0008 (0.0032) model time 0.1976 (0.2006) loss 2.2580 (3.3684) grad_norm 0.8154 (1.2153) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][210/625] eta 0:01:24 lr 0.001858 wd 0.0500 time 0.1994 (0.2034) data time 0.0006 (0.0031) model time 0.1988 (0.2006) loss 4.1702 (3.3785) grad_norm 0.8504 (1.2137) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][220/625] eta 0:01:22 lr 0.001858 wd 0.0500 time 0.2023 (0.2033) data time 0.0008 (0.0030) model time 0.2014 (0.2006) loss 2.6054 (3.3721) grad_norm 1.0256 (1.2074) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][230/625] eta 0:01:20 lr 0.001858 wd 0.0500 time 0.2018 (0.2032) data time 0.0008 (0.0029) model time 0.2009 (0.2005) loss 3.5910 (3.3730) grad_norm 1.7507 (1.2098) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][240/625] eta 0:01:18 lr 0.001858 wd 0.0500 time 0.2013 (0.2030) data time 0.0006 (0.0028) model time 0.2006 (0.2005) loss 3.9007 (3.3764) grad_norm 2.0134 (1.2138) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][250/625] eta 0:01:16 lr 0.001858 wd 0.0500 time 0.2056 (0.2030) data time 0.0009 (0.0027) model time 0.2047 (0.2005) loss 3.3086 (3.3806) grad_norm 0.9525 (1.2147) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:30 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][260/625] eta 0:01:14 lr 0.001857 wd 0.0500 time 0.2004 (0.2030) data time 0.0008 (0.0027) model time 0.1996 (0.2005) loss 3.3969 (3.3921) grad_norm 2.5384 (1.2193) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][270/625] eta 0:01:12 lr 0.001857 wd 0.0500 time 0.2005 (0.2029) data time 0.0010 (0.0026) model time 0.1995 (0.2005) loss 2.6206 (3.3993) grad_norm 0.8760 (1.2255) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][280/625] eta 0:01:09 lr 0.001857 wd 0.0500 time 0.1990 (0.2028) data time 0.0006 (0.0025) model time 0.1984 (0.2005) loss 3.9643 (3.4077) grad_norm 0.9816 (1.2248) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][290/625] eta 0:01:07 lr 0.001857 wd 0.0500 time 0.2008 (0.2028) data time 0.0006 (0.0025) model time 0.2001 (0.2005) loss 2.4992 (3.4036) grad_norm 1.1408 (1.2193) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][300/625] eta 0:01:05 lr 0.001857 wd 0.0500 time 0.2006 (0.2027) data time 0.0007 (0.0024) model time 0.2000 (0.2005) loss 2.5622 (3.4014) grad_norm 2.0417 (1.2250) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][310/625] eta 0:01:03 lr 0.001857 wd 0.0500 time 0.1980 (0.2027) data time 0.0009 (0.0024) model time 0.1971 (0.2005) loss 3.4443 (3.3992) grad_norm 1.1599 (1.2228) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][320/625] eta 0:01:01 lr 0.001857 wd 0.0500 time 0.1998 (0.2026) data time 0.0007 
(0.0023) model time 0.1992 (0.2005) loss 3.7574 (3.3969) grad_norm 1.1557 (1.2184) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][330/625] eta 0:00:59 lr 0.001857 wd 0.0500 time 0.1992 (0.2026) data time 0.0006 (0.0023) model time 0.1987 (0.2005) loss 3.6275 (3.3922) grad_norm 0.8569 (1.2166) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][340/625] eta 0:00:57 lr 0.001857 wd 0.0500 time 0.1999 (0.2026) data time 0.0008 (0.0022) model time 0.1990 (0.2005) loss 3.1048 (3.3955) grad_norm 1.0297 (1.2116) loss_scale 8192.0000 (8192.0000) mem 8975MB [2024-07-29 22:07:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][350/625] eta 0:00:55 lr 0.001857 wd 0.0500 time 0.1972 (0.2026) data time 0.0008 (0.0022) model time 0.1964 (0.2006) loss 2.7050 (3.3990) grad_norm 0.9383 (1.2082) loss_scale 16384.0000 (8285.3561) mem 8975MB [2024-07-29 22:07:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][360/625] eta 0:00:53 lr 0.001857 wd 0.0500 time 0.2041 (0.2027) data time 0.0006 (0.0022) model time 0.2035 (0.2007) loss 4.0161 (3.4067) grad_norm 0.7835 (1.2084) loss_scale 16384.0000 (8509.6953) mem 8975MB [2024-07-29 22:07:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][370/625] eta 0:00:51 lr 0.001856 wd 0.0500 time 0.2032 (0.2027) data time 0.0006 (0.0021) model time 0.2026 (0.2007) loss 4.3483 (3.4117) grad_norm 2.0056 (1.2129) loss_scale 16384.0000 (8721.9407) mem 8975MB [2024-07-29 22:07:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][380/625] eta 0:00:49 lr 0.001856 wd 0.0500 time 0.2013 (0.2027) data time 0.0007 (0.0021) model time 0.2007 (0.2007) loss 4.2910 (3.4164) grad_norm 1.8654 (1.2227) loss_scale 16384.0000 (8923.0446) mem 8975MB [2024-07-29 22:07:56 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][390/625] eta 0:00:47 lr 0.001856 wd 0.0500 time 0.2005 (0.2026) data time 0.0007 (0.0021) model time 0.1998 (0.2007) loss 2.5631 (3.4125) grad_norm 1.4258 (1.2327) loss_scale 16384.0000 (9113.8619) mem 8975MB [2024-07-29 22:07:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][400/625] eta 0:00:45 lr 0.001856 wd 0.0500 time 0.2003 (0.2031) data time 0.0008 (0.0020) model time 0.1995 (0.2013) loss 2.4117 (3.4052) grad_norm 0.7649 (1.2313) loss_scale 16384.0000 (9295.1621) mem 8975MB [2024-07-29 22:08:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][410/625] eta 0:00:43 lr 0.001856 wd 0.0500 time 0.2030 (0.2031) data time 0.0006 (0.0020) model time 0.2024 (0.2013) loss 3.8123 (3.4090) grad_norm 0.7568 (1.2363) loss_scale 16384.0000 (9467.6399) mem 8975MB [2024-07-29 22:08:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][420/625] eta 0:00:41 lr 0.001856 wd 0.0500 time 0.1992 (0.2030) data time 0.0006 (0.0020) model time 0.1986 (0.2013) loss 3.9504 (3.4116) grad_norm 0.8142 (1.2418) loss_scale 16384.0000 (9631.9240) mem 8975MB [2024-07-29 22:08:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][430/625] eta 0:00:39 lr 0.001856 wd 0.0500 time 0.2023 (0.2030) data time 0.0009 (0.0020) model time 0.2014 (0.2012) loss 3.6482 (3.4140) grad_norm 1.3545 (1.2507) loss_scale 16384.0000 (9788.5847) mem 8975MB [2024-07-29 22:08:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][440/625] eta 0:00:37 lr 0.001856 wd 0.0500 time 0.2121 (0.2030) data time 0.0008 (0.0019) model time 0.2113 (0.2012) loss 3.8654 (3.4177) grad_norm 1.4018 (1.2500) loss_scale 16384.0000 (9938.1406) mem 8975MB [2024-07-29 22:08:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][450/625] eta 0:00:35 lr 0.001856 wd 0.0500 time 0.2040 (0.2030) data time 0.0008 
(0.0019) model time 0.2032 (0.2012) loss 4.2712 (3.4227) grad_norm 1.5981 (1.2537) loss_scale 16384.0000 (10081.0643) mem 8975MB [2024-07-29 22:08:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][460/625] eta 0:00:33 lr 0.001856 wd 0.0500 time 0.2002 (0.2029) data time 0.0010 (0.0019) model time 0.1993 (0.2012) loss 3.5307 (3.4233) grad_norm 0.9692 (1.2579) loss_scale 16384.0000 (10217.7874) mem 8975MB [2024-07-29 22:08:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][470/625] eta 0:00:31 lr 0.001856 wd 0.0500 time 0.1996 (0.2029) data time 0.0008 (0.0019) model time 0.1988 (0.2012) loss 3.4998 (3.4243) grad_norm 0.9993 (1.2579) loss_scale 16384.0000 (10348.7049) mem 8975MB [2024-07-29 22:08:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][480/625] eta 0:00:29 lr 0.001855 wd 0.0500 time 0.1993 (0.2033) data time 0.0009 (0.0019) model time 0.1984 (0.2017) loss 3.3353 (3.4241) grad_norm 1.0650 (1.2558) loss_scale 16384.0000 (10474.1788) mem 8975MB [2024-07-29 22:08:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][490/625] eta 0:00:27 lr 0.001855 wd 0.0500 time 0.1983 (0.2033) data time 0.0007 (0.0018) model time 0.1975 (0.2017) loss 3.5761 (3.4300) grad_norm 0.9166 (1.2535) loss_scale 16384.0000 (10594.5418) mem 8975MB [2024-07-29 22:08:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][500/625] eta 0:00:25 lr 0.001855 wd 0.0500 time 0.1981 (0.2032) data time 0.0008 (0.0018) model time 0.1973 (0.2016) loss 3.2970 (3.4261) grad_norm 0.8096 (1.2581) loss_scale 16384.0000 (10710.0998) mem 8975MB [2024-07-29 22:08:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][510/625] eta 0:00:23 lr 0.001855 wd 0.0500 time 0.1956 (0.2032) data time 0.0007 (0.0018) model time 0.1949 (0.2016) loss 3.6717 (3.4247) grad_norm 1.0313 (1.2547) loss_scale 16384.0000 (10821.1350) mem 8975MB [2024-07-29 22:08:23 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][520/625] eta 0:00:21 lr 0.001855 wd 0.0500 time 0.2021 (0.2031) data time 0.0006 (0.0018) model time 0.2015 (0.2015) loss 3.1019 (3.4251) grad_norm 1.0019 (1.2520) loss_scale 16384.0000 (10927.9079) mem 8975MB [2024-07-29 22:08:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][530/625] eta 0:00:19 lr 0.001855 wd 0.0500 time 0.2004 (0.2031) data time 0.0007 (0.0018) model time 0.1997 (0.2016) loss 3.7698 (3.4267) grad_norm 1.2730 (1.2493) loss_scale 16384.0000 (11030.6591) mem 8975MB [2024-07-29 22:08:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][540/625] eta 0:00:17 lr 0.001855 wd 0.0500 time 0.2006 (0.2031) data time 0.0009 (0.0017) model time 0.1997 (0.2015) loss 3.3439 (3.4293) grad_norm 1.5613 (1.2505) loss_scale 16384.0000 (11129.6118) mem 8975MB [2024-07-29 22:08:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][550/625] eta 0:00:15 lr 0.001855 wd 0.0500 time 0.2001 (0.2030) data time 0.0006 (0.0017) model time 0.1995 (0.2015) loss 2.2631 (3.4300) grad_norm 1.6280 (1.2527) loss_scale 16384.0000 (11224.9728) mem 8975MB [2024-07-29 22:08:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][560/625] eta 0:00:13 lr 0.001855 wd 0.0500 time 0.1994 (0.2030) data time 0.0008 (0.0017) model time 0.1986 (0.2015) loss 3.9923 (3.4275) grad_norm 1.5477 (1.2558) loss_scale 16384.0000 (11316.9340) mem 8975MB [2024-07-29 22:08:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][570/625] eta 0:00:11 lr 0.001855 wd 0.0500 time 0.2026 (0.2030) data time 0.0007 (0.0017) model time 0.2019 (0.2015) loss 3.0013 (3.4172) grad_norm 1.6369 (1.2589) loss_scale 16384.0000 (11405.6743) mem 8975MB [2024-07-29 22:08:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][580/625] eta 0:00:09 lr 0.001855 wd 0.0500 time 0.1992 (0.2030) data time 
0.0008 (0.0017) model time 0.1984 (0.2014) loss 2.9677 (3.4166) grad_norm 1.3139 (1.2595) loss_scale 16384.0000 (11491.3597) mem 8975MB [2024-07-29 22:08:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][590/625] eta 0:00:07 lr 0.001854 wd 0.0500 time 0.2015 (0.2029) data time 0.0008 (0.0017) model time 0.2007 (0.2014) loss 3.7388 (3.4212) grad_norm 1.0247 (1.2567) loss_scale 16384.0000 (11574.1455) mem 8975MB [2024-07-29 22:08:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][600/625] eta 0:00:05 lr 0.001854 wd 0.0500 time 0.2072 (0.2029) data time 0.0007 (0.0017) model time 0.2065 (0.2014) loss 3.5739 (3.4244) grad_norm 1.8341 (1.2574) loss_scale 16384.0000 (11654.1764) mem 8975MB [2024-07-29 22:08:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][610/625] eta 0:00:03 lr 0.001854 wd 0.0500 time 0.2068 (0.2030) data time 0.0004 (0.0017) model time 0.2063 (0.2015) loss 2.1998 (3.4205) grad_norm 1.0794 (1.2570) loss_scale 16384.0000 (11731.5876) mem 8975MB [2024-07-29 22:08:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][620/625] eta 0:00:01 lr 0.001854 wd 0.0500 time 0.1986 (0.2029) data time 0.0004 (0.0016) model time 0.1982 (0.2014) loss 1.9773 (3.4193) grad_norm 1.5933 (1.2595) loss_scale 16384.0000 (11806.5056) mem 8975MB [2024-07-29 22:08:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 68 training takes 0:02:06 [2024-07-29 22:08:44 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 22:08:44 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 22:08:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.503 (0.503) Loss 0.7974 (0.7974) Acc@1 86.084 (86.084) Acc@5 97.461 (97.461) Mem 8975MB [2024-07-29 22:08:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 1.3281 (0.9793) Acc@1 70.557 (80.353) Acc@5 92.480 (96.072) Mem 8975MB [2024-07-29 22:08:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.4287 (1.1483) Acc@1 69.385 (76.523) Acc@5 90.674 (93.843) Mem 8975MB [2024-07-29 22:08:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.296 Acc@5 93.784 [2024-07-29 22:08:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 76.3% [2024-07-29 22:08:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.801 (0.801) Loss 0.5640 (0.5640) Acc@1 86.523 (86.523) Acc@5 97.656 (97.656) Mem 8975MB [2024-07-29 22:08:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.128) Loss 0.9971 (0.7373) Acc@1 75.195 (82.213) Acc@5 93.213 (96.302) Mem 8975MB [2024-07-29 22:08:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 1.1768 (0.8998) Acc@1 70.166 (78.285) Acc@5 91.602 (94.348) Mem 8975MB [2024-07-29 22:08:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.057 Acc@5 94.332 [2024-07-29 22:08:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 78.1% [2024-07-29 22:08:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 78.06% [2024-07-29 22:08:48 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-07-29 22:08:49 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-07-29 22:08:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][0/625] eta 0:07:32 lr 0.001854 wd 0.0500 time 0.7243 (0.7243) data time 0.5328 (0.5328) model time 0.0000 (0.0000) loss 2.8988 (2.8988) grad_norm 1.0644 (1.0644) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 22:08:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][10/625] eta 0:02:34 lr 0.001854 wd 0.0500 time 0.2038 (0.2512) data time 0.0009 (0.0493) model time 0.0000 (0.0000) loss 3.9120 (3.4395) grad_norm 1.1280 (1.2135) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 22:08:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][20/625] eta 0:02:17 lr 0.001854 wd 0.0500 time 0.2031 (0.2271) data time 0.0006 (0.0263) model time 0.0000 (0.0000) loss 3.3457 (3.2189) grad_norm 0.8115 (1.3468) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 22:08:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][30/625] eta 0:02:10 lr 0.001854 wd 0.0500 time 0.2463 (0.2199) data time 0.0006 (0.0181) model time 0.0000 (0.0000) loss 2.9180 (3.2940) grad_norm 1.1733 (1.3260) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 22:08:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][40/625] eta 0:02:06 lr 0.001854 wd 0.0500 time 0.1979 (0.2163) data time 0.0007 (0.0144) model time 0.0000 (0.0000) loss 4.5384 (3.3199) grad_norm 0.8971 (1.2758) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 22:08:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][50/625] eta 0:02:02 lr 0.001854 wd 0.0500 time 0.2005 (0.2133) data time 0.0006 (0.0117) model time 0.0000 (0.0000) loss 3.2244 (3.3628) grad_norm 1.4537 (1.2986) loss_scale 16384.0000 (16384.0000) mem 
8975MB [2024-07-29 22:09:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][60/625] eta 0:01:59 lr 0.001854 wd 0.0500 time 0.2015 (0.2115) data time 0.0008 (0.0100) model time 0.2007 (0.2011) loss 3.7155 (3.4230) grad_norm 0.9036 (1.2705) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 22:09:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][70/625] eta 0:01:56 lr 0.001853 wd 0.0500 time 0.1996 (0.2098) data time 0.0008 (0.0087) model time 0.1988 (0.2000) loss 3.8294 (3.4098) grad_norm 1.3628 (1.2744) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 22:09:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][80/625] eta 0:01:53 lr 0.001853 wd 0.0500 time 0.1997 (0.2089) data time 0.0006 (0.0077) model time 0.1991 (0.2006) loss 3.8607 (3.4146) grad_norm 1.2897 (1.2695) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 22:09:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][90/625] eta 0:01:51 lr 0.001853 wd 0.0500 time 0.2007 (0.2081) data time 0.0008 (0.0070) model time 0.1999 (0.2007) loss 3.5970 (3.4326) grad_norm 1.2758 (1.2701) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 22:09:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][100/625] eta 0:01:48 lr 0.001853 wd 0.0500 time 0.1986 (0.2075) data time 0.0008 (0.0064) model time 0.1978 (0.2007) loss 1.9984 (3.3795) grad_norm 0.9399 (1.2465) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 22:09:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][110/625] eta 0:01:47 lr 0.001853 wd 0.0500 time 0.1990 (0.2080) data time 0.0008 (0.0059) model time 0.1982 (0.2025) loss 3.0514 (3.3409) grad_norm 0.8691 (1.2606) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 22:09:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][120/625] eta 0:01:44 lr 0.001853 wd 0.0500 time 
0.2004 (0.2078) data time 0.0006 (0.0055) model time 0.1998 (0.2028) loss 3.8448 (3.3323) grad_norm 1.4821 (1.2707) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 22:09:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][130/625] eta 0:01:42 lr 0.001853 wd 0.0500 time 0.1946 (0.2073) data time 0.0006 (0.0051) model time 0.1940 (0.2025) loss 4.1473 (3.3398) grad_norm 1.5104 (1.3029) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 22:09:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][140/625] eta 0:01:40 lr 0.001853 wd 0.0500 time 0.2032 (0.2068) data time 0.0006 (0.0048) model time 0.2025 (0.2022) loss 3.5892 (3.3339) grad_norm 0.9381 (1.2977) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 22:09:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][150/625] eta 0:01:38 lr 0.001853 wd 0.0500 time 0.1991 (0.2065) data time 0.0006 (0.0046) model time 0.1985 (0.2020) loss 3.8171 (3.3481) grad_norm 1.0805 (1.2991) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 22:09:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][160/625] eta 0:01:35 lr 0.001853 wd 0.0500 time 0.1955 (0.2060) data time 0.0006 (0.0043) model time 0.1949 (0.2017) loss 3.4262 (3.3702) grad_norm 0.8859 (1.2932) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 22:09:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][170/625] eta 0:01:33 lr 0.001853 wd 0.0500 time 0.1976 (0.2058) data time 0.0008 (0.0041) model time 0.1968 (0.2017) loss 3.9196 (3.3663) grad_norm 1.3551 (1.2948) loss_scale 16384.0000 (16384.0000) mem 8975MB [2024-07-29 22:09:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][180/625] eta 0:01:31 lr 0.001852 wd 0.0500 time 0.2016 (0.2057) data time 0.0006 (0.0040) model time 0.2010 (0.2019) loss 3.7439 (3.3747) grad_norm 0.9125 (1.3020) loss_scale 16384.0000 (16384.0000) 
mem 8975MB
[2024-07-29 22:09:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][190/625] eta 0:01:29 lr 0.001852 wd 0.0500 time 0.1974 (0.2055) data time 0.0009 (0.0038) model time 0.1965 (0.2018) loss 3.6718 (3.3748) grad_norm 0.9201 (1.2850) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:09:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][200/625] eta 0:01:27 lr 0.001852 wd 0.0500 time 0.1972 (0.2053) data time 0.0009 (0.0037) model time 0.1963 (0.2017) loss 3.0869 (3.3808) grad_norm 1.1564 (1.2792) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:09:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][210/625] eta 0:01:25 lr 0.001852 wd 0.0500 time 0.2080 (0.2052) data time 0.0008 (0.0035) model time 0.2072 (0.2017) loss 3.0847 (3.3716) grad_norm 1.2144 (1.2770) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:09:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][220/625] eta 0:01:23 lr 0.001852 wd 0.0500 time 0.1990 (0.2051) data time 0.0008 (0.0034) model time 0.1982 (0.2017) loss 3.3955 (3.3821) grad_norm 1.1605 (1.2876) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:09:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][230/625] eta 0:01:20 lr 0.001852 wd 0.0500 time 0.1995 (0.2051) data time 0.0009 (0.0034) model time 0.1987 (0.2016) loss 3.6339 (3.3836) grad_norm 0.9838 (1.2960) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:09:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][240/625] eta 0:01:18 lr 0.001852 wd 0.0500 time 0.1987 (0.2051) data time 0.0007 (0.0033) model time 0.1980 (0.2018) loss 3.8791 (3.3782) grad_norm 0.8988 (1.2922) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:09:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][250/625] eta 0:01:16 lr 0.001852 wd 0.0500 time 0.1971 (0.2052) data time 0.0009 (0.0032) model time 0.1962 (0.2021) loss 3.7711 (3.3832) grad_norm 0.7274 (1.2835) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:09:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][260/625] eta 0:01:14 lr 0.001852 wd 0.0500 time 0.1963 (0.2051) data time 0.0011 (0.0031) model time 0.1952 (0.2021) loss 3.9222 (3.3856) grad_norm 1.2000 (1.2867) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:09:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][270/625] eta 0:01:12 lr 0.001852 wd 0.0500 time 0.1993 (0.2050) data time 0.0009 (0.0031) model time 0.1984 (0.2020) loss 3.0369 (3.3860) grad_norm 2.2987 (1.2920) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:09:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][280/625] eta 0:01:10 lr 0.001852 wd 0.0500 time 0.1984 (0.2049) data time 0.0006 (0.0030) model time 0.1978 (0.2020) loss 3.8473 (3.3926) grad_norm 1.1091 (1.2994) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:09:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][290/625] eta 0:01:08 lr 0.001851 wd 0.0500 time 0.2030 (0.2047) data time 0.0006 (0.0029) model time 0.2024 (0.2018) loss 3.0415 (3.3973) grad_norm 0.8271 (1.2961) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:09:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][300/625] eta 0:01:06 lr 0.001851 wd 0.0500 time 0.1995 (0.2046) data time 0.0008 (0.0028) model time 0.1987 (0.2017) loss 3.5108 (3.3938) grad_norm 1.3950 (1.2972) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:09:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][310/625] eta 0:01:04 lr 0.001851 wd 0.0500 time 0.1854 (0.2051) data time 0.0009 (0.0028) model time 0.1845 (0.2025) loss 3.5075 (3.3873) grad_norm 1.4430 (1.3053) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:09:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][320/625] eta 0:01:02 lr 0.001851 wd 0.0500 time 0.2004 (0.2050) data time 0.0007 (0.0027) model time 0.1997 (0.2024) loss 2.7800 (3.3837) grad_norm 1.9996 (1.3038) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:09:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][330/625] eta 0:01:00 lr 0.001851 wd 0.0500 time 0.2036 (0.2048) data time 0.0007 (0.0027) model time 0.2029 (0.2023) loss 4.0389 (3.3873) grad_norm 1.5408 (1.3062) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:09:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][340/625] eta 0:00:58 lr 0.001851 wd 0.0500 time 0.1974 (0.2047) data time 0.0006 (0.0026) model time 0.1967 (0.2022) loss 3.2331 (3.3900) grad_norm 1.0821 (1.3106) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][350/625] eta 0:00:56 lr 0.001851 wd 0.0500 time 0.2029 (0.2046) data time 0.0008 (0.0026) model time 0.2021 (0.2021) loss 3.7009 (3.3871) grad_norm 1.0650 (1.3147) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][360/625] eta 0:00:54 lr 0.001851 wd 0.0500 time 0.1965 (0.2044) data time 0.0009 (0.0025) model time 0.1955 (0.2020) loss 4.0113 (3.3907) grad_norm 1.1527 (1.3176) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][370/625] eta 0:00:52 lr 0.001851 wd 0.0500 time 0.2000 (0.2043) data time 0.0006 (0.0025) model time 0.1994 (0.2019) loss 2.9340 (3.3827) grad_norm 1.0852 (1.3119) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][380/625] eta 0:00:50 lr 0.001851 wd 0.0500 time 0.1995 (0.2042) data time 0.0007 (0.0024) model time 0.1988 (0.2018) loss 2.0471 (3.3746) grad_norm 2.7582 (1.3157) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][390/625] eta 0:00:47 lr 0.001850 wd 0.0500 time 0.1977 (0.2041) data time 0.0007 (0.0024) model time 0.1970 (0.2018) loss 3.0089 (3.3751) grad_norm 1.3899 (1.3289) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][400/625] eta 0:00:45 lr 0.001850 wd 0.0500 time 0.1986 (0.2040) data time 0.0006 (0.0023) model time 0.1980 (0.2017) loss 3.0629 (3.3721) grad_norm 1.0044 (1.3262) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][410/625] eta 0:00:43 lr 0.001850 wd 0.0500 time 0.2002 (0.2039) data time 0.0006 (0.0023) model time 0.1996 (0.2016) loss 4.3882 (3.3663) grad_norm 1.4722 (1.3239) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][420/625] eta 0:00:41 lr 0.001850 wd 0.0500 time 0.1995 (0.2038) data time 0.0009 (0.0023) model time 0.1987 (0.2016) loss 2.4212 (3.3644) grad_norm 1.1071 (1.3181) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][430/625] eta 0:00:39 lr 0.001850 wd 0.0500 time 0.1991 (0.2038) data time 0.0008 (0.0023) model time 0.1983 (0.2015) loss 3.4218 (3.3647) grad_norm 1.6023 (1.3189) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][440/625] eta 0:00:37 lr 0.001850 wd 0.0500 time 0.2139 (0.2038) data time 0.0008 (0.0022) model time 0.2131 (0.2015) loss 2.7894 (3.3610) grad_norm 1.0112 (1.3136) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][450/625] eta 0:00:35 lr 0.001850 wd 0.0500 time 0.1998 (0.2037) data time 0.0007 (0.0022) model time 0.1991 (0.2015) loss 4.1270 (3.3620) grad_norm 0.8765 (1.3042) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][460/625] eta 0:00:33 lr 0.001850 wd 0.0500 time 0.2016 (0.2036) data time 0.0006 (0.0022) model time 0.2010 (0.2014) loss 4.4104 (3.3651) grad_norm 1.6864 (1.3007) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][470/625] eta 0:00:31 lr 0.001850 wd 0.0500 time 0.2026 (0.2036) data time 0.0007 (0.0021) model time 0.2019 (0.2014) loss 3.2026 (3.3648) grad_norm 0.8496 (1.2979) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][480/625] eta 0:00:29 lr 0.001850 wd 0.0500 time 0.2034 (0.2035) data time 0.0007 (0.0021) model time 0.2026 (0.2014) loss 3.6740 (3.3666) grad_norm 1.4205 (1.2929) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][490/625] eta 0:00:27 lr 0.001850 wd 0.0500 time 0.1998 (0.2035) data time 0.0008 (0.0021) model time 0.1990 (0.2013) loss 3.5440 (3.3684) grad_norm 0.9319 (1.2910) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][500/625] eta 0:00:25 lr 0.001849 wd 0.0500 time 0.2003 (0.2034) data time 0.0007 (0.0021) model time 0.1997 (0.2013) loss 3.2150 (3.3696) grad_norm 1.9158 (1.2903) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][510/625] eta 0:00:23 lr 0.001849 wd 0.0500 time 0.2043 (0.2033) data time 0.0007 (0.0020) model time 0.2036 (0.2013) loss 3.1785 (3.3700) grad_norm 1.0282 (1.2899) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][520/625] eta 0:00:21 lr 0.001849 wd 0.0500 time 0.2002 (0.2036) data time 0.0006 (0.0020) model time 0.1996 (0.2016) loss 3.2681 (3.3677) grad_norm 2.0285 (1.2919) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][530/625] eta 0:00:19 lr 0.001849 wd 0.0500 time 0.2021 (0.2036) data time 0.0007 (0.0020) model time 0.2014 (0.2016) loss 3.5968 (3.3708) grad_norm 1.3583 (1.2914) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][540/625] eta 0:00:17 lr 0.001849 wd 0.0500 time 0.1969 (0.2035) data time 0.0006 (0.0020) model time 0.1963 (0.2015) loss 3.6991 (3.3698) grad_norm 0.9899 (1.2880) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][550/625] eta 0:00:15 lr 0.001849 wd 0.0500 time 0.2014 (0.2034) data time 0.0006 (0.0020) model time 0.2008 (0.2015) loss 4.0147 (3.3751) grad_norm 2.2688 (1.2864) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][560/625] eta 0:00:13 lr 0.001849 wd 0.0500 time 0.2013 (0.2034) data time 0.0007 (0.0019) model time 0.2006 (0.2014) loss 3.7491 (3.3742) grad_norm 1.4043 (1.2903) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][570/625] eta 0:00:11 lr 0.001849 wd 0.0500 time 0.2007 (0.2034) data time 0.0006 (0.0019) model time 0.2001 (0.2014) loss 3.9477 (3.3781) grad_norm 2.3919 (1.2915) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][580/625] eta 0:00:09 lr 0.001849 wd 0.0500 time 0.2017 (0.2033) data time 0.0007 (0.0019) model time 0.2009 (0.2014) loss 3.4926 (3.3774) grad_norm 0.8811 (1.2944) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][590/625] eta 0:00:07 lr 0.001849 wd 0.0500 time 0.2279 (0.2033) data time 0.0009 (0.0019) model time 0.2269 (0.2014) loss 3.5093 (3.3813) grad_norm 2.1980 (1.2997) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][600/625] eta 0:00:05 lr 0.001848 wd 0.0500 time 0.2029 (0.2033) data time 0.0007 (0.0019) model time 0.2021 (0.2014) loss 3.5364 (3.3804) grad_norm 1.3300 (1.2989) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][610/625] eta 0:00:03 lr 0.001848 wd 0.0500 time 0.2007 (0.2032) data time 0.0005 (0.0019) model time 0.2001 (0.2013) loss 3.9229 (3.3824) grad_norm 2.5411 (1.3010) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][620/625] eta 0:00:01 lr 0.001848 wd 0.0500 time 0.1980 (0.2033) data time 0.0004 (0.0019) model time 0.1976 (0.2014) loss 3.8211 (3.3844) grad_norm 1.6947 (1.2991) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:10:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 69 training takes 0:02:07
[2024-07-29 22:10:56 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 22:10:56 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 22:10:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.453 (0.453) Loss 0.8198 (0.8198) Acc@1 84.082 (84.082) Acc@5 97.070 (97.070) Mem 8975MB
[2024-07-29 22:10:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.094) Loss 1.3145 (0.9646) Acc@1 71.973 (80.775) Acc@5 90.967 (95.774) Mem 8975MB
[2024-07-29 22:10:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.076) Loss 1.3945 (1.1379) Acc@1 70.410 (76.758) Acc@5 90.527 (93.610) Mem 8975MB
[2024-07-29 22:10:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.350 Acc@5 93.592
[2024-07-29 22:10:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 76.3%
[2024-07-29 22:10:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.763 (0.763) Loss 0.5635 (0.5635) Acc@1 86.572 (86.572) Acc@5 97.656 (97.656) Mem 8975MB
[2024-07-29 22:10:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.128) Loss 0.9917 (0.7349) Acc@1 75.439 (82.360) Acc@5 93.262 (96.338) Mem 8975MB
[2024-07-29 22:11:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 1.1689 (0.8960) Acc@1 70.410 (78.432) Acc@5 91.602 (94.413) Mem 8975MB
[2024-07-29 22:11:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.209 Acc@5 94.390
[2024-07-29 22:11:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 78.2%
[2024-07-29 22:11:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 78.21%
[2024-07-29 22:11:00 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 22:11:01 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 22:11:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][0/625] eta 0:06:17 lr 0.001848 wd 0.0500 time 0.6040 (0.6040) data time 0.4131 (0.4131) model time 0.0000 (0.0000) loss 3.9666 (3.9666) grad_norm 0.8431 (0.8431) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:11:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][10/625] eta 0:02:25 lr 0.001848 wd 0.0500 time 0.2054 (0.2371) data time 0.0008 (0.0384) model time 0.0000 (0.0000) loss 3.2973 (3.3812) grad_norm 0.9239 (1.0969) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:11:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][20/625] eta 0:02:12 lr 0.001848 wd 0.0500 time 0.2003 (0.2194) data time 0.0007 (0.0205) model time 0.0000 (0.0000) loss 4.0441 (3.4689) grad_norm 1.8221 (1.2354) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:11:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 22:11:07 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 22:11:07 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 22:13:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 22:13:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 22:13:26 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 22:13:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 22:13:36 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 22:13:36 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 22:13:36 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 22:13:36 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 70)
[2024-07-29 22:13:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 22:13:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][30/625] eta 0:19:42 lr 0.001848 wd 0.0500 time 0.2114 (1.9867) data time 0.0008 (0.1359) model time 0.0000 (0.0000) loss 3.9729 (3.7405) grad_norm 1.2131 (1.4179) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:13:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][40/625] eta 0:07:50 lr 0.001848 wd 0.0500 time 0.2190 (0.8049) data time 0.0010 (0.0460) model time 0.0000 (0.0000) loss 3.6347 (3.6182) grad_norm 1.2288 (1.4133) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:13:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][50/625] eta 0:05:25 lr 0.001848 wd 0.0500 time 0.2065 (0.5665) data time 0.0010 (0.0280) model time 0.0000 (0.0000) loss 3.4018 (3.6047) grad_norm 1.3306 (1.3377) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:13:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][60/625] eta 0:04:22 lr 0.001848 wd 0.0500 time 0.2073 (0.4649) data time 0.0010 (0.0203) model time 0.2063 (0.2099) loss 3.6154 (3.5785) grad_norm 1.2936 (1.3183) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:13:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][70/625] eta 0:03:46 lr 0.001848 wd 0.0500 time 0.2119 (0.4082) data time 0.0010 (0.0160) model time 0.2109 (0.2093) loss 3.4649 (3.5394) grad_norm 1.2745 (1.2941) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:14:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][80/625] eta 0:03:22 lr 0.001847 wd 0.0500 time 0.2319 (0.3721) data time 0.0008 (0.0133) model time 0.2312 (0.2091) loss 2.8269 (3.5360) grad_norm 1.6407 (1.3126) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:14:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][90/625] eta 0:03:06 lr 0.001847 wd 0.0500 time 0.2124 (0.3480) data time 0.0011 (0.0114) model time 0.2113 (0.2103) loss 3.6471 (3.5219) grad_norm 1.3218 (1.3410) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:14:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][100/625] eta 0:02:53 lr 0.001847 wd 0.0500 time 0.1999 (0.3297) data time 0.0011 (0.0100) model time 0.1987 (0.2102) loss 2.8705 (3.4949) grad_norm 0.7340 (1.3155) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:14:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][110/625] eta 0:02:42 lr 0.001847 wd 0.0500 time 0.2035 (0.3160) data time 0.0007 (0.0090) model time 0.2027 (0.2105) loss 3.3029 (3.4668) grad_norm 1.4335 (1.2962) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:14:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][120/625] eta 0:02:33 lr 0.001847 wd 0.0500 time 0.2130 (0.3048) data time 0.0010 (0.0081) model time 0.2120 (0.2103) loss 3.4735 (3.4616) grad_norm 2.2545 (1.3334) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:14:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][130/625] eta 0:02:26 lr 0.001847 wd 0.0500 time 0.2052 (0.2961) data time 0.0010 (0.0075) model time 0.2042 (0.2106) loss 3.1032 (3.4862) grad_norm 1.0248 (1.3399) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:14:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][140/625] eta 0:02:20 lr 0.001847 wd 0.0500 time 0.2045 (0.2888) data time 0.0009 (0.0069) model time 0.2036 (0.2107) loss 2.5212 (3.4677) grad_norm 0.9327 (1.3228) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:14:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][150/625] eta 0:02:14 lr 0.001847 wd 0.0500 time 0.2060 (0.2827) data time 0.0008 (0.0064) model time 0.2052 (0.2108) loss 3.1844 (3.4632) grad_norm 0.9224 (1.3086) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:14:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][160/625] eta 0:02:09 lr 0.001847 wd 0.0500 time 0.2051 (0.2775) data time 0.0009 (0.0060) model time 0.2042 (0.2107) loss 3.2474 (3.4712) grad_norm 1.2743 (1.3336) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:14:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][170/625] eta 0:02:04 lr 0.001847 wd 0.0500 time 0.2067 (0.2729) data time 0.0011 (0.0057) model time 0.2056 (0.2107) loss 3.7807 (3.4666) grad_norm 0.8336 (1.3243) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:14:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][180/625] eta 0:01:59 lr 0.001847 wd 0.0500 time 0.2143 (0.2688) data time 0.0009 (0.0054) model time 0.2134 (0.2106) loss 3.8261 (3.4620) grad_norm 1.0333 (1.3061) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:14:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][190/625] eta 0:01:55 lr 0.001846 wd 0.0500 time 0.2079 (0.2651) data time 0.0010 (0.0051) model time 0.2069 (0.2103) loss 3.7265 (3.4610) grad_norm 1.2418 (1.3032) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:14:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][200/625] eta 0:01:51 lr 0.001846 wd 0.0500 time 0.2218 (0.2620) data time 0.0008 (0.0049) model time 0.2210 (0.2102) loss 3.2817 (3.4497) grad_norm 1.0727 (1.2991) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:14:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][210/625] eta 0:01:47 lr 0.001846 wd 0.0500 time 0.2076 (0.2592) data time 0.0011 (0.0047) model time 0.2065 (0.2101) loss 3.4343 (3.4395) grad_norm 1.1101 (1.2995) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:14:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][220/625] eta 0:01:43 lr 0.001846 wd 0.0500 time 0.2146 (0.2566) data time 0.0011 (0.0045) model time 0.2135 (0.2099) loss 2.4790 (3.4406) grad_norm 1.0485 (1.2924) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:14:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][230/625] eta 0:01:40 lr 0.001846 wd 0.0500 time 0.2075 (0.2543) data time 0.0011 (0.0043) model time 0.2064 (0.2099) loss 3.0747 (3.4352) grad_norm 0.9901 (1.2951) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:14:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][240/625] eta 0:01:37 lr 0.001846 wd 0.0500 time 0.2144 (0.2522) data time 0.0010 (0.0042) model time 0.2134 (0.2098) loss 3.3419 (3.4319) grad_norm 1.4098 (inf) loss_scale 8192.0000 (16269.6930) mem 8977MB
[2024-07-29 22:14:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][250/625] eta 0:01:33 lr 0.001846 wd 0.0500 time 0.2132 (0.2503) data time 0.0007 (0.0040) model time 0.2125 (0.2098) loss 3.3007 (3.4318) grad_norm 0.7997 (inf) loss_scale 8192.0000 (15910.6844) mem 8977MB
[2024-07-29 22:14:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][260/625] eta 0:01:30 lr 0.001846 wd 0.0500 time 0.2065 (0.2487) data time 0.0012 (0.0039) model time 0.2053 (0.2098) loss 3.6288 (3.4204) grad_norm 0.8373 (inf) loss_scale 8192.0000 (15582.2298) mem 8977MB
[2024-07-29 22:14:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][270/625] eta 0:01:27 lr 0.001846 wd 0.0500 time 0.2075 (0.2471) data time 0.0010 (0.0038) model time 0.2065 (0.2098) loss 3.4734 (3.4195) grad_norm 1.5063 (inf) loss_scale 8192.0000 (15280.5878) mem 8977MB
[2024-07-29 22:14:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][280/625] eta 0:01:24 lr 0.001846 wd 0.0500 time 0.2137 (0.2456) data time 0.0008 (0.0037) model time 0.2130 (0.2097) loss 3.0770 (3.4044) grad_norm 1.0031 (inf) loss_scale 8192.0000 (15002.6039) mem 8977MB
[2024-07-29 22:14:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][290/625] eta 0:01:21 lr 0.001846 wd 0.0500 time 0.2094 (0.2443) data time 0.0007 (0.0036) model time 0.2087 (0.2098) loss 2.4044 (3.3959) grad_norm 1.9645 (inf) loss_scale 8192.0000 (14745.6000) mem 8977MB
[2024-07-29 22:14:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][300/625] eta 0:01:19 lr 0.001845 wd 0.0500 time 0.2062 (0.2431) data time 0.0011 (0.0035) model time 0.2051 (0.2097) loss 3.5676 (3.3949) grad_norm 1.1481 (inf) loss_scale 8192.0000 (14507.2873) mem 8977MB
[2024-07-29 22:14:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][310/625] eta 0:01:16 lr 0.001845 wd 0.0500 time 0.2113 (0.2421) data time 0.0011 (0.0034) model time 0.2102 (0.2099) loss 3.6212 (3.3949) grad_norm 1.9375 (inf) loss_scale 8192.0000 (14285.6982) mem 8977MB
[2024-07-29 22:14:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][320/625] eta 0:01:13 lr 0.001845 wd 0.0500 time 0.2059 (0.2410) data time 0.0011 (0.0033) model time 0.2049 (0.2098) loss 3.6484 (3.3897) grad_norm 0.8237 (inf) loss_scale 8192.0000 (14079.1322) mem 8977MB
[2024-07-29 22:14:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][330/625] eta 0:01:10 lr 0.001845 wd 0.0500 time 0.2071 (0.2400) data time 0.0010 (0.0032) model time 0.2061 (0.2098) loss 2.9309 (3.3809) grad_norm 1.0826 (inf) loss_scale 8192.0000 (13886.1115) mem 8977MB
[2024-07-29 22:14:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][340/625] eta 0:01:08 lr 0.001845 wd 0.0500 time 0.2088 (0.2390) data time 0.0010 (0.0032) model time 0.2078 (0.2097) loss 3.8019 (3.3826) grad_norm 1.5365 (inf) loss_scale 8192.0000 (13705.3460) mem 8977MB
[2024-07-29 22:14:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][350/625] eta 0:01:05 lr 0.001845 wd 0.0500 time 0.2036 (0.2380) data time 0.0009 (0.0031) model time 0.2027 (0.2096) loss 3.9793 (3.3950) grad_norm 1.2504 (inf) loss_scale 8192.0000 (13535.7046) mem 8977MB
[2024-07-29 22:15:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][360/625] eta 0:01:02 lr 0.001845 wd 0.0500 time 0.2170 (0.2374) data time 0.0007 (0.0031) model time 0.2163 (0.2098) loss 4.0345 (3.3889) grad_norm 0.8736 (inf) loss_scale 8192.0000 (13376.1910) mem 8977MB
[2024-07-29 22:15:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][370/625] eta 0:01:00 lr 0.001845 wd 0.0500 time 0.2025 (0.2366) data time 0.0009 (0.0030) model time 0.2017 (0.2098) loss 2.4880 (3.3896) grad_norm 1.5215 (inf) loss_scale 8192.0000 (13225.9246) mem 8977MB
[2024-07-29 22:15:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][380/625] eta 0:00:57 lr 0.001845 wd 0.0500 time 0.2030 (0.2359) data time 0.0012 (0.0029) model time 0.2018 (0.2098) loss 3.2123 (3.3932) grad_norm 0.7586 (inf) loss_scale 8192.0000 (13084.1239) mem 8977MB
[2024-07-29 22:15:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][390/625] eta 0:00:55 lr 0.001845 wd 0.0500 time 0.2108 (0.2353) data time 0.0010 (0.0029) model time 0.2097 (0.2099) loss 3.2556 (3.3876) grad_norm 1.0533 (inf) loss_scale 8192.0000 (12950.0932) mem 8977MB
[2024-07-29 22:15:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][400/625] eta 0:00:52 lr 0.001844 wd 0.0500 time 0.2189 (0.2346) data time 0.0008 (0.0028) model time 0.2181 (0.2099) loss 3.0870 (3.3898) grad_norm 1.0082 (inf) loss_scale 8192.0000 (12823.2107) mem 8977MB
[2024-07-29 22:15:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][410/625] eta 0:00:50 lr 0.001844 wd 0.0500 time 0.2139 (0.2346) data time 0.0008 (0.0028) model time 0.2131 (0.2105) loss 2.4322 (3.3829) grad_norm 1.0497 (inf) loss_scale 8192.0000 (12702.9195) mem 8977MB
[2024-07-29 22:15:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][420/625] eta 0:00:48 lr 0.001844 wd 0.0500 time 0.2054 (0.2346) data time 0.0011 (0.0028) model time 0.2043 (0.2111) loss 3.7202 (3.3869) grad_norm 1.7749 (inf) loss_scale 8192.0000 (12588.7190) mem 8977MB
[2024-07-29 22:15:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][430/625] eta 0:00:45 lr 0.001844 wd 0.0500 time 0.2095 (0.2341) data time 0.0009 (0.0028) model time 0.2086 (0.2111) loss 3.3668 (3.3930) grad_norm 0.8735 (inf) loss_scale 8192.0000 (12480.1580) mem 8977MB
[2024-07-29 22:15:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][440/625] eta 0:00:43 lr 0.001844 wd 0.0500 time 0.2138 (0.2335) data time 0.0011 (0.0027) model time 0.2127 (0.2111) loss 3.7544 (3.3969) grad_norm 1.0453 (inf) loss_scale 8192.0000 (12376.8289) mem 8977MB
[2024-07-29 22:15:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][450/625] eta 0:00:40 lr 0.001844 wd 0.0500 time 0.2138 (0.2337) data time 0.0008 (0.0027) model time 0.2130 (0.2118) loss 3.7860 (3.3956) grad_norm 1.0866 (inf) loss_scale 8192.0000 (12278.3624) mem 8977MB
[2024-07-29 22:15:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][460/625] eta 0:00:38 lr 0.001844 wd 0.0500 time 0.2106 (0.2331) data time 0.0008 (0.0026) model time 0.2098 (0.2117) loss 3.6270 (3.4033) grad_norm 1.8645 (inf) loss_scale 8192.0000 (12184.4230) mem 8977MB
[2024-07-29 22:15:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][470/625] eta 0:00:36 lr 0.001844 wd 0.0500 time 0.2096 (0.2326) data time 0.0009 (0.0026) model time 0.2086 (0.2117) loss 3.3936 (3.4058) grad_norm 0.8892 (inf) loss_scale 8192.0000 (12094.7056) mem 8977MB
[2024-07-29 22:15:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][480/625] eta 0:00:33 lr 0.001844 wd 0.0500 time 0.2073 (0.2321) data time 0.0007 (0.0026) model time 0.2066 (0.2116) loss 3.4729 (3.4027) grad_norm 1.8853 (inf) loss_scale 8192.0000 (12008.9319) mem 8977MB
[2024-07-29 22:15:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][490/625] eta 0:00:31 lr 0.001844 wd 0.0500 time 0.2195 (0.2316) data time 0.0008 (0.0025) model time 0.2187 (0.2115) loss 3.0840 (3.3988) grad_norm 1.3953 (inf) loss_scale 8192.0000 (11926.8473) mem 8977MB
[2024-07-29 22:15:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][500/625] eta 0:00:28 lr 0.001843 wd 0.0500 time 0.2231 (0.2312) data time 0.0008 (0.0025) model time 0.2223 (0.2115) loss 3.1361 (3.3911) grad_norm 0.8850 (inf) loss_scale 8192.0000 (11848.2189) mem 8977MB
[2024-07-29 22:15:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][510/625] eta 0:00:26 lr 0.001843 wd 0.0500 time 0.2152 (0.2308) data time 0.0010 (0.0025) model time 0.2142 (0.2115) loss 4.0579 (3.3924) grad_norm 1.3028 (inf) loss_scale 8192.0000 (11772.8330) mem 8977MB
[2024-07-29 22:15:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][520/625] eta 0:00:24 lr 0.001843 wd 0.0500 time 0.2227 (0.2304) data time 0.0007 (0.0024) model time 0.2220 (0.2115) loss 3.5619 (3.3920) grad_norm 1.4932 (inf) loss_scale 8192.0000 (11700.4929) mem 8977MB
[2024-07-29 22:15:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][530/625] eta 0:00:21 lr 0.001843 wd 0.0500 time 0.2073 (0.2300) data time 0.0011 (0.0024) model time 0.2063 (0.2114) loss 3.6121 (3.3918) grad_norm 1.2400 (inf) loss_scale 8192.0000 (11631.0178) mem 8977MB
[2024-07-29 22:15:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][540/625] eta 0:00:19 lr 0.001843 wd 0.0500 time 0.2171 (0.2296) data time 0.0010 (0.0024) model time 0.2161 (0.2114) loss 3.4414 (3.4007) grad_norm 1.4356 (inf) loss_scale 8192.0000 (11564.2408) mem 8977MB
[2024-07-29 22:15:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][550/625] eta 0:00:17 lr 0.001843 wd 0.0500 time 0.2142 (0.2293) data time 0.0012 (0.0024) model time 0.2130 (0.2114) loss 4.0632 (3.3950) grad_norm 1.0363 (inf) loss_scale 8192.0000 (11500.0076) mem 8977MB
[2024-07-29 22:15:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][560/625] eta 0:00:14 lr 0.001843 wd 0.0500 time 0.2175 (0.2290) data time 0.0007 (0.0023) model time 0.2168 (0.2113) loss 2.9402 (3.3924) grad_norm 1.3024 (inf) loss_scale 8192.0000 (11438.1757) mem 8977MB
[2024-07-29 22:15:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 22:15:45 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 22:15:47 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 22:18:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 22:18:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 22:18:35 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 22:18:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 22:18:45 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 22:18:45 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 22:18:45 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 22:18:45 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 70)
[2024-07-29 22:18:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 22:18:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][570/625] eta 0:02:46 lr 0.001843 wd 0.0500 time 0.2117 (3.0290) data time 0.0007 (0.2501) model time 0.2109 (2.7789) loss 2.8128 (3.5498) grad_norm 0.7531 (0.9977) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:19:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][580/625] eta 0:00:38 lr 0.001843 wd 0.0500 time 0.2133 (0.8625) data time 0.0010 (0.0586) model time 0.2123 (0.8039) loss 3.5187 (3.6183) grad_norm 1.7208 (1.2472) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:19:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][590/625] eta 0:00:20 lr 0.001843 wd 0.0500 time 0.2015 (0.5796) data time 0.0009 (0.0336) model time 0.2006 (0.5460) loss 4.2388 (3.6620) grad_norm 0.9668 (1.2198) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:19:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][600/625] eta 0:00:11 lr 0.001843 wd 0.0500 time 0.2107 (0.4686) data time 0.0009 (0.0237) model time 0.2098 (0.4449) loss 4.2914 (3.6470) grad_norm 1.1174 (1.2519) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:19:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][610/625] eta 0:00:06 lr 0.001842 wd 0.0500 time 0.2096 (0.4089) data time 0.0007 (0.0185) model time 0.2089 (0.3904) loss 3.5616 (3.5988) grad_norm 1.1776 (1.2085) loss_scale 8192.0000 (8192.0000) mem 8975MB
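Each numeric field in the Train records is printed as `current (running mean)`. The pattern around the suspend and resume is consistent with meters that are recreated for every epoch and every new process (an assumption about `main_hfai_mnodes.py`, based on a Swin-Transformer-style loop). Under that reading, one non-finite gradient norm earlier in epoch 70 (plausibly an AMP overflow step, which would also explain the `loss_scale` mean sitting above the current 8192) keeps the `grad_norm` mean at `inf` until the restart. The `eta` field then equals the mean seconds per iteration times the iterations left in the epoch, so the slow first step after a resume inflates it briefly (`eta 0:02:46` at iteration 570). The helpers `AverageMeter` and `eta_string` below are hypothetical names in a minimal sketch that reproduces the logged ETAs:

```python
import datetime
import math


class AverageMeter:
    """Keeps the latest value and the running mean since the meter was created."""

    def __init__(self):
        self.val = 0.0
        self.sum = 0.0
        self.count = 0
        self.avg = 0.0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


def eta_string(avg_batch_time, num_steps, idx):
    """ETA = mean seconds per iteration x iterations left, truncated to whole seconds."""
    seconds = avg_batch_time * (num_steps - idx)
    return str(datetime.timedelta(seconds=int(seconds)))


# Illustrative values (the actual non-finite step is not in this excerpt):
# a single inf makes the running mean inf until the meter is recreated.
grad_norm = AverageMeter()
for g in [1.3028, math.inf, 1.4932, 1.2400]:
    grad_norm.update(g)
print(grad_norm.val, grad_norm.avg)  # 1.24 inf

# A fresh meter (new epoch or resumed process) starts finite again.
grad_norm = AverageMeter()
grad_norm.update(0.7531)
print(grad_norm.avg)  # 0.7531

# ETAs from the resumed records, with 625 iterations per epoch.
print(eta_string(3.0290, 625, 570))  # 0:02:46
print(eta_string(2.1435, 625, 190))  # 0:15:32
```

The same arithmetic explains the later resume at `[72/300][190/625]`: 2.1435 s x 435 remaining iterations is about 932 s, logged as `eta 0:15:32`, and the estimate falls back toward two minutes within a few print intervals.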
[2024-07-29 22:19:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][620/625] eta 0:00:01 lr 0.001842 wd 0.0500 time 0.2201 (0.3712) data time 0.0007 (0.0152) model time 0.2194 (0.3560) loss 3.4553 (3.5749) grad_norm 1.1678 (1.2175) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:19:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 70 training takes 0:00:20
[2024-07-29 22:19:10 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 22:19:11 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 22:19:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.448 (0.448) Loss 0.7812 (0.7812) Acc@1 84.814 (84.814) Acc@5 97.070 (97.070) Mem 8975MB
[2024-07-29 22:19:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.095) Loss 1.2715 (0.9496) Acc@1 71.240 (80.589) Acc@5 92.676 (95.858) Mem 8975MB
[2024-07-29 22:19:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.076) Loss 1.3799 (1.1207) Acc@1 68.652 (76.393) Acc@5 90.381 (93.466) Mem 8975MB
[2024-07-29 22:19:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.212 Acc@5 93.460
[2024-07-29 22:19:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 76.2%
[2024-07-29 22:19:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.911 (0.911) Loss 0.5620 (0.5620) Acc@1 86.670 (86.670) Acc@5 97.754 (97.754) Mem 8975MB
[2024-07-29 22:19:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.136) Loss 0.9873 (0.7326) Acc@1 75.391 (82.453) Acc@5 93.408 (96.453) Mem 8975MB
[2024-07-29 22:19:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.097) Loss 1.1621 (0.8923) Acc@1 70.752 (78.546) Acc@5 91.846 (94.508) Mem 8975MB
[2024-07-29 22:19:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.315 Acc@5 94.484
[2024-07-29 22:19:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 78.3%
[2024-07-29 22:19:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 78.31%
[2024-07-29 22:19:17 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 22:19:19 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 22:19:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][0/625] eta 0:07:16 lr 0.001842 wd 0.0500 time 0.6978 (0.6978) data time 0.3989 (0.3989) model time 0.0000 (0.0000) loss 2.7240 (2.7240) grad_norm 0.9629 (0.9629) loss_scale 8192.0000 (8192.0000) mem 8971MB
[2024-07-29 22:19:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][10/625] eta 0:02:38 lr 0.001842 wd 0.0500 time 0.2097 (0.2578) data time 0.0008 (0.0372) model time 0.0000 (0.0000) loss 2.4624 (3.2130) grad_norm 0.9676 (1.2279) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:19:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][20/625] eta 0:02:23 lr 0.001842 wd 0.0500 time 0.2148 (0.2378) data time 0.0007 (0.0200) model time 0.0000 (0.0000) loss 2.5456 (3.2136) grad_norm 0.8694 (1.1631) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:19:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][30/625] eta 0:02:16 lr 0.001842 wd 0.0500 time 0.2069 (0.2288) data time 0.0010 (0.0139) model time 0.0000 (0.0000) loss 3.6065 (3.2318) grad_norm 0.8276 (1.1264) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:19:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][40/625] eta 0:02:11 lr 0.001842 wd 0.0500 time 0.2072 (0.2246) data time 0.0009 (0.0107) model time 0.0000 (0.0000) loss 3.8985 (3.3196) grad_norm 1.8988 (1.1804) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:19:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][50/625] eta 0:02:07 lr 0.001842 wd 0.0500 time 0.2048 (0.2218) data time 0.0008 (0.0088) model time 0.0000 (0.0000) loss 2.6380 (3.3717) grad_norm 1.8242 (1.1916) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:19:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][60/625] eta 0:02:04 lr 0.001842 wd 0.0500 time 0.2122 (0.2199) data time 0.0009 (0.0075) model time 0.2114 (0.2095) loss 3.2630 (3.4028) grad_norm 1.2005 (1.1894) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:19:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][70/625] eta 0:02:01 lr 0.001842 wd 0.0500 time 0.2080 (0.2186) data time 0.0007 (0.0066) model time 0.2073 (0.2095) loss 3.3181 (3.3806) grad_norm 0.8772 (1.2239) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:19:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][80/625] eta 0:01:58 lr 0.001842 wd 0.0500 time 0.2083 (0.2179) data time 0.0009 (0.0060) model time 0.2074 (0.2102) loss 3.3042 (3.3752) grad_norm 1.6515 (1.2095) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:19:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][90/625] eta 0:01:56 lr 0.001841 wd 0.0500 time 0.2159 (0.2174) data time 0.0008 (0.0054) model time 0.2151 (0.2107) loss 3.7882 (3.3729) grad_norm 1.0174 (1.2150) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:19:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][100/625] eta 0:01:53 lr 0.001841 wd 0.0500 time 0.2138 (0.2170) data time 0.0008 (0.0050) model time 0.2129 (0.2110) loss 2.6450 (3.3645) grad_norm 3.3763 (1.2873) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:19:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][110/625] eta 0:01:51 lr 0.001841 wd 0.0500 time 0.2214 (0.2168) data time 0.0008 (0.0046) model time 0.2205 (0.2114) loss 3.3480 (3.3863) grad_norm 1.2969 (1.2760) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:19:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][120/625] eta 0:01:49 lr 0.001841 wd 0.0500 time 0.2047 (0.2165) data time 0.0009 (0.0043) model time 0.2038 (0.2116) loss 3.0195 (3.3734) grad_norm 1.4365 (1.2752) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:19:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][130/625] eta 0:01:47 lr 0.001841 wd 0.0500 time 0.2182 (0.2162) data time 0.0008 (0.0041) model time 0.2174 (0.2116) loss 3.5028 (3.3685) grad_norm 1.1054 (1.2678) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:19:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][140/625] eta 0:01:44 lr 0.001841 wd 0.0500 time 0.2065 (0.2160) data time 0.0011 (0.0039) model time 0.2054 (0.2116) loss 2.9706 (3.3610) grad_norm 1.6557 (1.2572) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:19:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][150/625] eta 0:01:42 lr 0.001841 wd 0.0500 time 0.2140 (0.2158) data time 0.0010 (0.0037) model time 0.2130 (0.2117) loss 3.5768 (3.3457) grad_norm 1.1319 (1.2448) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:19:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][160/625] eta 0:01:40 lr 0.001841 wd 0.0500 time 0.2148 (0.2158) data time 0.0010 (0.0035) model time 0.2138 (0.2120) loss 2.7240 (3.3392) grad_norm 1.0521 (1.2345) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:19:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][170/625] eta 0:01:38 lr 0.001841 wd 0.0500 time 0.2087 (0.2156) data time 0.0011 (0.0034) model time 0.2076 (0.2120) loss 4.1224 (3.3524) grad_norm 1.2731 (1.2357) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:19:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][180/625] eta 0:01:35 lr 0.001841 wd 0.0500 time 0.2056 (0.2156) data time 0.0009 (0.0032) model time 0.2047 (0.2121) loss 3.8779 (3.3509) grad_norm 1.0093 (1.2278) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][190/625] eta 0:01:33 lr 0.001840 wd 0.0500 time 0.2107 (0.2156) data time 0.0011 (0.0031) model time 0.2096 (0.2123) loss 2.7497 (3.3460) grad_norm 1.0016 (1.2249) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][200/625] eta 0:01:31 lr 0.001840 wd 0.0500 time 0.2232 (0.2158) data time 0.0011 (0.0030) model time 0.2221 (0.2127) loss 3.4968 (3.3350) grad_norm 1.2598 (1.2358) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][210/625] eta 0:01:29 lr 0.001840 wd 0.0500 time 0.2188 (0.2158) data time 0.0008 (0.0029) model time 0.2180 (0.2128) loss 3.8715 (3.3283) grad_norm 1.9061 (1.2408) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][220/625] eta 0:01:27 lr 0.001840 wd 0.0500 time 0.2183 (0.2159) data time 0.0010 (0.0028) model time 0.2173 (0.2131) loss 2.3304 (3.3306) grad_norm 1.7208 (1.2553) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][230/625] eta 0:01:25 lr 0.001840 wd 0.0500 time 0.2112 (0.2159) data time 0.0008 (0.0028) model time 0.2104 (0.2131) loss 3.7057 (3.3337) grad_norm 0.9844 (1.2544) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][240/625] eta 0:01:23 lr 0.001840 wd 0.0500 time 0.2073 (0.2159) data time 0.0010 (0.0027) model time 0.2063 (0.2132) loss 2.9180 (3.3242) grad_norm 1.6826 (1.2456) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][250/625] eta 0:01:20 lr 0.001840 wd 0.0500 time 0.2050 (0.2157) data time 0.0008 (0.0026) model time 0.2042 (0.2132) loss 2.9342 (3.3215) grad_norm 1.4978 (1.2548) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][260/625] eta 0:01:18 lr 0.001840 wd 0.0500 time 0.2166 (0.2156) data time 0.0010 (0.0026) model time 0.2156 (0.2131) loss 2.6860 (3.3313) grad_norm 1.0892 (1.2524) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][270/625] eta 0:01:16 lr 0.001840 wd 0.0500 time 0.2194 (0.2156) data time 0.0007 (0.0025) model time 0.2187 (0.2131) loss 3.5858 (3.3445) grad_norm 1.2302 (1.2425) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][280/625] eta 0:01:14 lr 0.001840 wd 0.0500 time 0.2200 (0.2155) data time 0.0009 (0.0025) model time 0.2191 (0.2131) loss 3.7292 (3.3416) grad_norm 1.3212 (1.2518) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][290/625] eta 0:01:12 lr 0.001839 wd 0.0500 time 0.2115 (0.2155) data time 0.0008 (0.0024) model time 0.2107 (0.2131) loss 3.2737 (3.3437) grad_norm 1.2411 (1.2503) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][300/625] eta 0:01:09 lr 0.001839 wd 0.0500 time 0.2133 (0.2154) data time 0.0009 (0.0024) model time 0.2124 (0.2130) loss 3.1982 (3.3434) grad_norm 1.0216 (1.2442) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][310/625] eta 0:01:07 lr 0.001839 wd 0.0500 time 0.2063 (0.2152) data time 0.0007 (0.0023) model time 0.2056 (0.2129) loss 2.9244 (3.3411) grad_norm 1.2262 (1.2394) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][320/625] eta 0:01:05 lr 0.001839 wd 0.0500 time 0.2232 (0.2152) data time 0.0009 (0.0023) model time 0.2223 (0.2130) loss 3.4887 (3.3423) grad_norm 0.9786 (1.2377) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][330/625] eta 0:01:03 lr 0.001839 wd 0.0500 time 0.2122 (0.2160) data time 0.0007 (0.0022) model time 0.2115 (0.2139) loss 2.8988 (3.3364) grad_norm 0.9005 (1.2339) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][340/625] eta 0:01:01 lr 0.001839 wd 0.0500 time 0.2222 (0.2160) data time 0.0008 (0.0022) model time 0.2215 (0.2139) loss 3.3522 (3.3406) grad_norm 1.0440 (1.2319) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][350/625] eta 0:00:59 lr 0.001839 wd 0.0500 time 0.2223 (0.2161) data time 0.0011 (0.0022) model time 0.2213 (0.2140) loss 3.5481 (3.3490) grad_norm 1.6799 (1.2441) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][360/625] eta 0:00:57 lr 0.001839 wd 0.0500 time 0.2110 (0.2161) data time 0.0008 (0.0021) model time 0.2102 (0.2141) loss 4.0002 (3.3514) grad_norm 2.0562 (1.2450) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][370/625] eta 0:00:55 lr 0.001839 wd 0.0500 time 0.2207 (0.2160) data time 0.0008 (0.0021) model time 0.2199 (0.2141) loss 4.6537 (3.3552) grad_norm 1.4369 (1.2438) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][380/625] eta 0:00:53 lr 0.001839 wd 0.0500 time 0.2075 (0.2167) data time 0.0011 (0.0021) model time 0.2063 (0.2149) loss 3.4650 (3.3620) grad_norm 0.9251 (1.2428) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][390/625] eta 0:00:50 lr 0.001839 wd 0.0500 time 0.2117 (0.2166) data time 0.0010 (0.0021) model time 0.2107 (0.2148) loss 2.6988 (3.3603) grad_norm 1.0108 (1.2436) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][400/625] eta 0:00:48 lr 0.001838 wd 0.0500 time 0.2160 (0.2167) data time 0.0009 (0.0021) model time 0.2151 (0.2149) loss 3.2559 (3.3553) grad_norm 0.9136 (1.2405) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][410/625] eta 0:00:46 lr 0.001838 wd 0.0500 time 0.2080 (0.2166) data time 0.0010 (0.0020) model time 0.2070 (0.2148) loss 3.2230 (3.3525) grad_norm 0.9140 (1.2385) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][420/625] eta 0:00:44 lr 0.001838 wd 0.0500 time 0.2120 (0.2166) data time 0.0010 (0.0020) model time 0.2110 (0.2148) loss 2.4849 (3.3463) grad_norm 1.0728 (1.2331) loss_scale 8192.0000 (8192.0000) mem 8978MB
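Each epoch above is evaluated twice. The first pass (76.2% top-1 after epoch 70) presumably reports the raw weights; the second, which triggers `New max accuracy ema` and is saved as `best_ckpt_ema.pth`, reports an exponential moving average (EMA) of the weights (78.3%). The run arguments in the log header set `model_ema_decay` to 0.9999. The sketch below shows the standard EMA recurrence in pure Python; `ema_update` is a hypothetical name and this is not the project's own code, although timm's `ModelEma` applies the same rule to every entry of a model's state dict:

```python
def ema_update(ema, params, decay=0.9999):
    """One EMA step per entry: ema <- decay * ema + (1 - decay) * param."""
    for name, value in params.items():
        ema[name] = decay * ema[name] + (1.0 - decay) * value
    return ema


# Toy example: a weight jumps from 0.0 to 1.0 and stays there.
ema = {"w": 0.0}
for _ in range(625):  # about one epoch of optimizer steps in this log
    ema_update(ema, {"w": 1.0})
print(round(ema["w"], 4))  # 0.0606, i.e. 1 - 0.9999**625

# Effective averaging horizon: roughly 1 / (1 - decay) optimizer steps.
horizon_steps = 1.0 / (1.0 - 0.9999)
print(round(horizon_steps / 625))  # 16 epochs at 625 iterations per epoch
```

With this decay the EMA weights average over roughly the last 16 epochs, which smooths out the step-to-step noise of a learning rate that is still near its peak; that is one plausible reason the EMA evaluation runs about two points ahead of the raw weights at this stage of training.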
[2024-07-29 22:20:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][430/625] eta 0:00:42 lr 0.001838 wd 0.0500 time 0.2084 (0.2171) data time 0.0009 (0.0020) model time 0.2075 (0.2154) loss 3.8036 (3.3547) grad_norm 2.3680 (1.2409) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][440/625] eta 0:00:40 lr 0.001838 wd 0.0500 time 0.2138 (0.2171) data time 0.0007 (0.0020) model time 0.2131 (0.2154) loss 3.6638 (3.3556) grad_norm 1.5901 (1.2446) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][450/625] eta 0:00:37 lr 0.001838 wd 0.0500 time 0.2105 (0.2169) data time 0.0007 (0.0020) model time 0.2098 (0.2153) loss 4.0304 (3.3572) grad_norm 1.7946 (1.2465) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:20:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][460/625] eta 0:00:35 lr 0.001838 wd 0.0500 time 0.2182 (0.2169) data time 0.0010 (0.0019) model time 0.2172 (0.2152) loss 3.8505 (3.3647) grad_norm 1.1787 (1.2442) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][470/625] eta 0:00:33 lr 0.001838 wd 0.0500 time 0.2183 (0.2169) data time 0.0008 (0.0019) model time 0.2176 (0.2152) loss 2.1764 (3.3560) grad_norm 0.8641 (1.2433) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][480/625] eta 0:00:31 lr 0.001838 wd 0.0500 time 0.2169 (0.2169) data time 0.0007 (0.0019) model time 0.2162 (0.2152) loss 4.0133 (3.3570) grad_norm 1.2812 (1.2416) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][490/625] eta 0:00:29 lr 0.001838 wd 0.0500 time 0.2086 (0.2169) data time 0.0009 (0.0019) model time 0.2076 (0.2152) loss 3.7875 (3.3594) grad_norm 1.7729 (1.2466) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][500/625] eta 0:00:27 lr 0.001837 wd 0.0500 time 0.2104 (0.2169) data time 0.0010 (0.0019) model time 0.2094 (0.2152) loss 3.7293 (3.3635) grad_norm 1.6450 (1.2455) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][510/625] eta 0:00:24 lr 0.001837 wd 0.0500 time 0.2060 (0.2168) data time 0.0009 (0.0019) model time 0.2051 (0.2151) loss 4.0728 (3.3680) grad_norm 1.1918 (1.2447) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][520/625] eta 0:00:22 lr 0.001837 wd 0.0500 time 0.2138 (0.2168) data time 0.0009 (0.0019) model time 0.2129 (0.2151) loss 3.6659 (3.3679) grad_norm 1.0766 (1.2409) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][530/625] eta 0:00:20 lr 0.001837 wd 0.0500 time 0.2160 (0.2168) data time 0.0010 (0.0019) model time 0.2150 (0.2151) loss 3.3476 (3.3698) grad_norm 1.4702 (1.2417) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][540/625] eta 0:00:18 lr 0.001837 wd 0.0500 time 0.2081 (0.2167) data time 0.0010 (0.0018) model time 0.2071 (0.2151) loss 2.8469 (3.3697) grad_norm 1.2538 (1.2455) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][550/625] eta 0:00:16 lr 0.001837 wd 0.0500 time 0.2129 (0.2167) data time 0.0009 (0.0018) model time 0.2120 (0.2150) loss 3.0619 (3.3671) grad_norm 0.9984 (1.2432) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][560/625] eta 0:00:14 lr 0.001837 wd 0.0500 time 0.2112 (0.2167) data time 0.0010 (0.0018) model time 0.2103 (0.2151) loss 3.4955 (3.3706) grad_norm 1.2744 (1.2398) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][570/625] eta 0:00:11 lr 0.001837 wd 0.0500 time 0.2082 (0.2167) data time 0.0008 (0.0018) model time 0.2073 (0.2151) loss 2.8628 (3.3732) grad_norm 1.1070 (1.2387) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][580/625] eta 0:00:09 lr 0.001837 wd 0.0500 time 0.2139 (0.2167) data time 0.0009 (0.0018) model time 0.2130 (0.2151) loss 3.6937 (3.3767) grad_norm 1.6300 (1.2409) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][590/625] eta 0:00:07 lr 0.001837 wd 0.0500 time 0.2155 (0.2167) data time 0.0011 (0.0018) model time 0.2144 (0.2151) loss 3.0814 (3.3730) grad_norm 1.5416 (1.2411) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][600/625] eta 0:00:05 lr 0.001836 wd 0.0500 time 0.2148 (0.2167) data time 0.0007 (0.0018) model time 0.2141 (0.2151) loss 2.5386 (3.3701) grad_norm 2.0259 (1.2420) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][610/625] eta 0:00:03 lr 0.001836 wd 0.0500 time 0.2127 (0.2167) data time 0.0005 (0.0018) model time 0.2121 (0.2151) loss 3.7268 (3.3711) grad_norm 0.9448 (1.2402) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][620/625] eta 0:00:01 lr 0.001836 wd 0.0500 time 0.2140 (0.2166) data time 0.0007 (0.0017) model time 0.2133 (0.2151) loss 3.6842 (3.3747) grad_norm 1.4243 (1.2417) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 71 training takes 0:02:15
[2024-07-29 22:21:34 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 22:21:35 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 22:21:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.531 (0.531) Loss 0.7788 (0.7788) Acc@1 85.645 (85.645) Acc@5 97.949 (97.949) Mem 8978MB
[2024-07-29 22:21:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.103) Loss 1.2637 (0.9670) Acc@1 73.682 (80.708) Acc@5 92.432 (95.827) Mem 8978MB
[2024-07-29 22:21:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.056 (0.081) Loss 1.4658 (1.1383) Acc@1 68.164 (76.753) Acc@5 89.502 (93.631) Mem 8978MB
[2024-07-29 22:21:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.460 Acc@5 93.594
[2024-07-29 22:21:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 76.5%
[2024-07-29 22:21:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.741 (0.741) Loss 0.5620 (0.5620) Acc@1 86.719 (86.719) Acc@5 97.656 (97.656) Mem 8978MB
[2024-07-29 22:21:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.053 (0.140) Loss 0.9824 (0.7305) Acc@1 75.391 (82.511) Acc@5 93.359 (96.467) Mem 8978MB
[2024-07-29 22:21:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.100) Loss 1.1572 (0.8888) Acc@1 70.801 (78.702) Acc@5 91.846 (94.543) Mem 8978MB
[2024-07-29 22:21:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.475 Acc@5 94.520
[2024-07-29 22:21:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 78.5%
[2024-07-29 22:21:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 78.47%
[2024-07-29 22:21:39 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 22:21:40 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 22:21:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][0/625] eta 0:09:55 lr 0.001836 wd 0.0500 time 0.9523 (0.9523) data time 0.7568 (0.7568) model time 0.0000 (0.0000) loss 3.2201 (3.2201) grad_norm 1.1050 (1.1050) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][10/625] eta 0:02:53 lr 0.001836 wd 0.0500 time 0.2349 (0.2816) data time 0.0009 (0.0697) model time 0.0000 (0.0000) loss 4.1084 (3.2760) grad_norm 1.3238 (1.4569) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][20/625] eta 0:02:30 lr 0.001836 wd 0.0500 time 0.2092 (0.2494) data time 0.0007 (0.0369) model time 0.0000 (0.0000) loss 3.7764 (3.2314) grad_norm 2.9241 (1.5489) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][30/625] eta 0:02:21 lr 0.001836 wd 0.0500 time 0.2156 (0.2383) data time 0.0007 (0.0253) model time 0.0000 (0.0000) loss 3.1982 (3.2218) grad_norm 0.8908 (1.4810) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][40/625] eta 0:02:15 lr 0.001836 wd 0.0500 time 0.2147 (0.2320) data time 0.0007 (0.0194) model time 0.0000 (0.0000) loss 3.8459 (3.2599) grad_norm 1.5002 (1.4326) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][50/625] eta 0:02:11 lr 0.001836 wd 0.0500 time 0.2265 (0.2287) data time 0.0010 (0.0158) model time 0.0000 (0.0000) loss 3.5115 (3.3000) grad_norm 1.1338 (1.4040) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][60/625] eta 0:02:08 lr 0.001836 wd 0.0500 time 0.2821 (0.2273) data time 0.0010 (0.0134) model time 0.2811 (0.2187) loss 3.2604 (3.3543) grad_norm 0.9295 (1.3785) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][70/625] eta 0:02:05 lr 0.001836 wd 0.0500 time 0.2165 (0.2252) data time 0.0009 (0.0117) model time 0.2156 (0.2152) loss 2.1409 (3.3626) grad_norm 1.6381 (1.3731) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:21:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][80/625] eta 0:02:01 lr 0.001835 wd 0.0500 time 0.2143 (0.2238) data time 0.0007 (0.0104) model time 0.2136 (0.2143) loss 3.2521 (3.3794) grad_norm 0.9490 (1.3563) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:22:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][90/625] eta 0:01:59 lr 0.001835 wd 0.0500 time 0.2128 (0.2226) data time 0.0009 (0.0093) model time 0.2119 (0.2136) loss 4.2932 (3.3995) grad_norm 1.0149 (1.3163) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:22:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][100/625] eta 0:01:56 lr 0.001835 wd 0.0500 time 0.2120 (0.2220) data time 0.0009 (0.0085) model time 0.2111 (0.2141) loss 3.8015 (3.4040) grad_norm 1.2805 (1.3214) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:22:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][110/625] eta 0:01:53 lr 0.001835 wd 0.0500 time 0.2207 (0.2213) data time 0.0009 (0.0079) model time 0.2198 (0.2138) loss 3.5374 (3.3873) grad_norm 0.9322 (1.3595) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:22:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][120/625] eta 0:01:51 lr 0.001835 wd 0.0500 time 0.2428 (0.2208) data time 0.0011 (0.0073) model time 0.2417 (0.2139) loss 3.0324 (3.3764) grad_norm 1.5959 (1.3684) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:22:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][130/625] eta 0:01:49 lr 0.001835 wd 0.0500 time 0.2934 (0.2215) data time 0.0008 (0.0069) model time 0.2925 (0.2156) loss 2.3904 (3.3475) grad_norm 1.0875 (1.3552) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:22:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][140/625] eta 0:01:47 lr 0.001835 wd 0.0500 time 0.2100 (0.2212) data time 0.0012 (0.0066) model time 0.2089 (0.2155) loss 2.9179 (3.3506) grad_norm 0.9374 (1.3396) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:22:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][150/625] eta 0:01:44 lr 0.001835 wd 0.0500 time 0.2134 (0.2208) data time 0.0008 (0.0062) model time 0.2126 (0.2154) loss 3.5718 (3.3356) grad_norm 0.9355 (1.3316) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:22:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][160/625] eta 0:01:42 lr 0.001835 wd 0.0500 time 0.2234 (0.2204) data time 0.0008 (0.0060) model time 0.2226 (0.2152) loss 3.8641 (3.3298) grad_norm 1.5977 (1.3327) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:22:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][170/625] eta 0:01:40 lr 0.001835 wd 0.0500 time 0.2204 (0.2200) data time 0.0008 (0.0057) model time 0.2196 (0.2149) loss 3.1801 (3.3247) grad_norm 1.3889 (1.3176) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:22:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][180/625] eta 0:01:37 lr 0.001834 wd 0.0500 time 0.2175 (0.2197) data time 0.0009 (0.0054) model time 0.2166 (0.2148) loss 2.3249 (3.3258) grad_norm 1.0004 (1.3088) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:22:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 22:22:21 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 22:22:22 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 22:25:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 22:25:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 22:25:25 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 22:29:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 22:29:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 22:30:02 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
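The `lr` column falls from 0.001843 at epoch 70 to 0.001834 at epoch 72, iteration 180. Those values match, to the printed precision, a per-iteration cosine schedule built from the header's `BASE_LR: 0.002`, `MIN_LR: 2.0e-05`, `WARMUP_LR: 2.0e-06`, `WARMUP_EPOCHS: 20`, `EPOCHS: 300` and `WARMUP_PREFIX: true`, with the 625 iterations per epoch seen in these records and the 20 warm-up epochs excluded from the cosine period. The sketch below is a reconstruction under those assumptions, not the project's scheduler; it follows the same formula as timm's `CosineLRScheduler` with `warmup_prefix=True`, and `lr_at` and `global_step` are hypothetical names:

```python
import math


def lr_at(step, base_lr=0.002, min_lr=2e-5, warmup_lr=2e-6,
          warmup_epochs=20, epochs=300, iters_per_epoch=625):
    """Per-iteration LR: linear warm-up, then a half-cosine over the remaining steps."""
    warmup_steps = warmup_epochs * iters_per_epoch
    if step < warmup_steps:
        return warmup_lr + step * (base_lr - warmup_lr) / warmup_steps
    # Warm-up prefix: the cosine clock starts at 0 when warm-up ends.
    t = step - warmup_steps
    period = (epochs - warmup_epochs) * iters_per_epoch
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t / period))


def global_step(epoch, idx, iters_per_epoch=625):
    """Iteration counter across epochs, e.g. [70/300][510/625] -> 70 * 625 + 510."""
    return epoch * iters_per_epoch + idx


print(f"{lr_at(global_step(70, 510)):.6f}")  # 0.001843
print(f"{lr_at(global_step(72, 180)):.6f}")  # 0.001834
```

Because the cosine is still in its flat early region, the learning rate drops by less than 0.00001 over these two epochs; it reaches `MIN_LR` only at the final iteration of epoch 300.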
[2024-07-29 22:30:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 22:30:16 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 22:30:17 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 22:30:17 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 22:30:17 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 72)
[2024-07-29 22:30:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 22:30:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][190/625] eta 0:15:32 lr 0.001834 wd 0.0500 time 0.2160 (2.1435) data time 0.0009 (0.1502) model time 0.2150 (1.9933) loss 4.0515 (3.7278) grad_norm 1.0273 (0.9597) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:30:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][200/625] eta 0:06:05 lr 0.001834 wd 0.0500 time 0.2199 (0.8600) data time 0.0010 (0.0508) model time 0.2188 (0.8093) loss 3.5639 (3.6112) grad_norm 1.1096 (1.0919) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:30:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][210/625] eta 0:04:10 lr 0.001834 wd 0.0500 time 0.2147 (0.6043) data time 0.0010 (0.0311) model time 0.2137 (0.5732) loss 4.0843 (3.6611) grad_norm 1.1508 (1.1389) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:30:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][220/625] eta 0:03:19 lr 0.001834 wd 0.0500 time 0.2156 (0.4928) data time 0.0010 (0.0225) model time 0.2145 (0.4703) loss 3.2140 (3.6258) grad_norm 1.3528 (1.1807) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:30:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][230/625] eta 0:02:50 lr 0.001834 wd 0.0500 time 0.2132 (0.4317) data time 0.0010 (0.0177) model time 0.2121 (0.4140) loss 3.4044 (3.5917) grad_norm 1.2058 (1.2024) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:30:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][240/625] eta 0:02:31 lr 0.001834 wd 0.0500 time 0.2139 (0.3922) data time 0.0008 (0.0147) model time 0.2131 (0.3776) loss 2.9005 (3.5824) grad_norm 1.0407 (1.2549) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:30:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][250/625] eta 0:02:16 lr 0.001834 wd 0.0500 time 0.2046 (0.3648) data time 0.0010 (0.0126) model time 0.2035 (0.3522) loss 3.5456 (3.5457) grad_norm 1.0144 (1.2201) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:30:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][260/625] eta 0:02:05 lr 0.001834 wd 0.0500 time 0.2209 (0.3451) data time 0.0009 (0.0111) model time 0.2200 (0.3341) loss 2.6517 (3.4982) grad_norm 1.1147 (1.2025) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:30:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][270/625] eta 0:01:57 lr 0.001834 wd 0.0500 time 0.2169 (0.3298) data time 0.0007 (0.0099) model time 0.2162 (0.3200) loss 3.4148 (3.4649) grad_norm 1.4209 (1.2261) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:30:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][280/625] eta 0:01:49 lr 0.001833 wd 0.0500 time 0.2059 (0.3180) data time 0.0013 (0.0090) model time 0.2046 (0.3090) loss 3.6783 (3.4540) grad_norm 1.2344 (1.2197) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:30:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][290/625] eta 0:01:43 lr 0.001833 wd 0.0500 time 0.2156 (0.3080) data time 0.0010 (0.0082) model time 0.2146 (0.2998) loss 3.1586 (3.4695) grad_norm 1.7102 (1.2581) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:30:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][300/625] eta 0:01:37 lr 0.001833 wd 0.0500 time 0.2198 (0.2999) data time 0.0007 (0.0076) model time 0.2191 (0.2923) loss 2.7652 (3.4534) grad_norm 1.2367 (1.2720) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:30:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][310/625] eta 0:01:32 lr 0.001833 wd 0.0500 time 0.2135 (0.2933) data time 0.0009 (0.0071) model time 0.2126 (0.2861) loss 3.0998 (3.4501) grad_norm 1.7607 (1.2788) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][320/625] eta 0:01:27 lr 0.001833 wd 0.0500 time 0.2056 (0.2878) data time 0.0009 (0.0067) model time 0.2047 (0.2811) loss 3.0441 (3.4530) grad_norm 1.0450 (1.2622) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][330/625] eta 0:01:23 lr 0.001833 wd 0.0500 time 0.2206 (0.2827) data time 0.0015 (0.0063) model time 0.2191 (0.2764) loss 3.4657 (3.4455) grad_norm 0.9282 (1.2627) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][340/625] eta 0:01:19 lr 0.001833 wd 0.0500 time 0.2071 (0.2783) data time 0.0011 (0.0060) model time 0.2061 (0.2724) loss 3.9028 (3.4351) grad_norm 1.2902 (1.2669) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][350/625] eta 0:01:15 lr 0.001833 wd 0.0500 time 0.2005 (0.2743) data time 0.0011 (0.0057) model time 0.1994 (0.2686) loss 3.5976 (3.4336) grad_norm 0.8265 (1.2667) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][360/625] eta 0:01:11 lr 0.001833 wd 0.0500 time 0.2221 (0.2713) data time 0.0007 (0.0054) model time 0.2214 (0.2659) loss 3.7419 (3.4260) grad_norm 1.2052 (1.2639) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][370/625] eta 0:01:08 lr 0.001833 wd 0.0500 time 0.2144 (0.2683) data time 0.0011 (0.0052) model time 0.2133 (0.2631) loss 3.3995 (3.4186) grad_norm 1.0029 (1.2533) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][380/625] eta 0:01:05 lr 0.001832 wd 0.0500 time 0.2121 (0.2656) data time 0.0011 (0.0050) model time 0.2110 (0.2606) loss 2.2552 (3.4107) grad_norm 1.0187 (1.2542) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][390/625] eta 0:01:01 lr 0.001832 wd 0.0500 time 0.2077 (0.2629) data time 0.0011 (0.0048) model time 0.2066 (0.2581) loss 3.2217 (3.4016) grad_norm 1.8339 (1.2586) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][400/625] eta 0:00:58 lr 0.001832 wd 0.0500 time 0.2120 (0.2606) data time 0.0010 (0.0046) model time 0.2111 (0.2560) loss 3.4806 (3.3979) grad_norm 1.0358 (1.2556) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][410/625] eta 0:00:55 lr 0.001832 wd 0.0500 time 0.2194 (0.2587) data time 0.0008 (0.0044) model time 0.2185 (0.2543) loss 3.6238 (3.3962) grad_norm 1.1634 (1.2518) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][420/625] eta 0:00:52 lr 0.001832 wd 0.0500 time 0.2065 (0.2571) data time 0.0011 (0.0043) model time 0.2054 (0.2528) loss 3.8615 (3.3924) grad_norm 1.3455 (1.2444) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][430/625] eta 0:00:49 lr 0.001832 wd 0.0500 time 0.2086 (0.2553) data time 0.0010 (0.0042) model time 0.2076 (0.2511) loss 3.4654 (3.3969) grad_norm 1.6220 (1.2487) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][440/625] eta 0:00:46 lr 0.001832 wd 0.0500 time 0.2042 (0.2536) data time 0.0009 (0.0040) model time 0.2033 (0.2496) loss 3.0268 (3.3850) grad_norm 0.7836 (1.2419) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][450/625] eta 0:00:44 lr 0.001832 wd 0.0500 time 0.2088 (0.2520) data time 0.0007 (0.0039) model time 0.2081 (0.2481) loss 2.4509 (3.3760) grad_norm 1.1576 (1.2450) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][460/625] eta 0:00:41 lr 0.001832 wd 0.0500 time 0.2175 (0.2506) data time 0.0011 (0.0038) model time 0.2164 (0.2468) loss 3.9329 (3.3790) grad_norm 0.9745 (1.2474) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][470/625] eta 0:00:38 lr 0.001832 wd 0.0500 time 0.2109 (0.2494) data time 0.0012 (0.0037) model time 0.2097 (0.2457) loss 4.0353 (3.3775) grad_norm 0.8947 (1.2436) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][480/625] eta 0:00:35 lr 0.001831 wd 0.0500 time 0.2166 (0.2482) data time 0.0012 (0.0036) model time 0.2153 (0.2446) loss 3.7655 (3.3740) grad_norm 1.3363 (1.2411) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][490/625] eta 0:00:33 lr 0.001831 wd 0.0500 time 0.2155 (0.2472) data time 0.0009 (0.0036) model time 0.2146 (0.2437) loss 2.8305 (3.3618) grad_norm 1.0357 (1.2377) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][500/625] eta 0:00:30 lr 0.001831 wd 0.0500 time 0.2044 (0.2461) data time 0.0011 (0.0035) model time 0.2033 (0.2427) loss 3.7571 (3.3646) grad_norm 1.1432 (1.2513) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][510/625] eta 0:00:28 lr 0.001831 wd 0.0500 time 0.2244 (0.2453) data time 0.0008 (0.0034) model time 0.2235 (0.2419) loss 3.5071 (3.3759) grad_norm 1.5042 (1.2527) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][520/625] eta 0:00:25 lr 0.001831 wd 0.0500 time 0.2153 (0.2444) data time 0.0007 (0.0033) model time 0.2145 (0.2411) loss 4.0200 (3.3736) grad_norm 0.8615 (1.2596) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][530/625] eta 0:00:23 lr 0.001831 wd 0.0500 time 0.2119 (0.2436) data time 0.0007 (0.0033) model time 0.2112 (0.2403) loss 2.9297 (3.3787) grad_norm 2.7031 (1.2634) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][540/625] eta 0:00:20 lr 0.001831 wd 0.0500 time 0.2081 (0.2427) data time 0.0011 (0.0032) model time 0.2070 (0.2395) loss 3.3091 (3.3814) grad_norm 0.9417 (1.2753) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][550/625] eta 0:00:18 lr 0.001831 wd 0.0500 time 0.2144 (0.2419) data time 0.0010 (0.0031) model time 0.2134 (0.2388) loss 3.5157 (3.3792) grad_norm 1.2789 (1.2726) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][560/625] eta 0:00:15 lr 0.001831 wd 0.0500 time 0.2122 (0.2411) data time 0.0008 (0.0031) model time 0.2115 (0.2381) loss 2.9921 (3.3781) grad_norm 1.0579 (1.2694) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][570/625] eta 0:00:13 lr 0.001831 wd 0.0500 time 0.2115 (0.2409) data time 0.0007 (0.0030) model time 0.2107 (0.2379) loss 2.4499 (3.3691) grad_norm 1.4126 (1.2673) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][580/625] eta 0:00:10 lr 0.001831 wd 0.0500 time 0.2048 (0.2408) data time 0.0010 (0.0030) model time 0.2037 (0.2378) loss 3.7740 (3.3712) grad_norm 1.5347 (1.2651) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][590/625] eta 0:00:08 lr 0.001830 wd 0.0500 time 0.2069 (0.2401) data time 0.0007 (0.0029) model time 0.2062 (0.2371) loss 3.4840 (3.3757) grad_norm 1.2620 (1.2681) loss_scale 8192.0000 (8192.0000) mem 8974MB
[2024-07-29 22:31:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 22:31:59 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 22:32:01 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 22:36:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 22:36:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 22:36:43 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 22:36:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 22:36:54 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 22:36:55 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 22:36:55 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 22:36:55 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 72)
[2024-07-29 22:36:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 22:37:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][600/625] eta 0:00:57 lr 0.001830 wd 0.0500 time 0.1995 (2.2884) data time 0.0007 (0.2195) model time 0.1987 (2.0689) loss 3.9211 (3.7768) grad_norm 1.0012 (1.2814) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:37:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][610/625] eta 0:00:11 lr 0.001830 wd 0.0500 time 0.1976 (0.7964) data time 0.0004 (0.0636) model time 0.1972 (0.7328) loss 3.7211 (3.6147) grad_norm 0.9246 (1.2518) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:37:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][620/625] eta 0:00:02 lr 0.001830 wd 0.0500 time 0.1981 (0.5472) data time 0.0006 (0.0373) model time 0.1975 (0.5099) loss 3.3451 (3.6167) grad_norm 1.2389 (1.2612) loss_scale 8192.0000 (8192.0000) mem 8975MB
[2024-07-29 22:37:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 72 training takes 0:00:13
[2024-07-29 22:37:12 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 22:37:14 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 22:37:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.399 (0.399) Loss 0.8169 (0.8169) Acc@1 86.035 (86.035) Acc@5 97.412 (97.412) Mem 8975MB
[2024-07-29 22:37:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.087) Loss 1.3115 (0.9897) Acc@1 72.461 (80.842) Acc@5 92.383 (95.969) Mem 8975MB
[2024-07-29 22:37:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.072) Loss 1.4150 (1.1599) Acc@1 70.264 (76.788) Acc@5 91.113 (93.710) Mem 8975MB
[2024-07-29 22:37:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 76.536 Acc@5 93.646
[2024-07-29 22:37:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 76.5%
[2024-07-29 22:37:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.694 (0.694) Loss 0.5625 (0.5625) Acc@1 86.768 (86.768) Acc@5 97.705 (97.705) Mem 8975MB
[2024-07-29 22:37:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.126) Loss 0.9775 (0.7289) Acc@1 75.586 (82.591) Acc@5 93.604 (96.515) Mem 8975MB
[2024-07-29 22:37:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.092) Loss 1.1484 (0.8858) Acc@1 70.898 (78.767) Acc@5 91.846 (94.615) Mem 8975MB
[2024-07-29 22:37:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 78.541 Acc@5 94.588
[2024-07-29 22:37:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 78.5%
[2024-07-29 22:37:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 78.54%
[2024-07-29 22:37:19 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 22:37:20 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 22:37:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][0/625] eta 0:08:00 lr 0.001830 wd 0.0500 time 0.7686 (0.7686) data time 0.4196 (0.4196) model time 0.0000 (0.0000) loss 3.9902 (3.9902) grad_norm 0.9869 (0.9869) loss_scale 8192.0000 (8192.0000) mem 8971MB
[2024-07-29 22:37:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][10/625] eta 0:02:33 lr 0.001830 wd 0.0500 time 0.1958 (0.2502) data time 0.0009 (0.0390) model time 0.0000 (0.0000) loss 3.2275 (3.5498) grad_norm 1.1222 (1.1091) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:37:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][20/625] eta 0:02:16 lr 0.001830 wd 0.0500 time 0.1969 (0.2259) data time 0.0010 (0.0209) model time 0.0000 (0.0000) loss 3.4242 (3.4935) grad_norm 1.5420 (1.1821) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:37:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][30/625] eta 0:02:09 lr 0.001830 wd 0.0500 time 0.1981 (0.2179) data time 0.0006 (0.0144) model time 0.0000 (0.0000) loss 2.5939 (3.4365) grad_norm 1.1358 (1.1552) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:37:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][40/625] eta 0:02:04 lr 0.001830 wd 0.0500 time 0.1982 (0.2133) data time 0.0009 (0.0111) model time 0.0000 (0.0000) loss 3.7617 (3.4128) grad_norm 1.1832 (1.1993) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:37:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][50/625] eta 0:02:01 lr 0.001830 wd 0.0500 time 0.2010 (0.2107) data time 0.0008 (0.0091) model time 0.0000 (0.0000) loss 3.4927 (3.3892) grad_norm 0.8606 (1.1951) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:37:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][60/625] eta 0:01:57 lr 0.001829 wd 0.0500 time 0.1956 (0.2087) data time 0.0007 (0.0078) model time 0.1949 (0.1976) loss 3.5929 (3.3784) grad_norm 1.0850 (1.2576) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:37:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][70/625] eta 0:01:55 lr 0.001829 wd 0.0500 time 0.2003 (0.2073) data time 0.0009 (0.0069) model time 0.1994 (0.1974) loss 3.4446 (3.3989) grad_norm 2.0201 (1.2795) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:37:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][80/625] eta 0:01:52 lr 0.001829 wd 0.0500 time 0.1992 (0.2064) data time 0.0007 (0.0061) model time 0.1985 (0.1978) loss 3.7098 (3.4096) grad_norm 0.9985 (1.2575) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:37:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][90/625] eta 0:01:50 lr 0.001829 wd 0.0500 time 0.1969 (0.2057) data time 0.0009 (0.0056) model time 0.1961 (0.1982) loss 3.6184 (3.4148) grad_norm 1.1081 (1.2536) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:37:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][100/625] eta 0:01:47 lr 0.001829 wd 0.0500 time 0.1982 (0.2051) data time 0.0007 (0.0051) model time 0.1975 (0.1984) loss 2.9707 (3.3988) grad_norm 1.5229 (1.2616) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:37:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][110/625] eta 0:01:45 lr 0.001829 wd 0.0500 time 0.2000 (0.2046) data time 0.0006 (0.0047) model time 0.1994 (0.1983) loss 3.9594 (3.3909) grad_norm 1.3949 (1.2605) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:37:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][120/625] eta 0:01:43 lr 0.001829 wd 0.0500 time 0.1977 (0.2042) data time 0.0007 (0.0044) model time 0.1970 (0.1984) loss 2.9015 (3.3815) grad_norm 1.2784 (1.2621) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:37:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][130/625] eta 0:01:40 lr 0.001829 wd 0.0500 time 0.1999 (0.2040) data time 0.0007 (0.0041) model time 0.1993 (0.1987) loss 4.1489 (3.3897) grad_norm 0.7846 (1.2744) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:37:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][140/625] eta 0:01:38 lr 0.001829 wd 0.0500 time 0.1984 (0.2038) data time 0.0010 (0.0039) model time 0.1974 (0.1988) loss 3.7833 (3.4098) grad_norm 0.9095 (1.2706) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:37:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][150/625] eta 0:01:36 lr 0.001829 wd 0.0500 time 0.2015 (0.2035) data time 0.0007 (0.0037) model time 0.2009 (0.1988) loss 3.6729 (3.3940) grad_norm 1.1453 (1.2738) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:37:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][160/625] eta 0:01:34 lr 0.001828 wd 0.0500 time 0.1978 (0.2033) data time 0.0008 (0.0036) model time 0.1970 (0.1989) loss 4.0218 (3.3975) grad_norm 1.7228 (1.2760) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:37:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][170/625] eta 0:01:32 lr 0.001828 wd 0.0500 time 0.2010 (0.2032) data time 0.0009 (0.0034) model time 0.2002 (0.1989) loss 3.0507 (3.3913) grad_norm 0.8617 (1.2693) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:37:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][180/625] eta 0:01:30 lr 0.001828 wd 0.0500 time 0.2022 (0.2031) data time 0.0006 (0.0033) model time 0.2016 (0.1990) loss 3.4727 (3.3827) grad_norm 1.1615 (1.2837) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:37:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][190/625] eta 0:01:28 lr 0.001828 wd 0.0500 time 0.2015 (0.2029) data time 0.0006 (0.0032) model time 0.2009 (0.1990) loss 3.9847 (3.3828) grad_norm 0.9122 (1.2758) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:38:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][200/625] eta 0:01:26 lr 0.001828 wd 0.0500 time 0.1979 (0.2029) data time 0.0006 (0.0031) model time 0.1973 (0.1992) loss 2.9385 (3.3863) grad_norm 0.8313 (1.2653) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:38:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][210/625] eta 0:01:24 lr 0.001828 wd 0.0500 time 0.1989 (0.2028) data time 0.0006 (0.0030) model time 0.1983 (0.1992) loss 2.5100 (3.3814) grad_norm 1.3220 (1.2652) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:38:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][220/625] eta 0:01:22 lr 0.001828 wd 0.0500 time 0.1993 (0.2027) data time 0.0008 (0.0029) model time 0.1986 (0.1993) loss 3.4540 (3.3800) grad_norm 0.8775 (1.2661) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:38:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][230/625] eta 0:01:20 lr 0.001828 wd 0.0500 time 0.2017 (0.2027) data time 0.0007 (0.0028) model time 0.2010 (0.1994) loss 4.1729 (3.3746) grad_norm 1.1569 (1.2601) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:38:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][240/625] eta 0:01:18 lr 0.001828 wd 0.0500 time 0.1965 (0.2027) data time 0.0007 (0.0027) model time 0.1958 (0.1995) loss 2.2276 (3.3613) grad_norm 1.1613 (1.2549) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:38:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][250/625] eta 0:01:15 lr 0.001828 wd 0.0500 time 0.2011 (0.2026) data time 0.0008 (0.0027) model time 0.2003 (0.1996) loss 3.7374 (3.3682) grad_norm 1.4359 (1.2505) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:38:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][260/625] eta 0:01:14 lr 0.001827 wd 0.0500 time 0.2022 (0.2033) data time 0.0009 (0.0026) model time 0.2013 (0.2005) loss 3.1924 (3.3678) grad_norm 1.1124 (1.2521) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:38:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][270/625] eta 0:01:12 lr 0.001827 wd 0.0500 time 0.2000 (0.2032) data time 0.0008 (0.0025) model time 0.1991 (0.2005) loss 3.6527 (3.3577) grad_norm 1.1070 (1.2546) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:38:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][280/625] eta 0:01:10 lr 0.001827 wd 0.0500 time 0.2003 (0.2031) data time 0.0010 (0.0025) model time 0.1993 (0.2004) loss 3.7622 (3.3585) grad_norm 1.3128 (1.2546) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:38:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][290/625] eta 0:01:08 lr 0.001827 wd 0.0500 time 0.2004 (0.2031) data time 0.0007 (0.0024) model time 0.1998 (0.2005) loss 3.6746 (3.3703) grad_norm 1.0971 (1.2576) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:38:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][300/625] eta 0:01:06 lr 0.001827 wd 0.0500 time 0.2072 (0.2032) data time 0.0008 (0.0024) model time 0.2065 (0.2006) loss 2.3591 (3.3759) grad_norm 1.2478 (1.2619) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:38:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][310/625] eta 0:01:03 lr 0.001827 wd 0.0500 time 0.1997 (0.2031) data time 0.0010 (0.0023) model time 0.1987 (0.2006) loss 3.4001 (3.3767) grad_norm 1.2005 (1.2748) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:38:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][320/625] eta 0:01:01 lr 0.001827 wd 0.0500 time 0.2006 (0.2031) data time 0.0007 (0.0023) model time 0.2000 (0.2006) loss 3.9055 (3.3803) grad_norm 1.7297 (1.2748) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:38:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][330/625] eta 0:00:59 lr 0.001827 wd 0.0500 time 0.1992 (0.2031) data time 0.0008 (0.0023) model time 0.1984 (0.2007) loss 2.6396 (3.3793) grad_norm 1.0194 (1.2805) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:38:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][340/625] eta 0:00:57 lr 0.001827 wd 0.0500 time 0.2007 (0.2032) data time 0.0010 (0.0023) model time 0.1997 (0.2008) loss 3.6069 (3.3764) grad_norm 1.4448 (1.2837) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:38:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][350/625] eta 0:00:55 lr 0.001827 wd 0.0500 time 0.2008 (0.2032) data time 0.0007 (0.0022) model time 0.2001 (0.2009) loss 3.6526 (3.3769) grad_norm 1.3846 (1.2842) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:38:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][360/625] eta 0:00:54 lr 0.001826 wd 0.0500 time 0.2045 (0.2039) data time 0.0006 (0.0022) model time 0.2039 (0.2017) loss 3.4041 (3.3729) grad_norm 1.2642 (1.2882) loss_scale 8192.0000 (8192.0000) mem 8978MB
[2024-07-29 22:38:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][370/625] eta 0:00:51 lr 0.001826 wd 0.0500 time 0.1989 (0.2038) data time 0.0007 (0.0022) model time 0.1981 (0.2017) loss 4.0306 (3.3723) grad_norm 1.3238 (1.2837) loss_scale 16384.0000 (8368.6469) mem 8978MB
[2024-07-29 22:38:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][380/625] eta 0:00:49 lr 0.001826 wd 0.0500 time 0.2027 (0.2039) data time 0.0009 (0.0021) model time 0.2017 (0.2018) loss 3.3726 (3.3756) grad_norm 1.2386 (1.2787) loss_scale 16384.0000 (8579.0236) mem 8978MB
[2024-07-29 22:38:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][390/625] eta 0:00:47 lr 0.001826 wd 0.0500 time 0.2035 (0.2039) data time 0.0007 (0.0021) model time 0.2028 (0.2018) loss 2.4509 (3.3712) grad_norm 1.1565 (1.2766) loss_scale 16384.0000 (8778.6394) mem 8978MB
[2024-07-29 22:38:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][400/625] eta 0:00:45 lr 0.001826 wd 0.0500 time 0.1970 (0.2039) data time 0.0006 (0.0021) model time 0.1964 (0.2019) loss 3.2664 (3.3775) grad_norm 1.1758 (1.2823) loss_scale 16384.0000 (8968.2993) mem 8978MB
[2024-07-29 22:38:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][410/625] eta 0:00:43 lr 0.001826 wd 0.0500 time 0.2014 (0.2038) data time 0.0009 (0.0020) model time 0.2005 (0.2018) loss 3.3588 (3.3813) grad_norm 1.1499 (1.2788) loss_scale 16384.0000 (9148.7299) mem 8978MB
[2024-07-29 22:38:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][420/625] eta 0:00:41 lr 0.001826 wd 0.0500 time 0.1995 (0.2038) data time 0.0009 (0.0020) model time 0.1987 (0.2018) loss 3.6659 (3.3836) grad_norm 1.0383 (1.2797) loss_scale 16384.0000 (9320.5891) mem 8978MB
[2024-07-29 22:38:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][430/625] eta 0:00:39 lr 0.001826 wd 0.0500 time 0.1977 (0.2037) data time 0.0009 (0.0020) model time 0.1968 (0.2017) loss 3.1948 (3.3776) grad_norm 1.1751 (1.2798) loss_scale 16384.0000 (9484.4733) mem 8978MB
[2024-07-29 22:38:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][440/625] eta 0:00:37 lr 0.001826 wd 0.0500 time 0.2013 (0.2036) data time 0.0007 (0.0020) model time 0.2006 (0.2017) loss 2.2153 (3.3718) grad_norm 1.0733 (1.2864) loss_scale 16384.0000 (9640.9252) mem 8978MB
[2024-07-29 22:38:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][450/625] eta 0:00:35 lr 0.001826 wd 0.0500 time 0.1977 (0.2035) data time 0.0009 (0.0019) model time 0.1968 (0.2016) loss 3.5768 (3.3722) grad_norm 1.1224 (1.2883) loss_scale 16384.0000 (9790.4390) mem 8978MB
[2024-07-29 22:38:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][460/625] eta 0:00:33 lr 0.001825 wd 0.0500 time 0.1978 (0.2035) data time 0.0010 (0.0019) model time 0.1968 (0.2015) loss 3.6374 (3.3772) grad_norm 1.3590 (1.2853) loss_scale 16384.0000 (9933.4664) mem 8978MB
[2024-07-29 22:38:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][470/625] eta 0:00:31 lr 0.001825 wd 0.0500 time 0.2606 (0.2036) data time 0.0008 (0.0019) model time 0.2598 (0.2017) loss 3.3703 (3.3764) grad_norm 1.3553 (1.2842) loss_scale 16384.0000 (10070.4204) mem 8978MB
[2024-07-29 22:38:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][480/625] eta 0:00:29 lr 0.001825 wd 0.0500 time 0.1999 (0.2036) data time 0.0007 (0.0019) model time 0.1992 (0.2017) loss 4.1679 (3.3780) grad_norm 0.9037 (1.2818) loss_scale 16384.0000 (10201.6798) mem 8978MB
[2024-07-29 22:39:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][490/625] eta 0:00:27 lr 0.001825 wd 0.0500 time 0.2013 (0.2036) data time 0.0009 (0.0020) model time 0.2004 (0.2017) loss 2.4239 (3.3819) grad_norm 1.9715 (1.2859) loss_scale 16384.0000 (10327.5927) mem 8978MB
[2024-07-29 22:39:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][500/625] eta 0:00:25 lr 0.001825 wd 0.0500 time 0.2010 (0.2036) data time 0.0007 (0.0020) model time 0.2003 (0.2016) loss 4.2458 (3.3772) grad_norm 1.1451 (1.2857) loss_scale 16384.0000 (10448.4790) mem 8978MB
[2024-07-29 22:39:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][510/625] eta 0:00:23 lr 0.001825 wd 0.0500 time 0.2038 (0.2036) data time 0.0008 (0.0020) model time 0.2030 (0.2016) loss 3.3919 (3.3773) grad_norm 0.8205 (1.2858) loss_scale 16384.0000 (10564.6341) mem 8978MB
[2024-07-29 22:39:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][520/625] eta 0:00:21 lr 0.001825 wd 0.0500 time 0.2014 (0.2035) data time 0.0007 (0.0019) model time 0.2007 (0.2016) loss 4.3556 (3.3778) grad_norm 0.9769 (1.2827) loss_scale 16384.0000 (10676.3301) mem 8978MB
[2024-07-29 22:39:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][530/625] eta 0:00:19 lr 0.001825 wd 0.0500 time 0.2012 (0.2035) data time 0.0007 (0.0019) model time 0.2005 (0.2016) loss 3.9353 (3.3805) grad_norm 0.8204 (1.2806) loss_scale 16384.0000 (10783.8192) mem 8978MB
[2024-07-29 22:39:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][540/625] eta 0:00:17 lr 0.001825 wd 0.0500 time 0.2305 (0.2035) data time 0.0009 (0.0019) model time 0.2296 (0.2016) loss 3.0609 (3.3828) grad_norm 1.1696 (1.2785) loss_scale 16384.0000 (10887.3346) mem 8978MB
[2024-07-29 22:39:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][550/625] eta 0:00:15 lr 0.001825 wd 0.0500 time 0.2014 (0.2035) data time 0.0006 (0.0019) model time 0.2008 (0.2016) loss 3.7886 (3.3853) grad_norm 1.4685 (1.2769) loss_scale 16384.0000 (10987.0926) mem 8978MB
[2024-07-29 22:39:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][560/625] eta 0:00:13 lr 0.001824 wd 0.0500 time 0.1990 (0.2034) data time 0.0009 (0.0019) model time 0.1981 (0.2016) loss 2.6565 (3.3893) grad_norm 1.5508 (1.2775) loss_scale 16384.0000 (11083.2941) mem 8978MB
[2024-07-29 22:39:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][570/625] eta 0:00:11 lr 0.001824 wd 0.0500 time 0.1997 (0.2034) data time 0.0008 (0.0018) model time 0.1989 (0.2015) loss 3.3585 (3.3890) grad_norm 1.1238 (1.2776) loss_scale 16384.0000 (11176.1261) mem 8978MB
[2024-07-29 22:39:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][580/625] eta 0:00:09 lr 0.001824 wd 0.0500 time 0.2017 (0.2033) data time 0.0007 (0.0018) model time 0.2011 (0.2015) loss 3.8507 (3.3863) grad_norm 0.9392 (1.2741) loss_scale 16384.0000 (11265.7625) mem 8978MB
[2024-07-29 22:39:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][590/625] eta 0:00:07 lr 0.001824 wd 0.0500 time 0.2043 (0.2037) data time 0.0007 (0.0018) model time 0.2035 (0.2019) loss 3.6213 (3.3849) grad_norm 1.2032 (1.2722) loss_scale 16384.0000 (11352.3655) mem 8978MB
[2024-07-29 22:39:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][600/625] eta 0:00:05 lr 0.001824 wd 0.0500 time 0.1996 (0.2037) data time 0.0009 (0.0018) model time 0.1987 (0.2019) loss 3.4534 (3.3867) grad_norm 1.8224 (1.2716) loss_scale 16384.0000 (11436.0865) mem 8978MB
[2024-07-29 22:39:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][610/625] eta 0:00:03 lr 0.001824 wd 0.0500 time 0.1975 (0.2036) data time 0.0004 (0.0018) model time 0.1971 (0.2019) loss 3.6765 (3.3896) grad_norm 1.3244 (1.2717) loss_scale 16384.0000 (11517.0671) mem 8978MB
[2024-07-29 22:39:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 22:39:25 
vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 22:39:25 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 22:44:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 22:44:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 22:44:14 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 22:44:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 22:44:29 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... 
[2024-07-29 22:44:29 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 22:44:29 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 22:44:29 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 73)
[2024-07-29 22:44:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 22:44:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][620/625] eta 0:00:04 lr 0.001824 wd 0.0500 time 0.2063 (0.9487) data time 0.0006 (0.1061) model time 0.2057 (0.8427) loss 3.9419 (3.8196) grad_norm 1.1676 (1.1579) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:44:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 73 training takes 0:00:10
[2024-07-29 22:44:44 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 22:44:46 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 22:44:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.435 (0.435) Loss 0.8335 (0.8335) Acc@1 84.668 (84.668) Acc@5 97.412 (97.412) Mem 8977MB
[2024-07-29 22:44:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.053 (0.091) Loss 1.3232 (0.9768) Acc@1 72.803 (81.419) Acc@5 91.748 (95.969) Mem 8977MB
[2024-07-29 22:44:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.074) Loss 1.4082 (1.1536) Acc@1 69.922 (77.083) Acc@5 90.820 (93.690) Mem 8977MB
[2024-07-29 22:44:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.761 Acc@5 93.662
[2024-07-29 22:44:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 76.8%
[2024-07-29 22:44:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 76.76%
[2024-07-29 22:44:49 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 22:44:50 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 22:44:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.439 (0.439) Loss 0.5625 (0.5625) Acc@1 87.012 (87.012) Acc@5 97.852 (97.852) Mem 8977MB
[2024-07-29 22:44:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.053 (0.092) Loss 0.9741 (0.7276) Acc@1 75.781 (82.755) Acc@5 93.750 (96.578) Mem 8977MB
[2024-07-29 22:44:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.056 (0.075) Loss 1.1426 (0.8834) Acc@1 70.996 (78.911) Acc@5 91.895 (94.671) Mem 8977MB
[2024-07-29 22:44:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.675 Acc@5 94.640
[2024-07-29 22:44:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 78.7%
[2024-07-29 22:44:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 78.67%
[2024-07-29 22:44:52 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 22:44:52 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 22:44:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][0/625] eta 0:10:09 lr 0.001824 wd 0.0500 time 0.9752 (0.9752) data time 0.5035 (0.5035) model time 0.0000 (0.0000) loss 3.9251 (3.9251) grad_norm 1.0303 (1.0303) loss_scale 16384.0000 (16384.0000) mem 8971MB
[2024-07-29 22:44:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][10/625] eta 0:02:52 lr 0.001824 wd 0.0500 time 0.2038 (0.2802) data time 0.0011 (0.0468) model time 0.0000 (0.0000) loss 3.6930 (3.7486) grad_norm 1.4610 (1.2059) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:44:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][20/625] eta 0:02:27 lr 0.001824 wd 0.0500 time 0.2040 (0.2446) data time 0.0010 (0.0250) model time 0.0000 (0.0000) loss 3.2779 (3.6506) grad_norm 2.5351 (1.4668) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:45:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][30/625] eta 0:02:17 lr 0.001823 wd 0.0500 time 0.2074 (0.2319) data time 0.0011 (0.0174) model time 0.0000 (0.0000) loss 3.9231 (3.5690) grad_norm 2.1905 (1.5596) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:45:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][40/625] eta 0:02:11 lr 0.001823 wd 0.0500 time 0.2088 (0.2256) data time 0.0008 (0.0134) model time 0.0000 (0.0000) loss 2.5602 (3.5345) grad_norm 1.2080 (1.5250) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:45:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][50/625] eta 0:02:07 lr 0.001823 wd 0.0500 time 0.2138 (0.2216) data time 0.0009 (0.0110) model time 0.0000 (0.0000) loss 3.6567 (3.5118) grad_norm 0.8779 (1.4591) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:45:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][60/625] eta 0:02:03 lr 0.001823 wd 0.0500 time 0.2021 (0.2185) data time 0.0010 (0.0094) model time 0.2012 (0.2015) loss 2.8389 (3.4494) grad_norm 1.0346 (1.3908) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:45:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][70/625] eta 0:02:00 lr 0.001823 wd 0.0500 time 0.2003 (0.2169) data time 0.0008 (0.0082) model time 0.1995 (0.2037) loss 3.5778 (3.4549) grad_norm 1.7061 (1.3674) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:45:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][80/625] eta 0:01:57 lr 0.001823 wd 0.0500 time 0.2048 (0.2155) data time 0.0009 (0.0073) model time 0.2039 (0.2040) loss 4.0436 (3.4503) grad_norm 1.1993 (1.3534) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:45:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][90/625] eta 0:01:54 lr 0.001823 wd 0.0500 time 0.2087 (0.2142) data time 0.0009 (0.0067) model time 0.2078 (0.2037) loss 2.9710 (3.4729) grad_norm 1.1811 (1.3594) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:45:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][100/625] eta 0:01:52 lr 0.001823 wd 0.0500 time 0.2030 (0.2134) data time 0.0008 (0.0061) model time 0.2022 (0.2039) loss 2.6847 (3.4582) grad_norm 1.6198 (1.3542) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:45:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][110/625] eta 0:01:49 lr 0.001823 wd 0.0500 time 0.2093 (0.2130) data time 0.0007 (0.0057) model time 0.2086 (0.2046) loss 2.9909 (3.4543) grad_norm 1.1673 (1.3618) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:45:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][120/625] eta 0:01:47 lr 0.001823 wd 0.0500 time 0.1994 (0.2122) data time 0.0010 (0.0053) model time 0.1984 (0.2043) loss 3.1165 (3.4470) grad_norm 1.5231 (1.3565) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:45:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][130/625] eta 0:01:44 lr 0.001822 wd 0.0500 time 0.1989 (0.2116) data time 0.0010 (0.0050) model time 0.1979 (0.2042) loss 3.5947 (3.4355) grad_norm 1.4612 (1.3630) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:45:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][140/625] eta 0:01:42 lr 0.001822 wd 0.0500 time 0.2077 (0.2112) data time 0.0011 (0.0047) model time 0.2065 (0.2042) loss 3.3842 (3.4308) grad_norm 1.0902 (1.3532) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:45:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][150/625] eta 0:01:40 lr 0.001822 wd 0.0500 time 0.2054 (0.2107) data time 0.0013 (0.0045) model time 0.2041 (0.2040) loss 3.8034 (3.4340) grad_norm 0.8560 (1.3432) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:45:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][160/625] eta 0:01:37 lr 0.001822 wd 0.0500 time 0.2116 (0.2104) data time 0.0009 (0.0043) model time 0.2107 (0.2041) loss 3.2427 (3.4220) grad_norm 0.9691 (1.3345) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:45:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][170/625] eta 0:01:35 lr 0.001822 wd 0.0500 time 0.2085 (0.2102) data time 0.0011 (0.0041) model time 0.2074 (0.2042) loss 3.4636 (3.4211) grad_norm 2.0362 (1.3225) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:45:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][180/625] eta 0:01:33 lr 0.001822 wd 0.0500 time 0.2049 (0.2099) data time 0.0012 (0.0039) model time 0.2037 (0.2042) loss 2.3339 (3.4133) grad_norm 1.5135 (1.3148) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:45:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][190/625] eta 0:01:31 lr 0.001822 wd 0.0500 time 0.2069 (0.2098) data time 0.0012 (0.0038) model time 0.2057 (0.2044) loss 3.1516 (3.4055) grad_norm 0.8518 (1.3087) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:45:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][200/625] eta 0:01:29 lr 0.001822 wd 0.0500 time 0.2102 (0.2096) data time 0.0010 (0.0036) model time 0.2092 (0.2044) loss 3.4621 (3.4027) grad_norm 1.1674 (1.3060) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:45:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][210/625] eta 0:01:27 lr 0.001822 wd 0.0500 time 0.2120 (0.2097) data time 0.0007 (0.0035) model time 0.2113 (0.2047) loss 3.4828 (3.4064) grad_norm 1.1847 (1.3095) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:45:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][220/625] eta 0:01:25 lr 0.001822 wd 0.0500 time 0.2537 (0.2100) data time 0.0012 (0.0034) model time 0.2525 (0.2053) loss 3.6316 (3.3905) grad_norm 0.8729 (1.3060) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:45:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][230/625] eta 0:01:22 lr 0.001821 wd 0.0500 time 0.2105 (0.2098) data time 0.0016 (0.0033) model time 0.2089 (0.2054) loss 3.3450 (3.3921) grad_norm 1.5016 (1.3071) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:45:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][240/625] eta 0:01:20 lr 0.001821 wd 0.0500 time 0.2060 (0.2096) data time 0.0011 (0.0032) model time 0.2049 (0.2053) loss 3.3058 (3.3813) grad_norm 1.1513 (1.3120) loss_scale 16384.0000 (16384.0000) mem 8975MB
[2024-07-29 22:45:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 22:45:44 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 22:45:45 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 22:50:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 22:50:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 22:50:36 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 22:50:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 22:50:47 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 22:50:47 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 22:50:47 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 22:50:47 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 74)
[2024-07-29 22:50:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 22:50:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][250/625] eta 0:15:09 lr 0.001821 wd 0.0500 time 0.1987 (2.4264) data time 0.0006 (0.1765) model time 0.1981 (2.2499) loss 3.1878 (3.8786) grad_norm 1.5614 (1.2971) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:51:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][260/625] eta 0:04:20 lr 0.001821 wd 0.0500 time 0.2014 (0.7131) data time 0.0009 (0.0415) model time 0.2005 (0.6716) loss 3.5638 (3.6241) grad_norm 1.1646 (1.1755) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:51:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][270/625] eta 0:02:53 lr 0.001821 wd 0.0500 time 0.1953 (0.4890) data time 0.0009 (0.0239) model time 0.1944 (0.4652) loss 3.5285 (3.6482) grad_norm 0.9725 (1.1935) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:51:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][280/625] eta 0:02:18 lr 0.001821 wd 0.0500 time 0.1987 (0.4007) data time 0.0006 (0.0169) model time 0.1980 (0.3837) loss 4.1157 (3.6134) grad_norm 1.6452 (1.2410) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:51:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][290/625] eta 0:01:58 lr 0.001821 wd 0.0500 time 0.1952 (0.3535) data time 0.0009 (0.0132) model time 0.1943 (0.3403) loss 3.3834 (3.5448) grad_norm 1.4207 (1.2969) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:51:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][300/625] eta 0:01:45 lr 0.001821 wd 0.0500 time 0.1954 (0.3241) data time 0.0008 (0.0109) model time 0.1946 (0.3132) loss 3.4314 (3.5385) grad_norm 1.4780 (1.3506) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:51:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][310/625] eta 0:01:35 lr 0.001821 wd 0.0500 time 0.2011 (0.3043) data time 0.0009 (0.0093) model time 0.2002 (0.2950) loss 3.2625 (3.5118) grad_norm 1.3852 (1.4013) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:51:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][320/625] eta 0:01:28 lr 0.001821 wd 0.0500 time 0.1959 (0.2897) data time 0.0010 (0.0082) model time 0.1949 (0.2815) loss 3.8098 (3.4785) grad_norm 1.6965 (1.4081) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:51:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][330/625] eta 0:01:22 lr 0.001820 wd 0.0500 time 0.2033 (0.2789) data time 0.0007 (0.0073) model time 0.2026 (0.2716) loss 2.5882 (3.4576) grad_norm 1.0321 (1.3734) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:51:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][340/625] eta 0:01:17 lr 0.001820 wd 0.0500 time 0.1953 (0.2703) data time 0.0009 (0.0066) model time 0.1945 (0.2637) loss 3.8498 (3.4495) grad_norm 1.0634 (1.3384) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:51:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][350/625] eta 0:01:12 lr 0.001820 wd 0.0500 time 0.1985 (0.2635) data time 0.0007 (0.0061) model time 0.1978 (0.2574) loss 4.0101 (3.4730) grad_norm 1.6064 (1.3425) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:51:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][360/625] eta 0:01:08 lr 0.001820 wd 0.0500 time 0.1963 (0.2577) data time 0.0008 (0.0056) model time 0.1955 (0.2520) loss 3.3719 (3.4739) grad_norm 1.9187 (1.3495) loss_scale 16384.0000 (16384.0000) mem 8977MB
[2024-07-29 22:51:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][370/625] eta 0:01:04 lr 0.001820 wd 0.0500 time 0.1957 (0.2528) data time 0.0009 (0.0052) model time 0.1948 (0.2475) loss 3.3933 (3.4728) grad_norm 1.0722 (inf) loss_scale 8192.0000 (15917.7886) mem 8977MB
[2024-07-29 22:51:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][380/625] eta 0:01:00 lr 0.001820 wd 0.0500 time 0.1995 (0.2488) data time 0.0008 (0.0049) model time 0.1986 (0.2439) loss 3.6428 (3.4613) grad_norm 0.9205 (inf) loss_scale 8192.0000 (15336.9023) mem 8977MB
[2024-07-29 22:51:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][390/625] eta 0:00:57 lr 0.001820 wd 0.0500 time 0.1972 (0.2453) data time 0.0008 (0.0046) model time 0.1963 (0.2407) loss 3.9157 (3.4474) grad_norm 0.9441 (inf) loss_scale 8192.0000 (14837.2587) mem 8977MB
[2024-07-29 22:51:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][400/625] eta 0:00:54 lr 0.001820 wd 0.0500 time 0.2003 (0.2423) data time 0.0007 (0.0044) model time 0.1997 (0.2379) loss 3.1832 (3.4316) grad_norm 0.9147 (inf) loss_scale 8192.0000 (14402.9281) mem 8977MB
[2024-07-29 22:51:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][410/625] eta 0:00:51 lr 0.001820 wd 0.0500 time 0.1981 (0.2396) data time 0.0008 (0.0042) model time 0.1973 (0.2354) loss 2.7405 (3.4307) grad_norm 1.1134 (inf) loss_scale 8192.0000 (14021.8896) mem 8977MB
[2024-07-29 22:51:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][420/625] eta 0:00:48 lr 0.001820 wd 0.0500 time 0.1976 (0.2373) data time 0.0009 (0.0040) model time 0.1968 (0.2333) loss 3.6861 (3.4251) grad_norm 1.6298 (inf) loss_scale 8192.0000 (13684.9017) mem 8977MB
[2024-07-29 22:51:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][430/625] eta 0:00:45 lr 0.001819 wd 0.0500 time 0.1964 (0.2353) data time 0.0008 (0.0038) model time 0.1955 (0.2314) loss 3.7609 (3.4103) grad_norm 1.4327 (inf) loss_scale 8192.0000 (13384.7432) mem 8977MB
[2024-07-29 22:51:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][440/625] eta 0:00:43 lr 0.001819 wd 0.0500 time 0.1990 (0.2334) data time 0.0007 (0.0037) model time 0.1983 (0.2298) loss 3.3172 (3.4083) grad_norm 1.0508 (inf) loss_scale 8192.0000 (13115.6891) mem 8977MB
[2024-07-29 22:51:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][450/625] eta 0:00:40 lr 0.001819 wd 0.0500 time 0.2026 (0.2318) data time 0.0006 (0.0035) model time 0.2020 (0.2282) loss 2.5462 (3.3927) grad_norm 0.7923 (inf) loss_scale 8192.0000 (12873.1429) mem 8977MB
[2024-07-29 22:51:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][460/625] eta 0:00:37 lr 0.001819 wd 0.0500 time 0.1993 (0.2302) data time 0.0008 (0.0034) model time 0.1985 (0.2268) loss 2.3908 (3.3865) grad_norm 0.8596 (inf) loss_scale 8192.0000 (12653.3709) mem 8977MB
[2024-07-29 22:51:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][470/625] eta 0:00:35 lr 0.001819 wd 0.0500 time 0.2000 (0.2289) data time 0.0006 (0.0033) model time 0.1995 (0.2256) loss 2.9754 (3.3938) grad_norm 0.9445 (inf) loss_scale 8192.0000 (12453.3094) mem 8977MB
[2024-07-29 22:51:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][480/625] eta 0:00:32 lr 0.001819 wd 0.0500 time 0.1989 (0.2276) data time 0.0008 (0.0032) model time 0.1981 (0.2244) loss 3.1787 (3.3924) grad_norm 0.9070 (inf) loss_scale 8192.0000 (12270.4206) mem 8977MB
[2024-07-29 22:51:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][490/625] eta 0:00:30 lr 0.001819 wd 0.0500 time 0.1985 (0.2264) data time 0.0008 (0.0031) model time 0.1978 (0.2233) loss 3.8788 (3.3942) grad_norm 1.5991 (inf) loss_scale 8192.0000 (12102.5844) mem 8977MB
[2024-07-29 22:51:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][500/625] eta 0:00:28 lr 0.001819 wd 0.0500 time 0.2021 (0.2254) data time 0.0008 (0.0030) model time 0.2013 (0.2224) loss 3.5949 (3.3877) grad_norm 1.3326 (inf) loss_scale 8192.0000 (11948.0158) mem 8977MB
[2024-07-29 22:51:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][510/625] eta 0:00:25 lr 0.001819 wd 0.0500 time 0.2018 (0.2244) data time 0.0005 (0.0029) model time 0.2013 (0.2215) loss 2.9122 (3.3711) grad_norm 1.4344 (inf) loss_scale 8192.0000 (11805.2015) mem 8977MB
[2024-07-29 22:51:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][520/625] eta 0:00:23 lr 0.001818 wd 0.0500 time 0.1991 (0.2235) data time 0.0007 (0.0028) model time 0.1984 (0.2207) loss 4.1045 (3.3653) grad_norm 1.5073 (inf) loss_scale 8192.0000 (11672.8498) mem 8977MB
[2024-07-29 22:51:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][530/625] eta 0:00:21 lr 0.001818 wd 0.0500 time 0.2445 (0.2228) data time 0.0008 (0.0028) model time 0.2437 (0.2200) loss 3.6497 (3.3646) grad_norm 1.5187 (inf) loss_scale 8192.0000 (11549.8516) mem 8977MB
[2024-07-29 22:51:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][540/625] eta 0:00:18 lr 0.001818 wd 0.0500 time 0.2040 (0.2220) data time 0.0008 (0.0027) model time 0.2033 (0.2193) loss 3.0220 (3.3577) grad_norm 1.3313 (inf) loss_scale 8192.0000 (11435.2491) mem 8977MB
[2024-07-29 22:51:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][550/625] eta 0:00:16 lr 0.001818 wd 0.0500 time 0.2003 (0.2213) data time 0.0006 (0.0027) model time 0.1997 (0.2187) loss 3.0982 (3.3532) grad_norm 1.0717 (inf) loss_scale 8192.0000 (11328.2112) mem 8977MB
[2024-07-29 22:52:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][560/625] eta 0:00:14 lr 0.001818 wd 0.0500 time 0.2148 (0.2207) data time 0.0006 (0.0026) model time 0.2142 (0.2181) loss 3.8050 (3.3513) grad_norm 1.1007 (inf) loss_scale 8192.0000 (11228.0128) mem 8977MB
[2024-07-29 22:52:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][570/625] eta 0:00:12 lr 0.001818 wd 0.0500 time 0.2246 (0.2202) data time 0.0008 (0.0025) model time 0.2239 (0.2176) loss 3.7930 (3.3615) grad_norm 1.0362 (inf) loss_scale 8192.0000 (11134.0186) mem 8977MB
[2024-07-29 22:52:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][580/625] eta 0:00:09 lr 0.001818 wd 0.0500 time 0.2004 (0.2196) data time 0.0008 (0.0025) model time 0.1996 (0.2171) loss 3.5767 (3.3612) grad_norm 1.0347 (inf) loss_scale 8192.0000 (11045.6697) mem 8977MB
[2024-07-29 22:52:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][590/625] eta 0:00:07 lr 0.001818 wd 0.0500 time 0.2012 (0.2191) data time 0.0007 (0.0024) model time 0.2005 (0.2166) loss 3.8737 (3.3639) grad_norm 1.1929 (inf) loss_scale 8192.0000 (10962.4723) mem 8977MB
[2024-07-29 22:52:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][600/625] eta 0:00:05 lr 0.001818 wd 0.0500 time 0.2019 (0.2186) data time 0.0006 (0.0024) model time 0.2013 (0.2162) loss 3.8841 (3.3670) grad_norm 0.8629 (inf) loss_scale 8192.0000 (10883.9887) mem 8977MB
[2024-07-29 22:52:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][610/625] eta 0:00:03 lr 0.001818 wd 0.0500 time 0.1980 (0.2182) data time 0.0004 (0.0024) model time 0.1976 (0.2158) loss 3.4092 (3.3655) grad_norm 1.0529 (inf) loss_scale 8192.0000 (10809.8292) mem 8977MB
[2024-07-29 22:52:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][620/625] eta 0:00:01 lr 0.001817 wd 0.0500 time 0.1986 (0.2177) data time 0.0003 (0.0023) model time 0.1983 (0.2154) loss 3.9446 (3.3629) grad_norm 1.2835 (inf) loss_scale 8192.0000 (10739.6461) mem 8977MB
[2024-07-29 22:52:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 74 training takes 0:01:22
[2024-07-29 22:52:13 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 22:52:15 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 22:52:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.407 (0.407) Loss 0.7715 (0.7715) Acc@1 85.254 (85.254) Acc@5 97.363 (97.363) Mem 8977MB
[2024-07-29 22:52:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.088) Loss 1.2256 (0.9416) Acc@1 72.754 (81.104) Acc@5 93.311 (96.103) Mem 8977MB
[2024-07-29 22:52:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.072) Loss 1.3477 (1.1122) Acc@1 70.068 (76.993) Acc@5 90.820 (93.915) Mem 8977MB
[2024-07-29 22:52:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.829 Acc@5 93.958
[2024-07-29 22:52:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 76.8%
[2024-07-29 22:52:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 76.83%
[2024-07-29 22:52:18 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 22:52:19 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 22:52:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.444 (0.444) Loss 0.5640 (0.5640) Acc@1 87.012 (87.012) Acc@5 97.900 (97.900) Mem 8977MB
[2024-07-29 22:52:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.091) Loss 0.9707 (0.7264) Acc@1 76.123 (82.848) Acc@5 93.604 (96.609) Mem 8977MB
[2024-07-29 22:52:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.074) Loss 1.1387 (0.8813) Acc@1 71.240 (79.029) Acc@5 92.188 (94.766) Mem 8977MB
[2024-07-29 22:52:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.781 Acc@5 94.740
[2024-07-29 22:52:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 78.8%
[2024-07-29 22:52:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 78.78%
[2024-07-29 22:52:20 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 22:52:22 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 22:52:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][0/625] eta 0:06:51 lr 0.001817 wd 0.0500 time 0.6589 (0.6589) data time 0.3465 (0.3465) model time 0.0000 (0.0000) loss 7.4122 (7.4122) grad_norm 2.3372 (2.3372) loss_scale 8192.0000 (8192.0000) mem 9651MB [2024-07-29 22:52:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][10/625] eta 0:02:58 lr 0.001817 wd 0.0500 time 0.2556 (0.2906) data time 0.0009 (0.0323) model time 0.0000 (0.0000) loss 5.4640 (5.9295) grad_norm 1.4127 (1.8307) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 22:52:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][20/625] eta 0:02:45 lr 0.001817 wd 0.0500 time 0.2462 (0.2731) data time 0.0009 (0.0174) model time 0.0000 (0.0000) loss 6.0830 (6.1335) grad_norm 1.5512 (1.8427) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 22:52:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][30/625] eta 0:02:38 lr 0.001817 wd 0.0500 time 0.2567 (0.2666) data time 0.0008 (0.0122) model time 0.0000 (0.0000) loss 6.8851 (6.2370) grad_norm 1.2638 (1.7242) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 22:52:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][40/625] eta 0:02:36 lr 0.001817 wd 0.0500 time 0.2554 (0.2682) data time 0.0008 (0.0095) model time 0.0000 (0.0000) loss 7.1214 (6.2708) grad_norm 1.7895 (1.7557) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 22:52:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][50/625] eta 0:02:32 lr 0.001817 wd 0.0500 time 0.2543 (0.2652) data time 0.0007 (0.0078) model time 0.0000 (0.0000) loss 7.1010 (6.3180) grad_norm 1.5433 (1.7116) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 22:52:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][60/625] eta 0:02:28 lr 0.001817 wd 0.0500 time 0.2641 (0.2636) data time 
0.0009 (0.0067) model time 0.2633 (0.2546) loss 6.2567 (6.3088) grad_norm 1.7370 (1.7398) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 22:52:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][70/625] eta 0:02:25 lr 0.001817 wd 0.0500 time 0.2577 (0.2623) data time 0.0010 (0.0059) model time 0.2568 (0.2541) loss 5.3318 (6.2766) grad_norm 2.2052 (1.8039) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 22:52:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 22:52:43 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 22:52:43 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 22:54:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 22:54:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 22:54:41 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 22:54:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 22:54:54 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... 
[2024-07-29 22:54:54 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 22:54:54 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 22:54:54 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 75) [2024-07-29 22:54:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 22:55:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][80/625] eta 0:24:51 lr 0.001817 wd 0.0500 time 0.2600 (2.7360) data time 0.0007 (0.1650) model time 0.2592 (2.5710) loss 7.8247 (6.9705) grad_norm 1.5771 (1.5375) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:55:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][90/625] eta 0:08:37 lr 0.001816 wd 0.0500 time 0.2590 (0.9673) data time 0.0007 (0.0479) model time 0.2583 (0.9194) loss 6.8625 (6.6554) grad_norm 2.2394 (1.6297) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:55:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][100/625] eta 0:05:53 lr 0.001816 wd 0.0500 time 0.2574 (0.6724) data time 0.0010 (0.0284) model time 0.2564 (0.6440) loss 5.2905 (6.5329) grad_norm 2.1599 (1.7604) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:55:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][110/625] eta 0:04:43 lr 0.001816 wd 0.0500 time 0.2650 (0.5514) data time 0.0007 (0.0204) model time 0.2643 (0.5310) loss 5.2601 (6.5251) grad_norm 1.3001 (1.7701) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:55:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][120/625] eta 0:04:04 lr 0.001816 wd 0.0500 time 0.2606 (0.4850) data time 0.0010 (0.0160) model time 0.2596 (0.4690) loss 6.8665 (6.4208) grad_norm 1.8810 (1.8057) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 
22:55:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][130/625] eta 0:03:39 lr 0.001816 wd 0.0500 time 0.2595 (0.4433) data time 0.0008 (0.0132) model time 0.2586 (0.4301) loss 7.0446 (6.3854) grad_norm 1.8123 (1.7284) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:55:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][140/625] eta 0:03:21 lr 0.001816 wd 0.0500 time 0.2593 (0.4147) data time 0.0007 (0.0113) model time 0.2586 (0.4034) loss 6.7843 (6.3634) grad_norm 2.2241 (1.7080) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:55:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][150/625] eta 0:03:07 lr 0.001816 wd 0.0500 time 0.2599 (0.3939) data time 0.0008 (0.0099) model time 0.2590 (0.3839) loss 6.3738 (6.3102) grad_norm 1.7959 (1.7439) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:55:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][160/625] eta 0:02:55 lr 0.001816 wd 0.0500 time 0.2590 (0.3782) data time 0.0010 (0.0089) model time 0.2580 (0.3694) loss 6.3426 (6.2641) grad_norm 1.5047 (1.7173) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:55:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][170/625] eta 0:02:46 lr 0.001816 wd 0.0500 time 0.2606 (0.3656) data time 0.0010 (0.0080) model time 0.2596 (0.3575) loss 5.9032 (6.2631) grad_norm 1.3879 (1.6873) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:55:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][180/625] eta 0:02:38 lr 0.001816 wd 0.0500 time 0.2586 (0.3555) data time 0.0011 (0.0074) model time 0.2576 (0.3481) loss 6.2230 (6.3120) grad_norm 1.9032 (1.6763) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:55:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][190/625] eta 0:02:31 lr 0.001815 wd 0.0500 time 0.2593 (0.3475) data time 
0.0013 (0.0068) model time 0.2580 (0.3406) loss 7.4586 (6.2861) grad_norm 2.1389 (1.6628) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:55:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][200/625] eta 0:02:24 lr 0.001815 wd 0.0500 time 0.2608 (0.3404) data time 0.0010 (0.0064) model time 0.2598 (0.3341) loss 5.2871 (6.2663) grad_norm 1.9296 (1.6802) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:55:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][210/625] eta 0:02:18 lr 0.001815 wd 0.0500 time 0.2621 (0.3345) data time 0.0010 (0.0060) model time 0.2611 (0.3285) loss 7.0250 (6.2793) grad_norm 1.5602 (1.6861) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:55:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][220/625] eta 0:02:13 lr 0.001815 wd 0.0500 time 0.2653 (0.3295) data time 0.0008 (0.0056) model time 0.2645 (0.3238) loss 6.2153 (6.2655) grad_norm 2.6172 (1.6937) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:55:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][230/625] eta 0:02:08 lr 0.001815 wd 0.0500 time 0.2573 (0.3250) data time 0.0009 (0.0053) model time 0.2564 (0.3197) loss 5.5856 (6.2630) grad_norm 3.7225 (1.7349) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:55:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][240/625] eta 0:02:03 lr 0.001815 wd 0.0500 time 0.2626 (0.3211) data time 0.0008 (0.0051) model time 0.2619 (0.3161) loss 6.9312 (6.2654) grad_norm 1.8935 (1.7438) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:55:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][250/625] eta 0:01:59 lr 0.001815 wd 0.0500 time 0.2604 (0.3177) data time 0.0012 (0.0048) model time 0.2592 (0.3129) loss 5.0202 (6.2618) grad_norm 2.0958 (1.7393) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:55:56 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][260/625] eta 0:01:54 lr 0.001815 wd 0.0500 time 0.2624 (0.3149) data time 0.0010 (0.0046) model time 0.2614 (0.3102) loss 5.4692 (6.2556) grad_norm 1.9566 (1.7233) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:55:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][270/625] eta 0:01:50 lr 0.001815 wd 0.0500 time 0.2658 (0.3122) data time 0.0008 (0.0045) model time 0.2650 (0.3077) loss 5.9720 (6.2466) grad_norm 1.4263 (1.7333) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:56:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][280/625] eta 0:01:46 lr 0.001815 wd 0.0500 time 0.2607 (0.3098) data time 0.0014 (0.0043) model time 0.2594 (0.3055) loss 6.9016 (6.2254) grad_norm 1.5075 (1.7286) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:56:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][290/625] eta 0:01:43 lr 0.001814 wd 0.0500 time 0.2624 (0.3077) data time 0.0009 (0.0042) model time 0.2615 (0.3035) loss 6.5493 (6.2141) grad_norm 1.5177 (1.7174) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:56:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][300/625] eta 0:01:39 lr 0.001814 wd 0.0500 time 0.2573 (0.3056) data time 0.0011 (0.0040) model time 0.2563 (0.3016) loss 7.1943 (6.2205) grad_norm 1.7800 (1.7258) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:56:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][310/625] eta 0:01:35 lr 0.001814 wd 0.0500 time 0.2616 (0.3039) data time 0.0008 (0.0039) model time 0.2608 (0.3000) loss 5.4356 (6.1982) grad_norm 2.5761 (1.7619) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:56:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][320/625] eta 0:01:32 lr 0.001814 wd 0.0500 time 0.2621 (0.3022) data time 0.0007 
(0.0038) model time 0.2614 (0.2984) loss 5.0006 (6.1966) grad_norm 2.0483 (1.7583) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:56:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 22:56:13 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 22:56:15 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 22:58:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 22:58:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 22:58:27 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 22:58:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 22:58:40 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... 
[2024-07-29 22:58:40 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 22:58:40 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 22:58:40 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 75) [2024-07-29 22:58:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 22:58:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][330/625] eta 0:05:20 lr 0.001814 wd 0.0500 time 0.2497 (1.0873) data time 0.0009 (0.1068) model time 0.2488 (0.9804) loss 7.0538 (6.7955) grad_norm 1.4381 (1.5286) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:58:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][340/625] eta 0:02:57 lr 0.001814 wd 0.0500 time 0.2515 (0.6242) data time 0.0007 (0.0483) model time 0.2508 (0.5759) loss 6.5543 (6.4990) grad_norm 1.8579 (1.5839) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 22:58:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 22:58:56 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 22:58:59 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 23:01:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 23:01:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 23:04:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 23:04:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 23:04:54 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 23:05:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 23:05:11 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-29 23:05:11 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 23:05:11 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 23:05:11 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 75) [2024-07-29 23:05:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 23:05:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 23:05:24 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... 
[2024-07-29 23:05:26 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 23:07:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 23:07:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 23:07:26 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 23:07:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 23:07:35 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-29 23:07:35 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-29 23:07:35 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-29 23:07:35 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 75) [2024-07-29 23:07:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-29 23:07:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][350/625] eta 0:08:59 lr 0.001814 wd 0.0500 time 0.2574 (1.9621) data time 0.0006 (0.2348) model time 0.2567 (1.7273) loss 7.6197 (6.6811) grad_norm 2.2333 (2.1878) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:07:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][360/625] eta 0:03:16 lr 0.001814 wd 0.0500 time 0.2497 (0.7422) data time 0.0007 (0.0677) model time 0.2489 (0.6745) loss 6.2224 (6.4162) 
grad_norm 1.5069 (1.8528) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:07:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][370/625] eta 0:02:17 lr 0.001814 wd 0.0500 time 0.2506 (0.5386) data time 0.0008 (0.0399) model time 0.2498 (0.4988) loss 5.9640 (6.4514) grad_norm 1.2261 (1.6837) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:07:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][380/625] eta 0:01:51 lr 0.001813 wd 0.0500 time 0.2539 (0.4549) data time 0.0008 (0.0284) model time 0.2532 (0.4265) loss 5.0590 (6.3993) grad_norm 1.7162 (1.6334) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:07:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][390/625] eta 0:01:36 lr 0.001813 wd 0.0500 time 0.2565 (0.4095) data time 0.0006 (0.0222) model time 0.2559 (0.3874) loss 6.6224 (6.3621) grad_norm 1.4677 (1.6176) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:08:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][400/625] eta 0:01:25 lr 0.001813 wd 0.0500 time 0.2616 (0.3810) data time 0.0006 (0.0182) model time 0.2610 (0.3628) loss 6.8489 (6.3783) grad_norm 1.7724 (1.5951) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:08:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][410/625] eta 0:01:17 lr 0.001813 wd 0.0500 time 0.2513 (0.3615) data time 0.0007 (0.0155) model time 0.2506 (0.3459) loss 6.7979 (6.3354) grad_norm 1.4878 (1.6033) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:08:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][420/625] eta 0:01:11 lr 0.001813 wd 0.0500 time 0.2599 (0.3472) data time 0.0007 (0.0135) model time 0.2592 (0.3337) loss 6.5446 (6.2954) grad_norm 1.3950 (1.6124) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:08:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[75/300][430/625] eta 0:01:05 lr 0.001813 wd 0.0500 time 0.2538 (0.3363) data time 0.0009 (0.0120) model time 0.2529 (0.3243) loss 6.3760 (6.2521) grad_norm 1.4674 (1.5831) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:08:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][440/625] eta 0:01:00 lr 0.001813 wd 0.0500 time 0.2510 (0.3278) data time 0.0010 (0.0109) model time 0.2500 (0.3170) loss 5.7501 (6.2262) grad_norm 1.4519 (1.6142) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:08:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][450/625] eta 0:00:56 lr 0.001813 wd 0.0500 time 0.2542 (0.3209) data time 0.0009 (0.0099) model time 0.2533 (0.3110) loss 5.8568 (6.2722) grad_norm 2.5090 (1.6888) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:08:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][460/625] eta 0:00:52 lr 0.001813 wd 0.0500 time 0.2520 (0.3152) data time 0.0009 (0.0091) model time 0.2511 (0.3061) loss 6.4406 (6.2705) grad_norm 1.6837 (1.6905) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:08:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][470/625] eta 0:00:48 lr 0.001813 wd 0.0500 time 0.2542 (0.3104) data time 0.0009 (0.0084) model time 0.2533 (0.3020) loss 6.0481 (6.2649) grad_norm 3.8104 (1.7135) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:08:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][480/625] eta 0:00:44 lr 0.001812 wd 0.0500 time 0.2553 (0.3063) data time 0.0009 (0.0079) model time 0.2544 (0.2984) loss 6.5904 (6.2674) grad_norm 1.3749 (1.7056) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:08:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][490/625] eta 0:00:40 lr 0.001812 wd 0.0500 time 0.2573 (0.3029) data time 0.0007 (0.0074) model time 0.2566 (0.2955) loss 6.0476 (6.2323) grad_norm 1.8046 
(1.7149) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:08:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][500/625] eta 0:00:37 lr 0.001812 wd 0.0500 time 0.2558 (0.2999) data time 0.0008 (0.0070) model time 0.2550 (0.2929) loss 6.0597 (6.2223) grad_norm 1.2495 (1.7050) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:08:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][510/625] eta 0:00:34 lr 0.001812 wd 0.0500 time 0.2568 (0.2973) data time 0.0006 (0.0066) model time 0.2562 (0.2907) loss 5.9800 (6.2271) grad_norm 1.9423 (1.6892) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:08:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][520/625] eta 0:00:30 lr 0.001812 wd 0.0500 time 0.2570 (0.2949) data time 0.0009 (0.0063) model time 0.2561 (0.2887) loss 4.3123 (6.2146) grad_norm 2.2707 (1.7305) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:08:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][530/625] eta 0:00:27 lr 0.001812 wd 0.0500 time 0.2576 (0.2929) data time 0.0008 (0.0060) model time 0.2568 (0.2869) loss 5.8798 (6.2108) grad_norm 1.4395 (1.7229) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:08:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][540/625] eta 0:00:24 lr 0.001812 wd 0.0500 time 0.2535 (0.2911) data time 0.0008 (0.0057) model time 0.2527 (0.2853) loss 6.3763 (6.2019) grad_norm 1.1214 (1.7152) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:08:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][550/625] eta 0:00:21 lr 0.001812 wd 0.0500 time 0.2592 (0.2894) data time 0.0008 (0.0055) model time 0.2584 (0.2839) loss 6.4948 (6.1813) grad_norm 1.8436 (1.6986) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:08:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][560/625] eta 
0:00:18 lr 0.001812 wd 0.0500 time 0.2541 (0.2879) data time 0.0008 (0.0053) model time 0.2534 (0.2826) loss 6.0730 (6.1603) grad_norm 1.4581 (1.7102) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:08:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][570/625] eta 0:00:15 lr 0.001812 wd 0.0500 time 0.2586 (0.2865) data time 0.0008 (0.0051) model time 0.2578 (0.2814) loss 6.7888 (6.1650) grad_norm 1.4124 (1.7006) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:08:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][580/625] eta 0:00:12 lr 0.001811 wd 0.0500 time 0.2555 (0.2853) data time 0.0007 (0.0049) model time 0.2548 (0.2804) loss 4.9667 (6.1502) grad_norm 2.0602 (1.6956) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:08:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][590/625] eta 0:00:09 lr 0.001811 wd 0.0500 time 0.2565 (0.2841) data time 0.0007 (0.0047) model time 0.2558 (0.2794) loss 4.8672 (6.1513) grad_norm 1.3446 (1.7028) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:08:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][600/625] eta 0:00:07 lr 0.001811 wd 0.0500 time 0.2625 (0.2831) data time 0.0006 (0.0046) model time 0.2619 (0.2785) loss 4.1553 (6.1373) grad_norm 1.1772 (1.6977) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:08:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][610/625] eta 0:00:04 lr 0.001811 wd 0.0500 time 0.2498 (0.2821) data time 0.0004 (0.0045) model time 0.2494 (0.2776) loss 5.7415 (6.1221) grad_norm 1.5531 (1.6927) loss_scale 8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:08:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][620/625] eta 0:00:01 lr 0.001811 wd 0.0500 time 0.2533 (0.2811) data time 0.0005 (0.0043) model time 0.2528 (0.2767) loss 6.3696 (6.1171) grad_norm 1.1301 (1.6824) loss_scale 
8192.0000 (8192.0000) mem 9656MB [2024-07-29 23:08:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 75 training takes 0:01:18 [2024-07-29 23:08:58 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 23:08:59 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 23:08:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.404 (0.404) Loss 0.6733 (0.6733) Acc@1 85.498 (85.498) Acc@5 97.852 (97.852) Mem 9656MB [2024-07-29 23:09:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.090) Loss 1.1670 (0.8556) Acc@1 73.975 (81.481) Acc@5 93.408 (96.209) Mem 9656MB [2024-07-29 23:09:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.073) Loss 1.2324 (1.0278) Acc@1 70.801 (77.276) Acc@5 91.895 (94.001) Mem 9656MB [2024-07-29 23:09:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.949 Acc@5 93.922 [2024-07-29 23:09:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 76.9% [2024-07-29 23:09:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 76.95% [2024-07-29 23:09:02 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 23:09:03 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-29 23:09:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.590 (0.590) Loss 0.5645 (0.5645) Acc@1 87.061 (87.061) Acc@5 97.852 (97.852) Mem 9656MB [2024-07-29 23:09:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.105) Loss 0.9683 (0.7252) Acc@1 76.318 (82.932) Acc@5 93.652 (96.640) Mem 9656MB [2024-07-29 23:09:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.081) Loss 1.1367 (0.8798) Acc@1 71.289 (79.102) Acc@5 92.236 (94.808) Mem 9656MB [2024-07-29 23:09:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.839 Acc@5 94.778 [2024-07-29 23:09:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 78.8% [2024-07-29 23:09:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 78.84% [2024-07-29 23:09:05 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 23:09:06 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-29 23:09:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][0/625] eta 0:08:00 lr 0.001811 wd 0.0500 time 0.7690 (0.7690) data time 0.3818 (0.3818) model time 0.0000 (0.0000) loss 7.6386 (7.6386) grad_norm 1.6126 (1.6126) loss_scale 8192.0000 (8192.0000) mem 9651MB [2024-07-29 23:09:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][10/625] eta 0:03:20 lr 0.001811 wd 0.0500 time 0.5060 (0.3261) data time 0.0010 (0.0355) model time 0.0000 (0.0000) loss 5.5711 (6.4523) grad_norm 0.9578 (1.3894) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:09:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 23:09:11 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 23:09:12 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-29 23:19:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-29 23:19:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-29 23:19:45 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-29 23:19:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-29 23:19:55 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... 
[2024-07-29 23:19:55 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 23:19:55 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 23:19:55 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 76)
[2024-07-29 23:19:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 23:20:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][20/625] eta 0:19:20 lr 0.001811 wd 0.0500 time 0.2575 (1.9174) data time 0.0008 (0.1464) model time 0.0000 (0.0000) loss 6.9956 (7.0461) grad_norm 1.7995 (1.8812) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:20:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][30/625] eta 0:08:45 lr 0.001811 wd 0.0500 time 0.2582 (0.8831) data time 0.0011 (0.0556) model time 0.0000 (0.0000) loss 6.6920 (6.5903) grad_norm 1.5915 (1.6733) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:20:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][40/625] eta 0:06:16 lr 0.001811 wd 0.0500 time 0.2581 (0.6439) data time 0.0008 (0.0346) model time 0.0000 (0.0000) loss 7.0905 (6.5651) grad_norm 1.5254 (1.5206) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:20:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][50/625] eta 0:05:08 lr 0.001810 wd 0.0500 time 0.2632 (0.5369) data time 0.0010 (0.0253) model time 0.0000 (0.0000) loss 6.5104 (6.4553) grad_norm 1.5648 (1.5917) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:20:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][60/625] eta 0:04:29 lr 0.001810 wd 0.0500 time 0.2565 (0.4767) data time 0.0013 (0.0201) model time 0.2551 (0.2582) loss 5.2186 (6.3326) grad_norm 1.4898 (1.6794) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:20:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][70/625] eta 0:04:02 lr 0.001810 wd 0.0500 time 0.2586 (0.4378) data time 0.0007 (0.0167) model time 0.2578 (0.2580) loss 6.6678 (6.3130) grad_norm 1.2632 (1.6358) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:20:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][80/625] eta 0:03:43 lr 0.001810 wd 0.0500 time 0.2578 (0.4110) data time 0.0008 (0.0144) model time 0.2570 (0.2587) loss 5.4602 (6.2430) grad_norm 1.2062 (1.5862) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:20:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][90/625] eta 0:03:29 lr 0.001810 wd 0.0500 time 0.2567 (0.3914) data time 0.0010 (0.0126) model time 0.2557 (0.2593) loss 6.6169 (6.2314) grad_norm 1.1269 (1.5722) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:20:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][100/625] eta 0:03:17 lr 0.001810 wd 0.0500 time 0.2600 (0.3761) data time 0.0008 (0.0113) model time 0.2592 (0.2592) loss 5.5346 (6.1837) grad_norm 2.7721 (1.5928) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:20:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][110/625] eta 0:03:07 lr 0.001810 wd 0.0500 time 0.2583 (0.3641) data time 0.0008 (0.0102) model time 0.2575 (0.2593) loss 6.3038 (6.1982) grad_norm 1.3852 (1.6003) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:20:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][120/625] eta 0:02:58 lr 0.001810 wd 0.0500 time 0.2597 (0.3543) data time 0.0010 (0.0094) model time 0.2587 (0.2592) loss 6.9481 (6.2447) grad_norm 1.1647 (1.5892) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:20:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][130/625] eta 0:02:51 lr 0.001810 wd 0.0500 time 0.2642 (0.3465) data time 0.0007 (0.0086) model time 0.2635 (0.2597) loss 6.5115 (6.2237) grad_norm 1.9549 (1.5790) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:20:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][140/625] eta 0:02:44 lr 0.001809 wd 0.0500 time 0.2584 (0.3398) data time 0.0009 (0.0080) model time 0.2574 (0.2598) loss 4.9554 (6.1950) grad_norm 1.1885 (1.5890) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:20:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][150/625] eta 0:02:38 lr 0.001809 wd 0.0500 time 0.2580 (0.3339) data time 0.0012 (0.0075) model time 0.2569 (0.2597) loss 5.3546 (6.1858) grad_norm 2.0490 (1.6040) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:20:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][160/625] eta 0:02:32 lr 0.001809 wd 0.0500 time 0.2618 (0.3290) data time 0.0009 (0.0071) model time 0.2609 (0.2598) loss 4.8749 (6.1560) grad_norm 1.9755 (1.6014) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:20:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][170/625] eta 0:02:27 lr 0.001809 wd 0.0500 time 0.2600 (0.3247) data time 0.0012 (0.0067) model time 0.2588 (0.2599) loss 6.5910 (6.1614) grad_norm 1.0479 (1.5833) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:20:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][180/625] eta 0:02:22 lr 0.001809 wd 0.0500 time 0.2647 (0.3209) data time 0.0010 (0.0064) model time 0.2637 (0.2600) loss 6.5213 (6.1605) grad_norm 1.1774 (1.5806) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:20:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][190/625] eta 0:02:18 lr 0.001809 wd 0.0500 time 0.2592 (0.3176) data time 0.0007 (0.0061) model time 0.2584 (0.2600) loss 6.5480 (6.1436) grad_norm 2.0493 (1.5923) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:20:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][200/625] eta 0:02:13 lr 0.001809 wd 0.0500 time 0.2582 (0.3147) data time 0.0012 (0.0058) model time 0.2569 (0.2602) loss 5.1109 (6.1256) grad_norm 2.0794 (1.6045) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:21:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][210/625] eta 0:02:09 lr 0.001809 wd 0.0500 time 0.2660 (0.3120) data time 0.0008 (0.0056) model time 0.2653 (0.2603) loss 5.1134 (6.1063) grad_norm 1.9691 (1.6286) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:21:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][220/625] eta 0:02:05 lr 0.001809 wd 0.0500 time 0.2580 (0.3096) data time 0.0009 (0.0054) model time 0.2571 (0.2603) loss 4.7572 (6.0894) grad_norm 1.2319 (1.6363) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:21:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][230/625] eta 0:02:01 lr 0.001809 wd 0.0500 time 0.2612 (0.3073) data time 0.0008 (0.0052) model time 0.2604 (0.2603) loss 5.5664 (6.0885) grad_norm 1.4662 (1.6316) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:21:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][240/625] eta 0:01:57 lr 0.001808 wd 0.0500 time 0.2597 (0.3053) data time 0.0010 (0.0050) model time 0.2587 (0.2602) loss 5.4332 (6.0876) grad_norm 1.3185 (1.6219) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:21:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][250/625] eta 0:01:53 lr 0.001808 wd 0.0500 time 0.2615 (0.3035) data time 0.0008 (0.0049) model time 0.2607 (0.2603) loss 6.9912 (6.0826) grad_norm 1.2050 (1.6114) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:21:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][260/625] eta 0:01:50 lr 0.001808 wd 0.0500 time 0.2572 (0.3018) data time 0.0009 (0.0047) model time 0.2563 (0.2603) loss 4.8037 (6.0800) grad_norm 1.2565 (1.6041) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:21:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][270/625] eta 0:01:46 lr 0.001808 wd 0.0500 time 0.2639 (0.3003) data time 0.0011 (0.0046) model time 0.2628 (0.2604) loss 5.0867 (6.0789) grad_norm 1.4581 (1.6009) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:21:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][280/625] eta 0:01:43 lr 0.001808 wd 0.0500 time 0.2652 (0.2990) data time 0.0009 (0.0044) model time 0.2643 (0.2606) loss 5.3945 (6.0696) grad_norm 1.4598 (1.6146) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:21:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][290/625] eta 0:01:39 lr 0.001808 wd 0.0500 time 0.2632 (0.2977) data time 0.0010 (0.0043) model time 0.2621 (0.2607) loss 6.1625 (6.0714) grad_norm 1.4283 (1.6165) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:21:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][300/625] eta 0:01:36 lr 0.001808 wd 0.0500 time 0.2639 (0.2964) data time 0.0011 (0.0042) model time 0.2628 (0.2606) loss 5.0626 (6.0686) grad_norm 1.3687 (1.6087) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:21:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][310/625] eta 0:01:33 lr 0.001808 wd 0.0500 time 0.2626 (0.2953) data time 0.0008 (0.0041) model time 0.2618 (0.2606) loss 5.2073 (6.0589) grad_norm 1.8172 (1.6080) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:21:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][320/625] eta 0:01:29 lr 0.001808 wd 0.0500 time 0.2613 (0.2942) data time 0.0010 (0.0040) model time 0.2603 (0.2606) loss 4.2849 (6.0432) grad_norm 1.3026 (1.6051) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:21:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][330/625] eta 0:01:26 lr 0.001807 wd 0.0500 time 0.2638 (0.2931) data time 0.0008 (0.0039) model time 0.2630 (0.2606) loss 6.5254 (6.0671) grad_norm 1.0261 (1.5990) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:21:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][340/625] eta 0:01:23 lr 0.001807 wd 0.0500 time 0.2608 (0.2922) data time 0.0011 (0.0038) model time 0.2597 (0.2606) loss 7.0121 (6.0853) grad_norm 1.7601 (1.6008) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:21:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][350/625] eta 0:01:20 lr 0.001807 wd 0.0500 time 0.2603 (0.2913) data time 0.0010 (0.0038) model time 0.2593 (0.2606) loss 6.4931 (6.0836) grad_norm 1.4783 (1.5922) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:21:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][360/625] eta 0:01:16 lr 0.001807 wd 0.0500 time 0.2599 (0.2905) data time 0.0010 (0.0037) model time 0.2589 (0.2607) loss 6.4554 (6.0947) grad_norm 1.6050 (1.5879) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:21:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][370/625] eta 0:01:13 lr 0.001807 wd 0.0500 time 0.2607 (0.2898) data time 0.0009 (0.0036) model time 0.2598 (0.2608) loss 5.2222 (6.0984) grad_norm 1.0507 (1.5880) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:21:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][380/625] eta 0:01:10 lr 0.001807 wd 0.0500 time 0.2617 (0.2890) data time 0.0010 (0.0035) model time 0.2607 (0.2608) loss 6.2134 (6.1003) grad_norm 1.9637 (1.5883) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:21:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][390/625] eta 0:01:07 lr 0.001807 wd 0.0500 time 0.2580 (0.2883) data time 0.0011 (0.0035) model time 0.2569 (0.2608) loss 4.7171 (6.0952) grad_norm 1.5337 (1.5888) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:21:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][400/625] eta 0:01:04 lr 0.001807 wd 0.0500 time 0.2628 (0.2876) data time 0.0010 (0.0034) model time 0.2618 (0.2608) loss 6.8494 (6.0883) grad_norm 1.4630 (1.5876) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:21:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][410/625] eta 0:01:01 lr 0.001807 wd 0.0500 time 0.2661 (0.2871) data time 0.0010 (0.0034) model time 0.2651 (0.2609) loss 7.1699 (6.0915) grad_norm 1.0929 (1.5785) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:21:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][420/625] eta 0:00:58 lr 0.001807 wd 0.0500 time 0.2582 (0.2865) data time 0.0011 (0.0033) model time 0.2571 (0.2609) loss 6.6339 (6.0962) grad_norm 1.3987 (1.5729) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:21:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][430/625] eta 0:00:55 lr 0.001806 wd 0.0500 time 0.2578 (0.2859) data time 0.0010 (0.0033) model time 0.2567 (0.2609) loss 5.5986 (6.0998) grad_norm 1.0917 (1.5715) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:22:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][440/625] eta 0:00:52 lr 0.001806 wd 0.0500 time 0.2620 (0.2860) data time 0.0011 (0.0032) model time 0.2609 (0.2616) loss 6.7884 (6.0998) grad_norm 1.2694 (1.5658) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:22:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][450/625] eta 0:00:49 lr 0.001806 wd 0.0500 time 0.2628 (0.2854) data time 0.0011 (0.0032) model time 0.2617 (0.2616) loss 5.3622 (6.1111) grad_norm 1.1708 (1.5636) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:22:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][460/625] eta 0:00:47 lr 0.001806 wd 0.0500 time 0.2601 (0.2850) data time 0.0011 (0.0031) model time 0.2590 (0.2617) loss 6.0918 (6.1086) grad_norm 1.0455 (1.5639) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:22:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][470/625] eta 0:00:44 lr 0.001806 wd 0.0500 time 0.2642 (0.2844) data time 0.0008 (0.0031) model time 0.2634 (0.2616) loss 5.7160 (6.0991) grad_norm 1.8478 (1.5607) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:22:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][480/625] eta 0:00:41 lr 0.001806 wd 0.0500 time 0.2570 (0.2840) data time 0.0009 (0.0030) model time 0.2561 (0.2616) loss 5.7198 (6.0903) grad_norm 1.1996 (1.5599) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:22:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][490/625] eta 0:00:38 lr 0.001806 wd 0.0500 time 0.2605 (0.2836) data time 0.0009 (0.0030) model time 0.2596 (0.2617) loss 4.7029 (6.0790) grad_norm 1.8458 (1.5558) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:22:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][500/625] eta 0:00:35 lr 0.001806 wd 0.0500 time 0.2631 (0.2832) data time 0.0011 (0.0030) model time 0.2620 (0.2617) loss 7.2791 (6.0865) grad_norm 1.4618 (1.5625) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:22:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][510/625] eta 0:00:32 lr 0.001806 wd 0.0500 time 0.2580 (0.2828) data time 0.0011 (0.0029) model time 0.2570 (0.2618) loss 5.3843 (6.0843) grad_norm 2.7028 (1.5679) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:22:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][520/625] eta 0:00:29 lr 0.001805 wd 0.0500 time 0.2623 (0.2824) data time 0.0013 (0.0029) model time 0.2610 (0.2617) loss 6.4255 (6.0776) grad_norm 1.6001 (1.5671) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:22:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][530/625] eta 0:00:26 lr 0.001805 wd 0.0500 time 0.2732 (0.2820) data time 0.0010 (0.0029) model time 0.2722 (0.2617) loss 6.0901 (6.0910) grad_norm 1.0881 (1.5631) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:22:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][540/625] eta 0:00:23 lr 0.001805 wd 0.0500 time 0.2612 (0.2817) data time 0.0013 (0.0028) model time 0.2599 (0.2617) loss 4.4122 (6.0830) grad_norm 1.9632 (1.5654) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:22:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][550/625] eta 0:00:21 lr 0.001805 wd 0.0500 time 0.2619 (0.2813) data time 0.0010 (0.0028) model time 0.2609 (0.2617) loss 5.3276 (6.0767) grad_norm 2.2400 (1.5695) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:22:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][560/625] eta 0:00:18 lr 0.001805 wd 0.0500 time 0.2655 (0.2809) data time 0.0009 (0.0028) model time 0.2646 (0.2617) loss 5.6098 (6.0817) grad_norm 1.7023 (1.5763) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:22:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][570/625] eta 0:00:15 lr 0.001805 wd 0.0500 time 0.2603 (0.2806) data time 0.0010 (0.0027) model time 0.2593 (0.2617) loss 6.4070 (6.0876) grad_norm 1.4988 (1.5825) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:22:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][580/625] eta 0:00:12 lr 0.001805 wd 0.0500 time 0.2616 (0.2803) data time 0.0008 (0.0027) model time 0.2608 (0.2616) loss 5.0421 (6.0895) grad_norm 1.4199 (1.5862) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:22:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][590/625] eta 0:00:09 lr 0.001805 wd 0.0500 time 0.2651 (0.2799) data time 0.0011 (0.0027) model time 0.2640 (0.2616) loss 5.0362 (6.0909) grad_norm 1.3034 (1.5964) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:22:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][600/625] eta 0:00:06 lr 0.001805 wd 0.0500 time 0.2601 (0.2796) data time 0.0013 (0.0027) model time 0.2588 (0.2616) loss 6.5670 (6.0937) grad_norm 2.1088 (1.5969) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:22:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][610/625] eta 0:00:04 lr 0.001805 wd 0.0500 time 0.2616 (0.2794) data time 0.0005 (0.0026) model time 0.2610 (0.2616) loss 4.6894 (6.0927) grad_norm 1.9443 (1.5998) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:22:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][620/625] eta 0:00:01 lr 0.001804 wd 0.0500 time 0.2648 (0.2791) data time 0.0005 (0.0026) model time 0.2643 (0.2616) loss 5.2116 (6.0866) grad_norm 2.2299 (1.5995) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:22:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 76 training takes 0:02:50
[2024-07-29 23:22:49 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 23:22:51 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 23:22:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.456 (0.456) Loss 0.6821 (0.6821) Acc@1 86.230 (86.230) Acc@5 97.900 (97.900) Mem 9656MB
[2024-07-29 23:22:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 1.1182 (0.8624) Acc@1 74.268 (81.596) Acc@5 93.311 (96.267) Mem 9656MB
[2024-07-29 23:22:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.2842 (1.0262) Acc@1 70.117 (77.439) Acc@5 90.918 (94.087) Mem 9656MB
[2024-07-29 23:22:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 77.257 Acc@5 94.048
[2024-07-29 23:22:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.3%
[2024-07-29 23:22:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 77.26%
[2024-07-29 23:22:55 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 23:22:57 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 23:22:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.497 (0.497) Loss 0.5630 (0.5630) Acc@1 87.061 (87.061) Acc@5 97.852 (97.852) Mem 9656MB
[2024-07-29 23:22:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 0.9668 (0.7242) Acc@1 76.367 (82.955) Acc@5 93.701 (96.640) Mem 9656MB
[2024-07-29 23:22:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.1328 (0.8784) Acc@1 71.436 (79.113) Acc@5 92.236 (94.834) Mem 9656MB
[2024-07-29 23:22:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 78.849 Acc@5 94.804
[2024-07-29 23:22:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 78.8%
[2024-07-29 23:22:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 78.85%
[2024-07-29 23:22:59 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 23:23:00 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 23:23:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][0/625] eta 0:08:35 lr 0.001804 wd 0.0500 time 0.8243 (0.8243) data time 0.4932 (0.4932) model time 0.0000 (0.0000) loss 6.2481 (6.2481) grad_norm 1.1770 (1.1770) loss_scale 8192.0000 (8192.0000) mem 9652MB
[2024-07-29 23:23:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][10/625] eta 0:03:16 lr 0.001804 wd 0.0500 time 0.2579 (0.3190) data time 0.0009 (0.0459) model time 0.0000 (0.0000) loss 6.6518 (6.2158) grad_norm 1.0416 (1.6268) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:23:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][20/625] eta 0:02:57 lr 0.001804 wd 0.0500 time 0.2625 (0.2937) data time 0.0010 (0.0246) model time 0.0000 (0.0000) loss 7.4128 (6.1819) grad_norm 1.4033 (1.5417) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:23:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][30/625] eta 0:02:50 lr 0.001804 wd 0.0500 time 0.2592 (0.2860) data time 0.0009 (0.0170) model time 0.0000 (0.0000) loss 5.0727 (6.0617) grad_norm 1.7759 (1.5339) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:23:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][40/625] eta 0:02:47 lr 0.001804 wd 0.0500 time 0.2642 (0.2870) data time 0.0007 (0.0131) model time 0.0000 (0.0000) loss 6.3414 (6.0650) grad_norm 0.9772 (1.5821) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:23:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][50/625] eta 0:02:42 lr 0.001804 wd 0.0500 time 0.2742 (0.2824) data time 0.0007 (0.0108) model time 0.0000 (0.0000) loss 4.8649 (5.9973) grad_norm 1.6390 (1.5879) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:23:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][60/625] eta 0:02:37 lr 0.001804 wd 0.0500 time 0.2650 (0.2796) data time 0.0010 (0.0092) model time 0.2640 (0.2639) loss 7.0857 (6.0558) grad_norm 2.1488 (1.5575) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:23:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][70/625] eta 0:02:33 lr 0.001804 wd 0.0500 time 0.2624 (0.2772) data time 0.0009 (0.0080) model time 0.2615 (0.2627) loss 6.4485 (6.0831) grad_norm 1.6600 (1.5780) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:23:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][80/625] eta 0:02:30 lr 0.001803 wd 0.0500 time 0.2644 (0.2754) data time 0.0011 (0.0072) model time 0.2633 (0.2623) loss 5.2104 (6.0233) grad_norm 2.0603 (1.5531) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:23:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][90/625] eta 0:02:26 lr 0.001803 wd 0.0500 time 0.2589 (0.2738) data time 0.0012 (0.0065) model time 0.2577 (0.2618) loss 7.2553 (6.0126) grad_norm 1.0268 (1.5637) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:23:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][100/625] eta 0:02:23 lr 0.001803 wd 0.0500 time 0.2613 (0.2728) data time 0.0008 (0.0060) model time 0.2605 (0.2619) loss 4.3404 (6.0101) grad_norm 2.1265 (1.5942) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:23:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][110/625] eta 0:02:20 lr 0.001803 wd 0.0500 time 0.2615 (0.2719) data time 0.0009 (0.0055) model time 0.2607 (0.2618) loss 7.1771 (5.9988) grad_norm 1.2206 (1.5750) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:23:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][120/625] eta 0:02:16 lr 0.001803 wd 0.0500 time 0.2640 (0.2711) data time 0.0011 (0.0052) model time 0.2629 (0.2618) loss 6.4045 (6.0000) grad_norm 1.5297 (1.5687) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:23:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][130/625] eta 0:02:13 lr 0.001803 wd 0.0500 time 0.2607 (0.2706) data time 0.0014 (0.0049) model time 0.2593 (0.2619) loss 6.1716 (6.0385) grad_norm 2.6296 (1.5830) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:23:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][140/625] eta 0:02:10 lr 0.001803 wd 0.0500 time 0.2616 (0.2699) data time 0.0012 (0.0046) model time 0.2604 (0.2617) loss 6.1656 (6.0128) grad_norm 1.6988 (1.5800) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:23:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][150/625] eta 0:02:08 lr 0.001803 wd 0.0500 time 0.2607 (0.2696) data time 0.0011 (0.0044) model time 0.2596 (0.2619) loss 6.9436 (6.0200) grad_norm 1.7562 (1.5831) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:23:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][160/625] eta 0:02:05 lr 0.001803 wd 0.0500 time 0.2629 (0.2692) data time 0.0008 (0.0042) model time 0.2621 (0.2619) loss 7.1640 (6.0292) grad_norm 2.1080 (1.5870) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:23:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][170/625] eta 0:02:02 lr 0.001803 wd 0.0500 time 0.2599 (0.2688) data time 0.0009 (0.0040) model time 0.2590 (0.2619) loss 6.9077 (6.0446) grad_norm 1.0465 (1.6035) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:23:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][180/625] eta 0:01:59 lr 0.001802 wd 0.0500 time 0.2841 (0.2686) data time 0.0011 (0.0038) model time 0.2830 (0.2621) loss 5.2128 (6.0274) grad_norm 1.1228 (1.5951) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:23:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][190/625] eta 0:01:56 lr 0.001802 wd 0.0500 time 0.2659 (0.2684) data time 0.0007 (0.0037) model time 0.2652 (0.2622) loss 6.8490 (6.0331) grad_norm 1.1300 (1.5918) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:23:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][200/625] eta 0:01:53 lr 0.001802 wd 0.0500 time 0.2666 (0.2681) data time 0.0009 (0.0036) model time 0.2657 (0.2621) loss 4.0185 (5.9986) grad_norm 1.6244 (1.6122) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:23:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][210/625] eta 0:01:51 lr 0.001802 wd 0.0500 time 0.2637 (0.2678) data time 0.0008 (0.0034) model time 0.2629 (0.2621) loss 6.3213 (6.0216) grad_norm 1.3103 (1.6248) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:23:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][220/625] eta 0:01:48 lr 0.001802 wd 0.0500 time 0.2692 (0.2678) data time 0.0010 (0.0033) model time 0.2682 (0.2623) loss 5.1757 (6.0022) grad_norm 1.3785 (1.6186) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:24:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][230/625] eta 0:01:45 lr 0.001802 wd 0.0500 time 0.2643 (0.2677) data time 0.0008 (0.0032) model time 0.2635 (0.2624) loss 6.5127 (6.0009) grad_norm 1.7911 (1.6178) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:24:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][240/625] eta 0:01:42 lr 0.001802 wd 0.0500 time 0.2632 (0.2674) data time 0.0009 (0.0032) model time 0.2623 (0.2623) loss 7.0315 (5.9972) grad_norm 1.0619 (1.6080) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:24:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][250/625] eta 0:01:40 lr 0.001802 wd 0.0500 time 0.2571 (0.2673) data time 0.0013 (0.0031) model time 0.2557 (0.2623) loss 6.9501 (5.9999) grad_norm 1.8244 (1.5963) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:24:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][260/625] eta 0:01:37 lr 0.001802 wd 0.0500 time 0.2620 (0.2673) data time 0.0007 (0.0030) model time 0.2613 (0.2624) loss 5.7620 (6.0027) grad_norm 1.4939 (1.5935) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:24:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][270/625] eta 0:01:34 lr 0.001801 wd 0.0500 time 0.2613 (0.2671) data time 0.0011 (0.0029) model time 0.2601 (0.2624) loss 6.1061 (5.9931) grad_norm 1.8466 (1.5901) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:24:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][280/625] eta 0:01:32 lr 0.001801 wd 0.0500 time 0.2659 (0.2671) data time 0.0010 (0.0029) model time 0.2649 (0.2626) loss 4.8593 (5.9878) grad_norm 1.0134 (1.5770) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:24:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][290/625] eta 0:01:29 lr 0.001801 wd 0.0500 time 0.2619 (0.2670) data time 0.0010 (0.0028) model time 0.2610 (0.2626) loss 6.5016 (5.9930) grad_norm 1.9892 (1.5710) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:24:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][300/625] eta 0:01:26 lr 0.001801 wd 0.0500 time 0.2744 (0.2669) data time 0.0011 (0.0028) model time 0.2732 (0.2626) loss 5.6243 (5.9992) grad_norm 2.0978 (1.5728) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:24:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][310/625] eta 0:01:24 lr 0.001801 wd 0.0500 time 0.2589 (0.2674) data time 0.0009 (0.0027) model time 0.2580 (0.2633) loss 7.2161 (6.0090) grad_norm 2.4835 (1.5673) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:24:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][320/625] eta 0:01:21 lr 0.001801 wd 0.0500 time 0.2640 (0.2673) data time 0.0012 (0.0027) model time 0.2629 (0.2633) loss 6.5507 (6.0208) grad_norm 1.4965 (1.5640) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:24:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][330/625] eta 0:01:18 lr 0.001801 wd 0.0500 time 0.2695 (0.2672) data time 0.0008 (0.0026) model time 0.2687 (0.2633) loss 6.7693 (6.0190) grad_norm 1.1271 (1.5575) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:24:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][340/625] eta 0:01:16 lr 0.001801 wd 0.0500 time 0.2683 (0.2672) data time 0.0010 (0.0026) model time 0.2673 (0.2634) loss 4.8150 (5.9950) grad_norm 1.3898 (1.5693) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:24:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][350/625] eta 0:01:13 lr 0.001801 wd 0.0500 time 0.2587 (0.2670) data time 0.0011 (0.0025) model time 0.2575 (0.2633) loss 6.9437 (6.0011) grad_norm 1.2517 (1.5735) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:24:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][360/625] eta 0:01:10 lr 0.001801 wd 0.0500 time 0.2677 (0.2670) data time 0.0009 (0.0025) model time 0.2668 (0.2633) loss 5.9600 (6.0057) grad_norm 0.9997 (1.5641) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:24:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][370/625] eta 0:01:08 lr 0.001800 wd 0.0500 time 0.2585 (0.2670) data time 0.0011 (0.0025) model time 0.2574 (0.2634) loss 6.8313 (6.0123) grad_norm 1.0564 (1.5608) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:24:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][380/625] eta 0:01:05 lr 0.001800 wd 0.0500 time 0.2710 (0.2670) data time 0.0011 (0.0024) model time 0.2699 (0.2635) loss 6.2586 (6.0119) grad_norm 1.9526 (1.5571) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:24:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][390/625] eta 0:01:02 lr 0.001800 wd 0.0500 time 0.2597 (0.2669) data time 0.0010 (0.0024) model time 0.2586 (0.2634) loss 6.1138 (6.0088) grad_norm 1.2423 (1.5534) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:24:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][400/625] eta 0:01:00 lr 0.001800 wd 0.0500 time 0.2679 (0.2668) data time 0.0010 (0.0024) model time 0.2669 (0.2634) loss 6.3403 (6.0156) grad_norm 1.2286 (1.5535) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:24:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][410/625] eta 0:00:57 lr 0.001800 wd 0.0500 time 0.2579 (0.2667) data time 0.0010 (0.0023) model time 0.2570 (0.2634) loss 6.3903 (6.0161) grad_norm 1.2947 (1.5466) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:24:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][420/625] eta 0:00:54 lr 0.001800 wd 0.0500 time 0.3285 (0.2669) data time 0.0011 (0.0023) model time 0.3275 (0.2636) loss 6.4408 (6.0048) grad_norm 2.5439 (1.5614) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:24:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][430/625] eta 0:00:52 lr 0.001800 wd 0.0500 time 0.2594 (0.2668) data time 0.0008 (0.0023) model time 0.2586 (0.2636) loss 6.7568 (6.0142) grad_norm 1.3232 (1.5704) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:24:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][440/625] eta 0:00:49 lr 0.001800 wd 0.0500 time 0.2676 (0.2667) data time 0.0012 (0.0022) model time 0.2665 (0.2635) loss 4.9765 (6.0144) grad_norm 1.3018 (1.5711) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:25:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][450/625] eta 0:00:46 lr 0.001800 wd 0.0500 time 0.2576 (0.2666) data time 0.0009 (0.0022) model time 0.2567 (0.2635) loss 5.4824 (6.0206) grad_norm 1.4690 (1.5681) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:25:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][460/625] eta 0:00:43 lr 0.001799 wd 0.0500 time 0.2620 (0.2666) data time 0.0011 (0.0022) model time 0.2609 (0.2635) loss 7.0573 (6.0222) grad_norm 1.5872 (1.5649) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:25:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][470/625] eta 0:00:41 lr 0.001799 wd 0.0500 time 0.2653 (0.2665) data time 0.0010 (0.0022) model time 0.2643 (0.2635) loss 4.5373 (6.0190) grad_norm 1.2867 (1.5634) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:25:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][480/625] eta 0:00:38 lr 0.001799 wd 0.0500 time 0.2683 (0.2664) data time 0.0013 (0.0022) model time 0.2670 (0.2634) loss 6.7018 (6.0144) grad_norm 1.1088 (1.5647) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:25:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][490/625] eta 0:00:35 lr 0.001799 wd 0.0500 time 0.2622 (0.2664) data time 0.0012 (0.0021) model time 0.2610 (0.2634) loss 6.8708 (6.0130) grad_norm 2.7316 (1.5743) loss_scale 16384.0000 (8225.3686) mem 9655MB
[2024-07-29 23:25:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][500/625] eta 0:00:33 lr 0.001799 wd 0.0500 time 0.2622 (0.2663) data time 0.0014 (0.0021) model time 0.2608 (0.2634) loss 5.8236 (6.0150) grad_norm 1.2986 (1.5741) loss_scale 16384.0000 (8388.2156) mem 9655MB
[2024-07-29 23:25:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][510/625] eta 0:00:30 lr 0.001799 wd 0.0500 time 0.2579 (0.2662) data time 0.0011 (0.0021) model time 0.2568 (0.2633) loss 5.1456 (6.0210) grad_norm 1.1906 (1.5710) loss_scale 16384.0000 (8544.6888) mem 9655MB
[2024-07-29 23:25:18
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][520/625] eta 0:00:27 lr 0.001799 wd 0.0500 time 0.2615 (0.2662) data time 0.0010 (0.0021) model time 0.2605 (0.2633) loss 7.3439 (6.0257) grad_norm 1.6343 (1.5717) loss_scale 16384.0000 (8695.1555) mem 9655MB [2024-07-29 23:25:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][530/625] eta 0:00:25 lr 0.001799 wd 0.0500 time 0.2610 (0.2661) data time 0.0011 (0.0021) model time 0.2599 (0.2633) loss 4.9573 (6.0182) grad_norm 1.4447 (1.5741) loss_scale 16384.0000 (8839.9548) mem 9655MB [2024-07-29 23:25:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][540/625] eta 0:00:22 lr 0.001799 wd 0.0500 time 0.2599 (0.2661) data time 0.0011 (0.0020) model time 0.2587 (0.2633) loss 6.4693 (6.0187) grad_norm 1.5828 (1.5734) loss_scale 16384.0000 (8979.4011) mem 9655MB [2024-07-29 23:25:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][550/625] eta 0:00:19 lr 0.001798 wd 0.0500 time 0.2620 (0.2661) data time 0.0009 (0.0020) model time 0.2611 (0.2633) loss 7.7564 (6.0228) grad_norm 1.3161 (1.5682) loss_scale 16384.0000 (9113.7858) mem 9655MB [2024-07-29 23:25:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][560/625] eta 0:00:17 lr 0.001798 wd 0.0500 time 0.2641 (0.2660) data time 0.0010 (0.0020) model time 0.2630 (0.2632) loss 6.0141 (6.0240) grad_norm 1.1354 (1.5641) loss_scale 16384.0000 (9243.3797) mem 9655MB [2024-07-29 23:25:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][570/625] eta 0:00:14 lr 0.001798 wd 0.0500 time 0.2605 (0.2659) data time 0.0010 (0.0020) model time 0.2595 (0.2632) loss 5.5220 (6.0320) grad_norm 1.5116 (1.5668) loss_scale 16384.0000 (9368.4343) mem 9655MB [2024-07-29 23:25:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][580/625] eta 0:00:11 lr 0.001798 wd 0.0500 time 0.2628 (0.2659) data time 0.0012 
(0.0020) model time 0.2616 (0.2631) loss 4.6863 (6.0366) grad_norm 2.1309 (1.5700) loss_scale 16384.0000 (9489.1842) mem 9655MB [2024-07-29 23:25:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][590/625] eta 0:00:09 lr 0.001798 wd 0.0500 time 0.2615 (0.2658) data time 0.0011 (0.0020) model time 0.2604 (0.2631) loss 6.3869 (6.0389) grad_norm 2.8246 (1.5778) loss_scale 16384.0000 (9605.8477) mem 9655MB [2024-07-29 23:25:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][600/625] eta 0:00:06 lr 0.001798 wd 0.0500 time 0.2679 (0.2657) data time 0.0007 (0.0020) model time 0.2672 (0.2631) loss 5.4110 (6.0388) grad_norm 1.6765 (1.5808) loss_scale 16384.0000 (9718.6290) mem 9655MB [2024-07-29 23:25:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][610/625] eta 0:00:03 lr 0.001798 wd 0.0500 time 0.2630 (0.2660) data time 0.0005 (0.0019) model time 0.2624 (0.2634) loss 4.6740 (6.0368) grad_norm 1.4212 (1.5768) loss_scale 16384.0000 (9827.7185) mem 9655MB [2024-07-29 23:25:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][620/625] eta 0:00:01 lr 0.001798 wd 0.0500 time 0.2589 (0.2660) data time 0.0007 (0.0019) model time 0.2582 (0.2634) loss 6.4000 (6.0370) grad_norm 1.1027 (1.5765) loss_scale 16384.0000 (9933.2947) mem 9655MB [2024-07-29 23:25:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 77 training takes 0:02:46 [2024-07-29 23:25:46 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 23:25:47 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 23:25:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.452 (0.452) Loss 0.6606 (0.6606) Acc@1 85.889 (85.889) Acc@5 97.412 (97.412) Mem 9655MB
[2024-07-29 23:25:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.093) Loss 1.1064 (0.8188) Acc@1 74.316 (81.645) Acc@5 92.236 (96.054) Mem 9655MB
[2024-07-29 23:25:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.075) Loss 1.3027 (0.9941) Acc@1 69.727 (77.606) Acc@5 90.771 (93.996) Mem 9655MB
[2024-07-29 23:25:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 77.401 Acc@5 94.008
[2024-07-29 23:25:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.4%
[2024-07-29 23:25:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 77.40%
[2024-07-29 23:25:49 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 23:25:50 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 23:25:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.571 (0.571) Loss 0.5625 (0.5625) Acc@1 87.061 (87.061) Acc@5 97.949 (97.949) Mem 9655MB
[2024-07-29 23:25:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.107) Loss 0.9644 (0.7227) Acc@1 76.514 (82.990) Acc@5 93.750 (96.675) Mem 9655MB
[2024-07-29 23:25:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.082) Loss 1.1289 (0.8768) Acc@1 71.436 (79.185) Acc@5 92.236 (94.852) Mem 9655MB
[2024-07-29 23:25:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 78.911 Acc@5 94.828
[2024-07-29 23:25:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 78.9%
[2024-07-29 23:25:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 78.91%
[2024-07-29 23:25:52 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 23:25:53 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 23:25:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][0/625] eta 0:07:09 lr 0.001798 wd 0.0500 time 0.6869 (0.6869) data time 0.4374 (0.4374) model time 0.0000 (0.0000) loss 5.3757 (5.3757) grad_norm 1.3126 (1.3126) loss_scale 16384.0000 (16384.0000) mem 9654MB
[2024-07-29 23:25:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][10/625] eta 0:03:06 lr 0.001798 wd 0.0500 time 0.2609 (0.3028) data time 0.0010 (0.0408) model time 0.0000 (0.0000) loss 6.1528 (6.3689) grad_norm 1.3852 (1.5990) loss_scale 16384.0000 (16384.0000) mem 9655MB
[2024-07-29 23:25:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][20/625] eta 0:02:51 lr 0.001797 wd 0.0500 time 0.2633 (0.2837) data time 0.0008 (0.0219) model time 0.0000 (0.0000) loss 7.3463 (6.1117) grad_norm 1.7243 (1.5635) loss_scale 16384.0000 (16384.0000) mem 9655MB
[2024-07-29 23:26:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][30/625] eta 0:02:44 lr 0.001797 wd 0.0500 time 0.2715 (0.2771) data time 0.0008 (0.0153) model time 0.0000 (0.0000) loss 6.7633 (6.1646) grad_norm 0.9829 (1.5382) loss_scale 16384.0000 (16384.0000) mem 9655MB
[2024-07-29 23:26:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][40/625] eta 0:02:40 lr 0.001797 wd 0.0500 time 0.2641 (0.2740) data time 0.0010 (0.0118) model time 0.0000 (0.0000) loss 6.5983 (6.1694) grad_norm 1.6670 (1.4899) loss_scale 16384.0000 (16384.0000) mem 9655MB
[2024-07-29 23:26:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][50/625] eta 0:02:36 lr 0.001797 wd 0.0500 time 0.2622 (0.2716) data time 0.0007 (0.0097) model time 0.0000 (0.0000) loss 6.6589 (6.1792) grad_norm 1.0934 (1.4542) loss_scale 16384.0000 (16384.0000) mem 9655MB
[2024-07-29 23:26:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][60/625] eta 0:02:32 lr 0.001797 wd 0.0500 time 0.2604 (0.2699) data time 0.0010 (0.0083) model time 0.2595 (0.2603) loss 5.8339 (6.1532) grad_norm 1.5839 (1.4562) loss_scale 16384.0000 (16384.0000) mem 9655MB
[2024-07-29 23:26:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][70/625] eta 0:02:29 lr 0.001797 wd 0.0500 time 0.2697 (0.2690) data time 0.0008 (0.0073) model time 0.2689 (0.2611) loss 5.2072 (6.1602) grad_norm 0.9745 (1.4915) loss_scale 16384.0000 (16384.0000) mem 9655MB
[2024-07-29 23:26:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][80/625] eta 0:02:26 lr 0.001797 wd 0.0500 time 0.2642 (0.2681) data time 0.0012 (0.0065) model time 0.2630 (0.2611) loss 5.2185 (6.0757) grad_norm 1.3477 (1.4677) loss_scale 16384.0000 (16384.0000) mem 9655MB
[2024-07-29 23:26:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][90/625] eta 0:02:23 lr 0.001797 wd 0.0500 time 0.2594 (0.2674) data time 0.0011 (0.0059) model time 0.2583 (0.2609) loss 6.7030 (6.0356) grad_norm 1.2637 (1.4692) loss_scale 16384.0000 (16384.0000) mem 9655MB
[2024-07-29 23:26:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][100/625] eta 0:02:20 lr 0.001797 wd 0.0500 time 0.2582 (0.2669) data time 0.0012 (0.0054) model time 0.2570 (0.2609) loss 4.7306 (6.0348) grad_norm 1.1453 (1.4636) loss_scale 16384.0000 (16384.0000) mem 9655MB
[2024-07-29 23:26:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][110/625] eta 0:02:18 lr 0.001796 wd 0.0500 time 0.4851 (0.2684) data time 0.0009 (0.0050) model time 0.4842 (0.2645) loss 6.2956 (6.0494) grad_norm 2.3463 (1.4845) loss_scale 16384.0000 (16384.0000) mem 9655MB
[2024-07-29 23:26:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][120/625] eta 0:02:15 lr 0.001796 wd 0.0500 time 0.2636 (0.2679) data time 0.0010 (0.0047) model time 0.2626 (0.2641) loss 6.1782 (6.0516) grad_norm 2.0676 (1.4999) loss_scale 16384.0000 (16384.0000) mem 9655MB
[2024-07-29 23:26:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][130/625] eta 0:02:12 lr 0.001796 wd 0.0500 time 0.2617 (0.2675) data time 0.0010 (0.0045) model time 0.2607 (0.2637) loss 7.5798 (6.0885) grad_norm 3.0523 (1.5357) loss_scale 16384.0000 (16384.0000) mem 9655MB
[2024-07-29 23:26:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][140/625] eta 0:02:09 lr 0.001796 wd 0.0500 time 0.2634 (0.2671) data time 0.0011 (0.0042) model time 0.2623 (0.2634) loss 5.6801 (6.0994) grad_norm 2.0868 (1.5637) loss_scale 16384.0000 (16384.0000) mem 9655MB
[2024-07-29 23:26:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][150/625] eta 0:02:06 lr 0.001796 wd 0.0500 time 0.2641 (0.2668) data time 0.0010 (0.0040) model time 0.2631 (0.2632) loss 5.5867 (6.0923) grad_norm 1.6465 (1.5455) loss_scale 16384.0000 (16384.0000) mem 9655MB
[2024-07-29 23:26:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][160/625] eta 0:02:03 lr 0.001796 wd 0.0500 time 0.2585 (0.2665) data time 0.0008 (0.0038) model time 0.2577 (0.2631) loss 4.7308 (6.0776) grad_norm 1.1436 (1.5371) loss_scale 16384.0000 (16384.0000) mem 9655MB
[2024-07-29 23:26:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][170/625] eta 0:02:01 lr 0.001796 wd 0.0500 time 0.2783 (0.2664) data time 0.0010 (0.0037) model time 0.2774 (0.2631) loss 6.3133 (6.1097) grad_norm 1.6466 (1.5450) loss_scale 16384.0000 (16384.0000) mem 9655MB
[2024-07-29 23:26:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][180/625] eta 0:01:58 lr 0.001796 wd 0.0500 time 0.2669 (0.2663) data time 0.0009 (0.0035) model time 0.2660 (0.2631) loss 5.2901 (6.1131) grad_norm 1.5525 (1.5513) loss_scale 16384.0000 (16384.0000) mem 9655MB
[2024-07-29 23:26:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][190/625] eta 0:01:55 lr 0.001796 wd 0.0500 time 0.2656 (0.2662) data time 0.0008 (0.0034) model time 0.2648 (0.2631) loss 5.6777 (6.1067) grad_norm 1.1224 (1.5501) loss_scale 16384.0000 (16384.0000) mem 9655MB
[2024-07-29 23:26:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][200/625] eta 0:01:52 lr 0.001796 wd 0.0500 time 0.2566 (0.2658) data time 0.0013 (0.0033) model time 0.2553 (0.2628) loss 6.7011 (6.1036) grad_norm 1.7926 (inf) loss_scale 8192.0000 (15976.4378) mem 9655MB
[2024-07-29 23:26:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][210/625] eta 0:01:50 lr 0.001795 wd 0.0500 time 0.2641 (0.2657) data time 0.0010 (0.0032) model time 0.2631 (0.2627) loss 6.9458 (6.1078) grad_norm 1.3542 (inf) loss_scale 8192.0000 (15607.5071) mem 9655MB
[2024-07-29 23:26:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][220/625] eta 0:01:47 lr 0.001795 wd 0.0500 time 0.2658 (0.2657) data time 0.0008 (0.0031) model time 0.2649 (0.2628) loss 7.3774 (6.1130) grad_norm 1.5824 (inf) loss_scale 8192.0000 (15271.9638) mem 9655MB
[2024-07-29 23:26:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][230/625] eta 0:01:44 lr 0.001795 wd 0.0500 time 0.2619 (0.2656) data time 0.0012 (0.0030) model time 0.2608 (0.2627) loss 4.5821 (6.1125) grad_norm 1.1442 (inf) loss_scale 8192.0000 (14965.4719) mem 9655MB
[2024-07-29 23:26:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][240/625] eta 0:01:42 lr 0.001795 wd 0.0500 time 0.2631 (0.2655) data time 0.0010 (0.0029) model time 0.2621 (0.2627) loss 5.3432 (6.1063) grad_norm 2.4416 (inf) loss_scale 8192.0000 (14684.4149) mem 9655MB
[2024-07-29 23:26:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][250/625] eta 0:01:39 lr 0.001795 wd 0.0500 time 0.2604 (0.2653) data time 0.0010 (0.0029) model time 0.2594 (0.2626) loss 5.6268 (6.0763) grad_norm 1.5507 (inf) loss_scale 8192.0000 (14425.7530) mem 9655MB
[2024-07-29 23:27:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][260/625] eta 0:01:36 lr 0.001795 wd 0.0500 time 0.2719 (0.2652) data time 0.0011 (0.0028) model time 0.2709 (0.2625) loss 6.2858 (6.0523) grad_norm 1.6072 (inf) loss_scale 8192.0000 (14186.9119) mem 9655MB
[2024-07-29 23:27:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][270/625] eta 0:01:34 lr 0.001795 wd 0.0500 time 0.2593 (0.2651) data time 0.0011 (0.0027) model time 0.2582 (0.2624) loss 6.7603 (6.0579) grad_norm 1.0596 (inf) loss_scale 8192.0000 (13965.6974) mem 9655MB
[2024-07-29 23:27:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][280/625] eta 0:01:31 lr 0.001795 wd 0.0500 time 0.2610 (0.2650) data time 0.0011 (0.0027) model time 0.2599 (0.2624) loss 6.3944 (6.0663) grad_norm 1.0328 (inf) loss_scale 8192.0000 (13760.2278) mem 9655MB
[2024-07-29 23:27:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][290/625] eta 0:01:28 lr 0.001795 wd 0.0500 time 0.2642 (0.2649) data time 0.0008 (0.0026) model time 0.2635 (0.2624) loss 7.3752 (6.0794) grad_norm 1.6303 (inf) loss_scale 8192.0000 (13568.8797) mem 9655MB
[2024-07-29 23:27:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][300/625] eta 0:01:26 lr 0.001794 wd 0.0500 time 0.2588 (0.2649) data time 0.0010 (0.0026) model time 0.2578 (0.2624) loss 6.9699 (6.0668) grad_norm 1.5744 (inf) loss_scale 8192.0000 (13390.2458) mem 9655MB
[2024-07-29 23:27:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][310/625] eta 0:01:23 lr 0.001794 wd 0.0500 time 0.2597 (0.2649) data time 0.0009 (0.0025) model time 0.2589 (0.2624) loss 5.1154 (6.0557) grad_norm 1.6893 (inf) loss_scale 8192.0000 (13223.0997) mem 9655MB
[2024-07-29 23:27:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][320/625] eta 0:01:20 lr 0.001794 wd 0.0500 time 0.2631 (0.2649) data time 0.0010 (0.0025) model time 0.2621 (0.2625) loss 6.9869 (6.0567) grad_norm 1.5633 (inf) loss_scale 8192.0000 (13066.3676) mem 9655MB
[2024-07-29 23:27:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][330/625] eta 0:01:18 lr 0.001794 wd 0.0500 time 0.2722 (0.2649) data time 0.0008 (0.0025) model time 0.2714 (0.2625) loss 5.2310 (6.0316) grad_norm 1.2946 (inf) loss_scale 8192.0000 (12919.1057) mem 9655MB
[2024-07-29 23:27:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][340/625] eta 0:01:15 lr 0.001794 wd 0.0500 time 0.2647 (0.2649) data time 0.0007 (0.0025) model time 0.2639 (0.2625) loss 5.1777 (6.0278) grad_norm 1.4358 (inf) loss_scale 8192.0000 (12780.4809) mem 9655MB
[2024-07-29 23:27:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][350/625] eta 0:01:12 lr 0.001794 wd 0.0500 time 0.2654 (0.2648) data time 0.0008 (0.0025) model time 0.2646 (0.2624) loss 7.0160 (6.0212) grad_norm 1.8476 (inf) loss_scale 8192.0000 (12649.7550) mem 9655MB
[2024-07-29 23:27:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][360/625] eta 0:01:10 lr 0.001794 wd 0.0500 time 0.2648 (0.2648) data time 0.0007 (0.0024) model time 0.2640 (0.2624) loss 5.5633 (6.0237) grad_norm 1.7585 (inf) loss_scale 8192.0000 (12526.2715) mem 9655MB
[2024-07-29 23:27:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][370/625] eta 0:01:07 lr 0.001794 wd 0.0500 time 0.2624 (0.2647) data time 0.0010 (0.0024) model time 0.2614 (0.2623) loss 5.9839 (6.0370) grad_norm 1.3981 (inf) loss_scale 8192.0000 (12409.4447) mem 9655MB
[2024-07-29 23:27:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][380/625] eta 0:01:04 lr 0.001794 wd 0.0500 time 0.2593 (0.2646) data time 0.0012 (0.0024) model time 0.2581 (0.2623) loss 6.8530 (6.0361) grad_norm 1.3184 (inf) loss_scale 8192.0000 (12298.7507) mem 9655MB
[2024-07-29 23:27:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][390/625] eta 0:01:02 lr 0.001793 wd 0.0500 time 0.2637 (0.2646) data time 0.0010 (0.0023) model time 0.2627 (0.2624) loss 6.7465 (6.0332) grad_norm 1.7321 (inf) loss_scale 8192.0000 (12193.7187) mem 9655MB
[2024-07-29 23:27:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][400/625] eta 0:00:59 lr 0.001793 wd 0.0500 time 0.2596 (0.2646) data time 0.0010 (0.0023) model time 0.2585 (0.2623) loss 7.1476 (6.0434) grad_norm 2.6150 (inf) loss_scale 8192.0000 (12093.9252) mem 9655MB
[2024-07-29 23:27:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][410/625] eta 0:00:56 lr 0.001793 wd 0.0500 time 0.2652 (0.2645) data time 0.0008 (0.0023) model time 0.2644 (0.2623) loss 4.2138 (6.0272) grad_norm 1.4433 (inf) loss_scale 8192.0000 (11998.9878) mem 9655MB
[2024-07-29 23:27:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][420/625] eta 0:00:54 lr 0.001793 wd 0.0500 time 0.2594 (0.2645) data time 0.0011 (0.0022) model time 0.2582 (0.2623) loss 4.7339 (6.0091) grad_norm 1.0885 (inf) loss_scale 8192.0000 (11908.5606) mem 9655MB
[2024-07-29 23:27:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][430/625] eta 0:00:51 lr 0.001793 wd 0.0500 time 0.2632 (0.2644) data time 0.0009 (0.0022) model time 0.2623 (0.2622) loss 7.1891 (6.0063) grad_norm 1.2838 (inf) loss_scale 8192.0000 (11822.3295) mem 9655MB
[2024-07-29 23:27:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][440/625] eta 0:00:48 lr 0.001793 wd 0.0500 time 0.2599 (0.2644) data time 0.0008 (0.0022) model time 0.2592 (0.2623) loss 7.7312 (6.0156) grad_norm 1.3762 (inf) loss_scale 8192.0000 (11740.0091) mem 9655MB
[2024-07-29 23:27:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][450/625] eta 0:00:46 lr 0.001793 wd 0.0500 time 0.2679 (0.2644) data time 0.0007 (0.0022) model time 0.2672 (0.2622) loss 7.1865 (6.0193) grad_norm 2.4798 (inf) loss_scale 8192.0000 (11661.3392) mem 9655MB
[2024-07-29 23:27:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][460/625] eta 0:00:43 lr 0.001793 wd 0.0500 time 0.2569 (0.2643) data time 0.0009 (0.0021) model time 0.2560 (0.2622) loss 6.7477 (6.0218) grad_norm 1.9832 (inf) loss_scale 8192.0000 (11586.0824) mem 9655MB
[2024-07-29 23:27:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][470/625] eta 0:00:40 lr 0.001793 wd 0.0500 time 0.2573 (0.2643) data time 0.0011 (0.0021) model time 0.2562 (0.2622) loss 6.1157 (6.0320) grad_norm 1.7632 (inf) loss_scale 8192.0000 (11514.0212) mem 9655MB
[2024-07-29 23:28:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][480/625] eta 0:00:38 lr 0.001792 wd 0.0500 time 0.2660 (0.2643) data time 0.0011 (0.0021) model time 0.2649 (0.2622) loss 6.4888 (6.0265) grad_norm 0.9496 (inf) loss_scale 8192.0000 (11444.9563) mem 9655MB
[2024-07-29 23:28:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][490/625] eta 0:00:35 lr 0.001792 wd 0.0500 time 0.2675 (0.2642) data time 0.0009 (0.0021) model time 0.2665 (0.2622) loss 6.5977 (6.0271) grad_norm 1.0001 (inf) loss_scale 8192.0000 (11378.7047) mem 9655MB
[2024-07-29 23:28:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][500/625] eta 0:00:33 lr 0.001792 wd 0.0500 time 0.2726 (0.2642) data time 0.0011 (0.0021) model time 0.2716 (0.2622) loss 6.1904 (6.0259) grad_norm 0.9395 (inf) loss_scale 8192.0000 (11315.0978) mem 9655MB
[2024-07-29 23:28:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][510/625] eta 0:00:30 lr 0.001792 wd 0.0500 time 0.2621 (0.2642) data time 0.0010 (0.0020) model time 0.2611 (0.2622) loss 5.2486 (6.0282) grad_norm 1.3432 (inf) loss_scale 8192.0000 (11253.9804) mem 9655MB
[2024-07-29 23:28:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][520/625] eta 0:00:27 lr 0.001792 wd 0.0500 time 0.2628 (0.2642) data time 0.0012 (0.0020) model time 0.2616 (0.2622) loss 5.0585 (6.0230) grad_norm 1.5347 (inf) loss_scale 8192.0000 (11195.2092) mem 9655MB
[2024-07-29 23:28:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][530/625] eta 0:00:25 lr 0.001792 wd 0.0500 time 0.2623 (0.2642) data time 0.0010 (0.0020) model time 0.2612 (0.2623) loss 6.9125 (6.0274) grad_norm 2.5420 (inf) loss_scale 8192.0000 (11138.6516) mem 9655MB
[2024-07-29 23:28:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][540/625] eta 0:00:22 lr 0.001792 wd 0.0500 time 0.2736 (0.2643) data time 0.0011 (0.0020) model time 0.2725 (0.2623) loss 7.0561 (6.0293) grad_norm 1.1620 (inf) loss_scale 8192.0000 (11084.1848) mem 9655MB
[2024-07-29 23:28:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][550/625] eta 0:00:19 lr 0.001792 wd 0.0500 time 0.2726 (0.2643) data time 0.0009 (0.0020) model time 0.2717 (0.2624) loss 6.9692 (6.0270) grad_norm 1.4419 (inf) loss_scale 8192.0000 (11031.6951) mem 9655MB
[2024-07-29 23:28:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][560/625] eta 0:00:17 lr 0.001792 wd 0.0500 time 0.2603 (0.2643) data time 0.0012 (0.0020) model time 0.2591 (0.2624) loss 6.7332 (6.0316) grad_norm 2.0151 (inf) loss_scale 8192.0000 (10981.0766) mem 9655MB
[2024-07-29 23:28:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][570/625] eta 0:00:14 lr 0.001791 wd 0.0500 time 0.2641 (0.2643) data time 0.0010 (0.0020) model time 0.2631 (0.2624) loss 6.1896 (6.0297) grad_norm 1.1474 (inf) loss_scale 8192.0000 (10932.2312) mem 9655MB
[2024-07-29 23:28:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][580/625] eta 0:00:11 lr 0.001791 wd 0.0500 time 0.2616 (0.2643) data time 0.0010 (0.0020) model time 0.2606 (0.2624) loss 5.1748 (6.0255) grad_norm 1.3824 (inf) loss_scale 8192.0000 (10885.0671) mem 9655MB
[2024-07-29 23:28:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][590/625] eta 0:00:09 lr 0.001791 wd 0.0500 time 0.2699 (0.2643) data time 0.0007 (0.0019) model time 0.2692 (0.2624) loss 6.7020 (6.0301) grad_norm 1.6174 (inf) loss_scale 8192.0000 (10839.4992) mem 9655MB
[2024-07-29 23:28:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][600/625] eta 0:00:06 lr 0.001791 wd 0.0500 time 0.2728 (0.2642) data time 0.0008 (0.0019) model time 0.2720 (0.2624) loss 5.6516 (6.0349) grad_norm 1.2007 (inf) loss_scale 8192.0000 (10795.4476) mem 9655MB
[2024-07-29 23:28:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][610/625] eta 0:00:03 lr 0.001791 wd 0.0500 time 0.2703 (0.2643) data time 0.0008 (0.0019) model time 0.2696 (0.2624) loss 6.0606 (6.0307) grad_norm 2.2148 (inf) loss_scale 8192.0000 (10752.8380) mem 9655MB
[2024-07-29 23:28:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][620/625] eta 0:00:01 lr 0.001791 wd 0.0500 time 0.2698 (0.2643) data time 0.0007 (0.0019) model time 0.2690 (0.2625) loss 5.9873 (6.0212) grad_norm 1.4759 (inf) loss_scale 8192.0000 (10711.6006) mem 9655MB
[2024-07-29 23:28:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 78 training takes 0:02:45
[2024-07-29 23:28:38 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 23:28:39 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 23:28:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.471 (0.471) Loss 0.6567 (0.6567) Acc@1 86.182 (86.182) Acc@5 97.949 (97.949) Mem 9655MB
[2024-07-29 23:28:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.098) Loss 1.1484 (0.8404) Acc@1 73.486 (81.907) Acc@5 92.969 (96.285) Mem 9655MB
[2024-07-29 23:28:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.2607 (1.0024) Acc@1 71.582 (78.060) Acc@5 91.455 (94.320) Mem 9655MB
[2024-07-29 23:28:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 77.717 Acc@5 94.224
[2024-07-29 23:28:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.7%
[2024-07-29 23:28:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 77.72%
[2024-07-29 23:28:41 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-29 23:28:42 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-29 23:28:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.632 (0.632) Loss 0.5610 (0.5610) Acc@1 87.109 (87.109) Acc@5 98.047 (98.047) Mem 9655MB
[2024-07-29 23:28:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.112) Loss 0.9629 (0.7215) Acc@1 76.562 (83.061) Acc@5 93.945 (96.724) Mem 9655MB
[2024-07-29 23:28:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.085) Loss 1.1270 (0.8752) Acc@1 71.387 (79.246) Acc@5 92.236 (94.880) Mem 9655MB
[2024-07-29 23:28:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 78.971 Acc@5 94.860
[2024-07-29 23:28:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.0%
[2024-07-29 23:28:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 78.97%
[2024-07-29 23:28:44 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 23:28:45 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 23:28:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][0/625] eta 0:07:04 lr 0.001791 wd 0.0500 time 0.6793 (0.6793) data time 0.4274 (0.4274) model time 0.0000 (0.0000) loss 6.2194 (6.2194) grad_norm 1.6713 (1.6713) loss_scale 8192.0000 (8192.0000) mem 9654MB [2024-07-29 23:28:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][10/625] eta 0:03:06 lr 0.001791 wd 0.0500 time 0.2680 (0.3025) data time 0.0010 (0.0398) model time 0.0000 (0.0000) loss 6.5568 (5.7447) grad_norm 2.1657 (1.6677) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:28:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][20/625] eta 0:02:51 lr 0.001791 wd 0.0500 time 0.2607 (0.2828) data time 0.0011 (0.0214) model time 0.0000 (0.0000) loss 5.8247 (6.0668) grad_norm 2.2779 (1.8377) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:28:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][30/625] eta 0:02:44 lr 0.001791 wd 0.0500 time 0.2640 (0.2763) data time 0.0012 (0.0148) model time 0.0000 (0.0000) loss 4.2961 (5.9588) grad_norm 2.4128 (1.7935) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:28:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][40/625] eta 0:02:39 lr 0.001790 wd 0.0500 time 0.2642 (0.2725) data time 0.0008 (0.0114) model time 0.0000 (0.0000) loss 6.2006 (5.9240) grad_norm 1.7380 (1.7672) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:28:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][50/625] eta 0:02:35 lr 0.001790 wd 0.0500 time 0.2611 (0.2709) data time 0.0012 (0.0095) model time 0.0000 (0.0000) loss 6.4366 (5.9810) grad_norm 2.2338 (1.7020) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:29:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][60/625] eta 0:02:32 lr 0.001790 wd 0.0500 time 0.2608 (0.2695) data time 
0.0008 (0.0081) model time 0.2601 (0.2615) loss 5.1259 (5.8953) grad_norm 1.2391 (1.6626) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:29:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][70/625] eta 0:02:28 lr 0.001790 wd 0.0500 time 0.2661 (0.2685) data time 0.0007 (0.0071) model time 0.2654 (0.2613) loss 5.4071 (5.9068) grad_norm 1.7629 (1.6405) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:29:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][80/625] eta 0:02:25 lr 0.001790 wd 0.0500 time 0.2572 (0.2677) data time 0.0011 (0.0064) model time 0.2561 (0.2612) loss 6.2135 (5.9062) grad_norm 2.5111 (1.7437) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:29:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][90/625] eta 0:02:22 lr 0.001790 wd 0.0500 time 0.2628 (0.2672) data time 0.0010 (0.0058) model time 0.2618 (0.2614) loss 5.7406 (5.9001) grad_norm 1.3949 (1.7372) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:29:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][100/625] eta 0:02:20 lr 0.001790 wd 0.0500 time 0.2610 (0.2668) data time 0.0010 (0.0053) model time 0.2600 (0.2616) loss 5.8183 (5.8597) grad_norm 1.9326 (1.7314) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:29:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][110/625] eta 0:02:17 lr 0.001790 wd 0.0500 time 0.2644 (0.2667) data time 0.0011 (0.0049) model time 0.2633 (0.2621) loss 5.5819 (5.8550) grad_norm 1.0723 (1.7183) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:29:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][120/625] eta 0:02:14 lr 0.001790 wd 0.0500 time 0.2639 (0.2664) data time 0.0010 (0.0046) model time 0.2629 (0.2620) loss 4.6398 (5.8434) grad_norm 1.2164 (1.7089) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:29:20 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][130/625] eta 0:02:11 lr 0.001789 wd 0.0500 time 0.2645 (0.2661) data time 0.0008 (0.0044) model time 0.2636 (0.2619) loss 6.5576 (5.8507) grad_norm 1.3720 (1.7057) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:29:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][140/625] eta 0:02:08 lr 0.001789 wd 0.0500 time 0.2594 (0.2659) data time 0.0010 (0.0041) model time 0.2584 (0.2619) loss 6.5819 (5.8474) grad_norm 1.3304 (1.6843) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:29:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][150/625] eta 0:02:06 lr 0.001789 wd 0.0500 time 0.2778 (0.2659) data time 0.0009 (0.0039) model time 0.2769 (0.2622) loss 7.0151 (5.9005) grad_norm 1.5582 (1.6779) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:29:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][160/625] eta 0:02:03 lr 0.001789 wd 0.0500 time 0.2569 (0.2657) data time 0.0013 (0.0038) model time 0.2556 (0.2622) loss 6.0037 (5.9168) grad_norm 1.1016 (1.6756) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:29:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][170/625] eta 0:02:00 lr 0.001789 wd 0.0500 time 0.2626 (0.2656) data time 0.0010 (0.0036) model time 0.2616 (0.2622) loss 6.4695 (5.9401) grad_norm 1.3485 (1.6634) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:29:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][180/625] eta 0:01:58 lr 0.001789 wd 0.0500 time 0.2632 (0.2654) data time 0.0010 (0.0035) model time 0.2623 (0.2621) loss 7.1200 (5.9616) grad_norm 1.3795 (1.6496) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:29:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][190/625] eta 0:01:55 lr 0.001789 wd 0.0500 time 0.2625 (0.2661) data time 0.0008 
(0.0034) model time 0.2617 (0.2633) loss 5.4844 (5.9470) grad_norm 2.4443 (1.6402) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:29:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][200/625] eta 0:01:53 lr 0.001789 wd 0.0500 time 0.2624 (0.2660) data time 0.0010 (0.0032) model time 0.2614 (0.2632) loss 4.9267 (5.9400) grad_norm 1.7260 (1.6402) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:29:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][210/625] eta 0:01:50 lr 0.001789 wd 0.0500 time 0.2671 (0.2658) data time 0.0011 (0.0031) model time 0.2661 (0.2630) loss 6.8200 (5.9542) grad_norm 1.1945 (1.6261) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:29:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][220/625] eta 0:01:47 lr 0.001788 wd 0.0500 time 0.2578 (0.2656) data time 0.0009 (0.0030) model time 0.2569 (0.2630) loss 4.9644 (5.9525) grad_norm 1.4366 (1.6183) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:29:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][230/625] eta 0:01:44 lr 0.001788 wd 0.0500 time 0.2611 (0.2655) data time 0.0010 (0.0030) model time 0.2601 (0.2628) loss 6.3372 (5.9706) grad_norm 1.7049 (1.6088) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:29:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][240/625] eta 0:01:42 lr 0.001788 wd 0.0500 time 0.2622 (0.2653) data time 0.0009 (0.0029) model time 0.2614 (0.2627) loss 5.2445 (5.9523) grad_norm 1.6116 (1.5951) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:29:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][250/625] eta 0:01:39 lr 0.001788 wd 0.0500 time 0.2661 (0.2653) data time 0.0011 (0.0028) model time 0.2650 (0.2627) loss 6.7134 (5.9643) grad_norm 0.9527 (1.5846) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:29:54 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][260/625] eta 0:01:36 lr 0.001788 wd 0.0500 time 0.2640 (0.2653) data time 0.0010 (0.0027) model time 0.2630 (0.2628) loss 6.9246 (5.9614) grad_norm 1.2042 (1.5887) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:29:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][270/625] eta 0:01:34 lr 0.001788 wd 0.0500 time 0.2611 (0.2652) data time 0.0010 (0.0027) model time 0.2601 (0.2628) loss 7.0178 (5.9633) grad_norm 1.1200 (1.5985) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:29:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][280/625] eta 0:01:31 lr 0.001788 wd 0.0500 time 0.2586 (0.2651) data time 0.0009 (0.0026) model time 0.2577 (0.2628) loss 5.2064 (5.9790) grad_norm 1.1655 (1.6013) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:30:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][290/625] eta 0:01:28 lr 0.001788 wd 0.0500 time 0.2691 (0.2651) data time 0.0007 (0.0026) model time 0.2684 (0.2627) loss 6.5927 (5.9780) grad_norm 1.6063 (1.5974) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:30:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][300/625] eta 0:01:26 lr 0.001788 wd 0.0500 time 0.2590 (0.2650) data time 0.0010 (0.0025) model time 0.2580 (0.2626) loss 6.4192 (5.9838) grad_norm 1.3820 (1.5926) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:30:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][310/625] eta 0:01:23 lr 0.001787 wd 0.0500 time 0.2606 (0.2649) data time 0.0008 (0.0025) model time 0.2598 (0.2626) loss 6.9923 (5.9877) grad_norm 2.8647 (1.6059) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:30:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][320/625] eta 0:01:20 lr 0.001787 wd 0.0500 time 0.2637 (0.2648) data time 0.0010 
(0.0024) model time 0.2627 (0.2625) loss 5.1414 (5.9832) grad_norm 1.5556 (1.6058) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:30:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][330/625] eta 0:01:18 lr 0.001787 wd 0.0500 time 0.2642 (0.2648) data time 0.0009 (0.0024) model time 0.2632 (0.2626) loss 6.8230 (5.9978) grad_norm 1.2650 (1.5988) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:30:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][340/625] eta 0:01:15 lr 0.001787 wd 0.0500 time 0.2553 (0.2648) data time 0.0011 (0.0024) model time 0.2542 (0.2626) loss 6.7860 (5.9871) grad_norm 1.2986 (1.5893) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:30:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][350/625] eta 0:01:12 lr 0.001787 wd 0.0500 time 0.2632 (0.2647) data time 0.0010 (0.0023) model time 0.2622 (0.2626) loss 5.0137 (5.9862) grad_norm 1.9669 (1.5920) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:30:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][360/625] eta 0:01:10 lr 0.001787 wd 0.0500 time 0.2652 (0.2648) data time 0.0009 (0.0023) model time 0.2643 (0.2626) loss 5.1212 (5.9891) grad_norm 1.0608 (1.5965) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:30:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][370/625] eta 0:01:07 lr 0.001787 wd 0.0500 time 0.2644 (0.2647) data time 0.0009 (0.0023) model time 0.2635 (0.2626) loss 4.9408 (5.9810) grad_norm 1.1894 (1.5869) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:30:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][380/625] eta 0:01:04 lr 0.001787 wd 0.0500 time 0.2550 (0.2646) data time 0.0012 (0.0022) model time 0.2538 (0.2626) loss 6.1660 (5.9730) grad_norm 2.5885 (1.5897) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:30:28 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][390/625] eta 0:01:02 lr 0.001787 wd 0.0500 time 0.2635 (0.2646) data time 0.0010 (0.0022) model time 0.2625 (0.2625) loss 7.3163 (5.9678) grad_norm 1.8962 (1.5920) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:30:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][400/625] eta 0:00:59 lr 0.001787 wd 0.0500 time 0.2615 (0.2646) data time 0.0013 (0.0022) model time 0.2602 (0.2625) loss 6.6120 (5.9665) grad_norm 1.6770 (1.5939) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:30:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][410/625] eta 0:00:56 lr 0.001786 wd 0.0500 time 0.2606 (0.2645) data time 0.0011 (0.0022) model time 0.2596 (0.2625) loss 6.6018 (5.9717) grad_norm 2.1014 (1.6063) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:30:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][420/625] eta 0:00:54 lr 0.001786 wd 0.0500 time 0.2602 (0.2645) data time 0.0009 (0.0021) model time 0.2593 (0.2625) loss 6.5760 (5.9832) grad_norm 1.8713 (1.6122) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:30:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][430/625] eta 0:00:51 lr 0.001786 wd 0.0500 time 0.2701 (0.2645) data time 0.0010 (0.0021) model time 0.2691 (0.2625) loss 6.8164 (5.9823) grad_norm 1.3262 (1.6104) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:30:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][440/625] eta 0:00:48 lr 0.001786 wd 0.0500 time 0.2595 (0.2644) data time 0.0013 (0.0021) model time 0.2582 (0.2624) loss 6.0462 (5.9765) grad_norm 1.1168 (1.6032) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:30:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][450/625] eta 0:00:46 lr 0.001786 wd 0.0500 time 0.2675 (0.2644) data time 0.0009 
(0.0021) model time 0.2665 (0.2625) loss 4.7439 (5.9685) grad_norm 2.1671 (1.5985) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:30:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][460/625] eta 0:00:43 lr 0.001786 wd 0.0500 time 0.2623 (0.2644) data time 0.0008 (0.0021) model time 0.2615 (0.2624) loss 5.3551 (5.9724) grad_norm 1.3871 (1.5948) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:30:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][470/625] eta 0:00:40 lr 0.001786 wd 0.0500 time 0.2637 (0.2643) data time 0.0010 (0.0020) model time 0.2627 (0.2624) loss 6.4843 (5.9754) grad_norm 1.7404 (1.5944) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:30:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][480/625] eta 0:00:38 lr 0.001786 wd 0.0500 time 0.2603 (0.2643) data time 0.0010 (0.0020) model time 0.2593 (0.2624) loss 5.7900 (5.9763) grad_norm 1.3836 (1.5895) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:30:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][490/625] eta 0:00:35 lr 0.001786 wd 0.0500 time 0.2714 (0.2643) data time 0.0008 (0.0020) model time 0.2707 (0.2624) loss 4.3438 (5.9789) grad_norm 3.1123 (1.5928) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:30:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][500/625] eta 0:00:33 lr 0.001785 wd 0.0500 time 0.2611 (0.2643) data time 0.0009 (0.0020) model time 0.2602 (0.2624) loss 6.3947 (5.9728) grad_norm 1.7000 (1.5979) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:31:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][510/625] eta 0:00:30 lr 0.001785 wd 0.0500 time 0.2638 (0.2643) data time 0.0011 (0.0020) model time 0.2627 (0.2624) loss 6.1126 (5.9722) grad_norm 1.8023 (1.6068) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:31:03 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][520/625] eta 0:00:27 lr 0.001785 wd 0.0500 time 0.2595 (0.2643) data time 0.0008 (0.0019) model time 0.2587 (0.2625) loss 7.4574 (5.9768) grad_norm 1.7084 (1.6075) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:31:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][530/625] eta 0:00:25 lr 0.001785 wd 0.0500 time 0.2594 (0.2643) data time 0.0009 (0.0019) model time 0.2585 (0.2625) loss 6.0126 (5.9716) grad_norm 1.6823 (1.6049) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:31:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][540/625] eta 0:00:22 lr 0.001785 wd 0.0500 time 0.2601 (0.2642) data time 0.0010 (0.0019) model time 0.2591 (0.2624) loss 6.7507 (5.9754) grad_norm 1.6109 (1.6067) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:31:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][550/625] eta 0:00:19 lr 0.001785 wd 0.0500 time 0.2638 (0.2642) data time 0.0009 (0.0019) model time 0.2628 (0.2624) loss 4.7811 (5.9797) grad_norm 1.5137 (1.6053) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:31:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][560/625] eta 0:00:17 lr 0.001785 wd 0.0500 time 0.2616 (0.2642) data time 0.0010 (0.0019) model time 0.2606 (0.2624) loss 6.5006 (5.9737) grad_norm 1.9475 (1.6127) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:31:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][570/625] eta 0:00:14 lr 0.001785 wd 0.0500 time 0.2689 (0.2643) data time 0.0012 (0.0019) model time 0.2676 (0.2625) loss 6.8351 (5.9743) grad_norm 0.9491 (1.6181) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:31:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][580/625] eta 0:00:11 lr 0.001785 wd 0.0500 time 0.2586 (0.2643) data time 0.0011 
(0.0019) model time 0.2575 (0.2625) loss 6.1428 (5.9802) grad_norm 1.0911 (1.6185) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:31:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][590/625] eta 0:00:09 lr 0.001784 wd 0.0500 time 0.2746 (0.2643) data time 0.0009 (0.0018) model time 0.2736 (0.2625) loss 5.5364 (5.9823) grad_norm 1.1463 (1.6110) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:31:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][600/625] eta 0:00:06 lr 0.001784 wd 0.0500 time 0.2668 (0.2642) data time 0.0010 (0.0018) model time 0.2658 (0.2625) loss 6.9282 (5.9850) grad_norm 1.5397 (1.6095) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:31:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][610/625] eta 0:00:03 lr 0.001784 wd 0.0500 time 0.2622 (0.2642) data time 0.0007 (0.0018) model time 0.2615 (0.2625) loss 5.9145 (5.9824) grad_norm 1.9982 (1.6050) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:31:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][620/625] eta 0:00:01 lr 0.001784 wd 0.0500 time 0.2608 (0.2642) data time 0.0005 (0.0018) model time 0.2603 (0.2625) loss 6.1762 (5.9789) grad_norm 1.2262 (1.6012) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:31:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 79 training takes 0:02:45 [2024-07-29 23:31:30 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-29 23:31:31 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-29 23:31:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.589 (0.589) Loss 0.6519 (0.6519) Acc@1 86.475 (86.475) Acc@5 97.949 (97.949) Mem 9655MB [2024-07-29 23:31:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.109) Loss 1.0889 (0.8118) Acc@1 75.000 (82.142) Acc@5 93.506 (96.436) Mem 9655MB [2024-07-29 23:31:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.083) Loss 1.2344 (0.9824) Acc@1 70.361 (78.009) Acc@5 92.041 (94.362) Mem 9655MB [2024-07-29 23:31:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.729 Acc@5 94.262 [2024-07-29 23:31:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.7% [2024-07-29 23:31:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 77.73% [2024-07-29 23:31:33 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 23:31:35 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-29 23:31:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.463 (0.463) Loss 0.5610 (0.5610) Acc@1 87.109 (87.109) Acc@5 98.145 (98.145) Mem 9655MB [2024-07-29 23:31:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 0.9609 (0.7206) Acc@1 76.709 (83.114) Acc@5 93.848 (96.751) Mem 9655MB [2024-07-29 23:31:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.1250 (0.8741) Acc@1 71.387 (79.281) Acc@5 92.236 (94.917) Mem 9655MB [2024-07-29 23:31:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.003 Acc@5 94.892 [2024-07-29 23:31:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.0% [2024-07-29 23:31:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.00% [2024-07-29 23:31:36 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 23:31:37 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-29 23:31:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][0/625] eta 0:07:33 lr 0.001784 wd 0.0500 time 0.7262 (0.7262) data time 0.4714 (0.4714) model time 0.0000 (0.0000) loss 6.9697 (6.9697) grad_norm 1.3219 (1.3219) loss_scale 8192.0000 (8192.0000) mem 9654MB [2024-07-29 23:31:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][10/625] eta 0:03:07 lr 0.001784 wd 0.0500 time 0.2606 (0.3055) data time 0.0010 (0.0439) model time 0.0000 (0.0000) loss 5.0210 (5.9347) grad_norm 2.4543 (1.8382) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:31:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][20/625] eta 0:02:53 lr 0.001784 wd 0.0500 time 0.2893 (0.2860) data time 0.0008 (0.0235) model time 0.0000 (0.0000) loss 4.4699 (5.8956) grad_norm 1.4769 (1.7073) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:31:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][30/625] eta 0:02:45 lr 0.001784 wd 0.0500 time 0.2630 (0.2786) data time 0.0010 (0.0162) model time 0.0000 (0.0000) loss 5.4677 (5.9441) grad_norm 1.5486 (1.7187) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:31:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][40/625] eta 0:02:40 lr 0.001784 wd 0.0500 time 0.2587 (0.2749) data time 0.0009 (0.0126) model time 0.0000 (0.0000) loss 6.5802 (6.0227) grad_norm 1.2407 (1.6302) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:31:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][50/625] eta 0:02:36 lr 0.001783 wd 0.0500 time 0.2659 (0.2725) data time 0.0010 (0.0103) model time 0.0000 (0.0000) loss 5.4976 (6.0379) grad_norm 2.0566 (1.6321) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:31:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][60/625] eta 0:02:33 lr 0.001783 wd 0.0500 time 0.2638 (0.2708) data time 
0.0010 (0.0088) model time 0.2628 (0.2612) loss 6.0014 (6.0281) grad_norm 1.6774 (1.6462) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:31:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][70/625] eta 0:02:29 lr 0.001783 wd 0.0500 time 0.2623 (0.2701) data time 0.0011 (0.0077) model time 0.2612 (0.2629) loss 6.9303 (6.0892) grad_norm 1.7604 (1.6028) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:31:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][80/625] eta 0:02:26 lr 0.001783 wd 0.0500 time 0.2781 (0.2695) data time 0.0010 (0.0069) model time 0.2771 (0.2633) loss 5.8059 (6.1345) grad_norm 1.1925 (1.6191) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:32:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][90/625] eta 0:02:23 lr 0.001783 wd 0.0500 time 0.2613 (0.2690) data time 0.0011 (0.0063) model time 0.2602 (0.2634) loss 6.2431 (6.1302) grad_norm 2.0285 (1.6351) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:32:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][100/625] eta 0:02:20 lr 0.001783 wd 0.0500 time 0.2640 (0.2682) data time 0.0011 (0.0058) model time 0.2628 (0.2627) loss 5.7972 (6.1014) grad_norm 2.2810 (1.6234) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:32:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][110/625] eta 0:02:17 lr 0.001783 wd 0.0500 time 0.2614 (0.2678) data time 0.0011 (0.0054) model time 0.2603 (0.2626) loss 6.3161 (6.0894) grad_norm 1.1392 (1.6135) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:32:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][120/625] eta 0:02:15 lr 0.001783 wd 0.0500 time 0.2672 (0.2675) data time 0.0010 (0.0050) model time 0.2662 (0.2627) loss 5.7487 (6.1000) grad_norm 2.1491 (1.6198) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:32:12 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][130/625] eta 0:02:12 lr 0.001783 wd 0.0500 time 0.2588 (0.2672) data time 0.0011 (0.0047) model time 0.2577 (0.2626) loss 5.0345 (6.0827) grad_norm 2.5527 (1.6482) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:32:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][140/625] eta 0:02:09 lr 0.001782 wd 0.0500 time 0.2584 (0.2669) data time 0.0011 (0.0045) model time 0.2573 (0.2626) loss 6.0810 (6.0830) grad_norm 1.2437 (1.6429) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:32:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][150/625] eta 0:02:06 lr 0.001782 wd 0.0500 time 0.2684 (0.2666) data time 0.0009 (0.0043) model time 0.2675 (0.2625) loss 7.2678 (6.0973) grad_norm 1.7829 (1.6381) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:32:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][160/625] eta 0:02:03 lr 0.001782 wd 0.0500 time 0.2637 (0.2665) data time 0.0009 (0.0041) model time 0.2629 (0.2625) loss 4.3514 (6.1013) grad_norm 1.2107 (1.6417) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:32:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][170/625] eta 0:02:01 lr 0.001782 wd 0.0500 time 0.2600 (0.2662) data time 0.0010 (0.0039) model time 0.2591 (0.2624) loss 7.1625 (6.0896) grad_norm 1.2032 (1.6310) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:32:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][180/625] eta 0:01:58 lr 0.001782 wd 0.0500 time 0.2641 (0.2662) data time 0.0010 (0.0037) model time 0.2631 (0.2626) loss 6.9261 (6.0992) grad_norm 1.6943 (1.6109) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-29 23:32:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][190/625] eta 0:01:55 lr 0.001782 wd 0.0500 time 0.2701 (0.2660) data time 0.0011 
(0.0036) model time 0.2690 (0.2625) loss 6.4899 (6.0963) grad_norm 1.3882 (1.6055) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:32:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][200/625] eta 0:01:53 lr 0.001782 wd 0.0500 time 0.2638 (0.2659) data time 0.0008 (0.0035) model time 0.2631 (0.2625) loss 7.0443 (6.1105) grad_norm 1.6190 (1.6277) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:32:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][210/625] eta 0:01:50 lr 0.001782 wd 0.0500 time 0.2629 (0.2658) data time 0.0010 (0.0034) model time 0.2619 (0.2625) loss 5.3958 (6.1121) grad_norm 1.6355 (1.6295) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:32:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][220/625] eta 0:01:47 lr 0.001782 wd 0.0500 time 0.2663 (0.2657) data time 0.0008 (0.0033) model time 0.2656 (0.2625) loss 7.0632 (6.1328) grad_norm 1.9224 (1.6255) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:32:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][230/625] eta 0:01:44 lr 0.001781 wd 0.0500 time 0.2684 (0.2656) data time 0.0007 (0.0032) model time 0.2677 (0.2625) loss 6.6766 (6.1274) grad_norm 1.4798 (1.6242) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:32:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][240/625] eta 0:01:42 lr 0.001781 wd 0.0500 time 0.2625 (0.2654) data time 0.0008 (0.0031) model time 0.2617 (0.2624) loss 5.7528 (6.1179) grad_norm 2.5094 (1.6372) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:32:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][250/625] eta 0:01:39 lr 0.001781 wd 0.0500 time 0.2607 (0.2660) data time 0.0010 (0.0030) model time 0.2597 (0.2632) loss 4.9418 (6.1084) grad_norm 1.3554 (1.6368) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:32:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][260/625] eta 0:01:37 lr 0.001781 wd 0.0500 time 0.2603 (0.2660) data time 0.0010 (0.0029) model time 0.2592 (0.2633) loss 6.7162 (6.1079) grad_norm 2.3176 (1.6363) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:32:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][270/625] eta 0:01:34 lr 0.001781 wd 0.0500 time 0.2787 (0.2660) data time 0.0012 (0.0029) model time 0.2775 (0.2633) loss 5.5319 (6.0924) grad_norm 1.8995 (1.6532) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:32:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][280/625] eta 0:01:31 lr 0.001781 wd 0.0500 time 0.2610 (0.2660) data time 0.0009 (0.0028) model time 0.2601 (0.2634) loss 6.7269 (6.0914) grad_norm 1.2861 (1.6535) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:32:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][290/625] eta 0:01:29 lr 0.001781 wd 0.0500 time 0.2582 (0.2660) data time 0.0008 (0.0028) model time 0.2574 (0.2634) loss 5.1441 (6.0906) grad_norm 1.2017 (1.6477) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:32:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][300/625] eta 0:01:26 lr 0.001781 wd 0.0500 time 0.2946 (0.2660) data time 0.0011 (0.0027) model time 0.2935 (0.2635) loss 5.6725 (6.0794) grad_norm 1.8190 (1.6433) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:33:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][310/625] eta 0:01:23 lr 0.001781 wd 0.0500 time 0.2586 (0.2659) data time 0.0011 (0.0027) model time 0.2575 (0.2634) loss 4.6629 (6.0673) grad_norm 0.9819 (1.6516) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:33:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][320/625] eta 0:01:21 lr 0.001780 wd 0.0500 time 0.2681 (0.2659) data time 0.0008 (0.0026) model time 0.2672 (0.2635) loss 6.2586 (6.0571) grad_norm 1.1439 (1.6540) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:33:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][330/625] eta 0:01:18 lr 0.001780 wd 0.0500 time 0.2555 (0.2658) data time 0.0011 (0.0026) model time 0.2544 (0.2635) loss 6.0924 (6.0532) grad_norm 1.5662 (1.6434) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:33:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][340/625] eta 0:01:15 lr 0.001780 wd 0.0500 time 0.2652 (0.2658) data time 0.0010 (0.0025) model time 0.2642 (0.2635) loss 6.8666 (6.0494) grad_norm 1.4438 (1.6434) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:33:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][350/625] eta 0:01:13 lr 0.001780 wd 0.0500 time 0.2640 (0.2657) data time 0.0010 (0.0025) model time 0.2631 (0.2634) loss 6.9327 (6.0428) grad_norm 1.3262 (1.6379) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:33:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][360/625] eta 0:01:10 lr 0.001780 wd 0.0500 time 0.2662 (0.2656) data time 0.0011 (0.0025) model time 0.2651 (0.2633) loss 6.2296 (6.0386) grad_norm 1.9179 (1.6366) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:33:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][370/625] eta 0:01:07 lr 0.001780 wd 0.0500 time 0.2752 (0.2656) data time 0.0008 (0.0024) model time 0.2744 (0.2633) loss 4.8042 (6.0268) grad_norm 1.3198 (1.6305) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:33:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][380/625] eta 0:01:05 lr 0.001780 wd 0.0500 time 0.3827 (0.2659) data time 0.0013 (0.0024) model time 0.3814 (0.2637) loss 6.2836 (6.0144) grad_norm 1.3568 (1.6243) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:33:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][390/625] eta 0:01:02 lr 0.001780 wd 0.0500 time 0.2720 (0.2660) data time 0.0010 (0.0024) model time 0.2710 (0.2639) loss 6.7920 (6.0136) grad_norm 2.4971 (1.6203) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:33:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][400/625] eta 0:00:59 lr 0.001780 wd 0.0500 time 0.2748 (0.2660) data time 0.0013 (0.0024) model time 0.2735 (0.2638) loss 4.3250 (6.0024) grad_norm 1.0964 (1.6140) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:33:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][410/625] eta 0:00:57 lr 0.001779 wd 0.0500 time 0.2676 (0.2659) data time 0.0010 (0.0023) model time 0.2667 (0.2638) loss 5.8310 (6.0011) grad_norm 1.3832 (1.6065) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:33:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][420/625] eta 0:00:54 lr 0.001779 wd 0.0500 time 0.2644 (0.2660) data time 0.0008 (0.0023) model time 0.2637 (0.2639) loss 5.1881 (5.9974) grad_norm 1.4487 (1.6086) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:33:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][430/625] eta 0:00:51 lr 0.001779 wd 0.0500 time 0.2640 (0.2660) data time 0.0008 (0.0023) model time 0.2632 (0.2639) loss 7.4619 (6.0040) grad_norm 1.9745 (1.6164) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:33:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][440/625] eta 0:00:49 lr 0.001779 wd 0.0500 time 0.3446 (0.2662) data time 0.0009 (0.0023) model time 0.3437 (0.2641) loss 6.3730 (5.9991) grad_norm 1.5283 (1.6168) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:33:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][450/625] eta 0:00:46 lr 0.001779 wd 0.0500 time 0.2651 (0.2661) data time 0.0010 (0.0022) model time 0.2641 (0.2641) loss 6.4619 (6.0107) grad_norm 1.7918 (1.6186) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:33:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][460/625] eta 0:00:43 lr 0.001779 wd 0.0500 time 0.2657 (0.2661) data time 0.0007 (0.0022) model time 0.2650 (0.2641) loss 5.8801 (6.0137) grad_norm 1.2482 (1.6101) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:33:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][470/625] eta 0:00:41 lr 0.001779 wd 0.0500 time 0.2647 (0.2660) data time 0.0009 (0.0022) model time 0.2637 (0.2640) loss 6.7067 (6.0083) grad_norm 2.6939 (1.6100) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:33:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][480/625] eta 0:00:38 lr 0.001779 wd 0.0500 time 0.2633 (0.2660) data time 0.0011 (0.0022) model time 0.2622 (0.2640) loss 6.1760 (6.0029) grad_norm 1.1480 (1.6059) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:33:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][490/625] eta 0:00:35 lr 0.001779 wd 0.0500 time 0.2620 (0.2659) data time 0.0010 (0.0022) model time 0.2610 (0.2639) loss 5.5058 (6.0079) grad_norm 1.4909 (1.6000) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:33:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][500/625] eta 0:00:33 lr 0.001778 wd 0.0500 time 0.2707 (0.2659) data time 0.0010 (0.0021) model time 0.2697 (0.2639) loss 6.5144 (6.0004) grad_norm 2.6938 (1.5998) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:33:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][510/625] eta 0:00:30 lr 0.001778 wd 0.0500 time 0.2598 (0.2658) data time 0.0009 (0.0021) model time 0.2589 (0.2639) loss 5.6083 (6.0063) grad_norm 1.1821 (1.6070) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:33:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][520/625] eta 0:00:27 lr 0.001778 wd 0.0500 time 0.2655 (0.2657) data time 0.0010 (0.0021) model time 0.2645 (0.2638) loss 5.7187 (6.0090) grad_norm 1.0642 (1.6020) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:33:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][530/625] eta 0:00:25 lr 0.001778 wd 0.0500 time 0.2669 (0.2658) data time 0.0008 (0.0021) model time 0.2661 (0.2639) loss 6.9466 (6.0139) grad_norm 1.1713 (1.5966) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:34:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][540/625] eta 0:00:22 lr 0.001778 wd 0.0500 time 0.2634 (0.2657) data time 0.0011 (0.0021) model time 0.2623 (0.2638) loss 4.5742 (6.0081) grad_norm 1.3422 (1.5916) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:34:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][550/625] eta 0:00:19 lr 0.001778 wd 0.0500 time 0.2635 (0.2657) data time 0.0008 (0.0021) model time 0.2626 (0.2638) loss 5.8758 (6.0089) grad_norm 1.1279 (1.5885) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:34:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][560/625] eta 0:00:17 lr 0.001778 wd 0.0500 time 0.3397 (0.2658) data time 0.0012 (0.0020) model time 0.3385 (0.2639) loss 5.8948 (6.0087) grad_norm 1.1918 (1.5893) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:34:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][570/625] eta 0:00:14 lr 0.001778 wd 0.0500 time 0.2591 (0.2657) data time 0.0008 (0.0020) model time 0.2583 (0.2639) loss 5.4886 (6.0081) grad_norm 2.6305 (1.5906) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:34:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][580/625] eta 0:00:11 lr 0.001778 wd 0.0500 time 0.2592 (0.2657) data time 0.0011 (0.0020) model time 0.2581 (0.2639) loss 6.2999 (6.0055) grad_norm 1.2934 (1.5919) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:34:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][590/625] eta 0:00:09 lr 0.001777 wd 0.0500 time 0.2599 (0.2657) data time 0.0019 (0.0020) model time 0.2580 (0.2639) loss 5.8004 (6.0053) grad_norm 1.1696 (1.5876) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:34:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][600/625] eta 0:00:06 lr 0.001777 wd 0.0500 time 0.2635 (0.2657) data time 0.0007 (0.0020) model time 0.2628 (0.2638) loss 6.2480 (5.9979) grad_norm 2.8133 (1.5926) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:34:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][610/625] eta 0:00:03 lr 0.001777 wd 0.0500 time 0.2591 (0.2656) data time 0.0005 (0.0020) model time 0.2586 (0.2638) loss 5.8320 (5.9997) grad_norm 1.5669 (1.5945) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:34:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 23:34:22 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 23:34:22 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 23:36:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 23:36:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 23:36:40 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 23:36:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 23:36:53 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 23:36:54 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 23:36:54 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 23:36:54 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 80)
[2024-07-29 23:36:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 23:37:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][620/625] eta 0:00:20 lr 0.001777 wd 0.0500 time 0.9553 (4.0374) data time 0.0005 (0.2326) model time 0.9548 (3.8048) loss 6.7019 (6.6985) grad_norm 1.6443 (1.5154) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-29 23:37:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 80 training takes 0:00:09
[2024-07-29 23:37:07 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 23:37:09 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 23:37:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.399 (0.399) Loss 0.6230 (0.6230) Acc@1 86.719 (86.719) Acc@5 97.705 (97.705) Mem 9656MB
[2024-07-29 23:37:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.088) Loss 1.0938 (0.8067) Acc@1 74.072 (81.960) Acc@5 93.018 (96.271) Mem 9656MB
[2024-07-29 23:37:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.053 (0.072) Loss 1.2031 (0.9676) Acc@1 71.875 (77.932) Acc@5 92.139 (94.292) Mem 9656MB
[2024-07-29 23:37:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.627 Acc@5 94.264
[2024-07-29 23:37:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.6%
[2024-07-29 23:37:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.733 (0.733) Loss 0.5605 (0.5605) Acc@1 87.158 (87.158) Acc@5 98.145 (98.145) Mem 9656MB
[2024-07-29 23:37:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.053 (0.127) Loss 0.9580 (0.7198) Acc@1 76.807 (83.159) Acc@5 93.848 (96.760) Mem 9656MB
[2024-07-29 23:37:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.092) Loss 1.1230 (0.8732) Acc@1 71.533 (79.281) Acc@5 92.188 (94.934) Mem 9656MB
[2024-07-29 23:37:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.011 Acc@5 94.908
[2024-07-29 23:37:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.0%
[2024-07-29 23:37:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.01%
[2024-07-29 23:37:14 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-29 23:37:14 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-29 23:37:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][0/625] eta 0:10:43 lr 0.001777 wd 0.0500 time 1.0292 (1.0292) data time 0.4658 (0.4658) model time 0.0000 (0.0000) loss 6.1567 (6.1567) grad_norm 1.5517 (1.5517) loss_scale 8192.0000 (8192.0000) mem 9651MB
[2024-07-29 23:37:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][10/625] eta 0:03:18 lr 0.001777 wd 0.0500 time 0.2507 (0.3223) data time 0.0008 (0.0432) model time 0.0000 (0.0000) loss 5.8251 (6.3178) grad_norm 1.8122 (1.6299) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:37:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][20/625] eta 0:02:54 lr 0.001777 wd 0.0500 time 0.2547 (0.2886) data time 0.0009 (0.0231) model time 0.0000 (0.0000) loss 6.9589 (6.3972) grad_norm 1.0466 (1.6268) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:37:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][30/625] eta 0:02:44 lr 0.001777 wd 0.0500 time 0.2516 (0.2766) data time 0.0008 (0.0159) model time 0.0000 (0.0000) loss 5.5588 (6.2706) grad_norm 1.0245 (1.5124) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:37:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][40/625] eta 0:02:38 lr 0.001777 wd 0.0500 time 0.2509 (0.2712) data time 0.0008 (0.0123) model time 0.0000 (0.0000) loss 6.4146 (6.1988) grad_norm 1.0519 (1.4873) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:37:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][50/625] eta 0:02:33 lr 0.001776 wd 0.0500 time 0.2555 (0.2675) data time 0.0010 (0.0100) model time 0.0000 (0.0000) loss 6.2480 (6.1884) grad_norm 1.3564 (1.5412) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:37:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][60/625] eta 0:02:29 lr 0.001776 wd 0.0500 time 0.2543 (0.2651) data time 0.0009 (0.0086) model time 0.2534 (0.2518) loss 6.2943 (6.1286) grad_norm 1.2045 (1.6276) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:37:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][70/625] eta 0:02:26 lr 0.001776 wd 0.0500 time 0.2492 (0.2634) data time 0.0010 (0.0075) model time 0.2482 (0.2519) loss 6.6766 (6.0832) grad_norm 2.2014 (1.6105) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:37:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][80/625] eta 0:02:22 lr 0.001776 wd 0.0500 time 0.2521 (0.2622) data time 0.0009 (0.0067) model time 0.2512 (0.2520) loss 5.6923 (6.0417) grad_norm 1.2556 (1.5928) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:37:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][90/625] eta 0:02:19 lr 0.001776 wd 0.0500 time 0.2520 (0.2612) data time 0.0010 (0.0061) model time 0.2510 (0.2521) loss 6.3355 (6.0643) grad_norm 2.0693 (1.5889) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:37:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][100/625] eta 0:02:16 lr 0.001776 wd 0.0500 time 0.2556 (0.2605) data time 0.0006 (0.0055) model time 0.2549 (0.2524) loss 6.1311 (6.1070) grad_norm 1.1789 (1.5752) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:37:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][110/625] eta 0:02:13 lr 0.001776 wd 0.0500 time 0.2525 (0.2599) data time 0.0009 (0.0051) model time 0.2516 (0.2525) loss 6.9409 (6.0880) grad_norm 2.8612 (1.5709) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:37:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][120/625] eta 0:02:10 lr 0.001776 wd 0.0500 time 0.2510 (0.2594) data time 0.0011 (0.0048) model time 0.2498 (0.2525) loss 7.5177 (6.0657) grad_norm 2.5554 (1.5617) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:37:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][130/625] eta 0:02:08 lr 0.001776 wd 0.0500 time 0.2492 (0.2589) data time 0.0008 (0.0045) model time 0.2485 (0.2525) loss 5.2465 (6.0554) grad_norm 1.3598 (1.5601) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:37:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][140/625] eta 0:02:05 lr 0.001775 wd 0.0500 time 0.2553 (0.2586) data time 0.0009 (0.0042) model time 0.2544 (0.2526) loss 5.7348 (6.0508) grad_norm 1.3067 (1.5566) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:37:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][150/625] eta 0:02:02 lr 0.001775 wd 0.0500 time 0.2517 (0.2583) data time 0.0013 (0.0040) model time 0.2504 (0.2527) loss 6.0357 (6.0638) grad_norm 0.9254 (1.5586) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:37:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][160/625] eta 0:02:00 lr 0.001775 wd 0.0500 time 0.2522 (0.2581) data time 0.0010 (0.0038) model time 0.2512 (0.2528) loss 5.6677 (6.0731) grad_norm 1.3329 (1.5552) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:37:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][170/625] eta 0:01:57 lr 0.001775 wd 0.0500 time 0.2533 (0.2578) data time 0.0008 (0.0037) model time 0.2525 (0.2528) loss 5.3915 (6.0567) grad_norm 1.7892 (1.5563) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:38:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][180/625] eta 0:01:54 lr 0.001775 wd 0.0500 time 0.2718 (0.2579) data time 0.0009 (0.0035) model time 0.2709 (0.2532) loss 5.6455 (6.0497) grad_norm 1.5417 (1.5587) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:38:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][190/625] eta 0:01:52 lr 0.001775 wd 0.0500 time 0.2560 (0.2578) data time 0.0007 (0.0034) model time 0.2553 (0.2533) loss 6.1396 (6.0458) grad_norm 2.4266 (1.5609) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:38:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][200/625] eta 0:01:49 lr 0.001775 wd 0.0500 time 0.2616 (0.2576) data time 0.0006 (0.0032) model time 0.2610 (0.2533) loss 6.1082 (6.0171) grad_norm 2.1038 (1.5766) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:38:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][210/625] eta 0:01:46 lr 0.001775 wd 0.0500 time 0.2547 (0.2575) data time 0.0007 (0.0031) model time 0.2540 (0.2534) loss 6.3465 (6.0051) grad_norm 1.5899 (1.5698) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:38:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][220/625] eta 0:01:44 lr 0.001775 wd 0.0500 time 0.2547 (0.2574) data time 0.0008 (0.0030) model time 0.2539 (0.2535) loss 5.6266 (6.0083) grad_norm 1.6556 (1.5681) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:38:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][230/625] eta 0:01:41 lr 0.001774 wd 0.0500 time 0.2596 (0.2574) data time 0.0008 (0.0029) model time 0.2589 (0.2536) loss 5.5536 (6.0003) grad_norm 1.1844 (1.5659) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:38:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][240/625] eta 0:01:39 lr 0.001774 wd 0.0500 time 0.2603 (0.2574) data time 0.0009 (0.0029) model time 0.2593 (0.2537) loss 6.5711 (6.0015) grad_norm 1.6414 (1.5622) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:38:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][250/625] eta 0:01:36 lr 0.001774 wd 0.0500 time 0.2544 (0.2573) data time 0.0007 (0.0028) model time 0.2537 (0.2538) loss 4.7237 (5.9732) grad_norm 1.3115 (1.5598) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:38:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][260/625] eta 0:01:33 lr 0.001774 wd 0.0500 time 0.2509 (0.2573) data time 0.0007 (0.0027) model time 0.2502 (0.2538) loss 4.6792 (5.9622) grad_norm 1.5372 (1.5648) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:38:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][270/625] eta 0:01:31 lr 0.001774 wd 0.0500 time 0.2620 (0.2572) data time 0.0009 (0.0027) model time 0.2611 (0.2539) loss 6.4263 (5.9709) grad_norm 1.9891 (1.5788) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:38:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][280/625] eta 0:01:28 lr 0.001774 wd 0.0500 time 0.2642 (0.2572) data time 0.0009 (0.0026) model time 0.2633 (0.2539) loss 7.0069 (5.9734) grad_norm 2.0379 (1.5930) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:38:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][290/625] eta 0:01:26 lr 0.001774 wd 0.0500 time 0.2529 (0.2571) data time 0.0009 (0.0025) model time 0.2520 (0.2540) loss 4.4089 (5.9556) grad_norm 1.3377 (1.5868) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:38:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][300/625] eta 0:01:23 lr 0.001774 wd 0.0500 time 0.2555 (0.2571) data time 0.0008 (0.0025) model time 0.2548 (0.2541) loss 6.8582 (5.9560) grad_norm 1.4403 (1.5791) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:38:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][310/625] eta 0:01:20 lr 0.001774 wd 0.0500 time 0.2515 (0.2571) data time 0.0010 (0.0024) model time 0.2505 (0.2541) loss 6.7366 (5.9659) grad_norm 0.9550 (1.5662) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-29 23:38:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][320/625] eta 0:01:18 lr 0.001773 wd 0.0500 time 0.2547 (0.2571) data time 0.0007 (0.0024) model time 0.2540 (0.2542) loss 4.7989 (5.9816) grad_norm 1.3323 (1.5608) loss_scale 16384.0000 (8319.6012) mem 9655MB
[2024-07-29 23:38:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][330/625] eta 0:01:16 lr 0.001773 wd 0.0500 time 0.2536 (0.2578) data time 0.0009 (0.0024) model time 0.2527 (0.2551) loss 5.2137 (5.9781) grad_norm 2.4471 (1.5705) loss_scale 16384.0000 (8563.2387) mem 9655MB
[2024-07-29 23:38:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][340/625] eta 0:01:13 lr 0.001773 wd 0.0500 time 0.2556 (0.2578) data time 0.0008 (0.0023) model time 0.2548 (0.2551) loss 4.9077 (5.9841) grad_norm 1.4449 (1.5717) loss_scale 16384.0000 (8792.5865) mem 9655MB
[2024-07-29 23:38:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][350/625] eta 0:01:10 lr 0.001773 wd 0.0500 time 0.2571 (0.2578) data time 0.0009 (0.0023) model time 0.2562 (0.2552) loss 6.2344 (5.9865) grad_norm 1.3682 (1.5672) loss_scale 16384.0000 (9008.8661) mem 9655MB
[2024-07-29 23:38:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][360/625] eta 0:01:08 lr 0.001773 wd 0.0500 time 0.2571 (0.2578) data time 0.0007 (0.0022) model time 0.2564 (0.2552) loss 4.9097 (5.9867) grad_norm 1.8648 (1.5689) loss_scale 16384.0000 (9213.1634) mem 9655MB
[2024-07-29 23:38:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][370/625] eta 0:01:05 lr 0.001773 wd 0.0500 time 0.2645 (0.2577) data time 0.0008 (0.0022) model time 0.2637 (0.2552) loss 5.9915 (5.9841) grad_norm 1.7697 (1.5685) loss_scale 16384.0000 (9406.4474) mem 9655MB
[2024-07-29 23:38:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][380/625] eta 0:01:03 lr 0.001773 wd 0.0500 time 0.2579 (0.2576) data time 0.0007 (0.0022) model time 0.2572 (0.2552) loss 6.5819 (5.9774) grad_norm 2.7883 (1.5700) loss_scale 16384.0000 (9589.5853) mem 9655MB
[2024-07-29 23:38:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][390/625] eta 0:01:00 lr 0.001773 wd 0.0500 time 0.2541 (0.2576) data time 0.0007 (0.0021) model time 0.2534 (0.2552) loss 5.4306 (5.9751) grad_norm 1.3246 (1.5778) loss_scale 16384.0000 (9763.3555) mem 9655MB
[2024-07-29 23:38:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-29 23:38:56 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 23:38:56 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 23:40:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-29 23:40:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-29 23:40:48 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-29 23:41:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-29 23:41:00 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-29 23:41:00 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-29 23:41:00 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-29 23:41:01 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 81)
[2024-07-29 23:41:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-29 23:41:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][400/625] eta 0:04:51 lr 0.001773 wd 0.0500 time 0.2603 (1.2951) data time 0.0008 (0.1094) model time 0.2595 (1.1857) loss 7.2257 (6.8773) grad_norm 1.4696 (1.4756) loss_scale 16384.0000 (16384.0000) mem 9656MB
[2024-07-29 23:41:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][410/625] eta 0:02:35 lr 0.001772 wd 0.0500 time 0.2595 (0.7221) data time 0.0009 (0.0493) model time 0.2586 (0.6728) loss 6.7757 (6.4975) grad_norm 1.8475 (1.4933) loss_scale 16384.0000 (16384.0000) mem 9656MB
[2024-07-29 23:41:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][420/625] eta 0:01:54 lr 0.001772 wd 0.0500 time 0.2618 (0.5580) data time 0.0012 (0.0321) model time 0.2606 (0.5259) loss 6.1880 (6.4615) grad_norm 1.2329 (1.6242) loss_scale 16384.0000 (16384.0000) mem 9656MB
[2024-07-29 23:41:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][430/625] eta 0:01:33 lr 0.001772 wd 0.0500 time 0.2613 (0.4807) data time 0.0011 (0.0239) model time 0.2603 (0.4567) loss 6.9350 (6.4238) grad_norm 1.3798 (1.6836) loss_scale 16384.0000 (16384.0000) mem 9656MB
[2024-07-29 23:41:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][440/625] eta 0:01:20 lr 0.001772 wd 0.0500 time 0.2607 (0.4353) data time 0.0009 (0.0192) model time 0.2598 (0.4161) loss 6.0043 (6.3511) grad_norm 1.4113 (1.6896) loss_scale 16384.0000 (16384.0000) mem 9656MB
[2024-07-29 23:41:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][450/625] eta 0:01:11 lr 0.001772 wd 0.0500 time 0.2610 (0.4060) data time 0.0009 (0.0161) model time 0.2601 (0.3899) loss 5.4798 (6.3154) grad_norm 0.9427 (1.6390) loss_scale 16384.0000 (16384.0000) mem 9656MB
[2024-07-29 23:41:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][460/625] eta 0:01:03 lr 0.001772 wd 0.0500 time 0.2624 (0.3849) data time 0.0010 (0.0139) model time 0.2615 (0.3710) loss 4.4972 (6.2709) grad_norm 1.4210 (1.6342) loss_scale 16384.0000 (16384.0000) mem 9656MB
[2024-07-29 23:41:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][470/625] eta 0:00:57 lr 0.001772 wd 0.0500 time 0.2619 (0.3691) data time 0.0008 (0.0123) model time 0.2611 (0.3568) loss 5.0837 (6.2174) grad_norm 1.3720 (1.6257) loss_scale 16384.0000 (16384.0000) mem 9656MB
[2024-07-29 23:41:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][480/625] eta 0:00:51 lr 0.001772 wd 0.0500 time 0.2648 (0.3571) data time 0.0009 (0.0110) model time 0.2639 (0.3461) loss 6.1294 (6.1690) grad_norm 1.0617 (1.6292) loss_scale 16384.0000 (16384.0000) mem 9656MB
[2024-07-29 23:41:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][490/625] eta 0:00:46 lr 0.001772 wd 0.0500 time 0.2649 (0.3474) data time 0.0008 (0.0100) model time 0.2641 (0.3374) loss 7.1760 (6.1873) grad_norm 2.7762 (1.6321) loss_scale 16384.0000 (16384.0000) mem 9656MB
[2024-07-29 23:41:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][500/625] eta 0:00:42 lr 0.001771 wd 0.0500 time 0.2651 (0.3399) data time 0.0007 (0.0092) model time 0.2643 (0.3307) loss 4.8723 (6.2002) grad_norm 1.3662 (1.6439) loss_scale 16384.0000 (16384.0000) mem 9656MB
[2024-07-29 23:41:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][510/625] eta 0:00:38 lr 0.001771 wd 0.0500 time 0.2628 (0.3335) data time 0.0011 (0.0085) model time 0.2617 (0.3249) loss 6.1430 (6.2108) grad_norm 1.1583 (1.6553) loss_scale 16384.0000 (16384.0000) mem 9656MB
[2024-07-29 23:41:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][520/625] eta 0:00:34 lr 0.001771 wd 0.0500 time 0.2590 (0.3279) data time 0.0009 (0.0079) model time 0.2581 (0.3199) loss 5.9764 (6.1842) grad_norm 1.3264 (1.6505) loss_scale 16384.0000 (16384.0000) mem 9656MB
[2024-07-29 23:41:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][530/625] eta 0:00:30 lr 0.001771 wd 0.0500 time 0.2647 (0.3232) data time 0.0010 (0.0074) model time 0.2638 (0.3158) loss 5.7409 (6.1552) grad_norm 2.0736 (1.6460) loss_scale 16384.0000 (16384.0000) mem 9656MB
[2024-07-29 23:41:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][540/625] eta 0:00:27 lr 0.001771 wd 0.0500 time 0.2620 (0.3191) data time 0.0008 (0.0070) model time 0.2612 (0.3121) loss 7.0050 (6.1499) grad_norm 2.0881 (1.6455) loss_scale 16384.0000 (16384.0000) mem 9656MB
[2024-07-29 23:41:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][550/625] eta 0:00:23 lr 0.001771 wd 0.0500 time 0.2557 (0.3155) data time 0.0009 (0.0066) model time 0.2548 (0.3088) loss 5.3212 (6.1193) grad_norm 0.9931 (1.6239) loss_scale 16384.0000 (16384.0000) mem 9656MB
[2024-07-29 23:41:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][560/625] eta 0:00:20 lr 0.001771 wd 0.0500 time 0.2576 (0.3124) data time 0.0013 (0.0063) model time 0.2563 (0.3061) loss 6.9588 (6.1375) grad_norm 1.3190 (1.6056) loss_scale 16384.0000 (16384.0000) mem 9656MB
[2024-07-29 23:42:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][570/625] eta 0:00:17 lr 0.001771 wd 0.0500 time 0.2659 (0.3097) data time 0.0007 (0.0060) model time 0.2651 (0.3037) loss 4.9792 (6.1034) grad_norm 1.1083 (1.5970) loss_scale 16384.0000 (16384.0000) mem 9656MB
[2024-07-29 23:42:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][580/625] eta 0:00:13 lr 0.001770 wd 0.0500 time 0.2618 (0.3075) data time 0.0009 (0.0058) model time 0.2609 (0.3017) loss 6.3604 (6.0977) grad_norm 1.4958 (1.5841) loss_scale 16384.0000 (16384.0000) mem 9656MB
[2024-07-29 23:42:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][590/625] eta 0:00:10 lr 0.001770 wd 0.0500 time 0.2744 (0.3053) data time 0.0010 (0.0055) model time 0.2734 (0.2998) loss 5.3527 (6.0849) grad_norm 1.1933 (1.5745) loss_scale 16384.0000 (16384.0000) mem 9656MB
[2024-07-29 23:42:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][600/625] eta 0:00:07 lr 0.001770 wd 0.0500 time 0.2594 (0.3033) data time 0.0011 (0.0054) model time 0.2583 (0.2980) loss 6.5325 (6.0778) grad_norm 1.3296 (1.5797) loss_scale 16384.0000 (16384.0000) mem 9656MB
[2024-07-29 23:42:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][610/625] eta 0:00:04 lr 0.001770 wd 0.0500 time 0.2615 (0.3015) data time 0.0007 (0.0052) model time 0.2608 (0.2964) loss 6.1146 (6.0678) grad_norm 1.1054 (1.5797) loss_scale 16384.0000 (16384.0000) mem 9656MB
[2024-07-29 23:42:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][620/625] eta 0:00:01 lr 0.001770 wd 0.0500 time 0.2642 (0.2999) data time 0.0007 (0.0050) model time 0.2635 (0.2949) loss 7.1993 (6.0811) grad_norm 2.0863 (1.5957) loss_scale 16384.0000 (16384.0000) mem 9656MB
[2024-07-29 23:42:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 81 training takes 0:01:09
[2024-07-29 23:42:15 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-29 23:42:16 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-29 23:42:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.455 (0.455) Loss 0.6597 (0.6597) Acc@1 87.158 (87.158) Acc@5 97.852 (97.852) Mem 9656MB [2024-07-29 23:42:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.100) Loss 1.0596 (0.8136) Acc@1 75.537 (82.040) Acc@5 93.457 (96.413) Mem 9656MB [2024-07-29 23:42:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.2295 (0.9797) Acc@1 71.533 (78.204) Acc@5 92.139 (94.289) Mem 9656MB [2024-07-29 23:42:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.843 Acc@5 94.236 [2024-07-29 23:42:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.8% [2024-07-29 23:42:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 77.84% [2024-07-29 23:42:20 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-29 23:42:22 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-29 23:42:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.450 (0.450) Loss 0.5591 (0.5591) Acc@1 87.109 (87.109) Acc@5 98.145 (98.145) Mem 9656MB [2024-07-29 23:42:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.094) Loss 0.9570 (0.7191) Acc@1 76.709 (83.216) Acc@5 93.945 (96.782) Mem 9656MB [2024-07-29 23:42:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.1201 (0.8716) Acc@1 71.729 (79.357) Acc@5 92.334 (94.964) Mem 9656MB [2024-07-29 23:42:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.107 Acc@5 94.936 [2024-07-29 23:42:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.1% [2024-07-29 23:42:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.11% [2024-07-29 23:42:23 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-29 23:42:25 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-29 23:42:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][0/625] eta 0:08:17 lr 0.001770 wd 0.0500 time 0.7963 (0.7963) data time 0.4621 (0.4621) model time 0.0000 (0.0000) loss 5.7331 (5.7331) grad_norm 3.3490 (3.3490) loss_scale 16384.0000 (16384.0000) mem 9651MB [2024-07-29 23:42:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][10/625] eta 0:03:12 lr 0.001770 wd 0.0500 time 0.2584 (0.3128) data time 0.0010 (0.0430) model time 0.0000 (0.0000) loss 7.3515 (6.3936) grad_norm 1.6290 (1.8520) loss_scale 16384.0000 (16384.0000) mem 9655MB [2024-07-29 23:42:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][20/625] eta 0:02:55 lr 0.001770 wd 0.0500 time 0.2633 (0.2908) data time 0.0010 (0.0230) model time 0.0000 (0.0000) loss 6.3331 (6.1189) grad_norm 1.7718 (1.8128) loss_scale 16384.0000 (16384.0000) mem 9655MB [2024-07-29 23:42:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][30/625] eta 0:02:48 lr 0.001770 wd 0.0500 time 0.2571 (0.2824) data time 0.0008 (0.0160) model time 0.0000 (0.0000) loss 5.8044 (5.9630) grad_norm 1.4936 (1.6975) loss_scale 16384.0000 (16384.0000) mem 9655MB [2024-07-29 23:42:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][40/625] eta 0:02:42 lr 0.001770 wd 0.0500 time 0.2589 (0.2782) data time 0.0009 (0.0123) model time 0.0000 (0.0000) loss 6.7098 (5.9322) grad_norm 1.3386 (1.6290) loss_scale 16384.0000 (16384.0000) mem 9655MB [2024-07-29 23:42:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][50/625] eta 0:02:38 lr 0.001769 wd 0.0500 time 0.2681 (0.2756) data time 0.0010 (0.0102) model time 0.0000 (0.0000) loss 6.5684 (5.9378) grad_norm 1.6068 (1.5677) loss_scale 16384.0000 (16384.0000) mem 9655MB [2024-07-29 23:42:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][60/625] eta 0:02:34 lr 0.001769 wd 0.0500 time 0.2691 
(0.2740) data time 0.0011 (0.0087) model time 0.2680 (0.2642) loss 5.4373 (5.9186) grad_norm 0.9797 (1.5788) loss_scale 16384.0000 (16384.0000) mem 9655MB [2024-07-29 23:42:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][70/625] eta 0:02:31 lr 0.001769 wd 0.0500 time 0.2659 (0.2727) data time 0.0009 (0.0076) model time 0.2649 (0.2641) loss 6.0956 (5.8594) grad_norm 1.1045 (1.6363) loss_scale 16384.0000 (16384.0000) mem 9655MB [2024-07-29 23:42:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][80/625] eta 0:02:28 lr 0.001769 wd 0.0500 time 0.2594 (0.2720) data time 0.0010 (0.0068) model time 0.2584 (0.2646) loss 6.6717 (5.8978) grad_norm 2.0510 (1.6877) loss_scale 16384.0000 (16384.0000) mem 9655MB [2024-07-29 23:42:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][90/625] eta 0:02:25 lr 0.001769 wd 0.0500 time 0.2628 (0.2712) data time 0.0012 (0.0062) model time 0.2616 (0.2644) loss 6.7711 (5.9736) grad_norm 1.2273 (1.6807) loss_scale 16384.0000 (16384.0000) mem 9655MB [2024-07-29 23:42:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][100/625] eta 0:02:22 lr 0.001769 wd 0.0500 time 0.2679 (0.2707) data time 0.0011 (0.0057) model time 0.2668 (0.2644) loss 7.3025 (5.9893) grad_norm 1.1262 (1.6640) loss_scale 16384.0000 (16384.0000) mem 9655MB [2024-07-29 23:42:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][110/625] eta 0:02:19 lr 0.001769 wd 0.0500 time 0.2625 (0.2703) data time 0.0012 (0.0053) model time 0.2613 (0.2646) loss 6.6294 (6.0030) grad_norm 1.2127 (1.6361) loss_scale 16384.0000 (16384.0000) mem 9655MB [2024-07-29 23:42:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-29 23:42:57 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... 
[2024-07-29 23:42:59 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-30 06:57:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-30 06:57:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-30 06:57:17 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-30 06:57:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-30 06:57:29 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-30 06:57:29 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-30 06:57:29 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-30 06:57:29 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 82) [2024-07-30 06:57:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-30 06:57:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][120/625] eta 0:34:33 lr 0.001769 wd 0.0500 time 0.8516 (4.1059) data time 0.0009 (0.3493) model time 0.8506 (3.7566) loss 6.4489 (6.6625) grad_norm 1.4144 (1.3515) loss_scale 16384.0000 (16384.0000) mem 9656MB [2024-07-30 06:57:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][130/625] eta 0:07:25 lr 0.001768 wd 0.0500 time 0.2561 (0.9000) data time 0.0009 (0.0592) model time 0.2551 (0.8409) loss 4.9366 (6.2356) 
grad_norm 1.3015 (1.5914) loss_scale 16384.0000 (16384.0000) mem 9656MB [2024-07-30 06:57:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][140/625] eta 0:04:55 lr 0.001768 wd 0.0500 time 0.2606 (0.6086) data time 0.0010 (0.0328) model time 0.2596 (0.5758) loss 6.5668 (6.2628) grad_norm 1.0709 (1.5533) loss_scale 16384.0000 (16384.0000) mem 9656MB [2024-07-30 06:57:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][150/625] eta 0:03:57 lr 0.001768 wd 0.0500 time 0.2585 (0.4999) data time 0.0009 (0.0229) model time 0.2576 (0.4770) loss 5.8842 (6.2712) grad_norm 1.4719 (1.5948) loss_scale 16384.0000 (16384.0000) mem 9656MB [2024-07-30 06:57:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][160/625] eta 0:03:26 lr 0.001768 wd 0.0500 time 0.2587 (0.4435) data time 0.0010 (0.0177) model time 0.2577 (0.4258) loss 6.9503 (6.1812) grad_norm 1.4146 (1.6185) loss_scale 16384.0000 (16384.0000) mem 9656MB [2024-07-30 06:57:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][170/625] eta 0:03:06 lr 0.001768 wd 0.0500 time 0.2600 (0.4088) data time 0.0009 (0.0145) model time 0.2591 (0.3943) loss 5.7947 (6.1656) grad_norm 1.4419 (1.5768) loss_scale 16384.0000 (16384.0000) mem 9656MB [2024-07-30 06:57:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][180/625] eta 0:02:51 lr 0.001768 wd 0.0500 time 0.2591 (0.3847) data time 0.0010 (0.0123) model time 0.2582 (0.3724) loss 7.0356 (6.1307) grad_norm 1.5259 (inf) loss_scale 8192.0000 (15459.0968) mem 9656MB [2024-07-30 06:58:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][190/625] eta 0:02:39 lr 0.001768 wd 0.0500 time 0.2566 (0.3671) data time 0.0013 (0.0108) model time 0.2553 (0.3563) loss 5.3953 (6.0831) grad_norm 1.7823 (inf) loss_scale 8192.0000 (14449.7778) mem 9656MB [2024-07-30 06:58:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO 
Train: [82/300][200/625] eta 0:02:30 lr 0.001768 wd 0.0500 time 0.2628 (0.3538) data time 0.0010 (0.0096) model time 0.2618 (0.3443) loss 5.9630 (6.0931) grad_norm 2.0027 (inf) loss_scale 8192.0000 (13686.6341) mem 9656MB [2024-07-30 06:58:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][210/625] eta 0:02:22 lr 0.001768 wd 0.0500 time 0.3102 (0.3441) data time 0.0008 (0.0087) model time 0.3095 (0.3354) loss 4.8528 (6.0726) grad_norm 1.3618 (inf) loss_scale 8192.0000 (13089.3913) mem 9656MB [2024-07-30 06:58:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][220/625] eta 0:02:15 lr 0.001767 wd 0.0500 time 0.2581 (0.3358) data time 0.0007 (0.0079) model time 0.2573 (0.3278) loss 6.5573 (6.1123) grad_norm 1.8116 (inf) loss_scale 8192.0000 (12609.2549) mem 9656MB [2024-07-30 06:58:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][230/625] eta 0:02:09 lr 0.001767 wd 0.0500 time 0.2607 (0.3291) data time 0.0010 (0.0073) model time 0.2597 (0.3218) loss 6.3591 (6.1168) grad_norm 1.6545 (inf) loss_scale 8192.0000 (12214.8571) mem 9656MB [2024-07-30 06:58:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][240/625] eta 0:02:04 lr 0.001767 wd 0.0500 time 0.2588 (0.3235) data time 0.0010 (0.0068) model time 0.2578 (0.3167) loss 6.6920 (6.1195) grad_norm 2.2569 (inf) loss_scale 8192.0000 (11885.1148) mem 9656MB [2024-07-30 06:58:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][250/625] eta 0:01:59 lr 0.001767 wd 0.0500 time 0.2585 (0.3186) data time 0.0010 (0.0064) model time 0.2575 (0.3123) loss 5.5712 (6.1058) grad_norm 1.5275 (inf) loss_scale 8192.0000 (11605.3333) mem 9656MB [2024-07-30 06:58:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][260/625] eta 0:01:54 lr 0.001767 wd 0.0500 time 0.2577 (0.3146) data time 0.0010 (0.0060) model time 0.2566 (0.3086) loss 5.8960 (6.0872) grad_norm 1.8422 
(inf) loss_scale 8192.0000 (11364.9577) mem 9656MB [2024-07-30 06:58:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][270/625] eta 0:01:50 lr 0.001767 wd 0.0500 time 0.2587 (0.3110) data time 0.0011 (0.0057) model time 0.2576 (0.3054) loss 6.7575 (6.0844) grad_norm 1.2612 (inf) loss_scale 8192.0000 (11156.2105) mem 9656MB [2024-07-30 06:58:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][280/625] eta 0:01:46 lr 0.001767 wd 0.0500 time 0.2592 (0.3079) data time 0.0011 (0.0054) model time 0.2581 (0.3025) loss 5.7525 (6.0866) grad_norm 1.0494 (inf) loss_scale 8192.0000 (10973.2346) mem 9656MB [2024-07-30 06:58:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][290/625] eta 0:01:42 lr 0.001767 wd 0.0500 time 0.2599 (0.3051) data time 0.0012 (0.0051) model time 0.2587 (0.2999) loss 6.0048 (6.0933) grad_norm 1.7917 (inf) loss_scale 8192.0000 (10811.5349) mem 9656MB [2024-07-30 06:58:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][300/625] eta 0:01:38 lr 0.001767 wd 0.0500 time 0.2570 (0.3026) data time 0.0011 (0.0049) model time 0.2559 (0.2977) loss 6.2191 (6.0733) grad_norm 1.8209 (inf) loss_scale 8192.0000 (10667.6044) mem 9656MB [2024-07-30 06:58:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][310/625] eta 0:01:34 lr 0.001766 wd 0.0500 time 0.2613 (0.3006) data time 0.0012 (0.0047) model time 0.2601 (0.2959) loss 6.1288 (6.0691) grad_norm 1.1895 (inf) loss_scale 8192.0000 (10538.6667) mem 9656MB [2024-07-30 06:58:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][320/625] eta 0:01:31 lr 0.001766 wd 0.0500 time 0.2728 (0.2988) data time 0.0008 (0.0045) model time 0.2720 (0.2943) loss 6.5410 (6.0504) grad_norm 1.9098 (inf) loss_scale 8192.0000 (10422.4950) mem 9656MB [2024-07-30 06:58:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][330/625] eta 0:01:27 lr 
0.001766 wd 0.0500 time 0.2614 (0.2970) data time 0.0011 (0.0044) model time 0.2603 (0.2927) loss 6.2475 (6.0450) grad_norm 2.9687 (inf) loss_scale 8192.0000 (10317.2830) mem 9656MB [2024-07-30 06:58:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][340/625] eta 0:01:24 lr 0.001766 wd 0.0500 time 0.2593 (0.2955) data time 0.0009 (0.0042) model time 0.2583 (0.2912) loss 6.8462 (6.0414) grad_norm 1.5013 (inf) loss_scale 8192.0000 (10221.5495) mem 9656MB [2024-07-30 06:58:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][350/625] eta 0:01:20 lr 0.001766 wd 0.0500 time 0.2654 (0.2942) data time 0.0009 (0.0041) model time 0.2646 (0.2901) loss 5.6967 (6.0391) grad_norm 1.3308 (inf) loss_scale 8192.0000 (10134.0690) mem 9656MB [2024-07-30 06:58:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][360/625] eta 0:01:17 lr 0.001766 wd 0.0500 time 0.2628 (0.2929) data time 0.0008 (0.0040) model time 0.2620 (0.2889) loss 6.7506 (6.0363) grad_norm 1.3536 (inf) loss_scale 8192.0000 (10053.8182) mem 9656MB [2024-07-30 06:58:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][370/625] eta 0:01:14 lr 0.001766 wd 0.0500 time 0.2623 (0.2916) data time 0.0008 (0.0038) model time 0.2615 (0.2878) loss 7.3062 (6.0303) grad_norm 2.6201 (inf) loss_scale 8192.0000 (9979.9365) mem 9656MB [2024-07-30 06:58:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][380/625] eta 0:01:11 lr 0.001766 wd 0.0500 time 0.2600 (0.2905) data time 0.0012 (0.0037) model time 0.2588 (0.2867) loss 6.5635 (6.0221) grad_norm 1.7560 (inf) loss_scale 8192.0000 (9911.6947) mem 9656MB [2024-07-30 06:58:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][390/625] eta 0:01:08 lr 0.001766 wd 0.0500 time 0.2600 (0.2894) data time 0.0010 (0.0036) model time 0.2590 (0.2858) loss 5.3008 (6.0130) grad_norm 2.5228 (inf) loss_scale 8192.0000 (9848.4706) mem 
9656MB [2024-07-30 06:58:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][400/625] eta 0:01:04 lr 0.001765 wd 0.0500 time 0.2580 (0.2884) data time 0.0011 (0.0035) model time 0.2569 (0.2848) loss 4.5593 (6.0234) grad_norm 2.3941 (inf) loss_scale 8192.0000 (9789.7305) mem 9656MB [2024-07-30 06:58:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][410/625] eta 0:01:01 lr 0.001765 wd 0.0500 time 0.2620 (0.2875) data time 0.0010 (0.0035) model time 0.2610 (0.2840) loss 5.5186 (6.0241) grad_norm 1.5541 (inf) loss_scale 8192.0000 (9735.0137) mem 9656MB [2024-07-30 06:59:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][420/625] eta 0:00:58 lr 0.001765 wd 0.0500 time 0.2592 (0.2866) data time 0.0012 (0.0034) model time 0.2581 (0.2833) loss 6.6741 (6.0184) grad_norm 1.2185 (inf) loss_scale 8192.0000 (9683.9205) mem 9656MB [2024-07-30 06:59:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][430/625] eta 0:00:55 lr 0.001765 wd 0.0500 time 0.2649 (0.2859) data time 0.0010 (0.0033) model time 0.2639 (0.2826) loss 6.3790 (6.0149) grad_norm 1.0751 (inf) loss_scale 8192.0000 (9636.1026) mem 9656MB [2024-07-30 06:59:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][440/625] eta 0:00:52 lr 0.001765 wd 0.0500 time 0.2645 (0.2851) data time 0.0010 (0.0032) model time 0.2635 (0.2819) loss 5.2707 (6.0246) grad_norm 2.7793 (inf) loss_scale 8192.0000 (9591.2547) mem 9656MB [2024-07-30 06:59:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][450/625] eta 0:00:49 lr 0.001765 wd 0.0500 time 0.2571 (0.2844) data time 0.0011 (0.0032) model time 0.2560 (0.2812) loss 7.2398 (6.0281) grad_norm 1.5017 (inf) loss_scale 8192.0000 (9549.1084) mem 9656MB [2024-07-30 06:59:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][460/625] eta 0:00:46 lr 0.001765 wd 0.0500 time 0.2624 (0.2837) data time 
0.0010 (0.0031) model time 0.2613 (0.2806) loss 6.0057 (6.0339) grad_norm 1.7267 (inf) loss_scale 8192.0000 (9509.4269) mem 9656MB [2024-07-30 06:59:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][470/625] eta 0:00:43 lr 0.001765 wd 0.0500 time 0.2571 (0.2831) data time 0.0009 (0.0031) model time 0.2562 (0.2800) loss 6.2209 (6.0359) grad_norm 1.4508 (inf) loss_scale 8192.0000 (9472.0000) mem 9656MB [2024-07-30 06:59:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][480/625] eta 0:00:40 lr 0.001764 wd 0.0500 time 0.2595 (0.2825) data time 0.0009 (0.0030) model time 0.2585 (0.2795) loss 7.4682 (6.0375) grad_norm 1.2313 (inf) loss_scale 8192.0000 (9436.6409) mem 9656MB [2024-07-30 06:59:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][490/625] eta 0:00:38 lr 0.001764 wd 0.0500 time 0.2577 (0.2820) data time 0.0012 (0.0030) model time 0.2565 (0.2790) loss 6.3154 (6.0353) grad_norm 2.4728 (inf) loss_scale 8192.0000 (9403.1828) mem 9656MB [2024-07-30 06:59:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][500/625] eta 0:00:35 lr 0.001764 wd 0.0500 time 0.2569 (0.2814) data time 0.0009 (0.0029) model time 0.2560 (0.2785) loss 6.4847 (6.0332) grad_norm 1.9655 (inf) loss_scale 8192.0000 (9371.4764) mem 9656MB [2024-07-30 06:59:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][510/625] eta 0:00:32 lr 0.001764 wd 0.0500 time 0.2593 (0.2810) data time 0.0009 (0.0029) model time 0.2584 (0.2782) loss 5.3943 (6.0196) grad_norm 1.4796 (inf) loss_scale 8192.0000 (9341.3878) mem 9656MB [2024-07-30 06:59:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][520/625] eta 0:00:29 lr 0.001764 wd 0.0500 time 0.2559 (0.2805) data time 0.0011 (0.0028) model time 0.2548 (0.2777) loss 6.9427 (6.0359) grad_norm 1.4916 (inf) loss_scale 8192.0000 (9312.7960) mem 9656MB [2024-07-30 06:59:29 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [82/300][530/625] eta 0:00:26 lr 0.001764 wd 0.0500 time 0.2583 (0.2800) data time 0.0009 (0.0028) model time 0.2574 (0.2773) loss 6.3405 (6.0398) grad_norm 1.8184 (inf) loss_scale 8192.0000 (9285.5922) mem 9656MB [2024-07-30 06:59:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][540/625] eta 0:00:23 lr 0.001764 wd 0.0500 time 0.2608 (0.2803) data time 0.0007 (0.0027) model time 0.2600 (0.2776) loss 6.2293 (6.0375) grad_norm 1.2552 (inf) loss_scale 8192.0000 (9259.6777) mem 9656MB [2024-07-30 06:59:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][550/625] eta 0:00:20 lr 0.001764 wd 0.0500 time 0.2585 (0.2799) data time 0.0007 (0.0027) model time 0.2577 (0.2772) loss 5.3008 (6.0429) grad_norm 1.3535 (inf) loss_scale 8192.0000 (9234.9630) mem 9656MB [2024-07-30 06:59:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][560/625] eta 0:00:18 lr 0.001764 wd 0.0500 time 0.2634 (0.2796) data time 0.0010 (0.0027) model time 0.2624 (0.2769) loss 6.1461 (6.0502) grad_norm 1.1832 (inf) loss_scale 8192.0000 (9211.3665) mem 9656MB [2024-07-30 06:59:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][570/625] eta 0:00:15 lr 0.001763 wd 0.0500 time 0.2615 (0.2792) data time 0.0010 (0.0026) model time 0.2605 (0.2765) loss 5.8792 (6.0447) grad_norm 2.3705 (inf) loss_scale 8192.0000 (9188.8142) mem 9656MB [2024-07-30 06:59:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][580/625] eta 0:00:12 lr 0.001763 wd 0.0500 time 0.2773 (0.2788) data time 0.0007 (0.0026) model time 0.2766 (0.2762) loss 5.1698 (6.0341) grad_norm 2.1825 (inf) loss_scale 8192.0000 (9167.2381) mem 9656MB [2024-07-30 06:59:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][590/625] eta 0:00:09 lr 0.001763 wd 0.0500 time 0.2600 (0.2784) data time 0.0010 (0.0025) model time 0.2589 (0.2759) loss 5.7082 
(6.0218) grad_norm 1.6173 (inf) loss_scale 8192.0000 (9146.5763) mem 9656MB [2024-07-30 06:59:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][600/625] eta 0:00:06 lr 0.001763 wd 0.0500 time 0.2665 (0.2781) data time 0.0007 (0.0025) model time 0.2658 (0.2756) loss 7.8159 (6.0226) grad_norm 1.8277 (inf) loss_scale 8192.0000 (9126.7718) mem 9656MB [2024-07-30 06:59:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][610/625] eta 0:00:04 lr 0.001763 wd 0.0500 time 0.2609 (0.2778) data time 0.0007 (0.0025) model time 0.2602 (0.2753) loss 7.2030 (6.0285) grad_norm 1.2964 (inf) loss_scale 8192.0000 (9107.7724) mem 9656MB [2024-07-30 06:59:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][620/625] eta 0:00:01 lr 0.001763 wd 0.0500 time 0.2623 (0.2775) data time 0.0007 (0.0025) model time 0.2616 (0.2750) loss 6.0710 (6.0240) grad_norm 0.9528 (inf) loss_scale 8192.0000 (9089.5299) mem 9656MB [2024-07-30 06:59:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 82 training takes 0:02:20 [2024-07-30 06:59:54 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 06:59:55 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-30 06:59:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.448 (0.448) Loss 0.6772 (0.6772) Acc@1 86.328 (86.328) Acc@5 98.047 (98.047) Mem 9656MB [2024-07-30 06:59:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.059 (0.096) Loss 1.1006 (0.8375) Acc@1 75.635 (82.302) Acc@5 93.604 (96.444) Mem 9656MB [2024-07-30 06:59:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.077) Loss 1.2891 (1.0042) Acc@1 69.727 (78.244) Acc@5 90.918 (94.382) Mem 9656MB [2024-07-30 06:59:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.971 Acc@5 94.308 [2024-07-30 06:59:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.0% [2024-07-30 06:59:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 77.97% [2024-07-30 06:59:59 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-30 06:59:59 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-30 07:00:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.520 (0.520) Loss 0.5586 (0.5586) Acc@1 87.158 (87.158) Acc@5 98.096 (98.096) Mem 9656MB [2024-07-30 07:00:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.100) Loss 0.9556 (0.7186) Acc@1 76.807 (83.274) Acc@5 93.945 (96.755) Mem 9656MB [2024-07-30 07:00:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.079) Loss 1.1191 (0.8708) Acc@1 71.826 (79.422) Acc@5 92.236 (94.947) Mem 9656MB [2024-07-30 07:00:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.163 Acc@5 94.922 [2024-07-30 07:00:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.2% [2024-07-30 07:00:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.16% [2024-07-30 07:00:01 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-30 07:00:02 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-30 07:00:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][0/625] eta 0:24:48 lr 0.001763 wd 0.0500 time 2.3821 (2.3821) data time 0.5771 (0.5771) model time 0.0000 (0.0000) loss 6.3529 (6.3529) grad_norm 2.5082 (2.5082) loss_scale 8192.0000 (8192.0000) mem 9651MB
[2024-07-30 07:00:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][10/625] eta 0:04:39 lr 0.001763 wd 0.0500 time 0.2581 (0.4537) data time 0.0011 (0.0535) model time 0.0000 (0.0000) loss 5.8033 (6.4060) grad_norm 1.1405 (1.9164) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 07:00:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][20/625] eta 0:03:39 lr 0.001763 wd 0.0500 time 0.2634 (0.3627) data time 0.0010 (0.0285) model time 0.0000 (0.0000) loss 6.5545 (6.1345) grad_norm 1.7576 (1.7127) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 07:00:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][30/625] eta 0:03:16 lr 0.001762 wd 0.0500 time 0.2623 (0.3296) data time 0.0007 (0.0196) model time 0.0000 (0.0000) loss 5.1823 (6.0308) grad_norm 1.6874 (1.6182) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 07:00:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][40/625] eta 0:03:03 lr 0.001762 wd 0.0500 time 0.2645 (0.3130) data time 0.0007 (0.0151) model time 0.0000 (0.0000) loss 6.6219 (6.0350) grad_norm 1.4421 (1.5733) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 07:00:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][50/625] eta 0:02:54 lr 0.001762 wd 0.0500 time 0.2597 (0.3030) data time 0.0010 (0.0123) model time 0.0000 (0.0000) loss 4.8719 (6.1001) grad_norm 1.2751 (1.5638) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 07:00:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][60/625] eta 0:02:47 lr 0.001762 wd 0.0500 time 0.2672 (0.2963) data time 0.0010 (0.0105) model time 0.2662 (0.2606) loss 6.5422 (6.1358) grad_norm 1.6962 (1.5244) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 07:00:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][70/625] eta 0:02:42 lr 0.001762 wd 0.0500 time 0.2681 (0.2920) data time 0.0011 (0.0092) model time 0.2670 (0.2628) loss 6.8127 (6.0998) grad_norm 1.9065 (1.5151) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 07:00:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][80/625] eta 0:02:43 lr 0.001762 wd 0.0500 time 1.2083 (0.2998) data time 0.0009 (0.0082) model time 1.2074 (0.2933) loss 6.8961 (6.1562) grad_norm 1.3480 (1.4874) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 07:00:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][90/625] eta 0:02:38 lr 0.001762 wd 0.0500 time 0.2623 (0.2971) data time 0.0008 (0.0074) model time 0.2615 (0.2884) loss 6.9518 (6.1645) grad_norm 1.2140 (1.4963) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 07:00:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][100/625] eta 0:02:36 lr 0.001762 wd 0.0500 time 0.2666 (0.2971) data time 0.0011 (0.0068) model time 0.2655 (0.2900) loss 7.1273 (6.1346) grad_norm 1.2501 (1.5153) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 07:00:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][110/625] eta 0:02:31 lr 0.001762 wd 0.0500 time 0.2590 (0.2946) data time 0.0013 (0.0063) model time 0.2577 (0.2863) loss 6.2735 (6.1448) grad_norm 2.5558 (1.5351) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 07:00:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][120/625] eta 0:02:27 lr 0.001761 wd 0.0500 time 0.2592 (0.2928) data time 0.0008 (0.0058) model time 0.2584 (0.2842) loss 5.7629 (6.1522) grad_norm 1.1758 (1.5440) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 07:00:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][130/625] eta 0:02:23 lr 0.001761 wd 0.0500 time 0.2566 (0.2904) data time 0.0009 (0.0055) model time 0.2557 (0.2812) loss 5.7566 (6.1491) grad_norm 1.4961 (1.5448) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 07:00:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][140/625] eta 0:02:19 lr 0.001761 wd 0.0500 time 0.2568 (0.2882) data time 0.0009 (0.0052) model time 0.2558 (0.2787) loss 5.9939 (6.1230) grad_norm 1.1048 (1.5518) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 07:00:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][150/625] eta 0:02:25 lr 0.001761 wd 0.0500 time 0.2573 (0.3067) data time 0.0014 (0.0127) model time 0.2559 (0.2957) loss 4.6140 (6.1017) grad_norm 1.7431 (1.5357) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 07:00:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][160/625] eta 0:02:23 lr 0.001761 wd 0.0500 time 1.0348 (0.3096) data time 0.0010 (0.0119) model time 1.0338 (0.3009) loss 6.1626 (6.0896) grad_norm 2.0173 (1.5255) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 07:01:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-30 07:01:00 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 07:01:00 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 07:12:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 07:12:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 07:12:46 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 07:16:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 07:16:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 07:16:33 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 07:16:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 07:16:40 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 07:16:40 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 07:16:40 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 07:16:40 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 83)
[2024-07-30 07:16:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 07:16:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][170/625] eta 0:18:36 lr 0.001761 wd 0.0500 time 0.2520 (2.4529) data time 0.0006 (0.3312) model time 0.2514 (2.1217) loss 5.1115 (6.1274) grad_norm 1.2251 (1.4298) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:16:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][180/625] eta 0:05:37 lr 0.001761 wd 0.0500 time 0.2486 (0.7593) data time 0.0010 (0.0772) model time 0.2476 (0.6821) loss 6.2344 (6.3244) grad_norm 1.9678 (1.7718) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:16:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][190/625] eta 0:03:54 lr 0.001761 wd 0.0500 time 0.2494 (0.5387) data time 0.0007 (0.0441) model time 0.2487 (0.4946) loss 7.5128 (6.5570) grad_norm 1.8518 (1.7872) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:16:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][200/625] eta 0:03:12 lr 0.001760 wd 0.0500 time 0.2478 (0.4526) data time 0.0008 (0.0310) model time 0.2471 (0.4216) loss 6.7793 (6.5205) grad_norm 1.1718 (1.8085) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:17:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][210/625] eta 0:02:48 lr 0.001760 wd 0.0500 time 0.2498 (0.4058) data time 0.0009 (0.0241) model time 0.2489 (0.3817) loss 6.3253 (6.3542) grad_norm 1.4484 (1.7179) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:17:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][220/625] eta 0:02:32 lr 0.001760 wd 0.0500 time 0.2491 (0.3770) data time 0.0010 (0.0197) model time 0.2481 (0.3573) loss 6.2916 (6.3016) grad_norm 1.6342 (1.7238) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:17:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][230/625] eta 0:02:21 lr 0.001760 wd 0.0500 time 0.2538 (0.3574) data time 0.0009 (0.0168) model time 0.2529 (0.3406) loss 5.8802 (6.2721) grad_norm 1.0817 (1.6871) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:17:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][240/625] eta 0:02:12 lr 0.001760 wd 0.0500 time 0.2507 (0.3431) data time 0.0009 (0.0146) model time 0.2498 (0.3284) loss 6.6504 (6.2061) grad_norm 1.3346 (1.6540) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:17:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][250/625] eta 0:02:04 lr 0.001760 wd 0.0500 time 0.2498 (0.3321) data time 0.0008 (0.0130) model time 0.2490 (0.3191) loss 5.4653 (6.1710) grad_norm 2.6800 (1.6698) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:17:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][260/625] eta 0:01:58 lr 0.001760 wd 0.0500 time 0.2519 (0.3236) data time 0.0009 (0.0117) model time 0.2510 (0.3119) loss 7.5416 (6.1499) grad_norm 2.2439 (1.7436) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:17:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][270/625] eta 0:01:52 lr 0.001760 wd 0.0500 time 0.2524 (0.3169) data time 0.0007 (0.0107) model time 0.2517 (0.3062) loss 6.3802 (6.1795) grad_norm 2.2379 (1.7591) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:17:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][280/625] eta 0:01:47 lr 0.001760 wd 0.0500 time 0.2482 (0.3113) data time 0.0006 (0.0098) model time 0.2475 (0.3015) loss 6.1206 (6.1772) grad_norm 1.2632 (1.7305) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:17:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][290/625] eta 0:01:42 lr 0.001759 wd 0.0500 time 0.2477 (0.3066) data time 0.0011 (0.0091) model time 0.2467 (0.2975) loss 5.8895 (6.1743) grad_norm 1.2211 (1.6933) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:17:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][300/625] eta 0:01:38 lr 0.001759 wd 0.0500 time 0.2551 (0.3026) data time 0.0009 (0.0085) model time 0.2542 (0.2941) loss 6.6432 (6.1571) grad_norm 1.1191 (1.6914) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:17:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][310/625] eta 0:01:34 lr 0.001759 wd 0.0500 time 0.2515 (0.2992) data time 0.0008 (0.0079) model time 0.2507 (0.2912) loss 7.4162 (6.1372) grad_norm 2.8834 (1.7149) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:17:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][320/625] eta 0:01:30 lr 0.001759 wd 0.0500 time 0.2553 (0.2962) data time 0.0007 (0.0075) model time 0.2546 (0.2887) loss 5.3898 (6.1248) grad_norm 1.6314 (1.7234) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:17:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][330/625] eta 0:01:26 lr 0.001759 wd 0.0500 time 0.2532 (0.2936) data time 0.0007 (0.0071) model time 0.2525 (0.2865) loss 5.2421 (6.1212) grad_norm 0.9903 (1.7145) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:17:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][340/625] eta 0:01:23 lr 0.001759 wd 0.0500 time 0.2557 (0.2912) data time 0.0008 (0.0067) model time 0.2548 (0.2845) loss 6.9120 (6.1161) grad_norm 1.1857 (1.6929) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:17:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][350/625] eta 0:01:19 lr 0.001759 wd 0.0500 time 0.2547 (0.2892) data time 0.0009 (0.0064) model time 0.2538 (0.2827) loss 6.8647 (6.0966) grad_norm 1.7318 (1.6811) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:17:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][360/625] eta 0:01:16 lr 0.001759 wd 0.0500 time 0.2549 (0.2873) data time 0.0008 (0.0061) model time 0.2541 (0.2812) loss 6.2098 (6.0863) grad_norm 1.4491 (1.6720) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:17:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][370/625] eta 0:01:12 lr 0.001759 wd 0.0500 time 0.2540 (0.2857) data time 0.0006 (0.0059) model time 0.2534 (0.2798) loss 5.0474 (6.0634) grad_norm 1.0870 (1.6707) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:17:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][380/625] eta 0:01:09 lr 0.001758 wd 0.0500 time 0.2555 (0.2843) data time 0.0009 (0.0057) model time 0.2546 (0.2786) loss 4.6714 (6.0521) grad_norm 1.0393 (1.6645) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:17:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][390/625] eta 0:01:06 lr 0.001758 wd 0.0500 time 0.2515 (0.2829) data time 0.0006 (0.0055) model time 0.2509 (0.2775) loss 4.4826 (6.0442) grad_norm 1.2501 (1.6514) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:17:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][400/625] eta 0:01:03 lr 0.001758 wd 0.0500 time 0.2548 (0.2817) data time 0.0009 (0.0053) model time 0.2539 (0.2765) loss 5.0037 (6.0418) grad_norm 0.9725 (1.6325) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:17:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][410/625] eta 0:01:00 lr 0.001758 wd 0.0500 time 0.2547 (0.2807) data time 0.0010 (0.0051) model time 0.2537 (0.2756) loss 7.1705 (6.0444) grad_norm 2.7672 (1.6334) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:17:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][420/625] eta 0:00:57 lr 0.001758 wd 0.0500 time 0.2524 (0.2797) data time 0.0009 (0.0049) model time 0.2515 (0.2748) loss 7.0665 (6.0284) grad_norm 1.6383 (1.6538) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:17:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][430/625] eta 0:00:54 lr 0.001758 wd 0.0500 time 0.2584 (0.2788) data time 0.0007 (0.0048) model time 0.2577 (0.2740) loss 5.8527 (6.0133) grad_norm 1.2566 (1.6435) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:18:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][440/625] eta 0:00:51 lr 0.001758 wd 0.0500 time 0.2537 (0.2780) data time 0.0009 (0.0047) model time 0.2528 (0.2733) loss 6.9238 (6.0039) grad_norm 1.2128 (1.6300) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:18:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][450/625] eta 0:00:48 lr 0.001758 wd 0.0500 time 0.2590 (0.2772) data time 0.0010 (0.0045) model time 0.2580 (0.2727) loss 6.3945 (6.0072) grad_norm 2.3911 (1.6561) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:18:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][460/625] eta 0:00:45 lr 0.001757 wd 0.0500 time 0.2564 (0.2764) data time 0.0009 (0.0044) model time 0.2556 (0.2720) loss 5.5695 (6.0055) grad_norm 2.1980 (1.6590) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:18:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][470/625] eta 0:00:42 lr 0.001757 wd 0.0500 time 0.2537 (0.2758) data time 0.0008 (0.0043) model time 0.2529 (0.2715) loss 5.5223 (5.9925) grad_norm 1.3217 (1.6660) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:18:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][480/625] eta 0:00:39 lr 0.001757 wd 0.0500 time 0.2528 (0.2751) data time 0.0007 (0.0042) model time 0.2521 (0.2709) loss 7.1168 (5.9927) grad_norm 1.0589 (1.6568) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:18:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][490/625] eta 0:00:37 lr 0.001757 wd 0.0500 time 0.2504 (0.2745) data time 0.0010 (0.0041) model time 0.2494 (0.2704) loss 6.7749 (6.0104) grad_norm 1.6907 (1.6547) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:18:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][500/625] eta 0:00:34 lr 0.001757 wd 0.0500 time 0.2531 (0.2739) data time 0.0008 (0.0040) model time 0.2523 (0.2699) loss 6.5478 (6.0059) grad_norm 1.2438 (1.6468) loss_scale 8192.0000 (8192.0000) mem 9658MB
[2024-07-30 07:18:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][510/625] eta 0:00:31 lr 0.001757 wd 0.0500 time 0.2530 (0.2734) data time 0.0010 (0.0039) model time 0.2520 (0.2695) loss 6.7501 (6.0105) grad_norm 1.5126 (inf) loss_scale 4096.0000 (8096.4665) mem 9658MB
[2024-07-30 07:18:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][520/625] eta 0:00:28 lr 0.001757 wd 0.0500 time 0.2530 (0.2729) data time 0.0007 (0.0039) model time 0.2523 (0.2690) loss 6.2478 (6.0147) grad_norm 3.0774 (inf) loss_scale 4096.0000 (7983.1388) mem 9658MB
[2024-07-30 07:18:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][530/625] eta 0:00:25 lr 0.001757 wd 0.0500 time 0.2577 (0.2724) data time 0.0006 (0.0038) model time 0.2571 (0.2687) loss 6.8542 (6.0141) grad_norm 2.2049 (inf) loss_scale 4096.0000 (7876.0551) mem 9658MB
[2024-07-30 07:18:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][540/625] eta 0:00:23 lr 0.001757 wd 0.0500 time 0.2677 (0.2720) data time 0.0007 (0.0037) model time 0.2670 (0.2683) loss 6.4714 (6.0104) grad_norm 1.0821 (inf) loss_scale 4096.0000 (7774.7131) mem 9658MB
[2024-07-30 07:18:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][550/625] eta 0:00:20 lr 0.001756 wd 0.0500 time 0.2549 (0.2716) data time 0.0011 (0.0036) model time 0.2538 (0.2680) loss 4.5219 (6.0003) grad_norm 1.6516 (inf) loss_scale 4096.0000 (7678.6632) mem 9658MB
[2024-07-30 07:18:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][560/625] eta 0:00:17 lr 0.001756 wd 0.0500 time 0.2530 (0.2712) data time 0.0017 (0.0036) model time 0.2513 (0.2676) loss 6.8239 (5.9924) grad_norm 1.0706 (inf) loss_scale 4096.0000 (7587.5013) mem 9658MB
[2024-07-30 07:18:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][570/625] eta 0:00:14 lr 0.001756 wd 0.0500 time 0.2556 (0.2708) data time 0.0009 (0.0035) model time 0.2547 (0.2673) loss 5.5453 (6.0013) grad_norm 0.9758 (inf) loss_scale 4096.0000 (7500.8635) mem 9658MB
[2024-07-30 07:18:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][580/625] eta 0:00:12 lr 0.001756 wd 0.0500 time 0.2605 (0.2706) data time 0.0017 (0.0035) model time 0.2587 (0.2671) loss 7.3478 (6.0134) grad_norm 1.6084 (inf) loss_scale 4096.0000 (7418.4213) mem 9658MB
[2024-07-30 07:18:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][590/625] eta 0:00:09 lr 0.001756 wd 0.0500 time 0.2605 (0.2708) data time 0.0006 (0.0034) model time 0.2599 (0.2674) loss 4.2525 (6.0066) grad_norm 3.4206 (inf) loss_scale 4096.0000 (7339.8771) mem 9658MB
[2024-07-30 07:18:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][600/625] eta 0:00:06 lr 0.001756 wd 0.0500 time 0.2649 (0.2705) data time 0.0006 (0.0033) model time 0.2643 (0.2671) loss 6.8405 (6.0227) grad_norm 2.4284 (inf) loss_scale 4096.0000 (7264.9607) mem 9658MB
[2024-07-30 07:18:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][610/625] eta 0:00:04 lr 0.001756 wd 0.0500 time 0.2557 (0.2701) data time 0.0004 (0.0033) model time 0.2553 (0.2669) loss 6.4378 (6.0299) grad_norm 2.3273 (inf) loss_scale 4096.0000 (7193.4266) mem 9658MB
[2024-07-30 07:18:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][620/625] eta 0:00:01 lr 0.001756 wd 0.0500 time 0.2594 (0.2698) data time 0.0004 (0.0032) model time 0.2590 (0.2666) loss 4.7733 (6.0204) grad_norm 1.1747 (inf) loss_scale 4096.0000 (7125.0508) mem 9658MB
[2024-07-30 07:18:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 83 training takes 0:02:03
[2024-07-30 07:18:47 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 07:18:49 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 07:18:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.469 (0.469) Loss 0.6616 (0.6616) Acc@1 86.719 (86.719) Acc@5 97.803 (97.803) Mem 9658MB
[2024-07-30 07:18:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.094) Loss 1.0391 (0.8129) Acc@1 75.439 (82.253) Acc@5 94.043 (96.493) Mem 9658MB
[2024-07-30 07:18:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.2500 (0.9758) Acc@1 70.752 (78.123) Acc@5 90.869 (94.308) Mem 9658MB
[2024-07-30 07:18:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.949 Acc@5 94.304
[2024-07-30 07:18:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.9%
[2024-07-30 07:18:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.808 (0.808) Loss 0.5586 (0.5586) Acc@1 87.305 (87.305) Acc@5 98.047 (98.047) Mem 9658MB
[2024-07-30 07:18:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.130) Loss 0.9551 (0.7184) Acc@1 76.953 (83.350) Acc@5 94.092 (96.777) Mem 9658MB
[2024-07-30 07:18:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.095) Loss 1.1191 (0.8702) Acc@1 71.826 (79.499) Acc@5 92.139 (94.945) Mem 9658MB
[2024-07-30 07:18:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.237 Acc@5 94.914
[2024-07-30 07:18:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.2%
[2024-07-30 07:18:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.24%
[2024-07-30 07:18:55 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-30 07:18:57 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-30 07:18:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][0/625] eta 0:10:26 lr 0.001756 wd 0.0500 time 1.0030 (1.0030) data time 0.6976 (0.6976) model time 0.0000 (0.0000) loss 5.6454 (5.6454) grad_norm 1.5113 (1.5113) loss_scale 4096.0000 (4096.0000) mem 9651MB
[2024-07-30 07:19:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][10/625] eta 0:03:20 lr 0.001755 wd 0.0500 time 0.2560 (0.3255) data time 0.0009 (0.0642) model time 0.0000 (0.0000) loss 5.1420 (5.5588) grad_norm 2.3952 (1.7513) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 07:19:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][20/625] eta 0:02:56 lr 0.001755 wd 0.0500 time 0.2552 (0.2919) data time 0.0011 (0.0341) model time 0.0000 (0.0000) loss 4.5099 (5.5239) grad_norm 1.3762 (1.7078) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 07:19:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][30/625] eta 0:02:47 lr 0.001755 wd 0.0500 time 0.2554 (0.2809) data time 0.0009 (0.0235) model time 0.0000 (0.0000) loss 7.0627 (5.8499) grad_norm 1.9528 (1.7553) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 07:19:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][40/625] eta 0:02:40 lr 0.001755 wd 0.0500 time 0.2565 (0.2746) data time 0.0009 (0.0180) model time 0.0000 (0.0000) loss 6.1116 (5.8260) grad_norm 2.0109 (1.7712) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 07:19:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][50/625] eta 0:02:35 lr 0.001755 wd 0.0500 time 0.2545 (0.2712) data time 0.0007 (0.0147) model time 0.0000 (0.0000) loss 7.1256 (5.8165) grad_norm 1.1235 (1.7180) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 07:19:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][60/625] eta 0:02:31 lr 0.001755 wd 0.0500 time 0.2518 (0.2684) data time 0.0010 (0.0124) model time 0.2508 (0.2531) loss 6.0657 (5.9347) grad_norm 2.4536 (1.7344) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 07:19:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][70/625] eta 0:02:28 lr 0.001755 wd 0.0500 time 0.2545 (0.2668) data time 0.0006 (0.0109) model time 0.2538 (0.2542) loss 4.2791 (5.8668) grad_norm 2.2738 (1.7630) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 07:19:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][80/625] eta 0:02:24 lr 0.001755 wd 0.0500 time 0.2543 (0.2653) data time 0.0007 (0.0097) model time 0.2537 (0.2541) loss 6.9647 (5.8883) grad_norm 1.8037 (1.7721) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 07:19:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][90/625] eta 0:02:21 lr 0.001754 wd 0.0500 time 0.2519 (0.2642) data time 0.0017 (0.0087) model time 0.2502 (0.2541) loss 6.2955 (5.8974) grad_norm 1.2911 (1.7760) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 07:19:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][100/625] eta 0:02:18 lr 0.001754 wd 0.0500 time 0.2569 (0.2634) data time 0.0009 (0.0080) model time 0.2559 (0.2543) loss 6.3053 (5.9214) grad_norm 2.3510 (1.7715) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 07:19:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][110/625] eta 0:02:15 lr 0.001754 wd 0.0500 time 0.2518 (0.2631) data time 0.0007 (0.0073) model time 0.2512 (0.2551) loss 6.9230 (5.9484) grad_norm 1.9172 (1.7533) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 07:19:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][120/625] eta 0:02:12 lr 0.001754 wd 0.0500 time 0.2537 (0.2624) data time 0.0010 (0.0068) model time 0.2527 (0.2549) loss 6.8479 (5.9505) grad_norm 1.3879 (1.7279) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 07:19:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][130/625] eta 0:02:10 lr 0.001754 wd 0.0500 time 0.2516 (0.2639) data time 0.0010 (0.0064) model time 0.2506 (0.2582) loss 6.2868 (5.9747) grad_norm 1.2000 (1.6998) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 07:19:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][140/625] eta 0:02:07 lr 0.001754 wd 0.0500 time 0.2539 (0.2633) data time 0.0009 (0.0060) model time 0.2530 (0.2577) loss 5.6390 (5.9867) grad_norm 1.7637 (1.6864) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 07:19:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][150/625] eta 0:02:04 lr 0.001754 wd 0.0500 time 0.2532 (0.2627) data time 0.0008 (0.0057) model time 0.2524 (0.2574) loss 5.6981 (5.9702) grad_norm 1.5125 (1.6931) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 07:19:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][160/625] eta 0:02:01 lr 0.001754 wd 0.0500 time 0.2568 (0.2622) data time 0.0010 (0.0054) model time 0.2558 (0.2570) loss 5.9243 (5.9941) grad_norm 1.8354 (1.7089) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 07:19:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][170/625] eta 0:01:59 lr 0.001754 wd 0.0500 time 0.2549 (0.2619) data time 0.0007 (0.0051) model time 0.2542 (0.2570) loss 5.6492 (5.9957) grad_norm 1.4148 (1.6992) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 07:19:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][180/625] eta 0:01:56 lr 0.001753 wd 0.0500 time 0.2555 (0.2616) data time 0.0009 (0.0049) model time 0.2545 (0.2568) loss 7.0288 (6.0061) grad_norm 3.3857 (1.7156) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 07:19:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-30 07:19:46 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 07:19:46 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 07:22:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 07:22:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 07:22:49 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 07:23:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 07:23:02 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 07:23:02 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 07:23:02 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 07:23:02 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 84)
[2024-07-30 07:23:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 07:23:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][190/625] eta 0:11:47 lr 0.001753 wd 0.0500 time 0.2529 (1.6264) data time 0.0009 (0.1490) model time 0.2520 (1.4774) loss 6.9154 (6.7108) grad_norm 1.0656 (1.6087) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:23:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][200/625] eta 0:05:02 lr 0.001753 wd 0.0500 time 0.2589 (0.7127) data time 0.0007 (0.0503) model time 0.2581 (0.6624) loss 7.0384 (6.4806) grad_norm 1.1838 (1.6043) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:23:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][210/625] eta 0:03:39 lr 0.001753 wd 0.0500 time 0.2531 (0.5290) data time 0.0010 (0.0305) model time 0.2521 (0.4984) loss 7.1318 (6.4949) grad_norm 1.2027 (1.5332) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:23:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][220/625] eta 0:03:02 lr 0.001753 wd 0.0500 time 0.2589 (0.4504) data time 0.0017 (0.0221) model time 0.2572 (0.4283) loss 6.1122 (6.4437) grad_norm 1.4264 (1.6365) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:23:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][230/625] eta 0:02:40 lr 0.001753 wd 0.0500 time 0.2543 (0.4068) data time 0.0010 (0.0174) model time 0.2533 (0.3894) loss 6.4415 (6.3318) grad_norm 1.9987 (1.6856) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:23:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][240/625] eta 0:02:25 lr 0.001753 wd 0.0500 time 0.2573 (0.3789) data time 0.0009 (0.0144) model time 0.2564 (0.3646) loss 5.2108 (6.2598) grad_norm 2.0684 (1.6617) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:23:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][250/625] eta 0:02:14 lr 0.001753 wd 0.0500 time 0.2513 (0.3598) data time 0.0008 (0.0123) model time 0.2505 (0.3474) loss 6.4825 (6.2262) grad_norm 1.7113 (1.6736) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:23:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][260/625] eta 0:02:06 lr 0.001752 wd 0.0500 time 0.2565 (0.3457) data time 0.0009 (0.0108) model time 0.2556 (0.3349) loss 4.7675 (6.1580) grad_norm 1.3631 (1.6194) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:23:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][270/625] eta 0:01:58 lr 0.001752 wd 0.0500 time 0.2523 (0.3350) data time 0.0007 (0.0096) model time 0.2515 (0.3254) loss 5.3511 (6.1388) grad_norm 1.2921 (1.6185) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:23:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][280/625] eta 0:01:52 lr 0.001752 wd 0.0500 time 0.2544 (0.3267) data time 0.0009 (0.0087) model time 0.2535 (0.3180) loss 6.3311 (6.1529) grad_norm 1.1916 (1.6058) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:23:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][290/625] eta 0:01:47 lr 0.001752 wd 0.0500 time 0.2537 (0.3199) data time 0.0009 (0.0080) model time 0.2529 (0.3119) loss 5.3649 (6.1659) grad_norm 3.1310 (1.6601) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:23:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][300/625] eta 0:01:42 lr 0.001752 wd 0.0500 time 0.2541 (0.3142) data time 0.0007 (0.0073) model time 0.2534 (0.3069) loss 4.7099 (6.1441) grad_norm 0.9515 (1.6341) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:23:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][310/625] eta 0:01:37 lr 0.001752 wd 0.0500 time 0.2535 (0.3095) data time 0.0008 (0.0068) model time 0.2527 (0.3026) loss 4.7543 (6.1304) grad_norm 1.4376 (1.6072) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:23:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][320/625] eta 0:01:33 lr 0.001752 wd 0.0500 time 0.2595 (0.3055) data time 0.0007 (0.0064) model time 0.2588 (0.2991) loss 5.7503 (6.1290) grad_norm 1.3731 (1.5846) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:23:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][330/625] eta 0:01:29 lr 0.001752 wd 0.0500 time 0.2537 (0.3020) data time 0.0009 (0.0060) model time 0.2527 (0.2960) loss 6.2287 (6.1031) grad_norm 2.4220 (1.5801) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:23:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][340/625] eta 0:01:25 lr 0.001752 wd 0.0500 time 0.2509 (0.2991) data time 0.0010 (0.0057) model time 0.2499 (0.2934) loss 6.4984 (6.0968) grad_norm 3.1602 (1.6102) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:23:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][350/625] eta 0:01:21 lr 0.001751 wd 0.0500 time 0.2535 (0.2964) data time 0.0009 (0.0054) model time 0.2526 (0.2910) loss 6.7511 (6.1059) grad_norm 1.6733 (1.6219) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:23:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][360/625] eta 0:01:17 lr 0.001751 wd 0.0500 time 0.2533 (0.2941) data time 0.0008 (0.0051) model time 0.2525 (0.2890) loss 5.8445 (6.0938) grad_norm 2.5536 (1.6316) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:24:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][370/625] eta 0:01:14 lr 0.001751 wd 0.0500 time 0.2527 (0.2920) data time 0.0008 (0.0049) model time 0.2519 (0.2870) loss 5.9054 (6.0792) grad_norm 1.9310 (1.6373) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:24:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][380/625] eta 0:01:11 lr 0.001751 wd 0.0500 time 0.2585 (0.2902) data time 0.0009 (0.0047) model time 0.2576 (0.2855) loss 4.7339 (6.0658) grad_norm 1.4136 (1.6392) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:24:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][390/625] eta 0:01:07 lr 0.001751 wd 0.0500 time 0.2523 (0.2886) data time 0.0009 (0.0045) model time 0.2515 (0.2840) loss 5.3932 (6.0469) grad_norm 1.5676 (1.6220) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:24:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][400/625] eta 0:01:04 lr 0.001751 wd 0.0500 time 0.2536 (0.2872) data time 0.0010 (0.0044) model time 0.2526 (0.2829) loss 5.7437 (6.0374) grad_norm 1.4635 (1.6085) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:24:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][410/625] eta 0:01:01 lr 0.001751 wd 0.0500 time 0.2545 (0.2859) data time 0.0008 (0.0042) model time 0.2537 (0.2817) loss 5.4675 (6.0412) grad_norm 1.4612 (1.6003) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:24:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][420/625] eta 0:00:58 lr 0.001751 wd 0.0500 time 0.2571 (0.2847) data time 0.0010 (0.0041) model time 0.2561 (0.2806) loss 6.4421 (6.0244) grad_norm 1.1397 (1.6133) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:24:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][430/625] eta 0:00:55 lr 0.001750 wd 0.0500 time 0.2582 (0.2836) data time 0.0010 (0.0039) model time 0.2572 (0.2796) loss 5.4310 (6.0303) grad_norm 1.5969 (1.6053) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:24:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][440/625] eta 0:00:52 lr 0.001750 wd 0.0500 time 0.2530 (0.2825) data time 0.0008 (0.0038) model time 0.2522 (0.2787) loss 5.5018 (6.0175) grad_norm 1.4697 (1.6033) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:24:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][450/625] eta 0:00:49 lr 0.001750 wd 0.0500 time 0.2579 (0.2818) data time 0.0007 (0.0037) model time 0.2572 (0.2781) loss 4.8337 (6.0098) grad_norm 1.3937 (1.6042) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:24:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][460/625] eta 0:00:46 lr 0.001750 wd 0.0500 time 0.2580 (0.2809) data time 0.0009 (0.0036) model time 0.2572 (0.2773) loss 6.9186 (6.0161) grad_norm 1.2612 (1.6015) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:24:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][470/625] eta 0:00:43 lr 0.001750 wd 0.0500 time 0.2561 (0.2801) data time 0.0009 (0.0035) model time 0.2553 (0.2765) loss 6.6513 (6.0210) grad_norm 1.7968 (1.6123) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:24:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][480/625] eta 0:00:40 lr 0.001750 wd 0.0500 time 0.2700 (0.2793) data time 0.0016 (0.0034) model time 0.2684 (0.2759) loss 6.0150 (6.0081) grad_norm 1.5059 (1.6130) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:24:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][490/625] eta 0:00:37 lr 0.001750 wd 0.0500 time 0.2527 (0.2786) data time 0.0009 (0.0034) model time 0.2518 (0.2752) loss 4.9901 (5.9986) grad_norm 1.4521 (1.6219) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 07:24:34
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][500/625] eta 0:00:34 lr 0.001750 wd 0.0500 time 0.2583 (0.2779) data time 0.0008 (0.0033) model time 0.2575 (0.2746) loss 6.9641 (6.0045) grad_norm 1.5177 (1.6255) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 07:24:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][510/625] eta 0:00:31 lr 0.001750 wd 0.0500 time 0.2583 (0.2772) data time 0.0008 (0.0032) model time 0.2575 (0.2740) loss 6.7298 (6.0233) grad_norm 1.4606 (1.6267) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 07:24:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][520/625] eta 0:00:29 lr 0.001749 wd 0.0500 time 0.2611 (0.2766) data time 0.0006 (0.0031) model time 0.2604 (0.2735) loss 6.3254 (6.0195) grad_norm 1.4879 (1.6274) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 07:24:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][530/625] eta 0:00:26 lr 0.001749 wd 0.0500 time 0.2531 (0.2761) data time 0.0008 (0.0031) model time 0.2523 (0.2730) loss 4.7516 (6.0205) grad_norm 1.1707 (1.6183) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 07:24:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][540/625] eta 0:00:23 lr 0.001749 wd 0.0500 time 0.2547 (0.2755) data time 0.0009 (0.0030) model time 0.2538 (0.2725) loss 5.5277 (6.0224) grad_norm 1.8039 (1.6137) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 07:24:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][550/625] eta 0:00:20 lr 0.001749 wd 0.0500 time 0.2597 (0.2751) data time 0.0008 (0.0030) model time 0.2589 (0.2721) loss 5.8344 (6.0192) grad_norm 1.1524 (1.6151) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 07:24:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][560/625] eta 0:00:17 lr 0.001749 wd 0.0500 time 0.2529 (0.2746) data time 0.0007 
(0.0029) model time 0.2522 (0.2717) loss 5.0997 (6.0193) grad_norm 1.4483 (1.6141) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 07:24:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][570/625] eta 0:00:15 lr 0.001749 wd 0.0500 time 0.2534 (0.2741) data time 0.0007 (0.0028) model time 0.2528 (0.2713) loss 4.7986 (6.0088) grad_norm 1.7972 (1.6142) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 07:24:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][580/625] eta 0:00:12 lr 0.001749 wd 0.0500 time 0.2617 (0.2737) data time 0.0008 (0.0028) model time 0.2610 (0.2709) loss 6.4856 (6.0131) grad_norm 2.2534 (1.6165) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 07:24:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][590/625] eta 0:00:09 lr 0.001749 wd 0.0500 time 0.2560 (0.2733) data time 0.0008 (0.0027) model time 0.2552 (0.2706) loss 5.7651 (6.0187) grad_norm 2.6559 (1.6307) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 07:25:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][600/625] eta 0:00:06 lr 0.001748 wd 0.0500 time 0.2549 (0.2730) data time 0.0009 (0.0027) model time 0.2540 (0.2703) loss 6.5526 (6.0223) grad_norm 1.8370 (1.6407) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 07:25:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][610/625] eta 0:00:04 lr 0.001748 wd 0.0500 time 0.2584 (0.2732) data time 0.0004 (0.0027) model time 0.2580 (0.2705) loss 7.0240 (6.0192) grad_norm 2.0815 (1.6441) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 07:25:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][620/625] eta 0:00:01 lr 0.001748 wd 0.0500 time 0.2540 (0.2727) data time 0.0003 (0.0026) model time 0.2537 (0.2701) loss 6.6903 (6.0322) grad_norm 1.7964 (1.6418) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 07:25:07 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 84 training takes 0:01:59 [2024-07-30 07:25:07 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 07:25:08 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-30 07:25:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.394 (0.394) Loss 0.6709 (0.6709) Acc@1 86.230 (86.230) Acc@5 97.705 (97.705) Mem 9656MB [2024-07-30 07:25:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.088) Loss 1.1182 (0.8583) Acc@1 74.805 (82.022) Acc@5 93.213 (96.333) Mem 9656MB [2024-07-30 07:25:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.072) Loss 1.2773 (1.0220) Acc@1 70.020 (77.872) Acc@5 91.064 (94.299) Mem 9656MB [2024-07-30 07:25:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.599 Acc@5 94.246 [2024-07-30 07:25:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.6% [2024-07-30 07:25:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.726 (0.726) Loss 0.5586 (0.5586) Acc@1 87.354 (87.354) Acc@5 98.047 (98.047) Mem 9656MB [2024-07-30 07:25:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.128) Loss 0.9546 (0.7180) Acc@1 77.051 (83.416) Acc@5 94.141 (96.746) Mem 9656MB [2024-07-30 07:25:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.093) Loss 1.1162 (0.8693) Acc@1 71.973 (79.578) Acc@5 92.090 (94.945) Mem 9656MB [2024-07-30 07:25:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.307 Acc@5 94.916 [2024-07-30 07:25:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): 
INFO Accuracy of the network on the 50000 test images: 79.3% [2024-07-30 07:25:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.31% [2024-07-30 07:25:13 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-30 07:25:13 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-07-30 07:25:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][0/625] eta 0:11:11 lr 0.001748 wd 0.0500 time 1.0736 (1.0736) data time 0.4589 (0.4589) model time 0.0000 (0.0000) loss 5.6127 (5.6127) grad_norm 1.1731 (1.1731) loss_scale 4096.0000 (4096.0000) mem 9651MB [2024-07-30 07:25:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][10/625] eta 0:03:23 lr 0.001748 wd 0.0500 time 0.2637 (0.3312) data time 0.0008 (0.0426) model time 0.0000 (0.0000) loss 6.4184 (5.8158) grad_norm 1.6709 (1.3574) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:25:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][20/625] eta 0:02:58 lr 0.001748 wd 0.0500 time 0.2563 (0.2955) data time 0.0006 (0.0227) model time 0.0000 (0.0000) loss 6.0009 (5.6645) grad_norm 1.3769 (1.3923) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:25:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][30/625] eta 0:02:48 lr 0.001748 wd 0.0500 time 0.2540 (0.2830) data time 0.0011 (0.0157) model time 0.0000 (0.0000) loss 6.3181 (5.5950) grad_norm 1.8041 (1.5489) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:25:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][40/625] eta 0:02:41 lr 0.001748 wd 0.0500 time 0.2528 (0.2762) data time 0.0009 (0.0121) model time 0.0000 (0.0000) loss 4.8344 (5.6217) grad_norm 1.1524 (1.5669) loss_scale 
4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:25:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][50/625] eta 0:02:36 lr 0.001748 wd 0.0500 time 0.2527 (0.2721) data time 0.0008 (0.0099) model time 0.0000 (0.0000) loss 5.2354 (5.7796) grad_norm 1.0471 (1.5258) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:25:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][60/625] eta 0:02:32 lr 0.001747 wd 0.0500 time 0.2532 (0.2696) data time 0.0009 (0.0084) model time 0.2523 (0.2557) loss 5.3570 (5.8095) grad_norm 2.8365 (1.5836) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:25:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][70/625] eta 0:02:28 lr 0.001747 wd 0.0500 time 0.2571 (0.2682) data time 0.0009 (0.0074) model time 0.2562 (0.2572) loss 5.9818 (5.8623) grad_norm 1.2755 (1.6112) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:25:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][80/625] eta 0:02:25 lr 0.001747 wd 0.0500 time 0.2541 (0.2666) data time 0.0008 (0.0066) model time 0.2533 (0.2562) loss 5.5796 (5.8816) grad_norm 1.4445 (1.6702) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:25:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][90/625] eta 0:02:22 lr 0.001747 wd 0.0500 time 0.2546 (0.2656) data time 0.0012 (0.0060) model time 0.2535 (0.2563) loss 5.9974 (5.8626) grad_norm 2.4834 (1.6752) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:25:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][100/625] eta 0:02:18 lr 0.001747 wd 0.0500 time 0.2578 (0.2648) data time 0.0006 (0.0055) model time 0.2572 (0.2563) loss 5.9868 (5.8829) grad_norm 1.6759 (1.6514) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:25:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][110/625] eta 0:02:15 lr 0.001747 wd 
0.0500 time 0.2562 (0.2640) data time 0.0011 (0.0050) model time 0.2551 (0.2562) loss 5.0416 (5.8808) grad_norm 1.6035 (1.6266) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:25:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][120/625] eta 0:02:13 lr 0.001747 wd 0.0500 time 0.2715 (0.2635) data time 0.0010 (0.0047) model time 0.2705 (0.2562) loss 6.2851 (5.9295) grad_norm 1.3804 (1.5979) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:25:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][130/625] eta 0:02:10 lr 0.001747 wd 0.0500 time 0.2507 (0.2628) data time 0.0011 (0.0044) model time 0.2496 (0.2560) loss 6.6330 (5.9472) grad_norm 1.5392 (1.6179) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:25:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][140/625] eta 0:02:08 lr 0.001747 wd 0.0500 time 0.2584 (0.2643) data time 0.0008 (0.0042) model time 0.2576 (0.2590) loss 5.5359 (5.9543) grad_norm 2.4145 (1.6290) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:25:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][150/625] eta 0:02:05 lr 0.001746 wd 0.0500 time 0.2605 (0.2639) data time 0.0009 (0.0040) model time 0.2596 (0.2587) loss 7.0914 (5.9636) grad_norm 1.1124 (1.6259) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:25:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][160/625] eta 0:02:02 lr 0.001746 wd 0.0500 time 0.2548 (0.2637) data time 0.0010 (0.0038) model time 0.2538 (0.2588) loss 6.7650 (5.9735) grad_norm 1.6855 (1.6407) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:25:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][170/625] eta 0:01:59 lr 0.001746 wd 0.0500 time 0.2660 (0.2633) data time 0.0007 (0.0036) model time 0.2653 (0.2586) loss 5.8489 (5.9704) grad_norm 1.6142 (1.6376) loss_scale 4096.0000 (4096.0000) 
mem 9655MB [2024-07-30 07:26:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][180/625] eta 0:01:57 lr 0.001746 wd 0.0500 time 0.2592 (0.2630) data time 0.0009 (0.0035) model time 0.2583 (0.2584) loss 6.5732 (5.9792) grad_norm 2.1016 (1.6370) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:26:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][190/625] eta 0:01:54 lr 0.001746 wd 0.0500 time 0.2562 (0.2627) data time 0.0007 (0.0034) model time 0.2555 (0.2584) loss 6.7962 (5.9932) grad_norm 1.1254 (1.6278) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:26:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][200/625] eta 0:01:51 lr 0.001746 wd 0.0500 time 0.2539 (0.2624) data time 0.0006 (0.0032) model time 0.2533 (0.2581) loss 3.9608 (5.9891) grad_norm 1.3023 (1.6172) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:26:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][210/625] eta 0:01:48 lr 0.001746 wd 0.0500 time 0.2549 (0.2621) data time 0.0009 (0.0031) model time 0.2540 (0.2579) loss 6.4363 (5.9764) grad_norm 1.3878 (1.5993) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:26:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][220/625] eta 0:01:46 lr 0.001746 wd 0.0500 time 0.2594 (0.2618) data time 0.0008 (0.0030) model time 0.2586 (0.2578) loss 5.0959 (5.9582) grad_norm 1.5672 (1.5891) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:26:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][230/625] eta 0:01:43 lr 0.001745 wd 0.0500 time 0.2567 (0.2617) data time 0.0007 (0.0029) model time 0.2560 (0.2578) loss 6.6145 (5.9693) grad_norm 2.0594 (1.5892) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:26:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][240/625] eta 0:01:40 lr 0.001745 wd 0.0500 time 0.2580 
(0.2615) data time 0.0011 (0.0028) model time 0.2569 (0.2577) loss 5.4219 (5.9764) grad_norm 2.2170 (1.5886) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:26:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][250/625] eta 0:01:37 lr 0.001745 wd 0.0500 time 0.2561 (0.2613) data time 0.0008 (0.0028) model time 0.2553 (0.2576) loss 4.7746 (5.9661) grad_norm 2.8513 (1.5997) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:26:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][260/625] eta 0:01:35 lr 0.001745 wd 0.0500 time 0.2525 (0.2611) data time 0.0006 (0.0027) model time 0.2518 (0.2575) loss 5.4031 (5.9539) grad_norm 1.3161 (1.5979) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:26:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][270/625] eta 0:01:32 lr 0.001745 wd 0.0500 time 0.2535 (0.2610) data time 0.0009 (0.0026) model time 0.2526 (0.2575) loss 6.7793 (5.9691) grad_norm 2.1814 (1.5989) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:26:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][280/625] eta 0:01:29 lr 0.001745 wd 0.0500 time 0.2575 (0.2608) data time 0.0009 (0.0026) model time 0.2566 (0.2574) loss 5.6971 (5.9560) grad_norm 1.2979 (1.6119) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:26:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][290/625] eta 0:01:27 lr 0.001745 wd 0.0500 time 0.2576 (0.2607) data time 0.0009 (0.0025) model time 0.2567 (0.2573) loss 6.6205 (5.9571) grad_norm 1.5316 (1.6096) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:26:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][300/625] eta 0:01:24 lr 0.001745 wd 0.0500 time 0.2587 (0.2605) data time 0.0007 (0.0025) model time 0.2580 (0.2572) loss 6.2877 (5.9757) grad_norm 1.1037 (1.6043) loss_scale 4096.0000 (4096.0000) mem 9655MB 
[2024-07-30 07:26:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][310/625] eta 0:01:22 lr 0.001745 wd 0.0500 time 0.2594 (0.2603) data time 0.0009 (0.0024) model time 0.2584 (0.2571) loss 5.9818 (5.9765) grad_norm 1.7619 (1.6016) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:26:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][320/625] eta 0:01:19 lr 0.001744 wd 0.0500 time 0.2557 (0.2602) data time 0.0010 (0.0024) model time 0.2547 (0.2571) loss 5.9256 (5.9760) grad_norm 1.2223 (1.6040) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:26:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][330/625] eta 0:01:16 lr 0.001744 wd 0.0500 time 0.2563 (0.2602) data time 0.0010 (0.0023) model time 0.2553 (0.2571) loss 6.9353 (5.9796) grad_norm 1.2099 (1.5975) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:26:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][340/625] eta 0:01:14 lr 0.001744 wd 0.0500 time 0.2608 (0.2601) data time 0.0008 (0.0023) model time 0.2601 (0.2571) loss 6.7167 (5.9873) grad_norm 1.7396 (1.6000) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:26:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][350/625] eta 0:01:11 lr 0.001744 wd 0.0500 time 0.2547 (0.2600) data time 0.0008 (0.0023) model time 0.2539 (0.2570) loss 5.6604 (5.9834) grad_norm 2.1147 (1.5981) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:26:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][360/625] eta 0:01:08 lr 0.001744 wd 0.0500 time 0.2610 (0.2600) data time 0.0009 (0.0022) model time 0.2601 (0.2570) loss 5.3607 (5.9814) grad_norm 1.3262 (1.5912) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:26:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][370/625] eta 0:01:06 lr 0.001744 wd 0.0500 time 0.2486 (0.2599) 
data time 0.0008 (0.0022) model time 0.2478 (0.2571) loss 4.8889 (5.9696) grad_norm 1.9032 (1.5923) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:26:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][380/625] eta 0:01:03 lr 0.001744 wd 0.0500 time 0.2579 (0.2598) data time 0.0006 (0.0022) model time 0.2573 (0.2570) loss 4.3334 (5.9647) grad_norm 1.2314 (1.5960) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:26:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][390/625] eta 0:01:01 lr 0.001744 wd 0.0500 time 0.2569 (0.2598) data time 0.0010 (0.0021) model time 0.2559 (0.2570) loss 6.8537 (5.9576) grad_norm 1.1143 (1.5942) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:26:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][400/625] eta 0:00:58 lr 0.001743 wd 0.0500 time 0.2626 (0.2597) data time 0.0008 (0.0021) model time 0.2619 (0.2570) loss 4.8666 (5.9509) grad_norm 1.6033 (1.5955) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:27:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][410/625] eta 0:00:55 lr 0.001743 wd 0.0500 time 0.2605 (0.2597) data time 0.0007 (0.0021) model time 0.2598 (0.2570) loss 6.5272 (5.9479) grad_norm 1.9317 (1.5974) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:27:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][420/625] eta 0:00:53 lr 0.001743 wd 0.0500 time 0.2569 (0.2597) data time 0.0007 (0.0020) model time 0.2562 (0.2571) loss 5.3038 (5.9510) grad_norm 1.1911 (1.5956) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:27:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][430/625] eta 0:00:50 lr 0.001743 wd 0.0500 time 0.2554 (0.2597) data time 0.0007 (0.0020) model time 0.2546 (0.2571) loss 5.8281 (5.9536) grad_norm 1.0936 (1.5904) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 
07:27:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][440/625] eta 0:00:48 lr 0.001743 wd 0.0500 time 0.2543 (0.2597) data time 0.0009 (0.0020) model time 0.2534 (0.2571) loss 5.0379 (5.9463) grad_norm 2.2132 (1.5906) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:27:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][450/625] eta 0:00:45 lr 0.001743 wd 0.0500 time 0.2588 (0.2596) data time 0.0010 (0.0020) model time 0.2578 (0.2571) loss 4.3918 (5.9456) grad_norm 4.0819 (1.6008) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:27:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][460/625] eta 0:00:42 lr 0.001743 wd 0.0500 time 0.2601 (0.2595) data time 0.0006 (0.0019) model time 0.2595 (0.2570) loss 7.4194 (5.9468) grad_norm 1.4350 (1.6032) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:27:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][470/625] eta 0:00:40 lr 0.001743 wd 0.0500 time 0.2576 (0.2595) data time 0.0007 (0.0019) model time 0.2569 (0.2570) loss 6.3488 (5.9528) grad_norm 1.4965 (1.6022) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:27:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][480/625] eta 0:00:37 lr 0.001742 wd 0.0500 time 0.2553 (0.2595) data time 0.0009 (0.0019) model time 0.2545 (0.2570) loss 6.5308 (5.9522) grad_norm 2.1390 (1.6062) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:27:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][490/625] eta 0:00:35 lr 0.001742 wd 0.0500 time 0.2565 (0.2594) data time 0.0008 (0.0019) model time 0.2557 (0.2570) loss 5.6225 (5.9582) grad_norm 2.9120 (1.6059) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:27:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][500/625] eta 0:00:32 lr 0.001742 wd 0.0500 time 0.2545 (0.2594) data time 
0.0011 (0.0019) model time 0.2534 (0.2570) loss 6.6578 (5.9604) grad_norm 1.7400 (1.6152) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:27:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][510/625] eta 0:00:29 lr 0.001742 wd 0.0500 time 0.2572 (0.2593) data time 0.0007 (0.0019) model time 0.2565 (0.2569) loss 4.8193 (5.9499) grad_norm 1.3955 (1.6219) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:27:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][520/625] eta 0:00:27 lr 0.001742 wd 0.0500 time 0.2576 (0.2593) data time 0.0008 (0.0018) model time 0.2569 (0.2569) loss 6.9679 (5.9468) grad_norm 1.2959 (1.6215) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:27:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][530/625] eta 0:00:24 lr 0.001742 wd 0.0500 time 0.2580 (0.2592) data time 0.0008 (0.0018) model time 0.2572 (0.2569) loss 6.3265 (5.9508) grad_norm 1.3269 (1.6215) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:27:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][540/625] eta 0:00:22 lr 0.001742 wd 0.0500 time 0.2563 (0.2592) data time 0.0007 (0.0018) model time 0.2556 (0.2569) loss 6.5438 (5.9519) grad_norm 2.1515 (1.6258) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:27:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][550/625] eta 0:00:19 lr 0.001742 wd 0.0500 time 0.2588 (0.2594) data time 0.0008 (0.0018) model time 0.2580 (0.2572) loss 6.2280 (5.9533) grad_norm 2.3539 (1.6395) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:27:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][560/625] eta 0:00:16 lr 0.001742 wd 0.0500 time 0.2547 (0.2594) data time 0.0008 (0.0018) model time 0.2540 (0.2571) loss 5.5195 (5.9517) grad_norm 2.0412 (1.6521) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:27:41 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][570/625] eta 0:00:14 lr 0.001741 wd 0.0500 time 0.2543 (0.2593) data time 0.0010 (0.0018) model time 0.2533 (0.2571) loss 6.7484 (5.9540) grad_norm 1.6739 (1.6569) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:27:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][580/625] eta 0:00:11 lr 0.001741 wd 0.0500 time 0.2569 (0.2593) data time 0.0007 (0.0018) model time 0.2563 (0.2571) loss 6.0831 (5.9534) grad_norm 1.3481 (1.6602) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:27:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][590/625] eta 0:00:09 lr 0.001741 wd 0.0500 time 0.2558 (0.2593) data time 0.0007 (0.0018) model time 0.2551 (0.2571) loss 5.6820 (5.9477) grad_norm 1.3398 (1.6570) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:27:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][600/625] eta 0:00:06 lr 0.001741 wd 0.0500 time 0.2580 (0.2592) data time 0.0006 (0.0017) model time 0.2574 (0.2570) loss 6.1421 (5.9530) grad_norm 1.6073 (1.6526) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:27:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][610/625] eta 0:00:03 lr 0.001741 wd 0.0500 time 0.2526 (0.2592) data time 0.0006 (0.0017) model time 0.2520 (0.2570) loss 7.7420 (5.9578) grad_norm 1.5383 (1.6514) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:27:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][620/625] eta 0:00:01 lr 0.001741 wd 0.0500 time 0.2538 (0.2591) data time 0.0006 (0.0017) model time 0.2532 (0.2569) loss 5.1714 (5.9589) grad_norm 2.5662 (1.6483) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:27:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 85 training takes 0:02:41 [2024-07-30 07:27:55 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO 
./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 07:27:56 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-30 07:27:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.694 (0.694) Loss 0.6436 (0.6436) Acc@1 86.914 (86.914) Acc@5 97.559 (97.559) Mem 9655MB [2024-07-30 07:27:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.061 (0.116) Loss 1.0469 (0.8014) Acc@1 76.172 (82.298) Acc@5 93.896 (96.302) Mem 9655MB [2024-07-30 07:27:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.087) Loss 1.1855 (0.9564) Acc@1 71.729 (78.437) Acc@5 91.992 (94.420) Mem 9655MB [2024-07-30 07:27:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.101 Acc@5 94.366 [2024-07-30 07:27:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.1% [2024-07-30 07:27:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.10% [2024-07-30 07:27:58 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-30 07:27:59 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-30 07:28:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.406 (0.406) Loss 0.5581 (0.5581) Acc@1 87.500 (87.500) Acc@5 98.193 (98.193) Mem 9655MB [2024-07-30 07:28:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.088) Loss 0.9551 (0.7178) Acc@1 76.953 (83.456) Acc@5 94.141 (96.791) Mem 9655MB [2024-07-30 07:28:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.072) Loss 1.1152 (0.8688) Acc@1 72.168 (79.622) Acc@5 92.236 (94.994) Mem 9655MB [2024-07-30 07:28:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.347 Acc@5 94.962 [2024-07-30 07:28:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.3% [2024-07-30 07:28:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.35% [2024-07-30 07:28:01 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-30 07:28:01 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-30 07:28:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][0/625] eta 0:07:52 lr 0.001741 wd 0.0500 time 0.7558 (0.7558) data time 0.5028 (0.5028) model time 0.0000 (0.0000) loss 7.0230 (7.0230) grad_norm 1.2596 (1.2596) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:28:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][10/625] eta 0:03:04 lr 0.001741 wd 0.0500 time 0.2551 (0.3006) data time 0.0009 (0.0465) model time 0.0000 (0.0000) loss 5.5562 (5.7052) grad_norm 1.0824 (1.2794) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:28:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][20/625] eta 0:02:48 lr 0.001740 wd 0.0500 time 0.2547 (0.2791) data time 0.0007 (0.0248) model time 0.0000 (0.0000) loss 4.5235 (5.8897) grad_norm 1.5038 (1.4036) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:28:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][30/625] eta 0:02:41 lr 0.001740 wd 0.0500 time 0.2570 (0.2713) data time 0.0006 (0.0171) model time 0.0000 (0.0000) loss 4.9617 (5.7620) grad_norm 2.0084 (1.5298) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:28:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][40/625] eta 0:02:36 lr 0.001740 wd 0.0500 time 0.2564 (0.2676) data time 0.0006 (0.0131) model time 0.0000 (0.0000) loss 5.7749 (5.8146) grad_norm 1.5204 (1.5383) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:28:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][50/625] eta 0:02:32 lr 0.001740 wd 0.0500 time 0.2575 (0.2654) data time 0.0007 (0.0107) model time 0.0000 (0.0000) loss 7.8220 (5.8895) grad_norm 1.4133 (1.5392) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:28:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][60/625] eta 0:02:29 lr 0.001740 wd 0.0500 time 0.2544 (0.2637) data time 
0.0008 (0.0091) model time 0.2536 (0.2544) loss 6.7319 (5.9504) grad_norm 1.3278 (1.5993) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:28:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][70/625] eta 0:02:26 lr 0.001740 wd 0.0500 time 0.2547 (0.2631) data time 0.0009 (0.0080) model time 0.2538 (0.2564) loss 5.3401 (5.9616) grad_norm 1.5306 (1.5869) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:28:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][80/625] eta 0:02:22 lr 0.001740 wd 0.0500 time 0.2578 (0.2622) data time 0.0008 (0.0071) model time 0.2570 (0.2559) loss 5.9650 (5.9750) grad_norm 1.4076 (1.5970) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:28:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][90/625] eta 0:02:19 lr 0.001740 wd 0.0500 time 0.2566 (0.2616) data time 0.0008 (0.0064) model time 0.2558 (0.2559) loss 6.5741 (5.9760) grad_norm 1.4703 (1.5814) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:28:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][100/625] eta 0:02:17 lr 0.001740 wd 0.0500 time 0.2589 (0.2612) data time 0.0008 (0.0059) model time 0.2581 (0.2560) loss 6.2648 (5.9533) grad_norm 1.7580 (1.5709) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:28:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][110/625] eta 0:02:14 lr 0.001739 wd 0.0500 time 0.2539 (0.2608) data time 0.0008 (0.0054) model time 0.2531 (0.2559) loss 6.8366 (5.9989) grad_norm 1.2873 (1.5859) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:28:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][120/625] eta 0:02:11 lr 0.001739 wd 0.0500 time 0.2524 (0.2604) data time 0.0008 (0.0051) model time 0.2516 (0.2559) loss 6.2292 (6.0453) grad_norm 1.4285 (1.5794) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:28:35 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][130/625] eta 0:02:08 lr 0.001739 wd 0.0500 time 0.2546 (0.2602) data time 0.0006 (0.0048) model time 0.2540 (0.2559) loss 4.6800 (6.0375) grad_norm 1.1042 (1.5765) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:28:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][140/625] eta 0:02:06 lr 0.001739 wd 0.0500 time 0.2563 (0.2599) data time 0.0007 (0.0045) model time 0.2556 (0.2558) loss 6.6151 (6.0731) grad_norm 1.8203 (1.5744) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:28:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][150/625] eta 0:02:03 lr 0.001739 wd 0.0500 time 0.2523 (0.2596) data time 0.0009 (0.0042) model time 0.2514 (0.2557) loss 6.1354 (6.0598) grad_norm 2.8037 (1.5993) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 07:28:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][160/625] eta 0:02:00 lr 0.001739 wd 0.0500 time 0.2605 (0.2594) data time 0.0008 (0.0040) model time 0.2597 (0.2556) loss 6.6743 (6.0515) grad_norm 1.1028 (inf) loss_scale 2048.0000 (4006.9565) mem 9655MB [2024-07-30 07:28:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][170/625] eta 0:01:57 lr 0.001739 wd 0.0500 time 0.2564 (0.2592) data time 0.0009 (0.0039) model time 0.2556 (0.2557) loss 6.6675 (6.0392) grad_norm 1.8258 (inf) loss_scale 2048.0000 (3892.3977) mem 9655MB [2024-07-30 07:28:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][180/625] eta 0:01:55 lr 0.001739 wd 0.0500 time 0.2548 (0.2590) data time 0.0008 (0.0037) model time 0.2540 (0.2556) loss 5.3664 (6.0516) grad_norm 1.6099 (inf) loss_scale 2048.0000 (3790.4972) mem 9655MB [2024-07-30 07:28:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][190/625] eta 0:01:52 lr 0.001738 wd 0.0500 time 0.2526 (0.2589) data time 0.0006 (0.0035) model 
time 0.2520 (0.2556) loss 5.0881 (6.0271) grad_norm 2.6625 (inf) loss_scale 2048.0000 (3699.2670) mem 9655MB [2024-07-30 07:28:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][200/625] eta 0:01:49 lr 0.001738 wd 0.0500 time 0.2607 (0.2588) data time 0.0007 (0.0034) model time 0.2601 (0.2556) loss 6.9238 (6.0421) grad_norm 2.0871 (inf) loss_scale 2048.0000 (3617.1144) mem 9655MB [2024-07-30 07:28:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][210/625] eta 0:01:47 lr 0.001738 wd 0.0500 time 0.2528 (0.2587) data time 0.0007 (0.0033) model time 0.2521 (0.2556) loss 6.0280 (6.0461) grad_norm 1.8754 (inf) loss_scale 2048.0000 (3542.7488) mem 9655MB [2024-07-30 07:28:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][220/625] eta 0:01:44 lr 0.001738 wd 0.0500 time 0.2558 (0.2586) data time 0.0009 (0.0032) model time 0.2549 (0.2556) loss 7.3855 (6.0679) grad_norm 1.6179 (inf) loss_scale 2048.0000 (3475.1131) mem 9655MB [2024-07-30 07:29:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][230/625] eta 0:01:42 lr 0.001738 wd 0.0500 time 0.2566 (0.2584) data time 0.0006 (0.0031) model time 0.2560 (0.2556) loss 6.3936 (6.0706) grad_norm 1.5045 (inf) loss_scale 2048.0000 (3413.3333) mem 9655MB [2024-07-30 07:29:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][240/625] eta 0:01:39 lr 0.001738 wd 0.0500 time 0.2616 (0.2584) data time 0.0010 (0.0030) model time 0.2606 (0.2556) loss 5.0997 (6.0704) grad_norm 2.9774 (inf) loss_scale 2048.0000 (3356.6805) mem 9655MB [2024-07-30 07:29:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][250/625] eta 0:01:36 lr 0.001738 wd 0.0500 time 0.2547 (0.2584) data time 0.0009 (0.0029) model time 0.2539 (0.2556) loss 4.5562 (6.0530) grad_norm 1.2340 (inf) loss_scale 2048.0000 (3304.5418) mem 9655MB [2024-07-30 07:29:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 
367): INFO Train: [86/300][260/625] eta 0:01:34 lr 0.001738 wd 0.0500 time 0.2595 (0.2584) data time 0.0007 (0.0029) model time 0.2588 (0.2557) loss 6.8871 (6.0351) grad_norm 0.9314 (inf) loss_scale 2048.0000 (3256.3985) mem 9655MB [2024-07-30 07:29:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][270/625] eta 0:01:31 lr 0.001737 wd 0.0500 time 0.2567 (0.2583) data time 0.0009 (0.0028) model time 0.2558 (0.2557) loss 4.5267 (6.0354) grad_norm 1.2135 (inf) loss_scale 2048.0000 (3211.8081) mem 9655MB [2024-07-30 07:29:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][280/625] eta 0:01:29 lr 0.001737 wd 0.0500 time 0.2609 (0.2583) data time 0.0006 (0.0027) model time 0.2603 (0.2558) loss 5.8870 (6.0397) grad_norm 3.7868 (inf) loss_scale 2048.0000 (3170.3915) mem 9655MB [2024-07-30 07:29:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][290/625] eta 0:01:26 lr 0.001737 wd 0.0500 time 0.2553 (0.2582) data time 0.0006 (0.0027) model time 0.2546 (0.2558) loss 5.3887 (6.0467) grad_norm 1.8423 (inf) loss_scale 2048.0000 (3131.8213) mem 9655MB [2024-07-30 07:29:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][300/625] eta 0:01:24 lr 0.001737 wd 0.0500 time 0.2597 (0.2589) data time 0.0009 (0.0026) model time 0.2588 (0.2566) loss 6.5293 (6.0595) grad_norm 1.2571 (inf) loss_scale 2048.0000 (3095.8140) mem 9655MB [2024-07-30 07:29:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][310/625] eta 0:01:21 lr 0.001737 wd 0.0500 time 0.2550 (0.2588) data time 0.0006 (0.0026) model time 0.2543 (0.2566) loss 6.4687 (6.0744) grad_norm 1.3587 (inf) loss_scale 2048.0000 (3062.1222) mem 9655MB [2024-07-30 07:29:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][320/625] eta 0:01:18 lr 0.001737 wd 0.0500 time 0.2599 (0.2588) data time 0.0007 (0.0025) model time 0.2592 (0.2567) loss 6.4036 (6.0726) grad_norm 2.9870 
(inf) loss_scale 2048.0000 (3030.5296) mem 9655MB [2024-07-30 07:29:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][330/625] eta 0:01:16 lr 0.001737 wd 0.0500 time 0.2568 (0.2588) data time 0.0008 (0.0025) model time 0.2560 (0.2566) loss 6.4846 (6.0697) grad_norm 1.6300 (inf) loss_scale 2048.0000 (3000.8459) mem 9655MB [2024-07-30 07:29:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][340/625] eta 0:01:13 lr 0.001737 wd 0.0500 time 0.2573 (0.2587) data time 0.0009 (0.0024) model time 0.2564 (0.2566) loss 6.3459 (6.0792) grad_norm 1.4995 (inf) loss_scale 2048.0000 (2972.9032) mem 9655MB [2024-07-30 07:29:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][350/625] eta 0:01:11 lr 0.001737 wd 0.0500 time 0.2559 (0.2587) data time 0.0008 (0.0024) model time 0.2551 (0.2566) loss 5.6213 (6.0829) grad_norm 2.3322 (inf) loss_scale 2048.0000 (2946.5527) mem 9655MB [2024-07-30 07:29:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][360/625] eta 0:01:08 lr 0.001736 wd 0.0500 time 0.2550 (0.2586) data time 0.0009 (0.0023) model time 0.2541 (0.2566) loss 5.5854 (6.0760) grad_norm 1.2529 (inf) loss_scale 2048.0000 (2921.6620) mem 9655MB [2024-07-30 07:29:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][370/625] eta 0:01:05 lr 0.001736 wd 0.0500 time 0.2544 (0.2586) data time 0.0010 (0.0023) model time 0.2534 (0.2565) loss 7.1072 (6.0766) grad_norm 1.2541 (inf) loss_scale 2048.0000 (2898.1132) mem 9655MB [2024-07-30 07:29:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][380/625] eta 0:01:03 lr 0.001736 wd 0.0500 time 0.2608 (0.2586) data time 0.0007 (0.0023) model time 0.2601 (0.2566) loss 5.6124 (6.0795) grad_norm 1.3839 (inf) loss_scale 2048.0000 (2875.8005) mem 9655MB [2024-07-30 07:29:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][390/625] eta 0:01:00 lr 0.001736 
wd 0.0500 time 0.2582 (0.2586) data time 0.0007 (0.0022) model time 0.2575 (0.2566) loss 5.9348 (6.0805) grad_norm 2.4119 (inf) loss_scale 2048.0000 (2854.6292) mem 9655MB [2024-07-30 07:29:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][400/625] eta 0:00:58 lr 0.001736 wd 0.0500 time 0.2531 (0.2585) data time 0.0010 (0.0022) model time 0.2522 (0.2565) loss 6.6992 (6.0918) grad_norm 1.5478 (inf) loss_scale 2048.0000 (2834.5137) mem 9655MB [2024-07-30 07:29:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][410/625] eta 0:00:55 lr 0.001736 wd 0.0500 time 0.2570 (0.2585) data time 0.0016 (0.0022) model time 0.2554 (0.2565) loss 4.4410 (6.0847) grad_norm 1.6492 (inf) loss_scale 2048.0000 (2815.3771) mem 9655MB [2024-07-30 07:29:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][420/625] eta 0:00:52 lr 0.001736 wd 0.0500 time 0.2571 (0.2584) data time 0.0010 (0.0021) model time 0.2561 (0.2565) loss 6.8330 (6.0668) grad_norm 1.3878 (inf) loss_scale 2048.0000 (2797.1496) mem 9655MB [2024-07-30 07:29:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][430/625] eta 0:00:50 lr 0.001736 wd 0.0500 time 0.2592 (0.2584) data time 0.0006 (0.0021) model time 0.2586 (0.2565) loss 5.1503 (6.0501) grad_norm 1.3724 (inf) loss_scale 2048.0000 (2779.7680) mem 9655MB [2024-07-30 07:29:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][440/625] eta 0:00:47 lr 0.001735 wd 0.0500 time 0.3942 (0.2587) data time 0.0009 (0.0021) model time 0.3934 (0.2569) loss 6.4606 (6.0543) grad_norm 2.3864 (inf) loss_scale 2048.0000 (2763.1746) mem 9655MB [2024-07-30 07:29:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][450/625] eta 0:00:45 lr 0.001735 wd 0.0500 time 0.2556 (0.2586) data time 0.0007 (0.0021) model time 0.2549 (0.2568) loss 7.0186 (6.0596) grad_norm 3.4438 (inf) loss_scale 2048.0000 (2747.3171) mem 9655MB 
[2024-07-30 07:30:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][460/625] eta 0:00:42 lr 0.001735 wd 0.0500 time 0.2645 (0.2586) data time 0.0008 (0.0020) model time 0.2637 (0.2568) loss 5.9919 (6.0607) grad_norm 1.9628 (inf) loss_scale 2048.0000 (2732.1475) mem 9655MB [2024-07-30 07:30:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][470/625] eta 0:00:40 lr 0.001735 wd 0.0500 time 0.2610 (0.2586) data time 0.0010 (0.0020) model time 0.2601 (0.2568) loss 6.1387 (6.0562) grad_norm 1.2191 (inf) loss_scale 2048.0000 (2717.6221) mem 9655MB [2024-07-30 07:30:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][480/625] eta 0:00:37 lr 0.001735 wd 0.0500 time 0.2478 (0.2586) data time 0.0010 (0.0020) model time 0.2468 (0.2568) loss 5.3902 (6.0518) grad_norm 1.3638 (inf) loss_scale 2048.0000 (2703.7006) mem 9655MB [2024-07-30 07:30:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][490/625] eta 0:00:34 lr 0.001735 wd 0.0500 time 0.2535 (0.2585) data time 0.0007 (0.0020) model time 0.2528 (0.2567) loss 6.7554 (6.0505) grad_norm 2.3564 (inf) loss_scale 2048.0000 (2690.3462) mem 9655MB [2024-07-30 07:30:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][500/625] eta 0:00:32 lr 0.001735 wd 0.0500 time 0.2577 (0.2585) data time 0.0007 (0.0020) model time 0.2570 (0.2567) loss 6.6663 (6.0464) grad_norm 1.4863 (inf) loss_scale 2048.0000 (2677.5250) mem 9655MB [2024-07-30 07:30:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][510/625] eta 0:00:29 lr 0.001735 wd 0.0500 time 0.2544 (0.2584) data time 0.0006 (0.0019) model time 0.2538 (0.2567) loss 6.5583 (6.0458) grad_norm 1.2400 (inf) loss_scale 2048.0000 (2665.2055) mem 9655MB [2024-07-30 07:30:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][520/625] eta 0:00:27 lr 0.001734 wd 0.0500 time 0.2552 (0.2585) data time 0.0006 
(0.0019) model time 0.2546 (0.2568) loss 6.2066 (6.0379) grad_norm 1.5276 (inf) loss_scale 2048.0000 (2653.3589) mem 9655MB [2024-07-30 07:30:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][530/625] eta 0:00:24 lr 0.001734 wd 0.0500 time 0.2591 (0.2585) data time 0.0008 (0.0019) model time 0.2584 (0.2568) loss 6.1331 (6.0447) grad_norm 1.0729 (inf) loss_scale 2048.0000 (2641.9586) mem 9655MB [2024-07-30 07:30:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][540/625] eta 0:00:21 lr 0.001734 wd 0.0500 time 0.2703 (0.2585) data time 0.0009 (0.0019) model time 0.2694 (0.2568) loss 6.8464 (6.0546) grad_norm 1.1469 (inf) loss_scale 2048.0000 (2630.9797) mem 9655MB [2024-07-30 07:30:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][550/625] eta 0:00:19 lr 0.001734 wd 0.0500 time 0.2573 (0.2586) data time 0.0007 (0.0019) model time 0.2566 (0.2569) loss 5.8575 (6.0511) grad_norm 2.0288 (inf) loss_scale 2048.0000 (2620.3993) mem 9655MB [2024-07-30 07:30:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][560/625] eta 0:00:16 lr 0.001734 wd 0.0500 time 0.2613 (0.2585) data time 0.0005 (0.0019) model time 0.2608 (0.2569) loss 4.5029 (6.0489) grad_norm 2.3957 (inf) loss_scale 2048.0000 (2610.1961) mem 9655MB [2024-07-30 07:30:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][570/625] eta 0:00:14 lr 0.001734 wd 0.0500 time 0.2537 (0.2585) data time 0.0009 (0.0018) model time 0.2528 (0.2568) loss 5.9689 (6.0548) grad_norm 1.3528 (inf) loss_scale 2048.0000 (2600.3503) mem 9655MB [2024-07-30 07:30:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][580/625] eta 0:00:11 lr 0.001734 wd 0.0500 time 0.2573 (0.2585) data time 0.0007 (0.0018) model time 0.2565 (0.2568) loss 6.5660 (6.0507) grad_norm 1.0411 (inf) loss_scale 2048.0000 (2590.8434) mem 9655MB [2024-07-30 07:30:34 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [86/300][590/625] eta 0:00:09 lr 0.001734 wd 0.0500 time 0.2557 (0.2584) data time 0.0008 (0.0018) model time 0.2549 (0.2568) loss 5.0113 (6.0392) grad_norm 1.2718 (inf) loss_scale 2048.0000 (2581.6582) mem 9655MB [2024-07-30 07:30:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][600/625] eta 0:00:06 lr 0.001734 wd 0.0500 time 0.2569 (0.2584) data time 0.0007 (0.0018) model time 0.2562 (0.2568) loss 6.4216 (6.0349) grad_norm 1.3443 (inf) loss_scale 2048.0000 (2572.7787) mem 9655MB [2024-07-30 07:30:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][610/625] eta 0:00:03 lr 0.001733 wd 0.0500 time 0.2538 (0.2583) data time 0.0004 (0.0018) model time 0.2534 (0.2567) loss 6.5679 (6.0393) grad_norm 1.1539 (inf) loss_scale 2048.0000 (2564.1899) mem 9655MB [2024-07-30 07:30:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][620/625] eta 0:00:01 lr 0.001733 wd 0.0500 time 0.2545 (0.2583) data time 0.0005 (0.0018) model time 0.2540 (0.2567) loss 7.4877 (6.0413) grad_norm 3.2726 (inf) loss_scale 2048.0000 (2555.8776) mem 9655MB [2024-07-30 07:30:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 86 training takes 0:02:41 [2024-07-30 07:30:43 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 07:30:43 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-30 07:30:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.641 (0.641) Loss 0.6763 (0.6763) Acc@1 85.400 (85.400) Acc@5 97.852 (97.852) Mem 9655MB [2024-07-30 07:30:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.057 (0.119) Loss 1.1270 (0.8258) Acc@1 75.391 (82.160) Acc@5 92.676 (96.351) Mem 9655MB [2024-07-30 07:30:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.088) Loss 1.2451 (0.9950) Acc@1 71.631 (78.144) Acc@5 91.455 (94.327) Mem 9655MB [2024-07-30 07:30:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.893 Acc@5 94.354 [2024-07-30 07:30:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.9% [2024-07-30 07:30:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.891 (0.891) Loss 0.5591 (0.5591) Acc@1 87.598 (87.598) Acc@5 98.193 (98.193) Mem 9655MB [2024-07-30 07:30:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.150) Loss 0.9546 (0.7173) Acc@1 77.051 (83.492) Acc@5 94.141 (96.826) Mem 9655MB [2024-07-30 07:30:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.105) Loss 1.1113 (0.8680) Acc@1 72.314 (79.653) Acc@5 92.285 (95.036) Mem 9655MB [2024-07-30 07:30:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.381 Acc@5 95.012 [2024-07-30 07:30:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.4% [2024-07-30 07:30:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.38% [2024-07-30 07:30:48 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-07-30 07:30:49 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-07-30 07:30:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][0/625] eta 0:06:20 lr 0.001733 wd 0.0500 time 0.6084 (0.6084) data time 0.3683 (0.3683) model time 0.0000 (0.0000) loss 5.2253 (5.2253) grad_norm 2.1294 (2.1294) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:30:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][10/625] eta 0:02:56 lr 0.001733 wd 0.0500 time 0.2549 (0.2875) data time 0.0008 (0.0343) model time 0.0000 (0.0000) loss 7.3893 (6.2206) grad_norm 1.0883 (1.5704) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:30:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][20/625] eta 0:02:45 lr 0.001733 wd 0.0500 time 0.2566 (0.2734) data time 0.0008 (0.0184) model time 0.0000 (0.0000) loss 5.5904 (6.1230) grad_norm 1.6386 (1.4988) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:30:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][30/625] eta 0:02:39 lr 0.001733 wd 0.0500 time 0.2565 (0.2679) data time 0.0010 (0.0127) model time 0.0000 (0.0000) loss 6.0957 (6.0630) grad_norm 1.4099 (1.5075) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:31:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][40/625] eta 0:02:35 lr 0.001733 wd 0.0500 time 0.2626 (0.2652) data time 0.0009 (0.0099) model time 0.0000 (0.0000) loss 5.9475 (6.0598) grad_norm 1.1580 (1.5084) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:31:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][50/625] eta 0:02:31 lr 0.001733 wd 0.0500 time 0.2599 (0.2634) data time 0.0008 (0.0081) model time 0.0000 (0.0000) loss 6.2269 (6.0433) grad_norm 2.6722 (1.6032) loss_scale 2048.0000 (2048.0000) mem 9655MB 
[2024-07-30 07:31:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][60/625] eta 0:02:28 lr 0.001732 wd 0.0500 time 0.2547 (0.2622) data time 0.0010 (0.0069) model time 0.2537 (0.2550) loss 4.9449 (6.0086) grad_norm 1.1449 (1.5985) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:31:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][70/625] eta 0:02:25 lr 0.001732 wd 0.0500 time 0.2694 (0.2615) data time 0.0007 (0.0061) model time 0.2686 (0.2555) loss 6.6000 (5.9810) grad_norm 1.6068 (1.6064) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:31:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][80/625] eta 0:02:22 lr 0.001732 wd 0.0500 time 0.2531 (0.2608) data time 0.0006 (0.0055) model time 0.2525 (0.2553) loss 5.4722 (6.0031) grad_norm 1.4476 (1.6209) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:31:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][90/625] eta 0:02:19 lr 0.001732 wd 0.0500 time 0.2518 (0.2603) data time 0.0011 (0.0050) model time 0.2507 (0.2552) loss 6.5072 (6.0051) grad_norm 2.3795 (1.6169) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:31:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][100/625] eta 0:02:16 lr 0.001732 wd 0.0500 time 0.2543 (0.2599) data time 0.0009 (0.0046) model time 0.2534 (0.2552) loss 6.6340 (5.9925) grad_norm 1.5631 (1.6046) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:31:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][110/625] eta 0:02:13 lr 0.001732 wd 0.0500 time 0.2542 (0.2595) data time 0.0008 (0.0043) model time 0.2534 (0.2552) loss 6.9745 (6.0422) grad_norm 1.4368 (1.5944) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:31:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][120/625] eta 0:02:10 lr 0.001732 wd 0.0500 time 0.2581 (0.2593) data 
time 0.0008 (0.0040) model time 0.2573 (0.2554) loss 5.5630 (6.0291) grad_norm 1.4055 (1.6049) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:31:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][130/625] eta 0:02:08 lr 0.001732 wd 0.0500 time 0.2678 (0.2592) data time 0.0009 (0.0037) model time 0.2669 (0.2555) loss 6.2701 (6.0200) grad_norm 1.1481 (1.6115) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:31:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][140/625] eta 0:02:05 lr 0.001731 wd 0.0500 time 0.2517 (0.2590) data time 0.0008 (0.0035) model time 0.2510 (0.2555) loss 7.0939 (6.0370) grad_norm 2.2098 (1.6420) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:31:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][150/625] eta 0:02:02 lr 0.001731 wd 0.0500 time 0.2620 (0.2589) data time 0.0009 (0.0034) model time 0.2611 (0.2556) loss 6.8441 (6.0333) grad_norm 1.6145 (1.6436) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:31:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][160/625] eta 0:02:00 lr 0.001731 wd 0.0500 time 0.2535 (0.2586) data time 0.0009 (0.0032) model time 0.2525 (0.2555) loss 6.3586 (6.0259) grad_norm 1.8900 (1.6361) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:31:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][170/625] eta 0:01:57 lr 0.001731 wd 0.0500 time 0.2546 (0.2586) data time 0.0009 (0.0031) model time 0.2537 (0.2556) loss 6.0158 (6.0244) grad_norm 1.2953 (1.6419) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:31:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][180/625] eta 0:01:55 lr 0.001731 wd 0.0500 time 0.2554 (0.2585) data time 0.0008 (0.0029) model time 0.2546 (0.2557) loss 7.2212 (6.0116) grad_norm 1.4870 (1.6622) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:31:38 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][190/625] eta 0:01:52 lr 0.001731 wd 0.0500 time 0.2572 (0.2584) data time 0.0010 (0.0028) model time 0.2563 (0.2556) loss 6.9746 (6.0260) grad_norm 1.3163 (1.6605) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:31:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][200/625] eta 0:01:49 lr 0.001731 wd 0.0500 time 0.2534 (0.2583) data time 0.0007 (0.0027) model time 0.2527 (0.2556) loss 4.9117 (6.0159) grad_norm 1.1314 (1.6506) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:31:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][210/625] eta 0:01:47 lr 0.001731 wd 0.0500 time 0.2655 (0.2583) data time 0.0008 (0.0027) model time 0.2647 (0.2558) loss 5.9252 (6.0032) grad_norm 2.2310 (1.6445) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:31:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][220/625] eta 0:01:44 lr 0.001731 wd 0.0500 time 0.2588 (0.2583) data time 0.0008 (0.0026) model time 0.2580 (0.2558) loss 5.4419 (6.0133) grad_norm 1.8854 (1.6456) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:31:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][230/625] eta 0:01:41 lr 0.001730 wd 0.0500 time 0.2559 (0.2581) data time 0.0007 (0.0025) model time 0.2552 (0.2557) loss 5.2136 (5.9896) grad_norm 1.7797 (1.6443) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:31:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][240/625] eta 0:01:39 lr 0.001730 wd 0.0500 time 0.2591 (0.2581) data time 0.0006 (0.0024) model time 0.2585 (0.2558) loss 6.0195 (5.9847) grad_norm 1.1931 (1.6407) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:31:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][250/625] eta 0:01:36 lr 0.001730 wd 0.0500 time 0.2561 (0.2580) data time 0.0009 
(0.0024) model time 0.2551 (0.2557) loss 5.5784 (5.9673) grad_norm 2.7020 (1.6464) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:31:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][260/625] eta 0:01:34 lr 0.001730 wd 0.0500 time 0.2576 (0.2580) data time 0.0009 (0.0023) model time 0.2567 (0.2557) loss 7.0146 (5.9615) grad_norm 2.1957 (1.6531) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:31:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][270/625] eta 0:01:31 lr 0.001730 wd 0.0500 time 0.2662 (0.2579) data time 0.0006 (0.0023) model time 0.2656 (0.2557) loss 5.1537 (5.9566) grad_norm 1.4342 (1.6461) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:32:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][280/625] eta 0:01:28 lr 0.001730 wd 0.0500 time 0.2576 (0.2579) data time 0.0008 (0.0022) model time 0.2567 (0.2557) loss 6.4101 (5.9424) grad_norm 1.1752 (1.6369) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:32:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][290/625] eta 0:01:26 lr 0.001730 wd 0.0500 time 0.2607 (0.2578) data time 0.0007 (0.0022) model time 0.2600 (0.2557) loss 4.5207 (5.9472) grad_norm 1.5328 (1.6379) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:32:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][300/625] eta 0:01:23 lr 0.001730 wd 0.0500 time 0.2591 (0.2578) data time 0.0006 (0.0021) model time 0.2585 (0.2557) loss 5.5675 (5.9468) grad_norm 1.5920 (1.6317) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:32:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][310/625] eta 0:01:21 lr 0.001729 wd 0.0500 time 0.2542 (0.2577) data time 0.0008 (0.0021) model time 0.2534 (0.2557) loss 5.9128 (5.9431) grad_norm 2.2753 (1.6429) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:32:12 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][320/625] eta 0:01:18 lr 0.001729 wd 0.0500 time 0.2563 (0.2577) data time 0.0008 (0.0020) model time 0.2555 (0.2557) loss 5.1095 (5.9504) grad_norm 1.4236 (1.6429) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:32:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][330/625] eta 0:01:16 lr 0.001729 wd 0.0500 time 0.2514 (0.2576) data time 0.0009 (0.0020) model time 0.2505 (0.2557) loss 6.1460 (5.9558) grad_norm 0.9985 (1.6366) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:32:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][340/625] eta 0:01:13 lr 0.001729 wd 0.0500 time 0.2603 (0.2576) data time 0.0006 (0.0020) model time 0.2598 (0.2556) loss 6.7485 (5.9606) grad_norm 3.4205 (1.6524) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:32:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][350/625] eta 0:01:10 lr 0.001729 wd 0.0500 time 0.2566 (0.2576) data time 0.0007 (0.0020) model time 0.2558 (0.2557) loss 6.2593 (5.9715) grad_norm 1.5846 (1.6625) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:32:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][360/625] eta 0:01:08 lr 0.001729 wd 0.0500 time 0.2530 (0.2579) data time 0.0006 (0.0019) model time 0.2524 (0.2561) loss 6.0041 (5.9701) grad_norm 1.3334 (1.6677) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:32:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][370/625] eta 0:01:05 lr 0.001729 wd 0.0500 time 0.2551 (0.2579) data time 0.0008 (0.0019) model time 0.2544 (0.2561) loss 5.6212 (5.9597) grad_norm 2.3371 (1.6750) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:32:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][380/625] eta 0:01:03 lr 0.001729 wd 0.0500 time 0.2605 (0.2579) data time 0.0006 
(0.0019) model time 0.2599 (0.2561) loss 5.6042 (5.9599) grad_norm 1.7578 (1.6693) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:32:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][390/625] eta 0:01:00 lr 0.001728 wd 0.0500 time 0.2610 (0.2579) data time 0.0008 (0.0018) model time 0.2603 (0.2561) loss 5.6246 (5.9653) grad_norm 1.7400 (1.6681) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:32:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][400/625] eta 0:00:58 lr 0.001728 wd 0.0500 time 0.2556 (0.2578) data time 0.0008 (0.0018) model time 0.2548 (0.2561) loss 5.4103 (5.9694) grad_norm 1.0871 (1.6599) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:32:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][410/625] eta 0:00:55 lr 0.001728 wd 0.0500 time 0.2538 (0.2578) data time 0.0008 (0.0018) model time 0.2531 (0.2561) loss 4.8367 (5.9683) grad_norm 1.7015 (1.6569) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:32:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][420/625] eta 0:00:52 lr 0.001728 wd 0.0500 time 0.2556 (0.2578) data time 0.0007 (0.0018) model time 0.2549 (0.2561) loss 5.2499 (5.9669) grad_norm 1.2544 (1.6537) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:32:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][430/625] eta 0:00:50 lr 0.001728 wd 0.0500 time 0.2580 (0.2579) data time 0.0006 (0.0018) model time 0.2573 (0.2562) loss 4.9169 (5.9618) grad_norm 1.0443 (1.6460) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:32:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][440/625] eta 0:00:47 lr 0.001728 wd 0.0500 time 0.2548 (0.2578) data time 0.0006 (0.0018) model time 0.2542 (0.2561) loss 6.2712 (5.9624) grad_norm 1.6624 (1.6411) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:32:45 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][450/625] eta 0:00:45 lr 0.001728 wd 0.0500 time 0.2554 (0.2578) data time 0.0009 (0.0017) model time 0.2545 (0.2561) loss 7.4369 (5.9762) grad_norm 1.2867 (1.6365) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:32:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][460/625] eta 0:00:42 lr 0.001728 wd 0.0500 time 0.2359 (0.2580) data time 0.0009 (0.0017) model time 0.2351 (0.2564) loss 4.8502 (5.9754) grad_norm 1.3465 (1.6535) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:32:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][470/625] eta 0:00:39 lr 0.001727 wd 0.0500 time 0.2678 (0.2580) data time 0.0006 (0.0017) model time 0.2672 (0.2564) loss 5.1522 (5.9813) grad_norm 1.4759 (1.6561) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:32:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][480/625] eta 0:00:37 lr 0.001727 wd 0.0500 time 0.2660 (0.2580) data time 0.0006 (0.0017) model time 0.2655 (0.2564) loss 6.5920 (5.9822) grad_norm 1.4472 (1.6552) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:32:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][490/625] eta 0:00:34 lr 0.001727 wd 0.0500 time 0.2548 (0.2580) data time 0.0009 (0.0017) model time 0.2539 (0.2564) loss 6.7852 (5.9842) grad_norm 1.4487 (1.6535) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:32:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][500/625] eta 0:00:32 lr 0.001727 wd 0.0500 time 0.2565 (0.2579) data time 0.0006 (0.0017) model time 0.2559 (0.2564) loss 6.4928 (5.9877) grad_norm 1.0192 (1.6482) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:33:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][510/625] eta 0:00:29 lr 0.001727 wd 0.0500 time 0.2525 (0.2579) data time 0.0009 
(0.0016) model time 0.2516 (0.2564) loss 5.0293 (5.9803) grad_norm 1.3835 (1.6475) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:33:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][520/625] eta 0:00:27 lr 0.001727 wd 0.0500 time 0.2565 (0.2579) data time 0.0007 (0.0016) model time 0.2558 (0.2563) loss 6.9252 (5.9810) grad_norm 0.9778 (1.6487) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:33:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][530/625] eta 0:00:24 lr 0.001727 wd 0.0500 time 0.2568 (0.2578) data time 0.0010 (0.0016) model time 0.2558 (0.2563) loss 6.1976 (5.9731) grad_norm 1.9663 (1.6456) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:33:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][540/625] eta 0:00:21 lr 0.001727 wd 0.0500 time 0.2572 (0.2578) data time 0.0006 (0.0016) model time 0.2566 (0.2563) loss 6.4752 (5.9663) grad_norm 1.7029 (1.6465) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:33:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][550/625] eta 0:00:19 lr 0.001726 wd 0.0500 time 0.2578 (0.2578) data time 0.0009 (0.0016) model time 0.2569 (0.2563) loss 6.3295 (5.9638) grad_norm 2.8747 (1.6480) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:33:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][560/625] eta 0:00:16 lr 0.001726 wd 0.0500 time 0.2611 (0.2578) data time 0.0008 (0.0016) model time 0.2604 (0.2563) loss 5.9445 (5.9601) grad_norm 1.5183 (1.6438) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:33:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][570/625] eta 0:00:14 lr 0.001726 wd 0.0500 time 0.2615 (0.2577) data time 0.0009 (0.0016) model time 0.2607 (0.2563) loss 4.9790 (5.9597) grad_norm 1.7081 (1.6558) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:33:19 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][580/625] eta 0:00:11 lr 0.001726 wd 0.0500 time 0.2541 (0.2577) data time 0.0007 (0.0015) model time 0.2534 (0.2563) loss 4.7174 (5.9636) grad_norm 1.6913 (1.6668) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:33:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][590/625] eta 0:00:09 lr 0.001726 wd 0.0500 time 0.2522 (0.2577) data time 0.0008 (0.0015) model time 0.2514 (0.2563) loss 6.0997 (5.9640) grad_norm 1.3425 (1.6682) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:33:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][600/625] eta 0:00:06 lr 0.001726 wd 0.0500 time 0.2589 (0.2577) data time 0.0007 (0.0015) model time 0.2582 (0.2563) loss 7.7809 (5.9639) grad_norm 1.5693 (1.6658) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:33:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][610/625] eta 0:00:03 lr 0.001726 wd 0.0500 time 0.2551 (0.2577) data time 0.0006 (0.0015) model time 0.2545 (0.2563) loss 5.4612 (5.9650) grad_norm 1.4026 (1.6667) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:33:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][620/625] eta 0:00:01 lr 0.001726 wd 0.0500 time 0.2531 (0.2577) data time 0.0003 (0.0015) model time 0.2528 (0.2562) loss 4.9359 (5.9617) grad_norm 1.5290 (1.6663) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:33:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 87 training takes 0:02:41 [2024-07-30 07:33:30 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 07:33:30 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-30 07:33:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.772 (0.772) Loss 0.6699 (0.6699) Acc@1 86.572 (86.572) Acc@5 97.852 (97.852) Mem 9655MB [2024-07-30 07:33:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.126) Loss 1.1074 (0.8382) Acc@1 75.684 (82.244) Acc@5 93.262 (96.396) Mem 9655MB [2024-07-30 07:33:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.2773 (0.9939) Acc@1 71.143 (78.406) Acc@5 91.016 (94.415) Mem 9655MB [2024-07-30 07:33:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.131 Acc@5 94.396 [2024-07-30 07:33:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.1% [2024-07-30 07:33:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.13% [2024-07-30 07:33:32 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-30 07:33:33 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-30 07:33:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.770 (0.770) Loss 0.5591 (0.5591) Acc@1 87.549 (87.549) Acc@5 98.193 (98.193) Mem 9655MB [2024-07-30 07:33:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.123) Loss 0.9536 (0.7170) Acc@1 77.148 (83.527) Acc@5 94.141 (96.848) Mem 9655MB [2024-07-30 07:33:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.090) Loss 1.1084 (0.8670) Acc@1 72.461 (79.701) Acc@5 92.236 (95.054) Mem 9655MB [2024-07-30 07:33:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.429 Acc@5 95.034 [2024-07-30 07:33:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.4% [2024-07-30 07:33:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.43% [2024-07-30 07:33:35 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-30 07:33:36 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-30 07:33:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][0/625] eta 0:09:04 lr 0.001726 wd 0.0500 time 0.8715 (0.8715) data time 0.5467 (0.5467) model time 0.0000 (0.0000) loss 5.2120 (5.2120) grad_norm 1.6780 (1.6780) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:33:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][10/625] eta 0:03:11 lr 0.001725 wd 0.0500 time 0.2534 (0.3120) data time 0.0009 (0.0504) model time 0.0000 (0.0000) loss 6.6947 (6.2197) grad_norm 1.4859 (1.4675) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:33:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][20/625] eta 0:02:52 lr 0.001725 wd 0.0500 time 0.2591 (0.2851) data time 0.0008 (0.0268) model time 0.0000 (0.0000) loss 7.2423 (6.1975) grad_norm 1.1458 (1.4348) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:33:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][30/625] eta 0:02:43 lr 0.001725 wd 0.0500 time 0.2520 (0.2754) data time 0.0008 (0.0184) model time 0.0000 (0.0000) loss 6.3740 (6.1266) grad_norm 1.2181 (1.4539) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:33:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][40/625] eta 0:02:38 lr 0.001725 wd 0.0500 time 0.2554 (0.2706) data time 0.0009 (0.0141) model time 0.0000 (0.0000) loss 5.5136 (6.0593) grad_norm 1.2778 (1.5474) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:33:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][50/625] eta 0:02:33 lr 0.001725 wd 0.0500 time 0.2541 (0.2677) data time 0.0009 (0.0115) model time 0.0000 (0.0000) loss 5.8773 (6.0252) grad_norm 2.5567 (1.6312) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:33:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][60/625] eta 0:02:30 lr 0.001725 wd 0.0500 time 0.2561 (0.2657) data time 
0.0007 (0.0098) model time 0.2554 (0.2550) loss 5.0758 (5.9876) grad_norm 1.6932 (1.6441) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:33:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][70/625] eta 0:02:26 lr 0.001725 wd 0.0500 time 0.2523 (0.2644) data time 0.0007 (0.0085) model time 0.2515 (0.2550) loss 6.8997 (5.9843) grad_norm 2.1783 (1.6241) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:33:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][80/625] eta 0:02:23 lr 0.001725 wd 0.0500 time 0.2563 (0.2634) data time 0.0008 (0.0076) model time 0.2554 (0.2554) loss 4.6206 (5.9404) grad_norm 2.5156 (1.6637) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:34:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][90/625] eta 0:02:20 lr 0.001724 wd 0.0500 time 0.2530 (0.2627) data time 0.0010 (0.0068) model time 0.2520 (0.2554) loss 6.5145 (5.9791) grad_norm 1.5170 (1.6638) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:34:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][100/625] eta 0:02:17 lr 0.001724 wd 0.0500 time 0.2589 (0.2622) data time 0.0009 (0.0062) model time 0.2581 (0.2558) loss 6.6213 (6.0026) grad_norm 1.5526 (1.6424) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:34:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][110/625] eta 0:02:14 lr 0.001724 wd 0.0500 time 0.2539 (0.2616) data time 0.0007 (0.0058) model time 0.2531 (0.2556) loss 5.4265 (5.9715) grad_norm 1.2871 (1.6120) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:34:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][120/625] eta 0:02:11 lr 0.001724 wd 0.0500 time 0.2531 (0.2613) data time 0.0007 (0.0054) model time 0.2524 (0.2557) loss 6.1267 (5.9856) grad_norm 2.1983 (1.6301) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:34:10 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][130/625] eta 0:02:09 lr 0.001724 wd 0.0500 time 0.2530 (0.2609) data time 0.0007 (0.0050) model time 0.2523 (0.2557) loss 6.4258 (5.9895) grad_norm 2.1273 (1.6260) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:34:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][140/625] eta 0:02:06 lr 0.001724 wd 0.0500 time 0.2557 (0.2606) data time 0.0009 (0.0047) model time 0.2548 (0.2557) loss 6.6135 (6.0354) grad_norm 1.9217 (1.6237) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:34:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][150/625] eta 0:02:03 lr 0.001724 wd 0.0500 time 0.2582 (0.2604) data time 0.0006 (0.0045) model time 0.2576 (0.2558) loss 6.0634 (6.0106) grad_norm 1.6103 (1.6121) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:34:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][160/625] eta 0:02:01 lr 0.001724 wd 0.0500 time 0.2539 (0.2603) data time 0.0009 (0.0042) model time 0.2530 (0.2561) loss 6.4441 (6.0140) grad_norm 1.5477 (1.6230) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:34:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][170/625] eta 0:01:58 lr 0.001723 wd 0.0500 time 0.2585 (0.2601) data time 0.0008 (0.0040) model time 0.2577 (0.2560) loss 6.7447 (6.0106) grad_norm 2.3393 (1.6300) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:34:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][180/625] eta 0:01:55 lr 0.001723 wd 0.0500 time 0.2565 (0.2599) data time 0.0008 (0.0039) model time 0.2556 (0.2559) loss 5.6668 (6.0138) grad_norm 2.2073 (1.6222) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:34:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][190/625] eta 0:01:52 lr 0.001723 wd 0.0500 time 0.2571 (0.2597) data time 0.0007 
(0.0037) model time 0.2565 (0.2560) loss 6.8818 (5.9985) grad_norm 1.4411 (1.6180) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:34:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][200/625] eta 0:01:50 lr 0.001723 wd 0.0500 time 0.2604 (0.2596) data time 0.0009 (0.0036) model time 0.2595 (0.2559) loss 6.5854 (5.9932) grad_norm 1.3583 (1.6191) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:34:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][210/625] eta 0:01:47 lr 0.001723 wd 0.0500 time 0.2569 (0.2594) data time 0.0010 (0.0034) model time 0.2560 (0.2559) loss 6.3896 (6.0007) grad_norm 1.6493 (1.6472) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:34:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][220/625] eta 0:01:45 lr 0.001723 wd 0.0500 time 0.2558 (0.2593) data time 0.0007 (0.0033) model time 0.2551 (0.2559) loss 6.8510 (6.0210) grad_norm 1.2769 (1.6405) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:34:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][230/625] eta 0:01:42 lr 0.001723 wd 0.0500 time 0.2603 (0.2591) data time 0.0008 (0.0032) model time 0.2595 (0.2558) loss 6.1633 (6.0117) grad_norm 1.5038 (1.6414) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:34:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][240/625] eta 0:01:39 lr 0.001723 wd 0.0500 time 0.2556 (0.2591) data time 0.0008 (0.0031) model time 0.2548 (0.2559) loss 5.8962 (6.0252) grad_norm 2.3061 (1.6477) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:34:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][250/625] eta 0:01:37 lr 0.001722 wd 0.0500 time 0.2552 (0.2590) data time 0.0009 (0.0030) model time 0.2543 (0.2558) loss 6.5810 (6.0439) grad_norm 1.9313 (1.6651) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:34:43 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][260/625] eta 0:01:34 lr 0.001722 wd 0.0500 time 0.2535 (0.2588) data time 0.0009 (0.0030) model time 0.2526 (0.2558) loss 5.5021 (6.0473) grad_norm 1.1552 (1.6525) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:34:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][270/625] eta 0:01:31 lr 0.001722 wd 0.0500 time 0.2529 (0.2587) data time 0.0008 (0.0029) model time 0.2522 (0.2557) loss 5.4433 (6.0413) grad_norm 1.3790 (1.6448) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:34:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][280/625] eta 0:01:29 lr 0.001722 wd 0.0500 time 0.2561 (0.2586) data time 0.0007 (0.0028) model time 0.2553 (0.2557) loss 5.1948 (6.0277) grad_norm 1.4494 (1.6379) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:34:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][290/625] eta 0:01:26 lr 0.001722 wd 0.0500 time 0.2575 (0.2586) data time 0.0007 (0.0028) model time 0.2568 (0.2557) loss 5.7317 (6.0292) grad_norm 1.5447 (1.6303) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:34:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][300/625] eta 0:01:24 lr 0.001722 wd 0.0500 time 0.2545 (0.2585) data time 0.0007 (0.0027) model time 0.2538 (0.2557) loss 6.3709 (6.0259) grad_norm 1.1064 (1.6266) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:34:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][310/625] eta 0:01:21 lr 0.001722 wd 0.0500 time 0.2551 (0.2584) data time 0.0008 (0.0026) model time 0.2543 (0.2557) loss 6.4422 (6.0307) grad_norm 1.3075 (1.6179) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:34:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][320/625] eta 0:01:18 lr 0.001722 wd 0.0500 time 0.2564 (0.2584) data time 0.0009 
(0.0026) model time 0.2555 (0.2557) loss 5.9129 (6.0387) grad_norm 1.3095 (1.6116) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:35:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][330/625] eta 0:01:16 lr 0.001722 wd 0.0500 time 0.2521 (0.2583) data time 0.0007 (0.0025) model time 0.2514 (0.2557) loss 6.9673 (6.0494) grad_norm 1.5584 (1.6188) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:35:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][340/625] eta 0:01:13 lr 0.001721 wd 0.0500 time 0.2620 (0.2582) data time 0.0008 (0.0025) model time 0.2612 (0.2557) loss 6.7294 (6.0433) grad_norm 1.1975 (1.6298) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:35:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][350/625] eta 0:01:10 lr 0.001721 wd 0.0500 time 0.2542 (0.2582) data time 0.0009 (0.0024) model time 0.2534 (0.2557) loss 6.1759 (6.0469) grad_norm 1.8907 (1.6365) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:35:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][360/625] eta 0:01:08 lr 0.001721 wd 0.0500 time 0.2593 (0.2581) data time 0.0006 (0.0024) model time 0.2587 (0.2556) loss 5.6311 (6.0542) grad_norm 1.7419 (1.6498) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:35:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][370/625] eta 0:01:05 lr 0.001721 wd 0.0500 time 0.2620 (0.2581) data time 0.0008 (0.0024) model time 0.2612 (0.2556) loss 5.0253 (6.0553) grad_norm 1.8820 (1.6626) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:35:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][380/625] eta 0:01:03 lr 0.001721 wd 0.0500 time 0.2576 (0.2580) data time 0.0006 (0.0023) model time 0.2571 (0.2556) loss 7.3390 (6.0623) grad_norm 1.9794 (1.6731) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:35:17 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][390/625] eta 0:01:00 lr 0.001721 wd 0.0500 time 0.2524 (0.2579) data time 0.0012 (0.0023) model time 0.2512 (0.2555) loss 6.7912 (6.0682) grad_norm 1.6248 (1.6667) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:35:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][400/625] eta 0:00:58 lr 0.001721 wd 0.0500 time 0.2555 (0.2583) data time 0.0009 (0.0023) model time 0.2547 (0.2560) loss 6.9192 (6.0721) grad_norm 1.3449 (1.6670) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:35:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][410/625] eta 0:00:55 lr 0.001721 wd 0.0500 time 0.2529 (0.2582) data time 0.0008 (0.0022) model time 0.2521 (0.2560) loss 7.3745 (6.0737) grad_norm 1.9222 (1.6640) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:35:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][420/625] eta 0:00:52 lr 0.001720 wd 0.0500 time 0.2529 (0.2582) data time 0.0007 (0.0022) model time 0.2522 (0.2560) loss 5.3071 (6.0722) grad_norm 2.2145 (1.6808) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:35:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][430/625] eta 0:00:50 lr 0.001720 wd 0.0500 time 0.2547 (0.2581) data time 0.0008 (0.0022) model time 0.2540 (0.2559) loss 6.5007 (6.0669) grad_norm 1.5332 (1.6812) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:35:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][440/625] eta 0:00:47 lr 0.001720 wd 0.0500 time 0.2523 (0.2581) data time 0.0008 (0.0021) model time 0.2515 (0.2559) loss 4.3619 (6.0631) grad_norm 2.5167 (1.6925) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:35:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][450/625] eta 0:00:45 lr 0.001720 wd 0.0500 time 0.2523 (0.2581) data time 0.0009 
(0.0021) model time 0.2513 (0.2559) loss 6.1465 (6.0673) grad_norm 2.1720 (1.6934) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:35:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][460/625] eta 0:00:42 lr 0.001720 wd 0.0500 time 0.2562 (0.2580) data time 0.0007 (0.0021) model time 0.2556 (0.2559) loss 6.5732 (6.0731) grad_norm 1.2964 (1.6885) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:35:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][470/625] eta 0:00:39 lr 0.001720 wd 0.0500 time 0.2583 (0.2580) data time 0.0007 (0.0021) model time 0.2576 (0.2559) loss 5.8950 (6.0684) grad_norm 1.2912 (1.6821) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:35:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][480/625] eta 0:00:37 lr 0.001720 wd 0.0500 time 0.2561 (0.2580) data time 0.0011 (0.0020) model time 0.2550 (0.2559) loss 6.3865 (6.0632) grad_norm 1.9861 (1.6802) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:35:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][490/625] eta 0:00:34 lr 0.001720 wd 0.0500 time 0.2547 (0.2579) data time 0.0010 (0.0020) model time 0.2538 (0.2559) loss 6.8794 (6.0646) grad_norm 1.3684 (1.6729) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:35:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][500/625] eta 0:00:32 lr 0.001719 wd 0.0500 time 0.2570 (0.2579) data time 0.0008 (0.0020) model time 0.2561 (0.2559) loss 6.1579 (6.0658) grad_norm 2.8771 (1.6810) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:35:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][510/625] eta 0:00:29 lr 0.001719 wd 0.0500 time 0.2567 (0.2579) data time 0.0007 (0.0020) model time 0.2560 (0.2559) loss 7.0951 (6.0606) grad_norm 1.6647 (1.6784) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:35:50 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][520/625] eta 0:00:27 lr 0.001719 wd 0.0500 time 0.2574 (0.2579) data time 0.0009 (0.0020) model time 0.2565 (0.2559) loss 5.3462 (6.0589) grad_norm 1.4458 (1.6749) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:35:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][530/625] eta 0:00:24 lr 0.001719 wd 0.0500 time 0.2559 (0.2578) data time 0.0008 (0.0019) model time 0.2551 (0.2558) loss 6.1780 (6.0544) grad_norm 1.8621 (1.6692) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:35:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][540/625] eta 0:00:21 lr 0.001719 wd 0.0500 time 0.2574 (0.2578) data time 0.0007 (0.0019) model time 0.2567 (0.2558) loss 5.6313 (6.0494) grad_norm 1.9089 (1.6661) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:35:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][550/625] eta 0:00:19 lr 0.001719 wd 0.0500 time 0.2543 (0.2581) data time 0.0015 (0.0019) model time 0.2528 (0.2562) loss 6.9724 (6.0391) grad_norm 1.2176 (1.6674) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:36:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][560/625] eta 0:00:16 lr 0.001719 wd 0.0500 time 0.2531 (0.2581) data time 0.0008 (0.0019) model time 0.2523 (0.2562) loss 6.1759 (6.0389) grad_norm 1.2634 (1.6671) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:36:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][570/625] eta 0:00:14 lr 0.001719 wd 0.0500 time 0.2509 (0.2581) data time 0.0010 (0.0019) model time 0.2499 (0.2562) loss 5.8470 (6.0356) grad_norm 1.3120 (1.6619) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:36:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][580/625] eta 0:00:11 lr 0.001718 wd 0.0500 time 0.2626 (0.2581) data time 0.0006 
(0.0018) model time 0.2620 (0.2562) loss 6.7063 (6.0373) grad_norm 2.6083 (1.6575) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:36:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][590/625] eta 0:00:09 lr 0.001718 wd 0.0500 time 0.2528 (0.2580) data time 0.0006 (0.0018) model time 0.2522 (0.2562) loss 5.0573 (6.0338) grad_norm 2.3224 (1.6579) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:36:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][600/625] eta 0:00:06 lr 0.001718 wd 0.0500 time 0.2585 (0.2580) data time 0.0009 (0.0018) model time 0.2576 (0.2562) loss 6.1735 (6.0303) grad_norm 1.2303 (1.6563) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:36:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][610/625] eta 0:00:03 lr 0.001718 wd 0.0500 time 0.2542 (0.2580) data time 0.0004 (0.0018) model time 0.2537 (0.2562) loss 6.1474 (6.0265) grad_norm 2.5201 (1.6634) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:36:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][620/625] eta 0:00:01 lr 0.001718 wd 0.0500 time 0.2543 (0.2580) data time 0.0004 (0.0018) model time 0.2539 (0.2562) loss 6.3995 (6.0363) grad_norm 1.5487 (1.6743) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:36:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 88 training takes 0:02:41 [2024-07-30 07:36:17 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 07:36:18 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-30 07:36:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.575 (0.575) Loss 0.6797 (0.6797) Acc@1 85.938 (85.938) Acc@5 97.656 (97.656) Mem 9655MB [2024-07-30 07:36:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.070 (0.113) Loss 1.0957 (0.8348) Acc@1 75.781 (82.258) Acc@5 93.359 (96.400) Mem 9655MB [2024-07-30 07:36:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.085) Loss 1.2441 (0.9958) Acc@1 71.826 (78.269) Acc@5 91.895 (94.464) Mem 9655MB [2024-07-30 07:36:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.045 Acc@5 94.438 [2024-07-30 07:36:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.0% [2024-07-30 07:36:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.909 (0.909) Loss 0.5591 (0.5591) Acc@1 87.598 (87.598) Acc@5 98.242 (98.242) Mem 9655MB [2024-07-30 07:36:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.156) Loss 0.9521 (0.7166) Acc@1 77.393 (83.616) Acc@5 94.189 (96.875) Mem 9655MB [2024-07-30 07:36:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.108) Loss 1.1074 (0.8662) Acc@1 72.656 (79.804) Acc@5 92.334 (95.082) Mem 9655MB [2024-07-30 07:36:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.499 Acc@5 95.056 [2024-07-30 07:36:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.5% [2024-07-30 07:36:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.50% [2024-07-30 07:36:22 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-07-30 07:36:22 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-07-30 07:36:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][0/625] eta 0:08:59 lr 0.001718 wd 0.0500 time 0.8632 (0.8632) data time 0.5465 (0.5465) model time 0.0000 (0.0000) loss 5.3286 (5.3286) grad_norm 3.3622 (3.3622) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:36:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][10/625] eta 0:03:11 lr 0.001718 wd 0.0500 time 0.2564 (0.3113) data time 0.0008 (0.0505) model time 0.0000 (0.0000) loss 6.4475 (5.6696) grad_norm 1.9599 (1.8256) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:36:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][20/625] eta 0:02:52 lr 0.001718 wd 0.0500 time 0.2547 (0.2848) data time 0.0010 (0.0268) model time 0.0000 (0.0000) loss 6.9266 (5.8192) grad_norm 1.3657 (1.8454) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:36:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][30/625] eta 0:02:43 lr 0.001717 wd 0.0500 time 0.2518 (0.2754) data time 0.0010 (0.0185) model time 0.0000 (0.0000) loss 6.2790 (5.8622) grad_norm 1.4071 (1.8425) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:36:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][40/625] eta 0:02:38 lr 0.001717 wd 0.0500 time 0.2540 (0.2710) data time 0.0009 (0.0142) model time 0.0000 (0.0000) loss 5.1692 (5.9413) grad_norm 1.1935 (1.7272) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:36:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][50/625] eta 0:02:34 lr 0.001717 wd 0.0500 time 0.2536 (0.2678) data time 0.0008 (0.0116) model time 0.0000 (0.0000) loss 6.3393 (5.9075) grad_norm 1.7735 (1.7410) loss_scale 2048.0000 (2048.0000) mem 9655MB 
[2024-07-30 07:36:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][60/625] eta 0:02:30 lr 0.001717 wd 0.0500 time 0.2565 (0.2658) data time 0.0007 (0.0098) model time 0.2558 (0.2544) loss 6.8455 (5.9276) grad_norm 1.1348 (1.7065) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:36:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][70/625] eta 0:02:26 lr 0.001717 wd 0.0500 time 0.2596 (0.2645) data time 0.0008 (0.0086) model time 0.2588 (0.2552) loss 7.2741 (5.9235) grad_norm 1.6547 (1.6856) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:36:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][80/625] eta 0:02:23 lr 0.001717 wd 0.0500 time 0.2590 (0.2639) data time 0.0008 (0.0076) model time 0.2582 (0.2563) loss 6.0483 (5.9544) grad_norm 1.0296 (1.6786) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:36:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][90/625] eta 0:02:20 lr 0.001717 wd 0.0500 time 0.2604 (0.2632) data time 0.0006 (0.0069) model time 0.2598 (0.2564) loss 5.2037 (5.9196) grad_norm 1.4809 (1.6661) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:36:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][100/625] eta 0:02:17 lr 0.001717 wd 0.0500 time 0.2555 (0.2625) data time 0.0007 (0.0063) model time 0.2548 (0.2562) loss 5.4360 (5.9627) grad_norm 1.2986 (1.6569) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:36:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][110/625] eta 0:02:14 lr 0.001716 wd 0.0500 time 0.2588 (0.2620) data time 0.0009 (0.0058) model time 0.2580 (0.2561) loss 6.8625 (5.9812) grad_norm 1.3476 (1.6461) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:36:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][120/625] eta 0:02:12 lr 0.001716 wd 0.0500 time 0.2659 (0.2616) data 
time 0.0006 (0.0054) model time 0.2653 (0.2562) loss 5.2048 (5.9819) grad_norm 1.0677 (1.6410) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 07:36:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-30 07:36:55 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 07:36:55 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-30 07:38:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-30 07:39:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-30 07:39:11 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-30 07:39:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-30 07:39:22 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... 
[2024-07-30 07:39:22 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-30 07:39:22 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-30 07:39:22 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 89) [2024-07-30 07:39:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-30 07:39:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-30 07:39:36 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 07:39:37 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-30 08:04:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-30 08:04:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-30 08:07:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-30 08:07:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-30 08:07:44 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-07-30 08:08:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-30 08:08:03 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-30 08:08:03 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-30 08:08:03 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-30 08:08:03 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 89) [2024-07-30 08:08:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-30 08:08:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][130/625] eta 0:10:59 lr 0.001716 wd 0.0500 time 0.2648 (1.3324) data time 0.0008 (0.1121) model time 0.2640 (1.2203) loss 6.4805 (6.5966) grad_norm 1.9391 (1.9285) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 08:08:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][140/625] eta 0:05:58 lr 0.001716 wd 0.0500 time 0.2691 (0.7386) data time 0.0007 (0.0504) model time 0.2685 (0.6882) loss 7.1543 (6.4398) grad_norm 1.8847 (2.1832) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 08:08:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][150/625] eta 0:04:30 lr 0.001716 wd 0.0500 time 0.2661 (0.5699) data time 0.0009 (0.0328) model time 0.2652 (0.5371) loss 6.8272 (6.3821) grad_norm 1.5692 (2.1048) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 08:08:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][160/625] eta 0:03:47 lr 0.001716 wd 0.0500 time 0.2645 (0.4898) data time 0.0009 (0.0245) model time 0.2636 (0.4653) loss 6.1147 (6.2951) 
grad_norm 2.5510 (1.9751) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 08:08:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][170/625] eta 0:03:21 lr 0.001716 wd 0.0500 time 0.2658 (0.4431) data time 0.0009 (0.0196) model time 0.2649 (0.4235) loss 6.0831 (6.2423) grad_norm 1.5789 (1.8987) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 08:08:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][180/625] eta 0:03:03 lr 0.001716 wd 0.0500 time 0.2665 (0.4127) data time 0.0007 (0.0164) model time 0.2658 (0.3963) loss 5.0192 (6.1805) grad_norm 1.3441 (1.9097) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 08:08:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][190/625] eta 0:02:50 lr 0.001715 wd 0.0500 time 0.2656 (0.3910) data time 0.0008 (0.0141) model time 0.2648 (0.3768) loss 4.9620 (6.1347) grad_norm 1.3675 (1.8472) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 08:08:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][200/625] eta 0:02:39 lr 0.001715 wd 0.0500 time 0.2657 (0.3749) data time 0.0007 (0.0125) model time 0.2650 (0.3624) loss 5.1539 (6.1219) grad_norm 1.3195 (1.7861) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 08:08:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][210/625] eta 0:02:30 lr 0.001715 wd 0.0500 time 0.2640 (0.3624) data time 0.0012 (0.0112) model time 0.2628 (0.3512) loss 6.7640 (6.1047) grad_norm 1.2687 (1.7414) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 08:08:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][220/625] eta 0:02:22 lr 0.001715 wd 0.0500 time 0.2642 (0.3523) data time 0.0008 (0.0101) model time 0.2634 (0.3422) loss 7.2726 (6.1261) grad_norm 1.7515 (1.7121) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 08:08:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[89/300][230/625] eta 0:02:15 lr 0.001715 wd 0.0500 time 0.2607 (0.3442) data time 0.0007 (0.0093) model time 0.2599 (0.3349) loss 5.0564 (6.1426) grad_norm 1.1851 (1.6957) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 08:08:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][240/625] eta 0:02:09 lr 0.001715 wd 0.0500 time 0.2617 (0.3375) data time 0.0011 (0.0086) model time 0.2606 (0.3289) loss 5.4342 (6.1435) grad_norm 1.4348 (1.7015) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 08:08:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][250/625] eta 0:02:04 lr 0.001715 wd 0.0500 time 0.2684 (0.3320) data time 0.0008 (0.0080) model time 0.2676 (0.3240) loss 6.1233 (6.1165) grad_norm 1.9041 (1.6905) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 08:08:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][260/625] eta 0:01:59 lr 0.001715 wd 0.0500 time 0.2625 (0.3273) data time 0.0010 (0.0075) model time 0.2616 (0.3197) loss 6.0131 (6.1197) grad_norm 2.2322 (1.7035) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 08:08:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][270/625] eta 0:01:54 lr 0.001714 wd 0.0500 time 0.2636 (0.3234) data time 0.0012 (0.0071) model time 0.2624 (0.3163) loss 6.4433 (6.1208) grad_norm 0.9829 (1.6892) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 08:08:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][280/625] eta 0:01:50 lr 0.001714 wd 0.0500 time 0.2683 (0.3199) data time 0.0007 (0.0067) model time 0.2676 (0.3132) loss 5.0582 (6.1147) grad_norm 2.1341 (1.6974) loss_scale 4096.0000 (2073.9241) mem 9656MB [2024-07-30 08:09:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][290/625] eta 0:01:46 lr 0.001714 wd 0.0500 time 0.2635 (0.3166) data time 0.0010 (0.0064) model time 0.2624 (0.3103) loss 6.8749 (6.1446) grad_norm 1.7176 
(1.7236) loss_scale 4096.0000 (2194.2857) mem 9656MB [2024-07-30 08:09:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][300/625] eta 0:01:41 lr 0.001714 wd 0.0500 time 0.2651 (0.3138) data time 0.0007 (0.0061) model time 0.2644 (0.3077) loss 4.5668 (6.1117) grad_norm 1.2014 (1.7179) loss_scale 4096.0000 (2301.1236) mem 9656MB [2024-07-30 08:09:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][310/625] eta 0:01:38 lr 0.001714 wd 0.0500 time 0.2666 (0.3112) data time 0.0008 (0.0058) model time 0.2658 (0.3054) loss 6.3378 (6.0943) grad_norm 1.6028 (1.7039) loss_scale 4096.0000 (2396.5957) mem 9656MB [2024-07-30 08:09:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][320/625] eta 0:01:34 lr 0.001714 wd 0.0500 time 0.2750 (0.3090) data time 0.0009 (0.0056) model time 0.2740 (0.3034) loss 3.9221 (6.0686) grad_norm 3.2296 (1.7159) loss_scale 4096.0000 (2482.4242) mem 9656MB [2024-07-30 08:09:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][330/625] eta 0:01:30 lr 0.001714 wd 0.0500 time 0.2673 (0.3071) data time 0.0009 (0.0054) model time 0.2664 (0.3017) loss 6.8794 (6.0489) grad_norm 1.4831 (1.7136) loss_scale 4096.0000 (2560.0000) mem 9656MB [2024-07-30 08:09:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][340/625] eta 0:01:26 lr 0.001714 wd 0.0500 time 0.2645 (0.3052) data time 0.0012 (0.0052) model time 0.2633 (0.3001) loss 5.6119 (6.0416) grad_norm 1.2629 (1.7122) loss_scale 4096.0000 (2630.4587) mem 9656MB [2024-07-30 08:09:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][350/625] eta 0:01:23 lr 0.001713 wd 0.0500 time 0.2744 (0.3037) data time 0.0011 (0.0050) model time 0.2733 (0.2987) loss 7.0112 (6.0546) grad_norm 1.7174 (1.7013) loss_scale 4096.0000 (2694.7368) mem 9656MB [2024-07-30 08:09:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][360/625] eta 
0:01:20 lr 0.001713 wd 0.0500 time 0.2698 (0.3021) data time 0.0007 (0.0048) model time 0.2691 (0.2973) loss 6.6387 (6.0435) grad_norm 1.8193 (1.6922) loss_scale 4096.0000 (2753.6134) mem 9656MB [2024-07-30 08:09:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][370/625] eta 0:01:16 lr 0.001713 wd 0.0500 time 0.2642 (0.3007) data time 0.0009 (0.0047) model time 0.2633 (0.2960) loss 5.2284 (6.0381) grad_norm 1.2266 (1.6834) loss_scale 4096.0000 (2807.7419) mem 9656MB [2024-07-30 08:09:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][380/625] eta 0:01:13 lr 0.001713 wd 0.0500 time 0.2681 (0.2995) data time 0.0009 (0.0045) model time 0.2672 (0.2949) loss 5.9806 (6.0266) grad_norm 1.3172 (1.6808) loss_scale 4096.0000 (2857.6744) mem 9656MB [2024-07-30 08:09:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][390/625] eta 0:01:10 lr 0.001713 wd 0.0500 time 0.2659 (0.2983) data time 0.0008 (0.0044) model time 0.2651 (0.2939) loss 6.6422 (6.0159) grad_norm 1.6140 (1.6835) loss_scale 4096.0000 (2903.8806) mem 9656MB [2024-07-30 08:09:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][400/625] eta 0:01:06 lr 0.001713 wd 0.0500 time 0.2661 (0.2972) data time 0.0010 (0.0043) model time 0.2652 (0.2929) loss 4.8149 (6.0207) grad_norm 1.8092 (1.6949) loss_scale 4096.0000 (2946.7626) mem 9656MB [2024-07-30 08:09:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][410/625] eta 0:01:03 lr 0.001713 wd 0.0500 time 0.2670 (0.2962) data time 0.0011 (0.0042) model time 0.2659 (0.2920) loss 7.1699 (6.0335) grad_norm 1.5153 (1.6938) loss_scale 4096.0000 (2986.6667) mem 9656MB [2024-07-30 08:09:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][420/625] eta 0:01:00 lr 0.001713 wd 0.0500 time 0.2662 (0.2952) data time 0.0011 (0.0041) model time 0.2651 (0.2911) loss 5.2756 (6.0095) grad_norm 1.5539 (1.6909) loss_scale 
4096.0000 (3023.8926) mem 9656MB [2024-07-30 08:09:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][430/625] eta 0:00:57 lr 0.001712 wd 0.0500 time 0.2688 (0.2943) data time 0.0007 (0.0040) model time 0.2681 (0.2903) loss 5.2475 (6.0039) grad_norm 1.3205 (1.6833) loss_scale 4096.0000 (3058.7013) mem 9656MB [2024-07-30 08:09:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][440/625] eta 0:00:54 lr 0.001712 wd 0.0500 time 0.2649 (0.2935) data time 0.0009 (0.0039) model time 0.2640 (0.2896) loss 5.4132 (6.0213) grad_norm 2.0170 (1.6785) loss_scale 4096.0000 (3091.3208) mem 9656MB [2024-07-30 08:09:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][450/625] eta 0:00:51 lr 0.001712 wd 0.0500 time 0.2720 (0.2927) data time 0.0010 (0.0038) model time 0.2711 (0.2889) loss 6.2751 (6.0327) grad_norm 4.1966 (1.6943) loss_scale 4096.0000 (3121.9512) mem 9656MB [2024-07-30 08:09:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][460/625] eta 0:00:48 lr 0.001712 wd 0.0500 time 0.2673 (0.2919) data time 0.0010 (0.0037) model time 0.2663 (0.2882) loss 6.1233 (6.0301) grad_norm 1.3735 (1.6964) loss_scale 4096.0000 (3150.7692) mem 9656MB [2024-07-30 08:09:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][470/625] eta 0:00:45 lr 0.001712 wd 0.0500 time 0.2681 (0.2912) data time 0.0007 (0.0036) model time 0.2674 (0.2876) loss 6.2058 (6.0360) grad_norm 1.1386 (1.6885) loss_scale 4096.0000 (3177.9310) mem 9656MB [2024-07-30 08:09:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][480/625] eta 0:00:42 lr 0.001712 wd 0.0500 time 0.2626 (0.2905) data time 0.0010 (0.0036) model time 0.2616 (0.2869) loss 6.7489 (6.0353) grad_norm 2.3280 (1.6879) loss_scale 4096.0000 (3203.5754) mem 9656MB [2024-07-30 08:09:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][490/625] eta 0:00:39 lr 0.001712 
wd 0.0500 time 0.2716 (0.2899) data time 0.0010 (0.0035) model time 0.2706 (0.2864) loss 5.3455 (6.0293) grad_norm 1.7605 (1.6856) loss_scale 4096.0000 (3227.8261) mem 9656MB [2024-07-30 08:09:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][500/625] eta 0:00:36 lr 0.001712 wd 0.0500 time 0.2649 (0.2894) data time 0.0010 (0.0034) model time 0.2639 (0.2859) loss 6.7753 (6.0311) grad_norm 2.2882 (1.6875) loss_scale 4096.0000 (3250.7937) mem 9656MB [2024-07-30 08:10:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][510/625] eta 0:00:33 lr 0.001711 wd 0.0500 time 0.2684 (0.2888) data time 0.0009 (0.0034) model time 0.2676 (0.2854) loss 5.0314 (6.0236) grad_norm 1.5540 (1.6961) loss_scale 4096.0000 (3272.5773) mem 9656MB [2024-07-30 08:10:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][520/625] eta 0:00:30 lr 0.001711 wd 0.0500 time 0.2642 (0.2882) data time 0.0008 (0.0033) model time 0.2634 (0.2849) loss 5.5489 (6.0235) grad_norm 1.5271 (1.6922) loss_scale 4096.0000 (3293.2663) mem 9656MB [2024-07-30 08:10:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][530/625] eta 0:00:27 lr 0.001711 wd 0.0500 time 0.2658 (0.2877) data time 0.0011 (0.0033) model time 0.2647 (0.2845) loss 6.9428 (6.0309) grad_norm 2.4425 (1.6885) loss_scale 4096.0000 (3312.9412) mem 9656MB [2024-07-30 08:10:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][540/625] eta 0:00:24 lr 0.001711 wd 0.0500 time 0.2687 (0.2873) data time 0.0007 (0.0032) model time 0.2680 (0.2841) loss 6.8669 (6.0326) grad_norm 2.8220 (1.7027) loss_scale 4096.0000 (3331.6746) mem 9656MB [2024-07-30 08:10:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-30 08:10:10 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth 
saving...... [2024-07-30 08:10:11 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-30 08:28:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-30 08:28:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-30 08:29:09 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-30 08:29:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-30 08:29:20 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-30 08:29:20 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-30 08:29:20 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-30 08:29:20 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 89) [2024-07-30 08:29:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-30 08:29:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-30 08:29:31 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 08:29:33 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-30 08:54:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-30 08:54:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-30 08:55:06 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-30 08:55:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-30 08:55:14 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-30 08:55:14 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-30 08:55:14 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-30 08:55:14 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 89) [2024-07-30 08:55:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-30 08:55:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][550/625] eta 0:08:50 lr 0.001711 wd 0.0500 time 7.0699 (7.0699) data time 0.9333 (0.9333) model time 6.1366 (6.1366) loss 6.9100 (6.9100) grad_norm 1.2853 (1.2853) loss_scale 4096.0000 (4096.0000) mem 10975MB [2024-07-30 08:55:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][560/625] eta 0:00:59 lr 0.001711 wd 0.0500 time 0.2538 (0.9172) data time 0.0008 (0.0856) model time 0.2530 (0.8315) loss 5.0628 (6.5210) grad_norm 1.3545 (1.3294) loss_scale 4096.0000 (4096.0000) mem 9652MB [2024-07-30 08:55:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[89/300][570/625] eta 0:00:33 lr 0.001711 wd 0.0500 time 0.2521 (0.6012) data time 0.0009 (0.0454) model time 0.2512 (0.5558) loss 5.9658 (6.3608) grad_norm 1.5706 (1.6762) loss_scale 4096.0000 (4096.0000) mem 9652MB [2024-07-30 08:55:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][580/625] eta 0:00:22 lr 0.001711 wd 0.0500 time 0.2548 (0.4892) data time 0.0006 (0.0310) model time 0.2542 (0.4582) loss 4.9733 (6.3717) grad_norm 1.4249 (1.7482) loss_scale 4096.0000 (4096.0000) mem 9652MB [2024-07-30 08:55:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][590/625] eta 0:00:15 lr 0.001710 wd 0.0500 time 0.2733 (0.4327) data time 0.0009 (0.0237) model time 0.2724 (0.4090) loss 5.8079 (6.3045) grad_norm 1.3144 (1.7394) loss_scale 4096.0000 (4096.0000) mem 9652MB [2024-07-30 08:55:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][600/625] eta 0:00:09 lr 0.001710 wd 0.0500 time 0.2525 (0.3975) data time 0.0008 (0.0192) model time 0.2517 (0.3783) loss 6.3779 (6.2967) grad_norm 1.2793 (1.7768) loss_scale 4096.0000 (4096.0000) mem 9652MB [2024-07-30 08:55:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][610/625] eta 0:00:05 lr 0.001710 wd 0.0500 time 0.2514 (0.3740) data time 0.0007 (0.0163) model time 0.2507 (0.3577) loss 5.8649 (6.2393) grad_norm 1.3047 (1.7672) loss_scale 4096.0000 (4096.0000) mem 9652MB [2024-07-30 08:55:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][620/625] eta 0:00:01 lr 0.001710 wd 0.0500 time 0.2517 (0.3568) data time 0.0006 (0.0141) model time 0.2511 (0.3427) loss 5.4490 (6.1971) grad_norm 1.6222 (1.7212) loss_scale 4096.0000 (4096.0000) mem 9652MB [2024-07-30 08:55:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 89 training takes 0:00:26 [2024-07-30 08:55:44 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO 
./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 08:55:46 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-30 08:55:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.406 (0.406) Loss 0.6274 (0.6274) Acc@1 87.012 (87.012) Acc@5 97.998 (97.998) Mem 9652MB [2024-07-30 08:55:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.088) Loss 1.1191 (0.8053) Acc@1 73.291 (82.071) Acc@5 92.920 (96.458) Mem 9652MB [2024-07-30 08:55:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.072) Loss 1.2236 (0.9604) Acc@1 71.533 (78.332) Acc@5 92.090 (94.582) Mem 9652MB [2024-07-30 08:55:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.155 Acc@5 94.536 [2024-07-30 08:55:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.2% [2024-07-30 08:55:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.15% [2024-07-30 08:55:49 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-30 08:55:49 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-30 08:55:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.543 (0.543) Loss 0.5601 (0.5601) Acc@1 87.646 (87.646) Acc@5 98.193 (98.193) Mem 9652MB [2024-07-30 08:55:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.100) Loss 0.9521 (0.7167) Acc@1 77.295 (83.691) Acc@5 94.287 (96.888) Mem 9652MB [2024-07-30 08:55:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.078) Loss 1.1064 (0.8657) Acc@1 72.656 (79.874) Acc@5 92.480 (95.117) Mem 9652MB [2024-07-30 08:55:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.557 Acc@5 95.092 [2024-07-30 08:55:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.6% [2024-07-30 08:55:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.56% [2024-07-30 08:55:51 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-30 08:55:52 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-30 08:55:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][0/625] eta 0:12:32 lr 0.001710 wd 0.0500 time 1.2048 (1.2048) data time 0.5682 (0.5682) model time 0.0000 (0.0000) loss 6.4494 (6.4494) grad_norm 0.9734 (0.9734) loss_scale 4096.0000 (4096.0000) mem 9651MB
[2024-07-30 08:55:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][10/625] eta 0:03:29 lr 0.001710 wd 0.0500 time 0.2527 (0.3411) data time 0.0007 (0.0525) model time 0.0000 (0.0000) loss 5.0066 (5.8287) grad_norm 1.4244 (1.7169) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:55:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][20/625] eta 0:03:01 lr 0.001710 wd 0.0500 time 0.2557 (0.2999) data time 0.0007 (0.0279) model time 0.0000 (0.0000) loss 6.2175 (5.9838) grad_norm 1.2812 (1.6118) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:56:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][30/625] eta 0:02:49 lr 0.001710 wd 0.0500 time 0.2562 (0.2855) data time 0.0008 (0.0192) model time 0.0000 (0.0000) loss 6.5761 (6.1190) grad_norm 1.8596 (1.6166) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:56:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][40/625] eta 0:02:42 lr 0.001710 wd 0.0500 time 0.2542 (0.2779) data time 0.0007 (0.0148) model time 0.0000 (0.0000) loss 7.0569 (6.0727) grad_norm 1.3457 (1.5440) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:56:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][50/625] eta 0:02:37 lr 0.001709 wd 0.0500 time 0.2582 (0.2733) data time 0.0006 (0.0121) model time 0.0000 (0.0000) loss 4.7367 (5.9810) grad_norm 1.6460 (1.5455) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:56:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][60/625] eta 0:02:32 lr 0.001709 wd 0.0500 time 0.2584 (0.2704) data time 0.0010 (0.0103) model time 0.2574 (0.2546) loss 5.2771 (6.0036) grad_norm 1.2308 (1.6003) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:56:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][70/625] eta 0:02:28 lr 0.001709 wd 0.0500 time 0.2552 (0.2683) data time 0.0007 (0.0089) model time 0.2545 (0.2546) loss 4.7040 (5.9710) grad_norm 1.0392 (1.6343) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:56:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][80/625] eta 0:02:25 lr 0.001709 wd 0.0500 time 0.2547 (0.2668) data time 0.0009 (0.0080) model time 0.2538 (0.2548) loss 6.5569 (6.0123) grad_norm 1.3731 (1.6247) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:56:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][90/625] eta 0:02:22 lr 0.001709 wd 0.0500 time 0.2566 (0.2657) data time 0.0009 (0.0072) model time 0.2556 (0.2550) loss 6.4802 (6.0369) grad_norm 1.8280 (1.6209) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:56:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][100/625] eta 0:02:19 lr 0.001709 wd 0.0500 time 0.2543 (0.2648) data time 0.0007 (0.0066) model time 0.2535 (0.2551) loss 5.1316 (6.0078) grad_norm 1.2810 (1.6083) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:56:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][110/625] eta 0:02:15 lr 0.001709 wd 0.0500 time 0.2495 (0.2639) data time 0.0010 (0.0061) model time 0.2486 (0.2549) loss 5.2557 (6.0056) grad_norm 1.8874 (1.6125) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:56:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][120/625] eta 0:02:13 lr 0.001709 wd 0.0500 time 0.2561 (0.2634) data time 0.0006 (0.0057) model time 0.2554 (0.2552) loss 4.8830 (5.9730) grad_norm 1.1252 (1.6285) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:56:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][130/625] eta 0:02:10 lr 0.001708 wd 0.0500 time 0.2589 (0.2628) data time 0.0007 (0.0053) model time 0.2582 (0.2551) loss 4.5756 (5.9397) grad_norm 1.6082 (1.6143) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:56:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][140/625] eta 0:02:07 lr 0.001708 wd 0.0500 time 0.2522 (0.2622) data time 0.0008 (0.0050) model time 0.2515 (0.2550) loss 4.4754 (5.9373) grad_norm 2.2310 (1.6169) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:56:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][150/625] eta 0:02:04 lr 0.001708 wd 0.0500 time 0.2525 (0.2618) data time 0.0007 (0.0047) model time 0.2518 (0.2549) loss 5.7688 (5.9519) grad_norm 1.0465 (1.6128) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:56:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][160/625] eta 0:02:01 lr 0.001708 wd 0.0500 time 0.2575 (0.2616) data time 0.0007 (0.0045) model time 0.2568 (0.2551) loss 6.8438 (5.9547) grad_norm 2.2063 (1.6315) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:56:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][170/625] eta 0:01:58 lr 0.001708 wd 0.0500 time 0.2552 (0.2612) data time 0.0008 (0.0043) model time 0.2545 (0.2551) loss 4.6439 (5.9486) grad_norm 2.8841 (1.6537) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:56:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][180/625] eta 0:01:56 lr 0.001708 wd 0.0500 time 0.2497 (0.2610) data time 0.0011 (0.0041) model time 0.2486 (0.2552) loss 4.8905 (5.9323) grad_norm 1.5563 (1.6670) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:56:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][190/625] eta 0:01:53 lr 0.001708 wd 0.0500 time 0.2556 (0.2607) data time 0.0008 (0.0039) model time 0.2549 (0.2551) loss 5.7810 (5.9240) grad_norm 1.4084 (1.6604) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:56:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][200/625] eta 0:01:50 lr 0.001708 wd 0.0500 time 0.2543 (0.2604) data time 0.0010 (0.0038) model time 0.2534 (0.2551) loss 6.4365 (5.9320) grad_norm 1.1524 (1.6584) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:56:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][210/625] eta 0:01:47 lr 0.001707 wd 0.0500 time 0.2545 (0.2602) data time 0.0008 (0.0037) model time 0.2537 (0.2550) loss 5.5698 (5.9352) grad_norm 1.5111 (1.6748) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:56:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][220/625] eta 0:01:45 lr 0.001707 wd 0.0500 time 0.2572 (0.2600) data time 0.0005 (0.0035) model time 0.2566 (0.2550) loss 5.1659 (5.9348) grad_norm 1.7459 (1.6784) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:56:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][230/625] eta 0:01:42 lr 0.001707 wd 0.0500 time 0.2534 (0.2598) data time 0.0007 (0.0034) model time 0.2527 (0.2549) loss 4.6923 (5.9274) grad_norm 2.0502 (1.6732) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:56:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][240/625] eta 0:01:39 lr 0.001707 wd 0.0500 time 0.2584 (0.2596) data time 0.0007 (0.0033) model time 0.2577 (0.2549) loss 6.3652 (5.9515) grad_norm 1.7280 (1.6726) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:56:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][250/625] eta 0:01:37 lr 0.001707 wd 0.0500 time 0.2594 (0.2595) data time 0.0008 (0.0032) model time 0.2587 (0.2549) loss 7.1622 (5.9812) grad_norm 1.2196 (1.6633) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:56:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][260/625] eta 0:01:34 lr 0.001707 wd 0.0500 time 0.2519 (0.2593) data time 0.0009 (0.0031) model time 0.2510 (0.2549) loss 5.5625 (5.9706) grad_norm 1.4380 (1.6647) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:57:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][270/625] eta 0:01:31 lr 0.001707 wd 0.0500 time 0.2544 (0.2591) data time 0.0009 (0.0031) model time 0.2534 (0.2549) loss 6.6653 (5.9784) grad_norm 2.2036 (1.6920) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:57:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][280/625] eta 0:01:29 lr 0.001707 wd 0.0500 time 0.2563 (0.2590) data time 0.0008 (0.0030) model time 0.2556 (0.2549) loss 4.9793 (5.9817) grad_norm 2.1258 (1.6906) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:57:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][290/625] eta 0:01:26 lr 0.001706 wd 0.0500 time 0.2514 (0.2589) data time 0.0009 (0.0029) model time 0.2505 (0.2549) loss 6.4156 (5.9809) grad_norm 1.4790 (1.6891) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:57:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][300/625] eta 0:01:24 lr 0.001706 wd 0.0500 time 0.2618 (0.2589) data time 0.0006 (0.0029) model time 0.2611 (0.2549) loss 4.4305 (5.9743) grad_norm 0.9861 (1.6770) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:57:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][310/625] eta 0:01:21 lr 0.001706 wd 0.0500 time 0.2515 (0.2588) data time 0.0008 (0.0028) model time 0.2507 (0.2549) loss 6.7177 (5.9626) grad_norm 1.6227 (1.6722) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:57:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][320/625] eta 0:01:18 lr 0.001706 wd 0.0500 time 0.2712 (0.2588) data time 0.0008 (0.0027) model time 0.2704 (0.2550) loss 6.5304 (5.9688) grad_norm 1.5422 (1.6726) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:57:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][330/625] eta 0:01:16 lr 0.001706 wd 0.0500 time 0.2568 (0.2594) data time 0.0006 (0.0027) model time 0.2562 (0.2559) loss 6.9316 (5.9774) grad_norm 1.7843 (1.6706) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:57:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][340/625] eta 0:01:13 lr 0.001706 wd 0.0500 time 0.2562 (0.2594) data time 0.0007 (0.0027) model time 0.2556 (0.2559) loss 4.9306 (5.9783) grad_norm 1.3429 (1.6605) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:57:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][350/625] eta 0:01:11 lr 0.001706 wd 0.0500 time 0.2530 (0.2594) data time 0.0009 (0.0026) model time 0.2521 (0.2560) loss 5.6514 (5.9803) grad_norm 1.1627 (1.6583) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:57:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][360/625] eta 0:01:08 lr 0.001705 wd 0.0500 time 0.2527 (0.2594) data time 0.0009 (0.0026) model time 0.2518 (0.2561) loss 5.5732 (5.9916) grad_norm 1.8317 (1.6550) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:57:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][370/625] eta 0:01:06 lr 0.001705 wd 0.0500 time 0.2585 (0.2600) data time 0.0008 (0.0026) model time 0.2577 (0.2569) loss 5.6421 (5.9934) grad_norm 1.8026 (1.6767) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:57:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][380/625] eta 0:01:03 lr 0.001705 wd 0.0500 time 0.2573 (0.2601) data time 0.0006 (0.0025) model time 0.2567 (0.2569) loss 5.1874 (5.9859) grad_norm 1.5443 (1.6846) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:57:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][390/625] eta 0:01:01 lr 0.001705 wd 0.0500 time 0.2597 (0.2600) data time 0.0008 (0.0025) model time 0.2590 (0.2568) loss 5.0441 (5.9758) grad_norm 1.8061 (1.6875) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:57:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][400/625] eta 0:00:58 lr 0.001705 wd 0.0500 time 0.2529 (0.2599) data time 0.0008 (0.0025) model time 0.2521 (0.2568) loss 5.6799 (5.9692) grad_norm 1.3354 (1.6893) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:57:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][410/625] eta 0:00:55 lr 0.001705 wd 0.0500 time 0.2562 (0.2598) data time 0.0009 (0.0025) model time 0.2553 (0.2568) loss 6.6726 (5.9776) grad_norm 1.2462 (1.6848) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:57:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][420/625] eta 0:00:53 lr 0.001705 wd 0.0500 time 0.2573 (0.2598) data time 0.0007 (0.0024) model time 0.2566 (0.2568) loss 5.1164 (5.9753) grad_norm 1.3005 (1.6793) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:57:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][430/625] eta 0:00:50 lr 0.001705 wd 0.0500 time 0.2561 (0.2597) data time 0.0008 (0.0024) model time 0.2553 (0.2568) loss 6.7454 (5.9747) grad_norm 1.8335 (1.6732) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:57:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][440/625] eta 0:00:48 lr 0.001704 wd 0.0500 time 0.2523 (0.2597) data time 0.0010 (0.0024) model time 0.2513 (0.2568) loss 5.6998 (5.9885) grad_norm 1.4871 (1.6758) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:57:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][450/625] eta 0:00:45 lr 0.001704 wd 0.0500 time 0.2583 (0.2596) data time 0.0007 (0.0023) model time 0.2576 (0.2568) loss 4.4240 (5.9792) grad_norm 1.3688 (1.6759) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:57:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][460/625] eta 0:00:42 lr 0.001704 wd 0.0500 time 0.2537 (0.2596) data time 0.0010 (0.0023) model time 0.2528 (0.2568) loss 6.4756 (5.9744) grad_norm 1.5234 (1.6751) loss_scale 4096.0000 (4096.0000) mem 9658MB
[2024-07-30 08:57:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-30 08:57:54 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 08:57:54 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 09:03:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 09:03:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 09:03:47 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 09:04:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 09:04:03 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 09:04:03 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 09:04:03 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 09:04:03 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 90)
[2024-07-30 09:04:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 09:04:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][470/625] eta 0:09:24 lr 0.001704 wd 0.0500 time 0.7468 (3.6416) data time 0.0008 (0.3904) model time 0.7460 (3.2513) loss 6.5305 (6.6151) grad_norm 1.2527 (1.1914) loss_scale 4096.0000 (4096.0000) mem 9654MB
[2024-07-30 09:04:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][480/625] eta 0:01:58 lr 0.001704 wd 0.0500 time 0.2532 (0.8174) data time 0.0006 (0.0658) model time 0.2527 (0.7516) loss 4.4666 (6.4189) grad_norm 2.0882 (1.4693) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 09:04:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][490/625] eta 0:01:15 lr 0.001704 wd 0.0500 time 0.2528 (0.5620) data time 0.0007 (0.0363) model time 0.2521 (0.5257) loss 6.1261 (6.3796) grad_norm 1.3069 (1.7076) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 09:04:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][500/625] eta 0:00:58 lr 0.001704 wd 0.0500 time 0.2535 (0.4660) data time 0.0007 (0.0252) model time 0.2529 (0.4407) loss 6.5957 (6.4630) grad_norm 1.0544 (1.6706) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 09:04:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][510/625] eta 0:00:47 lr 0.001704 wd 0.0500 time 0.2531 (0.4156) data time 0.0009 (0.0195) model time 0.2522 (0.3961) loss 6.6413 (6.3615) grad_norm 1.8280 (1.7372) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 09:04:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][520/625] eta 0:00:40 lr 0.001703 wd 0.0500 time 0.2533 (0.3845) data time 0.0008 (0.0159) model time 0.2525 (0.3686) loss 6.3516 (6.3255) grad_norm 1.7622 (1.8088) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 09:04:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][530/625] eta 0:00:34 lr 0.001703 wd 0.0500 time 0.2540 (0.3634) data time 0.0007 (0.0135) model time 0.2533 (0.3499) loss 6.5965 (6.2809) grad_norm 2.0195 (1.7678) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 09:04:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][540/625] eta 0:00:29 lr 0.001703 wd 0.0500 time 0.2576 (0.3482) data time 0.0010 (0.0117) model time 0.2565 (0.3364) loss 5.4995 (6.2068) grad_norm 1.8787 (1.7770) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 09:04:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][550/625] eta 0:00:25 lr 0.001703 wd 0.0500 time 0.2552 (0.3367) data time 0.0008 (0.0104) model time 0.2544 (0.3263) loss 6.4430 (6.2124) grad_norm 1.6774 (1.7690) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 09:04:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][560/625] eta 0:00:21 lr 0.001703 wd 0.0500 time 0.2496 (0.3278) data time 0.0008 (0.0094) model time 0.2489 (0.3184) loss 4.7085 (6.1687) grad_norm 2.3135 (1.7976) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 09:04:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][570/625] eta 0:00:17 lr 0.001703 wd 0.0500 time 0.2521 (0.3206) data time 0.0006 (0.0086) model time 0.2514 (0.3120) loss 7.0614 (6.2028) grad_norm 1.5189 (1.7763) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 09:04:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][580/625] eta 0:00:14 lr 0.001703 wd 0.0500 time 0.2550 (0.3146) data time 0.0008 (0.0079) model time 0.2542 (0.3067) loss 6.9612 (6.1943) grad_norm 1.2645 (1.7373) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 09:04:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][590/625] eta 0:00:10 lr 0.001703 wd 0.0500 time 0.2585 (0.3097) data time 0.0006 (0.0073) model time 0.2579 (0.3023) loss 6.4922 (6.2016) grad_norm 1.5148 (1.7135) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 09:04:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][600/625] eta 0:00:07 lr 0.001702 wd 0.0500 time 0.2542 (0.3055) data time 0.0009 (0.0068) model time 0.2533 (0.2987) loss 6.4127 (6.1808) grad_norm 2.1111 (1.7449) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 09:04:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][610/625] eta 0:00:04 lr 0.001702 wd 0.0500 time 0.2511 (0.3019) data time 0.0006 (0.0064) model time 0.2505 (0.2954) loss 6.6994 (6.1794) grad_norm 1.6194 (1.7480) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 09:04:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][620/625] eta 0:00:01 lr 0.001702 wd 0.0500 time 0.2511 (0.2985) data time 0.0006 (0.0061) model time 0.2505 (0.2925) loss 6.6193 (6.1720) grad_norm 1.5200 (1.7775) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 09:04:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 90 training takes 0:00:46
[2024-07-30 09:04:54 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 09:04:55 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 09:04:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.395 (0.395) Loss 0.6875 (0.6875) Acc@1 85.986 (85.986) Acc@5 97.656 (97.656) Mem 9656MB
[2024-07-30 09:04:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.087) Loss 1.0898 (0.8270) Acc@1 74.951 (82.293) Acc@5 93.457 (96.444) Mem 9656MB
[2024-07-30 09:04:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.072) Loss 1.3154 (0.9866) Acc@1 69.824 (78.323) Acc@5 90.576 (94.375) Mem 9656MB
[2024-07-30 09:04:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 78.117 Acc@5 94.326
[2024-07-30 09:04:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.1%
[2024-07-30 09:04:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.763 (0.763) Loss 0.5596 (0.5596) Acc@1 87.695 (87.695) Acc@5 98.193 (98.193) Mem 9656MB
[2024-07-30 09:04:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.058 (0.129) Loss 0.9507 (0.7161) Acc@1 77.441 (83.731) Acc@5 94.336 (96.893) Mem 9656MB
[2024-07-30 09:05:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.094) Loss 1.1045 (0.8648) Acc@1 72.705 (79.911) Acc@5 92.529 (95.145) Mem 9656MB
[2024-07-30 09:05:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 79.603 Acc@5 95.106
[2024-07-30 09:05:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.6%
[2024-07-30 09:05:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.60%
[2024-07-30 09:05:00 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-30 09:05:01 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-30 09:05:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][0/625] eta 0:17:11 lr 0.001702 wd 0.0500 time 1.6501 (1.6501) data time 0.4003 (0.4003) model time 0.0000 (0.0000) loss 5.7969 (5.7969) grad_norm 2.5129 (2.5129) loss_scale 4096.0000 (4096.0000) mem 9652MB
[2024-07-30 09:05:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][10/625] eta 0:03:54 lr 0.001702 wd 0.0500 time 0.2493 (0.3807) data time 0.0011 (0.0373) model time 0.0000 (0.0000) loss 5.6269 (6.1347) grad_norm 2.3276 (1.8660) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:05:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][20/625] eta 0:03:14 lr 0.001702 wd 0.0500 time 0.2575 (0.3214) data time 0.0008 (0.0201) model time 0.0000 (0.0000) loss 5.2111 (5.8224) grad_norm 1.7858 (1.8958) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:05:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][30/625] eta 0:02:58 lr 0.001702 wd 0.0500 time 0.2566 (0.3005) data time 0.0007 (0.0139) model time 0.0000 (0.0000) loss 5.1983 (5.8870) grad_norm 1.6184 (1.9477) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:05:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][40/625] eta 0:02:49 lr 0.001702 wd 0.0500 time 0.2586 (0.2892) data time 0.0006 (0.0107) model time 0.0000 (0.0000) loss 6.1758 (5.8891) grad_norm 1.5111 (1.8173) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:05:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][50/625] eta 0:02:42 lr 0.001702 wd 0.0500 time 0.2556 (0.2824) data time 0.0007 (0.0088) model time 0.0000 (0.0000) loss 5.6941 (5.8124) grad_norm 1.3718 (1.7332) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:05:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][60/625] eta 0:02:36 lr 0.001701 wd 0.0500 time 0.2568 (0.2778) data time 0.0007 (0.0075) model time 0.2561 (0.2536) loss 6.2251 (5.8328) grad_norm 1.1407 (1.7953) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:05:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][70/625] eta 0:02:32 lr 0.001701 wd 0.0500 time 0.2547 (0.2744) data time 0.0009 (0.0066) model time 0.2538 (0.2532) loss 5.7131 (5.8768) grad_norm 0.9763 (1.7318) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:05:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][80/625] eta 0:02:28 lr 0.001701 wd 0.0500 time 0.2583 (0.2720) data time 0.0006 (0.0059) model time 0.2577 (0.2534) loss 5.8155 (5.8765) grad_norm 2.5767 (1.7211) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:05:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][90/625] eta 0:02:24 lr 0.001701 wd 0.0500 time 0.2592 (0.2703) data time 0.0009 (0.0054) model time 0.2583 (0.2539) loss 6.5383 (5.8732) grad_norm 2.2975 (1.6999) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:05:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][100/625] eta 0:02:21 lr 0.001701 wd 0.0500 time 0.2535 (0.2688) data time 0.0007 (0.0049) model time 0.2528 (0.2540) loss 4.6889 (5.8351) grad_norm 2.0982 (1.6757) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:05:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][110/625] eta 0:02:17 lr 0.001701 wd 0.0500 time 0.2549 (0.2674) data time 0.0007 (0.0046) model time 0.2542 (0.2538) loss 4.7556 (5.8376) grad_norm 1.3142 (1.6689) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:05:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][120/625] eta 0:02:14 lr 0.001701 wd 0.0500 time 0.2567 (0.2664) data time 0.0009 (0.0043) model time 0.2558 (0.2538) loss 6.0999 (5.8799) grad_norm 1.2311 (1.6503) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:05:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][130/625] eta 0:02:11 lr 0.001700 wd 0.0500 time 0.2573 (0.2655) data time 0.0006 (0.0040) model time 0.2567 (0.2538) loss 7.1744 (5.8980) grad_norm 1.2485 (1.6653) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:05:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][140/625] eta 0:02:08 lr 0.001700 wd 0.0500 time 0.2589 (0.2649) data time 0.0009 (0.0038) model time 0.2580 (0.2540) loss 4.3552 (5.8637) grad_norm 1.1018 (1.6533) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:05:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][150/625] eta 0:02:05 lr 0.001700 wd 0.0500 time 0.2619 (0.2644) data time 0.0008 (0.0036) model time 0.2610 (0.2542) loss 6.9041 (5.8620) grad_norm 1.0341 (1.6334) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:05:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][160/625] eta 0:02:02 lr 0.001700 wd 0.0500 time 0.2556 (0.2639) data time 0.0009 (0.0035) model time 0.2547 (0.2543) loss 6.2542 (5.8891) grad_norm 1.3779 (1.6279) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:05:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][170/625] eta 0:01:59 lr 0.001700 wd 0.0500 time 0.2488 (0.2633) data time 0.0008 (0.0033) model time 0.2480 (0.2542) loss 4.9619 (5.9153) grad_norm 2.1315 (1.6401) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:05:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][180/625] eta 0:01:56 lr 0.001700 wd 0.0500 time 0.2527 (0.2629) data time 0.0009 (0.0032) model time 0.2519 (0.2543) loss 5.4551 (5.9157) grad_norm 1.4707 (1.6619) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:05:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][190/625] eta 0:01:54 lr 0.001700 wd 0.0500 time 0.2560 (0.2626) data time 0.0006 (0.0031) model time 0.2554 (0.2544) loss 5.1811 (5.9259) grad_norm 1.3154 (1.6626) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:05:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][200/625] eta 0:01:51 lr 0.001700 wd 0.0500 time 0.2596 (0.2623) data time 0.0009 (0.0030) model time 0.2588 (0.2544) loss 5.5957 (5.9224) grad_norm 1.2832 (1.6560) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:05:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][210/625] eta 0:01:48 lr 0.001699 wd 0.0500 time 0.2537 (0.2619) data time 0.0006 (0.0029) model time 0.2531 (0.2544) loss 5.1953 (5.9314) grad_norm 1.0850 (1.6383) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:05:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][220/625] eta 0:01:45 lr 0.001699 wd 0.0500 time 0.2536 (0.2616) data time 0.0008 (0.0028) model time 0.2528 (0.2544) loss 5.4603 (5.9274) grad_norm 1.4449 (1.6272) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:06:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][230/625] eta 0:01:43 lr 0.001699 wd 0.0500 time 0.2608 (0.2614) data time 0.0005 (0.0027) model time 0.2602 (0.2544) loss 5.9959 (5.9200) grad_norm 2.0707 (1.6395) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:06:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][240/625] eta 0:01:40 lr 0.001699 wd 0.0500 time 0.2567 (0.2611) data time 0.0007 (0.0027) model time 0.2561 (0.2544) loss 6.2463 (5.9282) grad_norm 1.5457 (1.6445) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:06:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][250/625] eta 0:01:37 lr 0.001699 wd 0.0500 time 0.2544 (0.2609) data time 0.0009 (0.0026) model time 0.2535 (0.2544) loss 5.8562 (5.9381) grad_norm 2.9476 (1.6608) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:06:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][260/625] eta 0:01:35 lr 0.001699 wd 0.0500 time 0.2568 (0.2607) data time 0.0007 (0.0025) model time 0.2561 (0.2544) loss 4.2239 (5.9360) grad_norm 1.9067 (1.6535) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:06:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][270/625] eta 0:01:32 lr 0.001699 wd 0.0500 time 0.2511 (0.2605) data time 0.0008 (0.0025) model time 0.2503 (0.2544) loss 6.3119 (5.9431) grad_norm 1.4619 (1.6461) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:06:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][280/625] eta 0:01:29 lr 0.001699 wd 0.0500 time 0.2733 (0.2603) data time 0.0009 (0.0024) model time 0.2725 (0.2545) loss 6.9702 (5.9629) grad_norm 1.2603 (1.6381) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:06:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][290/625] eta 0:01:27 lr 0.001698 wd 0.0500 time 0.2550 (0.2610) data time 0.0009 (0.0024) model time 0.2541 (0.2554) loss 5.3471 (5.9628) grad_norm 1.1471 (1.6375) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:06:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][300/625] eta 0:01:24 lr 0.001698 wd 0.0500 time 0.2595 (0.2608) data time 0.0008 (0.0023) model time 0.2587 (0.2554) loss 4.5858 (5.9567) grad_norm 1.2750 (1.6331) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:06:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][310/625] eta 0:01:22 lr 0.001698 wd 0.0500 time 0.2543 (0.2606) data time 0.0008 (0.0023) model time 0.2535 (0.2553) loss 5.5944 (5.9491) grad_norm 1.5346 (1.6355) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:06:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][320/625] eta 0:01:19 lr 0.001698 wd 0.0500 time 0.2576 (0.2612) data time 0.0006 (0.0022) model time 0.2570 (0.2562) loss 6.9193 (5.9429) grad_norm 1.6627 (1.6261) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:06:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][330/625] eta 0:01:17 lr 0.001698 wd 0.0500 time 0.2507 (0.2611) data time 0.0010 (0.0022) model time 0.2498 (0.2562) loss 5.0859 (5.9527) grad_norm 1.5414 (1.6191) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:06:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][340/625] eta 0:01:14 lr 0.001698 wd 0.0500 time 0.2547 (0.2609) data time 0.0007 (0.0021) model time 0.2540 (0.2562) loss 6.4460 (5.9469) grad_norm 1.3506 (1.6187) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:06:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][350/625] eta 0:01:11 lr 0.001698 wd 0.0500 time 0.2531 (0.2608) data time 0.0008 (0.0021) model time 0.2523 (0.2561) loss 6.1455 (5.9509) grad_norm 1.9869 (1.6269) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:06:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][360/625] eta 0:01:09 lr 0.001698 wd 0.0500 time 0.2548 (0.2607) data time 0.0009 (0.0021) model time 0.2540 (0.2561) loss 6.0180 (5.9638) grad_norm 1.4087 (1.6407) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:06:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][370/625] eta 0:01:06 lr 0.001697 wd 0.0500 time 0.2562 (0.2605) data time 0.0014 (0.0021) model time 0.2548 (0.2561) loss 6.5592 (5.9545) grad_norm 1.5102 (1.6536) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:06:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][380/625] eta 0:01:03 lr 0.001697 wd 0.0500 time 0.2437 (0.2604) data time 0.0007 (0.0020) model time 0.2430 (0.2560) loss 4.6920 (5.9522) grad_norm 1.3891 (1.6649) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:06:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][390/625] eta 0:01:01 lr 0.001697 wd 0.0500 time 0.2514 (0.2602) data time 0.0008 (0.0020) model time 0.2506 (0.2559) loss 6.5232 (5.9552) grad_norm 1.9970 (1.6711) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:06:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][400/625] eta 0:00:58 lr 0.001697 wd 0.0500 time 0.2563 (0.2601) data time 0.0006 (0.0020) model time 0.2557 (0.2559) loss 4.6063 (5.9609) grad_norm 1.3920 (1.6644) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:06:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][410/625] eta 0:00:55 lr 0.001697 wd 0.0500 time 0.2547 (0.2601) data time 0.0009 (0.0019) model time 0.2538 (0.2560) loss 6.4330 (5.9667) grad_norm 1.2718 (1.6585) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:06:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][420/625] eta 0:00:53 lr 0.001697 wd 0.0500 time 0.2519 (0.2600) data time 0.0006 (0.0019) model time 0.2513 (0.2559) loss 6.2378 (5.9676) grad_norm 1.0256 (1.6545) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:06:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][430/625] eta 0:00:50 lr 0.001697 wd 0.0500 time 0.2514 (0.2598) data time 0.0006 (0.0019) model time 0.2508 (0.2558) loss 5.9574 (5.9708) grad_norm 1.7303 (1.6543) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:06:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][440/625] eta 0:00:48 lr 0.001697 wd 0.0500 time 0.2548 (0.2597) data time 0.0007 (0.0019) model time 0.2541 (0.2557) loss 6.4509 (5.9691) grad_norm 0.9626 (1.6494) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:06:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][450/625] eta 0:00:45 lr 0.001696 wd 0.0500 time 0.2569 (0.2596) data time 0.0008 (0.0019) model time 0.2561 (0.2557) loss 6.7087 (5.9699) grad_norm 1.0520 (1.6461) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:07:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][460/625] eta 0:00:42 lr 0.001696 wd 0.0500 time 0.2556 (0.2594) data time 0.0008 (0.0018) model time 0.2548 (0.2556) loss 6.1532 (5.9692) grad_norm 1.5211 (1.6449) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:07:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][470/625] eta 0:00:40 lr 0.001696 wd 0.0500 time 0.2540 (0.2593) data time 0.0007 (0.0018) model time 0.2533 (0.2556) loss 5.7922 (5.9734) grad_norm 1.2350 (1.6458) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:07:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][480/625] eta 0:00:37 lr 0.001696 wd 0.0500 time 0.2572 (0.2592) data time 0.0006 (0.0018) model time 0.2566 (0.2555) loss 5.2921 (5.9735) grad_norm 1.3377 (1.6470) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:07:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][490/625] eta 0:00:34 lr 0.001696 wd 0.0500 time 0.2576 (0.2591) data time 0.0005 (0.0018) model time 0.2570 (0.2555) loss 5.9269 (5.9687) grad_norm 1.9826 (1.6499) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:07:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][500/625] eta 0:00:32 lr 0.001696 wd 0.0500 time 0.2555 (0.2590) data time 0.0010 (0.0018) model time 0.2546 (0.2554) loss 4.1115 (5.9636) grad_norm 1.4538 (1.6473) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:07:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][510/625] eta 0:00:29 lr 0.001696 wd 0.0500 time 0.2549 (0.2589) data time 0.0008 (0.0017) model time 0.2541 (0.2554) loss 6.2823 (5.9583) grad_norm 2.3252 (1.6488) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:07:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][520/625] eta 0:00:27 lr 0.001695 wd 0.0500 time 0.2536 (0.2589) data time 0.0007 (0.0017) model time 0.2529 (0.2553) loss 6.0792 (5.9644) grad_norm 1.1303 (1.6443) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:07:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][530/625] eta 0:00:24 lr 0.001695 wd 0.0500 time 0.2574 (0.2588) data time 0.0008 (0.0017) model time 0.2565 (0.2553) loss 6.8874 (5.9682) grad_norm 1.2502 (1.6407) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:07:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][540/625] eta 0:00:21 lr 0.001695 wd 0.0500 time 0.2565 (0.2587) data time 0.0007 (0.0017) model time 0.2558 (0.2553) loss 4.8397 (5.9604) grad_norm 1.6088 (1.6406) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:07:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][550/625] eta 0:00:19 lr 0.001695 wd 0.0500 time 0.2574 (0.2587) data time 0.0006 (0.0017) model time 0.2568 (0.2553) loss 4.8253 (5.9638) grad_norm 1.4243 (1.6391) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:07:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][560/625] eta 0:00:16 lr 0.001695 wd 0.0500 time 0.2540 (0.2586) data time 0.0010 (0.0017) model time 0.2530 (0.2553) loss 4.9560 (5.9554) grad_norm 2.4251 (1.6410) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:07:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][570/625] eta 0:00:14 lr 0.001695 wd 0.0500 time 0.2497 (0.2585) data time 0.0009 (0.0017) model time 0.2488 (0.2552) loss 4.6638 (5.9541) grad_norm 1.4506 (1.6413) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:07:32
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][580/625] eta 0:00:11 lr 0.001695 wd 0.0500 time 0.2557 (0.2585) data time 0.0006 (0.0016) model time 0.2550 (0.2552) loss 6.9871 (5.9624) grad_norm 1.3409 (1.6417) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 09:07:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][590/625] eta 0:00:09 lr 0.001695 wd 0.0500 time 0.2503 (0.2584) data time 0.0008 (0.0016) model time 0.2495 (0.2552) loss 6.7827 (5.9628) grad_norm 1.3242 (1.6428) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 09:07:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][600/625] eta 0:00:06 lr 0.001694 wd 0.0500 time 0.2561 (0.2584) data time 0.0009 (0.0016) model time 0.2552 (0.2552) loss 5.7501 (5.9601) grad_norm 1.0490 (1.6394) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 09:07:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-30 09:07:38 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 09:07:38 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-30 09:09:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 09:09:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 09:51:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 09:51:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 09:52:05 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 09:52:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 09:52:18 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 09:52:18 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 09:52:18 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 09:52:18 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 91)
[2024-07-30 09:52:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 09:52:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][610/625] eta 0:00:18 lr 0.001694 wd 0.0500 time 0.2484 (1.2121) data time 0.0008 (0.1109) model time 0.2476 (1.1012) loss 6.8306 (6.6409) grad_norm 1.4095 (1.4646) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 09:52:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][620/625] eta 0:00:03 lr 0.001694 wd 0.0500 time 0.2707 (0.6468) data time 0.0005 (0.0460) model time 0.2702 (0.6008) loss 6.5689 (6.4413) grad_norm 0.9281 (1.4372) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 09:52:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 91 training takes 0:00:11
[2024-07-30 09:52:34 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 09:52:36 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 09:52:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.434 (0.434) Loss 0.6597 (0.6597) Acc@1 87.207 (87.207) Acc@5 97.559 (97.559) Mem 9656MB
[2024-07-30 09:52:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.091) Loss 1.1377 (0.8525) Acc@1 74.756 (81.969) Acc@5 93.359 (96.222) Mem 9656MB
[2024-07-30 09:52:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.053 (0.074) Loss 1.2529 (1.0113) Acc@1 71.436 (78.118) Acc@5 91.992 (94.368) Mem 9656MB
[2024-07-30 09:52:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 78.003 Acc@5 94.412
[2024-07-30 09:52:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.0%
[2024-07-30 09:52:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.758 (0.758) Loss 0.5605 (0.5605) Acc@1 87.842 (87.842) Acc@5 98.193 (98.193) Mem 9656MB
[2024-07-30 09:52:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.053 (0.130) Loss 0.9502 (0.7163) Acc@1 77.637 (83.785) Acc@5 94.385 (96.888) Mem 9656MB
[2024-07-30 09:52:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.094) Loss 1.1045 (0.8644) Acc@1 72.607 (79.955) Acc@5 92.480 (95.140) Mem 9656MB
[2024-07-30 09:52:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 79.635 Acc@5 95.104
[2024-07-30 09:52:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.6%
[2024-07-30 09:52:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.63%
[2024-07-30 09:52:41 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-30 09:52:42 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-30 09:52:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][0/625] eta 0:07:59 lr 0.001694 wd 0.0500 time 0.7674 (0.7674) data time 0.3859 (0.3859) model time 0.0000 (0.0000) loss 6.7922 (6.7922) grad_norm 2.2745 (2.2745) loss_scale 4096.0000 (4096.0000) mem 9651MB
[2024-07-30 09:52:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][10/625] eta 0:03:04 lr 0.001694 wd 0.0500 time 0.2491 (0.2995) data time 0.0008 (0.0359) model time 0.0000 (0.0000) loss 6.6511 (6.3423) grad_norm 1.2368 (1.6345) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:52:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][20/625] eta 0:02:47 lr 0.001694 wd 0.0500 time 0.2494 (0.2766) data time 0.0012 (0.0193) model time 0.0000 (0.0000) loss 6.7025 (6.1662) grad_norm 2.3238 (1.7753) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:52:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][30/625] eta 0:02:40 lr 0.001694 wd 0.0500 time 0.2505 (0.2700) data time 0.0008 (0.0135) model time 0.0000 (0.0000) loss 6.2077 (6.0739) grad_norm 1.8564 (1.8131) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:52:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][40/625] eta 0:02:35 lr 0.001694 wd 0.0500 time 0.2529 (0.2658) data time 0.0007 (0.0104) model time 0.0000 (0.0000) loss 6.8377 (6.0568) grad_norm 1.5582 (1.9022) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:52:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][50/625] eta 0:02:31 lr 0.001694 wd 0.0500 time 0.2503 (0.2634) data time 0.0010 (0.0086) model time 0.0000 (0.0000) loss 6.0709 (5.9912) grad_norm 1.7105 (1.8410) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:52:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][60/625] eta 0:02:27 lr 0.001693 wd 0.0500 time 0.2533 (0.2618) data time 0.0010 (0.0074) model time 0.2523 (0.2530) loss 6.3253 (6.0256) grad_norm 1.2222 (1.7773) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:53:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][70/625] eta 0:02:24 lr 0.001693 wd 0.0500 time 0.2542 (0.2612) data time 0.0007 (0.0064) model time 0.2534 (0.2546) loss 4.5361 (6.0187) grad_norm 1.3805 (1.8512) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:53:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][80/625] eta 0:02:21 lr 0.001693 wd 0.0500 time 0.2543 (0.2603) data time 0.0009 (0.0058) model time 0.2534 (0.2541) loss 7.0384 (6.0969) grad_norm 2.7230 (1.8641) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:53:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][90/625] eta 0:02:18 lr 0.001693 wd 0.0500 time 0.2484 (0.2596) data time 0.0011 (0.0052) model time 0.2473 (0.2537) loss 6.3539 (6.0751) grad_norm 1.4353 (1.8233) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:53:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][100/625] eta 0:02:16 lr 0.001693 wd 0.0500 time 0.2517 (0.2591) data time 0.0008 (0.0048) model time 0.2508 (0.2537) loss 6.3880 (6.0931) grad_norm 1.7961 (1.8234) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:53:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][110/625] eta 0:02:13 lr 0.001693 wd 0.0500 time 0.2549 (0.2587) data time 0.0008 (0.0045) model time 0.2541 (0.2538) loss 5.7708 (6.0649) grad_norm 1.8973 (1.8579) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:53:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][120/625] eta 0:02:10 lr 0.001693 wd 0.0500 time 0.2525 (0.2584) data time 0.0011 (0.0042) model time 0.2514 (0.2538) loss 5.6323 (6.0421) grad_norm 1.9331 (1.8734) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:53:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][130/625] eta 0:02:07 lr 0.001692 wd 0.0500 time 0.2552 (0.2581) data time 0.0009 (0.0040) model time 0.2543 (0.2537) loss 7.0592 (6.0312) grad_norm 1.4183 (1.8645) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:53:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][140/625] eta 0:02:05 lr 0.001692 wd 0.0500 time 0.2531 (0.2578) data time 0.0011 (0.0037) model time 0.2519 (0.2537) loss 6.1440 (6.0409) grad_norm 1.0902 (1.8296) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:53:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][150/625] eta 0:02:02 lr 0.001692 wd 0.0500 time 0.2531 (0.2577) data time 0.0009 (0.0036) model time 0.2522 (0.2538) loss 5.3694 (6.0251) grad_norm 1.3676 (1.7991) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:53:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][160/625] eta 0:01:59 lr 0.001692 wd 0.0500 time 0.2554 (0.2575) data time 0.0009 (0.0034) model time 0.2545 (0.2537) loss 6.9486 (6.0246) grad_norm 1.3701 (1.7908) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:53:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][170/625] eta 0:01:57 lr 0.001692 wd 0.0500 time 0.2523 (0.2573) data time 0.0009 (0.0033) model time 0.2514 (0.2537) loss 6.6529 (6.0184) grad_norm 1.0130 (1.7897) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:53:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][180/625] eta 0:01:54 lr 0.001692 wd 0.0500 time 0.2516 (0.2571) data time 0.0009 (0.0031) model time 0.2507 (0.2536) loss 6.7854 (5.9978) grad_norm 3.1271 (1.8456) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:53:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][190/625] eta 0:01:51 lr 0.001692 wd 0.0500 time 0.2533 (0.2570) data time 0.0010 (0.0031) model time 0.2524 (0.2536) loss 6.4458 (5.9853) grad_norm 1.0018 (1.8231) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:53:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][200/625] eta 0:01:49 lr 0.001692 wd 0.0500 time 0.2599 (0.2570) data time 0.0007 (0.0030) model time 0.2592 (0.2537) loss 6.9958 (5.9843) grad_norm 1.1805 (1.8116) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:53:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][210/625] eta 0:01:46 lr 0.001691 wd 0.0500 time 0.2553 (0.2569) data time 0.0008 (0.0029) model time 0.2545 (0.2538) loss 6.0895 (5.9838) grad_norm 1.4720 (1.7952) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:53:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][220/625] eta 0:01:44 lr 0.001691 wd 0.0500 time 0.2566 (0.2568) data time 0.0007 (0.0028) model time 0.2559 (0.2538) loss 6.6220 (5.9850) grad_norm 1.3641 (1.7829) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:53:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][230/625] eta 0:01:41 lr 0.001691 wd 0.0500 time 0.2513 (0.2568) data time 0.0013 (0.0027) model time 0.2500 (0.2539) loss 6.6999 (5.9745) grad_norm 1.3153 (1.7848) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:53:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][240/625] eta 0:01:38 lr 0.001691 wd 0.0500 time 0.2537 (0.2568) data time 0.0009 (0.0026) model time 0.2529 (0.2540) loss 6.8878 (5.9626) grad_norm 1.3281 (1.7778) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:53:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][250/625] eta 0:01:36 lr 0.001691 wd 0.0500 time 0.2566 (0.2568) data time 0.0009 (0.0026) model time 0.2557 (0.2541) loss 5.7430 (5.9491) grad_norm 2.4006 (1.7646) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:53:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][260/625] eta 0:01:33 lr 0.001691 wd 0.0500 time 0.2549 (0.2568) data time 0.0015 (0.0025) model time 0.2534 (0.2542) loss 4.7395 (5.9571) grad_norm 1.3297 (1.7548) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:53:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][270/625] eta 0:01:31 lr 0.001691 wd 0.0500 time 0.2547 (0.2568) data time 0.0008 (0.0025) model time 0.2539 (0.2542) loss 6.1042 (5.9507) grad_norm 3.1707 (1.7617) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:53:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][280/625] eta 0:01:28 lr 0.001691 wd 0.0500 time 0.2552 (0.2567) data time 0.0009 (0.0024) model time 0.2543 (0.2542) loss 5.6232 (5.9346) grad_norm 1.1820 (1.7621) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:53:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][290/625] eta 0:01:25 lr 0.001690 wd 0.0500 time 0.2546 (0.2567) data time 0.0009 (0.0024) model time 0.2537 (0.2543) loss 6.8511 (5.9462) grad_norm 1.6244 (1.7589) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:54:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][300/625] eta 0:01:23 lr 0.001690 wd 0.0500 time 0.2517 (0.2566) data time 0.0010 (0.0023) model time 0.2507 (0.2542) loss 5.8395 (5.9739) grad_norm 1.4651 (1.7525) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:54:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][310/625] eta 0:01:20 lr 0.001690 wd 0.0500 time 0.2546 (0.2567) data time 0.0011 (0.0023) model time 0.2535 (0.2543) loss 6.6634 (5.9806) grad_norm 1.2714 (1.7396) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:54:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][320/625] eta 0:01:18 lr 0.001690 wd 0.0500 time 0.2571 (0.2567) data time 0.0008 (0.0023) model time 0.2562 (0.2543) loss 6.4294 (5.9878) grad_norm 1.8592 (1.7351) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:54:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][330/625] eta 0:01:15 lr 0.001690 wd 0.0500 time 0.2536 (0.2566) data time 0.0009 (0.0022) model time 0.2527 (0.2543) loss 7.5315 (6.0016) grad_norm 1.0543 (1.7253) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:54:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][340/625] eta 0:01:13 lr 0.001690 wd 0.0500 time 0.2553 (0.2566) data time 0.0007 (0.0022) model time 0.2546 (0.2544) loss 7.1586 (6.0007) grad_norm 1.9020 (1.7354) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:54:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][350/625] eta 0:01:10 lr 0.001690 wd 0.0500 time 0.2593 (0.2566) data time 0.0010 (0.0022) model time 0.2583 (0.2544) loss 6.1148 (5.9942) grad_norm 1.6727 (1.7310) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:54:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][360/625] eta 0:01:08 lr 0.001690 wd 0.0500 time 0.2528 (0.2566) data time 0.0009 (0.0021) model time 0.2519 (0.2544) loss 6.9364 (5.9970) grad_norm 1.5467 (1.7422) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:54:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][370/625] eta 0:01:05 lr 0.001689 wd 0.0500 time 0.2525 (0.2566) data time 0.0009 (0.0021) model time 0.2517 (0.2544) loss 5.7913 (5.9877) grad_norm 1.6259 (1.7480) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:54:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][380/625] eta 0:01:02 lr 0.001689 wd 0.0500 time 0.2566 (0.2566) data time 0.0006 (0.0021) model time 0.2560 (0.2545) loss 7.1176 (5.9999) grad_norm 1.5739 (1.7454) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:54:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][390/625] eta 0:01:00 lr 0.001689 wd 0.0500 time 0.2558 (0.2566) data time 0.0015 (0.0020) model time 0.2543 (0.2545) loss 6.6989 (6.0056) grad_norm 2.4347 (1.7517) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:54:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][400/625] eta 0:00:57 lr 0.001689 wd 0.0500 time 0.2560 (0.2572) data time 0.0008 (0.0020) model time 0.2552 (0.2553) loss 6.3851 (6.0059) grad_norm 2.0764 (1.7510) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:54:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][410/625] eta 0:00:55 lr 0.001689 wd 0.0500 time 0.2591 (0.2572) data time 0.0007 (0.0020) model time 0.2584 (0.2553) loss 5.4020 (6.0107) grad_norm 1.2534 (1.7469) loss_scale 8192.0000 (4165.7616) mem 9655MB
[2024-07-30 09:54:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][420/625] eta 0:00:52 lr 0.001689 wd 0.0500 time 0.2556 (0.2572) data time 0.0007 (0.0020) model time 0.2549 (0.2553) loss 6.1035 (6.0214) grad_norm 1.0660 (1.7392) loss_scale 8192.0000 (4261.3967) mem 9655MB
[2024-07-30 09:54:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][430/625] eta 0:00:50 lr 0.001689 wd 0.0500 time 0.2564 (0.2572) data time 0.0009 (0.0019) model time 0.2555 (0.2553) loss 4.9198 (6.0135) grad_norm 1.5176 (1.7383) loss_scale 8192.0000 (4352.5940) mem 9655MB
[2024-07-30 09:54:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][440/625] eta 0:00:47 lr 0.001688 wd 0.0500 time 0.2538 (0.2572) data time 0.0010 (0.0019) model time 0.2528 (0.2554) loss 4.4834 (6.0078) grad_norm 2.2412 (1.7451) loss_scale 8192.0000 (4439.6553) mem 9655MB
[2024-07-30 09:54:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][450/625] eta 0:00:45 lr 0.001688 wd 0.0500 time 0.2558 (0.2572) data time 0.0011 (0.0019) model time 0.2547 (0.2554) loss 6.5327 (5.9960) grad_norm 4.0680 (1.7705) loss_scale 8192.0000 (4522.8559) mem 9655MB
[2024-07-30 09:54:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][460/625] eta 0:00:42 lr 0.001688 wd 0.0500 time 0.2562 (0.2572) data time 0.0008 (0.0019) model time 0.2554 (0.2554) loss 7.0174 (5.9967) grad_norm 1.1280 (1.7636) loss_scale 8192.0000 (4602.4469) mem 9655MB
[2024-07-30 09:54:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][470/625] eta 0:00:39 lr 0.001688 wd 0.0500 time 0.2532 (0.2572) data time 0.0011 (0.0018) model time 0.2521 (0.2554) loss 5.9268 (6.0044) grad_norm 1.5984 (1.7556) loss_scale 8192.0000 (4678.6582) mem 9655MB
[2024-07-30 09:54:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][480/625] eta 0:00:37 lr 0.001688 wd 0.0500 time 0.2586 (0.2576) data time 0.0012 (0.0018) model time 0.2574 (0.2559) loss 6.0213 (6.0004) grad_norm 1.5240 (1.7489) loss_scale 8192.0000 (4751.7006) mem 9655MB
[2024-07-30 09:54:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][490/625] eta 0:00:34 lr 0.001688 wd 0.0500 time 0.2542 (0.2576) data time 0.0007 (0.0018) model time 0.2534 (0.2559) loss 6.7941 (6.0094) grad_norm 1.7386 (1.7482) loss_scale 8192.0000 (4821.7678) mem 9655MB
[2024-07-30 09:54:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][500/625] eta 0:00:32 lr 0.001688 wd 0.0500 time 0.2573 (0.2576) data time 0.0006 (0.0018) model time 0.2567 (0.2559) loss 5.1132 (6.0073) grad_norm 2.6072 (1.7507) loss_scale 8192.0000 (4889.0379) mem 9655MB
[2024-07-30 09:54:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][510/625] eta 0:00:29 lr 0.001688 wd 0.0500 time 0.2610 (0.2575) data time 0.0008 (0.0018) model time 0.2602 (0.2559) loss 6.2615 (6.0025) grad_norm 1.1066 (1.7469) loss_scale 8192.0000 (4953.6751) mem 9655MB
[2024-07-30 09:54:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][520/625] eta 0:00:27 lr 0.001687 wd 0.0500 time 0.2536 (0.2575) data time 0.0008 (0.0018) model time 0.2528 (0.2558) loss 4.9264 (5.9990) grad_norm 2.2963 (1.7471) loss_scale 8192.0000 (5015.8311) mem 9655MB
[2024-07-30 09:54:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][530/625] eta 0:00:24 lr 0.001687 wd 0.0500 time 0.2555 (0.2575) data time 0.0008 (0.0017) model time 0.2546 (0.2558) loss 6.9134 (5.9999) grad_norm 1.5600 (1.7450) loss_scale 8192.0000 (5075.6460) mem 9655MB
[2024-07-30 09:55:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][540/625] eta 0:00:21 lr 0.001687 wd 0.0500 time 0.2506 (0.2575) data time 0.0008 (0.0017) model time 0.2498 (0.2558) loss 7.1204 (6.0087) grad_norm 2.1066 (1.7411) loss_scale 8192.0000 (5133.2495) mem 9655MB
[2024-07-30 09:55:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][550/625] eta 0:00:19 lr 0.001687 wd 0.0500 time 0.2532 (0.2574) data time 0.0010 (0.0017) model time 0.2521 (0.2558) loss 5.1861 (6.0136) grad_norm 1.9281 (1.7388) loss_scale 8192.0000 (5188.7623) mem 9655MB
[2024-07-30 09:55:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][560/625] eta 0:00:16 lr 0.001687 wd 0.0500 time 0.2629 (0.2574) data time 0.0009 (0.0017) model time 0.2620 (0.2558) loss 5.8978 (6.0124) grad_norm 2.8534 (1.7352) loss_scale 8192.0000 (5242.2959) mem 9655MB
[2024-07-30 09:55:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][570/625] eta 0:00:14 lr 0.001687 wd 0.0500 time 0.2545 (0.2574) data time 0.0009 (0.0017) model time 0.2536 (0.2558) loss 6.9640 (6.0206) grad_norm 2.6083 (1.7377) loss_scale 8192.0000 (5293.9545) mem 9655MB
[2024-07-30 09:55:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][580/625] eta 0:00:11 lr 0.001687 wd 0.0500 time 0.2599 (0.2573) data time 0.0008 (0.0017) model time 0.2591 (0.2557) loss 5.1844 (6.0160) grad_norm 1.5664 (1.7385) loss_scale 8192.0000 (5343.8348) mem 9655MB
[2024-07-30 09:55:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][590/625] eta 0:00:09 lr 0.001687 wd 0.0500 time 0.2554 (0.2573) data time 0.0008 (0.0017) model time 0.2545 (0.2557) loss 4.7957 (6.0147) grad_norm 1.1701 (1.7363) loss_scale 8192.0000 (5392.0271) mem 9655MB
[2024-07-30 09:55:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][600/625] eta 0:00:06 lr 0.001686 wd 0.0500 time 0.2551 (0.2573) data time 0.0010 (0.0017) model time 0.2541 (0.2557) loss 6.1932 (6.0222) grad_norm 1.8674 (inf) loss_scale 4096.0000 (5390.9085) mem 9655MB
[2024-07-30 09:55:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][610/625] eta 0:00:03 lr 0.001686 wd 0.0500 time 0.2521 (0.2573) data time 0.0006 (0.0017) model time 0.2515 (0.2557) loss 6.8733 (6.0238) grad_norm 1.2849 (inf) loss_scale 4096.0000 (5369.7152) mem 9655MB
[2024-07-30 09:55:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][620/625] eta 0:00:01 lr 0.001686 wd 0.0500 time 0.2585 (0.2572) data time 0.0006 (0.0016) model time 0.2579 (0.2556) loss 4.3517 (6.0179) grad_norm 1.7478 (inf) loss_scale 4096.0000 (5349.2045) mem 9655MB
[2024-07-30 09:55:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 92 training takes 0:02:40
[2024-07-30 09:55:23 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 09:55:24 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 09:55:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.510 (0.510) Loss 0.6411 (0.6411) Acc@1 86.816 (86.816) Acc@5 97.949 (97.949) Mem 9655MB
[2024-07-30 09:55:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.101) Loss 1.0527 (0.8113) Acc@1 76.025 (82.422) Acc@5 93.457 (96.440) Mem 9655MB
[2024-07-30 09:55:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.2139 (0.9659) Acc@1 71.729 (78.534) Acc@5 92.139 (94.557) Mem 9655MB
[2024-07-30 09:55:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 78.283 Acc@5 94.556
[2024-07-30 09:55:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.3%
[2024-07-30 09:55:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.28%
[2024-07-30 09:55:25 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-30 09:55:27 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-30 09:55:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.421 (0.421) Loss 0.5615 (0.5615) Acc@1 87.744 (87.744) Acc@5 98.242 (98.242) Mem 9655MB
[2024-07-30 09:55:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.091) Loss 0.9492 (0.7165) Acc@1 77.734 (83.798) Acc@5 94.238 (96.902) Mem 9655MB
[2024-07-30 09:55:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.074) Loss 1.1045 (0.8642) Acc@1 72.705 (79.967) Acc@5 92.627 (95.159) Mem 9655MB
[2024-07-30 09:55:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 79.663 Acc@5 95.118
[2024-07-30 09:55:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.7%
[2024-07-30 09:55:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.66%
[2024-07-30 09:55:28 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-30 09:55:29 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-30 09:55:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][0/625] eta 0:06:55 lr 0.001686 wd 0.0500 time 0.6648 (0.6648) data time 0.4197 (0.4197) model time 0.0000 (0.0000) loss 5.8399 (5.8399) grad_norm 1.6018 (1.6018) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:55:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][10/625] eta 0:02:59 lr 0.001686 wd 0.0500 time 0.2592 (0.2922) data time 0.0008 (0.0390) model time 0.0000 (0.0000) loss 4.7548 (5.9152) grad_norm 2.3964 (2.1274) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:55:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][20/625] eta 0:02:46 lr 0.001686 wd 0.0500 time 0.2551 (0.2745) data time 0.0011 (0.0209) model time 0.0000 (0.0000) loss 6.1385 (5.8132) grad_norm 1.8018 (2.0102) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:55:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][30/625] eta 0:02:39 lr 0.001686 wd 0.0500 time 0.2599 (0.2685) data time 0.0008 (0.0144) model time 0.0000 (0.0000) loss 5.1288 (5.9395) grad_norm 1.7395 (1.9478) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:55:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][40/625] eta 0:02:35 lr 0.001686 wd 0.0500 time 0.2577 (0.2655) data time 0.0017 (0.0112) model time 0.0000 (0.0000) loss 6.5736 (5.9569) grad_norm 1.3553 (1.8268) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:55:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][50/625] eta 0:02:31 lr 0.001685 wd 0.0500 time 0.2557 (0.2635) data time 0.0011 (0.0092) model time 0.0000 (0.0000) loss 3.8605 (5.8245) grad_norm 1.3792 (1.7974) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:55:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][60/625] eta 0:02:28 lr 0.001685 wd 0.0500 time 0.2552 (0.2622) data time 0.0007 (0.0078) model time 0.2545 (0.2545) loss 4.4228 (5.8809) grad_norm 1.7098 (1.7892) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:55:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][70/625] eta 0:02:25 lr 0.001685 wd 0.0500 time 0.2600 (0.2613) data time 0.0008 (0.0069) model time 0.2592 (0.2546) loss 5.0355 (5.8617) grad_norm 2.4928 (1.8469) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:55:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][80/625] eta 0:02:22 lr 0.001685 wd 0.0500 time 0.2579 (0.2606) data time 0.0009 (0.0061) model time 0.2570 (0.2548) loss 5.7569 (5.8793) grad_norm 1.4252 (1.7911) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:55:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][90/625] eta 0:02:19 lr 0.001685 wd 0.0500 time 0.2517 (0.2600) data time 0.0008 (0.0056) model time 0.2509 (0.2546) loss 6.4437 (5.9520) grad_norm 1.7075 (1.7519) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:55:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][100/625] eta 0:02:16 lr 0.001685 wd 0.0500 time 0.2596 (0.2596) data time 0.0006 (0.0051) model time 0.2589 (0.2547) loss 6.5852 (5.9750) grad_norm 1.2362 (1.7109) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:55:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][110/625] eta 0:02:13 lr 0.001685 wd 0.0500 time 0.2548 (0.2592) data time 0.0008 (0.0047) model time 0.2539 (0.2546) loss 5.8562 (5.9463) grad_norm 1.6239 (1.6893) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:56:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][120/625] eta 0:02:10 lr 0.001685 wd 0.0500 time 0.2541 (0.2589) data time 0.0009 (0.0044) model time 0.2532 (0.2545) loss 4.7866 (5.9794) grad_norm 1.3362 (1.7344) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:56:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][130/625] eta 0:02:08 lr 0.001684 wd 0.0500 time 0.2549 (0.2587) data time 0.0009 (0.0042) model time 0.2539 (0.2546) loss 4.3460 (5.9813) grad_norm 2.3043 (1.7537) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:56:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][140/625] eta 0:02:05 lr 0.001684 wd 0.0500 time 0.2615 (0.2585) data time 0.0007 (0.0039) model time 0.2608 (0.2546) loss 6.7254 (5.9904) grad_norm 1.7605 (1.7811) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:56:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][150/625] eta 0:02:02 lr 0.001684 wd 0.0500 time 0.2547 (0.2583) data time 0.0006 (0.0038) model time 0.2541 (0.2547) loss 6.7014 (5.9805) grad_norm 1.8588 (1.7845) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:56:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][160/625] eta 0:02:00 lr 0.001684 wd 0.0500 time 0.2606 (0.2582) data time 0.0007 (0.0036) model time 0.2599 (0.2547) loss 4.2010 (5.9530) grad_norm 1.7372 (1.7718) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:56:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][170/625] eta 0:01:57 lr 0.001684 wd 0.0500 time 0.2529 (0.2580) data time 0.0011 (0.0034) model time 0.2518 (0.2547) loss 6.6028 (5.9514) grad_norm 1.4003 (1.7554) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:56:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][180/625] eta 0:01:54 lr 0.001684 wd 0.0500 time 0.2645 (0.2579) data time 0.0006 (0.0033) model time 0.2639 (0.2547) loss 4.7092 (5.9401) grad_norm 1.2325 (1.7444) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:56:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][190/625] eta 0:01:52 lr 0.001684 wd 0.0500 time 0.2544 (0.2577) data time 0.0011 (0.0032) model time 0.2532 (0.2547) loss 4.5665 (5.9263) grad_norm 1.2988 (1.7213) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:56:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][200/625] eta 0:01:49 lr 0.001683 wd 0.0500 time 0.2531 (0.2577) data time 0.0012 (0.0031) model time 0.2520 (0.2548) loss 6.9020 (5.9361) grad_norm 1.1882 (1.7017) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:56:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][210/625] eta 0:01:46 lr 0.001683 wd 0.0500 time 0.2660 (0.2577) data time 0.0010 (0.0030) model time 0.2650 (0.2548) loss 5.4787 (5.9419) grad_norm 3.0157 (1.7117) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:56:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][220/625] eta 0:01:44 lr 0.001683 wd 0.0500 time 0.2655 (0.2575) data time 0.0006 (0.0029) model time 0.2648 (0.2548) loss 5.2238 (5.9407) grad_norm 2.1977 (1.7238) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:56:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][230/625] eta 0:01:41 lr 0.001683 wd 0.0500 time 0.2533 (0.2575) data time 0.0009 (0.0028) model time 0.2524 (0.2548) loss 4.9612 (5.9480) grad_norm 1.3896 (1.7116) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 09:56:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-30 09:56:30 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 09:56:30 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 10:13:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 10:13:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 10:13:56 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 12:44:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 12:44:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 12:49:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 12:49:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 12:49:09 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 12:49:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 12:49:25 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 12:49:25 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 12:49:25 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 12:49:25 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 93)
[2024-07-30 12:49:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 12:49:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][240/625] eta 0:10:35 lr 0.001683 wd 0.0500 time 0.2665 (1.6515) data time 0.0007 (0.1617) model time 0.2658 (1.4897) loss 7.2272 (6.9275) grad_norm 1.2919 (1.3153) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:49:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][250/625] eta 0:04:53 lr 0.001683 wd 0.0500 time 0.2583 (0.7825) data time 0.0010 (0.0614) model time 0.2573 (0.7211) loss 6.6674 (6.4773) grad_norm 2.6052 (1.7384) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:49:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][260/625] eta 0:03:32 lr 0.001683 wd 0.0500 time 0.2613 (0.5833) data time 0.0008 (0.0382) model time 0.2606 (0.5451) loss 6.0501 (6.4201) grad_norm 1.2507 (1.7426) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:49:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][270/625] eta 0:02:56 lr 0.001683 wd 0.0500 time 0.2626 (0.4960) data time 0.0010 (0.0279) model time 0.2616 (0.4681) loss 6.4592 (6.3841) grad_norm 2.9287 (1.7468) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:49:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][280/625] eta 0:02:33 lr 0.001682 wd 0.0500 time 0.2638 (0.4455) data time 0.0010 (0.0220) model time 0.2628 (0.4235) loss 5.1800 (6.3219) grad_norm 1.7881 (1.7345) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:49:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][290/625] eta 0:02:18 lr 0.001682 wd 0.0500 time 0.2657 (0.4133) data time 0.0007 (0.0183) model time 0.2650 (0.3951) loss 6.9665 (6.2832) grad_norm 2.3751 (1.7957) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:49:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][300/625] eta 0:02:06 lr 0.001682 wd 0.0500 time 0.2585 (0.3907) data time 0.0007 (0.0156) model time 0.2578 (0.3751) loss 5.2656 (6.2168) grad_norm 1.1808 (1.8010) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:49:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][310/625] eta 0:01:57 lr 0.001682 wd 0.0500 time 0.2648 (0.3742) data time 0.0010 (0.0137) model time 0.2639 (0.3605) loss 6.2998 (6.1516) grad_norm 1.1184 (1.7739) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:50:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][320/625] eta 0:01:50 lr 0.001682 wd 0.0500 time 0.2624 (0.3615) data time 0.0009 (0.0123) model time 0.2615 (0.3492) loss 5.4631 (6.0903) grad_norm 1.2854 (1.7163) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:50:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][330/625] eta 0:01:43 lr 0.001682 wd 0.0500 time 0.2564 (0.3511) data time 0.0010 (0.0111) model time 0.2554 (0.3400) loss 6.2394 (6.0966) grad_norm 1.7087 (1.6906) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:50:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][340/625] eta 0:01:37 lr 0.001682 wd 0.0500 time 0.2656 (0.3428) data time 0.0012 (0.0101) model time 0.2645 (0.3326) loss 6.6259 (6.1179) grad_norm 2.3876 (1.7051) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:50:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][350/625] eta 0:01:32 lr 0.001682 wd 0.0500 time 0.2604 (0.3361) data time 0.0007 (0.0094) model time 0.2597 (0.3267) loss 7.3981 (6.1087) grad_norm 0.9429 (1.7369) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:50:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][360/625] eta 0:01:27 lr 0.001681 wd 0.0500 time 0.2604 (0.3304) data time 0.0010 (0.0087) model time 0.2594 (0.3217) loss 4.4399 (6.1029) grad_norm 1.3982 (1.7409) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:50:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][370/625] eta 0:01:23 lr 0.001681 wd 0.0500 time 0.2640 (0.3257) data time 0.0010 (0.0081) model time 0.2630 (0.3175) loss 5.1917 (6.1032) grad_norm 1.3433 (1.7324) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:50:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][380/625] eta 0:01:18 lr 0.001681 wd 0.0500 time 0.2678 (0.3215) data time 0.0007 (0.0076) model time 0.2670 (0.3139) loss 5.1957 (6.1054) grad_norm 1.3550 (1.7356) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:50:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][390/625] eta 0:01:14 lr 0.001681 wd 0.0500 time 0.2593 (0.3179) data time 0.0013 (0.0072) model time 0.2580 (0.3106) loss 6.9384 (6.1165) grad_norm 1.0992 (1.7104) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:50:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][400/625] eta 0:01:10 lr 0.001681 wd 0.0500 time 0.2599 (0.3146) data time 0.0010 (0.0069) model time 0.2588 (0.3077) loss 6.7115 (6.1181) grad_norm 1.2027 (1.6980) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:50:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][410/625] eta 0:01:07 lr 0.001681 wd 0.0500 time 0.2679 (0.3118) data time 0.0007 (0.0065) model time 0.2671 (0.3053) loss 6.1811 (6.1069) grad_norm 1.8451 (1.7090) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:50:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][420/625] eta 0:01:03 lr 0.001681 wd 0.0500 time 0.2638 (0.3092) data time 0.0011 (0.0062) model time 0.2627 (0.3030) loss 5.4043 (6.1053) grad_norm 1.7373 (1.7164) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:50:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][430/625] eta 0:00:59 lr 0.001680 wd 0.0500 time 0.2613 (0.3069) data time 0.0007 (0.0060) model time 0.2605 (0.3010) loss 4.8951 (6.0887) grad_norm 1.6592 (1.7175) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:50:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][440/625] eta 0:00:56 lr 0.001680 wd 0.0500 time 0.2644 (0.3050) data time 0.0007 (0.0057) model time 0.2637 (0.2993) loss 4.9356 (6.0658) grad_norm 2.2180 (1.7279) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:50:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][450/625] eta 0:00:53 lr 0.001680 wd 0.0500 time 0.2658 (0.3031) data time 0.0007 (0.0055) model time 0.2651 (0.2976) loss 4.7382 (6.0618) grad_norm 1.7115 (1.7265) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:50:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][460/625] eta 0:00:49 lr 0.001680 wd 0.0500 time 0.2641 (0.3015) data time 0.0008 (0.0053) model time 0.2633 (0.2962) loss 5.5494 (6.0617) grad_norm 1.4920 (1.7436) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:50:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][470/625] eta 0:00:46 lr 0.001680 wd 0.0500 time 0.2615 (0.3000) data time 0.0011 (0.0051) model time 0.2604 (0.2948) loss 6.5084 (6.0653) grad_norm 2.3803 (1.7455) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:50:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][480/625] eta 0:00:43 lr 0.001680 wd 0.0500 time 0.2587 (0.2986) data time 0.0010 (0.0050) model time 0.2577 (0.2936) loss 4.3247 (6.0630) grad_norm 1.3005 (1.7434) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:50:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][490/625] eta 0:00:40 lr 0.001680 wd 0.0500 time 0.2656 (0.2972) data time 0.0009 (0.0048) model time 0.2647 (0.2924) loss 4.7274 (6.0428) grad_norm 1.7006 (1.7366) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:50:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][500/625] eta 0:00:37 lr 0.001680 wd 0.0500 time 0.2720 (0.2961) data time 0.0008 (0.0047) model time 0.2712 (0.2914) loss 5.4185 (6.0319) grad_norm 1.0285 (1.7212) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:50:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][510/625] eta 0:00:33 lr 0.001679 wd 0.0500 time 0.2724 (0.2951) data time 0.0014 (0.0045) model time 0.2710 (0.2906) loss 7.0124 (6.0375) grad_norm 2.7187 (1.7333) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:50:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][520/625] eta 0:00:30 lr 0.001679 wd 0.0500 time 0.2617 (0.2941) data time 0.0011 (0.0044) model time 0.2605 (0.2897) loss 5.1221 (6.0354) grad_norm 1.7952 (1.7609) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:50:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][530/625] eta 0:00:27 lr 0.001679 wd 0.0500 time 0.2627 (0.2932) data time 0.0008 (0.0043) model time 0.2619 (0.2889) loss 4.8940 (6.0305) grad_norm 2.8481 (1.7823) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:50:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][540/625] eta 0:00:24 lr 0.001679 wd 0.0500 time 0.2636 (0.2923) data time 0.0009 (0.0042) model time 0.2627 (0.2881) loss 4.8963 (6.0172) grad_norm 1.6028 (1.7897) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:51:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][550/625] eta 0:00:21 lr 0.001679 wd 0.0500 time 0.2621 (0.2915) data time 0.0010 (0.0041) model time 0.2611 (0.2874) loss 7.4178 (6.0378) grad_norm 1.9863 (1.7882) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:51:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][560/625] eta 0:00:18 lr 0.001679 wd 0.0500 time 0.2598 (0.2907) data time 0.0013 (0.0040) model time 0.2585 (0.2866) loss 7.0252 (6.0576) grad_norm 1.6471 (1.7837) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:51:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][570/625] eta 0:00:15 lr 0.001679 wd 0.0500 time 0.2659 (0.2899) data time 0.0009 (0.0039) model time 0.2650 (0.2859) loss 5.7689 (6.0510) grad_norm 1.5609 (1.7930) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:51:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][580/625] eta 0:00:13 lr 0.001679 wd 0.0500 time 0.2624 (0.2891) data time 0.0010 (0.0038) model time 0.2615 (0.2853) loss 6.7741 (6.0523) grad_norm 2.6608 (1.8170) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:51:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][590/625] eta 0:00:10 lr 0.001678 wd 0.0500 time 0.2707 (0.2885) data time 0.0007 (0.0038) model time 0.2700 (0.2847) loss 4.8969 (6.0513) grad_norm 1.1048 (1.8133) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:51:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][600/625] eta 0:00:07 lr 0.001678 wd 0.0500 time 0.2597 (0.2878) data time 0.0011 (0.0037) model time 0.2586 (0.2841) loss 6.7161 (6.0558) grad_norm 1.1170 (1.8041) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:51:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][610/625] eta 0:00:04 lr 0.001678 wd 0.0500 time 0.2670 (0.2872) data time 0.0005 (0.0036) model time 0.2665 (0.2836) loss 4.5785 (6.0495) grad_norm 1.2523 (1.7943) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:51:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][620/625] eta 0:00:01 lr 0.001678 wd 0.0500 time 0.2666 (0.2867) data time 0.0008 (0.0036) model time 0.2659 (0.2831) loss 7.2663 (6.0425) grad_norm 2.3361 (1.7824) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:51:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 93 training takes 0:01:51
[2024-07-30 12:51:21 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 12:51:23 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 12:51:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.443 (0.443) Loss 0.6729 (0.6729) Acc@1 86.523 (86.523) Acc@5 97.656 (97.656) Mem 9656MB
[2024-07-30 12:51:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.058 (0.095) Loss 1.0771 (0.8330) Acc@1 75.977 (82.253) Acc@5 93.115 (96.338) Mem 9656MB
[2024-07-30 12:51:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.2168 (0.9804) Acc@1 71.924 (78.448) Acc@5 92.334 (94.434) Mem 9656MB
[2024-07-30 12:51:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.167 Acc@5 94.442
[2024-07-30 12:51:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.2%
[2024-07-30 12:51:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.871 (0.871) Loss 0.5625 (0.5625) Acc@1 87.988 (87.988) Acc@5 98.193 (98.193) Mem 9656MB
[2024-07-30 12:51:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.141) Loss 0.9497 (0.7164) Acc@1 77.832 (83.882) Acc@5 94.336 (96.924) Mem 9656MB
[2024-07-30 12:51:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.100) Loss 1.1035 (0.8638) Acc@1 72.900 (80.050) Acc@5 92.676 (95.185) Mem 9656MB
[2024-07-30 12:51:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.752 Acc@5 95.152
[2024-07-30 12:51:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.8%
[2024-07-30 12:51:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.75%
[2024-07-30 12:51:28 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-30 12:51:29 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-30 12:51:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][0/625] eta 0:13:35 lr 0.001678 wd 0.0500 time 1.3047 (1.3047) data time 0.5337 (0.5337) model time 0.0000 (0.0000) loss 6.5151 (6.5151) grad_norm 1.1205 (1.1205) loss_scale 4096.0000 (4096.0000) mem 9651MB
[2024-07-30 12:51:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][10/625] eta 0:03:41 lr 0.001678 wd 0.0500 time 0.2627 (0.3605) data time 0.0009 (0.0494) model time 0.0000 (0.0000) loss 6.8560 (6.2548) grad_norm 2.4811 (1.8385) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:51:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][20/625] eta 0:03:10 lr 0.001678 wd 0.0500 time 0.2621 (0.3144) data time 0.0010 (0.0263) model time 0.0000 (0.0000) loss 6.2178 (6.2264) grad_norm 1.2910 (1.7365) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:51:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][30/625] eta 0:02:57 lr 0.001678 wd 0.0500 time 0.2642 (0.2983) data time 0.0009 (0.0182) model time 0.0000 (0.0000) loss 6.0373 (6.1186) grad_norm 1.3465 (1.6104) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:51:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][40/625] eta 0:02:50 lr 0.001677 wd 0.0500 time 0.2738 (0.2912) data time 0.0008 (0.0140) model time 0.0000 (0.0000) loss 6.3069 (6.1748) grad_norm 1.4748 (1.5912) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:51:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][50/625] eta 0:02:45 lr 0.001677 wd 0.0500 time 0.2673 (0.2871) data time 0.0009 (0.0114) model time 0.0000 (0.0000) loss 6.6625 (6.1703) grad_norm 1.7680 (1.6089) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:51:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][60/625] eta 0:02:40 lr 0.001677 wd 0.0500 time 0.2629 (0.2836) data time 0.0009 (0.0097) model time 0.2620 (0.2647) loss 6.0053 (6.1207) grad_norm 2.1678 (1.6413) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:51:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][70/625] eta 0:02:35 lr 0.001677 wd 0.0500 time 0.2596 (0.2810) data time 0.0010 (0.0085) model time 0.2586 (0.2643) loss 5.5897 (6.0271) grad_norm 2.0453 (1.6259) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:51:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][80/625] eta 0:02:31 lr 0.001677 wd 0.0500 time 0.2607 (0.2789) data time 0.0013 (0.0076) model time 0.2595 (0.2639) loss 6.5188 (5.9422) grad_norm 1.2485 (1.5960) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:51:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][90/625] eta 0:02:28 lr 0.001677 wd 0.0500 time 0.2607 (0.2774) data time 0.0008 (0.0069) model time 0.2599 (0.2640) loss 5.8538 (5.9476) grad_norm 1.8215 (1.5916) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:51:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][100/625] eta 0:02:25 lr 0.001677 wd 0.0500 time 0.2639 (0.2763) data time 0.0010 (0.0063) model time 0.2629 (0.2643) loss 6.9004 (5.9810) grad_norm 1.4196 (1.5786) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:51:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][110/625] eta 0:02:21 lr 0.001676 wd 0.0500 time 0.2660 (0.2754) data time 0.0013 (0.0058) model time 0.2647 (0.2644) loss 6.2241 (5.9631) grad_norm 1.3820 (1.6418) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:52:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][120/625] eta 0:02:18 lr 0.001676 wd 0.0500 time 0.2622 (0.2747) data time 0.0008 (0.0054) model time 0.2614 (0.2647) loss 7.5548 (5.9898) grad_norm 1.9271 (1.6764) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:52:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][130/625] eta 0:02:15 lr 0.001676 wd 0.0500 time 0.2683 (0.2743) data time 0.0007 (0.0051) model time 0.2675 (0.2651) loss 5.4925 (5.9830) grad_norm 1.7635 (1.6637) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:52:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][140/625] eta 0:02:13 lr 0.001676 wd 0.0500 time 0.2611 (0.2742) data time 0.0009 (0.0048) model time 0.2601 (0.2659) loss 6.6493 (5.9591) grad_norm 1.6787 (1.6562) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:52:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][150/625] eta 0:02:09 lr 0.001676 wd 0.0500 time 0.2700 (0.2737) data time 0.0010 (0.0046) model time 0.2691 (0.2658) loss 7.0011 (5.9772) grad_norm 1.1110 (1.6456) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:52:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][160/625] eta 0:02:07 lr 0.001676 wd 0.0500 time 0.2613 (0.2731) data time 0.0008 (0.0043) model time 0.2605 (0.2656) loss 7.6942 (5.9764) grad_norm 2.6741 (1.6547) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:52:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][170/625] eta 0:02:04 lr 0.001676 wd 0.0500 time 0.5351 (0.2742) data time 0.0009 (0.0042) model time 0.5342 (0.2677) loss 6.1092 (6.0008) grad_norm 1.2364 (1.6525) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:52:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][180/625] eta 0:02:01 lr 0.001676 wd 0.0500 time 0.2615 (0.2737) data time 0.0010 (0.0040) model time 0.2606 (0.2674) loss 6.9649 (6.0094) grad_norm 2.4931 (1.6869) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:52:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][190/625] eta 0:01:58 lr 0.001675 wd 0.0500 time 0.2685 (0.2733) data time 0.0008 (0.0038) model time 0.2677 (0.2672) loss 6.9207 (6.0086) grad_norm 2.0668 (1.7090) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:52:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][200/625] eta 0:01:55 lr 0.001675 wd 0.0500 time 0.2640 (0.2729) data time 0.0010 (0.0037) model time 0.2630 (0.2670) loss 7.1046 (6.0199) grad_norm 1.5079 (1.7109) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:52:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][210/625] eta 0:01:53 lr 0.001675 wd 0.0500 time 0.2664 (0.2726) data time 0.0010 (0.0036) model time 0.2654 (0.2669) loss 4.1999 (6.0061) grad_norm 1.6483 (1.7034) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:52:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][220/625] eta 0:01:50 lr 0.001675 wd 0.0500 time 0.2594 (0.2722) data time 0.0011 (0.0035) model time 0.2583 (0.2667) loss 5.9748 (5.9970) grad_norm 2.2022 (1.7072) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:52:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][230/625] eta 0:01:47 lr 0.001675 wd 0.0500 time 0.2615 (0.2719) data time 0.0009 (0.0034) model time 0.2606 (0.2666) loss 6.9615 (6.0070) grad_norm 1.3575 (1.6964) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:52:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][240/625] eta 0:01:44 lr 0.001675 wd 0.0500 time 0.2608 (0.2716) data time 0.0012 (0.0033) model time 0.2596 (0.2663) loss 6.4621 (6.0094) grad_norm 1.4153 (1.6845) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:52:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][250/625] eta 0:01:41 lr 0.001675 wd 0.0500 time 0.2647 (0.2713) data time 0.0008 (0.0032) model time 0.2639 (0.2662) loss 4.7916 (6.0112) grad_norm 1.5112 (1.6815) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:52:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][260/625] eta 0:01:38 lr 0.001675 wd 0.0500 time 0.2648 (0.2711) data time 0.0008 (0.0031) model time 0.2641 (0.2661) loss 6.9813 (6.0031) grad_norm 1.9358 (1.6803) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:52:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][270/625] eta 0:01:36 lr 0.001674 wd 0.0500 time 0.2593 (0.2709) data time 0.0009 (0.0030) model time 0.2584 (0.2660) loss 3.9277 (5.9805) grad_norm 1.4569 (1.6785) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:52:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][280/625] eta 0:01:33 lr 0.001674 wd 0.0500 time 0.2667 (0.2707) data time 0.0009 (0.0029) model time 0.2658 (0.2660) loss 7.4235 (6.0038) grad_norm 1.1277 (1.6968) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:52:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][290/625] eta 0:01:30 lr 0.001674 wd 0.0500 time 0.2651 (0.2705) data time 0.0009 (0.0029) model time 0.2642 (0.2660) loss 6.3244 (5.9932) grad_norm 1.7092 (1.7170) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:52:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][300/625] eta 0:01:27 lr 0.001674 wd 0.0500 time 0.2766 (0.2705) data time 0.0009 (0.0028) model time 0.2757 (0.2660) loss 5.4544 (5.9767) grad_norm 1.5288 (1.7141) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:52:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][310/625] eta 0:01:25 lr 0.001674 wd 0.0500 time 0.2643 (0.2703) data time 0.0009 (0.0028) model time 0.2634 (0.2659) loss 7.2836 (5.9764) grad_norm 1.6760 (1.7054) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:52:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][320/625] eta 0:01:22 lr 0.001674 wd 0.0500 time 0.2744 (0.2702) data time 0.0008 (0.0027) model time 0.2737 (0.2659) loss 4.5711 (5.9749) grad_norm 2.5972 (1.7053) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:52:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][330/625] eta 0:01:19 lr 0.001674 wd 0.0500 time 0.2639 (0.2700) data time 0.0008 (0.0027) model time 0.2631 (0.2659) loss 6.5995 (5.9642) grad_norm 1.0201 (1.7052) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:53:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][340/625] eta 0:01:16 lr 0.001673 wd 0.0500 time 0.2622 (0.2700) data time 0.0011 (0.0026) model time 0.2612 (0.2660) loss 6.0370 (5.9664) grad_norm 2.2264 (1.7041) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:53:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][350/625] eta 0:01:14 lr 0.001673 wd 0.0500 time 0.2658 (0.2699) data time 0.0009 (0.0026) model time 0.2648 (0.2659) loss 7.1364 (5.9848) grad_norm 1.7093 (1.6985) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:53:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][360/625] eta 0:01:11 lr 0.001673 wd 0.0500 time 0.2603 (0.2697) data time 0.0011 (0.0025) model time 0.2592 (0.2658) loss 5.6662 (5.9816) grad_norm 1.1786 (1.6952) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:53:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][370/625] eta 0:01:08 lr 0.001673 wd 0.0500 time 0.2704 (0.2696) data time 0.0009 (0.0025) model time 0.2695 (0.2658) loss 6.2693 (5.9859) grad_norm 1.1343 (1.6879) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:53:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][380/625] eta 0:01:06 lr 0.001673 wd 0.0500 time 0.2634 (0.2695) data time 0.0011 (0.0024) model time 0.2623 (0.2657) loss 7.1273 (5.9873) grad_norm 1.4301 (1.6861) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:53:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][390/625] eta 0:01:03 lr 0.001673 wd 0.0500 time 0.2568 (0.2694) data time 0.0009 (0.0024) model time 0.2559 (0.2657) loss 7.2132 (5.9868) grad_norm 2.5344 (1.6871) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:53:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][400/625] eta 0:01:00 lr 0.001673 wd 0.0500 time 0.2639 (0.2693) data time 0.0010 (0.0024) model time 0.2629 (0.2656) loss 5.3559 (5.9821) grad_norm 1.5835 (1.6831) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:53:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][410/625] eta 0:00:57 lr 0.001673 wd 0.0500 time 0.2660 (0.2692) data time 0.0009 (0.0023) model time 0.2651 (0.2656) loss 6.4830 (5.9826) grad_norm 1.3589 (1.6803) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:53:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][420/625] eta 0:00:55 lr 0.001672 wd 0.0500 time 0.2648 (0.2692) data time 0.0007 (0.0023) model time 0.2641 (0.2656) loss 4.6752 (5.9625) grad_norm 2.4455 (1.6761) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:53:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][430/625] eta 0:00:52 lr 0.001672 wd 0.0500 time 0.2672 (0.2691) data time 0.0007 (0.0023) model time 0.2665 (0.2656) loss 6.7991 (5.9637) grad_norm 1.2182 (1.6764) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:53:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][440/625] eta 0:00:49 lr 0.001672 wd 0.0500 time 0.2565 (0.2690) data time 0.0014 (0.0022) model time 0.2551 (0.2655) loss 4.7399 (5.9566) grad_norm 1.1024 (1.6832) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:53:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][450/625] eta 0:00:47 lr 0.001672 wd 0.0500 time 0.2629 (0.2689) data time 0.0007 (0.0022) model time 0.2622 (0.2655) loss 6.1091 (5.9592) grad_norm 1.1971 (1.6760) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:53:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][460/625] eta 0:00:44 lr 0.001672 wd 0.0500 time 0.2625 (0.2688) data time 0.0009 (0.0022) model time 0.2616 (0.2655) loss 7.3564 (5.9603) grad_norm 1.2568 (1.6760) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:53:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][470/625] eta 0:00:41 lr 0.001672 wd 0.0500 time 0.2639 (0.2687) data time 0.0010 (0.0022) model time 0.2629 (0.2654) loss 6.9515 (5.9626) grad_norm 1.3577 (1.6814) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:53:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][480/625] eta 0:00:38 lr 0.001672 wd 0.0500 time 0.2617 (0.2686) data time 0.0008 (0.0021) model time 0.2610 (0.2654) loss 5.8200 (5.9692) grad_norm 1.1438 (1.6920) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:53:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][490/625] eta 0:00:36 lr 0.001671 wd 0.0500 time 0.2658 (0.2686) data time 0.0008 (0.0021) model time 0.2650 (0.2654) loss 6.7028 (5.9649) grad_norm 3.1308 (1.6956) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:53:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][500/625] eta 0:00:33 lr 0.001671 wd 0.0500 time 0.2646 (0.2685) data time 0.0010 (0.0021) model time 0.2636 (0.2654) loss 4.9827 (5.9604) grad_norm 1.2640 (1.6926) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:53:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][510/625] eta 0:00:30 lr 0.001671 wd 0.0500 time 0.2642 (0.2685) data time 0.0013 (0.0021) model time 0.2629 (0.2654) loss 7.1416 (5.9661) grad_norm 1.3872 (1.6899) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:53:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][520/625] eta 0:00:28 lr 0.001671 wd 0.0500 time 0.2672 (0.2685) data time 0.0010 (0.0021) model time 0.2662 (0.2654) loss 5.4225 (5.9667) grad_norm 1.3871 (1.6881) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:53:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][530/625] eta 0:00:25 lr 0.001671 wd 0.0500 time 0.2650 (0.2684) data time 0.0010 (0.0020) model time 0.2640 (0.2654) loss 7.0305 (5.9706) grad_norm 1.5156 (1.6970) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:53:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][540/625] eta 0:00:22 lr 0.001671 wd 0.0500 time 0.2614 (0.2684) data time 0.0011 (0.0020) model time 0.2603 (0.2654) loss 6.5523 (5.9777) grad_norm 2.4428 (1.6940) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:53:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][550/625] eta 0:00:20 lr 0.001671 wd 0.0500 time 0.2628 (0.2683) data time 0.0008 (0.0020) model time 0.2620 (0.2654) loss 5.9206 (5.9740) grad_norm 1.5669 (1.7001) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:53:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][560/625] eta 0:00:17 lr 0.001671 wd 0.0500 time 0.2643 (0.2683) data time 0.0010 (0.0020) model time 0.2633 (0.2654) loss 4.9500 (5.9600) grad_norm 2.0845 (1.6969) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:54:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-30 12:54:01 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 12:54:02 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 12:56:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 12:56:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 12:56:48 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 12:56:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 12:56:59 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 12:56:59 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 12:56:59 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 12:56:59 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 94)
[2024-07-30 12:56:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 12:57:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][570/625] eta 0:03:14 lr 0.001670 wd 0.0500 time 0.2686 (3.5276) data time 0.0007 (0.2579) model time 0.2679 (3.2697) loss 5.4317 (6.6009) grad_norm 1.2534 (2.1423) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:57:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][580/625] eta 0:00:45 lr 0.001670 wd 0.0500 time 0.2678 (1.0189) data time 0.0010 (0.0603) model time 0.2667 (0.9586) loss 6.4767 (6.3668) grad_norm 1.8895 (1.7136) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:57:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][590/625] eta 0:00:24 lr 0.001670 wd 0.0500 time 0.2692 (0.6913) data time 0.0008 (0.0346) model time 0.2683 (0.6567) loss 6.5161 (6.4188) grad_norm 1.2246 (1.5282) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:57:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][600/625] eta 0:00:14 lr 0.001670 wd 0.0500 time 0.2618 (0.5620) data time 0.0008 (0.0244) model time 0.2610 (0.5376) loss 6.5976 (6.4009) grad_norm 3.4444 (1.6479) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:57:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][610/625] eta 0:00:07 lr 0.001670 wd 0.0500 time 0.2650 (0.4931) data time 0.0008 (0.0191) model time 0.2642 (0.4740) loss 5.9906 (6.2828) grad_norm 1.8192 (1.6448) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:57:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][620/625] eta 0:00:02 lr 0.001670 wd 0.0500 time 0.2641 (0.4500) data time 0.0007 (0.0156) model time 0.2634 (0.4344) loss 6.5258 (6.2590) grad_norm 1.4696 (1.6411) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 12:57:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 94 training takes 0:00:24
[2024-07-30 12:57:30 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 12:57:32 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 12:57:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.450 (0.450) Loss 0.6499 (0.6499) Acc@1 87.451 (87.451) Acc@5 97.656 (97.656) Mem 9656MB
[2024-07-30 12:57:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.060 (0.099) Loss 1.0029 (0.8022) Acc@1 77.832 (82.892) Acc@5 94.043 (96.613) Mem 9656MB
[2024-07-30 12:57:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.077) Loss 1.1660 (0.9631) Acc@1 72.705 (78.776) Acc@5 93.018 (94.757) Mem 9656MB
[2024-07-30 12:57:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.493 Acc@5 94.708
[2024-07-30 12:57:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.5%
[2024-07-30 12:57:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.49%
[2024-07-30 12:57:35 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-30 12:57:36 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-30 12:57:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.580 (0.580) Loss 0.5620 (0.5620) Acc@1 88.086 (88.086) Acc@5 98.242 (98.242) Mem 9656MB
[2024-07-30 12:57:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.058 (0.109) Loss 0.9478 (0.7160) Acc@1 78.076 (83.975) Acc@5 94.482 (96.950) Mem 9656MB
[2024-07-30 12:57:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.083) Loss 1.1016 (0.8633) Acc@1 72.998 (80.122) Acc@5 92.676 (95.203) Mem 9656MB
[2024-07-30 12:57:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.810 Acc@5 95.170
[2024-07-30 12:57:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.8%
[2024-07-30 12:57:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.81%
[2024-07-30 12:57:38 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-30 12:57:39 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-30 12:57:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][0/625] eta 0:07:52 lr 0.001670 wd 0.0500 time 0.7558 (0.7558) data time 0.4021 (0.4021) model time 0.0000 (0.0000) loss 4.7780 (4.7780) grad_norm 1.1775 (1.1775) loss_scale 4096.0000 (4096.0000) mem 9651MB
[2024-07-30 12:57:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][10/625] eta 0:03:11 lr 0.001670 wd 0.0500 time 0.2643 (0.3114) data time 0.0010 (0.0375) model time 0.0000 (0.0000) loss 4.6088 (5.7721) grad_norm 1.5116 (1.6319) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 12:57:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-30 12:57:44 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 12:57:45 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 12:59:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 12:59:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 13:00:02 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 13:00:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 13:00:16 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 13:00:16 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 13:00:16 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 13:00:17 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 95)
[2024-07-30 13:00:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 13:00:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][20/625] eta 0:15:38 lr 0.001669 wd 0.0500 time 0.2654 (1.5507) data time 0.0007 (0.1567) model time 0.0000 (0.0000) loss 7.4518 (6.7624) grad_norm 2.8469 (2.1749) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:00:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][30/625] eta 0:07:23 lr 0.001669 wd 0.0500 time 0.2712 (0.7457) data time 0.0014 (0.0595) model time 0.0000 (0.0000) loss 6.2065 (6.4224) grad_norm 1.9740 (2.2685) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:00:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][40/625] eta 0:05:27 lr 0.001669 wd 0.0500 time 0.2677 (0.5605) data time 0.0008 (0.0370) model time 0.0000 (0.0000) loss 6.1506 (6.3342) grad_norm 1.7623 (2.1184) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:00:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][50/625] eta 0:04:35 lr 0.001669 wd 0.0500 time 0.2647 (0.4786) data time 0.0010 (0.0271) model time 0.0000 (0.0000) loss 6.7867 (6.3574) grad_norm 1.2094 (1.9825) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:00:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][60/625] eta 0:04:03 lr 0.001669 wd 0.0500 time 0.2668 (0.4317) data time 0.0010 (0.0214) model time 0.2657 (0.2618) loss 5.0682 (6.2655) grad_norm 1.4098 (1.9144) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:00:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][70/625] eta 0:03:42 lr 0.001669 wd 0.0500 time 0.2640 (0.4016) data time 0.0008 (0.0178) model time 0.2632 (0.2620) loss 6.5277 (6.2106) grad_norm 1.2187 (1.9251) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:00:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][80/625] eta 0:03:27 lr 0.001669 wd 0.0500 time 0.2635 (0.3805) data time 0.0007 (0.0152) model time 0.2628 (0.2618) loss 5.5133 (6.1805) grad_norm 1.4806 (1.9589) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:00:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][90/625] eta 0:03:15 lr 0.001668 wd 0.0500 time 0.2684 (0.3656) data time 0.0011 (0.0134) model time 0.2673 (0.2627) loss 7.1273 (6.1262) grad_norm 1.2624 (1.9025) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:00:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][100/625] eta 0:03:05 lr 0.001668 wd 0.0500 time 0.2644 (0.3541) data time 0.0012 (0.0120) model time 0.2632 (0.2633) loss 5.1643 (6.0774) grad_norm 2.2199 (1.8471) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:00:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][110/625] eta 0:02:57 lr 0.001668 wd 0.0500 time 0.2608 (0.3446) data time 0.0009 (0.0108) model time 0.2599 (0.2632) loss 6.2285 (6.0856) grad_norm 1.5302 (1.8207) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:00:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][120/625] eta 0:02:50 lr 0.001668 wd 0.0500 time 0.2637 (0.3369) data time 0.0011 (0.0099) model time 0.2627 (0.2629) loss 6.8180 (6.1233) grad_norm 1.6639 (1.8094) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:01:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][130/625] eta 0:02:43 lr 0.001668 wd 0.0500 time 0.2591 (0.3305) data time 0.0011 (0.0091) model time 0.2581 (0.2628) loss 6.2683 (6.0983) grad_norm 1.3118 (1.7824) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:01:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][140/625] eta 0:02:37 lr 0.001668 wd 0.0500 time 0.2630 (0.3254) data time 0.0009 (0.0085) model time 0.2621 (0.2630) loss 4.8541 (6.0812) grad_norm 1.5060 (1.7693) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:01:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][150/625] eta 0:02:32 lr 0.001668 wd 0.0500 time 0.2658 (0.3209) data time 0.0012 (0.0080) model time 0.2645 (0.2631) loss 5.2635 (6.0880) grad_norm 1.3674 (1.7681) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:01:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][160/625] eta 0:02:27 lr 0.001668 wd 0.0500 time 0.2601 (0.3170) data time 0.0009 (0.0075) model time 0.2592 (0.2630) loss 4.6507 (6.0665) grad_norm 2.5991 (1.7503) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:01:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][170/625] eta 0:02:22 lr 0.001667 wd 0.0500 time 0.2599 (0.3137) data time 0.0014 (0.0071) model time 0.2585 (0.2631) loss 6.1468 (6.0739) grad_norm 1.8760 (1.7928) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:01:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-30 13:01:13 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 13:01:14 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 13:03:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 13:03:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 13:03:21 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 13:03:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 13:03:43 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 13:03:43 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 13:03:43 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 13:03:43 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 95)
[2024-07-30 13:03:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 13:03:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][180/625] eta 0:18:20 lr 0.001667 wd 0.0500 time 0.2655 (2.4726) data time 0.0008 (0.2002) model time 0.2647 (2.2724) loss 7.2828 (6.9095) grad_norm 1.8461 (1.7533) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:03:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][190/625] eta 0:06:28 lr 0.001667 wd 0.0500 time 0.2587 (0.8930) data time 0.0007 (0.0580) model time 0.2581 (0.8350) loss 6.3729 (6.3919) grad_norm 1.4611 (1.6534) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:04:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][200/625] eta 0:04:27 lr 0.001667 wd 0.0500 time 0.2561 (0.6301) data time 0.0011 (0.0343) model time 0.2550 (0.5958) loss 6.6615 (6.4012) grad_norm 1.3095 (1.5860) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:04:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][210/625] eta 0:03:36 lr 0.001667 wd 0.0500 time 0.2578 (0.5217) data time 0.0009 (0.0245) model time 0.2569 (0.4972) loss 4.5439 (6.3966) grad_norm 1.8857 (1.5900) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:04:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][220/625] eta 0:03:07 lr 0.001667 wd 0.0500 time 0.2597 (0.4630) data time 0.0009 (0.0192) model time 0.2588 (0.4438) loss 6.5253 (6.3161) grad_norm 2.1150 (1.6847) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:04:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][230/625] eta 0:02:48 lr 0.001667 wd 0.0500 time 0.2608 (0.4265) data time 0.0008 (0.0159) model time 0.2600 (0.4106) loss 6.5089 (6.2906) grad_norm 1.5297 (1.7423) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:04:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][240/625] eta 0:02:34 lr 0.001666 wd 0.0500 time 0.2614 (0.4007) data time 0.0008 (0.0135) model time 0.2606 (0.3872) loss 6.7885 (6.2557) grad_norm 2.8801 (1.7839) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:04:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][250/625] eta 0:02:23 lr 0.001666 wd 0.0500 time 0.2609 (0.3820) data time 0.0007 (0.0119) model time 0.2602 (0.3701) loss 6.5344 (6.2143) grad_norm 1.4114 (1.8025) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:04:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][260/625] eta 0:02:14 lr 0.001666 wd 0.0500 time 0.2619 (0.3679) data time 0.0010 (0.0106) model time 0.2609 (0.3573) loss 6.4165 (6.1650) grad_norm 1.1005 (1.7994) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:04:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][270/625] eta 0:02:06 lr 0.001666 wd 0.0500 time 0.2663 (0.3568) data time 0.0010 (0.0096) model time 0.2654 (0.3472) loss 6.1560 (6.1398) grad_norm 1.2536 (1.7569) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:04:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][280/625] eta 0:01:59 lr 0.001666 wd 0.0500 time 0.2606 (0.3476) data time 0.0010 (0.0088) model time 0.2596 (0.3388) loss 6.3303 (6.1792) grad_norm 1.8720 (1.7292) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:04:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][290/625] eta 0:01:53 lr 0.001666 wd 0.0500 time 0.2645 (0.3401) data time 0.0010 (0.0081) model time 0.2635 (0.3320) loss 6.4145 (6.1642) grad_norm 1.0754 (1.6970) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:04:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][300/625] eta 0:01:48 lr 0.001666 wd 0.0500 time 0.2608 (0.3340) data time 0.0012 (0.0075) model time 0.2596 (0.3264) loss 4.8695 (6.1602) grad_norm 1.7558 (1.6782) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:04:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][310/625] eta 0:01:43 lr 0.001666 wd 0.0500 time 0.2799 (0.3288) data time 0.0010 (0.0071) model time 0.2789 (0.3217) loss 6.8057 (6.1462) grad_norm 2.2011 (1.6675) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:04:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][320/625] eta 0:01:38 lr 0.001665 wd 0.0500 time 0.2617 (0.3244) data time 0.0009 (0.0066) model time 0.2608 (0.3177) loss 5.4716 (6.1247) grad_norm 1.8471 (1.6815) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:04:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][330/625] eta 0:01:34 lr 0.001665 wd 0.0500 time 0.2635 (0.3204) data time 0.0008 (0.0063) model time 0.2627 (0.3141) loss 6.1669 (6.1087) grad_norm 2.1639 (1.6978) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:04:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][340/625] eta 0:01:30 lr 0.001665 wd 0.0500 time 0.2599 (0.3168) data time 0.0009 (0.0060) model time 0.2590 (0.3108) loss 6.0277 (6.1115) grad_norm 1.0438 (1.6898) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:04:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][350/625] eta 0:01:26 lr 0.001665 wd 0.0500 time 0.2627 (0.3137) data time 0.0010 (0.0057) model time 0.2617 (0.3080) loss 4.4396 (6.0959) grad_norm 1.4723 (1.6818) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:04:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][360/625] eta 0:01:22 lr 0.001665 wd 0.0500 time 0.2636 (0.3109) data time 0.0008 (0.0054) model time 0.2628 (0.3055) loss 6.0144 (6.0839) grad_norm 1.8682 (1.6753) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:04:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][370/625] eta 0:01:18 lr 0.001665 wd 0.0500 time 0.2623 (0.3085) data time 0.0008 (0.0052) model time 0.2615 (0.3033) loss 6.0313 (6.0699) grad_norm 2.3938 (1.6794) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:04:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][380/625] eta 0:01:15 lr 0.001665 wd 0.0500 time 0.2626 (0.3063) data time 0.0011 (0.0050) model time 0.2615 (0.3013) loss 6.0221 (6.0475) grad_norm 2.1075 (1.6943) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:04:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][390/625] eta 0:01:11 lr 0.001664 wd 0.0500 time 0.2636 (0.3043) data time 0.0007 (0.0048) model time 0.2628 (0.2994) loss 6.0570 (6.0360) grad_norm 2.4924 (1.6915) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:04:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][400/625] eta 0:01:08 lr 0.001664 wd 0.0500 time 0.2656 (0.3024) data time 0.0010 (0.0047) model time 0.2646 (0.2977) loss 6.9779 (6.0375) grad_norm 1.3226 (1.6775) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:04:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][410/625] eta 0:01:04 lr 0.001664 wd 0.0500 time 0.2587 (0.3007) data time 0.0009 (0.0045) model time 0.2578 (0.2962) loss 4.1973 (6.0282) grad_norm 1.8960 (1.6791) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:05:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][420/625] eta 0:01:01 lr 0.001664 wd 0.0500 time 0.2699 (0.2991) data time 0.0008 (0.0044) model time 0.2691 (0.2948) loss 4.3554 (6.0327) grad_norm 2.2990 (1.6964) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:05:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][430/625] eta 0:00:58 lr 0.001664 wd 0.0500 time 0.2600 (0.2976) data time 0.0009 (0.0042) model time 0.2591 (0.2934) loss 4.5783 (6.0158) grad_norm 1.1430 (1.7044) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:05:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][440/625] eta 0:00:54 lr 0.001664 wd 0.0500 time 0.2625 (0.2963) data time 0.0008 (0.0041) model time 0.2618 (0.2922) loss 5.8624 (6.0106) grad_norm 1.7145 (1.7094) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:05:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][450/625] eta 0:00:51 lr 0.001664 wd 0.0500 time 0.2608 (0.2951) data time 0.0011 (0.0040) model time 0.2597 (0.2911) loss 6.5012 (6.0099) grad_norm 1.4261 (1.7022) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:05:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][460/625] eta 0:00:48 lr 0.001664 wd 0.0500 time 0.2627 (0.2941) data time 0.0009 (0.0039) model time 0.2618 (0.2902) loss 4.9675 (6.0077) grad_norm 1.2502 (1.6892) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:05:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][470/625] eta 0:00:45 lr 0.001663 wd 0.0500 time 0.2682 (0.2931) data time 0.0008 (0.0038) model time 0.2674 (0.2893) loss 6.0115 (6.0059) grad_norm 1.7005 (1.6863) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:05:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][480/625] eta 0:00:42 lr 0.001663 wd 0.0500 time 0.2600 (0.2921) data time 0.0009 (0.0037) model time 0.2590 (0.2884) loss 6.7881 (5.9976) grad_norm 2.1154 (1.6856) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:05:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][490/625] eta 0:00:39 lr 0.001663 wd 0.0500 time 0.2648 (0.2912) data time 0.0011 (0.0036) model time 0.2637 (0.2876) loss 6.6420 (6.0038) grad_norm 1.2308 (1.6754) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:05:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][500/625] eta 0:00:36 lr 0.001663 wd 0.0500 time 0.2617 (0.2904) data time 0.0008 (0.0036) model time 0.2610 (0.2868) loss 6.9700 (6.0210) grad_norm 3.1021 (1.6814) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:05:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][510/625] eta 0:00:33 lr 0.001663 wd 0.0500 time 0.2669 (0.2897) data time 0.0007 (0.0035) model time 0.2662 (0.2862) loss 5.7533 (6.0179) grad_norm 1.7816 (1.6876) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:05:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][520/625] eta 0:00:30 lr 0.001663 wd 0.0500 time 0.2598 (0.2889) data time 0.0011 (0.0034) model time 0.2587 (0.2855) loss 6.4003 (6.0254) grad_norm 1.2448 (1.6891) loss_scale 4096.0000 (4096.0000) mem 9656MB
mem 9656MB [2024-07-30 13:05:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][530/625] eta 0:00:27 lr 0.001663 wd 0.0500 time 0.2611 (0.2882) data time 0.0011 (0.0034) model time 0.2600 (0.2848) loss 7.5196 (6.0348) grad_norm 1.9847 (1.6883) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 13:05:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][540/625] eta 0:00:24 lr 0.001662 wd 0.0500 time 0.2622 (0.2876) data time 0.0008 (0.0033) model time 0.2614 (0.2843) loss 5.2044 (6.0297) grad_norm 1.9663 (1.6887) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 13:05:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][550/625] eta 0:00:21 lr 0.001662 wd 0.0500 time 0.2633 (0.2869) data time 0.0010 (0.0032) model time 0.2623 (0.2837) loss 6.9391 (6.0319) grad_norm 1.8837 (1.6970) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 13:05:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][560/625] eta 0:00:18 lr 0.001662 wd 0.0500 time 0.2656 (0.2863) data time 0.0008 (0.0032) model time 0.2649 (0.2832) loss 5.3974 (6.0182) grad_norm 3.1138 (1.6979) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 13:05:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][570/625] eta 0:00:15 lr 0.001662 wd 0.0500 time 0.2686 (0.2858) data time 0.0009 (0.0031) model time 0.2677 (0.2826) loss 5.9607 (6.0188) grad_norm 2.2896 (1.7005) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 13:05:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][580/625] eta 0:00:12 lr 0.001662 wd 0.0500 time 0.2660 (0.2853) data time 0.0008 (0.0031) model time 0.2653 (0.2822) loss 6.0450 (6.0263) grad_norm 1.4879 (1.6964) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 13:05:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][590/625] eta 0:00:09 lr 0.001662 wd 0.0500 time 0.2662 
(0.2848) data time 0.0009 (0.0030) model time 0.2653 (0.2818) loss 5.2514 (6.0278) grad_norm 1.8837 (1.6988) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 13:05:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][600/625] eta 0:00:07 lr 0.001662 wd 0.0500 time 0.2777 (0.2850) data time 0.0007 (0.0030) model time 0.2770 (0.2820) loss 6.3711 (6.0215) grad_norm 1.2757 (1.6919) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 13:05:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][610/625] eta 0:00:04 lr 0.001662 wd 0.0500 time 0.2654 (0.2845) data time 0.0007 (0.0029) model time 0.2647 (0.2816) loss 6.6557 (6.0311) grad_norm 1.8229 (1.6937) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 13:05:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][620/625] eta 0:00:01 lr 0.001661 wd 0.0500 time 0.2664 (0.2841) data time 0.0007 (0.0029) model time 0.2657 (0.2812) loss 6.7025 (6.0332) grad_norm 1.5868 (1.6917) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 13:05:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 95 training takes 0:02:07 [2024-07-30 13:05:54 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 13:05:56 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-30 13:05:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.448 (0.448) Loss 0.6621 (0.6621) Acc@1 87.402 (87.402) Acc@5 97.607 (97.607) Mem 9656MB
[2024-07-30 13:05:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 1.0947 (0.8240) Acc@1 75.293 (82.551) Acc@5 93.555 (96.515) Mem 9656MB
[2024-07-30 13:05:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.2578 (0.9801) Acc@1 70.850 (78.632) Acc@5 91.602 (94.608) Mem 9656MB
[2024-07-30 13:05:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.283 Acc@5 94.614
[2024-07-30 13:05:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.3%
[2024-07-30 13:06:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.771 (0.771) Loss 0.5620 (0.5620) Acc@1 88.037 (88.037) Acc@5 98.340 (98.340) Mem 9656MB
[2024-07-30 13:06:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.059 (0.132) Loss 0.9468 (0.7160) Acc@1 78.027 (84.015) Acc@5 94.580 (96.990) Mem 9656MB
[2024-07-30 13:06:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.095) Loss 1.0986 (0.8631) Acc@1 72.949 (80.162) Acc@5 92.725 (95.252) Mem 9656MB
[2024-07-30 13:06:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.852 Acc@5 95.230
[2024-07-30 13:06:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.9%
[2024-07-30 13:06:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.85%
[2024-07-30 13:06:01 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-30 13:06:03 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-30 13:06:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][0/625] eta 0:10:43 lr 0.001661 wd 0.0500 time 1.0299 (1.0299) data time 0.3873 (0.3873) model time 0.0000 (0.0000) loss 6.2007 (6.2007) grad_norm 4.1344 (4.1344) loss_scale 4096.0000 (4096.0000) mem 9651MB
[2024-07-30 13:06:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-30 13:06:04 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 13:06:05 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 13:08:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 13:08:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 13:08:37 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 13:08:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 13:08:46 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 13:08:46 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 13:08:46 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 13:08:46 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 96)
[2024-07-30 13:08:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 13:08:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][10/625] eta 0:09:24 lr 0.001661 wd 0.0500 time 0.2535 (0.9185) data time 0.0008 (0.0774) model time 0.0000 (0.0000) loss 6.7455 (6.6378) grad_norm 1.7208 (2.2173) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:09:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][20/625] eta 0:05:54 lr 0.001661 wd 0.0500 time 0.2494 (0.5856) data time 0.0007 (0.0392) model time 0.0000 (0.0000) loss 6.9227 (6.3334) grad_norm 2.0468 (1.8916) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:09:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][30/625] eta 0:04:42 lr 0.001661 wd 0.0500 time 0.2605 (0.4753) data time 0.0010 (0.0265) model time 0.0000 (0.0000) loss 6.9874 (6.5082) grad_norm 1.0988 (1.8643) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:09:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][40/625] eta 0:04:05 lr 0.001661 wd 0.0500 time 0.2482 (0.4200) data time 0.0006 (0.0201) model time 0.0000 (0.0000) loss 4.8037 (6.3423) grad_norm 0.9684 (1.8143) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:09:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][50/625] eta 0:03:42 lr 0.001661 wd 0.0500 time 0.2515 (0.3872) data time 0.0009 (0.0163) model time 0.0000 (0.0000) loss 5.2351 (6.2708) grad_norm 1.3977 (1.6966) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:09:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][60/625] eta 0:03:26 lr 0.001661 wd 0.0500 time 0.2485 (0.3649) data time 0.0009 (0.0137) model time 0.2475 (0.2524) loss 6.1557 (6.2282) grad_norm 1.1401 (1.6366) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:09:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][70/625] eta 0:03:13 lr 0.001660 wd 0.0500 time 0.2507 (0.3490) data time 0.0008 (0.0119) model time 0.2499 (0.2526) loss 4.5574 (6.1704) grad_norm 1.1597 (1.6505) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:09:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][80/625] eta 0:03:03 lr 0.001660 wd 0.0500 time 0.2597 (0.3372) data time 0.0008 (0.0106) model time 0.2589 (0.2529) loss 6.9912 (6.1336) grad_norm 1.7864 (1.6814) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:09:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][90/625] eta 0:02:55 lr 0.001660 wd 0.0500 time 0.2539 (0.3280) data time 0.0005 (0.0095) model time 0.2533 (0.2530) loss 6.5568 (6.0882) grad_norm 1.1294 (1.6893) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 13:09:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][100/625] eta 0:02:48 lr 0.001660 wd 0.0500 time 0.2521 (0.3206) data time 0.0010 (0.0086) model time 0.2512 (0.2530) loss 6.5241 (6.1079) grad_norm 1.9032 (1.7154) loss_scale 8192.0000 (4382.7200) mem 9656MB
[2024-07-30 13:09:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][110/625] eta 0:02:41 lr 0.001660 wd 0.0500 time 0.2547 (0.3145) data time 0.0008 (0.0079) model time 0.2538 (0.2529) loss 5.3679 (6.0955) grad_norm 1.3766 (1.7209) loss_scale 8192.0000 (4729.0182) mem 9656MB
[2024-07-30 13:09:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][120/625] eta 0:02:36 lr 0.001660 wd 0.0500 time 0.2516 (0.3095) data time 0.0007 (0.0074) model time 0.2509 (0.2531) loss 7.1384 (6.1180) grad_norm 1.4603 (1.7317) loss_scale 8192.0000 (5017.6000) mem 9656MB
[2024-07-30 13:09:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][130/625] eta 0:02:31 lr 0.001660 wd 0.0500 time 0.2540 (0.3053) data time 0.0007 (0.0069) model time 0.2533 (0.2531) loss 6.8002 (6.0927) grad_norm 1.2801 (1.7191) loss_scale 8192.0000 (5261.7846) mem 9656MB
[2024-07-30 13:09:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][140/625] eta 0:02:26 lr 0.001659 wd 0.0500 time 0.2517 (0.3016) data time 0.0006 (0.0065) model time 0.2511 (0.2531) loss 4.3343 (6.0875) grad_norm 1.2360 (1.6956) loss_scale 8192.0000 (5471.0857) mem 9656MB
[2024-07-30 13:09:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][150/625] eta 0:02:21 lr 0.001659 wd 0.0500 time 0.2579 (0.2984) data time 0.0008 (0.0061) model time 0.2571 (0.2531) loss 7.0308 (6.0787) grad_norm 1.1158 (1.6725) loss_scale 8192.0000 (5652.4800) mem 9656MB
[2024-07-30 13:09:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][160/625] eta 0:02:17 lr 0.001659 wd 0.0500 time 0.2546 (0.2957) data time 0.0008 (0.0058) model time 0.2538 (0.2532) loss 6.7396 (6.0793) grad_norm 2.9555 (1.6827) loss_scale 8192.0000 (5811.2000) mem 9656MB
[2024-07-30 13:09:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][170/625] eta 0:02:13 lr 0.001659 wd 0.0500 time 0.2550 (0.2933) data time 0.0006 (0.0055) model time 0.2545 (0.2533) loss 4.4190 (6.0875) grad_norm 1.7762 (1.6762) loss_scale 8192.0000 (5951.2471) mem 9656MB
[2024-07-30 13:09:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][180/625] eta 0:02:09 lr 0.001659 wd 0.0500 time 0.2526 (0.2912) data time 0.0008 (0.0052) model time 0.2518 (0.2533) loss 5.2546 (6.0580) grad_norm 1.6956 (1.6711) loss_scale 8192.0000 (6075.7333) mem 9656MB
[2024-07-30 13:09:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][190/625] eta 0:02:05 lr 0.001659 wd 0.0500 time 0.2508 (0.2893) data time 0.0007 (0.0050) model time 0.2501 (0.2534) loss 5.9201 (6.0591) grad_norm 1.0809 (1.6585) loss_scale 8192.0000 (6187.1158) mem 9656MB
[2024-07-30 13:09:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][200/625] eta 0:02:02 lr 0.001659 wd 0.0500 time 0.2558 (0.2877) data time 0.0008 (0.0048) model time 0.2550 (0.2535) loss 6.4040 (6.0337) grad_norm 1.1993 (1.6476) loss_scale 8192.0000 (6287.3600) mem 9656MB
[2024-07-30 13:09:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][210/625] eta 0:01:58 lr 0.001659 wd 0.0500 time 0.2550 (0.2861) data time 0.0007 (0.0046) model time 0.2544 (0.2535) loss 5.9187 (6.0272) grad_norm 1.3152 (1.6350) loss_scale 8192.0000 (6378.0571) mem 9656MB
[2024-07-30 13:09:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][220/625] eta 0:01:55 lr 0.001658 wd 0.0500 time 0.2551 (0.2847) data time 0.0010 (0.0045) model time 0.2542 (0.2535) loss 5.6027 (6.0193) grad_norm 1.3240 (1.6385) loss_scale 8192.0000 (6460.5091) mem 9656MB
[2024-07-30 13:09:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-30 13:09:55 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 13:09:56 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 13:11:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 13:11:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 13:11:45 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 13:11:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 13:11:54 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 13:11:54 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 13:11:54 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 13:11:54 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 96)
[2024-07-30 13:11:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 13:13:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 13:13:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 13:14:03 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 13:16:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 13:16:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 13:16:26 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 13:16:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 13:16:37 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 13:16:38 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 13:16:38 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 13:16:38 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 96)
[2024-07-30 13:16:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 13:16:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][230/625] eta 0:16:30 lr 0.001658 wd 0.0500 time 0.2498 (2.5065) data time 0.0010 (0.2986) model time 0.2488 (2.2079) loss 6.8899 (6.3995) grad_norm 2.7964 (1.9130) loss_scale 8192.0000 (8192.0000) mem 9661MB
[2024-07-30 13:16:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][240/625] eta 0:05:44 lr 0.001658 wd 0.0500 time 0.2431 (0.8935) data time 0.0010 (0.0861) model time 0.2421 (0.8074) loss 6.2914 (6.1338) grad_norm 5.6345 (2.2610) loss_scale 8192.0000 (8192.0000) mem 9661MB
[2024-07-30 13:17:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][250/625] eta 0:03:57 lr 0.001658 wd 0.0500 time 0.2454 (0.6324) data time 0.0011 (0.0507) model time 0.2443 (0.5817) loss 5.5662 (6.1699) grad_norm 2.4822 (2.3235) loss_scale 8192.0000 (8192.0000) mem 9661MB
[2024-07-30 13:17:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][260/625] eta 0:03:09 lr 0.001658 wd 0.0500 time 0.2413 (0.5184) data time 0.0010 (0.0361) model time 0.2402 (0.4823) loss 4.7439 (6.1594) grad_norm 1.3308 (2.3087) loss_scale 8192.0000 (8192.0000) mem 9661MB
[2024-07-30 13:17:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][270/625] eta 0:02:42 lr 0.001658 wd 0.0500 time 0.2589 (0.4586) data time 0.0008 (0.0282) model time 0.2581 (0.4304) loss 5.6226 (6.1258) grad_norm 1.0724 (2.0973) loss_scale 8192.0000 (8192.0000) mem 9661MB
[2024-07-30 13:17:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][280/625] eta 0:02:24 lr 0.001658 wd 0.0500 time 0.2467 (0.4191) data time 0.0007 (0.0232) model time 0.2460 (0.3959) loss 6.8633 (6.1245) grad_norm 4.5641 (2.0989) loss_scale 8192.0000 (8192.0000) mem 9661MB
[2024-07-30 13:17:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][290/625] eta 0:02:11 lr 0.001657 wd 0.0500 time 0.2446 (0.3931) data time 0.0008 (0.0197) model time 0.2437 (0.3733) loss 6.8604 (6.0629) grad_norm 1.2484 (2.0692) loss_scale 8192.0000 (8192.0000) mem 9661MB
[2024-07-30 13:17:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][300/625] eta 0:02:01 lr 0.001657 wd 0.0500 time 0.2454 (0.3747) data time 0.0008 (0.0172) model time 0.2446 (0.3575) loss 6.1373 (6.0564) grad_norm 1.4583 (1.9989) loss_scale 8192.0000 (8192.0000) mem 9661MB
[2024-07-30 13:17:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][310/625] eta 0:01:53 lr 0.001657 wd 0.0500 time 0.2457 (0.3594) data time 0.0010 (0.0153) model time 0.2447 (0.3441) loss 6.5575 (6.0508) grad_norm 2.2685 (1.9552) loss_scale 8192.0000 (8192.0000) mem 9661MB
[2024-07-30 13:17:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][320/625] eta 0:01:46 lr 0.001657 wd 0.0500 time 0.2581 (0.3476) data time 0.0015 (0.0138) model time 0.2566 (0.3338) loss 5.4612 (6.0364) grad_norm 1.1980 (1.9468) loss_scale 8192.0000 (8192.0000) mem 9661MB
[2024-07-30 13:17:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][330/625] eta 0:01:40 lr 0.001657 wd 0.0500 time 0.2426 (0.3396) data time 0.0015 (0.0126) model time 0.2411 (0.3270) loss 6.2974 (6.0996) grad_norm 3.1326 (1.9487) loss_scale 8192.0000 (8192.0000) mem 9661MB
[2024-07-30 13:17:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-30 13:17:21 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 13:17:22 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 13:19:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 13:19:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 13:19:08 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 13:19:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 13:19:31 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 13:19:31 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 13:19:31 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 13:19:31 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 96)
[2024-07-30 13:19:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 13:19:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][340/625] eta 0:05:47 lr 0.001657 wd 0.0500 time 0.2513 (1.2204) data time 0.0014 (0.1042) model time 0.2499 (1.1161) loss 6.3000 (6.7511) grad_norm 1.8866 (1.8440) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-30 13:19:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][350/625] eta 0:02:59 lr 0.001657 wd 0.0500 time 0.2620 (0.6525) data time 0.0010 (0.0435) model time 0.2610 (0.6090) loss 6.0545 (6.4556) grad_norm 1.9264 (1.9416) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-30 13:19:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][360/625] eta 0:02:13 lr 0.001657 wd 0.0500 time 0.2558 (0.5046) data time 0.0006 (0.0277) model time 0.2553 (0.4769) loss 6.9515 (6.4044) grad_norm 2.1053 (2.0624) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-30 13:19:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][370/625] eta 0:01:51 lr 0.001656 wd 0.0500 time 0.2515 (0.4368) data time 0.0011 (0.0205) model time 0.2503 (0.4163) loss 5.9110 (6.3637) grad_norm 1.3759 (1.9044) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-30 13:19:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][380/625] eta 0:01:37 lr 0.001656 wd 0.0500 time 0.2536 (0.3980) data time 0.0008 (0.0164) model time 0.2529 (0.3817) loss 6.5869 (6.3156) grad_norm 1.2137 (1.8587) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-30 13:19:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][390/625] eta 0:01:27 lr 0.001656 wd 0.0500 time 0.2514 (0.3731) data time 0.0009 (0.0137) model time 0.2505 (0.3594) loss 6.4027 (6.2555) grad_norm 1.7933 (1.8224) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-30 13:19:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][400/625] eta 0:01:19 lr 0.001656 wd 0.0500 time 0.2533 (0.3555) data time 0.0009 (0.0118) model time 0.2524 (0.3438) loss 6.6906 (6.2064) grad_norm 1.3975 (1.7635) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-30 13:20:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][410/625] eta 0:01:13 lr 0.001656 wd 0.0500 time 0.2545 (0.3424) data time 0.0008 (0.0103) model time 0.2537 (0.3321) loss 6.3480 (6.1615) grad_norm 1.6367 (1.7537) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-30 13:20:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][420/625] eta 0:01:08 lr 0.001656 wd 0.0500 time 0.2570 (0.3324) data time 0.0010 (0.0093) model time 0.2560 (0.3231) loss 6.0631 (6.1073) grad_norm 1.0126 (1.7186) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-30 13:20:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][430/625] eta 0:01:03 lr 0.001656 wd 0.0500 time 0.2559 (0.3244) data time 0.0008 (0.0084) model time 0.2550 (0.3160) loss 6.4240 (6.1113) grad_norm 2.3927 (1.7438) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-30 13:20:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][440/625] eta 0:00:58 lr 0.001655 wd 0.0500 time 0.2522 (0.3178) data time 0.0008 (0.0077) model time 0.2514 (0.3101) loss 6.1667 (6.1350) grad_norm 1.5734 (1.7202) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-30 13:20:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][450/625] eta 0:00:54 lr 0.001655 wd 0.0500 time 0.2508 (0.3125) data time 0.0009 (0.0071) model time 0.2498 (0.3053) loss 6.7683 (6.1313) grad_norm 1.4203 (1.6965) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-30 13:20:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][460/625] eta 0:00:50 lr 0.001655 wd 0.0500 time 0.2570 (0.3081) data time 0.0009 (0.0066) model time 0.2561 (0.3015) loss 6.8023 (6.1164) grad_norm 1.4129 (1.6676) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-30 13:20:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-30 13:20:15 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 13:20:16 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 13:22:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 13:22:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 13:22:11 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 13:24:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 13:24:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 13:24:23 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 13:24:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 13:24:40 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 13:24:40 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 13:24:40 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 13:24:40 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 96)
[2024-07-30 13:24:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 13:24:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][470/625] eta 0:02:59 lr 0.001655 wd 0.0500 time 0.2560 (1.1578) data time 0.0009 (0.0901) model time 0.2551 (1.0677) loss 5.9971 (6.6355) grad_norm 1.0288 (1.4477) loss_scale 8192.0000 (8192.0000) mem 9656MB
[2024-07-30 13:24:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][480/625] eta 0:01:39 lr 0.001655 wd 0.0500 time 0.2595 (0.6863) data time 0.0009 (0.0433) model time 0.2585 (0.6430) loss 5.8698 (6.3619) grad_norm 1.4037 (inf) loss_scale 4096.0000 (7760.8421) mem 9656MB
[2024-07-30 13:25:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][490/625] eta 0:01:12 lr 0.001655 wd 0.0500 time 0.2597 (0.5395) data time 0.0007 (0.0287) model time 0.2590 (0.5108) loss 6.6965 (6.3707) grad_norm 1.3583 (inf) loss_scale 4096.0000 (6497.1034) mem 9656MB
[2024-07-30 13:25:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][500/625] eta 0:00:58 lr 0.001655 wd 0.0500 time 0.2636 (0.4678) data time 0.0010 (0.0216) model time 0.2625 (0.4462) loss 6.2680 (6.2684) grad_norm 1.5671 (inf) loss_scale 4096.0000 (5881.4359) mem 9656MB
[2024-07-30 13:25:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][510/625] eta 0:00:48 lr 0.001655 wd 0.0500 time 0.2598 (0.4254) data time 0.0010 (0.0174) model time 0.2587 (0.4080) loss 6.0472 (6.2225) grad_norm 1.4474 (inf) loss_scale 4096.0000 (5517.0612) mem 9656MB
[2024-07-30 13:25:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][520/625] eta 0:00:41 lr 0.001654 wd 0.0500 time 0.2624 (0.3976) data time 0.0007 (0.0146) model time 0.2616 (0.3830) loss 5.1301 (6.1807) grad_norm 2.5025 (inf) loss_scale 4096.0000 (5276.2034) mem 9656MB
[2024-07-30 13:25:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][530/625] eta 0:00:35 lr 0.001654 wd 0.0500 time 0.2598 (0.3781) data time 0.0010 (0.0126) model time 0.2588 (0.3655) loss 6.1616 (6.1609) grad_norm 1.8851 (inf) loss_scale 4096.0000 (5105.1594) mem 9656MB
[2024-07-30 13:25:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][540/625] eta 0:00:30 lr 0.001654 wd 0.0500 time 0.2556 (0.3633) data time 0.0012 (0.0112) model time 0.2544 (0.3521) loss 5.8163 (6.1260) grad_norm 2.3061 (inf) loss_scale 4096.0000 (4977.4177) mem 9656MB
[2024-07-30 13:25:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][550/625] eta 0:00:26 lr 0.001654 wd 0.0500 time 0.2624 (0.3519) data time 0.0008 (0.0100) model time 0.2616 (0.3418) loss 5.6097 (6.0947) grad_norm 1.4727 (inf) loss_scale 4096.0000 (4878.3820) mem 9656MB
[2024-07-30 13:25:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][560/625] eta 0:00:22 lr 0.001654 wd 0.0500 time 0.2670 (0.3428) data time 0.0009 (0.0091) model time 0.2661 (0.3337) loss 6.3398 (6.1339) grad_norm 1.4225 (inf) loss_scale 4096.0000 (4799.3535) mem 9656MB
[2024-07-30 13:25:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][570/625] eta 0:00:18 lr 0.001654 wd 0.0500 time 0.2683 (0.3357) data time 0.0007 (0.0084) model time 0.2675 (0.3273) loss 6.6869 (6.1385) grad_norm 1.5876 (inf) loss_scale 4096.0000 (4734.8257) mem 9656MB
[2024-07-30 13:25:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][580/625] eta 0:00:14 lr 0.001654 wd 0.0500 time 0.2698 (0.3295) data time 0.0009 (0.0078) model time 0.2689 (0.3217) loss 6.6480 (6.1350) grad_norm 1.8346 (inf) loss_scale 4096.0000 (4681.1429) mem 9656MB
[2024-07-30 13:25:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][590/625] eta 0:00:11 lr 0.001653 wd 0.0500 time 0.2611 (0.3241) data time 0.0007 (0.0072) model time 0.2604 (0.3169) loss 5.2626 (6.1044) grad_norm 1.2869 (inf) loss_scale 4096.0000 (4635.7829) mem 9656MB
[2024-07-30 13:25:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][600/625] eta 0:00:07 lr 0.001653 wd 0.0500 time 0.2614 (0.3196) data time 0.0007 (0.0068) model time 0.2607 (0.3128) loss 6.8351 (6.1092) grad_norm 1.8633 (inf) loss_scale 4096.0000 (4596.9496) mem 9656MB
[2024-07-30 13:25:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][610/625] eta 0:00:04 lr 0.001653 wd 0.0500 time 0.2617 (0.3157) data time 0.0005 (0.0064) model time 0.2613 (0.3093) loss 5.7310 (6.0946) grad_norm 1.7615 (inf) loss_scale 4096.0000 (4563.3289) mem 9656MB
[2024-07-30 13:25:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][620/625] eta 0:00:01 lr 0.001653 wd 0.0500 time 0.2638 (0.3124) data time 0.0005 (0.0061) model time 0.2634 (0.3063) loss 6.8912 (6.0934) grad_norm 1.8956 (inf) loss_scale 4096.0000 (4533.9371) mem 9656MB
[2024-07-30 13:25:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 96 training takes 0:00:50
[2024-07-30 13:25:36 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 13:25:37 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 13:25:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.456 (0.456) Loss 0.6631 (0.6631) Acc@1 87.549 (87.549) Acc@5 97.754 (97.754) Mem 9656MB
[2024-07-30 13:25:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.097) Loss 1.1182 (0.8438) Acc@1 75.000 (82.244) Acc@5 93.604 (96.542) Mem 9656MB
[2024-07-30 13:25:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.2217 (0.9959) Acc@1 73.389 (78.457) Acc@5 92.090 (94.561) Mem 9656MB
[2024-07-30 13:25:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.177 Acc@5 94.528
[2024-07-30 13:25:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.2%
[2024-07-30 13:25:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.823 (0.823) Loss 0.5625 (0.5625) Acc@1 88.184 (88.184) Acc@5 98.291 (98.291) Mem 9656MB
[2024-07-30 13:25:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.136) Loss 0.9453 (0.7161) Acc@1 78.174 (84.038) Acc@5 94.531 (96.990) Mem 9656MB
[2024-07-30 13:25:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.098) Loss 1.0986 (0.8628) Acc@1 73.096 (80.206) Acc@5 92.676 (95.226) Mem 9656MB
[2024-07-30 13:25:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.884 Acc@5 95.208
[2024-07-30 13:25:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.9%
[2024-07-30 13:25:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.88%
[2024-07-30 13:25:43 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-30 13:25:45 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-30 13:25:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][0/625] eta 0:07:54 lr 0.001653 wd 0.0500 time 0.7590 (0.7590) data time 0.4227 (0.4227) model time 0.0000 (0.0000) loss 6.0328 (6.0328) grad_norm 1.3089 (1.3089) loss_scale 4096.0000 (4096.0000) mem 9652MB
[2024-07-30 13:25:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][10/625] eta 0:03:08 lr 0.001653 wd 0.0500 time 0.2646 (0.3066) data time 0.0009 (0.0394) model time 0.0000 (0.0000) loss 3.9698 (5.9491) grad_norm 2.1796 (1.5657) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 13:25:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][20/625] eta 0:02:52 lr 0.001653 wd 0.0500 time 0.2589 (0.2848) data time 0.0010 (0.0211) model time 0.0000 (0.0000) loss 5.0311 (5.9193) grad_norm 2.3551 (1.5891) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 13:25:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][30/625] eta 0:02:45 lr 0.001653 wd 0.0500 time 0.2602 (0.2774) data time 0.0008 (0.0146) model time 0.0000 (0.0000) loss 5.5791 (5.9664) grad_norm 2.0436 (1.6657) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 13:25:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][40/625] eta 0:02:40 lr 0.001652 wd 0.0500 time 0.2602 (0.2739) data time 0.0010 (0.0113) model time 0.0000 (0.0000) loss 6.3022 (5.8562) grad_norm 1.6528 (1.6590) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 13:25:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][50/625] eta 0:02:36 lr 0.001652 wd 0.0500 time 0.2606 (0.2716) data time 0.0009 (0.0093) model time 0.0000 (0.0000) loss 6.2124 (5.8342) grad_norm 1.5684 (inf) loss_scale 2048.0000 (3855.0588) mem 9655MB
[2024-07-30 13:26:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][60/625] eta 0:02:32 lr 0.001652 wd 0.0500 time 0.2600 (0.2700) data time 0.0010 (0.0079) model time 0.2590 (0.2611) loss 7.0494 (5.8567) grad_norm 2.2145 (inf) loss_scale 2048.0000 (3558.8197) mem 9655MB
[2024-07-30 13:26:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][70/625] eta 0:02:29 lr 0.001652 wd 0.0500 time 0.2743 (0.2692) data time 0.0017 (0.0070) model time 0.2725 (0.2622) loss 5.2216 (5.8525) grad_norm 1.5155 (inf) loss_scale 2048.0000 (3346.0282) mem 9655MB
[2024-07-30 13:26:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][80/625] eta 0:02:26 lr 0.001652 wd 0.0500 time 0.2609 (0.2687) data time 0.0007 (0.0062) model time 0.2602 (0.2627) loss 5.2568 (5.8664) grad_norm 1.7955 (inf) loss_scale 2048.0000 (3185.7778) mem 9655MB
[2024-07-30 13:26:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][90/625] eta 0:02:23 lr 0.001652 wd 0.0500 time 0.2613 (0.2680) data time 0.0008 (0.0057) model time 0.2605 (0.2625) loss 4.6174 (5.8494) grad_norm 1.7602 (inf) loss_scale 2048.0000 (3060.7473) mem 9655MB
[2024-07-30 13:26:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][100/625] eta 0:02:20 lr 0.001652 wd 0.0500 time 0.2610 (0.2675) data time 0.0008 (0.0052) model time 0.2602 (0.2623) loss 5.1148 (5.8243) grad_norm 1.9209 (inf) loss_scale 2048.0000 (2960.4752) mem 9655MB
[2024-07-30 13:26:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][110/625] eta 0:02:17 lr 0.001651 wd 0.0500 time 0.2629 (0.2671) data time 0.0009 (0.0048) model time 0.2620 (0.2624) loss 6.5812 (5.8417) grad_norm 1.3145 (inf) loss_scale 2048.0000 (2878.2703) mem 9655MB
[2024-07-30 13:26:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][120/625] eta 0:02:14 lr 0.001651 wd 0.0500 time 0.2610 (0.2669) data time 0.0010 (0.0045) model time 0.2600 (0.2624) loss 4.9490 (5.8548) grad_norm 1.8155 (inf) loss_scale 2048.0000 (2809.6529) mem 9655MB
[2024-07-30 13:26:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][130/625] eta 0:02:11 lr 0.001651 wd 0.0500 time 0.2647 (0.2666) data time 0.0009 (0.0042) model time 0.2638 (0.2625) loss 5.6817 (5.8468) grad_norm 2.5908 (inf) loss_scale 2048.0000 (2751.5115) mem 9655MB
[2024-07-30 13:26:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][140/625] eta 0:02:09 lr 0.001651 wd 0.0500 time 0.2621 (0.2665) data time 0.0012 (0.0040) model time 0.2610 (0.2626) loss 6.6167 (5.8433) grad_norm 1.7152 (inf) loss_scale 2048.0000 (2701.6170) mem 9655MB
[2024-07-30 13:26:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][150/625] eta 0:02:06 lr 0.001651 wd 0.0500 time 0.2762 (0.2665) data time 0.0010 (0.0038) model time 0.2752 (0.2629) loss 7.2801 (5.8677) grad_norm 2.0047 (inf) loss_scale 2048.0000 (2658.3311) mem 9655MB
[2024-07-30 13:26:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][160/625] eta 0:02:03 lr 0.001651 wd 0.0500 time 0.2607 (0.2662) data time 0.0007 (0.0036) model time 0.2600 (0.2627) loss 7.0750 (5.9187) grad_norm 1.8452 (inf) loss_scale 2048.0000 (2620.4224) mem 9655MB
[2024-07-30 13:26:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][170/625] eta 0:02:01 lr 0.001651 wd 0.0500 time 0.2615 (0.2660) data time 0.0010 (0.0035) model time 0.2606 (0.2626) loss 5.8017 (5.9177) grad_norm 1.8848 (inf) loss_scale 2048.0000 (2586.9474) mem 9655MB
[2024-07-30 13:26:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][180/625] eta 0:01:58 lr 0.001651 wd 0.0500 time 0.2624 (0.2658) data time 0.0013 (0.0034) model time 0.2611 (0.2625) loss 6.1501 (5.9232) grad_norm 1.3053 (inf) loss_scale 2048.0000 (2557.1713) mem 9655MB
[2024-07-30 13:26:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][190/625] eta 0:01:55 lr 0.001650 wd 0.0500 time 0.2627 (0.2657) data time 0.0010 (0.0032) model time 0.2617 (0.2625) loss 6.6003 (5.9353) grad_norm 1.2995 (inf) loss_scale 2048.0000 (2530.5131) mem 9655MB
[2024-07-30 13:26:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][200/625] eta 0:01:52 lr 0.001650 wd 0.0500 time 0.2591 (0.2655) data time 0.0011 (0.0031) model time 0.2580 (0.2625) loss 4.6706 (5.9192) grad_norm 2.0373 (inf) loss_scale 2048.0000 (2506.5075) mem 9655MB
[2024-07-30 13:26:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][210/625] eta 0:01:50 lr 0.001650 wd 0.0500 time 0.2655 (0.2654) data time 0.0011 (0.0030) model time 0.2645 (0.2624) loss 7.2251 (5.9283) grad_norm 1.8893 (inf) loss_scale 2048.0000 (2484.7773) mem 9655MB
[2024-07-30 13:26:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][220/625] eta 0:01:47 lr 0.001650 wd 0.0500 time 0.2622 (0.2653) data time 0.0009 (0.0029) model time 0.2613 (0.2624) loss 5.2583 (5.9158) grad_norm 1.5456 (inf) loss_scale 2048.0000 (2465.0136) mem 9655MB
[2024-07-30 13:26:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][230/625] eta 0:01:44 lr 0.001650 wd 0.0500 time 0.2594 (0.2652) data time 0.0010 (0.0028) model time 0.2585 (0.2624) loss 6.4218 (5.9141) grad_norm 1.9652 (inf) loss_scale 2048.0000 (2446.9610) mem 9655MB
[2024-07-30 13:26:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][240/625] eta 0:01:42 lr 0.001650 wd 0.0500 time 0.2562 (0.2652) data time 0.0009 (0.0028) model time 0.2553 (0.2624) loss 6.0754 (5.9171) grad_norm 1.2920 (inf) loss_scale 2048.0000 (2430.4066) mem 9655MB
[2024-07-30 13:26:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][250/625] eta 0:01:39 lr 0.001650 wd 0.0500 time 0.2615 (0.2661) data time 0.0009 (0.0027) model time 0.2605 (0.2637) loss 5.4535 (5.9298) grad_norm 1.8853 (inf) loss_scale 2048.0000 (2415.1713) mem 9655MB
[2024-07-30 13:26:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][260/625] eta 0:01:37 lr 0.001649 wd 0.0500 time 0.2654 (0.2661) data time 0.0010 (0.0026) model time 0.2644 (0.2638) loss 7.3461 (5.9281) grad_norm 1.3058 (inf) loss_scale 2048.0000 (2401.1034) mem 9655MB
[2024-07-30 13:26:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][270/625] eta 0:01:34 lr 0.001649 wd 0.0500 time 0.2652 (0.2661) data time 0.0009 (0.0026) model time 0.2643 (0.2638) loss 6.6072 (5.9429) grad_norm 2.1753 (inf) loss_scale 2048.0000 (2388.0738) mem 9655MB
[2024-07-30 13:26:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][280/625] eta 0:01:31 lr 0.001649 wd 0.0500 time 0.2643 (0.2660) data time 0.0009 (0.0025) model time 0.2634 (0.2638) loss 6.4250 (5.9471) grad_norm 1.4655 (inf) loss_scale 2048.0000 (2375.9715) mem 9655MB
[2024-07-30 13:27:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][290/625] eta 0:01:29 lr 0.001649 wd 0.0500 time 0.2575 (0.2660) data time 0.0009 (0.0025) model time 0.2566 (0.2638) loss 6.6594 (5.9384) grad_norm 1.8550 (inf) loss_scale 2048.0000 (2364.7010) mem 9655MB
[2024-07-30 13:27:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][300/625] eta 0:01:26 lr 0.001649 wd 0.0500 time 0.2613 (0.2659) data time 0.0008 (0.0024) model time 0.2605 (0.2637) loss 7.6410 (5.9367) grad_norm 1.2954 (inf) loss_scale 2048.0000 (2354.1794) mem 9655MB
[2024-07-30 13:27:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][310/625] eta 0:01:24 lr 0.001649 wd 0.0500 time 0.2604 (0.2667) data time 0.0010 (0.0024) model time 0.2593 (0.2648) loss 5.6241 (5.9281) grad_norm 1.5543 (inf) loss_scale 2048.0000 (2344.3344) mem 9655MB
[2024-07-30 13:27:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][320/625] eta 0:01:21 lr 0.001649 wd 0.0500 time 0.2692 (0.2667) data time 0.0011 (0.0023) model time 0.2682 (0.2648) loss 5.8969 (5.9293) grad_norm 1.6025 (inf) loss_scale 2048.0000 (2335.1028) mem 9655MB
[2024-07-30 13:27:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][330/625] eta 0:01:18 lr 0.001648 wd 0.0500 time 0.2621 (0.2667) data time 0.0010 (0.0023) model time 0.2611 (0.2648) loss 5.5992 (5.9359) grad_norm 1.3463 (inf) loss_scale 2048.0000 (2326.4290) mem 9655MB
[2024-07-30 13:27:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][340/625] eta 0:01:15 lr 0.001648 wd 0.0500 time 0.2657 (0.2666) data time 0.0015 (0.0023) model time 0.2641 (0.2647) loss 5.9010 (5.9348) grad_norm 1.3349 (inf) loss_scale 2048.0000 (2318.2639) mem 9655MB
[2024-07-30 13:27:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][350/625] eta 0:01:13 lr 0.001648 wd 0.0500 time 0.2599 (0.2665) data time 0.0010 (0.0022) model time 0.2589 (0.2646) loss 7.1510 (5.9534) grad_norm 1.4061 (inf) loss_scale 2048.0000 (2310.5641) mem 9655MB
[2024-07-30 13:27:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][360/625] eta 0:01:10 lr 0.001648 wd 0.0500 time 0.2671 (0.2665) data time 0.0007 (0.0022) model time 0.2664 (0.2646) loss 4.7800 (5.9405) grad_norm 0.9980 (inf) loss_scale 2048.0000 (2303.2909) mem 9655MB
[2024-07-30 13:27:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][370/625] eta 0:01:07 lr 0.001648 wd 0.0500 time 0.2638 (0.2664) data time 0.0007 (0.0022) model time 0.2631 (0.2646) loss 5.1308 (5.9459) grad_norm 1.8932 (inf) loss_scale 2048.0000 (2296.4097) mem 9655MB
[2024-07-30 13:27:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][380/625] eta 0:01:05 lr 0.001648 wd 0.0500 time 0.2733 (0.2664) data time 0.0007 (0.0021) model time 0.2726 (0.2645) loss 6.5866 (5.9488) grad_norm 2.6525 (inf) loss_scale 2048.0000 (2289.8898) mem 9655MB
[2024-07-30 13:27:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][390/625] eta 0:01:02 lr 0.001648 wd 0.0500 time 0.2630 (0.2663) data time 0.0009 (0.0021) model time 0.2620 (0.2645) loss 6.5985 (5.9555) grad_norm 1.3350 (inf) loss_scale 2048.0000 (2283.7033) mem 9655MB
[2024-07-30 13:27:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][400/625] eta 0:00:59 lr 0.001648 wd 0.0500 time 0.2663 (0.2663) data time 0.0009 (0.0021) model time 0.2655 (0.2645) loss 7.1073 (5.9653) grad_norm 2.3989 (inf) loss_scale 2048.0000 (2277.8254) mem 9655MB
[2024-07-30 13:27:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][410/625] eta 0:00:57 lr 0.001647 wd 0.0500 time 0.2624 (0.2662) data time 0.0007 (0.0020) model time 0.2617 (0.2644) loss 7.1970 (5.9671) grad_norm 1.7802 (inf) loss_scale 2048.0000 (2272.2336) mem 9655MB
[2024-07-30 13:27:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][420/625] eta 0:00:54 lr 0.001647 wd 0.0500 time 0.2647 (0.2662) data time 0.0008 (0.0020) model time 0.2640 (0.2644) loss 5.0402 (5.9690) grad_norm 2.6440 (inf) loss_scale 2048.0000 (2266.9074) mem 9655MB
[2024-07-30 13:27:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][430/625] eta 0:00:51 lr 0.001647 wd 0.0500 time 0.2592 (0.2661) data time 0.0008 (0.0020) model time 0.2584 (0.2643) loss 5.8840 (5.9696) grad_norm 1.5529 (inf) loss_scale 2048.0000 (2261.8283) mem 9655MB
[2024-07-30 13:27:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][440/625] eta 0:00:49 lr 0.001647 wd 0.0500 time 0.2703 (0.2661) data time 0.0009 (0.0020) model time 0.2694 (0.2643) loss 7.0279 (5.9666) grad_norm 1.3969 (inf) loss_scale 2048.0000 (2256.9796) mem 9655MB
[2024-07-30 13:27:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][450/625] eta 0:00:46 lr 0.001647 wd 0.0500 time 0.2627 (0.2660) data time 0.0010 (0.0020) model time 0.2617 (0.2643) loss 5.9165 (5.9717) grad_norm 2.5325 (inf) loss_scale 2048.0000 (2252.3459) mem 9655MB
[2024-07-30 13:27:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][460/625] eta 0:00:43 lr 0.001647 wd 0.0500 time 0.2646 (0.2659) data time 0.0009 (0.0019) model time 0.2637 (0.2642) loss 6.1556 (5.9797) grad_norm 2.5278 (inf) loss_scale 2048.0000 (2247.9132) mem 9655MB
[2024-07-30 13:27:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][470/625] eta 0:00:41 lr 0.001647 wd 0.0500 time 0.2600 (0.2659) data time 0.0010 (0.0019) model time 0.2591 (0.2642) loss 5.4944 (5.9761) grad_norm 2.2195 (inf) loss_scale 2048.0000 (2243.6688) mem 9655MB
[2024-07-30 13:27:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][480/625] eta 0:00:38 lr 0.001646 wd 0.0500 time 0.2640 (0.2658) data time 0.0007 (0.0019) model time 0.2633 (0.2641) loss 5.8233 (5.9719) grad_norm 2.0628 (inf) loss_scale 2048.0000 (2239.6008) mem 9655MB
[2024-07-30 13:27:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][490/625] eta 0:00:35 lr 0.001646 wd 0.0500 time 0.2608 (0.2658) data time 0.0008 (0.0019) model time 0.2600 (0.2641) loss 6.8803 (5.9739) grad_norm 1.4984 (inf) loss_scale 2048.0000 (2235.6986) mem 9655MB
[2024-07-30 13:27:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][500/625] eta 0:00:33 lr 0.001646 wd 0.0500 time 0.2619 (0.2657) data time 0.0010 (0.0019) model time 0.2609 (0.2640) loss 7.5200 (5.9680) grad_norm 1.9674 (inf) loss_scale 2048.0000 (2231.9521) mem 9655MB
[2024-07-30 13:28:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][510/625] eta 0:00:30 lr 0.001646 wd 0.0500 time 0.2622 (0.2657) data time 0.0009 (0.0018) model time 0.2613 (0.2640) loss 7.0502 (5.9782) grad_norm 1.7056 (inf) loss_scale 2048.0000 (2228.3523) mem 9655MB
[2024-07-30 13:28:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][520/625] eta 0:00:27 lr 0.001646 wd 0.0500 time 0.2693 (0.2657) data time 0.0009 (0.0018) model time 0.2684 (0.2640) loss 6.3748 (5.9794) grad_norm 1.5083 (inf) loss_scale 2048.0000 (2224.8906) mem 9655MB
[2024-07-30 13:28:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][530/625] eta 0:00:25 lr 0.001646 wd 0.0500 time 0.2665 (0.2656) data time 0.0010 (0.0018) model time 0.2655 (0.2640) loss 6.3773 (5.9787) grad_norm 2.0348 (inf) loss_scale 2048.0000 (2221.5593) mem 9655MB
[2024-07-30 13:28:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][540/625] eta 0:00:22 lr 0.001646 wd 0.0500 time 0.2719 (0.2656) data time 0.0010 (0.0018) model time 0.2709 (0.2640) loss 6.9124 (5.9739) grad_norm 1.8266 (inf) loss_scale 2048.0000 (2218.3512) mem 9655MB
[2024-07-30 13:28:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][550/625] eta 0:00:19 lr 0.001645 wd 0.0500 time 0.2596 (0.2657) data time 0.0009 (0.0018) model time 0.2587 (0.2641) loss 4.5905 (5.9659) grad_norm 2.1814 (inf) loss_scale 2048.0000 (2215.2595) mem 9655MB
[2024-07-30 13:28:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][560/625] eta 0:00:17 lr 0.001645 wd 0.0500 time 0.2688 (0.2656) data time 0.0009 (0.0018) model time 0.2680 (0.2640) loss 5.5691 (5.9664) grad_norm 1.7180 (inf) loss_scale 2048.0000 (2212.2781) mem 9655MB
[2024-07-30 13:28:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][570/625] eta 0:00:14 lr 0.001645 wd 0.0500 time 0.2607 (0.2656) data time 0.0013 (0.0018) model time 0.2594 (0.2640) loss 6.9877 (5.9719) grad_norm 1.1741 (inf) loss_scale 2048.0000 (2209.4011) mem 9655MB
[2024-07-30 13:28:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][580/625] eta 0:00:11 lr 0.001645 wd 0.0500 time 0.2644 (0.2656) data time 0.0007 (0.0017) model time 0.2638 (0.2640) loss 4.3580 (5.9775) grad_norm 1.2722 (inf) loss_scale 2048.0000 (2206.6231) mem 9655MB
[2024-07-30 13:28:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][590/625] eta 0:00:09 lr 0.001645 wd 0.0500 time 0.2631 (0.2655) data time 0.0009 (0.0017) model time 0.2622 (0.2640) loss 4.2857 (5.9777) grad_norm 2.9527 (inf) loss_scale 2048.0000 (2203.9391) mem 9655MB
[2024-07-30 13:28:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][600/625] eta 0:00:06 lr 0.001645 wd 0.0500 time 0.2624 (0.2655) data time 0.0011 (0.0017) model time 0.2613 (0.2639) loss 6.5015 (5.9835) grad_norm 1.0393 (inf) loss_scale 2048.0000 (2201.3444) mem 9655MB
[2024-07-30 13:28:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][610/625] eta 0:00:03 lr 0.001645 wd 0.0500 time 0.2595 (0.2655) data time 0.0006 (0.0017) model time 0.2589 (0.2639) loss 5.9914 (5.9864) grad_norm 1.5514 (inf) loss_scale 2048.0000 (2198.8347) mem 9655MB
[2024-07-30 13:28:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][620/625] eta 0:00:01 lr 0.001645 wd 0.0500 time 0.2655 (0.2654) data time 0.0005 (0.0017) model time 0.2651 (0.2639) loss 5.6281 (5.9868) grad_norm 3.7485 (inf) loss_scale 2048.0000 (2196.4058) mem 9655MB
[2024-07-30 13:28:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 97 training takes 0:02:45
[2024-07-30 13:28:30 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 13:28:31 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 13:28:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.518 (0.518) Loss 0.6455 (0.6455) Acc@1 87.207 (87.207) Acc@5 98.242 (98.242) Mem 9655MB
[2024-07-30 13:28:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.060 (0.103) Loss 1.0557 (0.8199) Acc@1 76.807 (82.844) Acc@5 93.848 (96.631) Mem 9655MB
[2024-07-30 13:28:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 1.2178 (0.9818) Acc@1 72.852 (78.864) Acc@5 92.139 (94.608) Mem 9655MB
[2024-07-30 13:28:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.553 Acc@5 94.548
[2024-07-30 13:28:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.6%
[2024-07-30 13:28:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.55%
[2024-07-30 13:28:33 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-30 13:28:34 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-30 13:28:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.460 (0.460) Loss 0.5625 (0.5625) Acc@1 88.184 (88.184) Acc@5 98.340 (98.340) Mem 9655MB
[2024-07-30 13:28:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 0.9443 (0.7154) Acc@1 78.271 (84.095) Acc@5 94.531 (97.013) Mem 9655MB
[2024-07-30 13:28:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0977 (0.8622) Acc@1 73.145 (80.222) Acc@5 92.627 (95.240) Mem 9655MB
[2024-07-30 13:28:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.912 Acc@5 95.222
[2024-07-30 13:28:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.9%
[2024-07-30 13:28:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.91%
[2024-07-30 13:28:36 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-30 13:28:37 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-30 13:28:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][0/625] eta 0:07:07 lr 0.001644 wd 0.0500 time 0.6845 (0.6845) data time 0.4188 (0.4188) model time 0.0000 (0.0000) loss 6.9956 (6.9956) grad_norm 1.6566 (1.6566) loss_scale 2048.0000 (2048.0000) mem 9654MB
[2024-07-30 13:28:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][10/625] eta 0:03:06 lr 0.001644 wd 0.0500 time 0.2613 (0.3033) data time 0.0012 (0.0390) model time 0.0000 (0.0000) loss 6.3775 (6.1580) grad_norm 1.2074 (1.8318) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:28:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][20/625] eta 0:02:51 lr 0.001644 wd 0.0500 time 0.2613 (0.2838) data time 0.0009 (0.0209) model time 0.0000 (0.0000) loss 4.9724 (5.8761) grad_norm 1.7615 (1.7900) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:28:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][30/625] eta 0:02:44 lr 0.001644 wd 0.0500 time 0.2616 (0.2772) data time 0.0012 (0.0145) model time 0.0000 (0.0000) loss 6.7529 (6.0113) grad_norm 2.1626 (1.8744) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:28:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][40/625] eta 0:02:40 lr 0.001644 wd 0.0500 time 0.2629 (0.2738) data time 0.0010 (0.0112) model time 0.0000 (0.0000) loss 5.8632 (5.9009) grad_norm 1.4441 (1.7864) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:28:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][50/625] eta 0:02:36 lr 0.001644 wd 0.0500 time 0.2596 (0.2718) data time 0.0009 (0.0092) model time 0.0000 (0.0000) loss 6.5447 (5.9202) grad_norm 2.1931 (1.7336) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:28:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][60/625] eta 0:02:32 lr 0.001644 wd 0.0500 time 0.2668 (0.2704) data time 0.0009 (0.0078) model time 0.2659 (0.2626) loss 6.0662 (5.8769) grad_norm 1.1649 (1.6790) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:28:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][70/625] eta 0:02:29 lr 0.001644 wd 0.0500 time 0.2637 (0.2695) data time 0.0009 (0.0069) model time 0.2629 (0.2629) loss 6.5345 (5.9075) grad_norm 1.1517 (1.6407) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:28:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][80/625] eta 0:02:26 lr 0.001643 wd 0.0500 time 0.2602 (0.2688) data time 0.0011 (0.0062) model time 0.2591 (0.2627) loss 6.6288 (5.9038) grad_norm 2.4153 (1.6748) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:29:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][90/625] eta 0:02:23 lr 0.001643 wd 0.0500 time 0.2609 (0.2683) data time 0.0009 (0.0056) model time 0.2600 (0.2629) loss 5.1025 (5.8997) grad_norm 1.3040 (1.6856) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:29:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][100/625] eta 0:02:20 lr 0.001643 wd 0.0500 time 0.2882 (0.2681) data time 0.0007 (0.0051) model time 0.2875 (0.2634) loss 6.6881 (5.9132) grad_norm 2.4584 (1.7189) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:29:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][110/625] eta 0:02:17 lr 0.001643 wd 0.0500 time 0.2614 (0.2676) data time 0.0009 (0.0047) model time 0.2605 (0.2631) loss 4.8847 (5.8778) grad_norm 2.0625 (1.6876) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:29:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][120/625] eta 0:02:14 lr 0.001643 wd 0.0500 time 0.2611 (0.2672) data time 0.0009 (0.0044) model time 0.2602 (0.2628) loss 7.2280 (5.9093) grad_norm 1.8180 (1.6775) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:29:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][130/625] eta 0:02:12 lr 0.001643 wd 0.0500 time 0.2632 (0.2669) data time 0.0008 (0.0042) model time 0.2624 (0.2629) loss 7.4338 (5.9232) grad_norm 1.6780 (1.6893) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:29:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][140/625] eta 0:02:09 lr 0.001643 wd 0.0500 time 0.2671 (0.2669) data time 0.0008 (0.0039) model time 0.2663 (0.2631) loss 7.4608 (5.9524) grad_norm 1.6931 (1.6977) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:29:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][150/625] eta 0:02:06 lr 0.001642 wd 0.0500 time 0.2583 (0.2666) data time 0.0011 (0.0037) model time 0.2572 (0.2630) loss 6.2429 (5.9478) grad_norm 1.6573 (1.6797) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:29:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][160/625] eta 0:02:03 lr 0.001642 wd 0.0500 time 0.2627 (0.2665) data time 0.0008 (0.0036) model time 0.2619 (0.2631) loss 4.7374 (5.9342) grad_norm 1.0037 (1.6612) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:29:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][170/625] eta 0:02:01 lr 0.001642 wd 0.0500 time 0.2581 (0.2662) data time 0.0008 (0.0034) model time 0.2572 (0.2629) loss 4.7113 (5.9129) grad_norm 1.8276 (1.6585) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:29:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][180/625] eta 0:01:58 lr 0.001642 wd 0.0500 time 0.2651 (0.2660) data time 0.0010 (0.0033) model time 0.2641 (0.2628) loss 7.0410 (5.9303) grad_norm 3.2281 (1.6696) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:29:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][190/625] eta 0:01:55 lr 0.001642 wd 0.0500 time 0.2654 (0.2660) data time 0.0010 (0.0032) model time 0.2644 (0.2628) loss 5.2718 (5.9350) grad_norm 1.4353 (1.6669) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:29:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][200/625] eta 0:01:53 lr 0.001642 wd 0.0500 time 0.2590 (0.2659) data time 0.0008 (0.0031) model time 0.2582 (0.2629) loss 7.1404 (5.9418) grad_norm 2.4753 (1.6864) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:29:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][210/625] eta 0:01:50 lr 0.001642 wd 0.0500 time 0.2580 (0.2657) data time 0.0011 (0.0030) model time 0.2569 (0.2628) loss 6.7860 (5.9443) grad_norm 1.4983 (1.6861) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:29:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][220/625] eta 0:01:47 lr 0.001641 wd 0.0500 time 0.2590 (0.2656) data time 0.0008 (0.0029) model time 0.2581 (0.2627) loss 6.5041 (5.9528) grad_norm 3.2035 (1.6979) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:29:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][230/625] eta 0:01:44 lr 0.001641 wd 0.0500 time 0.2612 (0.2656) data time 0.0010 (0.0028) model time 0.2603 (0.2628) loss 5.6115 (5.9514) grad_norm 1.5338 (1.7163) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:29:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][240/625] eta 0:01:42 lr 0.001641 wd 0.0500 time 0.2617 (0.2655) data time 0.0011 (0.0027) model time 0.2606 (0.2628) loss 6.6354 (5.9457) grad_norm 1.8051 (1.7212) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:29:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][250/625] eta 0:01:39 lr 0.001641 wd 0.0500 time 0.2645 (0.2655) data time 0.0010 (0.0027) model time 0.2634 (0.2629) loss 6.6552 (5.9580) grad_norm 2.6973 (1.7421) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 13:29:46
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][260/625] eta 0:01:36 lr 0.001641 wd 0.0500 time 0.2721 (0.2655) data time 0.0007 (0.0026) model time 0.2714 (0.2630) loss 7.0181 (5.9609) grad_norm 1.9162 (1.7423) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:29:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][270/625] eta 0:01:34 lr 0.001641 wd 0.0500 time 0.2644 (0.2657) data time 0.0009 (0.0025) model time 0.2635 (0.2633) loss 6.3003 (5.9721) grad_norm 2.1963 (1.7397) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:29:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][280/625] eta 0:01:31 lr 0.001641 wd 0.0500 time 0.2639 (0.2657) data time 0.0010 (0.0025) model time 0.2630 (0.2633) loss 4.1066 (5.9664) grad_norm 2.2997 (1.7490) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:29:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][290/625] eta 0:01:28 lr 0.001641 wd 0.0500 time 0.2646 (0.2656) data time 0.0007 (0.0024) model time 0.2639 (0.2633) loss 7.7396 (5.9663) grad_norm 1.5638 (1.7710) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:29:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][300/625] eta 0:01:26 lr 0.001640 wd 0.0500 time 0.2655 (0.2656) data time 0.0007 (0.0024) model time 0.2647 (0.2634) loss 6.0262 (5.9574) grad_norm 2.9066 (1.7727) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:29:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][310/625] eta 0:01:23 lr 0.001640 wd 0.0500 time 0.2639 (0.2655) data time 0.0006 (0.0023) model time 0.2632 (0.2633) loss 4.5077 (5.9571) grad_norm 1.3129 (1.7778) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:30:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][320/625] eta 0:01:20 lr 0.001640 wd 0.0500 time 0.2719 (0.2655) data time 0.0007 
(0.0023) model time 0.2712 (0.2633) loss 4.8955 (5.9647) grad_norm 1.3075 (1.7815) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:30:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][330/625] eta 0:01:18 lr 0.001640 wd 0.0500 time 0.2640 (0.2654) data time 0.0009 (0.0023) model time 0.2631 (0.2632) loss 7.1772 (5.9811) grad_norm 1.4369 (1.7818) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:30:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][340/625] eta 0:01:15 lr 0.001640 wd 0.0500 time 0.2705 (0.2653) data time 0.0010 (0.0022) model time 0.2695 (0.2632) loss 6.1600 (5.9835) grad_norm 1.3670 (1.7911) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:30:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][350/625] eta 0:01:12 lr 0.001640 wd 0.0500 time 0.2707 (0.2653) data time 0.0007 (0.0022) model time 0.2700 (0.2632) loss 6.1694 (5.9876) grad_norm 1.6170 (1.7781) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:30:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][360/625] eta 0:01:10 lr 0.001640 wd 0.0500 time 0.2616 (0.2659) data time 0.0010 (0.0022) model time 0.2606 (0.2639) loss 4.9485 (5.9884) grad_norm 1.8786 (1.7678) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:30:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][370/625] eta 0:01:07 lr 0.001639 wd 0.0500 time 0.2616 (0.2658) data time 0.0009 (0.0021) model time 0.2607 (0.2638) loss 7.0873 (5.9954) grad_norm 2.2327 (1.7672) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:30:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][380/625] eta 0:01:05 lr 0.001639 wd 0.0500 time 0.2606 (0.2657) data time 0.0010 (0.0021) model time 0.2596 (0.2638) loss 6.2199 (5.9859) grad_norm 1.2979 (1.7754) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:30:21 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][390/625] eta 0:01:02 lr 0.001639 wd 0.0500 time 0.2593 (0.2657) data time 0.0011 (0.0021) model time 0.2582 (0.2637) loss 6.8103 (6.0021) grad_norm 1.9311 (1.7717) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:30:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][400/625] eta 0:00:59 lr 0.001639 wd 0.0500 time 0.2737 (0.2657) data time 0.0009 (0.0020) model time 0.2729 (0.2638) loss 6.9736 (6.0106) grad_norm 1.1370 (1.7714) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:30:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][410/625] eta 0:00:57 lr 0.001639 wd 0.0500 time 0.2648 (0.2656) data time 0.0010 (0.0020) model time 0.2638 (0.2637) loss 6.4119 (6.0103) grad_norm 1.7328 (1.7759) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:30:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][420/625] eta 0:00:54 lr 0.001639 wd 0.0500 time 0.2630 (0.2656) data time 0.0007 (0.0020) model time 0.2623 (0.2637) loss 4.7698 (6.0140) grad_norm 2.4378 (1.7815) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:30:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][430/625] eta 0:00:51 lr 0.001639 wd 0.0500 time 0.2597 (0.2656) data time 0.0010 (0.0020) model time 0.2588 (0.2637) loss 5.2617 (6.0120) grad_norm 1.6025 (1.7842) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:30:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][440/625] eta 0:00:49 lr 0.001638 wd 0.0500 time 0.2633 (0.2655) data time 0.0008 (0.0020) model time 0.2625 (0.2637) loss 5.9816 (6.0145) grad_norm 2.3992 (1.7783) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:30:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][450/625] eta 0:00:46 lr 0.001638 wd 0.0500 time 0.2624 (0.2655) data time 0.0011 
(0.0019) model time 0.2613 (0.2637) loss 6.0765 (6.0155) grad_norm 1.4021 (1.7765) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:30:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][460/625] eta 0:00:43 lr 0.001638 wd 0.0500 time 0.2724 (0.2656) data time 0.0009 (0.0019) model time 0.2715 (0.2638) loss 4.3357 (6.0126) grad_norm 1.2692 (1.7730) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:30:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][470/625] eta 0:00:41 lr 0.001638 wd 0.0500 time 0.2588 (0.2655) data time 0.0011 (0.0019) model time 0.2577 (0.2638) loss 6.3678 (6.0133) grad_norm 1.3535 (1.7679) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:30:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][480/625] eta 0:00:38 lr 0.001638 wd 0.0500 time 0.2647 (0.2655) data time 0.0010 (0.0019) model time 0.2638 (0.2638) loss 7.1322 (6.0248) grad_norm 1.4208 (1.7656) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:30:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][490/625] eta 0:00:35 lr 0.001638 wd 0.0500 time 0.2618 (0.2655) data time 0.0007 (0.0019) model time 0.2611 (0.2638) loss 5.3380 (6.0241) grad_norm 2.0248 (1.7668) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:30:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][500/625] eta 0:00:33 lr 0.001638 wd 0.0500 time 0.2713 (0.2655) data time 0.0011 (0.0018) model time 0.2702 (0.2638) loss 5.8166 (6.0332) grad_norm 1.3669 (1.7701) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:30:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][510/625] eta 0:00:30 lr 0.001637 wd 0.0500 time 0.2585 (0.2654) data time 0.0010 (0.0018) model time 0.2575 (0.2637) loss 6.4737 (6.0325) grad_norm 1.1916 (1.7690) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:30:55 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][520/625] eta 0:00:27 lr 0.001637 wd 0.0500 time 0.2636 (0.2654) data time 0.0007 (0.0018) model time 0.2629 (0.2637) loss 5.0070 (6.0301) grad_norm 1.2961 (1.7636) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:30:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][530/625] eta 0:00:25 lr 0.001637 wd 0.0500 time 0.2593 (0.2653) data time 0.0011 (0.0018) model time 0.2582 (0.2636) loss 5.1356 (6.0216) grad_norm 2.8995 (1.7615) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:31:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][540/625] eta 0:00:22 lr 0.001637 wd 0.0500 time 0.2653 (0.2653) data time 0.0012 (0.0018) model time 0.2641 (0.2636) loss 5.9224 (6.0199) grad_norm 1.9615 (1.7573) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:31:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][550/625] eta 0:00:19 lr 0.001637 wd 0.0500 time 0.2603 (0.2653) data time 0.0008 (0.0018) model time 0.2596 (0.2636) loss 7.3105 (6.0216) grad_norm 1.5018 (1.7501) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:31:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][560/625] eta 0:00:17 lr 0.001637 wd 0.0500 time 0.2615 (0.2653) data time 0.0009 (0.0017) model time 0.2606 (0.2636) loss 6.1986 (6.0204) grad_norm 1.2956 (1.7501) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:31:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][570/625] eta 0:00:14 lr 0.001637 wd 0.0500 time 0.2593 (0.2652) data time 0.0011 (0.0017) model time 0.2582 (0.2636) loss 5.3243 (6.0260) grad_norm 1.9918 (1.7494) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:31:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][580/625] eta 0:00:11 lr 0.001637 wd 0.0500 time 0.2615 (0.2653) data time 0.0009 
(0.0017) model time 0.2606 (0.2636) loss 5.1637 (6.0321) grad_norm 1.4607 (1.7456) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:31:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][590/625] eta 0:00:09 lr 0.001636 wd 0.0500 time 0.2660 (0.2653) data time 0.0010 (0.0017) model time 0.2650 (0.2636) loss 5.9866 (6.0358) grad_norm 1.6737 (1.7388) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:31:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][600/625] eta 0:00:06 lr 0.001636 wd 0.0500 time 0.2647 (0.2660) data time 0.0007 (0.0017) model time 0.2641 (0.2644) loss 7.0151 (6.0413) grad_norm 1.6151 (1.7355) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:31:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][610/625] eta 0:00:03 lr 0.001636 wd 0.0500 time 0.2653 (0.2659) data time 0.0005 (0.0017) model time 0.2648 (0.2644) loss 5.5845 (6.0412) grad_norm 2.6035 (1.7362) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:31:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][620/625] eta 0:00:01 lr 0.001636 wd 0.0500 time 0.2631 (0.2659) data time 0.0005 (0.0017) model time 0.2626 (0.2643) loss 5.8639 (6.0445) grad_norm 2.6868 (1.7427) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:31:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 98 training takes 0:02:46 [2024-07-30 13:31:23 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 13:31:23 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-30 13:31:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.547 (0.547) Loss 0.6777 (0.6777) Acc@1 86.133 (86.133) Acc@5 97.705 (97.705) Mem 9655MB [2024-07-30 13:31:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.106) Loss 1.1104 (0.8451) Acc@1 74.805 (82.271) Acc@5 93.799 (96.586) Mem 9655MB [2024-07-30 13:31:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.082) Loss 1.2861 (1.0072) Acc@1 70.947 (78.299) Acc@5 91.992 (94.596) Mem 9655MB [2024-07-30 13:31:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.077 Acc@5 94.540 [2024-07-30 13:31:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.1% [2024-07-30 13:31:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.818 (0.818) Loss 0.5630 (0.5630) Acc@1 88.135 (88.135) Acc@5 98.340 (98.340) Mem 9655MB [2024-07-30 13:31:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.136) Loss 0.9443 (0.7156) Acc@1 78.418 (84.082) Acc@5 94.727 (97.030) Mem 9655MB [2024-07-30 13:31:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.098) Loss 1.0986 (0.8622) Acc@1 73.193 (80.243) Acc@5 92.627 (95.247) Mem 9655MB [2024-07-30 13:31:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.942 Acc@5 95.228 [2024-07-30 13:31:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.9% [2024-07-30 13:31:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.94% [2024-07-30 13:31:28 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-07-30 13:31:28 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-07-30 13:31:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][0/625] eta 0:09:27 lr 0.001636 wd 0.0500 time 0.9076 (0.9076) data time 0.6545 (0.6545) model time 0.0000 (0.0000) loss 7.1124 (7.1124) grad_norm 1.3963 (1.3963) loss_scale 2048.0000 (2048.0000) mem 9654MB [2024-07-30 13:31:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][10/625] eta 0:03:18 lr 0.001636 wd 0.0500 time 0.2625 (0.3231) data time 0.0011 (0.0604) model time 0.0000 (0.0000) loss 6.2095 (5.9567) grad_norm 2.7347 (2.3328) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:31:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][20/625] eta 0:02:58 lr 0.001636 wd 0.0500 time 0.2609 (0.2950) data time 0.0007 (0.0321) model time 0.0000 (0.0000) loss 4.3163 (5.8663) grad_norm 0.9736 (1.9354) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:31:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][30/625] eta 0:02:49 lr 0.001635 wd 0.0500 time 0.2686 (0.2853) data time 0.0008 (0.0221) model time 0.0000 (0.0000) loss 6.9165 (5.9707) grad_norm 1.8170 (1.7741) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:31:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][40/625] eta 0:02:43 lr 0.001635 wd 0.0500 time 0.2686 (0.2800) data time 0.0009 (0.0169) model time 0.0000 (0.0000) loss 6.6779 (5.9618) grad_norm 1.3772 (1.6875) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:31:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][50/625] eta 0:02:38 lr 0.001635 wd 0.0500 time 0.2623 (0.2765) data time 0.0007 (0.0138) model time 0.0000 (0.0000) loss 7.0387 (6.0623) grad_norm 2.0201 (1.6542) loss_scale 2048.0000 (2048.0000) mem 9655MB 
[2024-07-30 13:31:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][60/625] eta 0:02:35 lr 0.001635 wd 0.0500 time 0.2648 (0.2745) data time 0.0009 (0.0117) model time 0.2639 (0.2632) loss 6.2469 (6.0698) grad_norm 3.8590 (1.7031) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:31:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][70/625] eta 0:02:31 lr 0.001635 wd 0.0500 time 0.2634 (0.2729) data time 0.0007 (0.0102) model time 0.2626 (0.2630) loss 5.3938 (5.9563) grad_norm 1.7941 (1.7953) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:31:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-30 13:31:50 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 13:31:50 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-30 13:37:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-30 13:37:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-30 13:37:26 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-30 13:37:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-30 13:37:33 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... 
[2024-07-30 13:37:33 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-30 13:37:33 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-30 13:37:33 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 99) [2024-07-30 13:37:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-30 13:37:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][80/625] eta 0:22:08 lr 0.001635 wd 0.0500 time 0.2548 (2.4371) data time 0.0006 (0.2787) model time 0.2542 (2.1583) loss 5.0179 (6.3768) grad_norm 1.2660 (1.9544) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:37:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][90/625] eta 0:06:46 lr 0.001635 wd 0.0500 time 0.2531 (0.7589) data time 0.0009 (0.0651) model time 0.2522 (0.6938) loss 5.5866 (6.3216) grad_norm 1.2095 (2.0623) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:37:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][100/625] eta 0:04:42 lr 0.001635 wd 0.0500 time 0.2537 (0.5387) data time 0.0005 (0.0372) model time 0.2531 (0.5015) loss 7.0826 (6.3241) grad_norm 1.4358 (1.8023) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:37:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][110/625] eta 0:03:52 lr 0.001634 wd 0.0500 time 0.2553 (0.4521) data time 0.0006 (0.0262) model time 0.2547 (0.4259) loss 7.2412 (6.3350) grad_norm 2.9784 (1.7105) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:37:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][120/625] eta 0:03:24 lr 0.001634 wd 0.0500 time 0.2512 (0.4055) data time 0.0009 (0.0203) model time 0.2504 (0.3852) loss 5.7100 (6.2322) grad_norm 2.8372 (1.7414) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 
13:37:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][130/625] eta 0:03:06 lr 0.001634 wd 0.0500 time 0.2548 (0.3766) data time 0.0008 (0.0167) model time 0.2540 (0.3600) loss 5.7402 (6.2178) grad_norm 1.9585 (1.8094) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:38:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][140/625] eta 0:02:53 lr 0.001634 wd 0.0500 time 0.2563 (0.3574) data time 0.0008 (0.0142) model time 0.2555 (0.3432) loss 5.9371 (6.1762) grad_norm 1.2095 (1.7699) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:38:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][150/625] eta 0:02:43 lr 0.001634 wd 0.0500 time 0.2541 (0.3434) data time 0.0007 (0.0124) model time 0.2533 (0.3310) loss 6.7769 (6.1290) grad_norm 2.3156 (1.7887) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:38:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][160/625] eta 0:02:34 lr 0.001634 wd 0.0500 time 0.2536 (0.3326) data time 0.0006 (0.0110) model time 0.2530 (0.3216) loss 4.6326 (6.0892) grad_norm 1.6876 (1.7798) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:38:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][170/625] eta 0:02:27 lr 0.001634 wd 0.0500 time 0.2531 (0.3242) data time 0.0007 (0.0099) model time 0.2524 (0.3143) loss 6.3903 (6.0900) grad_norm 1.2322 (1.7571) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:38:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][180/625] eta 0:02:21 lr 0.001633 wd 0.0500 time 0.2499 (0.3174) data time 0.0007 (0.0090) model time 0.2492 (0.3083) loss 6.7830 (6.1336) grad_norm 2.0920 (1.7792) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:38:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][190/625] eta 0:02:15 lr 0.001633 wd 0.0500 time 0.2471 (0.3117) data time 
0.0007 (0.0083) model time 0.2464 (0.3034) loss 5.6330 (6.1131) grad_norm 1.2052 (1.8201) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:38:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][200/625] eta 0:02:10 lr 0.001633 wd 0.0500 time 0.2544 (0.3071) data time 0.0008 (0.0077) model time 0.2537 (0.2993) loss 5.8318 (6.1081) grad_norm 2.6270 (1.8154) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:38:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][210/625] eta 0:02:05 lr 0.001633 wd 0.0500 time 0.2509 (0.3031) data time 0.0009 (0.0072) model time 0.2500 (0.2959) loss 6.0735 (6.0997) grad_norm 1.6396 (1.8348) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:38:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][220/625] eta 0:02:01 lr 0.001633 wd 0.0500 time 0.2538 (0.2996) data time 0.0008 (0.0068) model time 0.2530 (0.2928) loss 6.4795 (6.0855) grad_norm 2.2519 (1.8299) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:38:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][230/625] eta 0:01:57 lr 0.001633 wd 0.0500 time 0.2514 (0.2967) data time 0.0007 (0.0064) model time 0.2507 (0.2903) loss 5.9065 (6.0829) grad_norm 1.2040 (1.8179) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:38:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][240/625] eta 0:01:53 lr 0.001633 wd 0.0500 time 0.2503 (0.2941) data time 0.0008 (0.0061) model time 0.2495 (0.2880) loss 5.3339 (6.0866) grad_norm 1.1710 (1.8145) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:38:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][250/625] eta 0:01:49 lr 0.001632 wd 0.0500 time 0.2560 (0.2918) data time 0.0007 (0.0058) model time 0.2553 (0.2860) loss 6.5220 (6.0873) grad_norm 1.3896 (1.8233) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:38:30 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][260/625] eta 0:01:45 lr 0.001632 wd 0.0500 time 0.2543 (0.2898) data time 0.0007 (0.0055) model time 0.2536 (0.2843) loss 5.9379 (6.0776) grad_norm 1.7498 (1.8452) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:38:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][270/625] eta 0:01:42 lr 0.001632 wd 0.0500 time 0.2553 (0.2880) data time 0.0008 (0.0053) model time 0.2545 (0.2827) loss 5.8090 (6.0671) grad_norm 1.2487 (1.8440) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:38:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][280/625] eta 0:01:38 lr 0.001632 wd 0.0500 time 0.2542 (0.2864) data time 0.0007 (0.0051) model time 0.2535 (0.2813) loss 4.8190 (6.0426) grad_norm 1.5770 (1.8325) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:38:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][290/625] eta 0:01:35 lr 0.001632 wd 0.0500 time 0.2532 (0.2849) data time 0.0009 (0.0049) model time 0.2523 (0.2800) loss 4.4974 (6.0413) grad_norm 2.0502 (1.8186) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:38:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][300/625] eta 0:01:32 lr 0.001632 wd 0.0500 time 0.2571 (0.2837) data time 0.0006 (0.0047) model time 0.2565 (0.2790) loss 5.0018 (6.0331) grad_norm 2.6600 (1.8148) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:38:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][310/625] eta 0:01:28 lr 0.001632 wd 0.0500 time 0.2538 (0.2825) data time 0.0008 (0.0045) model time 0.2530 (0.2779) loss 5.3304 (6.0284) grad_norm 1.5179 (1.8054) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:38:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][320/625] eta 0:01:25 lr 0.001631 wd 0.0500 time 0.2567 (0.2813) data time 0.0008 
(0.0044) model time 0.2559 (0.2770) loss 6.7501 (6.0321) grad_norm 2.0192 (1.8020) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:38:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][330/625] eta 0:01:22 lr 0.001631 wd 0.0500 time 0.2524 (0.2803) data time 0.0009 (0.0042) model time 0.2515 (0.2760) loss 7.3930 (6.0188) grad_norm 1.6292 (1.7900) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:38:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][340/625] eta 0:01:19 lr 0.001631 wd 0.0500 time 0.2549 (0.2793) data time 0.0007 (0.0041) model time 0.2542 (0.2752) loss 5.4161 (6.0005) grad_norm 2.0707 (1.7935) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:38:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][350/625] eta 0:01:16 lr 0.001631 wd 0.0500 time 0.2536 (0.2784) data time 0.0007 (0.0040) model time 0.2529 (0.2744) loss 7.6124 (5.9968) grad_norm 1.5873 (1.7886) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:38:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][360/625] eta 0:01:13 lr 0.001631 wd 0.0500 time 0.2529 (0.2776) data time 0.0008 (0.0039) model time 0.2521 (0.2737) loss 6.3372 (5.9985) grad_norm 1.4135 (1.7765) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:38:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][370/625] eta 0:01:10 lr 0.001631 wd 0.0500 time 0.2551 (0.2768) data time 0.0009 (0.0038) model time 0.2541 (0.2730) loss 5.8750 (5.9934) grad_norm 2.8032 (1.7810) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:39:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][380/625] eta 0:01:07 lr 0.001631 wd 0.0500 time 0.2531 (0.2761) data time 0.0007 (0.0037) model time 0.2523 (0.2724) loss 5.8082 (5.9816) grad_norm 1.1486 (1.8031) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:39:04 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][390/625] eta 0:01:04 lr 0.001631 wd 0.0500 time 0.2563 (0.2754) data time 0.0007 (0.0036) model time 0.2556 (0.2718) loss 6.4727 (5.9900) grad_norm 1.7517 (1.8105) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:39:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][400/625] eta 0:01:01 lr 0.001630 wd 0.0500 time 0.2513 (0.2748) data time 0.0010 (0.0035) model time 0.2503 (0.2712) loss 6.1118 (5.9985) grad_norm 2.6737 (1.8043) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:39:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][410/625] eta 0:00:58 lr 0.001630 wd 0.0500 time 0.2539 (0.2742) data time 0.0009 (0.0035) model time 0.2530 (0.2707) loss 6.9215 (6.0083) grad_norm 1.6329 (1.8050) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:39:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][420/625] eta 0:00:56 lr 0.001630 wd 0.0500 time 0.2560 (0.2737) data time 0.0010 (0.0034) model time 0.2550 (0.2703) loss 6.9206 (6.0118) grad_norm 1.6329 (1.8050) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:39:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][430/625] eta 0:00:53 lr 0.001630 wd 0.0500 time 0.2575 (0.2732) data time 0.0006 (0.0033) model time 0.2569 (0.2699) loss 6.7907 (6.0160) grad_norm 1.2836 (1.7987) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:39:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][440/625] eta 0:00:50 lr 0.001630 wd 0.0500 time 0.2538 (0.2727) data time 0.0006 (0.0032) model time 0.2532 (0.2695) loss 5.7747 (6.0156) grad_norm 1.3248 (1.7880) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:39:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][450/625] eta 0:00:47 lr 0.001630 wd 0.0500 time 0.2527 (0.2722) data time 0.0007 
(0.0032) model time 0.2520 (0.2691) loss 6.6082 (6.0098) grad_norm 0.9923 (1.7841) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:39:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][460/625] eta 0:00:44 lr 0.001630 wd 0.0500 time 0.2554 (0.2718) data time 0.0010 (0.0031) model time 0.2545 (0.2687) loss 4.7410 (6.0058) grad_norm 1.9172 (1.7779) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:39:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][470/625] eta 0:00:42 lr 0.001629 wd 0.0500 time 0.2538 (0.2714) data time 0.0009 (0.0031) model time 0.2529 (0.2683) loss 6.5401 (6.0019) grad_norm 2.5955 (1.7748) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:39:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][480/625] eta 0:00:39 lr 0.001629 wd 0.0500 time 0.2522 (0.2710) data time 0.0010 (0.0030) model time 0.2512 (0.2680) loss 5.3902 (6.0068) grad_norm 1.5903 (1.7719) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:39:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][490/625] eta 0:00:36 lr 0.001629 wd 0.0500 time 0.2518 (0.2706) data time 0.0008 (0.0030) model time 0.2509 (0.2676) loss 6.7296 (6.0128) grad_norm 2.0176 (1.7699) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:39:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-30 13:39:30 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 13:39:32 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-30 13:41:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-30 13:41:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-30 13:41:49 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-30 13:42:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-30 13:42:13 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-30 13:42:13 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-30 13:42:13 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-30 13:42:13 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 99) [2024-07-30 13:42:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-30 13:42:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-30 13:42:24 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 13:42:26 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-30 13:44:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-30 13:44:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-30 13:44:44 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-30 13:44:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-30 13:44:56 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-30 13:44:56 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-30 13:44:56 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-30 13:44:56 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 99) [2024-07-30 13:44:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-30 13:45:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][500/625] eta 0:04:10 lr 0.001629 wd 0.0500 time 0.2661 (2.0068) data time 0.0011 (0.1789) model time 0.2650 (1.8279) loss 6.7323 (6.8263) grad_norm 2.4062 (2.1313) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:45:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][510/625] eta 0:01:52 lr 0.001629 wd 0.0500 time 0.2618 (0.9806) data time 0.0010 (0.0743) model time 0.2608 (0.9063) loss 5.5115 (6.4607) grad_norm 1.5356 (2.1142) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:45:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[99/300][520/625] eta 0:01:15 lr 0.001629 wd 0.0500 time 0.2587 (0.7152) data time 0.0011 (0.0472) model time 0.2576 (0.6681) loss 7.2154 (6.4403) grad_norm 2.5952 (2.1114) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:45:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][530/625] eta 0:00:56 lr 0.001629 wd 0.0500 time 0.2610 (0.5930) data time 0.0010 (0.0347) model time 0.2601 (0.5583) loss 6.2954 (6.3864) grad_norm 2.5479 (2.0886) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:45:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][540/625] eta 0:00:44 lr 0.001628 wd 0.0500 time 0.2608 (0.5226) data time 0.0008 (0.0275) model time 0.2601 (0.4951) loss 5.6651 (6.3047) grad_norm 2.1032 (2.1142) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:45:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][550/625] eta 0:00:35 lr 0.001628 wd 0.0500 time 0.2640 (0.4771) data time 0.0010 (0.0229) model time 0.2630 (0.4542) loss 5.9046 (6.2450) grad_norm 1.9636 (2.1109) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:45:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][560/625] eta 0:00:28 lr 0.001628 wd 0.0500 time 0.2646 (0.4454) data time 0.0010 (0.0196) model time 0.2637 (0.4258) loss 6.7667 (6.2034) grad_norm 2.6851 (2.0802) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:45:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][570/625] eta 0:00:23 lr 0.001628 wd 0.0500 time 0.2618 (0.4218) data time 0.0010 (0.0172) model time 0.2608 (0.4046) loss 6.8179 (6.1546) grad_norm 1.1978 (2.0805) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:45:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][580/625] eta 0:00:18 lr 0.001628 wd 0.0500 time 0.2648 (0.4044) data time 0.0010 (0.0154) model time 0.2638 (0.3889) loss 5.7859 (6.1124) grad_norm 1.4193 
(2.0728) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:45:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][590/625] eta 0:00:13 lr 0.001628 wd 0.0500 time 0.2606 (0.3901) data time 0.0010 (0.0140) model time 0.2596 (0.3762) loss 6.9051 (6.1183) grad_norm 1.5788 (2.0415) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:45:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][600/625] eta 0:00:09 lr 0.001628 wd 0.0500 time 0.2632 (0.3789) data time 0.0009 (0.0128) model time 0.2623 (0.3661) loss 5.7737 (6.1508) grad_norm 1.0301 (1.9957) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:45:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][610/625] eta 0:00:05 lr 0.001627 wd 0.0500 time 0.2605 (0.3694) data time 0.0007 (0.0119) model time 0.2598 (0.3575) loss 7.3536 (6.1503) grad_norm 2.0336 (1.9707) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:45:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][620/625] eta 0:00:01 lr 0.001627 wd 0.0500 time 0.2604 (0.3615) data time 0.0007 (0.0111) model time 0.2597 (0.3505) loss 6.5556 (6.1313) grad_norm 1.7611 (1.9671) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 13:45:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 99 training takes 0:00:46 [2024-07-30 13:45:47 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 13:45:50 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-30 13:45:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.453 (0.453) Loss 0.6592 (0.6592) Acc@1 87.256 (87.256) Acc@5 97.998 (97.998) Mem 9656MB [2024-07-30 13:45:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.061 (0.096) Loss 1.1260 (0.8339) Acc@1 74.463 (82.644) Acc@5 93.701 (96.529) Mem 9656MB [2024-07-30 13:45:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.076) Loss 1.2578 (0.9915) Acc@1 72.021 (78.734) Acc@5 91.895 (94.727) Mem 9656MB [2024-07-30 13:45:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.493 Acc@5 94.704 [2024-07-30 13:45:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.5% [2024-07-30 13:45:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.820 (0.820) Loss 0.5630 (0.5630) Acc@1 88.086 (88.086) Acc@5 98.438 (98.438) Mem 9656MB [2024-07-30 13:45:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.138) Loss 0.9443 (0.7158) Acc@1 78.271 (84.086) Acc@5 94.775 (97.066) Mem 9656MB [2024-07-30 13:45:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.098) Loss 1.0977 (0.8621) Acc@1 73.047 (80.259) Acc@5 92.773 (95.287) Mem 9656MB [2024-07-30 13:45:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.966 Acc@5 95.262 [2024-07-30 13:45:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.0% [2024-07-30 13:45:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.97% [2024-07-30 13:45:55 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-07-30 13:45:56 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-07-30 13:45:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][0/625] eta 0:08:04 lr 0.001627 wd 0.0500 time 0.7750 (0.7750) data time 0.4093 (0.4093) model time 0.0000 (0.0000) loss 6.3431 (6.3431) grad_norm 1.5299 (1.5299) loss_scale 2048.0000 (2048.0000) mem 9651MB [2024-07-30 13:46:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][10/625] eta 0:03:10 lr 0.001627 wd 0.0500 time 0.2641 (0.3095) data time 0.0010 (0.0383) model time 0.0000 (0.0000) loss 6.0243 (5.9600) grad_norm 2.5578 (1.8049) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:46:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][20/625] eta 0:02:53 lr 0.001627 wd 0.0500 time 0.2630 (0.2872) data time 0.0009 (0.0205) model time 0.0000 (0.0000) loss 6.4321 (5.8883) grad_norm 1.8529 (1.7212) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:46:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][30/625] eta 0:02:46 lr 0.001627 wd 0.0500 time 0.2645 (0.2795) data time 0.0010 (0.0142) model time 0.0000 (0.0000) loss 6.1883 (6.0064) grad_norm 1.1805 (1.6319) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:46:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][40/625] eta 0:02:41 lr 0.001627 wd 0.0500 time 0.2615 (0.2758) data time 0.0012 (0.0110) model time 0.0000 (0.0000) loss 5.5283 (5.9984) grad_norm 1.1675 (1.5688) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:46:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][50/625] eta 0:02:37 lr 0.001627 wd 0.0500 time 0.2617 (0.2734) data time 0.0010 (0.0091) model time 0.0000 (0.0000) loss 6.4057 (6.0126) grad_norm 2.2178 (1.5728) loss_scale 2048.0000 (2048.0000) mem 9655MB 
[2024-07-30 13:46:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][60/625] eta 0:02:33 lr 0.001626 wd 0.0500 time 0.2624 (0.2718) data time 0.0010 (0.0077) model time 0.2614 (0.2622) loss 6.3274 (6.0201) grad_norm 3.1281 (1.6440) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:46:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][70/625] eta 0:02:30 lr 0.001626 wd 0.0500 time 0.2622 (0.2706) data time 0.0007 (0.0068) model time 0.2614 (0.2625) loss 6.4942 (5.9738) grad_norm 2.3297 (1.6415) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:46:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][80/625] eta 0:02:27 lr 0.001626 wd 0.0500 time 0.2612 (0.2700) data time 0.0011 (0.0061) model time 0.2601 (0.2633) loss 5.6578 (5.9653) grad_norm 2.1454 (1.6928) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:46:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][90/625] eta 0:02:24 lr 0.001626 wd 0.0500 time 0.2648 (0.2695) data time 0.0008 (0.0055) model time 0.2640 (0.2635) loss 6.9268 (5.9233) grad_norm 1.7496 (1.7375) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:46:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][100/625] eta 0:02:21 lr 0.001626 wd 0.0500 time 0.2648 (0.2689) data time 0.0007 (0.0051) model time 0.2641 (0.2633) loss 5.8958 (5.9249) grad_norm 1.2430 (1.6991) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:46:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][110/625] eta 0:02:18 lr 0.001626 wd 0.0500 time 0.2637 (0.2685) data time 0.0007 (0.0047) model time 0.2630 (0.2634) loss 6.4614 (5.9316) grad_norm 1.2937 (1.6591) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:46:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][120/625] eta 0:02:15 lr 0.001626 wd 0.0500 time 0.2660 
(0.2682) data time 0.0008 (0.0044) model time 0.2652 (0.2634) loss 7.3688 (5.9111) grad_norm 1.6258 (1.6439) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:46:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][130/625] eta 0:02:12 lr 0.001625 wd 0.0500 time 0.2659 (0.2680) data time 0.0011 (0.0041) model time 0.2648 (0.2636) loss 6.2846 (5.8862) grad_norm 1.4144 (1.6529) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 13:46:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-30 13:46:34 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 13:46:36 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-30 13:49:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-30 13:49:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-30 13:50:06 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-30 13:50:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-30 13:50:17 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... 
[2024-07-30 13:50:17 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-30 13:50:17 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-30 13:50:17 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 100) [2024-07-30 13:50:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-30 13:50:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][140/625] eta 0:29:35 lr 0.001625 wd 0.0500 time 0.2460 (3.6599) data time 0.0008 (0.2769) model time 0.2452 (3.3831) loss 5.5086 (6.6118) grad_norm 1.4370 (1.7396) loss_scale 2048.0000 (2048.0000) mem 9661MB [2024-07-30 13:50:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-30 13:50:36 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 13:50:37 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-30 14:17:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-30 14:17:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-30 14:17:56 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-07-30 14:18:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-30 14:18:17 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-30 14:18:18 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-30 14:18:18 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-30 14:18:18 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 100) [2024-07-30 14:18:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-30 14:18:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-30 14:18:31 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 14:18:33 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-30 14:20:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-30 14:20:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-30 14:20:25 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-07-30 14:20:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-30 14:20:35 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-30 14:20:35 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-30 14:20:35 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-30 14:20:35 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 100) [2024-07-30 14:20:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-30 14:20:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][150/625] eta 1:06:35 lr 0.001625 wd 0.0500 time 8.4114 (8.4114) data time 0.7153 (0.7153) model time 7.6961 (7.6961) loss 6.6924 (6.6924) grad_norm 1.6127 (1.6127) loss_scale 2048.0000 (2048.0000) mem 10976MB [2024-07-30 14:20:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][160/625] eta 0:08:14 lr 0.001625 wd 0.0500 time 0.2566 (1.0640) data time 0.0011 (0.0660) model time 0.2555 (0.9980) loss 4.7356 (6.1863) grad_norm 1.0028 (1.7121) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:20:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][170/625] eta 0:05:09 lr 0.001625 wd 0.0500 time 0.2554 (0.6808) data time 0.0012 (0.0351) model time 0.2542 (0.6457) loss 6.1071 (6.1197) grad_norm 1.4951 (1.5615) loss_scale 4096.0000 (2145.5238) mem 9656MB [2024-07-30 14:20:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][180/625] eta 0:04:02 lr 0.001625 wd 0.0500 time 0.2591 (0.5451) data time 0.0009 (0.0242) model time 0.2582 (0.5209) loss 4.9744 
(6.2317) grad_norm 1.1976 (1.6031) loss_scale 4096.0000 (2774.7097) mem 9656MB [2024-07-30 14:20:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][190/625] eta 0:03:27 lr 0.001625 wd 0.0500 time 0.2571 (0.4759) data time 0.0013 (0.0186) model time 0.2559 (0.4573) loss 5.3152 (6.1358) grad_norm 1.4507 (1.5627) loss_scale 4096.0000 (3096.9756) mem 9656MB [2024-07-30 14:21:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][200/625] eta 0:03:04 lr 0.001624 wd 0.0500 time 0.2661 (0.4338) data time 0.0007 (0.0152) model time 0.2654 (0.4186) loss 6.7394 (6.1481) grad_norm 1.6179 (1.6405) loss_scale 4096.0000 (3292.8627) mem 9656MB [2024-07-30 14:21:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][210/625] eta 0:02:48 lr 0.001624 wd 0.0500 time 0.2541 (0.4054) data time 0.0011 (0.0129) model time 0.2531 (0.3925) loss 6.3669 (6.1096) grad_norm 1.2742 (1.6201) loss_scale 4096.0000 (3424.5246) mem 9656MB [2024-07-30 14:21:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][220/625] eta 0:02:36 lr 0.001624 wd 0.0500 time 0.2627 (0.3852) data time 0.0011 (0.0112) model time 0.2616 (0.3740) loss 5.7600 (6.0495) grad_norm 1.5009 (1.5743) loss_scale 4096.0000 (3519.0986) mem 9656MB [2024-07-30 14:21:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][230/625] eta 0:02:26 lr 0.001624 wd 0.0500 time 0.2565 (0.3697) data time 0.0012 (0.0100) model time 0.2553 (0.3597) loss 5.0923 (6.0421) grad_norm 1.2456 (1.5809) loss_scale 4096.0000 (3590.3210) mem 9656MB [2024-07-30 14:21:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][240/625] eta 0:02:17 lr 0.001624 wd 0.0500 time 0.2611 (0.3579) data time 0.0008 (0.0090) model time 0.2603 (0.3489) loss 6.7573 (6.0130) grad_norm 1.3074 (1.5942) loss_scale 4096.0000 (3645.8901) mem 9656MB [2024-07-30 14:21:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [100/300][250/625] eta 0:02:10 lr 0.001624 wd 0.0500 time 0.2579 (0.3481) data time 0.0011 (0.0082) model time 0.2568 (0.3399) loss 6.7926 (6.0432) grad_norm 1.6207 (1.5838) loss_scale 4096.0000 (3690.4554) mem 9656MB [2024-07-30 14:21:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][260/625] eta 0:02:04 lr 0.001624 wd 0.0500 time 0.2591 (0.3402) data time 0.0014 (0.0076) model time 0.2577 (0.3326) loss 6.2493 (6.0465) grad_norm 3.9593 (1.6599) loss_scale 4096.0000 (3726.9910) mem 9656MB [2024-07-30 14:21:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][270/625] eta 0:01:58 lr 0.001623 wd 0.0500 time 0.2605 (0.3338) data time 0.0007 (0.0071) model time 0.2597 (0.3267) loss 4.8137 (6.0626) grad_norm 2.3996 (1.6826) loss_scale 4096.0000 (3757.4876) mem 9656MB [2024-07-30 14:21:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][280/625] eta 0:01:53 lr 0.001623 wd 0.0500 time 0.2593 (0.3282) data time 0.0011 (0.0066) model time 0.2582 (0.3216) loss 6.8240 (6.0472) grad_norm 1.5432 (1.6929) loss_scale 4096.0000 (3783.3282) mem 9656MB [2024-07-30 14:21:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][290/625] eta 0:01:48 lr 0.001623 wd 0.0500 time 0.2541 (0.3233) data time 0.0010 (0.0062) model time 0.2531 (0.3171) loss 6.9865 (6.0441) grad_norm 1.8120 (1.6918) loss_scale 4096.0000 (3805.5035) mem 9656MB [2024-07-30 14:21:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][300/625] eta 0:01:43 lr 0.001623 wd 0.0500 time 0.2620 (0.3193) data time 0.0013 (0.0059) model time 0.2607 (0.3134) loss 5.2044 (6.0334) grad_norm 3.2184 (1.7375) loss_scale 4096.0000 (3824.7417) mem 9656MB [2024-07-30 14:21:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][310/625] eta 0:01:39 lr 0.001623 wd 0.0500 time 0.2640 (0.3157) data time 0.0010 (0.0056) model time 0.2630 (0.3101) loss 6.7245 
(6.0515) grad_norm 2.1373 (1.7889) loss_scale 4096.0000 (3841.5901) mem 9656MB [2024-07-30 14:21:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][320/625] eta 0:01:35 lr 0.001623 wd 0.0500 time 0.2614 (0.3127) data time 0.0012 (0.0053) model time 0.2602 (0.3074) loss 6.2265 (6.0532) grad_norm 1.2196 (1.7970) loss_scale 4096.0000 (3856.4678) mem 9656MB [2024-07-30 14:21:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][330/625] eta 0:01:31 lr 0.001623 wd 0.0500 time 0.2611 (0.3099) data time 0.0012 (0.0051) model time 0.2599 (0.3048) loss 6.9058 (6.0369) grad_norm 2.4172 (1.8178) loss_scale 4096.0000 (3869.7017) mem 9656MB [2024-07-30 14:21:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][340/625] eta 0:01:27 lr 0.001623 wd 0.0500 time 0.2579 (0.3074) data time 0.0011 (0.0049) model time 0.2569 (0.3025) loss 5.8483 (6.0373) grad_norm 1.4533 (1.8175) loss_scale 4096.0000 (3881.5497) mem 9656MB [2024-07-30 14:21:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][350/625] eta 0:01:23 lr 0.001622 wd 0.0500 time 0.2566 (0.3052) data time 0.0013 (0.0047) model time 0.2553 (0.3005) loss 5.3903 (6.0173) grad_norm 2.1752 (1.8060) loss_scale 4096.0000 (3892.2189) mem 9656MB [2024-07-30 14:21:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][360/625] eta 0:01:20 lr 0.001622 wd 0.0500 time 0.2645 (0.3031) data time 0.0010 (0.0046) model time 0.2635 (0.2986) loss 6.9910 (6.0052) grad_norm 1.5025 (1.7949) loss_scale 4096.0000 (3901.8768) mem 9656MB [2024-07-30 14:21:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][370/625] eta 0:01:16 lr 0.001622 wd 0.0500 time 0.2648 (0.3013) data time 0.0008 (0.0044) model time 0.2641 (0.2969) loss 5.9898 (5.9948) grad_norm 1.1830 (1.8082) loss_scale 4096.0000 (3910.6606) mem 9656MB [2024-07-30 14:21:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [100/300][380/625] eta 0:01:13 lr 0.001622 wd 0.0500 time 0.2624 (0.2996) data time 0.0010 (0.0043) model time 0.2614 (0.2954) loss 4.6394 (5.9925) grad_norm 1.4288 (1.8142) loss_scale 4096.0000 (3918.6840) mem 9656MB
[2024-07-30 14:21:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][390/625] eta 0:01:10 lr 0.001622 wd 0.0500 time 0.2651 (0.2981) data time 0.0009 (0.0041) model time 0.2642 (0.2940) loss 6.3379 (5.9903) grad_norm 1.1230 (1.7963) loss_scale 4096.0000 (3926.0415) mem 9656MB
[2024-07-30 14:21:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][400/625] eta 0:01:06 lr 0.001622 wd 0.0500 time 0.2649 (0.2967) data time 0.0008 (0.0040) model time 0.2642 (0.2926) loss 5.4612 (5.9770) grad_norm 1.4776 (1.7815) loss_scale 4096.0000 (3932.8127) mem 9656MB
[2024-07-30 14:21:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][410/625] eta 0:01:03 lr 0.001622 wd 0.0500 time 0.2615 (0.2953) data time 0.0010 (0.0039) model time 0.2606 (0.2914) loss 6.0926 (5.9666) grad_norm 2.3680 (1.7817) loss_scale 4096.0000 (3939.0651) mem 9656MB
[2024-07-30 14:22:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][420/625] eta 0:01:00 lr 0.001621 wd 0.0500 time 0.2652 (0.2941) data time 0.0008 (0.0038) model time 0.2645 (0.2903) loss 6.4889 (5.9595) grad_norm 1.1967 (1.7721) loss_scale 4096.0000 (3944.8561) mem 9656MB
[2024-07-30 14:22:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][430/625] eta 0:00:57 lr 0.001621 wd 0.0500 time 0.2611 (0.2930) data time 0.0011 (0.0037) model time 0.2601 (0.2892) loss 5.7528 (5.9732) grad_norm 1.3289 (1.7760) loss_scale 4096.0000 (3950.2349) mem 9656MB
[2024-07-30 14:22:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][440/625] eta 0:00:54 lr 0.001621 wd 0.0500 time 0.2710 (0.2920) data time 0.0010 (0.0036) model time 0.2699 (0.2884) loss 5.2285 (5.9756) grad_norm 2.9942 (1.7893) loss_scale 4096.0000 (3955.2440) mem 9656MB
[2024-07-30 14:22:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][450/625] eta 0:00:50 lr 0.001621 wd 0.0500 time 0.2589 (0.2910) data time 0.0011 (0.0035) model time 0.2578 (0.2875) loss 5.0863 (5.9623) grad_norm 3.9526 (1.8099) loss_scale 4096.0000 (3959.9203) mem 9656MB
[2024-07-30 14:22:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][460/625] eta 0:00:47 lr 0.001621 wd 0.0500 time 0.2641 (0.2902) data time 0.0012 (0.0035) model time 0.2629 (0.2867) loss 6.8037 (5.9660) grad_norm 1.3918 (1.8069) loss_scale 4096.0000 (3964.2958) mem 9656MB
[2024-07-30 14:22:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][470/625] eta 0:00:44 lr 0.001621 wd 0.0500 time 0.2652 (0.2894) data time 0.0009 (0.0034) model time 0.2643 (0.2860) loss 7.2809 (5.9887) grad_norm 1.4170 (1.7980) loss_scale 4096.0000 (3968.3988) mem 9656MB
[2024-07-30 14:22:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][480/625] eta 0:00:41 lr 0.001621 wd 0.0500 time 0.2604 (0.2886) data time 0.0011 (0.0033) model time 0.2594 (0.2853) loss 4.9661 (5.9874) grad_norm 1.1822 (1.7787) loss_scale 4096.0000 (3972.2538) mem 9656MB
[2024-07-30 14:22:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][490/625] eta 0:00:38 lr 0.001620 wd 0.0500 time 0.2615 (0.2879) data time 0.0011 (0.0033) model time 0.2604 (0.2846) loss 6.8793 (5.9939) grad_norm 1.8258 (1.7796) loss_scale 4096.0000 (3975.8827) mem 9656MB
[2024-07-30 14:22:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][500/625] eta 0:00:35 lr 0.001620 wd 0.0500 time 0.2700 (0.2872) data time 0.0008 (0.0032) model time 0.2692 (0.2840) loss 5.9372 (5.9924) grad_norm 2.2389 (1.7733) loss_scale 4096.0000 (3979.3048) mem 9656MB
[2024-07-30 14:22:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][510/625] eta 0:00:32 lr 0.001620 wd 0.0500 time 0.2655 (0.2866) data time 0.0008 (0.0032) model time 0.2647 (0.2834) loss 5.2528 (5.9998) grad_norm 1.3335 (1.7672) loss_scale 4096.0000 (3982.5374) mem 9656MB
[2024-07-30 14:22:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][520/625] eta 0:00:30 lr 0.001620 wd 0.0500 time 0.2633 (0.2859) data time 0.0010 (0.0031) model time 0.2622 (0.2828) loss 4.6403 (5.9964) grad_norm 1.4262 (1.7638) loss_scale 4096.0000 (3985.5957) mem 9656MB
[2024-07-30 14:22:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][530/625] eta 0:00:27 lr 0.001620 wd 0.0500 time 0.2604 (0.2853) data time 0.0009 (0.0031) model time 0.2595 (0.2823) loss 4.7759 (5.9933) grad_norm 1.1956 (1.7608) loss_scale 4096.0000 (3988.4934) mem 9656MB
[2024-07-30 14:22:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][540/625] eta 0:00:24 lr 0.001620 wd 0.0500 time 0.2669 (0.2848) data time 0.0008 (0.0030) model time 0.2660 (0.2818) loss 7.1701 (5.9884) grad_norm 1.4577 (1.7603) loss_scale 4096.0000 (3991.2430) mem 9656MB
[2024-07-30 14:22:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][550/625] eta 0:00:21 lr 0.001620 wd 0.0500 time 0.2614 (0.2843) data time 0.0012 (0.0030) model time 0.2602 (0.2814) loss 7.1138 (5.9969) grad_norm 1.3325 (1.7675) loss_scale 4096.0000 (3993.8554) mem 9656MB
[2024-07-30 14:22:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][560/625] eta 0:00:18 lr 0.001619 wd 0.0500 time 0.2626 (0.2838) data time 0.0012 (0.0029) model time 0.2614 (0.2809) loss 6.3990 (6.0004) grad_norm 1.5073 (1.7675) loss_scale 4096.0000 (3996.3406) mem 9656MB
[2024-07-30 14:22:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][570/625] eta 0:00:15 lr 0.001619 wd 0.0500 time 0.2516 (0.2840) data time 0.0009 (0.0029) model time 0.2507 (0.2811) loss 6.4185 (5.9959) grad_norm 1.5188 (1.7581) loss_scale 4096.0000 (3998.7078) mem 9656MB
[2024-07-30 14:22:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][580/625] eta 0:00:12 lr 0.001619 wd 0.0500 time 0.2655 (0.2835) data time 0.0008 (0.0028) model time 0.2648 (0.2807) loss 6.0859 (6.0009) grad_norm 1.4730 (1.7487) loss_scale 4096.0000 (4000.9652) mem 9656MB
[2024-07-30 14:22:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][590/625] eta 0:00:09 lr 0.001619 wd 0.0500 time 0.2546 (0.2831) data time 0.0009 (0.0028) model time 0.2537 (0.2803) loss 6.5774 (6.0067) grad_norm 1.6893 (1.7535) loss_scale 4096.0000 (4003.1202) mem 9656MB
[2024-07-30 14:22:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][600/625] eta 0:00:07 lr 0.001619 wd 0.0500 time 0.2614 (0.2857) data time 0.0009 (0.0028) model time 0.2605 (0.2829) loss 5.4986 (6.0058) grad_norm 2.1093 (1.7549) loss_scale 4096.0000 (4005.1796) mem 9656MB
[2024-07-30 14:22:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][610/625] eta 0:00:04 lr 0.001619 wd 0.0500 time 0.2650 (0.2852) data time 0.0007 (0.0027) model time 0.2643 (0.2825) loss 6.3088 (5.9951) grad_norm 2.1150 (1.7577) loss_scale 4096.0000 (4007.1497) mem 9656MB
[2024-07-30 14:22:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][620/625] eta 0:00:01 lr 0.001619 wd 0.0500 time 0.2639 (0.2848) data time 0.0007 (0.0027) model time 0.2631 (0.2821) loss 5.7364 (5.9840) grad_norm 1.3587 (1.7537) loss_scale 4096.0000 (4009.0361) mem 9656MB
[2024-07-30 14:22:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 100 training takes 0:02:15
[2024-07-30 14:22:55 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 14:22:56 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 14:22:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.441 (0.441) Loss 0.6470 (0.6470) Acc@1 87.256 (87.256) Acc@5 97.705 (97.705) Mem 9656MB
[2024-07-30 14:22:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.094) Loss 1.0898 (0.8103) Acc@1 75.342 (82.431) Acc@5 93.701 (96.591) Mem 9656MB
[2024-07-30 14:22:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.056 (0.076) Loss 1.2539 (0.9723) Acc@1 71.729 (78.706) Acc@5 91.504 (94.661) Mem 9656MB
[2024-07-30 14:23:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.389 Acc@5 94.624
[2024-07-30 14:23:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.4%
[2024-07-30 14:23:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.813 (0.813) Loss 0.5635 (0.5635) Acc@1 88.135 (88.135) Acc@5 98.389 (98.389) Mem 9656MB
[2024-07-30 14:27:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 14:27:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 14:28:07 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 14:28:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 14:28:18 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 14:28:18 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 14:28:18 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 14:28:18 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 100)
[2024-07-30 14:28:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 14:28:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][0/625] eta 1:18:10 lr 0.001619 wd 0.0500 time 7.5045 (7.5045) data time 1.4622 (1.4622) model time 0.0000 (0.0000) loss 6.9671 (6.9671) grad_norm 1.5535 (1.5535) loss_scale 4096.0000 (4096.0000) mem 10976MB
[2024-07-30 14:28:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-30 14:28:32 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 14:28:33 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 14:30:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 14:30:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 14:30:50 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 14:31:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 14:31:07 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 14:31:07 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 14:31:07 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 14:31:07 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 101)
[2024-07-30 14:31:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 14:31:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][10/625] eta 0:19:06 lr 0.001618 wd 0.0500 time 0.2437 (1.8648) data time 0.0016 (0.2031) model time 0.0000 (0.0000) loss 6.1029 (6.4870) grad_norm 3.3802 (2.1817) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:31:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][20/625] eta 0:09:11 lr 0.001618 wd 0.0500 time 0.2432 (0.9118) data time 0.0009 (0.0843) model time 0.0000 (0.0000) loss 6.1231 (6.3393) grad_norm 1.2074 (1.9154) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:31:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][30/625] eta 0:06:40 lr 0.001618 wd 0.0500 time 0.3216 (0.6739) data time 0.0013 (0.0535) model time 0.0000 (0.0000) loss 6.7169 (6.3973) grad_norm 1.7995 (2.1034) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:31:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][40/625] eta 0:05:28 lr 0.001618 wd 0.0500 time 0.2433 (0.5616) data time 0.0014 (0.0393) model time 0.0000 (0.0000) loss 4.9003 (6.2728) grad_norm 1.6651 (2.0031) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:31:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][50/625] eta 0:04:44 lr 0.001618 wd 0.0500 time 0.2423 (0.4944) data time 0.0007 (0.0312) model time 0.0000 (0.0000) loss 6.4792 (6.2643) grad_norm 1.0633 (1.8612) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:31:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][60/625] eta 0:04:15 lr 0.001618 wd 0.0500 time 0.2996 (0.4520) data time 0.0014 (0.0259) model time 0.2982 (0.2520) loss 6.6771 (6.2301) grad_norm 2.8558 (1.8564) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:31:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][70/625] eta 0:03:55 lr 0.001618 wd 0.0500 time 0.2455 (0.4241) data time 0.0011 (0.0222) model time 0.2444 (0.2577) loss 6.9624 (6.1991) grad_norm 1.2293 (1.8533) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:31:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][80/625] eta 0:03:38 lr 0.001617 wd 0.0500 time 0.2429 (0.4007) data time 0.0009 (0.0195) model time 0.2420 (0.2528) loss 5.3602 (6.1488) grad_norm 2.1081 (1.8142) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:31:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][90/625] eta 0:03:25 lr 0.001617 wd 0.0500 time 0.2470 (0.3848) data time 0.0014 (0.0174) model time 0.2456 (0.2549) loss 5.7746 (6.1182) grad_norm 1.5948 (1.8164) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:31:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][100/625] eta 0:03:14 lr 0.001617 wd 0.0500 time 0.2472 (0.3705) data time 0.0012 (0.0157) model time 0.2460 (0.2530) loss 6.9862 (6.1430) grad_norm 2.3947 (1.8153) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:31:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][110/625] eta 0:03:06 lr 0.001617 wd 0.0500 time 0.2434 (0.3613) data time 0.0007 (0.0143) model time 0.2426 (0.2559) loss 5.7994 (6.1604) grad_norm 2.3272 (1.7892) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:31:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][120/625] eta 0:02:57 lr 0.001617 wd 0.0500 time 0.2427 (0.3513) data time 0.0011 (0.0132) model time 0.2416 (0.2541) loss 6.3088 (6.1547) grad_norm 1.9797 (1.7627) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:31:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][130/625] eta 0:02:49 lr 0.001617 wd 0.0500 time 0.2431 (0.3431) data time 0.0011 (0.0122) model time 0.2419 (0.2531) loss 7.5285 (6.1413) grad_norm 1.5574 (1.7510) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:31:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][140/625] eta 0:02:43 lr 0.001617 wd 0.0500 time 0.2490 (0.3366) data time 0.0009 (0.0114) model time 0.2481 (0.2531) loss 5.0881 (6.1354) grad_norm 1.7945 (1.7931) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:32:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][150/625] eta 0:02:37 lr 0.001616 wd 0.0500 time 0.2715 (0.3314) data time 0.0014 (0.0107) model time 0.2701 (0.2537) loss 5.4090 (6.1199) grad_norm 2.6160 (1.8058) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:32:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][160/625] eta 0:02:31 lr 0.001616 wd 0.0500 time 0.2895 (0.3264) data time 0.0009 (0.0101) model time 0.2886 (0.2536) loss 4.9401 (6.1111) grad_norm 1.1156 (1.8062) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:32:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][170/625] eta 0:02:26 lr 0.001616 wd 0.0500 time 0.2421 (0.3221) data time 0.0012 (0.0095) model time 0.2408 (0.2536) loss 5.8133 (6.1027) grad_norm 1.3609 (1.8223) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:32:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][180/625] eta 0:02:21 lr 0.001616 wd 0.0500 time 0.3063 (0.3182) data time 0.0009 (0.0091) model time 0.3054 (0.2534) loss 6.3576 (6.0920) grad_norm 1.5605 (1.8046) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:32:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][190/625] eta 0:02:17 lr 0.001616 wd 0.0500 time 0.2506 (0.3155) data time 0.0007 (0.0086) model time 0.2499 (0.2544) loss 5.6041 (6.0716) grad_norm 1.3769 (1.7825) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:32:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][200/625] eta 0:02:12 lr 0.001616 wd 0.0500 time 0.2438 (0.3120) data time 0.0008 (0.0082) model time 0.2430 (0.2538) loss 6.6483 (6.0459) grad_norm 1.1456 (1.7635) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:32:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][210/625] eta 0:02:08 lr 0.001616 wd 0.0500 time 0.2428 (0.3089) data time 0.0008 (0.0079) model time 0.2421 (0.2534) loss 5.3944 (6.0224) grad_norm 1.3743 (1.7519) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:32:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][220/625] eta 0:02:04 lr 0.001615 wd 0.0500 time 0.3199 (0.3072) data time 0.0009 (0.0076) model time 0.3190 (0.2544) loss 7.2519 (6.0167) grad_norm 2.1405 (1.7493) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:32:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][230/625] eta 0:02:00 lr 0.001615 wd 0.0500 time 0.2470 (0.3047) data time 0.0011 (0.0073) model time 0.2459 (0.2541) loss 6.1692 (6.0119) grad_norm 1.3780 (1.7555) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:32:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][240/625] eta 0:01:56 lr 0.001615 wd 0.0500 time 0.2467 (0.3028) data time 0.0007 (0.0070) model time 0.2460 (0.2543) loss 5.7739 (5.9975) grad_norm 1.2415 (1.7557) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:32:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][250/625] eta 0:01:52 lr 0.001615 wd 0.0500 time 0.2477 (0.3004) data time 0.0009 (0.0068) model time 0.2468 (0.2538) loss 6.1574 (5.9913) grad_norm 1.6458 (1.7495) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:32:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][260/625] eta 0:01:49 lr 0.001615 wd 0.0500 time 0.2468 (0.2988) data time 0.0007 (0.0066) model time 0.2460 (0.2539) loss 4.6463 (5.9754) grad_norm 1.9787 (1.7399) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:32:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][270/625] eta 0:01:45 lr 0.001615 wd 0.0500 time 0.2585 (0.2973) data time 0.0006 (0.0064) model time 0.2579 (0.2542) loss 4.6472 (5.9652) grad_norm 1.6540 (1.7364) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:32:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][280/625] eta 0:01:41 lr 0.001615 wd 0.0500 time 0.2429 (0.2955) data time 0.0009 (0.0062) model time 0.2420 (0.2538) loss 6.4856 (5.9769) grad_norm 1.4381 (1.7532) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:32:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][290/625] eta 0:01:38 lr 0.001614 wd 0.0500 time 0.2461 (0.2942) data time 0.0007 (0.0060) model time 0.2454 (0.2540) loss 7.4518 (5.9789) grad_norm 1.5019 (1.7579) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:32:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][300/625] eta 0:01:35 lr 0.001614 wd 0.0500 time 0.2465 (0.2931) data time 0.0009 (0.0058) model time 0.2457 (0.2542) loss 4.7396 (5.9595) grad_norm 1.0379 (1.7559) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:32:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][310/625] eta 0:01:31 lr 0.001614 wd 0.0500 time 0.2457 (0.2915) data time 0.0006 (0.0057) model time 0.2450 (0.2538) loss 7.2711 (5.9589) grad_norm 1.3831 (1.7477) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:32:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][320/625] eta 0:01:28 lr 0.001614 wd 0.0500 time 0.2468 (0.2903) data time 0.0011 (0.0055) model time 0.2458 (0.2537) loss 6.9657 (5.9744) grad_norm 1.0787 (1.7438) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:32:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][330/625] eta 0:01:25 lr 0.001614 wd 0.0500 time 0.2433 (0.2889) data time 0.0010 (0.0054) model time 0.2423 (0.2534) loss 5.2018 (5.9862) grad_norm 1.3237 (1.7333) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:32:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][340/625] eta 0:01:22 lr 0.001614 wd 0.0500 time 0.3249 (0.2881) data time 0.0009 (0.0053) model time 0.3240 (0.2536) loss 5.7849 (5.9860) grad_norm 1.3085 (1.7331) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:32:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][350/625] eta 0:01:18 lr 0.001614 wd 0.0500 time 0.2441 (0.2871) data time 0.0011 (0.0052) model time 0.2430 (0.2535) loss 5.3897 (5.9890) grad_norm 2.6603 (1.7372) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:32:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][360/625] eta 0:01:15 lr 0.001613 wd 0.0500 time 0.2436 (0.2859) data time 0.0009 (0.0050) model time 0.2426 (0.2533) loss 4.9758 (5.9901) grad_norm 1.4872 (1.7429) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:32:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][370/625] eta 0:01:12 lr 0.001613 wd 0.0500 time 0.2454 (0.2852) data time 0.0009 (0.0049) model time 0.2445 (0.2534) loss 5.1529 (5.9874) grad_norm 1.3611 (1.7422) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:32:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][380/625] eta 0:01:09 lr 0.001613 wd 0.0500 time 0.2470 (0.2843) data time 0.0010 (0.0048) model time 0.2459 (0.2533) loss 6.1593 (5.9897) grad_norm 1.4795 (1.7426) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:33:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][390/625] eta 0:01:06 lr 0.001613 wd 0.0500 time 0.2943 (0.2838) data time 0.0007 (0.0047) model time 0.2936 (0.2536) loss 6.5722 (5.9907) grad_norm 1.3697 (1.7367) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:33:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][400/625] eta 0:01:03 lr 0.001613 wd 0.0500 time 0.2466 (0.2829) data time 0.0007 (0.0046) model time 0.2459 (0.2534) loss 6.1856 (5.9962) grad_norm 1.7195 (1.7354) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:33:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][410/625] eta 0:01:00 lr 0.001613 wd 0.0500 time 0.2442 (0.2820) data time 0.0009 (0.0046) model time 0.2433 (0.2532) loss 5.8310 (6.0021) grad_norm 1.8337 (1.7291) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:33:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][420/625] eta 0:00:57 lr 0.001613 wd 0.0500 time 0.2529 (0.2816) data time 0.0007 (0.0045) model time 0.2522 (0.2535) loss 4.7397 (6.0040) grad_norm 1.2292 (1.7337) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:33:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][430/625] eta 0:00:54 lr 0.001612 wd 0.0500 time 0.2452 (0.2813) data time 0.0009 (0.0044) model time 0.2443 (0.2539) loss 6.5136 (6.0082) grad_norm 1.3075 (1.7334) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:33:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][440/625] eta 0:00:51 lr 0.001612 wd 0.0500 time 0.2450 (0.2808) data time 0.0009 (0.0043) model time 0.2440 (0.2540) loss 6.7599 (6.0140) grad_norm 1.7515 (1.7302) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:33:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][450/625] eta 0:00:48 lr 0.001612 wd 0.0500 time 0.2432 (0.2800) data time 0.0010 (0.0042) model time 0.2422 (0.2537) loss 5.4611 (6.0118) grad_norm 2.0005 (1.7339) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:33:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][460/625] eta 0:00:46 lr 0.001612 wd 0.0500 time 0.2429 (0.2795) data time 0.0011 (0.0042) model time 0.2418 (0.2538) loss 4.7036 (6.0044) grad_norm 2.5053 (1.7468) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:33:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][470/625] eta 0:00:43 lr 0.001612 wd 0.0500 time 0.2447 (0.2790) data time 0.0009 (0.0041) model time 0.2437 (0.2538) loss 4.8040 (5.9992) grad_norm 2.8380 (1.7527) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:33:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][480/625] eta 0:00:40 lr 0.001612 wd 0.0500 time 0.2425 (0.2784) data time 0.0009 (0.0040) model time 0.2416 (0.2538) loss 7.8782 (5.9969) grad_norm 1.5191 (1.7568) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:33:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][490/625] eta 0:00:37 lr 0.001612 wd 0.0500 time 0.2643 (0.2781) data time 0.0010 (0.0040) model time 0.2633 (0.2540) loss 5.6794 (6.0004) grad_norm 1.8366 (1.7600) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:33:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][500/625] eta 0:00:34 lr 0.001611 wd 0.0500 time 0.2431 (0.2777) data time 0.0007 (0.0039) model time 0.2423 (0.2540) loss 5.3350 (5.9933) grad_norm 1.5285 (1.7557) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:33:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][510/625] eta 0:00:31 lr 0.001611 wd 0.0500 time 0.2451 (0.2771) data time 0.0010 (0.0039) model time 0.2441 (0.2538) loss 5.6743 (5.9889) grad_norm 1.2873 (1.7503) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:33:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][520/625] eta 0:00:29 lr 0.001611 wd 0.0500 time 0.2426 (0.2767) data time 0.0012 (0.0038) model time 0.2414 (0.2538) loss 6.4613 (5.9967) grad_norm 1.9256 (1.7461) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:33:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][530/625] eta 0:00:26 lr 0.001611 wd 0.0500 time 0.2449 (0.2761) data time 0.0009 (0.0038) model time 0.2440 (0.2537) loss 5.9588 (5.9895) grad_norm 3.2018 (1.7489) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:33:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][540/625] eta 0:00:23 lr 0.001611 wd 0.0500 time 0.2441 (0.2760) data time 0.0007 (0.0037) model time 0.2434 (0.2540) loss 4.8251 (5.9839) grad_norm 2.0183 (1.7632) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:33:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][550/625] eta 0:00:20 lr 0.001611 wd 0.0500 time 0.2435 (0.2755) data time 0.0008 (0.0037) model time 0.2428 (0.2538) loss 6.2539 (5.9882) grad_norm 1.4823 (1.7564) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:33:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][560/625] eta 0:00:17 lr 0.001611 wd 0.0500 time 0.2463 (0.2749) data time 0.0010 (0.0036) model time 0.2453 (0.2536) loss 4.5585 (5.9949) grad_norm 1.5704 (1.7519) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:33:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][570/625] eta 0:00:15 lr 0.001611 wd 0.0500 time 0.2595 (0.2746) data time 0.0009 (0.0036) model time 0.2586 (0.2537) loss 6.2388 (6.0024) grad_norm 1.8591 (1.7642) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:33:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][580/625] eta 0:00:12 lr 0.001610 wd 0.0500 time 0.2457 (0.2742) data time 0.0007 (0.0035) model time 0.2449 (0.2537) loss 6.9046 (6.0051) grad_norm 1.2714 (1.7638) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:33:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][590/625] eta 0:00:09 lr 0.001610 wd 0.0500 time 0.2433 (0.2737) data time 0.0008 (0.0035) model time 0.2425 (0.2535) loss 6.7746 (6.0107) grad_norm 1.3053 (1.7618) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:33:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][600/625] eta 0:00:06 lr 0.001610 wd 0.0500 time 0.2476 (0.2735) data time 0.0007 (0.0034) model time 0.2469 (0.2535) loss 7.2012 (6.0102) grad_norm 1.6946 (1.7608) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:33:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][610/625] eta 0:00:04 lr 0.001610 wd 0.0500 time 0.2451 (0.2730) data time 0.0007 (0.0034) model time 0.2444 (0.2534) loss 6.6510 (6.0047) grad_norm 1.8668 (1.7571) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:34:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][620/625] eta 0:00:01 lr 0.001610 wd 0.0500 time 0.2462 (0.2729) data time 0.0007 (0.0034) model time 0.2455 (0.2536) loss 6.6437 (6.0048) grad_norm 1.8118 (1.7618) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 14:34:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 101 training takes 0:02:49
[2024-07-30 14:34:01 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 14:34:04 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 14:34:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.517 (0.517) Loss 0.6553 (0.6553) Acc@1 87.354 (87.354) Acc@5 97.900 (97.900) Mem 9656MB
[2024-07-30 14:34:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.104) Loss 1.1113 (0.8271) Acc@1 74.170 (82.702) Acc@5 93.848 (96.586) Mem 9656MB
[2024-07-30 14:34:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.053 (0.080) Loss 1.2070 (0.9813) Acc@1 72.168 (78.897) Acc@5 92.725 (94.775) Mem 9656MB
[2024-07-30 14:34:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.593 Acc@5 94.774
[2024-07-30 14:34:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.6%
[2024-07-30 14:34:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.59%
[2024-07-30 14:34:08 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-30 14:34:08 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-30 14:34:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.550 (0.550) Loss 0.5645 (0.5645) Acc@1 88.232 (88.232) Acc@5 98.438 (98.438) Mem 9656MB
[2024-07-30 14:34:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.119) Loss 0.9453 (0.7163) Acc@1 78.027 (84.242) Acc@5 94.580 (97.075) Mem 9656MB
[2024-07-30 14:34:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.053 (0.088) Loss 1.0967 (0.8619) Acc@1 72.900 (80.378) Acc@5 92.822 (95.326) Mem 9656MB
[2024-07-30 14:34:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.060 Acc@5 95.312
[2024-07-30 14:34:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.1%
[2024-07-30 14:34:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.06%
[2024-07-30 14:34:10 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-30 14:34:12 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-30 14:34:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][0/625] eta 0:09:46 lr 0.001610 wd 0.0500 time 0.9379 (0.9379) data time 0.5676 (0.5676) model time 0.0000 (0.0000) loss 6.1361 (6.1361) grad_norm 1.5477 (1.5477) loss_scale 4096.0000 (4096.0000) mem 9651MB
[2024-07-30 14:34:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][10/625] eta 0:03:15 lr 0.001610 wd 0.0500 time 0.2499 (0.3172) data time 0.0009 (0.0525) model time 0.0000 (0.0000) loss 6.9807 (6.0885) grad_norm 2.3751 (2.1065) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:34:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][20/625] eta 0:02:51 lr 0.001609 wd 0.0500 time 0.2472 (0.2828) data time 0.0009 (0.0280) model time 0.0000 (0.0000) loss 4.2480 (5.8530) grad_norm 1.2299 (1.9511) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:34:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][30/625] eta 0:02:51 lr 0.001609 wd 0.0500 time 0.2413 (0.2875) data time 0.0013 (0.0193) model time 0.0000 (0.0000) loss 4.1181 (5.8407) grad_norm 1.8186 (1.8204) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:34:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][40/625] eta 0:02:42 lr 0.001609 wd 0.0500 time 0.2496 (0.2774) data time 0.0011 (0.0148) model time 0.0000 (0.0000) loss 7.2047 (5.7736) grad_norm 3.0750 (1.8667) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:34:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][50/625] eta 0:02:35 lr 0.001609 wd 0.0500 time 0.2408 (0.2711) data time 0.0010 (0.0121) model time 0.0000 (0.0000) loss 5.4941 (5.8572) grad_norm 1.2634 (1.8307) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:34:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][60/625] eta 0:02:32 lr 0.001609 wd 0.0500 time 0.2411 (0.2693) data time 0.0013 (0.0103) model time 0.2398 (0.2590) loss 5.6067 (5.8770) grad_norm 1.5945 (1.7945) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:34:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][70/625] eta 0:02:28 lr 0.001609 wd 0.0500 time 0.2442 (0.2681) data time 0.0013 (0.0090) model time 0.2429 (0.2594) loss 6.6246 (5.8236) grad_norm 1.5527 (1.7589) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:34:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][80/625] eta 0:02:25 lr 0.001609 wd 0.0500 time 0.2506 (0.2670) data time 0.0007 (0.0080) model time 0.2499 (0.2590) loss 5.2490 (5.7996) grad_norm 1.3687 (1.7578) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:34:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][90/625] eta 0:02:21 lr 0.001608 wd 0.0500 time 0.2432 (0.2649) data time 0.0011 (0.0072) model time 0.2421 (0.2559) loss 3.7582 (5.7845) grad_norm 1.6490 (1.7712) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:34:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][100/625] eta 0:02:18 lr 0.001608 wd 0.0500 time 0.3157 (0.2636) data time 0.0007 (0.0066) model time 0.3149 (0.2548) loss 5.5581 (5.7949) grad_norm 1.7236 (1.7658) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:34:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][110/625] eta 0:02:15 lr 0.001608 wd 0.0500 time 0.2462 (0.2633) data time 0.0009 (0.0061) model time 0.2453 (0.2556) loss 6.7687 (5.8220) grad_norm 2.9189 (1.7834) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:34:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][120/625] eta 0:02:12 lr 0.001608 wd 0.0500 time 0.2556 (0.2620) data time 0.0008 (0.0057) model time 0.2547 (0.2543) loss 4.6465 (5.8707) grad_norm 1.2531 (1.7857) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:34:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][130/625] eta 0:02:09 lr 0.001608 wd 0.0500 time 0.2651 (0.2618) data time 0.0007 (0.0053) model time 0.2645 (0.2549) loss 6.9417 (5.8884) grad_norm 2.1541 (1.7799) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:34:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][140/625] eta 0:02:06 lr 0.001608 wd 0.0500 time 0.3057 (0.2612) data time 0.0007 (0.0050) model time 0.3050 (0.2545) loss 6.7001 (5.8905) grad_norm 2.5080 (1.7936) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:34:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][150/625] eta 0:02:03 lr 0.001608 wd 0.0500 time 0.2451 (0.2605) data time 0.0007 (0.0048) model time 0.2444 (0.2540) loss 5.7637 (5.9050) grad_norm 1.2727 (1.7900) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:34:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][160/625] eta 0:02:00 lr 0.001607 wd 0.0500 time 0.2442 (0.2602) data time 0.0010 (0.0046) model time 0.2432 (0.2540) loss 6.3695 (5.9159) grad_norm 2.0035 (1.7839) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:34:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][170/625] eta 0:01:57 lr 0.001607 wd 0.0500 time 0.2476 (0.2593) data time 0.0009 (0.0044) model time 0.2467 (0.2531) loss 6.5561 (5.9113) grad_norm 1.9111 (1.7919) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:34:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][180/625] eta 0:01:55 lr 0.001607 wd 0.0500 time 0.2563 (0.2591) data time 0.0007 (0.0042) model time 0.2556 (0.2533) loss 5.2661 (5.9119) grad_norm 2.4795 (1.8201) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:35:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][190/625] eta 0:01:52 lr 0.001607 wd 0.0500 time 0.2459 (0.2589) data time 0.0010 (0.0040) model time 0.2450 (0.2533) loss 6.3563 (5.8966) grad_norm 1.6067 (1.8163) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:35:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][200/625] eta 0:01:49 lr 0.001607 wd 0.0500 time 0.2449 (0.2584) data time 0.0007 (0.0039) model time 0.2442 (0.2530) loss 5.0708 (5.9085) grad_norm 1.5036 (1.8125) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:35:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][210/625] eta 0:01:47 lr 0.001607 wd 0.0500 time 0.2467 (0.2585) data time 0.0008 (0.0037) model time 0.2459 (0.2533) loss 5.8191 (5.8941) grad_norm 3.3841 (1.8186) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:35:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][220/625] eta 0:01:44 lr 0.001607 wd 0.0500 time 0.2428 (0.2587) data time 0.0011 (0.0036) model time 0.2417 (0.2538) loss 4.5263 (5.8936) grad_norm 2.4277 (1.8311) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:35:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][230/625] eta 0:01:41 lr 0.001606 wd 0.0500 time 0.2432 (0.2582) data time 0.0014 (0.0035) model time 0.2418 (0.2534) loss 5.7172 (5.9059) grad_norm 1.4530 (1.8293) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:35:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][240/625] eta 0:01:39 lr 0.001606 wd 0.0500 time 0.2456 (0.2582) data time 0.0009 (0.0034) model time 0.2448 (0.2537) loss 5.2581 (5.9046) grad_norm 1.3448 (1.8187) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:35:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][250/625] eta 0:01:36 lr 0.001606 wd 0.0500 time 0.2479 (0.2578) data time 0.0010 (0.0033) model time 0.2469 (0.2533) loss 5.1345 (5.9111) grad_norm 1.5684 (1.8018) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 14:35:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][260/625] eta 0:01:34 lr 0.001606 wd 0.0500 time 0.2507 (0.2578) data time 0.0007 (0.0032) model time 0.2500 (0.2535) loss 4.4805 (5.9043) grad_norm 2.4505 (inf) loss_scale 2048.0000 (4080.3065) mem 9655MB
[2024-07-30 14:35:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][270/625] eta 0:01:31 lr 0.001606 wd 0.0500 time 0.2466 (0.2577) data time 0.0007 (0.0031) model time 0.2460 (0.2535) loss 6.4316 (5.9073) grad_norm 1.1920 (inf) loss_scale 2048.0000 (4005.3137) mem 9655MB
[2024-07-30 14:35:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][280/625] eta 0:01:28 lr 0.001606 wd 0.0500 time 0.3099 (0.2575) data time 0.0007 (0.0031) model time 0.3092 (0.2534) loss 6.8460 (5.9119) grad_norm 2.8347 (inf) loss_scale 2048.0000 (3935.6584) mem 9655MB
[2024-07-30 14:35:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][290/625] eta 0:01:26 lr 0.001606 wd 0.0500 time 0.2467 (0.2575) data time 0.0006 (0.0030) model time 0.2460 (0.2536) loss 4.5952 (5.9089) grad_norm 1.3932 (inf) loss_scale 2048.0000 (3870.7904) mem 9655MB
[2024-07-30 14:35:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][300/625] eta 0:01:23 lr 0.001605 wd 0.0500 time 0.2430 (0.2576) data time 0.0011 (0.0029) model time 0.2419 (0.2537) loss 6.3139 (5.9234) grad_norm 1.0537 (inf) loss_scale 2048.0000 (3810.2326) mem 9655MB
[2024-07-30 14:35:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][310/625] eta 0:01:21 lr 0.001605 wd 0.0500 time 0.2421 (0.2575) data time 0.0008 (0.0029) model time 0.2413 (0.2538) loss 7.1748 (5.9409) grad_norm 1.3756 (inf) loss_scale 2048.0000 (3753.5691) mem 9655MB
[2024-07-30 14:35:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][320/625] eta 0:01:18 lr 0.001605 wd 0.0500 time 0.2467 (0.2572) data time 0.0009 (0.0028) model time 0.2458 (0.2535) loss 4.5508 (5.9340) grad_norm 1.8496 (inf) loss_scale 2048.0000 (3700.4361) mem 9655MB
[2024-07-30 14:35:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][330/625] eta 0:01:15 lr 0.001605 wd 0.0500 time 0.2457 (0.2569) data time 0.0009 (0.0028) model time 0.2448 (0.2533) loss 6.4662 (5.9162) grad_norm 1.3738 (inf) loss_scale 2048.0000 (3650.5136) mem 9655MB
[2024-07-30 14:35:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][340/625] eta 0:01:13 lr 0.001605 wd 0.0500 time 0.2690 (0.2572) data time 0.0010 (0.0027) model time 0.2680 (0.2537) loss 5.5622 (5.9173) grad_norm 1.8474 (inf) loss_scale 2048.0000 (3603.5191) mem 9655MB
[2024-07-30 14:35:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][350/625] eta 0:01:10 lr 0.001605 wd 0.0500 time 0.2423 (0.2569) data time 0.0011 (0.0027) model time 0.2412 (0.2534) loss 6.6921 (5.9266) grad_norm 1.3389 (inf) loss_scale 2048.0000 (3559.2023) mem 9655MB
[2024-07-30 14:35:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][360/625] eta 0:01:08 lr 0.001605 wd 0.0500 time 0.2513 (0.2571) data time 0.0008 (0.0026) model time 0.2505 (0.2537) loss 7.6753 (5.9330) grad_norm 2.3295 (inf) loss_scale 2048.0000 (3517.3407) mem 9655MB
[2024-07-30 14:35:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][370/625] eta 0:01:05 lr 0.001604 wd 0.0500 time 0.2488 (0.2568) data time 0.0019 (0.0026) model time 0.2469 (0.2535) loss 6.2293 (5.9337) grad_norm 2.0368 (inf) loss_scale 2048.0000 (3477.7358) mem 9655MB
[2024-07-30 14:35:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][380/625] eta 0:01:02 lr 0.001604 wd 0.0500 time 0.3128 (0.2569) data time 0.0009 (0.0025) model time 0.3119 (0.2537) loss 5.6655 (5.9335) grad_norm 2.2879 (inf) loss_scale 2048.0000 (3440.2100) mem 9655MB
[2024-07-30 14:35:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][390/625] eta 0:01:00 lr 0.001604 wd 0.0500 time 0.2433 (0.2567) data time 0.0007 (0.0025) model time 0.2426 (0.2535) loss 6.2003 (5.9488) grad_norm 2.4808 (inf) loss_scale 2048.0000 (3404.6036) mem 9655MB
[2024-07-30 14:35:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][400/625] eta 0:00:57 lr 0.001604 wd 0.0500 time 0.2477 (0.2565) data time 0.0009 (0.0025) model time 0.2468 (0.2533) loss 6.8382 (5.9469) grad_norm 1.2684 (inf) loss_scale 2048.0000 (3370.7731) mem 9655MB
[2024-07-30 14:35:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][410/625] eta 0:00:55 lr 0.001604 wd 0.0500 time 0.2896 (0.2564) data time 0.0007 (0.0024) model time 0.2889 (0.2532) loss 6.3883 (5.9363) grad_norm 1.7837 (inf) loss_scale 2048.0000 (3338.5888) mem 9655MB
[2024-07-30 14:36:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][420/625] eta 0:00:52 lr 0.001604 wd 0.0500 time 0.2461 (0.2567) data time 0.0007 (0.0024) model time 0.2454 (0.2537) loss 5.9533 (5.9479) grad_norm 1.3221 (inf) loss_scale 2048.0000 (3307.9335) mem 9655MB
[2024-07-30 14:36:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][430/625] eta 0:00:50 lr 0.001604 wd 0.0500 time 0.2524 (0.2565) data time 0.0009 (0.0024) model time 0.2516 (0.2535) loss 6.8242 (5.9485) grad_norm 1.9919 (inf) loss_scale 2048.0000 (3278.7007) mem 9655MB
[2024-07-30 14:36:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][440/625] eta 0:00:47 lr 0.001603 wd 0.0500 time 0.2476 (0.2566) data time 0.0007 (0.0023) model time 0.2470 (0.2536) loss 6.5561 (5.9549) grad_norm 2.7344 (inf) loss_scale 2048.0000 (3250.7937) mem 9655MB
[2024-07-30 14:36:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][450/625] eta 0:00:44 lr 0.001603 wd 0.0500 time 0.2485 (0.2563) data time 0.0008 (0.0023) model time 0.2477 (0.2534) loss 4.4126 (5.9488) grad_norm 1.8338 (inf) loss_scale 2048.0000 (3224.1242) mem 9655MB
[2024-07-30 14:36:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][460/625] eta 0:00:42 lr 0.001603 wd 0.0500 time 0.2497 (0.2566) data time 0.0009 (0.0023) model time 0.2488 (0.2538) loss 7.0750 (5.9525) grad_norm 1.6970 (inf) loss_scale 2048.0000 (3198.6117) mem 9655MB
[2024-07-30 14:36:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][470/625] eta 0:00:39 lr 0.001603 wd 0.0500 time 0.2436 (0.2564) data time 0.0009 (0.0023) model time 0.2427 (0.2535) loss 5.4168 (5.9478) grad_norm 1.9097 (inf) loss_scale 2048.0000 (3174.1826) mem 9655MB
[2024-07-30 14:36:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][480/625] eta 0:00:37 lr 0.001603 wd 0.0500 time 0.2737 (0.2563) data time 0.0014 (0.0022) model time 0.2722 (0.2534) loss 5.5774 (5.9512) grad_norm 1.1065 (inf) loss_scale 2048.0000 (3150.7692) mem 9655MB
[2024-07-30 14:36:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][490/625] eta 0:00:34 lr 0.001603 wd 0.0500 time 0.2457 (0.2563) data time 0.0009 (0.0022) model time 0.2448 (0.2535) loss 7.1678 (5.9592) grad_norm 2.0502 (inf) loss_scale 2048.0000 (3128.3096) mem 9655MB
[2024-07-30 14:36:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][500/625] eta 0:00:32 lr 0.001603 wd 0.0500 time 0.2429 (0.2563) data time 0.0007 (0.0022) model time 0.2422 (0.2535) loss 5.7391 (5.9648) grad_norm 1.3446 (inf) loss_scale 2048.0000 (3106.7465) mem 9655MB
[2024-07-30 14:36:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][510/625] eta 0:00:29 lr 0.001602 wd 0.0500 time 0.2482 (0.2561) data time 0.0006 (0.0022) model time 0.2475 (0.2533) loss 4.7582 (5.9689) grad_norm 1.4093 (inf) loss_scale 2048.0000 (3086.0274) mem 9655MB
[2024-07-30 14:36:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][520/625] eta 0:00:26 lr 0.001602 wd 0.0500 time 0.2494 (0.2561) data time 0.0009 (0.0021) model time 0.2484 (0.2534) loss 6.3169 (5.9688) grad_norm 1.3025 (inf) loss_scale 2048.0000 (3066.1036) mem 9655MB
[2024-07-30 14:36:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][530/625] eta 0:00:24 lr 0.001602 wd 0.0500 time 0.2445 (0.2559) data time 0.0009 (0.0021) model time 0.2436 (0.2532) loss 7.1601 (5.9691) grad_norm 1.6018 (inf) loss_scale 2048.0000 (3046.9303) mem 9655MB
[2024-07-30 14:36:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][540/625] eta 0:00:21 lr 0.001602 wd 0.0500 time 0.2433 (0.2564) data time 0.0007 (0.0021) model time 0.2426 (0.2538) loss 5.4550 (5.9712) grad_norm 2.4952 (inf) loss_scale 2048.0000 (3028.4658) mem 9655MB
[2024-07-30 14:36:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][550/625] eta 0:00:19 lr 0.001602 wd 0.0500 time 0.2445 (0.2562) data time 0.0007 (0.0021) model time 0.2438 (0.2536) loss 5.0949 (5.9752) grad_norm 1.3941 (inf) loss_scale 2048.0000 (3010.6715) mem 9655MB
[2024-07-30 14:36:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][560/625] eta 0:00:16 lr 0.001602 wd 0.0500 time 0.3112 (0.2561) data time 0.0006 (0.0021) model time 0.3106 (0.2536) loss 6.8838 (5.9852) grad_norm 1.7086 (inf) loss_scale 2048.0000 (2993.5116) mem 9655MB
[2024-07-30 14:36:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][570/625] eta 0:00:14 lr 0.001602 wd 0.0500 time 0.2539 (0.2560) data time 0.0011 (0.0020) model time 0.2527 (0.2535) loss 5.5967 (5.9847) grad_norm 2.0633 (inf) loss_scale 2048.0000 (2976.9527) mem 9655MB
[2024-07-30 14:36:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][580/625] eta 0:00:11 lr 0.001601 wd 0.0500 time 0.2473 (0.2561) data time 0.0009 (0.0020) model time 0.2463 (0.2536) loss 5.0283 (5.9835) grad_norm 2.0377 (inf) loss_scale 2048.0000 (2960.9639) mem 9655MB
[2024-07-30 14:36:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][590/625] eta 0:00:08 lr 0.001601 wd 0.0500 time 0.2460 (0.2561) data time 0.0009 (0.0020) model time 0.2452 (0.2536) loss 6.3397 (5.9840) grad_norm 1.4377 (inf) loss_scale 2048.0000 (2945.5161) mem 9655MB
[2024-07-30 14:36:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][600/625] eta 0:00:06 lr 0.001601 wd 0.0500 time 0.2443 (0.2559) data time 0.0010 (0.0020) model time 0.2433 (0.2535) loss 6.9206 (5.9796) grad_norm 1.6759 (inf) loss_scale 2048.0000 (2930.5824) mem 9655MB
[2024-07-30 14:36:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][610/625] eta 0:00:03 lr 0.001601 wd 0.0500 time 0.2735 (0.2559) data time 0.0004 (0.0020) model time 0.2731 (0.2534) loss 4.6051 (5.9789) grad_norm 2.3050 (inf) loss_scale 2048.0000 (2916.1375) mem 9655MB
[2024-07-30 14:36:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][620/625] eta 0:00:01 lr 0.001601 wd 0.0500 time 0.2438 (0.2559) data time 0.0004 (0.0020) model time 0.2433 (0.2535) loss 6.9444 (5.9840) grad_norm 1.0974 (inf) loss_scale 2048.0000 (2902.1578) mem 9655MB
[2024-07-30 14:36:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 102 training takes 0:02:39
[2024-07-30 14:36:51 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 14:36:52 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 14:36:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.588 (0.588) Loss 0.6680 (0.6680) Acc@1 87.109 (87.109) Acc@5 97.998 (97.998) Mem 9655MB
[2024-07-30 14:36:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.061 (0.111) Loss 1.0840 (0.8331) Acc@1 76.074 (82.551) Acc@5 93.164 (96.551) Mem 9655MB
[2024-07-30 14:36:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.064 (0.088) Loss 1.2510 (0.9803) Acc@1 70.264 (78.769) Acc@5 92.236 (94.734) Mem 9655MB
[2024-07-30 14:36:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.443 Acc@5 94.686
[2024-07-30 14:36:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.4%
[2024-07-30 14:36:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.970 (0.970) Loss 0.5654 (0.5654) Acc@1 88.086 (88.086) Acc@5 98.486 (98.486) Mem 9655MB
[2024-07-30 14:36:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.155) Loss 0.9453 (0.7162) Acc@1 78.174 (84.246) Acc@5 94.678 (97.115) Mem 9655MB
[2024-07-30 14:36:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.053 (0.106) Loss 1.0957 (0.8615) Acc@1 73.047 (80.397) Acc@5 92.969 (95.357) Mem 9655MB
[2024-07-30 14:36:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.072 Acc@5 95.343
[2024-07-30 14:36:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.1%
[2024-07-30 14:36:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.07%
[2024-07-30 14:36:57 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-30 14:36:57 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-30 14:36:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][0/625] eta 0:12:08 lr 0.001601 wd 0.0500 time 1.1659 (1.1659) data time 0.7872 (0.7872) model time 0.0000 (0.0000) loss 6.9447 (6.9447) grad_norm 1.3848 (1.3848) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:37:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][10/625] eta 0:03:28 lr 0.001601 wd 0.0500 time 0.2464 (0.3394) data time 0.0011 (0.0725) model time 0.0000 (0.0000) loss 5.3321 (5.7548) grad_norm 1.2743 (1.6255) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:37:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][20/625] eta 0:02:58 lr 0.001601 wd 0.0500 time 0.2457 (0.2953) data time 0.0006 (0.0385) model time 0.0000 (0.0000) loss 6.1812 (5.9690) grad_norm 1.1817 (1.5847) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:37:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][30/625] eta 0:02:46 lr 0.001600 wd 0.0500 time 0.2439 (0.2792) data time 0.0009 (0.0264) model time 0.0000 (0.0000) loss 6.1822 (6.0048) grad_norm 1.6791 (1.5646) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:37:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][40/625] eta 0:02:40 lr 0.001600 wd 0.0500 time 0.2761 (0.2739) data time 0.0012 (0.0202) model time 0.0000 (0.0000) loss 5.9538 (6.0545) grad_norm 1.1380 (1.5449) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:37:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][50/625] eta 0:02:35 lr 0.001600 wd 0.0500 time 0.2499 (0.2705) data time 0.0008 (0.0165) model time 0.0000 (0.0000) loss 5.2721 (6.0476) grad_norm 1.3741 (1.6124) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:37:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][60/625] eta 0:02:31 lr 0.001600 wd 0.0500 time 0.3132 (0.2674) data time 0.0006 (0.0139) model time 0.3126 (0.2508) loss 5.5560 (6.0496) grad_norm 1.1832 (1.6306) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:37:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][70/625] eta 0:02:26 lr 0.001600 wd 0.0500 time 0.2481 (0.2647) data time 0.0009 (0.0121) model time 0.2472 (0.2488) loss 5.9932 (5.9954) grad_norm 3.1197 (1.6670) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:37:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][80/625] eta 0:02:23 lr 0.001600 wd 0.0500 time 0.3133 (0.2641) data time 0.0007 (0.0108) model time 0.3126 (0.2522) loss 5.2130 (5.9647) grad_norm 1.6236 (1.6617) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:37:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][90/625] eta 0:02:20 lr 0.001600 wd 0.0500 time 0.2451 (0.2633) data time 0.0011 (0.0097) model time 0.2439 (0.2531) loss 6.2719 (5.9703) grad_norm 1.9520 (1.7003) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:37:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][100/625] eta 0:02:18 lr 0.001599 wd 0.0500 time 0.2438 (0.2639) data time 0.0009 (0.0088) model time 0.2429 (0.2561) loss 5.7183 (5.9814) grad_norm 1.8544 (1.6938) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:37:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][110/625] eta 0:02:15 lr 0.001599 wd 0.0500 time 0.2909 (0.2627) data time 0.0007 (0.0081) model time 0.2902 (0.2551) loss 7.4553 (6.0255) grad_norm 1.7713 (1.7294) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:37:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][120/625] eta 0:02:12 lr 0.001599 wd 0.0500 time 0.2421 (0.2631) data time 0.0009 (0.0076) model time 0.2412 (0.2567) loss 6.2853 (6.0705) grad_norm 1.7043 (1.7466) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:37:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][130/625] eta 0:02:09 lr 0.001599 wd 0.0500 time 0.2464 (0.2619) data time 0.0009 (0.0071) model time 0.2456 (0.2553) loss 5.6454 (6.0715) grad_norm 1.7405 (1.7298) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:37:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][140/625] eta 0:02:07 lr 0.001599 wd 0.0500 time 0.2430 (0.2619) data time 0.0014 (0.0066) model time 0.2417 (0.2560) loss 6.3673 (6.0797) grad_norm 1.3595 (1.7166) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:37:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][150/625] eta 0:02:03 lr 0.001599 wd 0.0500 time 0.2452 (0.2609) data time 0.0009 (0.0063) model time 0.2443 (0.2549) loss 6.9792 (6.0680) grad_norm 1.6505 (1.7035) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:37:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][160/625] eta 0:02:01 lr 0.001599 wd 0.0500 time 0.3086 (0.2611) data time 0.0007 (0.0059) model time 0.3080 (0.2558) loss 6.1620 (6.1109) grad_norm 3.6575 (1.7125) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:37:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][170/625] eta 0:01:58 lr 0.001598 wd 0.0500 time 0.2416 (0.2608) data time 0.0012 (0.0056) model time 0.2405 (0.2556) loss 5.0626 (6.0928) grad_norm 2.2067 (1.7132) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:37:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][180/625] eta 0:01:55 lr 0.001598 wd 0.0500 time 0.2437 (0.2600) data time 0.0009 (0.0054) model time 0.2428 (0.2548) loss 5.4013 (6.0890) grad_norm 1.3125 (1.7192) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:37:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][190/625] eta 0:01:53 lr 0.001598 wd 0.0500 time 0.2431 (0.2600) data time 0.0007 (0.0052) model time 0.2424 (0.2551) loss 6.1501 (6.0955) grad_norm 1.5024 (1.7156) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:37:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][200/625] eta 0:01:50 lr 0.001598 wd 0.0500 time 0.2461 (0.2598) data time 0.0007 (0.0050) model time 0.2454 (0.2551) loss 6.1747 (6.0994) grad_norm 2.1550 (1.7004) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:37:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][210/625] eta 0:01:47 lr 0.001598 wd 0.0500 time 0.2450 (0.2591) data time 0.0009 (0.0048) model time 0.2441 (0.2544) loss 7.2516 (6.1031) grad_norm 1.1971 (1.6976) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:37:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][220/625] eta 0:01:44 lr 0.001598 wd 0.0500 time 0.2486 (0.2591) data time 0.0010 (0.0046) model time 0.2476 (0.2546) loss 6.0522 (6.1083) grad_norm 1.5204 (1.6896) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:37:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][230/625] eta 0:01:42 lr 0.001598 wd 0.0500 time 0.2420 (0.2584) data time 0.0009 (0.0045) model time 0.2410 (0.2540) loss 4.7537 (6.0982) grad_norm 1.8265 (1.6876) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:37:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][240/625] eta 0:01:39 lr 0.001597 wd 0.0500 time 0.2423 (0.2589) data time 0.0013 (0.0043) model time 0.2410 (0.2548) loss 4.8601 (6.0718) grad_norm 1.6011 (1.6870) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:38:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][250/625] eta 0:01:36 lr 0.001597 wd 0.0500 time 0.2438 (0.2585) data time 0.0009 (0.0042) model time 0.2429 (0.2543) loss 5.4119 (6.0614) grad_norm 2.7401 (1.6820) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:38:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][260/625] eta 0:01:34 lr 0.001597 wd 0.0500 time 0.2424 (0.2580) data time 0.0010 (0.0041) model time 0.2414 (0.2539) loss 6.9021 (6.0733) grad_norm 1.9797 (1.6930) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:38:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][270/625] eta 0:01:31 lr 0.001597 wd 0.0500 time 0.2423 (0.2579) data time 0.0010 (0.0040) model time 0.2413 (0.2539) loss 6.1688 (6.0766) grad_norm 2.0735 (1.6966) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:38:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][280/625] eta 0:01:28 lr 0.001597 wd 0.0500 time 0.2434 (0.2577) data time 0.0011 (0.0039) model time 0.2424 (0.2538) loss 5.4710 (6.0839) grad_norm 1.7368 (1.7273) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:38:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][290/625] eta 0:01:26 lr 0.001597 wd 0.0500 time 0.2405 (0.2578) data time 0.0010 (0.0038) model time 0.2395 (0.2540) loss 6.2383 (6.0777) grad_norm 2.4508 (1.7345) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:38:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][300/625] eta 0:01:23 lr 0.001597 wd 0.0500 time 0.2434 (0.2574) data time 0.0009 (0.0037) model time 0.2425 (0.2536) loss 6.6767 (6.0721) grad_norm 1.3683 (1.7253) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:38:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][310/625] eta 0:01:20 lr 0.001596 wd 0.0500 time 0.2428 (0.2570) data time 0.0010 (0.0036) model time 0.2418 (0.2533) loss 6.0114 (6.0695) grad_norm 2.0823 (1.7255) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:38:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][320/625] eta 0:01:18 lr 0.001596 wd 0.0500 time 0.2450 (0.2573) data time 0.0007 (0.0035) model time 0.2443 (0.2537) loss 6.9987 (6.0502) grad_norm 1.7995 (1.7317) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:38:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][330/625] eta 0:01:15 lr 0.001596 wd 0.0500 time 0.2404 (0.2569) data time 0.0008 (0.0034) model time 0.2395 (0.2534) loss 5.0699 (6.0412) grad_norm 1.1063 (1.7285) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:38:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][340/625] eta 0:01:13 lr 0.001596 wd 0.0500 time 0.2417 (0.2569) data time 0.0010 (0.0034) model time 0.2406 (0.2534) loss 5.4655 (6.0396) grad_norm 1.4127 (1.7410) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:38:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][350/625] eta 0:01:10 lr 0.001596 wd 0.0500 time 0.2434 (0.2565) data time 0.0007 (0.0033) model time 0.2427 (0.2531) loss 5.8989 (6.0484) grad_norm 1.4042 (1.7462) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:38:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][360/625] eta 0:01:07 lr 0.001596 wd 0.0500 time 0.2427 (0.2565) data time 0.0007 (0.0032) model time 0.2420 (0.2531) loss 7.1004 (6.0562) grad_norm 1.7863 (1.7456) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:38:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][370/625] eta 0:01:05 lr 0.001596 wd 0.0500 time 0.2458 (0.2566) data time 0.0007 (0.0032) model time 0.2450 (0.2533) loss 5.7849 (6.0487) grad_norm 1.8452 (1.7510) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:38:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][380/625] eta 0:01:02 lr 0.001595 wd 0.0500 time 0.2438 (0.2563) data time 0.0012 (0.0031) model time 0.2426 (0.2531) loss 6.4140 (6.0495) grad_norm 2.3922 (1.7477) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:38:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][390/625] eta 0:01:00 lr 0.001595 wd 0.0500 time 0.2429 (0.2560) data time 0.0009 (0.0031) model time 0.2419 (0.2528) loss 6.1091 (6.0550) grad_norm 0.9554 (1.7433) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:38:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][400/625] eta 0:00:57 lr 0.001595 wd 0.0500 time 0.2431 (0.2564) data time 0.0008 (0.0030) model time 0.2423 (0.2533) loss 4.4747 (6.0324) grad_norm 1.3385 (1.7333) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:38:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][410/625] eta 0:00:55 lr 0.001595 wd 0.0500 time 0.2413 (0.2561) data time 0.0013 (0.0030) model time 0.2400 (0.2530) loss 5.8606 (6.0171) grad_norm 2.3939 (1.7294) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:38:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][420/625] eta 0:00:52 lr 0.001595 wd 0.0500 time 0.2451 (0.2563) data time 0.0010 (0.0029) model time 0.2441 (0.2532) loss 6.8772 (6.0088) grad_norm 1.1719 (1.7290) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:38:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][430/625] eta 0:00:49 lr 0.001595 wd 0.0500 time 0.2473 (0.2560) data time 0.0009 (0.0029) model time 0.2464 (0.2530) loss 4.6971 (6.0057) grad_norm 1.6260 (1.7256) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:38:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][440/625] eta 0:00:47 lr 0.001595 wd 0.0500 time 0.2441 (0.2564) data time 0.0011 (0.0028) model time 0.2430 (0.2535) loss 6.3173 (6.0058) grad_norm 1.2038 (1.7357) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 14:38:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][450/625] eta 0:00:44 lr 0.001594 wd 0.0500 time 0.2444 (0.2561) data time 0.0012 (0.0028) model time 0.2432 (0.2532) loss 5.4216 (6.0112) grad_norm 1.9553 (1.7482) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:38:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][460/625] eta 0:00:42 lr 0.001594 wd 0.0500 time 0.2425 (0.2559) data time 0.0011 (0.0028) model time 0.2414 (0.2531) loss 5.3648 (6.0181) grad_norm 1.6987 (1.7438) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:38:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][470/625] eta 0:00:39 lr 0.001594 wd 0.0500 time 0.2432 (0.2559) data time 0.0011 (0.0027) model time 0.2421 (0.2531) loss 6.2805 (6.0155) grad_norm 1.6284 (1.7396) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:39:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][480/625] eta 0:00:37 lr 0.001594 wd 0.0500 time 0.2452 (0.2559) data time 0.0010 (0.0027) model time 0.2442 (0.2531) loss 5.5509 (6.0151) grad_norm 1.5118 (1.7370) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:39:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][490/625] eta 0:00:34 lr 0.001594 wd 0.0500 time 0.2440 (0.2557) data time 0.0007 (0.0027) model time 0.2433 (0.2529) loss 5.5672 (6.0187) grad_norm 2.2433 (1.7472) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:39:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][500/625] eta 0:00:31 lr 0.001594 wd 0.0500 time 0.2545 (0.2557) data time 0.0009 (0.0026) model time 0.2536 (0.2530) loss 4.9261 (6.0164) grad_norm 1.9700 (1.7518) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:39:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][510/625] eta 0:00:29 lr 0.001594 wd 0.0500 time 0.2487 
(0.2555) data time 0.0012 (0.0026) model time 0.2475 (0.2528) loss 6.1526 (6.0083) grad_norm 1.2985 (1.7469) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:39:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][520/625] eta 0:00:26 lr 0.001593 wd 0.0500 time 0.2444 (0.2557) data time 0.0007 (0.0026) model time 0.2437 (0.2530) loss 4.3428 (6.0115) grad_norm 1.6487 (1.7450) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:39:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][530/625] eta 0:00:24 lr 0.001593 wd 0.0500 time 0.2554 (0.2555) data time 0.0012 (0.0025) model time 0.2542 (0.2528) loss 7.2462 (6.0144) grad_norm 1.3735 (1.7387) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:39:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][540/625] eta 0:00:21 lr 0.001593 wd 0.0500 time 0.2421 (0.2553) data time 0.0010 (0.0025) model time 0.2411 (0.2527) loss 5.6799 (6.0086) grad_norm 1.3304 (1.7322) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:39:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][550/625] eta 0:00:19 lr 0.001593 wd 0.0500 time 0.2475 (0.2553) data time 0.0010 (0.0025) model time 0.2465 (0.2527) loss 7.2929 (6.0192) grad_norm 1.1668 (1.7283) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:39:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][560/625] eta 0:00:16 lr 0.001593 wd 0.0500 time 0.2442 (0.2553) data time 0.0008 (0.0025) model time 0.2434 (0.2528) loss 6.5318 (6.0221) grad_norm 1.9211 (1.7307) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:39:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][570/625] eta 0:00:14 lr 0.001593 wd 0.0500 time 0.3114 (0.2553) data time 0.0007 (0.0024) model time 0.3107 (0.2528) loss 6.9502 (6.0202) grad_norm 2.4145 (1.7363) loss_scale 2048.0000 (2048.0000) mem 9655MB 
[2024-07-30 14:39:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][580/625] eta 0:00:11 lr 0.001593 wd 0.0500 time 0.2441 (0.2552) data time 0.0007 (0.0024) model time 0.2434 (0.2527) loss 4.9505 (6.0236) grad_norm 1.3484 (1.7318) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:39:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][590/625] eta 0:00:08 lr 0.001592 wd 0.0500 time 0.2417 (0.2550) data time 0.0010 (0.0024) model time 0.2407 (0.2525) loss 5.6096 (6.0216) grad_norm 2.3088 (1.7321) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:39:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][600/625] eta 0:00:06 lr 0.001592 wd 0.0500 time 0.2451 (0.2553) data time 0.0007 (0.0024) model time 0.2444 (0.2528) loss 4.8030 (6.0120) grad_norm 1.7206 (1.7336) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:39:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][610/625] eta 0:00:03 lr 0.001592 wd 0.0500 time 0.2550 (0.2551) data time 0.0006 (0.0023) model time 0.2544 (0.2526) loss 6.3764 (6.0091) grad_norm 1.2294 (1.7289) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:39:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][620/625] eta 0:00:01 lr 0.001592 wd 0.0500 time 0.2895 (0.2550) data time 0.0004 (0.0023) model time 0.2891 (0.2525) loss 4.6252 (6.0047) grad_norm 2.3595 (1.7344) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:39:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 103 training takes 0:02:39 [2024-07-30 14:39:36 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 14:39:37 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-30 14:39:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.616 (0.616) Loss 0.6553 (0.6553) Acc@1 87.891 (87.891) Acc@5 98.291 (98.291) Mem 9655MB [2024-07-30 14:39:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.069 (0.119) Loss 1.0566 (0.8094) Acc@1 76.514 (82.928) Acc@5 93.750 (96.604) Mem 9655MB [2024-07-30 14:39:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.053 (0.089) Loss 1.2812 (0.9784) Acc@1 71.143 (78.676) Acc@5 91.748 (94.613) Mem 9655MB [2024-07-30 14:39:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.445 Acc@5 94.578 [2024-07-30 14:39:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.4% [2024-07-30 14:39:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.937 (0.937) Loss 0.5659 (0.5659) Acc@1 88.135 (88.135) Acc@5 98.438 (98.438) Mem 9655MB [2024-07-30 14:39:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.156) Loss 0.9453 (0.7164) Acc@1 78.027 (84.273) Acc@5 94.678 (97.093) Mem 9655MB [2024-07-30 14:39:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.053 (0.107) Loss 1.0957 (0.8615) Acc@1 73.047 (80.439) Acc@5 92.920 (95.357) Mem 9655MB [2024-07-30 14:39:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.102 Acc@5 95.347 [2024-07-30 14:39:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.1% [2024-07-30 14:39:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.10% [2024-07-30 14:39:42 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-07-30 14:39:42 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-07-30 14:39:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][0/625] eta 0:08:58 lr 0.001592 wd 0.0500 time 0.8617 (0.8617) data time 0.5902 (0.5902) model time 0.0000 (0.0000) loss 5.5048 (5.5048) grad_norm 1.5030 (1.5030) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:39:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][10/625] eta 0:03:04 lr 0.001592 wd 0.0500 time 0.2455 (0.3004) data time 0.0009 (0.0547) model time 0.0000 (0.0000) loss 6.8493 (6.4255) grad_norm 1.1737 (1.9985) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:39:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][20/625] eta 0:02:46 lr 0.001592 wd 0.0500 time 0.2736 (0.2756) data time 0.0008 (0.0291) model time 0.0000 (0.0000) loss 5.0268 (5.9706) grad_norm 2.1841 (2.0465) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:39:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][30/625] eta 0:02:42 lr 0.001591 wd 0.0500 time 0.2477 (0.2728) data time 0.0010 (0.0201) model time 0.0000 (0.0000) loss 6.5869 (6.0344) grad_norm 1.1375 (2.1826) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:39:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][40/625] eta 0:02:35 lr 0.001591 wd 0.0500 time 0.2429 (0.2661) data time 0.0011 (0.0155) model time 0.0000 (0.0000) loss 6.2159 (6.0844) grad_norm 1.3602 (2.0849) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:39:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][50/625] eta 0:02:31 lr 0.001591 wd 0.0500 time 0.2443 (0.2640) data time 0.0010 (0.0127) model time 0.0000 (0.0000) loss 6.9020 (5.9682) grad_norm 1.2188 (2.0520) loss_scale 2048.0000 (2048.0000) mem 9655MB 
[2024-07-30 14:39:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][60/625] eta 0:02:28 lr 0.001591 wd 0.0500 time 0.2990 (0.2621) data time 0.0007 (0.0108) model time 0.2983 (0.2511) loss 5.1272 (5.9442) grad_norm 1.1374 (1.9891) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:40:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][70/625] eta 0:02:25 lr 0.001591 wd 0.0500 time 0.2427 (0.2613) data time 0.0014 (0.0095) model time 0.2413 (0.2530) loss 5.8947 (5.9251) grad_norm 2.0164 (1.9312) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:40:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][80/625] eta 0:02:23 lr 0.001591 wd 0.0500 time 0.2457 (0.2634) data time 0.0012 (0.0085) model time 0.2445 (0.2612) loss 6.1138 (5.9031) grad_norm 1.8960 (1.8991) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:40:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][90/625] eta 0:02:19 lr 0.001591 wd 0.0500 time 0.2426 (0.2616) data time 0.0008 (0.0076) model time 0.2418 (0.2573) loss 7.9554 (5.9013) grad_norm 1.0878 (1.8912) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:40:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][100/625] eta 0:02:17 lr 0.001590 wd 0.0500 time 0.2649 (0.2620) data time 0.0014 (0.0070) model time 0.2635 (0.2588) loss 5.4742 (5.8489) grad_norm 1.5178 (1.8541) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:40:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][110/625] eta 0:02:14 lr 0.001590 wd 0.0500 time 0.2422 (0.2616) data time 0.0011 (0.0065) model time 0.2411 (0.2582) loss 4.5333 (5.8323) grad_norm 1.4090 (1.8241) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:40:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][120/625] eta 0:02:11 lr 0.001590 wd 0.0500 time 0.2404 
(0.2607) data time 0.0015 (0.0061) model time 0.2389 (0.2569) loss 5.5072 (5.8361) grad_norm 1.6436 (1.7850) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:40:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][130/625] eta 0:02:08 lr 0.001590 wd 0.0500 time 0.2482 (0.2605) data time 0.0007 (0.0058) model time 0.2475 (0.2570) loss 5.1636 (5.8355) grad_norm 4.0598 (1.8163) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:40:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][140/625] eta 0:02:06 lr 0.001590 wd 0.0500 time 0.2440 (0.2603) data time 0.0011 (0.0055) model time 0.2429 (0.2568) loss 5.7334 (5.8912) grad_norm 1.2523 (1.8090) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:40:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][150/625] eta 0:02:03 lr 0.001590 wd 0.0500 time 0.2441 (0.2598) data time 0.0009 (0.0052) model time 0.2431 (0.2563) loss 6.4802 (5.8970) grad_norm 1.2222 (1.7986) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:40:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][160/625] eta 0:02:00 lr 0.001590 wd 0.0500 time 0.2466 (0.2589) data time 0.0009 (0.0050) model time 0.2457 (0.2552) loss 6.9072 (5.9353) grad_norm 1.3559 (1.7899) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:40:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][170/625] eta 0:01:57 lr 0.001589 wd 0.0500 time 0.2512 (0.2582) data time 0.0011 (0.0047) model time 0.2501 (0.2544) loss 4.9548 (5.9446) grad_norm 1.6162 (1.7849) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:40:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][180/625] eta 0:01:55 lr 0.001589 wd 0.0500 time 0.2475 (0.2598) data time 0.0009 (0.0046) model time 0.2466 (0.2568) loss 6.3417 (5.9296) grad_norm 1.6578 (1.7884) loss_scale 2048.0000 (2048.0000) mem 9655MB 
[2024-07-30 14:40:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][190/625] eta 0:01:52 lr 0.001589 wd 0.0500 time 0.2442 (0.2592) data time 0.0011 (0.0044) model time 0.2431 (0.2562) loss 4.5344 (5.9104) grad_norm 1.5742 (1.7779) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:40:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][200/625] eta 0:01:50 lr 0.001589 wd 0.0500 time 0.2435 (0.2593) data time 0.0010 (0.0042) model time 0.2425 (0.2564) loss 7.2457 (5.9211) grad_norm 2.5388 (1.7922) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:40:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][210/625] eta 0:01:47 lr 0.001589 wd 0.0500 time 0.2538 (0.2587) data time 0.0007 (0.0041) model time 0.2531 (0.2557) loss 4.5924 (5.9121) grad_norm 1.2670 (1.7999) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-30 14:40:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-30 14:40:38 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 14:40:38 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-30 14:47:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-30 14:47:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-30 14:47:25 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-07-30 14:47:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-30 14:47:35 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-30 14:47:35 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-30 14:47:35 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-30 14:47:35 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 104) [2024-07-30 14:47:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-30 14:47:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][220/625] eta 0:11:32 lr 0.001589 wd 0.0500 time 0.2690 (1.7093) data time 0.0012 (0.1025) model time 0.2678 (1.6068) loss 6.3237 (6.4467) grad_norm 3.2737 (1.9134) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:47:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][230/625] eta 0:05:38 lr 0.001589 wd 0.0500 time 0.2580 (0.8582) data time 0.0012 (0.0430) model time 0.2567 (0.8152) loss 5.4643 (6.3600) grad_norm 1.8097 (2.1503) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:47:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][240/625] eta 0:04:05 lr 0.001588 wd 0.0500 time 0.2696 (0.6385) data time 0.0008 (0.0278) model time 0.2688 (0.6107) loss 8.3103 (6.4352) grad_norm 1.3198 (2.0204) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:48:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][250/625] eta 0:03:21 lr 0.001588 wd 0.0500 time 0.2608 (0.5373) data time 0.0010 (0.0207) model time 0.2597 (0.5166) loss 5.0168 
(6.3075) grad_norm 1.2151 (1.8431) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:48:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][260/625] eta 0:02:54 lr 0.001588 wd 0.0500 time 0.2685 (0.4793) data time 0.0007 (0.0166) model time 0.2678 (0.4627) loss 6.5486 (6.2695) grad_norm 1.3928 (1.7869) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:48:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][270/625] eta 0:02:36 lr 0.001588 wd 0.0500 time 0.2614 (0.4412) data time 0.0010 (0.0139) model time 0.2604 (0.4273) loss 6.0961 (6.2675) grad_norm 1.9200 (1.8317) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:48:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][280/625] eta 0:02:23 lr 0.001588 wd 0.0500 time 0.2679 (0.4146) data time 0.0010 (0.0120) model time 0.2669 (0.4026) loss 6.5340 (6.1897) grad_norm 2.0631 (1.8851) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:48:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][290/625] eta 0:02:12 lr 0.001588 wd 0.0500 time 0.2623 (0.3951) data time 0.0009 (0.0107) model time 0.2613 (0.3845) loss 6.4174 (6.1548) grad_norm 2.1342 (1.8515) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:48:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][300/625] eta 0:02:03 lr 0.001588 wd 0.0500 time 0.2657 (0.3803) data time 0.0010 (0.0096) model time 0.2647 (0.3707) loss 5.7005 (6.1089) grad_norm 1.8138 (1.8281) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:48:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][310/625] eta 0:01:56 lr 0.001587 wd 0.0500 time 0.2609 (0.3683) data time 0.0010 (0.0088) model time 0.2599 (0.3595) loss 6.7926 (6.1246) grad_norm 1.6792 (1.8006) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:48:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [104/300][320/625] eta 0:01:49 lr 0.001587 wd 0.0500 time 0.2622 (0.3586) data time 0.0007 (0.0081) model time 0.2615 (0.3505) loss 5.7970 (6.1569) grad_norm 2.8391 (1.8100) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:48:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][330/625] eta 0:01:43 lr 0.001587 wd 0.0500 time 0.2599 (0.3505) data time 0.0013 (0.0076) model time 0.2587 (0.3429) loss 6.5552 (6.1427) grad_norm 1.7781 (1.8215) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:48:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][340/625] eta 0:01:37 lr 0.001587 wd 0.0500 time 0.2632 (0.3436) data time 0.0011 (0.0071) model time 0.2621 (0.3365) loss 6.8758 (6.1275) grad_norm 1.2946 (1.8048) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:48:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][350/625] eta 0:01:33 lr 0.001587 wd 0.0500 time 0.2599 (0.3383) data time 0.0008 (0.0067) model time 0.2591 (0.3316) loss 4.7428 (6.1139) grad_norm 1.2864 (1.8181) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:48:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][360/625] eta 0:01:28 lr 0.001587 wd 0.0500 time 0.2640 (0.3336) data time 0.0012 (0.0064) model time 0.2628 (0.3271) loss 6.4372 (6.1107) grad_norm 1.3478 (1.8154) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:48:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][370/625] eta 0:01:23 lr 0.001587 wd 0.0500 time 0.2642 (0.3293) data time 0.0010 (0.0061) model time 0.2632 (0.3232) loss 5.2893 (6.1167) grad_norm 1.4875 (1.8137) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:48:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][380/625] eta 0:01:19 lr 0.001586 wd 0.0500 time 0.2776 (0.3260) data time 0.0010 (0.0058) model time 0.2766 (0.3202) loss 6.8903 
(6.1346) grad_norm 1.6726 (1.8041) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:48:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][390/625] eta 0:01:18 lr 0.001586 wd 0.0500 time 0.2607 (0.3335) data time 0.0010 (0.0056) model time 0.2597 (0.3279) loss 5.6834 (6.1192) grad_norm 2.2964 (1.8191) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:48:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][400/625] eta 0:01:14 lr 0.001586 wd 0.0500 time 0.2667 (0.3301) data time 0.0008 (0.0054) model time 0.2659 (0.3246) loss 5.2258 (6.1107) grad_norm 1.3833 (1.8064) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:48:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][410/625] eta 0:01:10 lr 0.001586 wd 0.0500 time 0.2643 (0.3270) data time 0.0008 (0.0052) model time 0.2634 (0.3218) loss 6.7827 (6.0927) grad_norm 1.9585 (1.8113) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:48:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][420/625] eta 0:01:06 lr 0.001586 wd 0.0500 time 0.2654 (0.3244) data time 0.0008 (0.0050) model time 0.2646 (0.3193) loss 6.4467 (6.0678) grad_norm 1.9098 (1.8389) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:48:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][430/625] eta 0:01:02 lr 0.001586 wd 0.0500 time 0.2626 (0.3223) data time 0.0014 (0.0049) model time 0.2612 (0.3174) loss 6.3289 (6.0555) grad_norm 2.7294 (1.8448) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:48:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][440/625] eta 0:00:59 lr 0.001586 wd 0.0500 time 0.2650 (0.3200) data time 0.0010 (0.0048) model time 0.2640 (0.3152) loss 6.0701 (6.0633) grad_norm 3.3527 (1.8692) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:48:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [104/300][450/625] eta 0:00:55 lr 0.001585 wd 0.0500 time 0.2641 (0.3181) data time 0.0008 (0.0047) model time 0.2633 (0.3135) loss 6.2708 (6.0625) grad_norm 2.7028 (1.8679) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:48:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][460/625] eta 0:00:52 lr 0.001585 wd 0.0500 time 0.2572 (0.3164) data time 0.0015 (0.0045) model time 0.2558 (0.3118) loss 6.5984 (6.0540) grad_norm 2.0198 (1.8608) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:49:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][470/625] eta 0:00:48 lr 0.001585 wd 0.0500 time 0.2617 (0.3149) data time 0.0008 (0.0044) model time 0.2609 (0.3105) loss 4.5529 (6.0410) grad_norm 2.3412 (1.8575) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:49:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][480/625] eta 0:00:45 lr 0.001585 wd 0.0500 time 0.2637 (0.3132) data time 0.0007 (0.0043) model time 0.2630 (0.3088) loss 4.2372 (6.0272) grad_norm 1.8490 (1.8565) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:49:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][490/625] eta 0:00:42 lr 0.001585 wd 0.0500 time 0.2612 (0.3133) data time 0.0010 (0.0042) model time 0.2602 (0.3091) loss 5.9106 (6.0356) grad_norm 1.5312 (1.8500) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:49:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][500/625] eta 0:00:38 lr 0.001585 wd 0.0500 time 0.2590 (0.3119) data time 0.0009 (0.0042) model time 0.2581 (0.3077) loss 6.7257 (6.0370) grad_norm 1.7392 (1.8463) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:49:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][510/625] eta 0:00:35 lr 0.001585 wd 0.0500 time 0.2653 (0.3105) data time 0.0011 (0.0041) model time 0.2643 (0.3064) loss 4.7233 
(6.0252) grad_norm 1.7944 (1.8398) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:49:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][520/625] eta 0:00:32 lr 0.001584 wd 0.0500 time 0.2614 (0.3091) data time 0.0009 (0.0040) model time 0.2604 (0.3051) loss 7.0553 (6.0197) grad_norm 2.6417 (1.8377) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:49:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][530/625] eta 0:00:29 lr 0.001584 wd 0.0500 time 0.2663 (0.3078) data time 0.0010 (0.0040) model time 0.2653 (0.3039) loss 6.3080 (6.0296) grad_norm 1.7840 (1.8349) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:49:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][540/625] eta 0:00:26 lr 0.001584 wd 0.0500 time 0.2735 (0.3167) data time 0.0007 (0.0039) model time 0.2728 (0.3128) loss 4.9184 (6.0382) grad_norm 1.3708 (1.8280) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:49:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][550/625] eta 0:00:23 lr 0.001584 wd 0.0500 time 0.2594 (0.3153) data time 0.0013 (0.0038) model time 0.2582 (0.3115) loss 5.7501 (6.0378) grad_norm 1.5386 (1.8268) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:49:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][560/625] eta 0:00:20 lr 0.001584 wd 0.0500 time 0.2622 (0.3141) data time 0.0014 (0.0037) model time 0.2609 (0.3104) loss 5.7787 (6.0439) grad_norm 1.5697 (1.8335) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:49:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][570/625] eta 0:00:17 lr 0.001584 wd 0.0500 time 0.2725 (0.3128) data time 0.0010 (0.0037) model time 0.2715 (0.3092) loss 5.1382 (6.0431) grad_norm 2.4132 (1.8513) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:49:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [104/300][580/625] eta 0:00:14 lr 0.001584 wd 0.0500 time 0.2627 (0.3116) data time 0.0007 (0.0036) model time 0.2620 (0.3080) loss 5.0886 (6.0421) grad_norm 1.5436 (1.8417) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:49:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][590/625] eta 0:00:10 lr 0.001583 wd 0.0500 time 0.2632 (0.3104) data time 0.0010 (0.0035) model time 0.2622 (0.3068) loss 6.0634 (6.0356) grad_norm 1.1485 (1.8304) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:49:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][600/625] eta 0:00:07 lr 0.001583 wd 0.0500 time 0.2643 (0.3093) data time 0.0008 (0.0035) model time 0.2635 (0.3058) loss 6.5231 (6.0325) grad_norm 1.4705 (1.8293) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:49:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][610/625] eta 0:00:04 lr 0.001583 wd 0.0500 time 0.2635 (0.3082) data time 0.0005 (0.0034) model time 0.2630 (0.3048) loss 6.4769 (6.0324) grad_norm 1.6533 (1.8319) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:49:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][620/625] eta 0:00:01 lr 0.001583 wd 0.0500 time 0.2658 (0.3072) data time 0.0007 (0.0034) model time 0.2650 (0.3038) loss 5.7916 (6.0319) grad_norm 2.4666 (1.8348) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:49:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 104 training takes 0:02:06 [2024-07-30 14:49:46 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 14:49:53 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-30 14:55:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-30 14:55:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-30 14:55:53 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-30 14:56:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-30 14:56:10 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-30 14:56:10 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-30 14:56:10 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-30 14:56:11 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 104) [2024-07-30 14:56:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-30 14:56:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][0/625] eta 1:28:02 lr 0.001583 wd 0.0500 time 8.4528 (8.4528) data time 0.8910 (0.8910) model time 0.0000 (0.0000) loss 6.8825 (6.8825) grad_norm 1.3512 (1.3512) loss_scale 2048.0000 (2048.0000) mem 10976MB [2024-07-30 14:56:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][10/625] eta 0:10:40 lr 0.001583 wd 0.0500 time 0.2513 (1.0415) data time 0.0010 (0.0821) model time 0.0000 (0.0000) loss 5.3001 (6.4967) grad_norm 1.5769 (2.0465) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:56:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[105/300][20/625] eta 0:06:43 lr 0.001583 wd 0.0500 time 0.2539 (0.6669) data time 0.0016 (0.0436) model time 0.0000 (0.0000) loss 6.2804 (6.3056) grad_norm 1.5143 (1.8275) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:56:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][30/625] eta 0:05:17 lr 0.001582 wd 0.0500 time 0.2536 (0.5337) data time 0.0007 (0.0299) model time 0.0000 (0.0000) loss 5.5968 (6.3694) grad_norm 2.8566 (1.9061) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:56:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][40/625] eta 0:04:32 lr 0.001582 wd 0.0500 time 0.2572 (0.4666) data time 0.0014 (0.0229) model time 0.0000 (0.0000) loss 5.9818 (6.2687) grad_norm 2.4412 (1.8895) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:56:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][50/625] eta 0:04:04 lr 0.001582 wd 0.0500 time 0.2521 (0.4248) data time 0.0008 (0.0187) model time 0.0000 (0.0000) loss 6.6427 (6.2392) grad_norm 1.0684 (1.8180) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:56:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][60/625] eta 0:03:44 lr 0.001582 wd 0.0500 time 0.2513 (0.3969) data time 0.0009 (0.0158) model time 0.2504 (0.2533) loss 5.7651 (6.1849) grad_norm 1.1376 (1.7768) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:56:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][70/625] eta 0:03:29 lr 0.001582 wd 0.0500 time 0.2521 (0.3769) data time 0.0008 (0.0138) model time 0.2513 (0.2535) loss 5.4291 (6.1240) grad_norm 1.2268 (1.7475) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:56:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][80/625] eta 0:03:17 lr 0.001582 wd 0.0500 time 0.2582 (0.3620) data time 0.0010 (0.0122) model time 0.2573 (0.2539) loss 5.3279 (6.1090) grad_norm 1.3977 
(1.7151) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:56:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][90/625] eta 0:03:07 lr 0.001582 wd 0.0500 time 0.2606 (0.3506) data time 0.0008 (0.0110) model time 0.2598 (0.2547) loss 6.4251 (6.0853) grad_norm 3.7214 (1.7416) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:56:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][100/625] eta 0:02:59 lr 0.001581 wd 0.0500 time 0.2495 (0.3410) data time 0.0008 (0.0101) model time 0.2487 (0.2543) loss 5.8660 (6.1184) grad_norm 3.7122 (1.7943) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:56:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][110/625] eta 0:02:51 lr 0.001581 wd 0.0500 time 0.2530 (0.3332) data time 0.0008 (0.0093) model time 0.2522 (0.2540) loss 5.8711 (6.1298) grad_norm 1.9513 (1.7937) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:56:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][120/625] eta 0:02:44 lr 0.001581 wd 0.0500 time 0.2487 (0.3267) data time 0.0007 (0.0086) model time 0.2480 (0.2540) loss 4.2338 (6.1206) grad_norm 1.4746 (1.7780) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:56:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][130/625] eta 0:02:38 lr 0.001581 wd 0.0500 time 0.2552 (0.3212) data time 0.0008 (0.0080) model time 0.2544 (0.2540) loss 7.7255 (6.1345) grad_norm 1.1496 (1.7364) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:56:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][140/625] eta 0:02:33 lr 0.001581 wd 0.0500 time 0.2551 (0.3165) data time 0.0007 (0.0075) model time 0.2545 (0.2539) loss 5.3862 (6.1140) grad_norm 1.2142 (1.7124) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:57:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[105/300][150/625] eta 0:02:28 lr 0.001581 wd 0.0500 time 0.2561 (0.3125) data time 0.0008 (0.0071) model time 0.2553 (0.2540) loss 5.2143 (6.1031) grad_norm 1.9891 (1.7795) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:57:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][160/625] eta 0:02:23 lr 0.001581 wd 0.0500 time 0.2538 (0.3090) data time 0.0009 (0.0068) model time 0.2529 (0.2541) loss 6.8719 (6.1305) grad_norm 1.1509 (1.7715) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:57:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][170/625] eta 0:02:19 lr 0.001580 wd 0.0500 time 0.2554 (0.3060) data time 0.0008 (0.0065) model time 0.2546 (0.2542) loss 5.8341 (6.1313) grad_norm 1.6911 (1.7648) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:57:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][180/625] eta 0:02:14 lr 0.001580 wd 0.0500 time 0.2535 (0.3032) data time 0.0011 (0.0062) model time 0.2524 (0.2542) loss 7.0341 (6.1120) grad_norm 2.1969 (1.7931) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:57:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][190/625] eta 0:02:10 lr 0.001580 wd 0.0500 time 0.2512 (0.3007) data time 0.0008 (0.0060) model time 0.2504 (0.2542) loss 6.2343 (6.1190) grad_norm 3.7934 (1.8215) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:57:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][200/625] eta 0:02:06 lr 0.001580 wd 0.0500 time 0.2532 (0.2985) data time 0.0008 (0.0057) model time 0.2524 (0.2542) loss 5.3964 (6.0865) grad_norm 1.6604 (1.8506) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:57:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][210/625] eta 0:02:03 lr 0.001580 wd 0.0500 time 0.2535 (0.2964) data time 0.0008 (0.0055) model time 0.2527 (0.2542) loss 7.1432 (6.0860) grad_norm 
1.9182 (1.8613) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:57:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][220/625] eta 0:01:59 lr 0.001580 wd 0.0500 time 0.2537 (0.2946) data time 0.0014 (0.0053) model time 0.2523 (0.2542) loss 6.6587 (6.0795) grad_norm 1.7698 (1.8489) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:57:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][230/625] eta 0:01:55 lr 0.001580 wd 0.0500 time 0.2571 (0.2929) data time 0.0009 (0.0051) model time 0.2562 (0.2542) loss 4.5530 (6.0779) grad_norm 1.4914 (1.8434) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:57:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][240/625] eta 0:01:52 lr 0.001579 wd 0.0500 time 0.2687 (0.2914) data time 0.0007 (0.0050) model time 0.2679 (0.2544) loss 6.5591 (6.0726) grad_norm 2.9449 (1.8552) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:57:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][250/625] eta 0:01:48 lr 0.001579 wd 0.0500 time 0.2560 (0.2900) data time 0.0007 (0.0048) model time 0.2553 (0.2544) loss 6.0414 (6.0630) grad_norm 2.3678 (1.8487) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:57:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][260/625] eta 0:01:45 lr 0.001579 wd 0.0500 time 0.2546 (0.2887) data time 0.0007 (0.0047) model time 0.2539 (0.2544) loss 5.7482 (6.0492) grad_norm 1.2337 (1.8308) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:57:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][270/625] eta 0:01:42 lr 0.001579 wd 0.0500 time 0.2662 (0.2879) data time 0.0008 (0.0045) model time 0.2654 (0.2549) loss 5.9658 (6.0366) grad_norm 1.7151 (1.8124) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:57:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[105/300][280/625] eta 0:01:38 lr 0.001579 wd 0.0500 time 0.2619 (0.2868) data time 0.0010 (0.0044) model time 0.2609 (0.2550) loss 6.0380 (6.0438) grad_norm 1.7278 (1.8039) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:57:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][290/625] eta 0:01:35 lr 0.001579 wd 0.0500 time 0.2585 (0.2858) data time 0.0008 (0.0043) model time 0.2577 (0.2550) loss 5.1261 (6.0367) grad_norm 1.9741 (1.7963) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:57:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][300/625] eta 0:01:32 lr 0.001579 wd 0.0500 time 0.2522 (0.2848) data time 0.0008 (0.0042) model time 0.2514 (0.2550) loss 4.8915 (6.0201) grad_norm 1.7298 (1.7915) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:57:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][310/625] eta 0:01:29 lr 0.001578 wd 0.0500 time 0.2614 (0.2839) data time 0.0009 (0.0041) model time 0.2605 (0.2550) loss 6.2338 (6.0196) grad_norm 2.0868 (1.8113) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:57:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][320/625] eta 0:01:26 lr 0.001578 wd 0.0500 time 0.2535 (0.2831) data time 0.0007 (0.0040) model time 0.2529 (0.2551) loss 7.2244 (6.0350) grad_norm 1.2419 (1.8111) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:57:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][330/625] eta 0:01:23 lr 0.001578 wd 0.0500 time 0.2586 (0.2823) data time 0.0006 (0.0039) model time 0.2580 (0.2551) loss 4.5349 (6.0296) grad_norm 1.9885 (1.8080) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:57:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][340/625] eta 0:01:20 lr 0.001578 wd 0.0500 time 0.2508 (0.2816) data time 0.0008 (0.0039) model time 0.2500 (0.2551) loss 7.0646 (6.0371) grad_norm 
1.6670 (1.8000) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:57:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][350/625] eta 0:01:17 lr 0.001578 wd 0.0500 time 0.2587 (0.2810) data time 0.0008 (0.0038) model time 0.2579 (0.2553) loss 5.9183 (6.0380) grad_norm 1.5231 (1.7900) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:57:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][360/625] eta 0:01:14 lr 0.001578 wd 0.0500 time 0.2578 (0.2803) data time 0.0006 (0.0037) model time 0.2572 (0.2553) loss 5.6836 (6.0371) grad_norm 1.0471 (1.7757) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:57:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][370/625] eta 0:01:11 lr 0.001578 wd 0.0500 time 0.2557 (0.2797) data time 0.0009 (0.0037) model time 0.2548 (0.2553) loss 5.3041 (6.0318) grad_norm 2.0458 (1.7737) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:58:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][380/625] eta 0:01:08 lr 0.001577 wd 0.0500 time 0.2608 (0.2792) data time 0.0006 (0.0036) model time 0.2602 (0.2554) loss 4.9004 (6.0298) grad_norm 2.0976 (1.7674) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-30 14:58:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][390/625] eta 0:01:05 lr 0.001577 wd 0.0500 time 0.2535 (0.2786) data time 0.0006 (0.0035) model time 0.2529 (0.2554) loss 6.6207 (6.0182) grad_norm 1.8497 (1.7707) loss_scale 4096.0000 (2084.6650) mem 9656MB [2024-07-30 14:58:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][400/625] eta 0:01:02 lr 0.001577 wd 0.0500 time 0.2574 (0.2781) data time 0.0009 (0.0035) model time 0.2565 (0.2554) loss 6.4939 (6.0289) grad_norm 1.4429 (1.7676) loss_scale 4096.0000 (2134.8229) mem 9656MB [2024-07-30 14:58:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[105/300][410/625] eta 0:00:59 lr 0.001577 wd 0.0500 time 0.2596 (0.2775) data time 0.0009 (0.0034) model time 0.2587 (0.2554) loss 6.5262 (6.0364) grad_norm 2.2580 (1.7792) loss_scale 4096.0000 (2182.5401) mem 9656MB [2024-07-30 14:58:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][420/625] eta 0:00:56 lr 0.001577 wd 0.0500 time 0.2425 (0.2777) data time 0.0006 (0.0034) model time 0.2419 (0.2561) loss 6.8074 (6.0343) grad_norm 1.2606 (1.7750) loss_scale 4096.0000 (2227.9905) mem 9656MB [2024-07-30 14:58:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][430/625] eta 0:00:54 lr 0.001577 wd 0.0500 time 0.2596 (0.2772) data time 0.0007 (0.0033) model time 0.2590 (0.2562) loss 6.2543 (6.0382) grad_norm 3.1448 (1.7794) loss_scale 4096.0000 (2271.3318) mem 9656MB [2024-07-30 14:58:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-30 14:58:15 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 14:58:22 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-30 15:00:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-30 15:00:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-30 15:00:19 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-07-30 15:00:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-30 15:00:34 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-30 15:00:34 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-30 15:00:34 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-30 15:00:34 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 105) [2024-07-30 15:00:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-30 15:00:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][440/625] eta 0:03:29 lr 0.001576 wd 0.0500 time 0.2612 (1.1346) data time 0.0008 (0.1142) model time 0.2604 (1.0204) loss 6.7852 (6.8768) grad_norm 1.3353 (1.9995) loss_scale 4096.0000 (4096.0000) mem 9661MB [2024-07-30 15:00:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-30 15:00:49 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 15:00:51 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-30 15:02:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-30 15:02:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-30 15:02:52 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-30 15:03:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-30 15:03:12 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-30 15:03:12 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-30 15:03:12 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-30 15:03:12 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 105) [2024-07-30 15:03:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-30 15:03:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][450/625] eta 0:14:18 lr 0.001576 wd 0.0500 time 1.0386 (4.9046) data time 0.0010 (0.6364) model time 1.0376 (4.2682) loss 7.4484 (7.3158) grad_norm 1.2932 (1.2605) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:03:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][460/625] eta 0:02:51 lr 0.001576 wd 0.0500 time 0.2735 (1.0368) data time 0.0007 (0.1069) model time 0.2728 (0.9299) loss 5.1901 (6.5120) grad_norm 1.8841 (1.6228) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:03:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[105/300][470/625] eta 0:01:46 lr 0.001576 wd 0.0500 time 0.2626 (0.6863) data time 0.0009 (0.0588) model time 0.2617 (0.6275) loss 6.9147 (6.4059) grad_norm 1.4909 (1.6923) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:03:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][480/625] eta 0:01:20 lr 0.001576 wd 0.0500 time 0.2592 (0.5547) data time 0.0008 (0.0407) model time 0.2584 (0.5140) loss 6.0727 (6.3842) grad_norm 1.8345 (1.7249) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:03:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][490/625] eta 0:01:05 lr 0.001576 wd 0.0500 time 0.2594 (0.4861) data time 0.0011 (0.0313) model time 0.2584 (0.4548) loss 6.6465 (6.2577) grad_norm 1.6934 (1.8363) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:03:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][500/625] eta 0:00:55 lr 0.001576 wd 0.0500 time 0.2646 (0.4434) data time 0.0009 (0.0255) model time 0.2637 (0.4179) loss 5.8133 (6.1857) grad_norm 2.1704 (1.8496) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:03:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][510/625] eta 0:00:47 lr 0.001575 wd 0.0500 time 0.2613 (0.4144) data time 0.0007 (0.0215) model time 0.2605 (0.3929) loss 6.7777 (6.1666) grad_norm 2.1753 (1.9270) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:03:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][520/625] eta 0:00:41 lr 0.001575 wd 0.0500 time 0.2687 (0.3939) data time 0.0012 (0.0187) model time 0.2675 (0.3752) loss 5.4936 (6.0985) grad_norm 1.9441 (1.8908) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:03:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][530/625] eta 0:00:35 lr 0.001575 wd 0.0500 time 0.2638 (0.3781) data time 0.0010 (0.0166) model time 0.2628 (0.3616) loss 6.1609 (6.0935) grad_norm 
1.4747 (1.8408) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:03:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][540/625] eta 0:00:31 lr 0.001575 wd 0.0500 time 0.2685 (0.3658) data time 0.0008 (0.0149) model time 0.2677 (0.3509) loss 4.8258 (6.0644) grad_norm 1.9964 (1.8168) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:03:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][550/625] eta 0:00:26 lr 0.001575 wd 0.0500 time 0.2645 (0.3558) data time 0.0008 (0.0135) model time 0.2637 (0.3423) loss 6.8673 (6.1002) grad_norm 1.7508 (1.8122) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:03:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][560/625] eta 0:00:22 lr 0.001575 wd 0.0500 time 0.2721 (0.3476) data time 0.0011 (0.0124) model time 0.2711 (0.3352) loss 7.4727 (6.1050) grad_norm 2.7254 (1.8281) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:03:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][570/625] eta 0:00:18 lr 0.001575 wd 0.0500 time 0.2765 (0.3410) data time 0.0007 (0.0115) model time 0.2758 (0.3295) loss 6.3605 (6.1034) grad_norm 1.7174 (1.8074) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:04:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][580/625] eta 0:00:15 lr 0.001574 wd 0.0500 time 0.2596 (0.3353) data time 0.0010 (0.0107) model time 0.2586 (0.3246) loss 5.8996 (6.0812) grad_norm 1.6994 (1.7915) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:04:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][590/625] eta 0:00:11 lr 0.001574 wd 0.0500 time 0.2634 (0.3305) data time 0.0012 (0.0100) model time 0.2622 (0.3205) loss 5.9057 (6.0661) grad_norm 2.1198 (1.7813) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:04:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[105/300][600/625] eta 0:00:08 lr 0.001574 wd 0.0500 time 0.2764 (0.3264) data time 0.0010 (0.0094) model time 0.2754 (0.3170) loss 6.7708 (6.0575) grad_norm 1.4744 (1.7810) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:04:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][610/625] eta 0:00:04 lr 0.001574 wd 0.0500 time 0.2643 (0.3226) data time 0.0007 (0.0089) model time 0.2636 (0.3137) loss 5.8685 (6.0742) grad_norm 1.7317 (1.7960) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:04:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][620/625] eta 0:00:01 lr 0.001574 wd 0.0500 time 0.2620 (0.3191) data time 0.0007 (0.0084) model time 0.2613 (0.3107) loss 5.7027 (6.0674) grad_norm 2.1352 (1.8201) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:04:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 105 training takes 0:00:55 [2024-07-30 15:04:12 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 15:04:13 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-30 15:04:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.459 (0.459) Loss 0.6504 (0.6504) Acc@1 87.549 (87.549) Acc@5 98.096 (98.096) Mem 9656MB [2024-07-30 15:04:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.057 (0.098) Loss 1.0713 (0.8093) Acc@1 75.732 (82.932) Acc@5 93.701 (96.529) Mem 9656MB [2024-07-30 15:04:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.077) Loss 1.2217 (0.9717) Acc@1 72.363 (79.057) Acc@5 91.504 (94.606) Mem 9656MB [2024-07-30 15:04:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.769 Acc@5 94.620 [2024-07-30 15:04:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.8% [2024-07-30 15:04:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.77% [2024-07-30 15:04:17 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-30 15:04:18 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-30 15:04:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.448 (0.448) Loss 0.5674 (0.5674) Acc@1 88.232 (88.232) Acc@5 98.389 (98.389) Mem 9656MB [2024-07-30 15:04:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 0.9453 (0.7169) Acc@1 78.076 (84.313) Acc@5 94.824 (97.097) Mem 9656MB [2024-07-30 15:04:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0947 (0.8615) Acc@1 72.900 (80.485) Acc@5 92.969 (95.403) Mem 9656MB [2024-07-30 15:04:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.142 Acc@5 95.389 [2024-07-30 15:04:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.1% [2024-07-30 15:04:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.14% [2024-07-30 15:04:20 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-30 15:04:21 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-30 15:04:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][0/625] eta 0:07:24 lr 0.001574 wd 0.0500 time 0.7104 (0.7104) data time 0.3777 (0.3777) model time 0.0000 (0.0000) loss 5.8595 (5.8595) grad_norm 2.6229 (2.6229) loss_scale 4096.0000 (4096.0000) mem 9651MB [2024-07-30 15:04:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][10/625] eta 0:03:07 lr 0.001574 wd 0.0500 time 0.2657 (0.3050) data time 0.0008 (0.0353) model time 0.0000 (0.0000) loss 5.6946 (5.8483) grad_norm 0.9824 (1.5953) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 15:04:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][20/625] eta 0:02:53 lr 0.001573 wd 0.0500 time 0.2621 (0.2869) data time 0.0009 (0.0190) model time 0.0000 (0.0000) loss 6.1055 (5.8640) grad_norm 1.0212 (1.7553) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 15:04:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][30/625] eta 0:02:46 lr 0.001573 wd 0.0500 time 0.2743 (0.2803) data time 0.0008 (0.0132) model time 0.0000 (0.0000) loss 6.7150 (5.7948) grad_norm 2.2492 (1.7609) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 15:04:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][40/625] eta 0:02:42 lr 0.001573 wd 0.0500 time 0.2692 (0.2771) data time 0.0007 (0.0103) model time 0.0000 (0.0000) loss 5.9427 (5.8212) grad_norm 1.1722 (1.7429) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 15:04:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][50/625] eta 0:02:38 lr 0.001573 wd 0.0500 time 0.2642 (0.2750) data time 0.0011 (0.0084) model time 0.0000 (0.0000) loss 6.2779 (5.8702) grad_norm 1.4136 (1.6765) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-30 15:04:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-30 15:04:36 
vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 15:04:37 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-30 15:06:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-30 15:06:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-30 15:06:40 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-30 15:06:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-30 15:06:50 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... 
[2024-07-30 15:06:50 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-30 15:06:50 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-30 15:06:50 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 106) [2024-07-30 15:06:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-30 15:07:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][60/625] eta 0:13:01 lr 0.001573 wd 0.0500 time 0.2560 (1.3830) data time 0.0011 (0.1191) model time 0.2549 (1.2639) loss 6.6010 (6.6493) grad_norm 2.1056 (1.8456) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:07:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][70/625] eta 0:07:36 lr 0.001573 wd 0.0500 time 0.2601 (0.8217) data time 0.0008 (0.0602) model time 0.2593 (0.7615) loss 6.4315 (6.3247) grad_norm 2.1516 (1.7780) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:07:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][80/625] eta 0:05:45 lr 0.001573 wd 0.0500 time 0.2583 (0.6341) data time 0.0013 (0.0405) model time 0.2570 (0.5936) loss 7.1662 (6.3624) grad_norm 2.1841 (1.9350) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:07:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][90/625] eta 0:04:49 lr 0.001572 wd 0.0500 time 0.2557 (0.5402) data time 0.0009 (0.0306) model time 0.2547 (0.5096) loss 5.0550 (6.2020) grad_norm 2.6248 (2.0401) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:07:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][100/625] eta 0:04:14 lr 0.001572 wd 0.0500 time 0.2612 (0.4841) data time 0.0010 (0.0247) model time 0.2602 (0.4593) loss 5.3166 (6.1996) grad_norm 1.3321 (2.1651) loss_scale 4096.0000 (4096.0000) mem 9656MB 
[2024-07-30 15:07:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][110/625] eta 0:03:49 lr 0.001572 wd 0.0500 time 0.2601 (0.4464) data time 0.0008 (0.0208) model time 0.2593 (0.4256) loss 6.3213 (6.1616) grad_norm 1.3936 (2.0701) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:07:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][120/625] eta 0:03:31 lr 0.001572 wd 0.0500 time 0.2578 (0.4198) data time 0.0008 (0.0180) model time 0.2570 (0.4018) loss 4.9630 (6.1314) grad_norm 1.6141 (1.9847) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:07:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-30 15:07:25 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 15:07:27 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-30 15:09:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-30 15:09:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-30 15:09:35 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-30 15:09:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-30 15:09:46 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... 
[2024-07-30 15:09:47 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-30 15:09:47 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-30 15:09:47 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 106) [2024-07-30 15:09:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-30 15:10:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][130/625] eta 0:16:26 lr 0.001572 wd 0.0500 time 0.2656 (1.9937) data time 0.0007 (0.2043) model time 0.2649 (1.7893) loss 6.8708 (6.6940) grad_norm 1.4034 (1.8766) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:10:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][140/625] eta 0:06:49 lr 0.001572 wd 0.0500 time 0.2705 (0.8437) data time 0.0008 (0.0689) model time 0.2697 (0.7749) loss 6.5598 (6.4105) grad_norm 1.7950 (1.8758) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:10:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][150/625] eta 0:04:51 lr 0.001572 wd 0.0500 time 0.2693 (0.6130) data time 0.0009 (0.0417) model time 0.2684 (0.5713) loss 6.8155 (6.3687) grad_norm 2.5881 (1.9724) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:10:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][160/625] eta 0:03:58 lr 0.001571 wd 0.0500 time 0.2642 (0.5139) data time 0.0009 (0.0301) model time 0.2633 (0.4838) loss 5.9521 (6.3388) grad_norm 1.4648 (1.9300) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:10:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][170/625] eta 0:03:29 lr 0.001571 wd 0.0500 time 0.2698 (0.4596) data time 0.0010 (0.0237) model time 0.2689 (0.4360) loss 7.2028 (6.2728) grad_norm 1.2402 (1.9411) loss_scale 4096.0000 (4096.0000) mem 9656MB 
[2024-07-30 15:10:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][180/625] eta 0:03:09 lr 0.001571 wd 0.0500 time 0.2720 (0.4251) data time 0.0007 (0.0196) model time 0.2713 (0.4055) loss 5.2062 (6.2494) grad_norm 1.8008 (1.9337) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:10:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][190/625] eta 0:02:54 lr 0.001571 wd 0.0500 time 0.2659 (0.4009) data time 0.0010 (0.0167) model time 0.2650 (0.3842) loss 6.6600 (6.2230) grad_norm 1.2747 (1.8729) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:10:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][200/625] eta 0:02:42 lr 0.001571 wd 0.0500 time 0.2647 (0.3830) data time 0.0010 (0.0146) model time 0.2637 (0.3683) loss 4.8617 (6.1621) grad_norm 1.1074 (1.8503) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:10:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][210/625] eta 0:02:33 lr 0.001571 wd 0.0500 time 0.2649 (0.3692) data time 0.0007 (0.0130) model time 0.2642 (0.3561) loss 6.2736 (6.1516) grad_norm 1.7552 (1.7964) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:10:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][220/625] eta 0:02:25 lr 0.001571 wd 0.0500 time 0.2684 (0.3585) data time 0.0012 (0.0118) model time 0.2673 (0.3468) loss 6.9723 (6.1521) grad_norm 1.2291 (1.7586) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:10:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][230/625] eta 0:02:18 lr 0.001570 wd 0.0500 time 0.2685 (0.3503) data time 0.0010 (0.0107) model time 0.2675 (0.3396) loss 4.9623 (6.1563) grad_norm 1.5240 (1.7492) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:10:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][240/625] eta 0:02:12 lr 0.001570 wd 0.0500 time 0.2655 (0.3430) data time 0.0007 (0.0099) model time 0.2648 (0.3331) loss 5.4129 (6.1374) grad_norm 1.9097 (1.7561) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:10:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][250/625] eta 0:02:06 lr 0.001570 wd 0.0500 time 0.2669 (0.3370) data time 0.0008 (0.0092) model time 0.2661 (0.3278) loss 5.4615 (6.1223) grad_norm 1.4874 (1.7343) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:10:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][260/625] eta 0:02:01 lr 0.001570 wd 0.0500 time 0.2703 (0.3320) data time 0.0007 (0.0086) model time 0.2696 (0.3235) loss 5.1390 (6.1027) grad_norm 2.0685 (1.7477) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:10:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][270/625] eta 0:01:56 lr 0.001570 wd 0.0500 time 0.2661 (0.3276) data time 0.0009 (0.0080) model time 0.2651 (0.3195) loss 5.8936 (6.0816) grad_norm 1.6855 (1.7609) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:10:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][280/625] eta 0:01:51 lr 0.001570 wd 0.0500 time 0.2654 (0.3237) data time 0.0009 (0.0076) model time 0.2644 (0.3161) loss 7.0461 (6.0682) grad_norm 1.4686 (1.7317) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:10:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][290/625] eta 0:01:47 lr 0.001570 wd 0.0500 time 0.2660 (0.3203) data time 0.0009 (0.0072) model time 0.2650 (0.3131) loss 7.3019 (6.0733) grad_norm 1.9095 (1.7585) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:10:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][300/625] eta 0:01:43 lr 0.001569 wd 0.0500 time 0.2674 (0.3173) data time 0.0007 (0.0069) model time 0.2667 (0.3104) loss 6.6774 (6.0575) grad_norm 2.1932 (1.7525) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:10:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][310/625] eta 0:01:39 lr 0.001569 wd 0.0500 time 0.2638 (0.3147) data time 0.0011 (0.0065) model time 0.2627 (0.3081) loss 6.4427 (6.0504) grad_norm 1.5310 (1.7496) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:10:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][320/625] eta 0:01:35 lr 0.001569 wd 0.0500 time 0.2729 (0.3123) data time 0.0009 (0.0063) model time 0.2719 (0.3061) loss 4.4106 (6.0262) grad_norm 1.6130 (1.7384) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:10:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][330/625] eta 0:01:31 lr 0.001569 wd 0.0500 time 0.2690 (0.3103) data time 0.0009 (0.0060) model time 0.2681 (0.3043) loss 5.8321 (6.0101) grad_norm 2.6014 (1.7324) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:10:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][340/625] eta 0:01:27 lr 0.001569 wd 0.0500 time 0.2806 (0.3084) data time 0.0010 (0.0058) model time 0.2796 (0.3026) loss 5.5909 (6.0077) grad_norm 1.0470 (1.7318) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:11:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][350/625] eta 0:01:24 lr 0.001569 wd 0.0500 time 0.2711 (0.3066) data time 0.0008 (0.0056) model time 0.2703 (0.3011) loss 5.9054 (6.0053) grad_norm 1.2057 (1.7222) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:11:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][360/625] eta 0:01:20 lr 0.001569 wd 0.0500 time 0.2714 (0.3050) data time 0.0009 (0.0054) model time 0.2705 (0.2996) loss 6.8269 (5.9999) grad_norm 1.6872 (1.7122) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:11:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][370/625] eta 0:01:17 lr 0.001568 wd 0.0500 time 0.2736 (0.3036) data time 0.0009 (0.0052) model time 0.2728 (0.2984) loss 6.1143 (5.9973) grad_norm 1.4097 (1.7087) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:11:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][380/625] eta 0:01:14 lr 0.001568 wd 0.0500 time 0.2682 (0.3023) data time 0.0010 (0.0050) model time 0.2672 (0.2973) loss 4.9912 (5.9882) grad_norm 2.0395 (1.7115) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:11:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][390/625] eta 0:01:10 lr 0.001568 wd 0.0500 time 0.2693 (0.3010) data time 0.0007 (0.0049) model time 0.2686 (0.2962) loss 4.3674 (5.9636) grad_norm 2.4224 (1.7327) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:11:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][400/625] eta 0:01:07 lr 0.001568 wd 0.0500 time 0.2790 (0.3000) data time 0.0009 (0.0047) model time 0.2781 (0.2952) loss 6.0438 (5.9613) grad_norm 1.7994 (1.7318) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:11:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][410/625] eta 0:01:04 lr 0.001568 wd 0.0500 time 0.2675 (0.2989) data time 0.0009 (0.0046) model time 0.2665 (0.2943) loss 6.8688 (5.9635) grad_norm 1.4921 (1.7377) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:11:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][420/625] eta 0:01:01 lr 0.001568 wd 0.0500 time 0.2744 (0.2980) data time 0.0009 (0.0045) model time 0.2735 (0.2935) loss 6.3701 (5.9631) grad_norm 1.4336 (1.7423) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:11:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][430/625] eta 0:00:57 lr 0.001567 wd 0.0500 time 0.2688 (0.2970) data time 0.0011 (0.0044) model time 0.2677 (0.2927) loss 5.3830 (5.9539) grad_norm 2.9133 (1.7419) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:11:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][440/625] eta 0:00:54 lr 0.001567 wd 0.0500 time 0.2687 (0.2962) data time 0.0009 (0.0043) model time 0.2678 (0.2920) loss 7.1857 (5.9607) grad_norm 1.9982 (1.7513) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:11:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][450/625] eta 0:00:51 lr 0.001567 wd 0.0500 time 0.2681 (0.2954) data time 0.0007 (0.0042) model time 0.2674 (0.2912) loss 7.1491 (5.9785) grad_norm 1.1771 (1.7499) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:11:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][460/625] eta 0:00:48 lr 0.001567 wd 0.0500 time 0.2688 (0.2946) data time 0.0008 (0.0041) model time 0.2680 (0.2906) loss 6.6502 (5.9787) grad_norm 2.0528 (1.7462) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:11:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][470/625] eta 0:00:45 lr 0.001567 wd 0.0500 time 0.2712 (0.2939) data time 0.0007 (0.0040) model time 0.2705 (0.2899) loss 5.0453 (5.9766) grad_norm 1.4298 (1.7442) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:11:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][480/625] eta 0:00:42 lr 0.001567 wd 0.0500 time 0.2675 (0.2932) data time 0.0010 (0.0039) model time 0.2665 (0.2893) loss 5.7666 (5.9824) grad_norm 2.9749 (1.7380) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:11:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][490/625] eta 0:00:39 lr 0.001567 wd 0.0500 time 0.2712 (0.2926) data time 0.0010 (0.0038) model time 0.2702 (0.2888) loss 5.9367 (5.9798) grad_norm 1.2572 (1.7430) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:11:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][500/625] eta 0:00:36 lr 0.001566 wd 0.0500 time 0.2754 (0.2920) data time 0.0007 (0.0037) model time 0.2747 (0.2883) loss 5.0207 (5.9817) grad_norm 1.8444 (1.7454) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:11:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][510/625] eta 0:00:33 lr 0.001566 wd 0.0500 time 0.2665 (0.2914) data time 0.0008 (0.0037) model time 0.2657 (0.2878) loss 4.7998 (5.9760) grad_norm 2.8157 (1.7538) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:11:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][520/625] eta 0:00:30 lr 0.001566 wd 0.0500 time 0.2767 (0.2909) data time 0.0010 (0.0036) model time 0.2757 (0.2873) loss 6.9921 (5.9772) grad_norm 3.2525 (1.7612) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:11:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][530/625] eta 0:00:27 lr 0.001566 wd 0.0500 time 0.2735 (0.2905) data time 0.0008 (0.0035) model time 0.2727 (0.2869) loss 5.7393 (5.9853) grad_norm 1.5348 (1.7599) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:11:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][540/625] eta 0:00:24 lr 0.001566 wd 0.0500 time 0.2672 (0.2899) data time 0.0009 (0.0035) model time 0.2663 (0.2865) loss 7.2022 (5.9881) grad_norm 1.3808 (1.7597) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:11:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][550/625] eta 0:00:21 lr 0.001566 wd 0.0500 time 0.2718 (0.2900) data time 0.0007 (0.0034) model time 0.2711 (0.2866) loss 6.0291 (5.9850) grad_norm 1.8074 (1.7661) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:11:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][560/625] eta 0:00:18 lr 0.001566 wd 0.0500 time 0.2697 (0.2896) data time 0.0007 (0.0034) model time 0.2690 (0.2862) loss 6.3085 (5.9947) grad_norm 3.0756 (1.7742) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:12:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][570/625] eta 0:00:15 lr 0.001565 wd 0.0500 time 0.2701 (0.2892) data time 0.0009 (0.0033) model time 0.2692 (0.2859) loss 5.6213 (5.9945) grad_norm 1.9598 (1.7777) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:12:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][580/625] eta 0:00:12 lr 0.001565 wd 0.0500 time 0.2755 (0.2888) data time 0.0007 (0.0033) model time 0.2748 (0.2856) loss 5.6069 (5.9859) grad_norm 1.2093 (1.7703) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:12:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][590/625] eta 0:00:10 lr 0.001565 wd 0.0500 time 0.2720 (0.2884) data time 0.0008 (0.0032) model time 0.2712 (0.2852) loss 4.9625 (5.9798) grad_norm 1.2807 (1.7703) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:12:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][600/625] eta 0:00:07 lr 0.001565 wd 0.0500 time 0.2701 (0.2880) data time 0.0007 (0.0032) model time 0.2694 (0.2848) loss 5.7718 (5.9696) grad_norm 1.9544 (1.7748) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:12:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][610/625] eta 0:00:04 lr 0.001565 wd 0.0500 time 0.2710 (0.2876) data time 0.0005 (0.0031) model time 0.2705 (0.2845) loss 6.5433 (5.9727) grad_norm 3.1174 (1.7880) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:12:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][620/625] eta 0:00:01 lr 0.001565 wd 0.0500 time 0.2693 (0.2873) data time 0.0005 (0.0031) model time 0.2688 (0.2842) loss 6.1066 (5.9740) grad_norm 1.2595 (1.7822) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:12:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 106 training takes 0:02:23
[2024-07-30 15:12:15 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 15:12:16 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 15:12:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.444 (0.444) Loss 0.6870 (0.6870) Acc@1 87.451 (87.451) Acc@5 98.096 (98.096) Mem 9656MB
[2024-07-30 15:12:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.093) Loss 1.1123 (0.8328) Acc@1 75.049 (82.884) Acc@5 93.555 (96.613) Mem 9656MB
[2024-07-30 15:12:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.2129 (0.9770) Acc@1 72.217 (79.076) Acc@5 92.725 (94.824) Mem 9656MB
[2024-07-30 15:12:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.827 Acc@5 94.800
[2024-07-30 15:12:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.8%
[2024-07-30 15:12:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.83%
[2024-07-30 15:12:19 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-30 15:12:21 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-30 15:12:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.456 (0.456) Loss 0.5684 (0.5684) Acc@1 88.232 (88.232) Acc@5 98.340 (98.340) Mem 9656MB
[2024-07-30 15:12:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.094) Loss 0.9463 (0.7177) Acc@1 78.027 (84.357) Acc@5 94.873 (97.124) Mem 9656MB
[2024-07-30 15:12:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0938 (0.8616) Acc@1 72.998 (80.534) Acc@5 92.969 (95.426) Mem 9656MB
[2024-07-30 15:12:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.186 Acc@5 95.405
[2024-07-30 15:12:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.2%
[2024-07-30 15:12:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.19%
[2024-07-30 15:12:23 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-30 15:12:25 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-30 15:12:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][0/625] eta 0:08:07 lr 0.001565 wd 0.0500 time 0.7805 (0.7805) data time 0.4538 (0.4538) model time 0.0000 (0.0000) loss 5.1845 (5.1845) grad_norm 1.4750 (1.4750) loss_scale 4096.0000 (4096.0000) mem 9651MB
[2024-07-30 15:12:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][10/625] eta 0:03:14 lr 0.001564 wd 0.0500 time 0.2676 (0.3155) data time 0.0009 (0.0421) model time 0.0000 (0.0000) loss 6.0404 (6.1301) grad_norm 1.2419 (1.5605) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:12:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][20/625] eta 0:02:58 lr 0.001564 wd 0.0500 time 0.2747 (0.2944) data time 0.0007 (0.0226) model time 0.0000 (0.0000) loss 5.6721 (6.0289) grad_norm 1.5315 (1.6394) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:12:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][30/625] eta 0:02:50 lr 0.001564 wd 0.0500 time 0.2617 (0.2865) data time 0.0010 (0.0156) model time 0.0000 (0.0000) loss 5.8302 (5.8811) grad_norm 1.1036 (1.6778) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:12:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][40/625] eta 0:02:45 lr 0.001564 wd 0.0500 time 0.2689 (0.2830) data time 0.0007 (0.0120) model time 0.0000 (0.0000) loss 5.9063 (5.9276) grad_norm 1.2212 (1.7058) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:12:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][50/625] eta 0:02:41 lr 0.001564 wd 0.0500 time 0.2734 (0.2808) data time 0.0016 (0.0099) model time 0.0000 (0.0000) loss 4.8475 (5.9529) grad_norm 2.7918 (1.7851) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:12:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][60/625] eta 0:02:38 lr 0.001564 wd 0.0500 time 0.2647 (0.2797) data time 0.0010 (0.0084) model time 0.2637 (0.2735) loss 6.3087 (6.0204) grad_norm 1.3706 (1.7508) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:12:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][70/625] eta 0:02:34 lr 0.001564 wd 0.0500 time 0.2678 (0.2783) data time 0.0011 (0.0075) model time 0.2667 (0.2704) loss 6.4271 (6.0401) grad_norm 1.8819 (1.7733) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:12:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][80/625] eta 0:02:31 lr 0.001563 wd 0.0500 time 0.2641 (0.2771) data time 0.0010 (0.0067) model time 0.2631 (0.2696) loss 6.0891 (6.0405) grad_norm 1.0417 (1.7450) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:12:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][90/625] eta 0:02:27 lr 0.001563 wd 0.0500 time 0.2678 (0.2765) data time 0.0007 (0.0061) model time 0.2670 (0.2699) loss 6.6499 (6.0637) grad_norm 1.6786 (1.7138) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:12:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][100/625] eta 0:02:24 lr 0.001563 wd 0.0500 time 0.2680 (0.2759) data time 0.0010 (0.0056) model time 0.2670 (0.2696) loss 7.1056 (6.0622) grad_norm 1.5953 (1.7188) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:12:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][110/625] eta 0:02:23 lr 0.001563 wd 0.0500 time 0.2673 (0.2778) data time 0.0007 (0.0052) model time 0.2665 (0.2740) loss 5.6783 (6.0337) grad_norm 1.2405 (1.6983) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:12:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][120/625] eta 0:02:19 lr 0.001563 wd 0.0500 time 0.2694 (0.2770) data time 0.0009 (0.0048) model time 0.2686 (0.2732) loss 5.8078 (6.0349) grad_norm 1.1646 (1.7025) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:13:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][130/625] eta 0:02:16 lr 0.001563 wd 0.0500 time 0.2713 (0.2765) data time 0.0007 (0.0045) model time 0.2706 (0.2726) loss 6.9694 (6.0414) grad_norm 2.0520 (1.6930) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:13:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][140/625] eta 0:02:13 lr 0.001563 wd 0.0500 time 0.2699 (0.2760) data time 0.0008 (0.0043) model time 0.2692 (0.2722) loss 4.4186 (6.0322) grad_norm 1.5584 (1.6968) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:13:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][150/625] eta 0:02:10 lr 0.001562 wd 0.0500 time 0.2618 (0.2755) data time 0.0010 (0.0041) model time 0.2608 (0.2718) loss 6.2701 (6.0087) grad_norm 1.0664 (1.6895) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:13:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][160/625] eta 0:02:07 lr 0.001562 wd 0.0500 time 0.2712 (0.2751) data time 0.0007 (0.0039) model time 0.2704 (0.2715) loss 5.1064 (5.9808) grad_norm 1.4631 (1.6904) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:13:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][170/625] eta 0:02:05 lr 0.001562 wd 0.0500 time 0.2688 (0.2748) data time 0.0007 (0.0037) model time 0.2680 (0.2712) loss 6.9542 (5.9947) grad_norm 2.3716 (1.6993) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:13:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][180/625] eta 0:02:02 lr 0.001562 wd 0.0500 time 0.2727 (0.2745) data time 0.0009 (0.0035) model time 0.2719 (0.2709) loss 5.6931 (6.0020) grad_norm 2.6815 (1.7417) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:13:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][190/625] eta 0:01:59 lr 0.001562 wd 0.0500 time 0.2668 (0.2743) data time 0.0007 (0.0034) model time 0.2661 (0.2708) loss 4.7925 (5.9909) grad_norm 2.8888 (1.7608) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:13:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][200/625] eta 0:01:56 lr 0.001562 wd 0.0500 time 0.2716 (0.2740) data time 0.0007 (0.0033) model time 0.2709 (0.2707) loss 5.3515 (5.9690) grad_norm 1.8999 (1.7649) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:13:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][210/625] eta 0:01:53 lr 0.001562 wd 0.0500 time 0.2647 (0.2738) data time 0.0010 (0.0032) model time 0.2637 (0.2705) loss 6.4062 (5.9778) grad_norm 1.0428 (1.7611) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:13:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][220/625] eta 0:01:50 lr 0.001561 wd 0.0500 time 0.2682 (0.2736) data time 0.0010 (0.0031) model time 0.2672 (0.2704) loss 5.3743 (5.9620) grad_norm 1.9994 (1.7482) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:13:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][230/625] eta 0:01:47 lr 0.001561 wd 0.0500 time 0.2709 (0.2734) data time 0.0009 (0.0030) model time 0.2700 (0.2703) loss 6.6211 (5.9696) grad_norm 1.2975 (1.7387) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:13:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][240/625] eta 0:01:45 lr 0.001561 wd 0.0500 time 0.2665 (0.2732) data time 0.0010 (0.0029) model time 0.2655 (0.2702) loss 7.1908 (5.9967) grad_norm 1.1943 (1.7307) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:13:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-30 15:13:33 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 15:13:33 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 15:15:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 15:15:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 15:15:35 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 15:15:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 15:15:46 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 15:15:46 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 15:15:46 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 15:15:46 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 107)
[2024-07-30 15:15:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 15:16:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][250/625] eta 0:10:09 lr 0.001561 wd 0.0500 time 0.2655 (1.6248) data time 0.0008 (0.2313) model time 0.2648 (1.3935) loss 6.4411 (6.6257) grad_norm 1.7635 (2.4737) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:16:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][260/625] eta 0:04:42 lr 0.001561 wd 0.0500 time 0.2625 (0.7742) data time 0.0009 (0.0874) model time 0.2615 (0.6868) loss 6.7382 (6.2582) grad_norm 1.9205 (2.2087) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:16:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][270/625] eta 0:03:24 lr 0.001561 wd 0.0500 time 0.2575 (0.5773) data time 0.0008 (0.0542) model time 0.2567 (0.5231) loss 6.0006 (6.3230) grad_norm 1.7212 (2.0410) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:16:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][280/625] eta 0:02:48 lr 0.001560 wd 0.0500 time 0.2585 (0.4896) data time 0.0011 (0.0395) model time 0.2573 (0.4501) loss 6.2066 (6.3502) grad_norm 1.5745 (1.9227) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:16:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][290/625] eta 0:02:27 lr 0.001560 wd 0.0500 time 0.2584 (0.4394) data time 0.0009 (0.0311) model time 0.2575 (0.4083) loss 5.3597 (6.2603) grad_norm 1.2424 (1.8428) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:16:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][300/625] eta 0:02:12 lr 0.001560 wd 0.0500 time 0.2574 (0.4075) data time 0.0008 (0.0258) model time 0.2566 (0.3818) loss 6.7321 (6.2009) grad_norm 1.9718 (1.8476) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:16:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][310/625] eta 0:02:01 lr 0.001560 wd 0.0500 time 0.2559 (0.3850) data time 0.0008 (0.0220) model time 0.2550 (0.3630) loss 4.7342 (6.1529) grad_norm 1.6014 (1.8226) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:16:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][320/625] eta 0:01:52 lr 0.001560 wd 0.0500 time 0.2567 (0.3688) data time 0.0011 (0.0193) model time 0.2556 (0.3496) loss 6.4801 (6.1003) grad_norm 1.7074 (1.8158) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:16:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][330/625] eta 0:01:45 lr 0.001560 wd 0.0500 time 0.2703 (0.3567) data time 0.0008 (0.0173) model time 0.2695 (0.3394) loss 5.0585 (6.0761) grad_norm 1.8644 (1.8733) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:16:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][340/625] eta 0:01:38 lr 0.001560 wd 0.0500 time 0.2544 (0.3465) data time 0.0010 (0.0156) model time 0.2533 (0.3309) loss 6.1857 (6.0948) grad_norm 1.2330 (1.8765) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:16:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][350/625] eta 0:01:33 lr 0.001559 wd 0.0500 time 0.2612 (0.3384) data time 0.0010 (0.0142) model time 0.2602 (0.3242) loss 6.7725 (6.1314) grad_norm 1.1223 (1.8544) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:16:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][360/625] eta 0:01:27 lr 0.001559 wd 0.0500 time 0.2584 (0.3318) data time 0.0009 (0.0131) model time 0.2576 (0.3187) loss 6.6867 (6.1193) grad_norm 3.0641 (1.8530) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:16:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][370/625] eta 0:01:23 lr 0.001559 wd 0.0500 time 0.2603 (0.3261) data time 0.0008 (0.0121) model time 0.2594 (0.3140) loss 4.4495 (6.1021) grad_norm 1.6404 (1.8714) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:16:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][380/625] eta 0:01:18 lr 0.001559 wd 0.0500 time 0.2619 (0.3214) data time 0.0009 (0.0113) model time 0.2609 (0.3101) loss 5.5147 (6.1111) grad_norm 1.8070 (1.8562) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:16:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][390/625] eta 0:01:14 lr 0.001559 wd 0.0500 time 0.2605 (0.3172) data time 0.0008 (0.0106) model time 0.2598 (0.3066) loss 4.1689 (6.0963) grad_norm 1.1785 (1.8498) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:16:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][400/625] eta 0:01:10 lr 0.001559 wd 0.0500 time 0.2593 (0.3136) data time 0.0010 (0.0100) model time 0.2583 (0.3036) loss 6.4666 (6.0964) grad_norm 2.1075 (1.8495) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:16:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][410/625] eta 0:01:06 lr 0.001559 wd 0.0500 time 0.2562 (0.3105) data time 0.0011 (0.0095) model time 0.2551 (0.3010) loss 7.0250 (6.0988) grad_norm 1.7754 (1.8620) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:16:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][420/625] eta 0:01:03 lr 0.001558 wd 0.0500 time 0.2564 (0.3077) data time 0.0009 (0.0090) model time 0.2555 (0.2986) loss 5.4343 (6.0899) grad_norm 1.4067 (1.8412) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:16:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][430/625] eta 0:00:59 lr 0.001558 wd 0.0500 time 0.2600 (0.3052) data time 0.0011 (0.0086) model time 0.2590 (0.2966) loss 5.2523 (6.0868) grad_norm 1.3545 (1.8401) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:16:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][440/625] eta 0:00:56 lr 0.001558 wd 0.0500 time 0.2623 (0.3030) data time 0.0008 (0.0082) model time 0.2615 (0.2948) loss 5.2557 (6.0790) grad_norm 2.2921 (1.8318) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:16:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][450/625] eta 0:00:52 lr 0.001558 wd 0.0500 time 0.2593 (0.3010) data time 0.0009 (0.0078) model time 0.2584 (0.2932) loss 4.3353 (6.0656) grad_norm 1.4556 (1.8224) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:16:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][460/625] eta 0:00:49 lr 0.001558 wd 0.0500 time 0.2589 (0.2991) data time 0.0007 (0.0075) model time 0.2582 (0.2916) loss 4.2933 (6.0532) grad_norm 1.8878 (1.8136) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:16:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][470/625] eta 0:00:46 lr 0.001558 wd 0.0500 time 0.2555 (0.2974) data time 0.0009 (0.0073) model time 0.2547 (0.2901) loss 5.6833 (6.0574) grad_norm 2.0510 (1.8255) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:17:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][480/625] eta 0:00:42 lr 0.001558 wd 0.0500 time 0.2620 (0.2958) data time 0.0008 (0.0070) model time 0.2612 (0.2888) loss 6.7716 (6.0552) grad_norm 1.8147 (1.8379) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:17:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][490/625] eta 0:00:39 lr 0.001557 wd 0.0500 time 0.2575 (0.2945) data time 0.0010 (0.0068) model time 0.2565 (0.2877) loss 4.7819 (6.0612) grad_norm 3.2160 (1.8387) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:17:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][500/625] eta 0:00:36 lr 0.001557 wd 0.0500 time 0.2741 (0.2933) data time 0.0011 (0.0065) model time 0.2730 (0.2867) loss 5.3830 (6.0480) grad_norm 1.2175 (1.8269) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:17:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][510/625] eta 0:00:33 lr 0.001557 wd 0.0500 time 0.2665 (0.2921) data time 0.0009 (0.0063) model time 0.2656 (0.2857) loss 5.6802 (6.0418) grad_norm 0.9208 (1.8145) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:17:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][520/625] eta 0:00:30 lr 0.001557 wd 0.0500 time 0.2597 (0.2909) data time 0.0011 (0.0062) model time 0.2587 (0.2848) loss 6.7378 (6.0520) grad_norm 1.4358 (1.8096) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:17:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][530/625] eta 0:00:27 lr 0.001557 wd 0.0500 time 0.2602 (0.2900) data time 0.0010 (0.0060) model time 0.2592 (0.2840) loss 5.9930 (6.0513) grad_norm 1.8648 (1.8230) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:17:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][540/625] eta 0:00:24 lr 0.001557 wd 0.0500 time 0.2578 (0.2890) data time 0.0008 (0.0058) model time 0.2570 (0.2832) loss 4.9559 (6.0472) grad_norm 1.8423 (1.8225) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:17:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][550/625] eta 0:00:21 lr 0.001556 wd 0.0500 time 0.2613 (0.2881) data time 0.0009 (0.0057) model time 0.2604 (0.2824) loss 4.8637 (6.0323) grad_norm 1.1387 (1.8209) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:17:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][560/625] eta 0:00:18 lr 0.001556 wd 0.0500 time 0.2580 (0.2873) data time 0.0010 (0.0055) model time 0.2570 (0.2817) loss 6.1578 (6.0500) grad_norm 2.8423 (1.8205) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:17:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][570/625] eta 0:00:15 lr 0.001556 wd 0.0500 time 0.2600 (0.2865) data time 0.0011 (0.0054) model time 0.2588 (0.2811) loss 6.9227 (6.0668) grad_norm 1.5382 (1.8146) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:17:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][580/625] eta 0:00:12 lr 0.001556 wd 0.0500 time 0.2622 (0.2858) data time 0.0010 (0.0053) model time 0.2612 (0.2805) loss 6.2487 (6.0605) grad_norm 3.9687 (1.8156) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:17:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO 
Train: [107/300][590/625] eta 0:00:09 lr 0.001556 wd 0.0500 time 0.2602 (0.2851) data time 0.0012 (0.0051) model time 0.2590 (0.2800) loss 6.5625 (6.0639) grad_norm 2.1460 (1.8129) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:17:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][600/625] eta 0:00:07 lr 0.001556 wd 0.0500 time 0.2588 (0.2845) data time 0.0009 (0.0050) model time 0.2579 (0.2795) loss 4.6757 (6.0587) grad_norm 1.2295 (1.8036) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:17:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][610/625] eta 0:00:04 lr 0.001556 wd 0.0500 time 0.2615 (0.2838) data time 0.0007 (0.0049) model time 0.2608 (0.2789) loss 6.1342 (6.0513) grad_norm 1.9676 (1.7939) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:17:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][620/625] eta 0:00:01 lr 0.001555 wd 0.0500 time 0.2601 (0.2832) data time 0.0005 (0.0048) model time 0.2597 (0.2784) loss 4.4071 (6.0494) grad_norm 3.7310 (1.7961) loss_scale 4096.0000 (4096.0000) mem 9656MB [2024-07-30 15:17:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 107 training takes 0:01:47 [2024-07-30 15:17:38 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-30 15:17:41 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-30 15:17:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.454 (0.454) Loss 0.6421 (0.6421) Acc@1 87.109 (87.109) Acc@5 98.486 (98.486) Mem 9656MB
[2024-07-30 15:17:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 1.0967 (0.8091) Acc@1 75.000 (82.804) Acc@5 93.262 (96.760) Mem 9656MB
[2024-07-30 15:17:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.2002 (0.9622) Acc@1 72.510 (79.116) Acc@5 91.797 (94.899) Mem 9656MB
[2024-07-30 15:17:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.807 Acc@5 94.836
[2024-07-30 15:17:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.8%
[2024-07-30 15:17:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.937 (0.937) Loss 0.5684 (0.5684) Acc@1 88.281 (88.281) Acc@5 98.389 (98.389) Mem 9656MB
[2024-07-30 15:17:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.060 (0.146) Loss 0.9434 (0.7178) Acc@1 77.881 (84.362) Acc@5 94.873 (97.137) Mem 9656MB
[2024-07-30 15:17:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.103) Loss 1.0947 (0.8616) Acc@1 72.949 (80.566) Acc@5 93.213 (95.459) Mem 9656MB
[2024-07-30 15:17:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.222 Acc@5 95.437
[2024-07-30 15:17:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.2%
[2024-07-30 15:17:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.22%
[2024-07-30 15:17:46 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-30 15:17:49 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-30 15:17:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][0/625] eta 0:08:13 lr 0.001555 wd 0.0500 time 0.7888 (0.7888) data time 0.4387 (0.4387) model time 0.0000 (0.0000) loss 3.9635 (3.9635) grad_norm 1.9765 (1.9765) loss_scale 4096.0000 (4096.0000) mem 9651MB
[2024-07-30 15:17:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][10/625] eta 0:03:10 lr 0.001555 wd 0.0500 time 0.2556 (0.3096) data time 0.0009 (0.0408) model time 0.0000 (0.0000) loss 7.4352 (5.6464) grad_norm 1.1817 (2.2833) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:17:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][20/625] eta 0:02:53 lr 0.001555 wd 0.0500 time 0.2626 (0.2865) data time 0.0010 (0.0219) model time 0.0000 (0.0000) loss 6.7116 (5.9524) grad_norm 1.3489 (2.1637) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:17:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][30/625] eta 0:02:45 lr 0.001555 wd 0.0500 time 0.2565 (0.2788) data time 0.0013 (0.0151) model time 0.0000 (0.0000) loss 6.0984 (6.0254) grad_norm 1.0685 (2.0520) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:18:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][40/625] eta 0:02:40 lr 0.001555 wd 0.0500 time 0.2573 (0.2752) data time 0.0008 (0.0117) model time 0.0000 (0.0000) loss 6.2276 (6.0237) grad_norm 1.5520 (1.9246) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:18:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][50/625] eta 0:02:36 lr 0.001555 wd 0.0500 time 0.2592 (0.2729) data time 0.0007 (0.0096) model time 0.0000 (0.0000) loss 6.0427 (6.1032) grad_norm 1.2347 (1.8803) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:18:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][60/625] eta 0:02:33 lr 0.001554 wd 0.0500 time 0.2533 (0.2711) data time 0.0009 (0.0082) model time 0.2524 (0.2609) loss 7.2863 (6.1274) grad_norm 1.1217 (1.8606) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:18:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][70/625] eta 0:02:29 lr 0.001554 wd 0.0500 time 0.2588 (0.2697) data time 0.0009 (0.0072) model time 0.2579 (0.2605) loss 5.5776 (6.0625) grad_norm 1.3027 (1.8755) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:18:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][80/625] eta 0:02:26 lr 0.001554 wd 0.0500 time 0.2634 (0.2692) data time 0.0009 (0.0065) model time 0.2625 (0.2618) loss 7.0426 (6.0171) grad_norm 1.3414 (1.9048) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:18:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][90/625] eta 0:02:23 lr 0.001554 wd 0.0500 time 0.2605 (0.2685) data time 0.0010 (0.0059) model time 0.2596 (0.2618) loss 5.5764 (5.9513) grad_norm 1.5186 (1.9008) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:18:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][100/625] eta 0:02:20 lr 0.001554 wd 0.0500 time 0.2644 (0.2679) data time 0.0008 (0.0054) model time 0.2637 (0.2617) loss 6.1105 (5.9290) grad_norm 2.6504 (1.8845) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:18:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][110/625] eta 0:02:17 lr 0.001554 wd 0.0500 time 0.2607 (0.2674) data time 0.0009 (0.0050) model time 0.2598 (0.2616) loss 6.5069 (5.9640) grad_norm 1.5617 (1.8695) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:18:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][120/625] eta 0:02:14 lr 0.001554 wd 0.0500 time 0.2618 (0.2670) data time 0.0013 (0.0047) model time 0.2605 (0.2616) loss 5.9447 (5.9678) grad_norm 1.1220 (1.8432) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:18:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][130/625] eta 0:02:11 lr 0.001553 wd 0.0500 time 0.2580 (0.2665) data time 0.0008 (0.0044) model time 0.2572 (0.2614) loss 6.8663 (5.9851) grad_norm 1.0299 (1.8121) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:18:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][140/625] eta 0:02:09 lr 0.001553 wd 0.0500 time 0.2584 (0.2662) data time 0.0010 (0.0042) model time 0.2574 (0.2613) loss 5.0865 (5.9739) grad_norm 1.1720 (1.7993) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:18:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][150/625] eta 0:02:06 lr 0.001553 wd 0.0500 time 0.2595 (0.2660) data time 0.0011 (0.0040) model time 0.2584 (0.2614) loss 6.1668 (5.9542) grad_norm 1.0685 (1.7817) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:18:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][160/625] eta 0:02:03 lr 0.001553 wd 0.0500 time 0.2616 (0.2657) data time 0.0009 (0.0038) model time 0.2607 (0.2613) loss 6.3784 (5.9541) grad_norm 1.3229 (1.7666) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:18:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][170/625] eta 0:02:00 lr 0.001553 wd 0.0500 time 0.2617 (0.2655) data time 0.0007 (0.0036) model time 0.2610 (0.2613) loss 6.6299 (5.9488) grad_norm 2.8030 (1.7709) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:18:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][180/625] eta 0:01:58 lr 0.001553 wd 0.0500 time 0.2658 (0.2670) data time 0.0012 (0.0035) model time 0.2646 (0.2637) loss 6.7496 (5.9755) grad_norm 3.2597 (1.7930) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:18:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][190/625] eta 0:01:56 lr 0.001553 wd 0.0500 time 0.2635 (0.2668) data time 0.0009 (0.0034) model time 0.2626 (0.2635) loss 6.0802 (5.9900) grad_norm 1.0628 (1.7931) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:18:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][200/625] eta 0:01:53 lr 0.001552 wd 0.0500 time 0.2611 (0.2667) data time 0.0009 (0.0033) model time 0.2602 (0.2635) loss 6.3804 (5.9942) grad_norm 1.6918 (1.8208) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:18:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][210/625] eta 0:01:50 lr 0.001552 wd 0.0500 time 0.2605 (0.2665) data time 0.0010 (0.0032) model time 0.2595 (0.2633) loss 6.3363 (6.0109) grad_norm 1.6120 (1.8177) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:18:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][220/625] eta 0:01:47 lr 0.001552 wd 0.0500 time 0.2630 (0.2664) data time 0.0010 (0.0031) model time 0.2620 (0.2633) loss 4.6638 (6.0021) grad_norm 1.4057 (1.8132) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:18:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-30 15:18:49 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 15:18:49 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 15:20:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 15:20:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 15:21:03 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 15:21:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 15:21:21 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 15:21:21 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 15:21:21 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 15:21:21 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 108)
[2024-07-30 15:21:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 15:21:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][230/625] eta 0:10:43 lr 0.001552 wd 0.0500 time 0.3115 (1.6289) data time 0.0011 (0.1814) model time 0.3104 (1.4475) loss 6.8079 (6.6452) grad_norm 1.6173 (1.7447) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:21:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][240/625] eta 0:05:16 lr 0.001552 wd 0.0500 time 0.2438 (0.8212) data time 0.0009 (0.0753) model time 0.2429 (0.7459) loss 5.6937 (6.3262) grad_norm 1.2526 (1.7521) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:21:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][250/625] eta 0:03:48 lr 0.001552 wd 0.0500 time 0.2758 (0.6088) data time 0.0010 (0.0478) model time 0.2748 (0.5610) loss 6.9526 (6.3538) grad_norm 1.2094 (1.7300) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:21:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][260/625] eta 0:03:06 lr 0.001552 wd 0.0500 time 0.2424 (0.5104) data time 0.0012 (0.0352) model time 0.2412 (0.4752) loss 5.4145 (6.2992) grad_norm 2.2221 (1.8578) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:21:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][270/625] eta 0:02:42 lr 0.001551 wd 0.0500 time 0.2795 (0.4583) data time 0.0008 (0.0279) model time 0.2787 (0.4304) loss 6.5979 (6.2745) grad_norm 1.6599 (1.8666) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:21:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][280/625] eta 0:02:25 lr 0.001551 wd 0.0500 time 0.2562 (0.4211) data time 0.0010 (0.0232) model time 0.2552 (0.3979) loss 6.2473 (6.2421) grad_norm 2.2788 (1.9219) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:21:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][290/625] eta 0:02:12 lr 0.001551 wd 0.0500 time 0.2463 (0.3963) data time 0.0009 (0.0199) model time 0.2454 (0.3764) loss 7.1754 (6.2170) grad_norm 1.1805 (1.9219) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:21:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][300/625] eta 0:02:02 lr 0.001551 wd 0.0500 time 0.2438 (0.3770) data time 0.0012 (0.0175) model time 0.2426 (0.3595) loss 5.8568 (6.1512) grad_norm 1.3938 (1.8860) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:21:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][310/625] eta 0:01:54 lr 0.001551 wd 0.0500 time 0.2571 (0.3630) data time 0.0011 (0.0156) model time 0.2560 (0.3474) loss 5.7116 (6.0945) grad_norm 2.2838 (1.8636) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:22:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][320/625] eta 0:01:47 lr 0.001551 wd 0.0500 time 0.2470 (0.3519) data time 0.0010 (0.0141) model time 0.2460 (0.3378) loss 6.3130 (6.1193) grad_norm 1.7443 (1.8629) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:22:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][330/625] eta 0:01:40 lr 0.001550 wd 0.0500 time 0.2461 (0.3420) data time 0.0006 (0.0129) model time 0.2455 (0.3291) loss 5.9438 (6.1371) grad_norm 2.7619 (1.8846) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:22:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][340/625] eta 0:01:35 lr 0.001550 wd 0.0500 time 0.2418 (0.3343) data time 0.0015 (0.0119) model time 0.2403 (0.3225) loss 6.3033 (6.1164) grad_norm 1.1562 (1.8754) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:22:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][350/625] eta 0:01:30 lr 0.001550 wd 0.0500 time 0.3061 (0.3283) data time 0.0008 (0.0110) model time 0.3052 (0.3173) loss 6.5541 (6.1023) grad_norm 1.3016 (1.8398) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:22:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][360/625] eta 0:01:25 lr 0.001550 wd 0.0500 time 0.2511 (0.3223) data time 0.0009 (0.0103) model time 0.2502 (0.3120) loss 5.0980 (6.0879) grad_norm 0.9944 (1.8156) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:22:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][370/625] eta 0:01:21 lr 0.001550 wd 0.0500 time 0.2454 (0.3180) data time 0.0009 (0.0096) model time 0.2445 (0.3084) loss 5.9525 (6.0615) grad_norm 1.2351 (1.7985) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:22:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][380/625] eta 0:01:16 lr 0.001550 wd 0.0500 time 0.2424 (0.3134) data time 0.0009 (0.0091) model time 0.2416 (0.3043) loss 5.6400 (6.0602) grad_norm 1.9687 (1.8103) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:22:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][390/625] eta 0:01:12 lr 0.001550 wd 0.0500 time 0.2598 (0.3099) data time 0.0009 (0.0086) model time 0.2590 (0.3013) loss 5.6897 (6.0752) grad_norm 1.7381 (1.8103) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:22:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][400/625] eta 0:01:09 lr 0.001549 wd 0.0500 time 0.2464 (0.3068) data time 0.0011 (0.0082) model time 0.2453 (0.2986) loss 5.9363 (6.0617) grad_norm 1.4303 (1.8307) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:22:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][410/625] eta 0:01:05 lr 0.001549 wd 0.0500 time 0.2568 (0.3035) data time 0.0008 (0.0078) model time 0.2560 (0.2957) loss 6.0177 (6.0489) grad_norm 2.8832 (1.8684) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:22:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][420/625] eta 0:01:01 lr 0.001549 wd 0.0500 time 0.2414 (0.3013) data time 0.0008 (0.0075) model time 0.2406 (0.2939) loss 6.8030 (6.0482) grad_norm 1.0954 (1.8672) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:22:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][430/625] eta 0:00:58 lr 0.001549 wd 0.0500 time 0.2451 (0.2991) data time 0.0007 (0.0072) model time 0.2444 (0.2919) loss 5.4411 (6.0266) grad_norm 1.9988 (1.8618) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:22:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][440/625] eta 0:00:54 lr 0.001549 wd 0.0500 time 0.3203 (0.2972) data time 0.0009 (0.0069) model time 0.3195 (0.2903) loss 6.9555 (6.0147) grad_norm 1.0730 (1.8407) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:22:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][450/625] eta 0:00:51 lr 0.001549 wd 0.0500 time 0.2444 (0.2949) data time 0.0008 (0.0066) model time 0.2435 (0.2882) loss 6.0603 (6.0212) grad_norm 0.9371 (1.8236) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:22:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][460/625] eta 0:00:48 lr 0.001549 wd 0.0500 time 0.2487 (0.2928) data time 0.0009 (0.0064) model time 0.2478 (0.2864) loss 5.8101 (6.0168) grad_norm 2.4634 (1.8138) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:22:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][470/625] eta 0:00:45 lr 0.001548 wd 0.0500 time 0.2556 (0.2916) data time 0.0009 (0.0062) model time 0.2548 (0.2854) loss 6.7536 (6.0200) grad_norm 1.0929 (1.7986) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:22:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][480/625] eta 0:00:42 lr 0.001548 wd 0.0500 time 0.2426 (0.2901) data time 0.0012 (0.0060) model time 0.2413 (0.2841) loss 4.6604 (6.0004) grad_norm 1.2399 (1.7990) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:22:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][490/625] eta 0:00:39 lr 0.001548 wd 0.0500 time 0.3079 (0.2889) data time 0.0010 (0.0058) model time 0.3069 (0.2831) loss 5.0809 (5.9889) grad_norm 2.2113 (1.8197) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:22:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][500/625] eta 0:00:35 lr 0.001548 wd 0.0500 time 0.2750 (0.2874) data time 0.0011 (0.0056) model time 0.2740 (0.2818) loss 5.5438 (5.9952) grad_norm 1.8530 (1.8119) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:22:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-30 15:22:48 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 15:22:49 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 15:24:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 15:24:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 15:25:05 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 15:25:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 15:25:15 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 15:25:15 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 15:25:15 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 15:25:15 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 108)
[2024-07-30 15:25:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 15:25:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][510/625] eta 0:03:10 lr 0.001548 wd 0.0500 time 0.2462 (1.6549) data time 0.0007 (0.1107) model time 0.2455 (1.5442) loss 6.9703 (6.6670) grad_norm 2.1428 (2.2692) loss_scale 8192.0000 (5461.3333) mem 9656MB
[2024-07-30 15:25:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][520/625] eta 0:01:22 lr 0.001548 wd 0.0500 time 0.3236 (0.7879) data time 0.0016 (0.0423) model time 0.3220 (0.7456) loss 6.6748 (6.3899) grad_norm 1.2147 (2.0032) loss_scale 8192.0000 (7168.0000) mem 9656MB
[2024-07-30 15:25:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][530/625] eta 0:00:55 lr 0.001548 wd 0.0500 time 0.2411 (0.5797) data time 0.0008 (0.0266) model time 0.2403 (0.5531) loss 5.5380 (6.2654) grad_norm 1.3529 (1.9410) loss_scale 8192.0000 (7561.8462) mem 9656MB
[2024-07-30 15:25:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][540/625] eta 0:00:41 lr 0.001547 wd 0.0500 time 0.2441 (0.4905) data time 0.0009 (0.0195) model time 0.2432 (0.4710) loss 5.9839 (6.2493) grad_norm 1.2771 (1.8083) loss_scale 8192.0000 (7736.8889) mem 9656MB
[2024-07-30 15:25:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][550/625] eta 0:00:32 lr 0.001547 wd 0.0500 time 0.2438 (0.4370) data time 0.0009 (0.0155) model time 0.2429 (0.4215) loss 5.1826 (6.1524) grad_norm 1.9328 (1.7083) loss_scale 8192.0000 (7835.8261) mem 9656MB
[2024-07-30 15:25:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][560/625] eta 0:00:26 lr 0.001547 wd 0.0500 time 0.2478 (0.4042) data time 0.0008 (0.0129) model time 0.2470 (0.3913) loss 6.6764 (6.1163) grad_norm 1.6359 (1.7199) loss_scale 8192.0000 (7899.4286) mem 9656MB
[2024-07-30 15:25:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][570/625] eta 0:00:20 lr 0.001547 wd 0.0500 time 0.2514 (0.3817) data time 0.0011 (0.0112) model time 0.2503 (0.3705) loss 5.8406 (6.1042) grad_norm 1.7560 (1.7696) loss_scale 8192.0000 (7943.7576) mem 9656MB
[2024-07-30 15:25:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][580/625] eta 0:00:16 lr 0.001547 wd 0.0500 time 0.2413 (0.3641) data time 0.0011 (0.0098) model time 0.2402 (0.3543) loss 6.0986 (6.0678) grad_norm 2.0768 (1.7353) loss_scale 8192.0000 (7976.4211) mem 9656MB
[2024-07-30 15:25:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][590/625] eta 0:00:12 lr 0.001547 wd 0.0500 time 0.2475 (0.3506) data time 0.0008 (0.0088) model time 0.2468 (0.3418) loss 4.3986 (6.0158) grad_norm 1.0930 (1.7762) loss_scale 8192.0000 (8001.4884) mem 9656MB
[2024-07-30 15:25:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][600/625] eta 0:00:08 lr 0.001546 wd 0.0500 time 0.2507 (0.3418) data time 0.0008 (0.0080) model time 0.2500 (0.3338) loss 5.5606 (6.0259) grad_norm 2.6666 (1.7860) loss_scale 8192.0000 (8021.3333) mem 9656MB
[2024-07-30 15:26:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][610/625] eta 0:00:04 lr 0.001546 wd 0.0500 time 0.2562 (0.3329) data time 0.0007 (0.0074) model time 0.2556 (0.3255) loss 6.7885 (6.0737) grad_norm 2.5917 (1.8138) loss_scale 8192.0000 (8037.4340) mem 9656MB
[2024-07-30 15:26:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][620/625] eta 0:00:01 lr 0.001546 wd 0.0500 time 0.2437 (0.3265) data time 0.0005 (0.0068) model time 0.2432 (0.3197) loss 6.5267 (6.0512) grad_norm 2.0271 (1.8245) loss_scale 8192.0000 (8050.7586) mem 9656MB
[2024-07-30 15:26:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 108 training takes 0:00:38
[2024-07-30 15:26:05 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 15:26:08 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 15:26:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.784 (0.784) Loss 0.6953 (0.6953) Acc@1 86.963 (86.963) Acc@5 97.705 (97.705) Mem 9656MB
[2024-07-30 15:26:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.057 (0.129) Loss 1.1035 (0.8335) Acc@1 75.195 (82.662) Acc@5 93.652 (96.507) Mem 9656MB
[2024-07-30 15:26:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.053 (0.095) Loss 1.2119 (0.9821) Acc@1 72.412 (78.923) Acc@5 92.480 (94.664) Mem 9656MB
[2024-07-30 15:26:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.659 Acc@5 94.636
[2024-07-30 15:26:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.7%
[2024-07-30 15:26:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.217 (1.217) Loss 0.5684 (0.5684) Acc@1 88.330 (88.330) Acc@5 98.389 (98.389) Mem 9656MB
[2024-07-30 15:26:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.057 (0.168) Loss 0.9443 (0.7178) Acc@1 77.930 (84.411) Acc@5 94.873 (97.128) Mem 9656MB
[2024-07-30 15:26:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.053 (0.113) Loss 1.0928 (0.8611) Acc@1 72.998 (80.601) Acc@5 93.213 (95.471) Mem 9656MB
[2024-07-30 15:26:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.258 Acc@5 95.449
[2024-07-30 15:26:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.3%
[2024-07-30 15:26:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.26%
[2024-07-30 15:26:14 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-30 15:26:16 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-30 15:26:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][0/625] eta 0:10:29 lr 0.001546 wd 0.0500 time 1.0065 (1.0065) data time 0.5959 (0.5959) model time 0.0000 (0.0000) loss 4.7145 (4.7145) grad_norm 1.7365 (1.7365) loss_scale 8192.0000 (8192.0000) mem 9651MB
[2024-07-30 15:26:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][10/625] eta 0:03:20 lr 0.001546 wd 0.0500 time 0.2475 (0.3260) data time 0.0009 (0.0551) model time 0.0000 (0.0000) loss 6.6275 (5.8043) grad_norm 1.7720 (1.8435) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 15:26:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][20/625] eta 0:02:53 lr 0.001546 wd 0.0500 time 0.2443 (0.2872) data time 0.0007 (0.0294) model time 0.0000 (0.0000) loss 6.8577 (5.9391) grad_norm 2.2518 (1.8545) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 15:26:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][30/625] eta 0:02:45 lr 0.001546 wd 0.0500 time 0.2436 (0.2784) data time 0.0009 (0.0203) model time 0.0000 (0.0000) loss 5.3206 (5.9693) grad_norm 1.4113 (1.7905) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 15:26:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][40/625] eta 0:02:38 lr 0.001545 wd 0.0500 time 0.2426 (0.2704) data time 0.0009 (0.0156) model time 0.0000 (0.0000) loss 6.8613 (6.0262) grad_norm 3.1981 (1.8511) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 15:26:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][50/625] eta 0:02:34 lr 0.001545 wd 0.0500 time 0.2518 (0.2682) data time 0.0009 (0.0128) model time 0.0000 (0.0000) loss 6.0667 (6.0186) grad_norm 1.1901 (1.8237) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 15:26:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][60/625] eta 0:02:29 lr 0.001545 wd 0.0500 time 0.2427 (0.2643) data time 0.0011 (0.0109) model time 0.2415 (0.2434) loss 7.6912 (5.9981) grad_norm 2.3150 (1.8270) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 15:26:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][70/625] eta 0:02:25 lr 0.001545 wd 0.0500 time 0.2438 (0.2617) data time 0.0009 (0.0095) model time 0.2429 (0.2440) loss 5.5183 (5.9816) grad_norm 3.2256 (1.8996) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 15:26:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][80/625] eta 0:02:22 lr 0.001545 wd 0.0500 time 0.2503 (0.2611) data time 0.0009 (0.0085) model time 0.2494 (0.2479) loss 5.2357 (5.9177) grad_norm 1.4896 (1.8762) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 15:26:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][90/625] eta 0:02:19 lr 0.001545 wd 0.0500 time 0.2475 (0.2601) data time 0.0008 (0.0077) model time 0.2467 (0.2486) loss 6.6785 (5.9234) grad_norm 1.5846 (1.8338) loss_scale 8192.0000 (8192.0000) mem 9655MB
[2024-07-30 15:26:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][100/625] eta 0:02:34 lr 0.001545 wd 0.0500 time 3.8791 (0.2946) data time 0.0007 (0.0071) model time 0.3204 (0.3204) loss 6.1464 (5.9024) grad_norm
1.7679 (1.8108) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-30 15:26:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][110/625] eta 0:02:30 lr 0.001544 wd 0.0500 time 0.2421 (0.2916) data time 0.0008 (0.0065) model time 0.2413 (0.3104) loss 4.8455 (5.9151) grad_norm 1.1141 (1.7910) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-30 15:26:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][120/625] eta 0:02:25 lr 0.001544 wd 0.0500 time 0.2464 (0.2887) data time 0.0008 (0.0061) model time 0.2456 (0.3025) loss 6.5124 (5.9314) grad_norm 1.8204 (1.7691) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-30 15:26:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][130/625] eta 0:02:21 lr 0.001544 wd 0.0500 time 0.2430 (0.2854) data time 0.0008 (0.0057) model time 0.2422 (0.2952) loss 6.2512 (5.9236) grad_norm 2.0914 (1.7953) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-30 15:26:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][140/625] eta 0:02:17 lr 0.001544 wd 0.0500 time 0.2674 (0.2834) data time 0.0011 (0.0054) model time 0.2662 (0.2908) loss 6.0001 (5.9188) grad_norm 1.4065 (1.7804) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-30 15:26:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][150/625] eta 0:02:13 lr 0.001544 wd 0.0500 time 0.2524 (0.2812) data time 0.0006 (0.0051) model time 0.2517 (0.2867) loss 5.7359 (5.9062) grad_norm 2.2760 (1.7730) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-30 15:27:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][160/625] eta 0:02:10 lr 0.001544 wd 0.0500 time 0.3099 (0.2797) data time 0.0020 (0.0049) model time 0.3079 (0.2839) loss 5.6390 (5.9345) grad_norm 1.2239 (1.7660) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-30 15:27:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[109/300][170/625] eta 0:02:13 lr 0.001544 wd 0.0500 time 0.2441 (0.2936) data time 0.0008 (0.0046) model time 0.2432 (0.3032) loss 4.8893 (5.9395) grad_norm 1.7423 (1.7598) loss_scale 8192.0000 (8192.0000) mem 9655MB [2024-07-30 15:27:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][180/625] eta 0:02:09 lr 0.001543 wd 0.0500 time 0.2947 (0.2920) data time 0.0015 (0.0045) model time 0.2932 (0.3002) loss 5.6406 (5.9248) grad_norm inf (inf) loss_scale 4096.0000 (8169.3702) mem 9655MB [2024-07-30 15:27:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][190/625] eta 0:02:06 lr 0.001543 wd 0.0500 time 0.2465 (0.2897) data time 0.0009 (0.0043) model time 0.2456 (0.2964) loss 6.6257 (5.9314) grad_norm 2.3288 (inf) loss_scale 4096.0000 (7956.1047) mem 9655MB [2024-07-30 15:27:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][200/625] eta 0:02:02 lr 0.001543 wd 0.0500 time 0.2726 (0.2879) data time 0.0012 (0.0041) model time 0.2714 (0.2935) loss 6.9737 (5.9602) grad_norm 2.7052 (inf) loss_scale 4096.0000 (7764.0597) mem 9655MB [2024-07-30 15:27:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][210/625] eta 0:01:58 lr 0.001543 wd 0.0500 time 0.2452 (0.2859) data time 0.0007 (0.0040) model time 0.2445 (0.2904) loss 4.3245 (5.9488) grad_norm 1.6908 (inf) loss_scale 4096.0000 (7590.2180) mem 9655MB [2024-07-30 15:27:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][220/625] eta 0:01:55 lr 0.001543 wd 0.0500 time 0.2502 (0.2846) data time 0.0010 (0.0039) model time 0.2492 (0.2884) loss 6.6473 (5.9582) grad_norm 1.5388 (inf) loss_scale 4096.0000 (7432.1086) mem 9655MB [2024-07-30 15:27:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][230/625] eta 0:01:51 lr 0.001543 wd 0.0500 time 0.2440 (0.2834) data time 0.0010 (0.0037) model time 0.2431 (0.2865) loss 6.6492 (5.9695) grad_norm 1.3465 (inf) 
loss_scale 4096.0000 (7287.6883) mem 9655MB [2024-07-30 15:27:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][240/625] eta 0:01:48 lr 0.001542 wd 0.0500 time 0.2512 (0.2818) data time 0.0007 (0.0036) model time 0.2506 (0.2844) loss 5.3942 (5.9756) grad_norm 1.6357 (inf) loss_scale 4096.0000 (7155.2531) mem 9655MB [2024-07-30 15:27:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][250/625] eta 0:01:45 lr 0.001542 wd 0.0500 time 0.2426 (0.2804) data time 0.0012 (0.0035) model time 0.2414 (0.2823) loss 4.5678 (5.9787) grad_norm 1.1066 (inf) loss_scale 4096.0000 (7033.3705) mem 9655MB [2024-07-30 15:27:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][260/625] eta 0:01:42 lr 0.001542 wd 0.0500 time 0.2473 (0.2801) data time 0.0013 (0.0034) model time 0.2460 (0.2819) loss 4.6860 (5.9861) grad_norm 1.3338 (inf) loss_scale 4096.0000 (6920.8276) mem 9655MB [2024-07-30 15:27:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][270/625] eta 0:01:38 lr 0.001542 wd 0.0500 time 0.2449 (0.2788) data time 0.0007 (0.0033) model time 0.2442 (0.2802) loss 6.9508 (5.9790) grad_norm 1.6720 (inf) loss_scale 4096.0000 (6816.5904) mem 9655MB [2024-07-30 15:27:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][280/625] eta 0:01:35 lr 0.001542 wd 0.0500 time 0.2424 (0.2779) data time 0.0010 (0.0033) model time 0.2413 (0.2789) loss 6.2847 (5.9920) grad_norm 4.9704 (inf) loss_scale 4096.0000 (6719.7722) mem 9655MB [2024-07-30 15:27:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-30 15:27:34 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... 
[2024-07-30 15:27:35 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 15:31:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 15:31:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 15:31:16 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 15:31:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 15:31:28 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 15:31:28 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 15:31:28 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 15:31:28 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 109)
[2024-07-30 15:31:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 15:31:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][290/625] eta 0:06:19 lr 0.001542 wd 0.0500 time 0.2502 (1.1330) data time 0.0012 (0.1483) model time 0.2490 (0.9847) loss 7.1304 (6.8779) grad_norm 1.3931 (1.8456) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:31:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][300/625] eta 0:03:45 lr 0.001542 wd 0.0500 time 0.2524 (0.6925) data time 0.0006 (0.0746) model time 0.2518 (0.6179) loss 6.2016 (6.4354) grad_norm 1.2985 (1.6802) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:31:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][310/625] eta 0:03:28 lr 0.001541 wd 0.0500 time 0.2546 (0.6629) data time 0.0012 (0.0501) model time 0.2534 (0.6128) loss 6.4716 (6.4851) grad_norm 1.6220 (1.6549) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:31:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][320/625] eta 0:02:51 lr 0.001541 wd 0.0500 time 0.2559 (0.5608) data time 0.0008 (0.0378) model time 0.2551 (0.5230) loss 5.7135 (6.3470) grad_norm 2.1096 (1.6876) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:31:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][330/625] eta 0:02:27 lr 0.001541 wd 0.0500 time 0.2512 (0.4991) data time 0.0009 (0.0305) model time 0.2503 (0.4687) loss 5.6564 (6.3148) grad_norm 1.7471 (1.7120) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:32:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][340/625] eta 0:02:10 lr 0.001541 wd 0.0500 time 0.2512 (0.4582) data time 0.0010 (0.0255) model time 0.2502 (0.4327) loss 5.9775 (6.2445) grad_norm 1.2783 (1.7220) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:32:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][350/625] eta 0:01:57 lr 0.001541 wd 0.0500 time 0.2530 (0.4290) data time 0.0010 (0.0220) model time 0.2519 (0.4070) loss 4.7807 (6.2024) grad_norm 1.1512 (1.7572) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:32:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][360/625] eta 0:01:47 lr 0.001541 wd 0.0500 time 0.2500 (0.4070) data time 0.0010 (0.0194) model time 0.2489 (0.3876) loss 6.4852 (6.1760) grad_norm 2.1327 (1.7636) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:32:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][370/625] eta 0:01:39 lr 0.001541 wd 0.0500 time 0.2623 (0.3900) data time 0.0008 (0.0173) model time 0.2615 (0.3727) loss 6.4989 (6.1258) grad_norm 1.7761 (1.7660) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:32:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][380/625] eta 0:01:32 lr 0.001540 wd 0.0500 time 0.2552 (0.3763) data time 0.0008 (0.0157) model time 0.2544 (0.3606) loss 6.8222 (6.1462) grad_norm 1.4548 (1.7688) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:32:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][390/625] eta 0:01:31 lr 0.001540 wd 0.0500 time 0.2513 (0.3903) data time 0.0010 (0.0144) model time 0.2503 (0.3759) loss 5.7662 (6.1506) grad_norm 1.0877 (1.7668) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:32:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][400/625] eta 0:01:25 lr 0.001540 wd 0.0500 time 0.2551 (0.3790) data time 0.0008 (0.0132) model time 0.2543 (0.3658) loss 7.4637 (6.1677) grad_norm 1.2149 (1.7637) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:32:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][410/625] eta 0:01:19 lr 0.001540 wd 0.0500 time 0.2569 (0.3695) data time 0.0016 (0.0123) model time 0.2552 (0.3572) loss 5.4453 (6.1225) grad_norm 2.1739 (1.7807) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:32:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][420/625] eta 0:01:14 lr 0.001540 wd 0.0500 time 0.2403 (0.3613) data time 0.0009 (0.0115) model time 0.2394 (0.3498) loss 4.7005 (6.1113) grad_norm 2.6602 (1.8134) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:32:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][430/625] eta 0:01:09 lr 0.001540 wd 0.0500 time 0.2589 (0.3542) data time 0.0010 (0.0108) model time 0.2578 (0.3434) loss 6.6859 (6.1130) grad_norm 1.0713 (1.8158) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:32:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][440/625] eta 0:01:04 lr 0.001539 wd 0.0500 time 0.2550 (0.3479) data time 0.0009 (0.0102) model time 0.2541 (0.3377) loss 7.1035 (6.1278) grad_norm 1.7161 (1.7978) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:32:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][450/625] eta 0:00:59 lr 0.001539 wd 0.0500 time 0.2557 (0.3425) data time 0.0007 (0.0096) model time 0.2550 (0.3328) loss 4.8245 (6.1268) grad_norm 2.4223 (1.7931) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:32:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][460/625] eta 0:00:55 lr 0.001539 wd 0.0500 time 0.2507 (0.3377) data time 0.0007 (0.0092) model time 0.2501 (0.3285) loss 5.2550 (6.0994) grad_norm 1.4558 (1.7903) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:32:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][470/625] eta 0:00:51 lr 0.001539 wd 0.0500 time 0.2561 (0.3334) data time 0.0008 (0.0087) model time 0.2553 (0.3247) loss 4.8808 (6.0953) grad_norm 2.1611 (1.7831) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:32:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][480/625] eta 0:00:47 lr 0.001539 wd 0.0500 time 0.2536 (0.3295) data time 0.0008 (0.0083) model time 0.2528 (0.3211) loss 6.1450 (6.0714) grad_norm 2.3750 (1.7731) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:32:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][490/625] eta 0:00:44 lr 0.001539 wd 0.0500 time 0.2545 (0.3259) data time 0.0007 (0.0080) model time 0.2539 (0.3179) loss 5.8754 (6.0534) grad_norm 3.0271 (1.7726) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:32:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][500/625] eta 0:00:40 lr 0.001539 wd 0.0500 time 0.2440 (0.3228) data time 0.0012 (0.0077) model time 0.2428 (0.3151) loss 5.1952 (6.0432) grad_norm 2.3884 (1.7644) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:32:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][510/625] eta 0:00:36 lr 0.001538 wd 0.0500 time 0.2557 (0.3199) data time 0.0009 (0.0074) model time 0.2548 (0.3125) loss 7.3071 (6.0491) grad_norm 2.5665 (1.7653) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:32:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][520/625] eta 0:00:33 lr 0.001538 wd 0.0500 time 0.2568 (0.3172) data time 0.0008 (0.0071) model time 0.2559 (0.3101) loss 6.7543 (6.0313) grad_norm 1.3164 (1.7591) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:32:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][530/625] eta 0:00:29 lr 0.001538 wd 0.0500 time 0.2547 (0.3148) data time 0.0007 (0.0069) model time 0.2540 (0.3079) loss 4.9564 (6.0146) grad_norm 2.3047 (1.7547) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:32:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][540/625] eta 0:00:26 lr 0.001538 wd 0.0500 time 0.2513 (0.3125) data time 0.0010 (0.0067) model time 0.2504 (0.3058) loss 5.4647 (6.0087) grad_norm 1.8960 (1.7653) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:32:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][550/625] eta 0:00:23 lr 0.001538 wd 0.0500 time 0.2556 (0.3104) data time 0.0007 (0.0064) model time 0.2549 (0.3039) loss 7.1312 (5.9996) grad_norm 1.7633 (1.7584) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:33:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][560/625] eta 0:00:20 lr 0.001538 wd 0.0500 time 0.2589 (0.3084) data time 0.0009 (0.0063) model time 0.2581 (0.3021) loss 6.9782 (6.0098) grad_norm 1.3868 (1.7496) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:33:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][570/625] eta 0:00:16 lr 0.001538 wd 0.0500 time 0.2544 (0.3065) data time 0.0011 (0.0061) model time 0.2533 (0.3005) loss 5.9644 (6.0081) grad_norm 2.1112 (1.7536) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:33:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][580/625] eta 0:00:13 lr 0.001537 wd 0.0500 time 0.2526 (0.3049) data time 0.0008 (0.0059) model time 0.2518 (0.2990) loss 5.3768 (5.9991) grad_norm 1.9767 (1.7664) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:33:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][590/625] eta 0:00:10 lr 0.001537 wd 0.0500 time 0.2572 (0.3033) data time 0.0011 (0.0058) model time 0.2560 (0.2976) loss 5.9499 (5.9921) grad_norm 1.2324 (1.7580) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:33:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][600/625] eta 0:00:07 lr 0.001537 wd 0.0500 time 0.2562 (0.3018) data time 0.0011 (0.0056) model time 0.2551 (0.2962) loss 6.0412 (6.0105) grad_norm 1.0675 (1.7489) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:33:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][610/625] eta 0:00:04 lr 0.001537 wd 0.0500 time 0.2522 (0.3007) data time 0.0004 (0.0055) model time 0.2518 (0.2952) loss 6.6600 (6.0133) grad_norm 2.2985 (1.7486) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:33:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][620/625] eta 0:00:01 lr 0.001537 wd 0.0500 time 0.2535 (0.2993) data time 0.0005 (0.0053) model time 0.2530 (0.2939) loss 6.4484 (6.0180) grad_norm 3.3852 (1.7686) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 15:33:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 109 training takes 0:01:42
[2024-07-30 15:33:16 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 15:33:17 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 15:33:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.402 (0.402) Loss 0.6372 (0.6372) Acc@1 86.621 (86.621) Acc@5 98.242 (98.242) Mem 9656MB
[2024-07-30 15:33:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.090) Loss 1.0977 (0.8028) Acc@1 75.342 (82.657) Acc@5 93.848 (96.773) Mem 9656MB
[2024-07-30 15:33:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.073) Loss 1.2207 (0.9704) Acc@1 72.607 (78.930) Acc@5 92.334 (94.782) Mem 9656MB
[2024-07-30 15:33:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.711 Acc@5 94.766
[2024-07-30 15:33:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.7%
[2024-07-30 15:33:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.896 (0.896) Loss 0.5708 (0.5708) Acc@1 88.330 (88.330) Acc@5 98.389 (98.389) Mem 9656MB
[2024-07-30 15:33:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.136) Loss 0.9448 (0.7188) Acc@1 77.783 (84.415) Acc@5 94.922 (97.132) Mem 9656MB
[2024-07-30 15:33:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.097) Loss 1.0908 (0.8616) Acc@1 73.096 (80.597) Acc@5 93.262 (95.480) Mem 9656MB
[2024-07-30 15:33:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.252 Acc@5 95.461
[2024-07-30 15:33:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.3%
[2024-07-30 15:33:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.25%
[2024-07-30 15:33:23 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-30 15:33:25 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-30 15:33:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][0/625] eta 0:28:42 lr 0.001537 wd 0.0500 time 2.7555 (2.7555) data time 2.4447 (2.4447) model time 0.0000 (0.0000) loss 5.3735 (5.3735) grad_norm 2.2123 (2.2123) loss_scale 4096.0000 (4096.0000) mem 9652MB
[2024-07-30 15:33:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][10/625] eta 0:04:58 lr 0.001537 wd 0.0500 time 0.2536 (0.4847) data time 0.0011 (0.2231) model time 0.0000 (0.0000) loss 5.1835 (6.0793) grad_norm 2.2855 (1.8422) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:33:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][20/625] eta 0:03:47 lr 0.001536 wd 0.0500 time 0.2576 (0.3755) data time 0.0007 (0.1173) model time 0.0000 (0.0000) loss 6.5981 (6.0730) grad_norm 1.4942 (1.8118) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:33:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][30/625] eta 0:03:20 lr 0.001536 wd 0.0500 time 0.2503 (0.3367) data time 0.0008 (0.0798) model time 0.0000 (0.0000) loss 4.7700 (6.0143) grad_norm 1.2025 (1.8048) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:33:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][40/625] eta 0:03:05 lr 0.001536 wd 0.0500 time 0.2520 (0.3171) data time 0.0007 (0.0605) model time 0.0000 (0.0000) loss 4.4191 (5.8536) grad_norm 1.5156 (1.8099) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:33:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][50/625] eta 0:02:55 lr 0.001536 wd 0.0500 time 0.2578 (0.3049) data time 0.0010 (0.0488) model time 0.0000 (0.0000) loss 6.2618 (5.9499) grad_norm 1.6514 (1.7623) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:33:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][60/625] eta 0:02:47 lr 0.001536 wd 0.0500 time 0.2588 (0.2968) data time 0.0008 (0.0410) model time 0.2580 (0.2547) loss 5.5216 (5.9824) grad_norm 2.3378 (1.7837) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:33:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][70/625] eta 0:02:41 lr 0.001536 wd 0.0500 time 0.2517 (0.2911) data time 0.0009 (0.0353) model time 0.2509 (0.2550) loss 6.6381 (6.0060) grad_norm 2.3782 (1.8340) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:33:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][80/625] eta 0:02:36 lr 0.001536 wd 0.0500 time 0.2562 (0.2870) data time 0.0008 (0.0311) model time 0.2554 (0.2555) loss 6.4628 (5.9662) grad_norm 1.1922 (1.7843) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:33:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][90/625] eta 0:02:31 lr 0.001535 wd 0.0500 time 0.2563 (0.2837) data time 0.0008 (0.0278) model time 0.2555 (0.2557) loss 6.4555 (6.0416) grad_norm 1.8341 (1.7466) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:33:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][100/625] eta 0:02:27 lr 0.001535 wd 0.0500 time 0.2740 (0.2811) data time 0.0010 (0.0251) model time 0.2730 (0.2559) loss 6.1231 (6.0456) grad_norm 4.3943 (1.8487) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:33:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][110/625] eta 0:02:23 lr 0.001535 wd 0.0500 time 0.2542 (0.2789) data time 0.0008 (0.0229) model time 0.2534 (0.2559) loss 5.8137 (6.0086) grad_norm 1.6423 (1.8537) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:33:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][120/625] eta 0:02:19 lr 0.001535 wd 0.0500 time 0.2555 (0.2770) data time 0.0007 (0.0211) model time 0.2549 (0.2556) loss 5.1393 (5.9882) grad_norm 1.0941 (1.8443) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:34:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][130/625] eta 0:02:16 lr 0.001535 wd 0.0500 time 0.2559 (0.2753) data time 0.0009 (0.0196) model time 0.2549 (0.2555) loss 5.6699 (5.9598) grad_norm 1.8176 (1.8435) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:34:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][140/625] eta 0:02:12 lr 0.001535 wd 0.0500 time 0.2582 (0.2739) data time 0.0008 (0.0183) model time 0.2574 (0.2554) loss 6.8388 (5.9712) grad_norm 1.4428 (1.8372) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:34:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][150/625] eta 0:02:09 lr 0.001534 wd 0.0500 time 0.2525 (0.2727) data time 0.0009 (0.0171) model time 0.2516 (0.2553) loss 6.0198 (5.9801) grad_norm 1.7958 (1.8358) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:34:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][160/625] eta 0:02:06 lr 0.001534 wd 0.0500 time 0.2516 (0.2716) data time 0.0014 (0.0161) model time 0.2502 (0.2552) loss 7.0035 (5.9733) grad_norm 2.5045 (1.8286) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:34:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][170/625] eta 0:02:03 lr 0.001534 wd 0.0500 time 0.2526 (0.2706) data time 0.0010 (0.0152) model time 0.2516 (0.2551) loss 6.3016 (6.0061) grad_norm 2.4295 (1.8449) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:34:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][180/625] eta 0:02:00 lr 0.001534 wd 0.0500 time 0.2645 (0.2700) data time 0.0006 (0.0144) model time 0.2639 (0.2554) loss 6.8912 (5.9949) grad_norm 2.2426 (1.8307) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:34:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][190/625] eta 0:01:57 lr 0.001534 wd 0.0500 time 0.2534 (0.2695) data time 0.0009 (0.0137) model time 0.2525 (0.2556) loss 4.5043 (5.9785) grad_norm 1.3054 (1.8177) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:34:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][200/625] eta 0:01:54 lr 0.001534 wd 0.0500 time 0.5241 (0.2701) data time 0.0008 (0.0131) model time 0.5233 (0.2573) loss 5.7585 (5.9780) grad_norm 3.3351 (1.8310) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:34:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][210/625] eta 0:01:51 lr 0.001534 wd 0.0500 time 0.2507 (0.2694) data time 0.0009 (0.0125) model time 0.2498 (0.2572) loss 5.4284 (5.9896) grad_norm 2.7218 (1.8306) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:34:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][220/625] eta 0:01:48 lr 0.001533 wd 0.0500 time 0.2603 (0.2688) data time 0.0010 (0.0120) model time 0.2594 (0.2571) loss 6.3224 (6.0085) grad_norm 2.6365 (1.8410) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:34:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][230/625] eta 0:01:45 lr 0.001533 wd 0.0500 time 0.2540 (0.2683) data time 0.0009 (0.0115) model time 0.2531 (0.2569) loss 6.3825 (6.0074) grad_norm 1.3345 (1.8438) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:34:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][240/625] eta 0:01:43 lr 0.001533 wd 0.0500 time 0.2590 (0.2678) data time 0.0013 (0.0111) model time 0.2577 (0.2569) loss 5.9327 (6.0129) grad_norm 2.2918 (1.8455) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:34:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][250/625] eta 0:01:40 lr 0.001533 wd 0.0500 time 0.2583 (0.2674) data time 0.0007 (0.0107) model time 0.2576 (0.2569) loss 6.4391 (6.0208) grad_norm 2.0092 (1.8534) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:34:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][260/625] eta 0:01:37 lr 0.001533 wd 0.0500 time 0.2560 (0.2669) data time 0.0007 (0.0103) model time 0.2553 (0.2568) loss 7.3470 (6.0092) grad_norm 1.4532 (1.8594) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:34:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][270/625] eta 0:01:34 lr 0.001533 wd 0.0500 time 0.2560 (0.2667) data time 0.0010 (0.0100) model time 0.2550 (0.2569) loss 7.0221 (6.0153) grad_norm 1.5391 (1.8458) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:34:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][280/625] eta 0:01:31 lr 0.001532 wd 0.0500 time 0.2516 (0.2663) data time 0.0009 (0.0096) model time 0.2507 (0.2568) loss 6.7749 (6.0332) grad_norm 1.9412 (1.8310) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:34:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][290/625] eta 0:01:29 lr 0.001532 wd 0.0500 time 0.2556 (0.2659) data time 0.0009 (0.0093) model time 0.2547 (0.2567) loss 5.6457 (6.0298) grad_norm 1.1219 (1.8425) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:34:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][300/625] eta 0:01:26 lr 0.001532 wd 0.0500 time 0.2587 (0.2656) data time 0.0008 (0.0091) model time 0.2580 (0.2566) loss 5.3850 (6.0198) grad_norm 3.1411 (1.8482) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:34:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][310/625] eta 0:01:23 lr 0.001532 wd 0.0500 time 0.2556 (0.2653) data time 0.0008 (0.0088) model time 0.2547 (0.2566) loss 5.4749 (6.0204) grad_norm 1.6851 (1.8452) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:34:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][320/625] eta 0:01:20 lr 0.001532 wd 0.0500 time 0.2649 (0.2652) data time 0.0011 (0.0086) model time 0.2639 (0.2567) loss 6.0018 (6.0107) grad_norm 1.2604 (1.8343) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:34:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][330/625] eta 0:01:18 lr 0.001532 wd 0.0500 time 0.2498 (0.2649) data time 0.0008 (0.0083) model time 0.2490 (0.2567) loss 6.2077 (6.0180) grad_norm 1.2671 (1.8290) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:34:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][340/625] eta 0:01:15 lr 0.001532 wd 0.0500 time 0.2535 (0.2646) data time 0.0011 (0.0081) model time 0.2524 (0.2566) loss 6.9555 (6.0181) grad_norm 1.6646 (1.8243) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:34:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][350/625] eta 0:01:12 lr 0.001531 wd 0.0500 time 0.2549 (0.2644) data time 0.0010 (0.0079) model time 0.2539 (0.2565) loss 5.7236 (6.0126) grad_norm 1.8724 (1.8243) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 15:35:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-30 15:35:00 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 15:35:00 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 15:36:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 15:36:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 15:37:10 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 15:37:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 15:37:22 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 15:37:22 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 15:37:22 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 15:37:22 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 110)
[2024-07-30 15:37:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 15:41:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 15:41:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 15:41:16 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 15:41:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 15:41:31 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 15:41:31 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 15:41:31 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 15:41:31 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 110)
[2024-07-30 15:41:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 15:41:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][360/625] eta 0:11:04 lr 0.001531 wd 0.0500 time 0.2616 (2.5078) data time 0.0006 (0.2243) model time 0.2610 (2.2834) loss 5.6148 (6.2594) grad_norm 2.1670 (1.8832) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:41:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][370/625] eta 0:03:17 lr 0.001531 wd 0.0500 time 0.2477 (0.7748) data time 0.0010 (0.0525) model time 0.2467 (0.7223) loss 6.0704 (6.3231) grad_norm 1.2414 (1.6457) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:41:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][380/625] eta 0:02:14 lr 0.001531 wd 0.0500 time 0.2558 (0.5475) data time 0.0007 (0.0301) model time 0.2550 (0.5174) loss 6.3767 (6.3508) grad_norm 1.2060 (1.5768) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:41:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][390/625] eta 0:01:47 lr 0.001531 wd 0.0500 time 0.2492 (0.4580) data time 0.0009 (0.0213) model time 0.2483 (0.4368) loss 6.5227 (6.3954) grad_norm 1.4361 (1.5391) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:41:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][400/625] eta 0:01:32 lr 0.001531 wd 0.0500 time 0.2536 (0.4102) data time 0.0009 (0.0165) model time 0.2527 (0.3936) loss 6.5634 (6.2894) grad_norm 2.0824 (1.5919) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:42:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][410/625] eta 0:01:21 lr 0.001531 wd 0.0500 time 0.2506 (0.3809) data time 0.0008 (0.0136) model time 0.2498 (0.3673) loss 5.9841 (6.2582) grad_norm 1.7448 (1.6594) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:42:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][420/625] eta 0:01:13 lr 0.001530 wd 0.0500 time 0.2530 (0.3609) data time 0.0009 (0.0116) model time 0.2520 (0.3493) loss 4.8974 (6.1934) grad_norm 4.8625 (1.7600) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:42:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][430/625] eta 0:01:07 lr 0.001530 wd 0.0500 time 0.2533 (0.3462) data time 0.0009 (0.0101) model time 0.2524 (0.3360) loss 6.4056 (6.1548) grad_norm 3.0370 (1.8651) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:42:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][440/625] eta 0:01:01 lr 0.001530 wd 0.0500 time 0.2536 (0.3349) data time 0.0007 (0.0091) model time 0.2529 (0.3259) loss 5.1148 (6.1049) grad_norm 2.8865 (1.9048) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:42:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][450/625] eta 0:00:57 lr 0.001530 wd 0.0500 time 0.2502 (0.3263) data time 0.0008 (0.0082) model time 0.2494 (0.3181) loss 6.8595 (6.0981) grad_norm 1.7441 (1.9200) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:42:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][460/625] eta 0:00:52 lr 0.001530 wd 0.0500 time 0.2667 (0.3194) data time 0.0008 (0.0075) model time 0.2659 (0.3119) loss 6.9420 (6.1183) grad_norm 1.3709 (1.8844) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:42:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][470/625] eta 0:00:48 lr 0.001530 wd 0.0500 time 0.2510 (0.3136) data time 0.0007 (0.0069) model time 0.2503 (0.3066) loss 5.6074 (6.1050) grad_norm 1.4287 (1.8415) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:42:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][480/625] eta 0:00:44 lr 0.001529 wd 0.0500 time 0.2528 (0.3088) data time 0.0009 (0.0065) model time 0.2519 (0.3023) loss 6.5883 (6.1136) grad_norm 3.0101 (1.8942) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:42:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][490/625] eta 0:00:41 lr 0.001529 wd 0.0500 time 0.2532 (0.3045) data time 0.0008 (0.0060) model time 0.2524 (0.2985) loss 6.7476 (6.1119) grad_norm 1.3913 (1.8877) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:42:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][500/625] eta 0:00:37 lr 0.001529 wd 0.0500 time 0.2571 (0.3010) data time 0.0010 (0.0057) model time 0.2561 (0.2953) loss 7.1266 (6.1057) grad_norm 2.0601 (1.9369) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:42:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][510/625] eta 0:00:34 lr 0.001529 wd 0.0500 time 0.2515 (0.2978) data time 0.0007 (0.0054) model time 0.2507 (0.2924) loss 6.2449 (6.0983) grad_norm 1.2381 (1.9201) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:42:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][520/625] eta 0:00:30 lr 0.001529 wd 0.0500 time 0.2619 (0.2952) data time 0.0007 (0.0051) model time 0.2612 (0.2901) loss 5.4840 (6.1005) grad_norm 2.1728 (1.9070) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:42:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][530/625] eta 0:00:27 lr 0.001529 wd 0.0500 time 0.2507 (0.2928) data time 0.0009 (0.0049) model time 0.2498 (0.2879) loss 6.6218 (6.1091) grad_norm 1.1138 (1.8884) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:42:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][540/625] eta 0:00:24 lr 0.001529 wd 0.0500 time 0.2567 (0.2907) data time 0.0008 (0.0047) model time 0.2559 (0.2860) loss 6.4158 (6.1028) grad_norm 4.0724 (1.8779) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:42:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][550/625] eta 0:00:21 lr 0.001528 wd 0.0500 time 0.2518 (0.2888) data time 0.0007 (0.0045) model time 0.2512 (0.2844) loss 6.0380 (6.0948) grad_norm 1.4987 (1.8737) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:42:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][560/625] eta 0:00:18 lr 0.001528 wd 0.0500 time 0.2546 (0.2871) data time 0.0006 (0.0043) model time 0.2540 (0.2828) loss 4.8375 (6.0693) grad_norm 1.5705 (1.8776) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:42:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][570/625] eta 0:00:15 lr 0.001528 wd 0.0500 time 0.2560 (0.2856) data time 0.0008 (0.0042) model time 0.2551 (0.2814) loss 4.5272 (6.0568) grad_norm 1.6350 (1.8908) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:42:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][580/625] eta 0:00:12 lr 0.001528 wd 0.0500 time 0.2524 (0.2842) data time 0.0008 (0.0040) model time 0.2516 (0.2802) loss 4.8789 (6.0478) grad_norm 2.3403 (1.9001) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:42:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][590/625] eta 0:00:09 lr 0.001528 wd 0.0500 time 0.2518 (0.2830) data time 0.0009 (0.0039) model time 0.2509 (0.2792) loss 5.4729 (6.0474) grad_norm 2.4770 (1.8873) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:42:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][600/625] eta 0:00:07 lr 0.001528 wd 0.0500 time 0.2522 (0.2820) data time 0.0009 (0.0038) model time 0.2512 (0.2782) loss 7.4205 (6.0550) grad_norm 2.2707 (1.8774) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:42:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][610/625] eta 0:00:04 lr 0.001528 wd 0.0500 time 0.2544 (0.2810) data time 0.0005 (0.0037) model time 0.2538 (0.2773) loss 6.2803 (6.0511) grad_norm 1.1621 (1.8577) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:42:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][620/625] eta 0:00:01 lr 0.001527 wd 0.0500 time 0.2507 (0.2799) data time 0.0004 (0.0035) model time 0.2503 (0.2763) loss 5.4765 (6.0378) grad_norm 1.7857 (1.8404) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:42:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 110 training takes 0:01:14
[2024-07-30 15:42:55 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 15:42:57 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 15:42:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.575 (0.575) Loss 0.6621 (0.6621) Acc@1 87.061 (87.061) Acc@5 97.900 (97.900) Mem 9656MB
[2024-07-30 15:42:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.103) Loss 1.0674 (0.8061) Acc@1 74.658 (82.844) Acc@5 94.189 (96.786) Mem 9656MB
[2024-07-30 15:42:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 1.2227 (0.9593) Acc@1 71.094 (79.004) Acc@5 92.236 (94.947) Mem 9656MB
[2024-07-30 15:43:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.849 Acc@5 94.918
[2024-07-30 15:43:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.8%
[2024-07-30 15:43:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.85%
[2024-07-30 15:43:00 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-30 15:43:01 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-30 15:43:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.417 (0.417) Loss 0.5718 (0.5718) Acc@1 88.135 (88.135) Acc@5 98.438 (98.438) Mem 9656MB
[2024-07-30 15:43:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.089) Loss 0.9453 (0.7191) Acc@1 77.686 (84.415) Acc@5 94.971 (97.164) Mem 9656MB
[2024-07-30 15:43:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.073) Loss 1.0908 (0.8616) Acc@1 73.145 (80.611) Acc@5 93.311 (95.515) Mem 9656MB
[2024-07-30 15:43:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.270 Acc@5 95.497
[2024-07-30 15:43:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.3%
[2024-07-30 15:43:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.27%
[2024-07-30 15:43:03 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-30 15:43:05 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-30 15:43:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][0/625] eta 0:59:05 lr 0.001527 wd 0.0500 time 5.6723 (5.6723) data time 3.3918 (3.3918) model time 0.0000 (0.0000) loss 6.9841 (6.9841) grad_norm 1.2075 (1.2075) loss_scale 2048.0000 (2048.0000) mem 9651MB
[2024-07-30 15:43:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-30 15:43:11 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 15:43:13 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 15:48:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 15:48:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 15:48:14 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 15:48:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 15:48:31 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 15:48:31 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 15:48:31 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 15:48:31 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 111)
[2024-07-30 15:48:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 15:48:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][10/625] eta 0:09:22 lr 0.001527 wd 0.0500 time 0.2495 (0.9140) data time 0.0010 (0.1166) model time 0.0000 (0.0000) loss 7.1149 (6.5514) grad_norm 1.0575 (1.3900) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:48:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][20/625] eta 0:05:53 lr 0.001527 wd 0.0500 time 0.2553 (0.5836) data time 0.0006 (0.0587) model time 0.0000 (0.0000) loss 6.9301 (6.2781) grad_norm 1.0893 (1.4081) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:48:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][30/625] eta 0:04:41 lr 0.001527 wd 0.0500 time 0.2514 (0.4738) data time 0.0010 (0.0395) model time 0.0000 (0.0000) loss 6.9883 (6.3359) grad_norm 1.8044 (1.5458) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:48:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][40/625] eta 0:04:04 lr 0.001527 wd 0.0500 time 0.2524 (0.4184) data time 0.0008 (0.0298) model time 0.0000 (0.0000) loss 5.4002 (6.2311) grad_norm 1.2857 (1.5688) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:48:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][50/625] eta 0:03:41 lr 0.001527 wd 0.0500 time 0.2531 (0.3855) data time 0.0010 (0.0241) model time 0.0000 (0.0000) loss 5.5096 (6.2313) grad_norm 2.5595 (1.5913) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:48:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][60/625] eta 0:03:25 lr 0.001526 wd 0.0500 time 0.2573 (0.3634) data time 0.0006 (0.0202) model time 0.2567 (0.2523) loss 6.1370 (6.1747) grad_norm 2.0165 (1.6340) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:49:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][70/625] eta 0:03:13 lr 0.001526 wd 0.0500 time 0.2507 (0.3481) data time 0.0007 (0.0175) model time 0.2500 (0.2539) loss 4.8511 (6.1369) grad_norm 1.7249 (1.6341) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:49:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][80/625] eta 0:03:03 lr 0.001526 wd 0.0500 time 0.2542 (0.3365) data time 0.0009 (0.0154) model time 0.2534 (0.2540) loss 7.5145 (6.1401) grad_norm 3.1179 (1.6861) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:49:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][90/625] eta 0:02:55 lr 0.001526 wd 0.0500 time 0.2533 (0.3274) data time 0.0008 (0.0138) model time 0.2525 (0.2538) loss 6.8005 (6.1082) grad_norm 1.1195 (1.7471) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:49:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][100/625] eta 0:02:48 lr 0.001526 wd 0.0500 time 0.2549 (0.3202) data time 0.0008 (0.0125) model time 0.2541 (0.2539) loss 6.9230 (6.1304) grad_norm 1.6563 (1.7417) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:49:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][110/625] eta 0:02:41 lr 0.001526 wd 0.0500 time 0.2539 (0.3141) data time 0.0009 (0.0115) model time 0.2529 (0.2537) loss 6.1597 (6.1418) grad_norm 1.3159 (1.7274) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:49:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][120/625] eta 0:02:36 lr 0.001525 wd 0.0500 time 0.2535 (0.3093) data time 0.0007 (0.0106) model time 0.2528 (0.2539) loss 7.7992 (6.1457) grad_norm 1.5699 (1.7534) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:49:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][130/625] eta 0:02:31 lr 0.001525 wd 0.0500 time 0.2548 (0.3051) data time 0.0007 (0.0099) model time 0.2541 (0.2540) loss 6.4007 (6.1068) grad_norm 2.2811 (1.7970) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:49:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][140/625] eta 0:02:26 lr 0.001525 wd 0.0500 time 0.2592 (0.3016) data time 0.0005 (0.0092) model time 0.2586 (0.2540) loss 4.7570 (6.1108) grad_norm 1.8325 (1.8062) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:49:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][150/625] eta 0:02:21 lr 0.001525 wd 0.0500 time 0.2621 (0.2985) data time 0.0007 (0.0086) model time 0.2613 (0.2540) loss 7.0312 (6.1184) grad_norm 2.2001 (1.8120) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:49:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][160/625] eta 0:02:17 lr 0.001525 wd 0.0500 time 0.2545 (0.2957) data time 0.0009 (0.0082) model time 0.2537 (0.2540) loss 6.7576 (6.1243) grad_norm 1.3998 (1.8136) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:49:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][170/625] eta 0:02:13 lr 0.001525 wd 0.0500 time 0.2555 (0.2933) data time 0.0005 (0.0077) model time 0.2549 (0.2540) loss 5.1064 (6.1225) grad_norm 1.1940 (1.8024) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:49:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][180/625] eta 0:02:09 lr 0.001525 wd 0.0500 time 0.2580 (0.2913) data time 0.0006 (0.0074) model time 0.2575 (0.2542) loss 4.8357 (6.0889) grad_norm 1.2796 (1.8090) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:49:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][190/625] eta 0:02:05 lr 0.001524 wd 0.0500 time 0.2520 (0.2895) data time 0.0008 (0.0070) model time 0.2512 (0.2542) loss 5.1245 (6.0846) grad_norm 3.1389 (1.8385) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:49:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][200/625] eta 0:02:02 lr 0.001524 wd 0.0500 time 0.2547 (0.2880) data time 0.0009 (0.0067) model time 0.2538 (0.2546) loss 6.6739 (6.0612) grad_norm 1.2709 (1.8393) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:49:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][210/625] eta 0:01:58 lr 0.001524 wd 0.0500 time 0.2533 (0.2864) data time 0.0007 (0.0064) model time 0.2526 (0.2546) loss 5.3867 (6.0457) grad_norm 1.5186 (1.8161) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:49:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][220/625] eta 0:01:55 lr 0.001524 wd 0.0500 time 0.2671 (0.2852) data time 0.0011 (0.0062) model time 0.2660 (0.2547) loss 5.8773 (6.0356) grad_norm 2.0797 (1.7984) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:49:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][230/625] eta 0:01:52 lr 0.001524 wd 0.0500 time 0.2536 (0.2839) data time 0.0009 (0.0060) model time 0.2527 (0.2547) loss 7.0191 (6.0437) grad_norm 1.6128 (1.8012) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:49:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][240/625] eta 0:01:48 lr 0.001524 wd 0.0500 time 0.2471 (0.2827) data time 0.0009 (0.0057) model time 0.2462 (0.2547) loss 7.2375 (6.0336) grad_norm 2.5640 (1.8202) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:49:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][250/625] eta 0:01:45 lr 0.001523 wd 0.0500 time 0.2542 (0.2816) data time 0.0006 (0.0056) model time 0.2536 (0.2547) loss 4.9767 (6.0223) grad_norm 2.2498 (1.8280) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:49:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][260/625] eta 0:01:42 lr 0.001523 wd 0.0500 time 0.2556 (0.2806) data time 0.0008 (0.0054) model time 0.2548 (0.2547) loss 5.2448 (6.0187) grad_norm 1.7936 (1.8216) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:49:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][270/625] eta 0:01:39 lr 0.001523 wd 0.0500 time 0.2590 (0.2796) data time 0.0007 (0.0052) model time 0.2583 (0.2547) loss 6.8373 (6.0117) grad_norm 1.6215 (1.8221) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:49:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][280/625] eta 0:01:36 lr 0.001523 wd 0.0500 time 0.2568 (0.2788) data time 0.0008 (0.0051) model time 0.2560 (0.2547) loss 6.3718 (6.0217) grad_norm 1.5760 (1.8246) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:49:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][290/625] eta 0:01:33 lr 0.001523 wd 0.0500 time 0.2568 (0.2780) data time 0.0008 (0.0049) model time 0.2561 (0.2547) loss 5.3876 (6.0192) grad_norm 1.1287 (1.8105) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:49:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][300/625] eta 0:01:30 lr 0.001523 wd 0.0500 time 0.2596 (0.2774) data time 0.0006 (0.0048) model time 0.2590 (0.2549) loss 5.3903 (6.0048) grad_norm 3.3249 (1.8287) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:50:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][310/625] eta 0:01:27 lr 0.001523 wd 0.0500 time 0.2593 (0.2768) data time 0.0009 (0.0047) model time 0.2585 (0.2550) loss 5.9923 (6.0052) grad_norm 1.4117 (1.8609) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:50:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][320/625] eta 0:01:24 lr 0.001522 wd 0.0500 time 0.2521 (0.2762) data time 0.0009 (0.0045) model time 0.2511 (0.2550) loss 5.7526 (6.0244) grad_norm 2.9274 (1.8774) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:50:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][330/625] eta 0:01:21 lr 0.001522 wd 0.0500 time 0.2591 (0.2756) data time 0.0006 (0.0044) model time 0.2585 (0.2551) loss 6.3710 (6.0351) grad_norm 2.5855 (1.8764) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:50:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][340/625] eta 0:01:18 lr 0.001522 wd 0.0500 time 0.2604 (0.2751) data time 0.0008 (0.0043) model time 0.2596 (0.2551) loss 6.0199 (6.0354) grad_norm 1.6935 (1.8751) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:50:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][350/625] eta 0:01:15 lr 0.001522 wd 0.0500 time 0.2607 (0.2745) data time 0.0009 (0.0042) model time 0.2598 (0.2551) loss 5.1857 (6.0395) grad_norm 1.5825 (1.8680) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:50:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][360/625] eta 0:01:12 lr 0.001522 wd 0.0500 time 0.2614 (0.2740) data time 0.0006 (0.0041) model time 0.2608 (0.2551) loss 7.2250 (6.0408) grad_norm 1.5497 (1.8688) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:50:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][370/625] eta 0:01:09 lr 0.001522 wd 0.0500 time 0.2568 (0.2735) data time 0.0008 (0.0040) model time 0.2560 (0.2551) loss 7.4575 (6.0408) grad_norm 2.9064 (1.8704) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:50:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][380/625] eta 0:01:06 lr 0.001522 wd 0.0500 time 0.2431 (0.2731) data time 0.0009 (0.0040) model time 0.2422 (0.2552) loss 6.2006 (6.0428) grad_norm 1.9486 (1.8653) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:50:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][390/625] eta 0:01:04 lr 0.001521 wd 0.0500 time 0.2519 (0.2727) data time 0.0011 (0.0039) model time 0.2508 (0.2552) loss 4.3234 (6.0243) grad_norm 3.0154 (1.8630) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:50:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][400/625] eta 0:01:01 lr 0.001521 wd 0.0500 time 0.2537 (0.2723) data time 0.0011 (0.0038) model time 0.2526 (0.2552) loss 5.9403 (6.0314) grad_norm 1.9023 (1.8592) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:50:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][410/625] eta 0:00:58 lr 0.001521 wd 0.0500 time 0.2555 (0.2719) data time 0.0007 (0.0037) model time 0.2549 (0.2552) loss 6.3601 (6.0364) grad_norm 1.3323 (1.8637) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:50:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][420/625] eta 0:00:55 lr 0.001521 wd 0.0500 time 0.4987 (0.2721) data time 0.0007 (0.0037) model time 0.4981 (0.2559) loss 6.2561 (6.0375) grad_norm 1.7141 (1.8568) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:50:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][430/625] eta 0:00:52 lr 0.001521 wd 0.0500 time 0.2542 (0.2717) data time 0.0008 (0.0036) model time 0.2534 (0.2558) loss 6.6361 (6.0490) grad_norm 2.0287 (1.8502) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:50:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][440/625] eta 0:00:50 lr 0.001521 wd 0.0500 time 0.2556 (0.2714) data time 0.0009 (0.0035) model time 0.2547 (0.2558) loss 5.6373 (6.0575) grad_norm 1.6730 (1.8480) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:50:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][450/625] eta 0:00:47 lr 0.001520 wd 0.0500 time 0.2559 (0.2710) data time 0.0009 (0.0035) model time 0.2550 (0.2558) loss 6.6351 (6.0576) grad_norm 2.6570 (1.8520) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:50:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][460/625] eta 0:00:44 lr 0.001520 wd 0.0500 time 0.2606 (0.2707) data time 0.0006 (0.0034) model time 0.2599 (0.2558) loss 5.8719 (6.0452) grad_norm 1.9199 (1.8560) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:50:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][470/625] eta 0:00:41 lr 0.001520 wd 0.0500 time 0.2560 (0.2704) data time 0.0008 (0.0034) model time 0.2552 (0.2558) loss 5.1215 (6.0317) grad_norm 1.3749 (1.8621) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:50:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][480/625] eta 0:00:39 lr 0.001520 wd 0.0500 time 0.2544 (0.2702) data time 0.0009 (0.0033) model time 0.2534 (0.2558) loss 5.0864 (6.0275) grad_norm 1.6604 (1.8634) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 15:50:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-30 15:50:47 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 15:50:48 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 15:56:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 15:56:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 15:56:22 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 15:56:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 15:56:38 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 15:56:38 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 15:56:38 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 15:56:38 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 111)
[2024-07-30 15:56:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 15:56:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-30 15:56:52 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 15:56:55 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 16:01:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 16:01:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 16:02:01 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 16:02:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 16:02:13 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 16:02:13 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 16:02:13 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 16:02:13 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 111)
[2024-07-30 16:02:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 16:02:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][490/625] eta 0:06:14 lr 0.001520 wd 0.0500 time 0.2505 (2.7728) data time 0.0007 (0.3303) model time 0.2497 (2.4425) loss 6.9213 (6.4602) grad_norm 1.4997 (1.5474) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:02:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][500/625] eta 0:02:16 lr 0.001520 wd 0.0500 time 0.2453 (1.0895) data time 0.0010 (0.1109) model time 0.2443 (0.9786) loss 6.0906 (6.2302) grad_norm 2.1527 (1.5519) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:02:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][510/625] eta 0:01:26 lr 0.001520 wd 0.0500 time 0.2479 (0.7526) data time 0.0009 (0.0670) model time 0.2470 (0.6855) loss 6.8081 (6.3243) grad_norm 1.6006 (1.5426) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:02:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][520/625] eta 0:01:04 lr 0.001519 wd 0.0500 time 0.2478 (0.6149) data time 0.0009 (0.0482) model time 0.2469 (0.5667) loss 6.3179 (6.2535) grad_norm 1.5495 (1.6077) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:02:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][530/625] eta 0:00:50 lr 0.001519 wd 0.0500 time 0.2479 (0.5329) data time 0.0009 (0.0377) model time 0.2470 (0.4951) loss 6.3858 (6.2035) grad_norm 1.6877 (1.7206) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:02:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][540/625] eta 0:00:41 lr 0.001519 wd 0.0500 time 0.2532 (0.4840) data time 0.0007 (0.0311) model time 0.2525 (0.4529) loss 5.2970 (6.1976) grad_norm 1.9151 (1.6975) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:02:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][550/625] eta 0:00:33 lr 0.001519 wd 0.0500 time 0.2446 (0.4480) data time 0.0013 (0.0265) model time 0.2432 (0.4216) loss 6.7062 (6.1883) grad_norm 2.7016 (1.7015) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:02:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][560/625] eta 0:00:27 lr 0.001519 wd 0.0500 time 0.3161 (0.4250) data time 0.0013 (0.0231) model time 0.3147 (0.4019) loss 5.1206 (6.1317) grad_norm 1.4482 (1.7514) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:02:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][570/625] eta 0:00:22 lr 0.001519 wd 0.0500 time 0.2556 (0.4041) data time 0.0007 (0.0205) model time 0.2549 (0.3837) loss 5.4822 (6.1161) grad_norm 1.9982 (1.8370) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:02:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][580/625] eta 0:00:17 lr 0.001518 wd 0.0500 time 0.2476 (0.3880) data time 0.0010 (0.0185) model time 0.2465 (0.3696) loss 6.6555 (6.1120) grad_norm 2.2521 (1.8347) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:02:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][590/625] eta 0:00:13 lr 0.001518 wd 0.0500 time 0.3175 (0.3755) data time 0.0011 (0.0168) model time 0.3164 (0.3587) loss 5.4208 (6.1490) grad_norm 1.6215 (1.8177) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:03:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][600/625] eta 0:00:09 lr 0.001518 wd 0.0500 time 0.2459 (0.3655) data time 0.0010 (0.0154) model time 0.2450 (0.3500) loss 4.9600 (6.1530) grad_norm 2.2539 (1.8202) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:03:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][610/625] eta 0:00:05 lr 0.001518 wd 0.0500 time 0.2444 (0.3562) data time 0.0007 (0.0143) model time 0.2438 (0.3419) loss 5.2658 (6.1500) grad_norm 1.6946 (1.8409) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:03:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][620/625] eta 0:00:01 lr 0.001518 wd 0.0500 time 0.2430 (0.3492) data time 0.0004 (0.0133) model time 0.2425 (0.3359) loss 5.4735 (6.1420) grad_norm 2.0276 (1.8748) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:03:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 111 training takes 0:00:48
[2024-07-30 16:03:06 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 16:03:08 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 16:03:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.835 (0.835) Loss 0.6709 (0.6709) Acc@1 85.742 (85.742) Acc@5 98.193 (98.193) Mem 9656MB
[2024-07-30 16:03:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.139) Loss 1.0664 (0.8158) Acc@1 76.953 (82.884) Acc@5 93.359 (96.604) Mem 9656MB
[2024-07-30 16:03:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.058 (0.098) Loss 1.2207 (0.9669) Acc@1 72.217 (79.088) Acc@5 92.139 (94.827) Mem 9656MB
[2024-07-30 16:03:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.757 Acc@5 94.794
[2024-07-30 16:03:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.8%
[2024-07-30 16:03:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 3.070 (3.070) Loss 0.5723 (0.5723) Acc@1 88.135 (88.135) Acc@5 98.438 (98.438) Mem 9656MB
[2024-07-30 16:03:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.057 (0.340) Loss 0.9448 (0.7195) Acc@1 77.734 (84.450) Acc@5 94.971 (97.168) Mem 9656MB
[2024-07-30 16:03:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.053 (0.203) Loss 1.0918 (0.8617) Acc@1 73.193 (80.659) Acc@5 93.164 (95.508) Mem 9656MB
[2024-07-30 16:03:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.328 Acc@5 95.503
[2024-07-30 16:03:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.3%
[2024-07-30 16:03:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.33%
[2024-07-30 16:03:18 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-30 16:03:19 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-30 16:03:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][0/625] eta 0:23:07 lr 0.001518 wd 0.0500 time 2.2205 (2.2205) data time 1.8809 (1.8809) model time 0.0000 (0.0000) loss 3.9802 (3.9802) grad_norm 1.1525 (1.1525) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-30 16:03:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][10/625] eta 0:04:27 lr 0.001518 wd 0.0500 time 0.2453 (0.4347) data time 0.0010 (0.1719) model time 0.0000 (0.0000) loss 7.5754 (5.9478) grad_norm 2.5272 (1.6138) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:03:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][20/625] eta 0:03:28 lr 0.001517 wd 0.0500 time 0.2448 (0.3449) data time 0.0014 (0.0906) model time 0.0000 (0.0000) loss 6.8029 (6.0551) grad_norm 1.7549 (1.8035) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:03:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][30/625] eta 0:03:11 lr 0.001517 wd 0.0500 time 0.3061 (0.3216) data time 0.0009 (0.0617) model time 0.0000 (0.0000) loss 5.4528 (6.1062) grad_norm 2.1595 (2.0486) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:03:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][40/625] eta 0:03:00 lr 0.001517 wd 0.0500 time 0.2532 (0.3080) data time 0.0007 (0.0469) model time 0.0000 (0.0000) loss 5.3834 (5.9882) grad_norm 2.1907 (2.0080) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:03:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][50/625] eta 0:02:50 lr 0.001517 wd 0.0500 time 0.2459 (0.2965) data time 0.0010 (0.0379) model time 0.0000 (0.0000) loss 5.5711 (6.0234) grad_norm 1.0559 (1.9212) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:03:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][60/625] eta 0:02:44 lr 0.001517 wd 0.0500 time 0.2446 (0.2906) data time 0.0011 (0.0319) model time 0.2435 (0.2589) loss 6.0159 (5.9635) grad_norm 1.3249 (1.9017) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:03:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][70/625] eta 0:02:38 lr 0.001517 wd 0.0500 time 0.2428 (0.2852) data time 0.0007 (0.0276) model time 0.2421 (0.2553) loss 6.3541 (5.9309) grad_norm 1.7195 (1.8731) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:03:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][80/625] eta 0:02:32 lr 0.001517 wd 0.0500 time 0.2430 (0.2807) data time 0.0010 (0.0243) model time 0.2420 (0.2525) loss 5.5488 (5.9336) grad_norm 1.2855 (1.8260) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:03:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-30 16:03:44 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 16:03:46 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 16:14:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 16:14:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 16:15:02 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 16:15:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 16:15:19 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 16:15:19 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 16:15:19 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 16:15:19 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 112)
[2024-07-30 16:15:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 16:15:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][90/625] eta 0:25:27 lr 0.001516 wd 0.0500 time 0.2579 (2.8554) data time 0.0008 (0.2891) model time 0.2571 (2.5664) loss 6.3729 (6.4841) grad_norm 1.7096 (2.0785) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:15:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][100/625] eta 0:08:46 lr 0.001516 wd 0.0500 time 0.2696 (1.0028) data time 0.0007 (0.0834) model time 0.2689 (0.9194) loss 6.1746 (6.3181) grad_norm 1.4532 (2.1372) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:15:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][110/625] eta 0:05:57 lr 0.001516 wd 0.0500 time 0.2573 (0.6942) data time 0.0011 (0.0491) model time 0.2562 (0.6451) loss 6.5839 (6.3356) grad_norm 1.4470 (2.0733) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:15:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][120/625] eta 0:04:46 lr 0.001516 wd 0.0500 time 0.2732 (0.5670) data time 0.0010 (0.0350) model time 0.2722 (0.5320) loss 4.8392 (6.2904) grad_norm 1.2195 (2.0117) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:15:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][130/625] eta 0:04:06 lr 0.001516 wd 0.0500 time 0.2669 (0.4988) data time 0.0008 (0.0273) model time 0.2661 (0.4715) loss 6.2802 (6.1934) grad_norm 3.4958 (1.9668) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:15:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][140/625] eta 0:03:40 lr 0.001516 wd 0.0500 time 0.2592 (0.4547) data time 0.0008 (0.0225) model time 0.2584 (0.4322) loss 7.1070 (6.2012) grad_norm 1.3267 (1.9295) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:15:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][150/625] eta 0:03:21 lr 0.001515 wd 0.0500 time 0.2610 (0.4247) data time 0.0010 (0.0191) model time 0.2600 (0.4055) loss 6.9541 (6.1247) grad_norm 2.4042 (1.9712) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:15:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][160/625] eta 0:03:07 lr 0.001515 wd 0.0500 time 0.2573 (0.4025) data time 0.0008 (0.0167) model time 0.2565 (0.3858) loss 6.5656 (6.1061) grad_norm 1.5312 (1.9397) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:15:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][170/625] eta 0:02:55 lr 0.001515 wd 0.0500 time 0.2572 (0.3857) data time 0.0012 (0.0149) model time 0.2560 (0.3709) loss 6.3856 (6.0854) grad_norm 1.0637 (1.9258) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:15:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][180/625] eta 0:02:45 lr 0.001515 wd 0.0500 time 0.2604 (0.3724) data time 0.0011 (0.0134) model time 0.2593 (0.3589) loss 5.9876 (6.0777) grad_norm 1.6323 (1.8770) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:16:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][190/625] eta 0:02:37 lr 0.001515 wd 0.0500 time 0.2595 (0.3615) data time 0.0010 (0.0124) model time 0.2585 (0.3491) loss 6.2477 (6.1085) grad_norm 1.3357 (1.8722) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:16:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][200/625] eta 0:02:29 lr 0.001515 wd 0.0500 time 0.2600 (0.3526) data time 0.0010 (0.0114) model time 0.2590 (0.3412) loss 6.5811 (6.1084) grad_norm 1.7472 (1.8589) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:16:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][210/625] eta 0:02:23 lr 0.001515 wd 0.0500 time 0.2576 (0.3451) data time 0.0011 (0.0106) model time 0.2565 (0.3345) loss 6.1344 (6.1165) grad_norm 1.3180 (1.8698) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:16:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][220/625] eta 0:02:17 lr 0.001514 wd 0.0500 time 0.2600 (0.3387) data time 0.0011 (0.0099) model time 0.2588 (0.3288) loss 7.2040 (6.1269) grad_norm 2.7449 (1.8744) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:16:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][230/625] eta 0:02:11 lr 0.001514 wd 0.0500 time 0.2725 (0.3334) data time 0.0010 (0.0093) model time 0.2715 (0.3242) loss 5.5423 (6.1048) grad_norm 1.0439 (1.8921) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:16:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][240/625] eta 0:02:06 lr 0.001514 wd 0.0500 time 0.2574 (0.3286) data time 0.0009 (0.0087) model time 0.2566 (0.3199) loss 6.1203 (6.1008) grad_norm 1.3675 (1.8849) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:16:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][250/625] eta 0:02:01 lr 0.001514 wd 0.0500 time 0.2584 (0.3244) data time 0.0009 (0.0083) model time 0.2575 (0.3162) loss 6.0691 (6.1054) grad_norm 2.5182 (1.8664) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:16:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][260/625] eta 0:01:57 lr 0.001514 wd 0.0500 time 0.2583 (0.3209) data time 0.0013 (0.0079) model time 0.2570 (0.3130) loss 5.3123 (6.1003) grad_norm 4.0495 (1.8952) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:16:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][270/625] eta 0:01:52 lr 0.001514 wd 0.0500 time 0.2637 (0.3176) data time 0.0008 (0.0075) model time 0.2630 (0.3102) loss 5.6728 (6.0921) grad_norm 1.8475 (1.9138) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:16:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][280/625] eta 0:01:48 lr 0.001513 wd 0.0500 time 0.2475 (0.3148) data time 0.0011 (0.0072) model time 0.2463 (0.3077) loss 5.5474 (6.0775) grad_norm 1.4978 (1.9024) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:16:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][290/625] eta 0:01:44 lr 0.001513 wd 0.0500 time 0.2579 (0.3123) data time 0.0013 (0.0069) model time 0.2567 (0.3054) loss 6.4758 (6.0518) grad_norm 2.4000 (1.8841) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:16:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][300/625] eta 0:01:40 lr 0.001513 wd 0.0500 time 0.2573 (0.3098) data time 0.0007 (0.0066) model time 0.2565 (0.3032) loss 7.0231 (6.0570) grad_norm 1.6681 (1.8899) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:16:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][310/625] eta 0:01:38 lr 0.001513 wd 0.0500 time 0.2785 (0.3137) data time 0.0013 (0.0064) model time 0.2772 (0.3073) loss 6.8980 (6.0624) grad_norm 1.5194 (1.8775) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:16:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][320/625] eta 0:01:35 lr 0.001513 wd 0.0500 time 0.2560 (0.3115) data time 0.0009 (0.0061) model time 0.2550 (0.3054) loss 4.8066 (6.0501) grad_norm 1.3891 (1.8819) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:16:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][330/625] eta 0:01:31 lr 0.001513 wd 0.0500 time 0.2609 (0.3094) data time 0.0008 (0.0059) model time 0.2600 (0.3035) loss 5.0542 (6.0624) grad_norm 1.4626 (1.8632) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:16:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][340/625] eta 0:01:27 lr 0.001513 wd 0.0500 time 0.2613 (0.3075) data time 0.0007 (0.0057) model time 0.2605 (0.3018) loss 5.5094 (6.0432) grad_norm 1.7498 (1.8583) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:16:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][350/625] eta 0:01:24 lr 0.001512 wd 0.0500 time 0.2746 (0.3058) data time 0.0009 (0.0056) model time 0.2737 (0.3002) loss 5.4474 (6.0313) grad_norm 1.5241 (1.8570) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:16:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][360/625] eta 0:01:20 lr 0.001512 wd 0.0500 time 0.2581 (0.3042) data time 0.0011 (0.0054) model time 0.2570 (0.2988) loss 6.1394 (6.0270) grad_norm 1.3249 (1.8554) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:16:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][370/625] eta 0:01:17 lr 0.001512 wd 0.0500 time 0.2741 (0.3027) data time 0.0008 (0.0053) model time 0.2734 (0.2975) loss 5.6963 (6.0230) grad_norm 1.5508 (1.8460) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:16:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][380/625] eta 0:01:13 lr 0.001512 wd 0.0500 time 0.2613 (0.3014) data time 0.0008 (0.0051) model time 0.2605 (0.2962) loss 5.6433 (6.0208) grad_norm 1.4515 (1.8404) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:16:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][390/625] eta 0:01:10 lr 0.001512 wd 0.0500 time 0.2601 (0.3001) data time 0.0012 (0.0050) model time 0.2589 (0.2951) loss 6.7947 (6.0177) grad_norm 1.2678 (1.8312) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:16:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][400/625] eta 0:01:07 lr 0.001512 wd 0.0500 time 0.2562 (0.2988) data time 0.0011 (0.0049) model time 0.2551 (0.2939) loss 6.7237 (6.0168) grad_norm 1.6480 (1.8321) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:16:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][410/625] eta 0:01:03 lr 0.001512 wd 0.0500 time 0.2618 (0.2977) data time 0.0008 (0.0048) model time 0.2610 (0.2929) loss 6.4734 (6.0285) grad_norm 1.3431 (1.8241) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:17:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][420/625] eta 0:01:00 lr 0.001511 wd 0.0500 time 0.2603 (0.2966) data time 0.0008 (0.0047) model time 0.2594 (0.2919) loss 5.6312 (6.0293) grad_norm 1.2614 (1.8153) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:17:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][430/625] eta 0:00:57 lr 0.001511 wd 0.0500 time 0.2559 (0.2955) data time 0.0013 (0.0046) model time 0.2546 (0.2910) loss 6.6659 (6.0322) grad_norm 2.7167 (1.8159) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:17:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][440/625] eta 0:00:54 lr 0.001511 wd 0.0500 time 0.2620 (0.2946) data time 0.0011 (0.0045) model time 0.2609 (0.2901) loss 7.0527 (6.0399) grad_norm 2.7305 (1.8194) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:17:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][450/625] eta 0:00:51 lr 0.001511 wd 0.0500 time 0.2611 (0.2937) data time 0.0008 (0.0044) model time 0.2603 (0.2894) loss 5.4160 (6.0387) grad_norm 1.4093 (1.8174) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:17:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][460/625] eta 0:00:48 lr 0.001511 wd 0.0500 time 0.2566 (0.2929) data time 0.0011 (0.0043) model time 0.2555 (0.2886) loss 7.8000 (6.0432) grad_norm 1.7970 (1.8279) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:17:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][470/625] eta 0:00:45 lr 0.001511 wd 0.0500 time 0.2594 (0.2920) data time 0.0007 (0.0042) model time 0.2587 (0.2878) loss 5.2131 (6.0273) grad_norm 1.4009 (1.8260) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:17:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][480/625] eta 0:00:42 lr 0.001510 wd 0.0500 time 0.2592 (0.2912) data time 0.0010 (0.0041) model time 0.2582 (0.2871) loss 6.3868 (6.0270) grad_norm 1.6378 (1.8199) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:17:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][490/625] eta 0:00:39 lr 0.001510 wd 0.0500 time 0.2607 (0.2905) data time 0.0008 (0.0041) model time 0.2600 (0.2865) loss 6.0858 (6.0349) grad_norm 1.5058 (1.8196) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:17:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][500/625] eta 0:00:36 lr 0.001510 wd 0.0500 time 0.2594 (0.2899) data time 0.0011 (0.0040) model time 0.2583 (0.2859) loss 5.0422 (6.0352) grad_norm 3.1407 (1.8199) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:17:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][510/625] eta 0:00:33 lr 0.001510 wd 0.0500 time 0.2593 (0.2899) data time 0.0009 (0.0039) model time 0.2584 (0.2860) loss 6.9553 (6.0313) grad_norm 3.4558 (1.8228) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:17:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][520/625] eta 0:00:30 lr 0.001510 wd 0.0500 time 0.2594 (0.2894) data time 0.0010 (0.0038) model time 0.2584 (0.2856) loss 6.8628 (6.0428) grad_norm 1.9184 (1.8217) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:17:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][530/625] eta 0:00:27 lr 0.001510 wd 0.0500 time 0.2648 (0.2888) data time 0.0010 (0.0038) model time 0.2638 (0.2851) loss 6.5776 (6.0431) grad_norm 1.1787 (1.8166) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:17:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][540/625] eta 0:00:24 lr 0.001510 wd 0.0500 time 0.2683 (0.2883) data time 0.0008 (0.0037) model time 0.2675 (0.2845) loss 6.7870 (6.0376) grad_norm 1.5408 (1.8113) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:17:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][550/625] eta 0:00:21 lr 0.001509 wd 0.0500 time 0.2573 (0.2877) data time 0.0010 (0.0037) model time 0.2563 (0.2840) loss 7.4418 (6.0285) grad_norm 1.6083 (1.8115) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:17:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][560/625] eta 0:00:18 lr 0.001509 wd 0.0500 time 0.2587 (0.2871) data time 0.0011 (0.0036) model time 0.2577 (0.2835) loss 6.0224 (6.0190) grad_norm 2.0880 (1.8108) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:17:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][570/625] eta 0:00:15 lr 0.001509 wd 0.0500 time 0.2657 (0.2867) data time 0.0010 (0.0036) model time 0.2646 (0.2831) loss 6.6201 (6.0188) grad_norm 1.2260 (1.8272) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:17:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][580/625] eta 0:00:12 lr 0.001509 wd 0.0500 time 0.2654 (0.2862) data time 0.0010 (0.0035) model time 0.2644 (0.2827) loss 5.7244 (6.0208) grad_norm 1.5844 (1.8262) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:17:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][590/625] eta 0:00:10 lr 0.001509 wd 0.0500 time 0.2614 (0.2857) data time 0.0009 (0.0035) model time 0.2605 (0.2822) loss 5.7414 (6.0167) grad_norm 1.3855 (1.8270) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:17:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][600/625] eta 0:00:07 lr 0.001509 wd 0.0500 time 0.2565 (0.2852) data time 0.0012 (0.0034) model time 0.2554 (0.2818) loss 6.0241 (6.0257) grad_norm 3.7957 (1.8298) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:17:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][610/625] eta 0:00:04 lr 0.001508 wd 0.0500 time 0.2582 (0.2848) data time 0.0005 (0.0034) model time 0.2577 (0.2815) loss 5.2838 (6.0152) grad_norm 2.4392 (1.8353) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:17:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][620/625] eta 0:00:01 lr 0.001508 wd 0.0500 time 0.2610 (0.2844) data time 0.0005 (0.0033) model time 0.2606 (0.2810) loss 5.6822 (6.0165) grad_norm 3.0410 (1.8488) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:17:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 112 training takes 0:02:32
[2024-07-30 16:17:56 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 16:17:58 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 16:17:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.785 (0.785) Loss 0.6704 (0.6704) Acc@1 87.500 (87.500) Acc@5 97.900 (97.900) Mem 9656MB
[2024-07-30 16:18:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.128) Loss 1.0791 (0.8141) Acc@1 76.611 (83.043) Acc@5 93.555 (96.644) Mem 9656MB
[2024-07-30 16:18:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.094) Loss 1.2188 (0.9687) Acc@1 71.973 (79.157) Acc@5 92.188 (94.840) Mem 9656MB
[2024-07-30 16:18:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.871 Acc@5 94.824
[2024-07-30 16:18:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.9%
[2024-07-30 16:18:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.87%
[2024-07-30 16:18:02 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-30 16:18:05 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-30 16:18:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.746 (0.746) Loss 0.5728 (0.5728) Acc@1 88.232 (88.232) Acc@5 98.438 (98.438) Mem 9656MB
[2024-07-30 16:18:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.127) Loss 0.9438 (0.7202) Acc@1 77.930 (84.504) Acc@5 94.971 (97.195) Mem 9656MB
[2024-07-30 16:18:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 1.0918 (0.8617) Acc@1 73.193 (80.718) Acc@5 93.164 (95.522) Mem 9656MB
[2024-07-30 16:18:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.390 Acc@5 95.511
[2024-07-30 16:18:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.4%
[2024-07-30 16:18:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.39%
[2024-07-30 16:18:07 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-30 16:18:10 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
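Each epoch is evaluated twice: once for the raw model (78.87% top-1 above) and once for its EMA copy (80.39%). The run arguments at the start of this log set `model_ema_decay` to 0.9999. Assuming the EMA is updated once per iteration as `ema = decay * ema + (1 - decay) * param` (the form used by timm's `ModelEma`), the short calculation below estimates how long an averaging window that decay implies at 625 iterations per epoch. The helper `ema_update` and the scalar stand-in for a weight are hypothetical illustrations, not code from this run.

```python
DECAY = 0.9999          # model_ema_decay from the run arguments
STEPS_PER_EPOCH = 625   # iterations per epoch in this log


def ema_update(ema: float, value: float, decay: float = DECAY) -> float:
    """One exponential-moving-average step on a scalar stand-in for a weight."""
    return decay * ema + (1.0 - decay) * value


# Nominal averaging horizon, in iterations and in epochs.
horizon_steps = 1.0 / (1.0 - DECAY)
horizon_epochs = horizon_steps / STEPS_PER_EPOCH

# Share of the EMA still contributed by weights from one epoch earlier.
carry_over_one_epoch = DECAY ** STEPS_PER_EPOCH

# A weight that jumps from 0 to 1 and stays there for one epoch.
ema = 0.0
for _ in range(STEPS_PER_EPOCH):
    ema = ema_update(ema, 1.0)

print(f"horizon ~ {horizon_steps:.0f} iterations ~ {horizon_epochs:.1f} epochs")
print(f"weight retained after one epoch: {carry_over_one_epoch:.3f}")
print(f"EMA after one epoch of a 0 -> 1 step: {ema:.3f}")
```

This is only a reading aid for the EMA lines; by itself it does not explain the gap between the two accuracies.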
[2024-07-30 16:18:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][0/625] eta 0:09:35 lr 0.001508 wd 0.0500 time 0.9210 (0.9210) data time 0.5769 (0.5769) model time 0.0000 (0.0000) loss 6.6613 (6.6613) grad_norm 1.6525 (1.6525) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-30 16:18:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][10/625] eta 0:03:17 lr 0.001508 wd 0.0500 time 0.2562 (0.3205) data time 0.0010 (0.0535) model time 0.0000 (0.0000) loss 7.4445 (6.2541) grad_norm 2.6681 (2.2705) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:18:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][20/625] eta 0:02:56 lr 0.001508 wd 0.0500 time 0.2583 (0.2917) data time 0.0007 (0.0286) model time 0.0000 (0.0000) loss 7.4622 (6.1767) grad_norm 1.5474 (1.8574) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:18:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][30/625] eta 0:03:53 lr 0.001508 wd 0.0500 time 0.2510 (0.3930) data time 0.0010 (0.0197) model time 0.0000 (0.0000) loss 5.0179 (6.2101) grad_norm 1.7677 (1.8045) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:18:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][40/625] eta 0:03:31 lr 0.001508 wd 0.0500 time 0.2592 (0.3613) data time 0.0008 (0.0152) model time 0.0000 (0.0000) loss 7.1073 (6.1914) grad_norm 1.7564 (1.6883) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:18:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][50/625] eta 0:03:16 lr 0.001507 wd 0.0500 time 0.2638 (0.3419) data time 0.0011 (0.0124) model time 0.0000 (0.0000) loss 5.1009 (6.1826) grad_norm 2.2182 (1.7580) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:18:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][60/625] eta 0:03:05 lr 0.001507 wd 0.0500 time 0.2625 (0.3288) data time 0.0010 (0.0106) model time 0.2614 (0.2612) loss 5.7577 (6.1619) grad_norm 1.6654 (1.8430) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:18:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][70/625] eta 0:02:57 lr 0.001507 wd 0.0500 time 0.2617 (0.3199) data time 0.0008 (0.0093) model time 0.2609 (0.2628) loss 7.3433 (6.1329) grad_norm 2.3675 (1.8412) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:18:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][80/625] eta 0:02:52 lr 0.001507 wd 0.0500 time 0.2620 (0.3164) data time 0.0009 (0.0082) model time 0.2611 (0.2719) loss 6.7272 (6.1365) grad_norm 2.0186 (1.8633) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:18:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][90/625] eta 0:02:46 lr 0.001507 wd 0.0500 time 0.2633 (0.3109) data time 0.0010 (0.0075) model time 0.2623 (0.2702) loss 5.2053 (6.1325) grad_norm 1.8258 (1.8538) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:18:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][100/625] eta 0:02:40 lr 0.001507 wd 0.0500 time 0.2637 (0.3062) data time 0.0009 (0.0069) model time 0.2628 (0.2687) loss 6.5258 (6.1598) grad_norm 2.3505 (1.8754) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:18:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][110/625] eta 0:02:35 lr 0.001507 wd 0.0500 time 0.2687 (0.3022) data time 0.0008 (0.0063) model time 0.2679 (0.2674) loss 5.6399 (6.1074) grad_norm 1.4657 (1.8839) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:18:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][120/625] eta 0:02:33 lr 0.001506 wd 0.0500 time 0.2625 (0.3045) data time 0.0011 (0.0059) model time 0.2614 (0.2761) loss 5.5209 (6.0830) grad_norm 1.1725 (1.8570) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:18:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][130/625] eta 0:02:29 lr 0.001506 wd 0.0500 time 0.2634 (0.3021) data time 0.0008 (0.0056) model time 0.2626 (0.2756) loss 6.0628 (6.0799) grad_norm 2.0227 (1.8376) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:18:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][140/625] eta 0:02:25 lr 0.001506 wd 0.0500 time 0.2593 (0.2995) data time 0.0008 (0.0052) model time 0.2585 (0.2743) loss 5.3255 (6.0870) grad_norm 1.9571 (1.8273) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:18:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][150/625] eta 0:02:21 lr 0.001506 wd 0.0500 time 0.3033 (0.2973) data time 0.0010 (0.0050) model time 0.3023 (0.2734) loss 4.8011 (6.0623) grad_norm 2.2095 (1.8426) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:18:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][160/625] eta 0:02:19 lr 0.001506 wd 0.0500 time 0.2978 (0.2999) data time 0.0013 (0.0048) model time 0.2965 (0.2792) loss 4.7308 (6.0330) grad_norm 2.8714 (1.8373) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:19:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][170/625] eta 0:02:15 lr 0.001506 wd 0.0500 time 0.2716 (0.2981) data time 0.0012 (0.0046) model time 0.2705 (0.2782) loss 4.7811 (6.0257) grad_norm 2.0454 (1.8369) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-30 16:19:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-30 16:19:03 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 16:19:04 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 16:23:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 16:23:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 16:24:11 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 16:24:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 16:24:24 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 16:24:24 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 16:24:24 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 16:24:24 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 113)
[2024-07-30 16:24:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 16:33:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 16:33:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 16:34:04 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 16:34:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 16:34:23 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 16:34:23 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 16:34:23 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 16:34:23 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 113)
[2024-07-30 16:34:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 16:34:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][180/625] eta 0:43:38 lr 0.001505 wd 0.0500 time 0.8508 (5.8848) data time 0.0009 (0.8556) model time 0.8499 (5.0291) loss 6.9453 (7.1988) grad_norm 1.1153 (1.2718) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:34:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][190/625] eta 0:08:42 lr 0.001505 wd 0.0500 time 0.2639 (1.2006) data time 0.0007 (0.1436) model time 0.2632 (1.0570) loss 5.2168 (6.4056) grad_norm 1.4897 (1.4630) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:34:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][200/625] eta 0:05:30 lr 0.001505 wd 0.0500 time 0.2620 (0.7786) data time 0.0011 (0.0792) model time 0.2609 (0.6995) loss 6.6700 (6.3367) grad_norm 1.8804 (1.5447) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:34:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][210/625] eta 0:04:16 lr 0.001505 wd 0.0500 time 0.2636 (0.6177) data time 0.0008 (0.0548) model time 0.2628 (0.5628) loss 6.2237 (6.3764) grad_norm 3.7943 (1.7098) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:34:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][220/625] eta 0:04:17 lr 0.001505 wd 0.0500 time 0.2585 (0.6348) data time 0.0012 (0.0421) model time 0.2573 (0.5927) loss 6.6391 (6.2569) grad_norm 2.1730 (1.7523) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:34:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][230/625] eta 0:03:42 lr 0.001505 wd 0.0500 time 0.2567 (0.5641) data time 0.0009 (0.0344) model time 0.2558 (0.5297) loss 6.0294 (6.2100) grad_norm 1.7986 (1.7494) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:35:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][240/625] eta 0:03:18 lr 0.001505 wd 0.0500 time 0.2689 (0.5159) data time 0.0008 (0.0291) model time 0.2681 (0.4869) loss 6.7557 (6.1596) grad_norm 1.2290 (1.7093) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:35:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][250/625] eta 0:03:00 lr 0.001504 wd 0.0500 time 0.2539 (0.4814) data time 0.0012 (0.0253) model time 0.2527 (0.4561) loss 5.9528 (6.1148) grad_norm 2.1173 (1.7824) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:35:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][260/625] eta 0:02:46 lr 0.001504 wd 0.0500 time 0.2609 (0.4554) data time 0.0011 (0.0228) model time 0.2598 (0.4326) loss 5.3481 (6.1179) grad_norm 2.2260 (1.8172) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:35:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][270/625] eta 0:02:34 lr 0.001504 wd 0.0500 time 0.2622 (0.4348) data time 0.0009 (0.0206) model time 0.2613 (0.4142) loss 5.1711 (6.0985) grad_norm 1.9283 (1.8268) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:35:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][280/625] eta 0:02:24 lr 0.001504 wd 0.0500 time 0.2580 (0.4182) data time 0.0008 (0.0188) model time 0.2572 (0.3994) loss 6.0181 (6.1179) grad_norm 1.4105 (1.8138) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:35:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][290/625] eta 0:02:15 lr 0.001504 wd 0.0500 time 0.2579 (0.4043) data time 0.0010 (0.0172) model time 0.2568 (0.3871) loss 6.7718 (6.1149) grad_norm 1.5620 (1.7953) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:35:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][300/625] eta 0:02:07 lr 0.001504 wd 0.0500 time 0.2603 (0.3930) data time 0.0008 (0.0160) model time 0.2595 (0.3770) loss 6.3897 (6.1084) grad_norm 1.6080 (1.8069) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:35:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][310/625] eta 0:02:00 lr 0.001503 wd 0.0500 time 0.2668 (0.3839) data time 0.0012 (0.0149) model time 0.2656 (0.3690) loss 6.5526 (6.1066) grad_norm 1.9987 (1.8166) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:35:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][320/625] eta 0:01:54 lr 0.001503 wd 0.0500 time 0.2657 (0.3761) data time 0.0010 (0.0139) model time 0.2647 (0.3622) loss 5.8645 (6.1035) grad_norm 2.1901 (1.8042) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:35:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][330/625] eta 0:01:48 lr 0.001503 wd 0.0500 time 0.2601 (0.3688) data time 0.0010 (0.0131) model time 0.2591 (0.3557) loss 7.2991 (6.0870) grad_norm 1.6071 (1.8099) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:35:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][340/625] eta 0:01:43 lr 0.001503 wd 0.0500 time 0.2671 (0.3626) data time 0.0009 (0.0124) model time 0.2662 (0.3502) loss 6.3148 (6.0916) grad_norm 2.9223 (1.8321) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:35:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][350/625] eta 0:01:38 lr 0.001503 wd 0.0500 time 0.2590 (0.3569) data time 0.0010 (0.0118) model time 0.2580 (0.3452) loss 5.4867 (6.0801) grad_norm 1.8043 (1.8309) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:35:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][360/625] eta 0:01:33 lr 0.001503 wd 0.0500 time 0.2632 (0.3520) data time 0.0012 (0.0112) model time 0.2620 (0.3408) loss 6.9729 (6.0786) grad_norm 4.2287 (1.8507) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:35:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][370/625] eta 0:01:28 lr 0.001503 wd 0.0500 time 0.2764 (0.3475) data time 0.0010 (0.0107) model time 0.2754 (0.3368) loss 6.6006 (6.0775) grad_norm 1.6809 (1.8385) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:35:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][380/625] eta 0:01:24 lr 0.001502 wd 0.0500 time 0.2628 (0.3436) data time 0.0010 (0.0103) model time 0.2617 (0.3333) loss 6.5871 (6.0571) grad_norm 1.8197 (1.8416) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:35:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][390/625] eta 0:01:19 lr 0.001502 wd 0.0500 time 0.2586 (0.3401) data time 0.0011 (0.0099) model time 0.2574 (0.3302) loss 6.3437 (6.0408) grad_norm 1.6700 (1.8395) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:35:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][400/625] eta 0:01:15 lr 0.001502 wd 0.0500 time 0.2669 (0.3367) data time 0.0008 (0.0095) model time 0.2661 (0.3272) loss 6.8285 (6.0391) grad_norm 1.8910 (1.8314) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:35:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][410/625] eta 0:01:11 lr 0.001502 wd 0.0500 time 0.2598 (0.3338) data time 0.0008 (0.0092) model time 0.2589 (0.3246) loss 5.9765 (6.0270) grad_norm 1.4998 (1.8281) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:35:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][420/625] eta 0:01:07 lr 0.001502 wd 0.0500 time 0.2628 (0.3309) data time 0.0015 (0.0089) model time 0.2613 (0.3221) loss 6.9284 (6.0263) grad_norm 2.3799 (1.8287) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:35:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][430/625] eta 0:01:04 lr 0.001502 wd 0.0500 time 0.2628 (0.3287) data time 0.0008 (0.0086) model time 0.2620 (0.3201) loss 7.0518 (6.0187) grad_norm 1.8854 (1.8306) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:35:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][440/625] eta 0:01:02 lr 0.001501 wd 0.0500 time 0.2596 (0.3365) data time 0.0012 (0.0083) model time 0.2584 (0.3282) loss 6.9757 (6.0055) grad_norm 1.8201 (1.8483) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:36:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][450/625] eta 0:00:58 lr 0.001501 wd 0.0500 time 0.2595 (0.3339) data time 0.0010 (0.0081) model time 0.2586 (0.3258) loss 5.4868 (5.9919) grad_norm 1.2750 (1.8383) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:36:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][460/625] eta 0:00:56 lr 0.001501 wd 0.0500 time 0.2617 (0.3426) data time 0.0012 (0.0078) model time 0.2605 (0.3348) loss 5.3402 (5.9930) grad_norm 1.6697 (1.8300) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:36:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][470/625] eta 0:00:52 lr 0.001501 wd 0.0500 time 0.2599 (0.3400) data time 0.0011 (0.0076) model time 0.2588 (0.3323) loss 5.7923 (5.9943) grad_norm 2.0630 (1.8206) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-30 16:36:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][480/625] eta 0:00:48 lr 0.001501 wd 0.0500 time 0.2632 (0.3375) data time 0.0010 (0.0074) model time 0.2623 (0.3301) loss 6.3652 (5.9788) grad_norm 2.6865 (1.8168) loss_scale 4096.0000 (2054.7815) mem 9656MB
[2024-07-30 16:36:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][490/625] eta 0:00:45 lr 0.001501 wd 0.0500 time 0.2600 (0.3352) data time 0.0011 (0.0073) model time 0.2589 (0.3279) loss 6.1405 (5.9795) grad_norm 1.5878 (1.8114) loss_scale 4096.0000 (2120.2051) mem 9656MB
[2024-07-30 16:36:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][500/625] eta 0:00:41 lr 0.001501 wd 0.0500 time 0.2598 (0.3330) data time 0.0010 (0.0071) model time 0.2588 (0.3260) loss 6.4100 (5.9901) grad_norm 1.7990 (1.8113) loss_scale 4096.0000 (2181.5652) mem 9656MB
[2024-07-30 16:36:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][510/625] eta 0:00:38 lr 0.001500 wd 0.0500 time 0.2635 (0.3310) data time 0.0010 (0.0069) model time 0.2624 (0.3241) loss 6.7909 (5.9943) grad_norm 1.6006 (1.8011) loss_scale 4096.0000 (2239.2289) mem 9656MB
[2024-07-30 16:36:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][520/625] eta 0:00:34 lr 0.001500 wd 0.0500 time 0.2692 (0.3291) data time 0.0009 (0.0067) model time 0.2683 (0.3224) loss 6.2559 (5.9977) grad_norm 2.7678 (1.8015) loss_scale 4096.0000 (2293.5205) mem 9656MB
[2024-07-30 16:36:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-30 16:36:23 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 16:36:34 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 16:45:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 16:45:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 16:45:48 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 16:48:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 16:48:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 16:49:10 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 16:49:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 16:49:18 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 16:49:18 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-30 16:49:18 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-30 16:49:18 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 113)
[2024-07-30 16:49:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 16:49:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][530/625] eta 0:04:45 lr 0.001500 wd 0.0500 time 0.2533 (3.0089) data time 0.0007 (0.2461) model time 0.2526 (2.7628) loss 5.4833 (6.4776) grad_norm 1.5064 (1.5404) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:49:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][540/625] eta 0:01:15 lr 0.001500 wd 0.0500 time 0.2510 (0.8892) data time 0.0008 (0.0579) model time 0.2502 (0.8313) loss 6.1543 (6.5086) grad_norm 4.9783 (2.3281) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:49:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][550/625] eta 0:00:45 lr 0.001500 wd 0.0500 time 0.2573 (0.6132) data time 0.0006 (0.0334) model time 0.2567 (0.5798) loss 6.3056 (6.4184) grad_norm 1.2370 (2.2337) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:49:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][560/625] eta 0:00:32 lr 0.001500 wd 0.0500 time 0.2497 (0.5042) data time 0.0007 (0.0236) model time 0.2490 (0.4806) loss 7.1219 (6.4143) grad_norm 1.9452 (2.1400) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:49:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][570/625] eta 0:00:24 lr 0.001499 wd 0.0500 time 0.2618 (0.4461) data time 0.0008 (0.0184) model time 0.2610 (0.4277) loss 6.2393 (6.2730) grad_norm 1.2061 (2.0386) loss_scale 4096.0000 (4096.0000) mem 9656MB
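The `eta` field jumps after every restart (0:43:38 at [113/300][180/625], 0:04:45 at [113/300][530/625], 1:12:27 at [114/300][100/625]) and then settles. These values are consistent with `eta = average iteration time × remaining iterations`, where the average is the parenthesised `time` value, the average starts over on each restart, and the first iteration after a restart absorbs the startup cost. The sketch below assumes that formula and truncation to whole seconds; `eta_string` is a hypothetical helper, not a function from this codebase.

```python
import datetime


def eta_string(avg_batch_time: float, idx: int, steps_per_epoch: int = 625) -> str:
    """Remaining time in the epoch, truncated to whole seconds (assumed formula)."""
    remaining = avg_batch_time * (steps_per_epoch - idx)
    return str(datetime.timedelta(seconds=int(remaining)))


# (average iteration time, iteration index) pairs copied from Train lines in this log.
samples = {
    "[112/300][520/625]": (0.2894, 520),   # steady state
    "[113/300][180/625]": (5.8848, 180),   # first line after a restart
    "[113/300][530/625]": (3.0089, 530),   # first line after a restart
    "[114/300][100/625]": (8.2817, 100),   # first line after a restart
}
for tag, (avg, idx) in samples.items():
    print(tag, "eta", eta_string(avg, idx))
```

Because the running average starts over and the first iteration after a resume includes startup latency, the printed ETA is inflated right after each restart and shrinks as faster iterations accumulate.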
[2024-07-30 16:49:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][580/625] eta 0:00:18 lr 0.001499 wd 0.0500 time 0.2525 (0.4099) data time 0.0008 (0.0152) model time 0.2516 (0.3947) loss 6.1864 (6.2523) grad_norm 1.5394 (1.9124) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:49:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][590/625] eta 0:00:13 lr 0.001499 wd 0.0500 time 0.2581 (0.3854) data time 0.0008 (0.0129) model time 0.2573 (0.3725) loss 5.7706 (6.1776) grad_norm 2.1448 (1.8839) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:49:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][600/625] eta 0:00:09 lr 0.001499 wd 0.0500 time 0.2511 (0.3675) data time 0.0011 (0.0113) model time 0.2500 (0.3561) loss 6.6768 (6.1379) grad_norm 1.6209 (1.9435) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:49:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][610/625] eta 0:00:05 lr 0.001499 wd 0.0500 time 0.2569 (0.3543) data time 0.0004 (0.0102) model time 0.2565 (0.3442) loss 4.7582 (6.0966) grad_norm 1.8714 (1.9074) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:49:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][620/625] eta 0:00:01 lr 0.001499 wd 0.0500 time 0.2505 (0.3434) data time 0.0004 (0.0092) model time 0.2502 (0.3342) loss 6.8168 (6.0733) grad_norm 1.4920 (1.8910) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:49:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 113 training takes 0:00:32
[2024-07-30 16:49:55 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 16:50:03 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
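Around [113/300][480/625] the `loss_scale` value doubles from 2048 to 4096. This is the usual behaviour of dynamic loss scaling in mixed-precision training: PyTorch's `torch.cuda.amp.GradScaler`, for example, multiplies the scale by `growth_factor` (default 2.0) after `growth_interval` (default 2000) consecutive steps without inf/NaN gradients. Which scaler this run uses is not shown in the log. The parenthesised averages (2054.7815, 2120.2051, 2181.5652, 2239.2289, 2293.5205) can be reproduced by assuming a running mean that starts over at the 16:34 restart and holds 301 updates at 2048 before the first update at 4096. That count is inferred from the numbers, not read from the code, and the `AverageMeter` class below is a common logging pattern used here as an assumption.

```python
class AverageMeter:
    """Running mean of the values seen since the last reset (assumed logger semantics)."""

    def __init__(self) -> None:
        self.sum = 0.0
        self.count = 0

    def update(self, value: float, n: int = 1) -> None:
        self.sum += value * n
        self.count += n

    @property
    def avg(self) -> float:
        return self.sum / self.count


meter = AverageMeter()          # assumed to start over when training resumed at 16:34
meter.update(2048.0, n=301)     # inferred: 301 updates before the scale doubles
averages = {}
for idx in range(480, 521):     # one update per iteration from [480/625] onward
    meter.update(4096.0)
    if idx % 10 == 0:           # the log prints every 10 iterations
        averages[idx] = round(meter.avg, 4)
print(averages)
```

After the 16:49 restart the average reads 4096.0000 from the first line onward, which fits the same picture: the running mean starts over and the scale stays at 4096.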
[2024-07-30 16:50:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.401 (0.401) Loss 0.6587 (0.6587) Acc@1 87.061 (87.061) Acc@5 97.803 (97.803) Mem 9656MB
[2024-07-30 16:50:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.089) Loss 1.0723 (0.8117) Acc@1 76.465 (82.937) Acc@5 93.994 (96.675) Mem 9656MB
[2024-07-30 16:50:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.073) Loss 1.2129 (0.9699) Acc@1 71.826 (79.111) Acc@5 92.236 (94.806) Mem 9656MB
[2024-07-30 16:50:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.969 Acc@5 94.788
[2024-07-30 16:50:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.0%
[2024-07-30 16:50:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.97%
[2024-07-30 16:50:06 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-30 16:50:14 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-30 16:50:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.722 (0.722) Loss 0.5728 (0.5728) Acc@1 88.281 (88.281) Acc@5 98.438 (98.438) Mem 9656MB
[2024-07-30 16:50:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.124) Loss 0.9438 (0.7203) Acc@1 77.783 (84.495) Acc@5 95.020 (97.186) Mem 9656MB
[2024-07-30 16:50:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.091) Loss 1.0918 (0.8617) Acc@1 73.145 (80.711) Acc@5 93.213 (95.533) Mem 9656MB
[2024-07-30 16:50:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.378 Acc@5 95.525
[2024-07-30 16:50:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.4%
[2024-07-30 16:50:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.38%
[2024-07-30 16:50:16 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-30 16:50:23 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-30 16:50:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][0/625] eta 0:07:33 lr 0.001499 wd 0.0500 time 0.7258 (0.7258) data time 0.4215 (0.4215) model time 0.0000 (0.0000) loss 6.6265 (6.6265) grad_norm 1.1801 (1.1801) loss_scale 4096.0000 (4096.0000) mem 9651MB
[2024-07-30 16:50:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][10/625] eta 0:03:03 lr 0.001498 wd 0.0500 time 0.2586 (0.2986) data time 0.0007 (0.0397) model time 0.0000 (0.0000) loss 5.0035 (6.3521) grad_norm 1.2074 (1.7662) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 16:50:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][20/625] eta 0:02:48 lr 0.001498 wd 0.0500 time 0.2563 (0.2784) data time 0.0010 (0.0216) model time 0.0000 (0.0000) loss 6.0792 (6.3234) grad_norm 1.2421 (1.6427) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 16:50:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][30/625] eta 0:02:41 lr 0.001498 wd 0.0500 time 0.2617 (0.2715) data time 0.0006 (0.0151) model time 0.0000 (0.0000) loss 6.1752 (6.1945) grad_norm 2.3504 (1.6286) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 16:50:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][40/625] eta 0:02:36 lr 0.001498 wd 0.0500 time 0.2481 (0.2677) data time 0.0009 (0.0118) model time 0.0000 (0.0000) loss 5.3036 (6.1647) grad_norm 1.9431 (1.6706) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 16:50:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][50/625] eta 0:02:32 lr 0.001498 wd 0.0500 time 0.2594 (0.2659) data time 0.0006 (0.0098) model time 0.0000 (0.0000) loss 6.3333 (6.1521) grad_norm 1.7286 (1.7112) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 16:50:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][60/625] eta 0:02:29 lr 0.001498 wd 0.0500 time 0.2558 (0.2646) data time 0.0006 (0.0084) model time 0.2552 (0.2564) loss 5.6240 (6.1479) grad_norm 1.6560 (1.6991) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 16:50:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][70/625] eta 0:02:26 lr 0.001497 wd 0.0500 time 0.2599 (0.2635) data time 0.0010 (0.0074) model time 0.2589 (0.2562) loss 7.1021 (6.2030) grad_norm 1.3479 (1.6968) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 16:50:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][80/625] eta 0:02:23 lr 0.001497 wd 0.0500 time 0.2515 (0.2627) data time 0.0009 (0.0066) model time 0.2506 (0.2560) loss 4.8983 (6.1506) grad_norm 1.5941 (1.7389) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 16:50:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][90/625] eta 0:02:20 lr 0.001497 wd 0.0500 time 0.2607 (0.2620) data time 0.0007 (0.0060) model time 0.2600 (0.2559) loss 6.7591 (6.1232) grad_norm 2.3291 (1.7802) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-30 16:50:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-30 16:50:50 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 16:50:56 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-30 16:58:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-30 16:58:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-30 16:58:14 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-30 16:58:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-30 16:58:26 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-30 16:58:26 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: 
[2024-07-30 16:58:26 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: 
[2024-07-30 16:58:26 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 114)
[2024-07-30 16:58:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-30 16:58:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][100/625] eta 1:12:27 lr 0.001497 wd 0.0500 time 8.2817 (8.2817) data time 0.8726 (0.8726) model time 7.4091 (7.4091) loss 6.6097 (6.6097) grad_norm 1.3908 (1.3908) loss_scale 4096.0000 (4096.0000) mem 10976MB
[2024-07-30 16:58:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][110/625] eta 0:09:08 lr 0.001497 wd 0.0500 time 0.2638 (1.0659) data time 0.0009 (0.0805) model time 0.2629 (0.9854) loss 5.1688 (6.3762) grad_norm 2.8397 (2.2648) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:58:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][120/625] eta 0:05:45 lr 0.001497 wd 0.0500 time 0.2608 (0.6840) data time 0.0010 (0.0428) model time 0.2598 (0.6412) loss 5.9958 (6.2783) grad_norm 1.1426 (2.0087) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:58:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][130/625] eta 0:04:31 lr 0.001497 wd 0.0500 time 0.2677 (0.5492) data time 0.0008 (0.0295) model time 0.2669 (0.5197) loss 4.9259 (6.3241) grad_norm 1.4565 (1.8983) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:58:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][140/625] eta 0:03:52 lr 0.001496 wd 0.0500 time 0.2636 (0.4796) data time 0.0010 (0.0227) model time 0.2625 (0.4569) loss 5.1825 (6.2093) grad_norm 3.0292 (1.9236) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:58:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][150/625] eta 0:03:27 lr 0.001496 wd 0.0500 time 0.2672 (0.4376) data time 0.0007 (0.0185) model time 0.2665 (0.4191) loss 6.7407 (6.1818) grad_norm 2.1214 (1.9617) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:58:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][160/625] eta 0:03:10 lr 0.001496 wd 0.0500 time 0.2641 (0.4092) data time 0.0010 (0.0157) model time 0.2631 (0.3935) loss 5.3803 (6.1262) grad_norm 2.5390 (1.9886) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:58:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][170/625] eta 0:02:56 lr 0.001496 wd 0.0500 time 0.2621 (0.3890) data time 0.0011 (0.0137) model time 0.2611 (0.3753) loss 5.8105 (6.0960) grad_norm 1.3743 (2.0018) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:59:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][180/625] eta 0:02:46 lr 0.001496 wd 0.0500 time 0.2645 (0.3736) data time 0.0011 (0.0122) model time 0.2634 (0.3615) loss 5.1058 (6.0690) grad_norm 1.8887 (2.0138) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:59:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][190/625] eta 0:02:37 lr 0.001496 wd 0.0500 time 0.2743 (0.3619) data time 0.0010 (0.0110) model time 0.2733 (0.3509) loss 6.8070 (6.0595) grad_norm 1.3256 (1.9650) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:59:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][200/625] eta 0:02:29 lr 0.001495 wd 0.0500 time 0.2620 (0.3526) data time 0.0008 (0.0100) model time 0.2613 (0.3425) loss 6.5830 (6.0862) grad_norm 1.7098 (1.9627) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:59:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][210/625] eta 0:02:23 lr 0.001495 wd 0.0500 time 0.2606 (0.3447) data time 0.0011 (0.0093) model time 0.2595 (0.3354) loss 5.2620 (6.0808) grad_norm 1.7576 (1.9511) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:59:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][220/625] eta 0:02:16 lr 0.001495 wd 0.0500 time 0.2612 (0.3381) data time 0.0009 (0.0087) model time 0.2603 (0.3294) loss 5.2000 (6.0965) grad_norm 2.4244 (1.9387) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:59:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][230/625] eta 0:02:11 lr 0.001495 wd 0.0500 time 0.2675 (0.3326) data time 0.0010 (0.0081) model time 0.2665 (0.3245) loss 6.3576 (6.0887) grad_norm 1.8732 (1.9308) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:59:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][240/625] eta 0:02:06 lr 0.001495 wd 0.0500 time 0.2635 (0.3279) data time 0.0009 (0.0076) model time 0.2626 (0.3203) loss 6.5372 (6.0748) grad_norm 1.2288 (1.9062) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:59:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][250/625] eta 0:02:01 lr 0.001495 wd 0.0500 time 0.2592 (0.3239) data time 0.0012 (0.0072) model time 0.2580 (0.3167) loss 5.1014 (6.0626) grad_norm 1.5959 (1.8942) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:59:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][260/625] eta 0:01:56 lr 0.001495 wd 0.0500 time 0.2607 (0.3203) data time 0.0019 (0.0069) model time 0.2588 (0.3135) loss 6.9348 (6.0776) grad_norm 1.2568 (1.8970) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:59:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][270/625] eta 0:01:52 lr 0.001494 wd 0.0500 time 0.2641 (0.3172) data time 0.0012 (0.0066) model time 0.2629 (0.3106) loss 5.8653 (6.0665) grad_norm 1.8317 (1.9018) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-30 16:59:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-30 16:59:26 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-30 16:59:33 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-31 08:15:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-31 08:15:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-31 08:15:58 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-31 08:16:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-31 08:16:09 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-31 08:16:09 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: 
[2024-07-31 08:16:09 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: 
[2024-07-31 08:16:09 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 114)
[2024-07-31 08:16:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-31 08:16:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][280/625] eta 0:09:48 lr 0.001494 wd 0.0500 time 0.2643 (1.7070) data time 0.0008 (0.1057) model time 0.2635 (1.6013) loss 6.9695 (6.7347) grad_norm 2.6340 (1.9591) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:16:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][290/625] eta 0:04:29 lr 0.001494 wd 0.0500 time 0.2573 (0.8045) data time 0.0011 (0.0403) model time 0.2562 (0.7642) loss 6.3378 (6.4689) grad_norm 1.1205 (2.0290) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:16:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][300/625] eta 0:03:13 lr 0.001494 wd 0.0500 time 0.2563 (0.5948) data time 0.0008 (0.0252) model time 0.2554 (0.5696) loss 5.7303 (6.3473) grad_norm 2.1060 (2.0648) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:16:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-31 08:16:30 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-31 08:16:32 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-31 08:24:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-31 08:24:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-31 08:24:19 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-31 08:24:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-31 08:24:34 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-31 08:24:34 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: 
[2024-07-31 08:24:34 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: 
[2024-07-31 08:24:34 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 114)
[2024-07-31 08:24:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-31 08:24:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][310/625] eta 0:10:03 lr 0.001494 wd 0.0500 time 0.2579 (1.9166) data time 0.0008 (0.1775) model time 0.2571 (1.7392) loss 6.9633 (6.4967) grad_norm 1.7028 (1.8386) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:24:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][320/625] eta 0:04:08 lr 0.001494 wd 0.0500 time 0.2761 (0.8142) data time 0.0009 (0.0598) model time 0.2752 (0.7544) loss 6.7908 (6.3333) grad_norm 1.7217 (2.1093) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:24:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][330/625] eta 0:02:54 lr 0.001493 wd 0.0500 time 0.2601 (0.5927) data time 0.0009 (0.0363) model time 0.2591 (0.5564) loss 6.8035 (6.3579) grad_norm 2.4820 (2.0445) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:24:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][340/625] eta 0:02:21 lr 0.001493 wd 0.0500 time 0.2587 (0.4977) data time 0.0011 (0.0262) model time 0.2577 (0.4715) loss 6.5653 (6.3382) grad_norm 1.4333 (1.9990) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:24:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][350/625] eta 0:02:02 lr 0.001493 wd 0.0500 time 0.2621 (0.4454) data time 0.0009 (0.0206) model time 0.2612 (0.4248) loss 6.1270 (6.2619) grad_norm 1.3061 (2.0598) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:25:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][360/625] eta 0:01:49 lr 0.001493 wd 0.0500 time 0.2690 (0.4130) data time 0.0009 (0.0170) model time 0.2681 (0.3960) loss 4.0927 (6.1927) grad_norm 1.3678 (1.9920) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:25:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][370/625] eta 0:01:39 lr 0.001493 wd 0.0500 time 0.2595 (0.3902) data time 0.0009 (0.0146) model time 0.2585 (0.3756) loss 7.0244 (6.1835) grad_norm 1.8636 (1.9646) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:25:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][380/625] eta 0:01:31 lr 0.001493 wd 0.0500 time 0.2660 (0.3731) data time 0.0009 (0.0127) model time 0.2651 (0.3604) loss 5.1167 (6.0895) grad_norm 1.7356 (1.9572) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:25:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][390/625] eta 0:01:24 lr 0.001493 wd 0.0500 time 0.2733 (0.3604) data time 0.0007 (0.0114) model time 0.2726 (0.3490) loss 5.0225 (6.0643) grad_norm 2.0603 (1.9489) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:25:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][400/625] eta 0:01:18 lr 0.001492 wd 0.0500 time 0.2649 (0.3499) data time 0.0009 (0.0103) model time 0.2641 (0.3396) loss 6.5489 (6.0714) grad_norm 3.9745 (1.9567) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:25:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][410/625] eta 0:01:13 lr 0.001492 wd 0.0500 time 0.2574 (0.3416) data time 0.0010 (0.0094) model time 0.2564 (0.3322) loss 5.3122 (6.1040) grad_norm 1.8513 (1.9897) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:25:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][420/625] eta 0:01:08 lr 0.001492 wd 0.0500 time 0.2628 (0.3351) data time 0.0007 (0.0088) model time 0.2621 (0.3263) loss 4.4325 (6.0805) grad_norm 1.4533 (1.9553) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:25:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][430/625] eta 0:01:04 lr 0.001492 wd 0.0500 time 0.2645 (0.3294) data time 0.0008 (0.0082) model time 0.2637 (0.3212) loss 5.7411 (6.0928) grad_norm 1.1881 (1.9245) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:25:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][440/625] eta 0:01:00 lr 0.001492 wd 0.0500 time 0.2692 (0.3247) data time 0.0008 (0.0076) model time 0.2684 (0.3170) loss 5.5841 (6.0982) grad_norm 2.3747 (1.9098) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:25:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][450/625] eta 0:00:56 lr 0.001492 wd 0.0500 time 0.2604 (0.3205) data time 0.0011 (0.0072) model time 0.2593 (0.3133) loss 6.5348 (6.0896) grad_norm 1.1541 (1.9033) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:25:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][460/625] eta 0:00:52 lr 0.001491 wd 0.0500 time 0.2685 (0.3168) data time 0.0011 (0.0068) model time 0.2674 (0.3100) loss 7.0222 (6.0764) grad_norm 1.4355 (1.9146) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:25:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][470/625] eta 0:00:48 lr 0.001491 wd 0.0500 time 0.2618 (0.3136) data time 0.0010 (0.0064) model time 0.2608 (0.3072) loss 7.0047 (6.0785) grad_norm 1.7379 (1.9009) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:25:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][480/625] eta 0:00:45 lr 0.001491 wd 0.0500 time 0.2614 (0.3107) data time 0.0009 (0.0061) model time 0.2606 (0.3046) loss 6.2871 (6.0777) grad_norm 1.8476 (1.8919) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:25:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][490/625] eta 0:00:41 lr 0.001491 wd 0.0500 time 0.2614 (0.3081) data time 0.0011 (0.0058) model time 0.2604 (0.3023) loss 6.3926 (6.0808) grad_norm 1.8320 (1.8991) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:25:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][500/625] eta 0:00:38 lr 0.001491 wd 0.0500 time 0.2592 (0.3057) data time 0.0011 (0.0056) model time 0.2580 (0.3001) loss 4.3946 (6.0732) grad_norm 1.1932 (1.8939) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:25:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][510/625] eta 0:00:34 lr 0.001491 wd 0.0500 time 0.2559 (0.3036) data time 0.0011 (0.0054) model time 0.2548 (0.2982) loss 5.3898 (6.0568) grad_norm 1.7054 (1.8794) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:25:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][520/625] eta 0:00:31 lr 0.001491 wd 0.0500 time 0.2645 (0.3018) data time 0.0009 (0.0052) model time 0.2636 (0.2966) loss 5.3031 (6.0475) grad_norm 1.2054 (1.8799) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:25:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][530/625] eta 0:00:28 lr 0.001490 wd 0.0500 time 0.2576 (0.3001) data time 0.0009 (0.0050) model time 0.2566 (0.2951) loss 5.4263 (6.0500) grad_norm 1.3454 (1.8855) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:25:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][540/625] eta 0:00:25 lr 0.001490 wd 0.0500 time 0.2622 (0.2985) data time 0.0011 (0.0048) model time 0.2612 (0.2937) loss 7.2035 (6.0372) grad_norm 1.5793 (1.8801) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:25:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][550/625] eta 0:00:22 lr 0.001490 wd 0.0500 time 0.2643 (0.2971) data time 0.0011 (0.0047) model time 0.2633 (0.2924) loss 6.7976 (6.0466) grad_norm 2.1246 (1.8735) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:25:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][560/625] eta 0:00:19 lr 0.001490 wd 0.0500 time 0.2652 (0.2958) data time 0.0008 (0.0045) model time 0.2644 (0.2913) loss 5.0719 (6.0257) grad_norm 2.5702 (1.8768) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:25:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][570/625] eta 0:00:16 lr 0.001490 wd 0.0500 time 0.2640 (0.2946) data time 0.0009 (0.0044) model time 0.2631 (0.2902) loss 5.2091 (6.0139) grad_norm 1.3389 (1.8641) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:25:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][580/625] eta 0:00:13 lr 0.001490 wd 0.0500 time 0.2686 (0.2934) data time 0.0009 (0.0043) model time 0.2676 (0.2892) loss 6.3617 (6.0120) grad_norm 2.5698 (1.8716) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:26:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][590/625] eta 0:00:10 lr 0.001489 wd 0.0500 time 0.2660 (0.2924) data time 0.0009 (0.0041) model time 0.2651 (0.2883) loss 6.6575 (6.0130) grad_norm 1.9042 (1.8632) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:26:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][600/625] eta 0:00:07 lr 0.001489 wd 0.0500 time 0.2742 (0.2914) data time 0.0011 (0.0040) model time 0.2731 (0.2874) loss 5.8325 (6.0097) grad_norm 1.2928 (1.8477) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:26:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][610/625] eta 0:00:04 lr 0.001489 wd 0.0500 time 0.2647 (0.2906) data time 0.0007 (0.0040) model time 0.2641 (0.2866) loss 5.9131 (6.0021) grad_norm 2.5846 (1.8488) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:26:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][620/625] eta 0:00:01 lr 0.001489 wd 0.0500 time 0.2639 (0.2897) data time 0.0006 (0.0038) model time 0.2632 (0.2858) loss 7.3085 (6.0122) grad_norm 3.0690 (1.8538) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:26:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 114 training takes 0:01:32
[2024-07-31 08:26:10 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-31 08:26:12 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-31 08:26:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.453 (0.453) Loss 0.6646 (0.6646) Acc@1 87.988 (87.988) Acc@5 98.242 (98.242) Mem 9656MB
[2024-07-31 08:26:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.062 (0.097) Loss 1.0635 (0.8217) Acc@1 77.344 (83.518) Acc@5 94.287 (96.804) Mem 9656MB
[2024-07-31 08:26:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.2080 (0.9801) Acc@1 73.975 (79.490) Acc@5 91.943 (94.840) Mem 9656MB
[2024-07-31 08:26:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 79.091 Acc@5 94.802
[2024-07-31 08:26:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.1%
[2024-07-31 08:26:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.09%
[2024-07-31 08:26:16 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-31 08:26:17 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-31 08:26:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.449 (0.449) Loss 0.5742 (0.5742) Acc@1 88.281 (88.281) Acc@5 98.438 (98.438) Mem 9656MB
[2024-07-31 08:26:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 0.9434 (0.7209) Acc@1 77.783 (84.521) Acc@5 94.922 (97.168) Mem 9656MB
[2024-07-31 08:26:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0928 (0.8620) Acc@1 73.291 (80.759) Acc@5 93.359 (95.557) Mem 9656MB
[2024-07-31 08:26:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 80.422 Acc@5 95.543
[2024-07-31 08:26:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.4%
[2024-07-31 08:26:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.42%
[2024-07-31 08:26:18 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-31 08:26:19 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-31 08:26:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][0/625] eta 0:12:09 lr 0.001489 wd 0.0500 time 1.1665 (1.1665) data time 0.5111 (0.5111) model time 0.0000 (0.0000) loss 6.0950 (6.0950) grad_norm 1.9086 (1.9086) loss_scale 4096.0000 (4096.0000) mem 9651MB
[2024-07-31 08:26:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][10/625] eta 0:03:31 lr 0.001489 wd 0.0500 time 0.2606 (0.3433) data time 0.0007 (0.0474) model time 0.0000 (0.0000) loss 6.4499 (6.1354) grad_norm 2.8291 (1.7909) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:26:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][20/625] eta 0:03:04 lr 0.001489 wd 0.0500 time 0.2624 (0.3052) data time 0.0009 (0.0253) model time 0.0000 (0.0000) loss 6.8633 (6.0823) grad_norm 1.2287 (1.8216) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:26:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][30/625] eta 0:02:53 lr 0.001488 wd 0.0500 time 0.2660 (0.2921) data time 0.0009 (0.0174) model time 0.0000 (0.0000) loss 5.3018 (6.1149) grad_norm 1.7336 (1.7820) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:26:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][40/625] eta 0:02:46 lr 0.001488 wd 0.0500 time 0.2623 (0.2851) data time 0.0009 (0.0134) model time 0.0000 (0.0000) loss 6.8331 (6.0352) grad_norm 2.2099 (1.7884) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:26:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][50/625] eta 0:02:41 lr 0.001488 wd 0.0500 time 0.2583 (0.2808) data time 0.0010 (0.0110) model time 0.0000 (0.0000) loss 6.7600 (6.0003) grad_norm 1.9093 (1.9866) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:26:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][60/625] eta 0:02:37 lr 0.001488 wd 0.0500 time 0.2578 (0.2781) data time 0.0007 (0.0094) model time 0.2570 (0.2632) loss 5.4535 (6.0035) grad_norm 1.5893 (2.0482) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:26:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][70/625] eta 0:02:33 lr 0.001488 wd 0.0500 time 0.2642 (0.2761) data time 0.0010 (0.0082) model time 0.2631 (0.2633) loss 5.5429 (5.9589) grad_norm 1.9734 (2.0002) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:26:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][80/625] eta 0:02:29 lr 0.001488 wd 0.0500 time 0.2635 (0.2748) data time 0.0010 (0.0073) model time 0.2625 (0.2638) loss 6.8137 (6.0052) grad_norm 1.6098 (1.9877) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:26:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][90/625] eta 0:02:26 lr 0.001487 wd 0.0500 time 0.2654 (0.2736) data time 0.0007 (0.0066) model time 0.2647 (0.2635) loss 6.0639 (6.0325) grad_norm 1.7257 (1.9160) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:26:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][100/625] eta 0:02:23 lr 0.001487 wd 0.0500 time 0.2606 (0.2726) data time 0.0007 (0.0060) model time 0.2598 (0.2633) loss 6.3388 (6.0183) grad_norm 1.7607 (1.9035) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:26:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][110/625] eta 0:02:19 lr 0.001487 wd 0.0500 time 0.2644 (0.2718) data time 0.0011 (0.0056) model time 0.2634 (0.2631) loss 6.0017 (6.0607) grad_norm 2.4358 (1.9223) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:26:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][120/625] eta 0:02:16 lr 0.001487 wd 0.0500 time 0.2659 (0.2712) data time 0.0011 (0.0052) model time 0.2649 (0.2632) loss 5.5655 (6.0693) grad_norm 2.7356 (1.9873) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:26:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][130/625] eta 0:02:13 lr 0.001487 wd 0.0500 time 0.2624 (0.2705) data time 0.0007 (0.0049) model time 0.2617 (0.2629) loss 7.0086 (6.0693) grad_norm 2.2300 (2.0213) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:26:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][140/625] eta 0:02:10 lr 0.001487 wd 0.0500 time 0.2648 (0.2699) data time 0.0008 (0.0046) model time 0.2641 (0.2628) loss 5.6916 (6.0363) grad_norm 1.7221 (1.9980) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:27:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][150/625] eta 0:02:08 lr 0.001487 wd 0.0500 time 0.2642 (0.2696) data time 0.0009 (0.0044) model time 0.2633 (0.2629) loss 5.7596 (5.9985) grad_norm 1.1822 (1.9622) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:27:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][160/625] eta 0:02:05 lr 0.001486 wd 0.0500 time 0.2632 (0.2693) data time 0.0009 (0.0041) model time 0.2623 (0.2630) loss 5.1994 (5.9932) grad_norm 2.5492 (1.9830) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:27:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][170/625] eta 0:02:02 lr 0.001486 wd 0.0500 time 0.2640 (0.2690) data time 0.0007 (0.0040) model time 0.2633 (0.2630) loss 4.9992 (6.0171) grad_norm 1.7174 (1.9845) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:27:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][180/625] eta 0:01:59 lr 0.001486 wd 0.0500 time 0.2707 (0.2687) data time 0.0008 (0.0038) model time 0.2699 (0.2631) loss 5.5476 (6.0128) grad_norm 1.6166 (1.9884) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:27:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][190/625] eta 0:01:56 lr 0.001486 wd 0.0500 time 0.2610 (0.2684) data time 0.0009 (0.0036) model time 0.2601 (0.2630) loss 6.2893 (6.0262) grad_norm 1.4715 (1.9669) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:27:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][200/625] eta 0:01:54 lr 0.001486 wd 0.0500 time 0.2645 (0.2683) data time 0.0007 (0.0035) model time 0.2637 (0.2630) loss 6.1068 (6.0297) grad_norm 1.6107 (1.9454) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:27:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][210/625] eta 0:01:51 lr 0.001486 wd 0.0500 time 0.2634 (0.2694) data time 0.0010 (0.0034) model time 0.2624 (0.2648) loss 5.9211 (6.0159) grad_norm 1.2487 (1.9251) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:27:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][220/625] eta 0:01:49 lr 0.001485 wd 0.0500 time 0.2737 (0.2704) data time 0.0007 (0.0033) model time 0.2730 (0.2663) loss 6.1389 (6.0132) grad_norm 2.0651 (1.9145) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:27:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][230/625] eta 0:01:46 lr 0.001485 wd 0.0500 time 0.2630 (0.2701) data time 0.0009 (0.0032) model time 0.2621 (0.2661) loss 5.2073 (6.0146) grad_norm 1.8414 (1.8988) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:27:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][240/625] eta 0:01:43 lr 0.001485 wd 0.0500 time 0.2637 (0.2698) data time 0.0010 (0.0031) model time 0.2627 (0.2660) loss 6.9390 (6.0320) grad_norm 1.5174 (1.8861) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:27:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][250/625] eta 0:01:41 lr 0.001485 wd 0.0500 time 0.2634 (0.2696) data time 0.0009 (0.0030) model time 0.2624 (0.2658) loss 6.1292 (6.0487) grad_norm 1.9598 (1.8836) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:27:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][260/625] eta 0:01:38 lr 0.001485 wd 0.0500 time 0.2651 (0.2693) data time 0.0007 (0.0029) model time 0.2644 (0.2656) loss 5.1511 (6.0505) grad_norm 1.4050 (1.8798) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:27:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][270/625] eta 0:01:35 lr 0.001485 wd 0.0500 time 0.2794 (0.2692) data time 0.0009 (0.0028) model time 0.2785 (0.2656) loss 7.2864 (6.0555) grad_norm 1.4687 (1.8770) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:27:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][280/625] eta 0:01:32 lr 0.001485 wd 0.0500 time 0.2618 (0.2690) data time 0.0011 (0.0028) model time 0.2606 (0.2654) loss 6.8077 (6.0623) grad_norm 1.7213 (1.8733) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:27:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][290/625] eta 0:01:30 lr 0.001484 wd 0.0500 time 0.2641 (0.2689) data time 0.0008 (0.0027) model time 0.2634 (0.2654) loss 5.7123 (6.0514) grad_norm 1.0566 (1.8654) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:27:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][300/625] eta 0:01:27 lr 0.001484 wd 0.0500 time 0.2693 (0.2687) data time 0.0011 (0.0027) model time 0.2682 (0.2653) loss 6.5137 (6.0504) grad_norm 1.1335 (1.8492) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:27:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][310/625] eta 0:01:24 lr 0.001484 wd 0.0500 time 0.2618 (0.2686) data time 0.0008 (0.0026) model time 0.2610 (0.2653) loss 6.8949 (6.0574) grad_norm 1.5293 (1.8410) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:27:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][320/625] eta 0:01:21 lr 0.001484 wd 0.0500 time 0.2712 (0.2686) data time 0.0008 (0.0026) model time 0.2704 (0.2653) loss 4.7553 (6.0544) grad_norm 1.6133 (1.8421) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:27:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][330/625] eta 0:01:19 lr 0.001484 wd 0.0500 time 0.2625 (0.2684) data time 0.0009 (0.0025) model time 0.2615 (0.2652) loss 6.2713 (6.0415) grad_norm 1.9332 (1.8399) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:27:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][340/625] eta 0:01:16 lr 0.001484 wd 0.0500 time 0.2629 (0.2683) data time 0.0007 (0.0025) model time 0.2621 (0.2651) loss 5.9392 (6.0260) grad_norm 1.4705 (1.8514) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:27:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][350/625] eta 0:01:13 lr 0.001483 wd 0.0500 time 0.2675 (0.2682) data time 0.0007 (0.0024) model time 0.2668 (0.2651) loss 6.7983 (6.0314) grad_norm 2.2615 (1.8522) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:27:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][360/625] eta 0:01:11 lr 0.001483 wd 0.0500 time 0.2634 (0.2681) data time 0.0010 (0.0024) model time 0.2623 (0.2650) loss 4.7847 (6.0290) grad_norm 1.4143 (1.8641) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:27:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][370/625] eta 0:01:08 lr 0.001483 wd 0.0500 time 0.2619 (0.2680) data time 0.0007 (0.0023) model time 0.2611 (0.2650) loss 5.4161 (6.0256) grad_norm 1.9378 (1.8634) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:28:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][380/625] eta 0:01:05 lr 0.001483 wd 0.0500 time 0.2644 (0.2680) data time 0.0010 (0.0023) model time 0.2635 (0.2650) loss 5.3894 (6.0168) grad_norm 1.4178 (1.8612) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 
08:28:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][390/625] eta 0:01:02 lr 0.001483 wd 0.0500 time 0.2735 (0.2679) data time 0.0010 (0.0023) model time 0.2725 (0.2650) loss 6.4354 (6.0246) grad_norm 1.6648 (1.8557) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 08:28:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][400/625] eta 0:01:00 lr 0.001483 wd 0.0500 time 0.2608 (0.2678) data time 0.0010 (0.0022) model time 0.2597 (0.2650) loss 4.4989 (6.0100) grad_norm 3.0520 (1.8554) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 08:28:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][410/625] eta 0:00:57 lr 0.001482 wd 0.0500 time 0.2715 (0.2677) data time 0.0010 (0.0022) model time 0.2705 (0.2649) loss 6.7122 (6.0108) grad_norm 1.9433 (1.8539) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 08:28:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][420/625] eta 0:00:54 lr 0.001482 wd 0.0500 time 0.2701 (0.2676) data time 0.0010 (0.0022) model time 0.2691 (0.2649) loss 6.4414 (6.0185) grad_norm 1.2399 (1.8632) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 08:28:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][430/625] eta 0:00:52 lr 0.001482 wd 0.0500 time 0.2664 (0.2676) data time 0.0010 (0.0022) model time 0.2654 (0.2648) loss 5.2447 (6.0149) grad_norm 1.7345 (1.8688) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 08:28:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][440/625] eta 0:00:49 lr 0.001482 wd 0.0500 time 0.2681 (0.2675) data time 0.0009 (0.0021) model time 0.2671 (0.2648) loss 6.4113 (6.0185) grad_norm 1.9565 (1.8687) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 08:28:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][450/625] eta 0:00:46 lr 0.001482 wd 0.0500 time 0.2598 (0.2674) data 
time 0.0009 (0.0021) model time 0.2589 (0.2647) loss 6.8912 (6.0245) grad_norm 1.2682 (1.8763) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 08:28:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][460/625] eta 0:00:44 lr 0.001482 wd 0.0500 time 0.2745 (0.2674) data time 0.0011 (0.0021) model time 0.2733 (0.2647) loss 6.9540 (6.0294) grad_norm 1.4294 (1.8788) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 08:28:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][470/625] eta 0:00:41 lr 0.001482 wd 0.0500 time 0.2650 (0.2673) data time 0.0009 (0.0021) model time 0.2641 (0.2647) loss 5.6675 (6.0279) grad_norm 1.5439 (1.8930) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 08:28:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][480/625] eta 0:00:38 lr 0.001481 wd 0.0500 time 0.2629 (0.2672) data time 0.0009 (0.0020) model time 0.2619 (0.2646) loss 6.0561 (6.0280) grad_norm 1.7963 (1.8962) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 08:28:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][490/625] eta 0:00:36 lr 0.001481 wd 0.0500 time 0.2668 (0.2671) data time 0.0007 (0.0020) model time 0.2661 (0.2646) loss 4.9013 (6.0189) grad_norm 1.5353 (1.8992) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 08:28:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][500/625] eta 0:00:33 lr 0.001481 wd 0.0500 time 0.2655 (0.2671) data time 0.0007 (0.0020) model time 0.2648 (0.2645) loss 4.1881 (6.0204) grad_norm 1.3902 (1.8922) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 08:28:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][510/625] eta 0:00:30 lr 0.001481 wd 0.0500 time 0.2631 (0.2670) data time 0.0009 (0.0020) model time 0.2622 (0.2645) loss 6.7003 (6.0191) grad_norm 1.8652 (1.8892) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 
08:28:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][520/625] eta 0:00:28 lr 0.001481 wd 0.0500 time 0.2595 (0.2669) data time 0.0008 (0.0020) model time 0.2588 (0.2645) loss 5.1576 (6.0126) grad_norm 1.2906 (1.8952) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 08:28:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][530/625] eta 0:00:25 lr 0.001481 wd 0.0500 time 0.2573 (0.2669) data time 0.0008 (0.0019) model time 0.2565 (0.2645) loss 7.1244 (6.0126) grad_norm 1.2324 (1.8916) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 08:28:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][540/625] eta 0:00:22 lr 0.001480 wd 0.0500 time 0.2663 (0.2668) data time 0.0008 (0.0019) model time 0.2656 (0.2644) loss 5.1047 (6.0126) grad_norm 1.2170 (1.8880) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 08:28:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][550/625] eta 0:00:20 lr 0.001480 wd 0.0500 time 0.2626 (0.2667) data time 0.0008 (0.0019) model time 0.2618 (0.2644) loss 6.8356 (6.0190) grad_norm 1.2586 (1.8833) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 08:28:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][560/625] eta 0:00:17 lr 0.001480 wd 0.0500 time 0.2595 (0.2667) data time 0.0011 (0.0019) model time 0.2584 (0.2643) loss 4.7262 (6.0114) grad_norm 2.2382 (1.8797) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 08:28:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][570/625] eta 0:00:14 lr 0.001480 wd 0.0500 time 0.2641 (0.2667) data time 0.0010 (0.0019) model time 0.2631 (0.2644) loss 5.0076 (6.0110) grad_norm 1.1004 (1.8767) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 08:28:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][580/625] eta 0:00:11 lr 0.001480 wd 0.0500 time 0.2631 (0.2667) data 
time 0.0008 (0.0018) model time 0.2623 (0.2643) loss 6.8224 (6.0091) grad_norm 1.1863 (1.8675) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 08:28:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][590/625] eta 0:00:09 lr 0.001480 wd 0.0500 time 0.2665 (0.2666) data time 0.0007 (0.0018) model time 0.2658 (0.2643) loss 6.2312 (6.0146) grad_norm 1.3892 (1.8622) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 08:28:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][600/625] eta 0:00:06 lr 0.001480 wd 0.0500 time 0.2620 (0.2666) data time 0.0009 (0.0018) model time 0.2611 (0.2643) loss 6.8940 (6.0129) grad_norm 1.1420 (1.8727) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 08:29:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][610/625] eta 0:00:03 lr 0.001479 wd 0.0500 time 0.2614 (0.2666) data time 0.0008 (0.0018) model time 0.2606 (0.2643) loss 6.0708 (6.0206) grad_norm 2.2561 (1.8720) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 08:29:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][620/625] eta 0:00:01 lr 0.001479 wd 0.0500 time 0.2618 (0.2665) data time 0.0006 (0.0018) model time 0.2611 (0.2642) loss 5.7274 (6.0190) grad_norm 1.1548 (1.8679) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 08:29:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 115 training takes 0:02:46 [2024-07-31 08:29:06 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-31 08:29:06 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-31 08:29:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.515 (0.515) Loss 0.6655 (0.6655) Acc@1 86.914 (86.914) Acc@5 97.900 (97.900) Mem 9655MB
[2024-07-31 08:29:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.057 (0.104) Loss 1.0586 (0.8071) Acc@1 77.002 (83.150) Acc@5 94.043 (96.653) Mem 9655MB
[2024-07-31 08:29:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.081) Loss 1.2041 (0.9600) Acc@1 72.559 (79.278) Acc@5 92.139 (94.889) Mem 9655MB
[2024-07-31 08:29:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.039 Acc@5 94.888
[2024-07-31 08:29:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.0%
[2024-07-31 08:29:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.810 (0.810) Loss 0.5747 (0.5747) Acc@1 88.281 (88.281) Acc@5 98.438 (98.438) Mem 9655MB
[2024-07-31 08:29:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.057 (0.136) Loss 0.9434 (0.7212) Acc@1 77.637 (84.530) Acc@5 94.971 (97.172) Mem 9655MB
[2024-07-31 08:29:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.097) Loss 1.0908 (0.8618) Acc@1 73.193 (80.762) Acc@5 93.311 (95.568) Mem 9655MB
[2024-07-31 08:29:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.430 Acc@5 95.547
[2024-07-31 08:29:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.4%
[2024-07-31 08:29:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.43%
[2024-07-31 08:29:10 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-31 08:29:11 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-31 08:29:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][0/625] eta 0:08:09 lr 0.001479 wd 0.0500 time 0.7826 (0.7826) data time 0.5048 (0.5048) model time 0.0000 (0.0000) loss 6.1486 (6.1486) grad_norm 3.1333 (3.1333) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:29:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][10/625] eta 0:03:10 lr 0.001479 wd 0.0500 time 0.2606 (0.3098) data time 0.0012 (0.0469) model time 0.0000 (0.0000) loss 6.1749 (5.6789) grad_norm 2.0032 (1.9164) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:29:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][20/625] eta 0:02:54 lr 0.001479 wd 0.0500 time 0.2740 (0.2885) data time 0.0009 (0.0250) model time 0.0000 (0.0000) loss 5.9153 (5.7798) grad_norm 1.5752 (1.8127) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:29:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][30/625] eta 0:02:46 lr 0.001479 wd 0.0500 time 0.2656 (0.2806) data time 0.0007 (0.0173) model time 0.0000 (0.0000) loss 6.4950 (5.9615) grad_norm 2.2629 (1.7517) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:29:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][40/625] eta 0:02:41 lr 0.001479 wd 0.0500 time 0.2638 (0.2761) data time 0.0010 (0.0133) model time 0.0000 (0.0000) loss 6.4637 (5.9216) grad_norm 1.2480 (1.7931) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 08:29:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-31 08:29:24 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-31 08:29:24 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-31 08:33:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-31 08:33:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-31 08:33:45 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-31 08:33:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-31 08:33:54 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-31 08:33:54 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-31 08:33:54 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-31 08:33:54 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 116)
[2024-07-31 08:33:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-31 08:34:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][50/625] eta 0:18:08 lr 0.001478 wd 0.0500 time 0.2520 (1.8926) data time 0.0007 (0.1318) model time 0.0000 (0.0000) loss 6.7348 (6.4995) grad_norm 2.1070 (2.0694) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:34:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][60/625] eta 0:07:31 lr 0.001478 wd 0.0500 time 0.2540 (0.7990) data time 0.0007 (0.0446) model time 0.2533 (0.2512) loss 6.6063 (6.3783) grad_norm 1.1920 (1.7166) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:34:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][70/625] eta 0:05:22 lr 0.001478 wd 0.0500 time 0.2506 (0.5805) data time 0.0014 (0.0272) model time 0.2492 (0.2514) loss 6.2820 (6.3742) grad_norm 1.8274 (1.6739) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:34:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][80/625] eta 0:04:25 lr 0.001478 wd 0.0500 time 0.2643 (0.4875) data time 0.0009 (0.0197) model time 0.2634 (0.2523) loss 6.0961 (6.2818) grad_norm 1.2636 (1.6917) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:34:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][90/625] eta 0:03:53 lr 0.001478 wd 0.0500 time 0.2554 (0.4357) data time 0.0009 (0.0156) model time 0.2544 (0.2525) loss 6.2859 (6.2477) grad_norm 2.2763 (1.8067) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:34:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][100/625] eta 0:03:31 lr 0.001478 wd 0.0500 time 0.2601 (0.4027) data time 0.0006 (0.0129) model time 0.2595 (0.2527) loss 4.9661 (6.1558) grad_norm 2.2777 (1.8240) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:34:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][110/625] eta 0:03:15 lr 0.001477 wd 0.0500 time 0.2544 (0.3801) data time 0.0011 (0.0111) model time 0.2534 (0.2530) loss 7.0334 (6.1177) grad_norm 2.0357 (1.8475) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:34:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][120/625] eta 0:03:03 lr 0.001477 wd 0.0500 time 0.2560 (0.3634) data time 0.0009 (0.0097) model time 0.2552 (0.2531) loss 5.5703 (6.0637) grad_norm 1.9778 (1.8106) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:34:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][130/625] eta 0:02:53 lr 0.001477 wd 0.0500 time 0.2648 (0.3505) data time 0.0007 (0.0087) model time 0.2641 (0.2532) loss 5.4002 (6.0153) grad_norm 1.1171 (1.7710) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:34:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][140/625] eta 0:02:45 lr 0.001477 wd 0.0500 time 0.2528 (0.3406) data time 0.0009 (0.0079) model time 0.2519 (0.2534) loss 7.0326 (6.0071) grad_norm 2.0482 (1.7620) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:34:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][150/625] eta 0:02:37 lr 0.001477 wd 0.0500 time 0.2581 (0.3325) data time 0.0008 (0.0072) model time 0.2572 (0.2535) loss 5.1748 (6.0320) grad_norm 3.5611 (1.8148) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:34:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][160/625] eta 0:02:31 lr 0.001477 wd 0.0500 time 0.2562 (0.3257) data time 0.0007 (0.0069) model time 0.2555 (0.2533) loss 4.9956 (6.0133) grad_norm 1.3991 (1.8333) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:34:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][170/625] eta 0:02:25 lr 0.001476 wd 0.0500 time 0.2553 (0.3201) data time 0.0007 (0.0064) model time 0.2546 (0.2534) loss 5.6681 (6.0294) grad_norm 2.0980 (1.8317) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:34:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][180/625] eta 0:02:20 lr 0.001476 wd 0.0500 time 0.2525 (0.3153) data time 0.0007 (0.0060) model time 0.2518 (0.2535) loss 6.3171 (6.0415) grad_norm 1.5785 (1.8442) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:34:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][190/625] eta 0:02:15 lr 0.001476 wd 0.0500 time 0.2567 (0.3113) data time 0.0009 (0.0057) model time 0.2559 (0.2537) loss 6.1508 (6.0409) grad_norm 2.3814 (1.8436) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:34:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][200/625] eta 0:02:10 lr 0.001476 wd 0.0500 time 0.2611 (0.3078) data time 0.0009 (0.0054) model time 0.2602 (0.2538) loss 7.4380 (6.0338) grad_norm 1.6308 (1.8553) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:34:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][210/625] eta 0:02:06 lr 0.001476 wd 0.0500 time 0.2564 (0.3049) data time 0.0009 (0.0052) model time 0.2555 (0.2541) loss 7.3951 (6.0512) grad_norm 1.4749 (1.8451) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:34:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][220/625] eta 0:02:02 lr 0.001476 wd 0.0500 time 0.2541 (0.3020) data time 0.0008 (0.0049) model time 0.2533 (0.2540) loss 6.4368 (6.0479) grad_norm 1.2211 (1.8358) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:34:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][230/625] eta 0:01:58 lr 0.001476 wd 0.0500 time 0.2537 (0.2996) data time 0.0009 (0.0047) model time 0.2528 (0.2542) loss 6.2809 (6.0424) grad_norm 1.9942 (1.8262) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:34:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][240/625] eta 0:01:54 lr 0.001475 wd 0.0500 time 0.2570 (0.2974) data time 0.0011 (0.0045) model time 0.2560 (0.2542) loss 4.8162 (6.0283) grad_norm 1.4542 (1.8422) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:34:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][250/625] eta 0:01:50 lr 0.001475 wd 0.0500 time 0.2566 (0.2954) data time 0.0009 (0.0043) model time 0.2556 (0.2543) loss 5.2186 (6.0076) grad_norm 2.0042 (1.8437) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:35:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][260/625] eta 0:01:47 lr 0.001475 wd 0.0500 time 0.2566 (0.2935) data time 0.0010 (0.0042) model time 0.2556 (0.2543) loss 5.8479 (5.9993) grad_norm 1.2637 (1.8268) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:35:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][270/625] eta 0:01:43 lr 0.001475 wd 0.0500 time 0.2567 (0.2918) data time 0.0006 (0.0040) model time 0.2560 (0.2543) loss 5.5301 (6.0096) grad_norm 1.4794 (1.8189) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:35:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][280/625] eta 0:01:40 lr 0.001475 wd 0.0500 time 0.2553 (0.2903) data time 0.0009 (0.0039) model time 0.2544 (0.2543) loss 6.6179 (5.9944) grad_norm 2.0091 (1.8354) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:35:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][290/625] eta 0:01:36 lr 0.001475 wd 0.0500 time 0.2574 (0.2889) data time 0.0009 (0.0038) model time 0.2565 (0.2544) loss 6.4415 (6.0067) grad_norm 1.8613 (1.8299) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:35:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][300/625] eta 0:01:33 lr 0.001474 wd 0.0500 time 0.2575 (0.2877) data time 0.0007 (0.0037) model time 0.2568 (0.2545) loss 5.4291 (6.0054) grad_norm 3.7930 (1.8533) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:35:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][310/625] eta 0:01:30 lr 0.001474 wd 0.0500 time 0.2514 (0.2865) data time 0.0007 (0.0036) model time 0.2507 (0.2545) loss 5.4832 (5.9948) grad_norm 2.4244 (1.8814) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:35:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][320/625] eta 0:01:27 lr 0.001474 wd 0.0500 time 0.2631 (0.2855) data time 0.0009 (0.0035) model time 0.2622 (0.2546) loss 6.2406 (5.9887) grad_norm 1.5933 (1.8694) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:35:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][330/625] eta 0:01:24 lr 0.001474 wd 0.0500 time 0.2596 (0.2849) data time 0.0008 (0.0034) model time 0.2588 (0.2551) loss 6.4602 (5.9921) grad_norm 2.0658 (1.8695) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:35:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][340/625] eta 0:01:20 lr 0.001474 wd 0.0500 time 0.2664 (0.2840) data time 0.0009 (0.0033) model time 0.2655 (0.2551) loss 6.8198 (5.9920) grad_norm 1.9812 (1.8693) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:35:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][350/625] eta 0:01:17 lr 0.001474 wd 0.0500 time 0.2530 (0.2830) data time 0.0009 (0.0032) model time 0.2521 (0.2551) loss 5.0397 (5.9810) grad_norm 1.2616 (1.8741) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:35:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][360/625] eta 0:01:14 lr 0.001473 wd 0.0500 time 0.3065 (0.2825) data time 0.0009 (0.0032) model time 0.3057 (0.2553) loss 7.0714 (5.9938) grad_norm 1.1912 (1.8726) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:35:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][370/625] eta 0:01:11 lr 0.001473 wd 0.0500 time 0.2547 (0.2817) data time 0.0007 (0.0032) model time 0.2539 (0.2554) loss 7.0387 (6.0127) grad_norm 3.1197 (1.8788) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:35:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][380/625] eta 0:01:08 lr 0.001473 wd 0.0500 time 0.2555 (0.2811) data time 0.0006 (0.0031) model time 0.2549 (0.2555) loss 6.0702 (6.0120) grad_norm 2.4613 (1.8707) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:35:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][390/625] eta 0:01:05 lr 0.001473 wd 0.0500 time 0.2556 (0.2804) data time 0.0006 (0.0030) model time 0.2550 (0.2555) loss 5.2389 (6.0129) grad_norm 2.1237 (1.8770) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:35:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][400/625] eta 0:01:02 lr 0.001473 wd 0.0500 time 0.2553 (0.2799) data time 0.0009 (0.0030) model time 0.2544 (0.2557) loss 5.3412 (6.0201) grad_norm 1.9693 (1.8829) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:35:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][410/625] eta 0:01:00 lr 0.001473 wd 0.0500 time 0.2534 (0.2793) data time 0.0008 (0.0030) model time 0.2526 (0.2558) loss 5.9875 (6.0158) grad_norm 1.6225 (1.8780) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:35:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][420/625] eta 0:00:57 lr 0.001473 wd 0.0500 time 0.2920 (0.2789) data time 0.0007 (0.0029) model time 0.2913 (0.2560) loss 5.0825 (6.0149) grad_norm 1.2057 (1.8814) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 08:35:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-31 08:35:43 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-31 08:35:47 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-31 08:38:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-31 08:38:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-31 08:38:51 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-31 08:39:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-31 08:39:00 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-31 08:39:00 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-31 08:39:00 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-31 08:39:00 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 116)
[2024-07-31 08:39:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-31 08:39:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][430/625] eta 0:04:08 lr 0.001472 wd 0.0500 time 0.2533 (1.2767) data time 0.0011 (0.1754) model time 0.2522 (1.1013) loss 6.5600 (6.6731) grad_norm 2.1548 (2.2425) loss_scale 4096.0000 (4096.0000) mem 9652MB
[2024-07-31 08:39:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][440/625] eta 0:02:21 lr 0.001472 wd 0.0500 time 0.2557 (0.7643) data time 0.0006 (0.0882) model time 0.2551 (0.6762) loss 6.7282 (6.4423) grad_norm 2.4448 (2.2052) loss_scale 4096.0000 (4096.0000) mem 9652MB
[2024-07-31 08:39:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][450/625] eta 0:01:43 lr 0.001472 wd 0.0500 time 0.2542 (0.5938) data time 0.0008 (0.0591) model time 0.2533 (0.5347) loss 6.3306 (6.4543) grad_norm 1.2194 (2.0414) loss_scale 4096.0000 (4096.0000) mem 9652MB
[2024-07-31 08:39:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][460/625] eta 0:01:23 lr 0.001472 wd 0.0500 time 0.2495 (0.5084) data time 0.0008 (0.0446) model time 0.2487 (0.4638) loss 5.4257 (6.3227) grad_norm 1.8701 (2.1130) loss_scale 4096.0000 (4096.0000) mem 9652MB
[2024-07-31 08:39:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][470/625] eta 0:01:10 lr 0.001472 wd 0.0500 time 0.2553 (0.4571) data time 0.0010 (0.0359) model time 0.2543 (0.4212) loss 4.7217 (6.2642) grad_norm 2.5818 (2.1248) loss_scale 4096.0000 (4096.0000) mem 9652MB
[2024-07-31 08:39:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][480/625] eta 0:01:01 lr 0.001472 wd 0.0500 time 0.2498 (0.4232) data time 0.0007 (0.0301) model time 0.2491 (0.3932) loss 6.2839 (6.2151) grad_norm 1.6629 (2.0784) loss_scale 4096.0000 (4096.0000) mem 9652MB
[2024-07-31 08:39:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][490/625] eta 0:00:53 lr 0.001471 wd 0.0500 time 0.2481 (0.3988) data time 0.0006 (0.0259) model time 0.2475 (0.3729) loss 4.5163 (6.1624) grad_norm 1.3553 (1.9996) loss_scale 4096.0000 (4096.0000) mem 9652MB
[2024-07-31 08:39:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][500/625] eta 0:00:47 lr 0.001471 wd 0.0500 time 0.2511 (0.3805) data time 0.0020 (0.0228) model time 0.2490 (0.3577) loss 6.7728 (6.1420) grad_norm 1.9669 (1.9603) loss_scale 4096.0000 (4096.0000) mem 9652MB
[2024-07-31 08:39:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][510/625] eta 0:00:42 lr 0.001471 wd 0.0500 time 0.2525 (0.3664) data time 0.0008 (0.0204) model time 0.2517 (0.3460) loss 7.0504 (6.1298) grad_norm 1.3048 (1.8984) loss_scale 4096.0000 (4096.0000) mem 9652MB
[2024-07-31 08:39:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][520/625] eta 0:00:37 lr 0.001471 wd 0.0500 time 0.2558 (0.3551) data time 0.0012 (0.0185) model time 0.2546 (0.3366) loss 6.7757 (6.1255) grad_norm 1.2402 (1.8714) loss_scale 4096.0000 (4096.0000) mem 9652MB
[2024-07-31 08:39:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][530/625] eta 0:00:32 lr 0.001471 wd 0.0500 time 0.2560 (0.3457) data time 0.0009 (0.0169) model time 0.2551 (0.3288) loss 5.2867 (6.1236) grad_norm 2.5042 (1.8506) loss_scale 4096.0000 (4096.0000) mem 9652MB
[2024-07-31 08:39:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][540/625] eta 0:00:28 lr 0.001471 wd 0.0500 time 0.2492 (0.3380) data time 0.0007 (0.0156) model time 0.2485 (0.3224) loss 6.4679 (6.1146) grad_norm 1.3056 (1.8860) loss_scale 4096.0000 (4096.0000) mem 9652MB
[2024-07-31 08:39:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][550/625] eta 0:00:24 lr 0.001470 wd 0.0500 time 0.2491 (0.3315) data time 0.0009 (0.0144) model time 0.2482 (0.3171) loss 5.8052 (6.0866) grad_norm 1.4810 (1.8717) loss_scale 4096.0000 (4096.0000) mem 9652MB
[2024-07-31 08:39:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][560/625] eta 0:00:21 lr 0.001470 wd 0.0500 time 0.2520 (0.3260) data time 0.0008 (0.0135) model time 0.2512 (0.3125) loss 4.3789 (6.0858) grad_norm 1.7218 (1.8420) loss_scale 4096.0000 (4096.0000) mem 9652MB
[2024-07-31 08:39:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][570/625] eta 0:00:17 lr 0.001470 wd 0.0500 time 0.2564 (0.3212) data time 0.0008 (0.0127) model time 0.2555 (0.3086) loss 7.2194 (6.0832) grad_norm 1.2508 (1.8218) loss_scale 4096.0000 (4096.0000) mem 9652MB
[2024-07-31 08:39:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][580/625] eta 0:00:14 lr 0.001470 wd 0.0500 time 0.2523 (0.3170) data time 0.0009 (0.0119) model time 0.2514 (0.3051) loss 6.5722 (6.0813) grad_norm 1.5120 (1.8067) loss_scale 4096.0000 (4096.0000) mem 9652MB
[2024-07-31 08:39:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][590/625] eta 0:00:10 lr 0.001470 wd 0.0500 time 0.2548 (0.3132) data time 0.0007 (0.0113) model time 0.2541 (0.3020) loss 5.4251 (6.0922) grad_norm 2.7432 (1.8044) loss_scale 4096.0000 (4096.0000) mem 9652MB
[2024-07-31 08:39:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][600/625] eta 0:00:07 lr 0.001470 wd 0.0500 time 0.2529 (0.3100) data time 0.0008 (0.0107) model time 0.2521 (0.2993) loss 4.7417 (6.0729) grad_norm 2.2479 (1.8170) loss_scale 4096.0000 (4096.0000) mem 9652MB
[2024-07-31 08:40:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][610/625] eta 0:00:04 lr 0.001470 wd 0.0500 time 0.2522 (0.3071) data time 0.0004 (0.0102) model time 0.2518 (0.2969) loss 5.0818 (6.0791) grad_norm 1.2141 (1.8506) loss_scale 8192.0000 (4225.3474) mem 9652MB
[2024-07-31 08:40:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][620/625] eta 0:00:01 lr 0.001469 wd 0.0500 time 0.2507 (0.3043) data time 0.0005 (0.0097) model time 0.2502 (0.2946) loss 5.9134 (6.0578) grad_norm 1.5415 (1.8519) loss_scale 8192.0000 (4423.6800) mem 9652MB
[2024-07-31 08:40:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 116 training takes 0:01:01
[2024-07-31 08:40:05 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-31 08:40:07 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-31 08:40:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.410 (0.410) Loss 0.6685 (0.6685) Acc@1 86.816 (86.816) Acc@5 98.145 (98.145) Mem 9652MB [2024-07-31 08:40:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.088) Loss 1.0518 (0.8050) Acc@1 75.830 (82.622) Acc@5 94.189 (96.693) Mem 9652MB [2024-07-31 08:40:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.072) Loss 1.1797 (0.9519) Acc@1 73.291 (79.076) Acc@5 92.432 (94.985) Mem 9652MB [2024-07-31 08:40:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.829 Acc@5 94.948 [2024-07-31 08:40:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.8% [2024-07-31 08:40:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.718 (0.718) Loss 0.5752 (0.5752) Acc@1 88.379 (88.379) Acc@5 98.438 (98.438) Mem 9652MB [2024-07-31 08:40:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.124) Loss 0.9429 (0.7211) Acc@1 77.490 (84.535) Acc@5 94.971 (97.159) Mem 9652MB [2024-07-31 08:40:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.091) Loss 1.0898 (0.8613) Acc@1 73.291 (80.808) Acc@5 93.359 (95.571) Mem 9652MB [2024-07-31 08:40:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.474 Acc@5 95.551 [2024-07-31 08:40:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.5% [2024-07-31 08:40:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.47% [2024-07-31 08:40:12 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-07-31 08:40:14 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-07-31 08:40:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][0/625] eta 0:07:31 lr 0.001469 wd 0.0500 time 0.7226 (0.7226) data time 0.3422 (0.3422) model time 0.0000 (0.0000) loss 6.0973 (6.0973) grad_norm 2.0934 (2.0934) loss_scale 8192.0000 (8192.0000) mem 9651MB [2024-07-31 08:40:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][10/625] eta 0:03:02 lr 0.001469 wd 0.0500 time 0.2557 (0.2968) data time 0.0009 (0.0320) model time 0.0000 (0.0000) loss 5.4163 (5.8090) grad_norm 1.8177 (1.6308) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:40:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][20/625] eta 0:02:47 lr 0.001469 wd 0.0500 time 0.2536 (0.2768) data time 0.0007 (0.0172) model time 0.0000 (0.0000) loss 6.1799 (5.9456) grad_norm 1.3632 (1.6036) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:40:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][30/625] eta 0:02:40 lr 0.001469 wd 0.0500 time 0.2781 (0.2703) data time 0.0009 (0.0120) model time 0.0000 (0.0000) loss 6.5793 (5.9014) grad_norm 2.6377 (1.7004) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:40:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][40/625] eta 0:02:35 lr 0.001469 wd 0.0500 time 0.2553 (0.2665) data time 0.0009 (0.0093) model time 0.0000 (0.0000) loss 6.0593 (5.9635) grad_norm 1.9840 (1.7308) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:40:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][50/625] eta 0:02:31 lr 0.001469 wd 0.0500 time 0.2513 (0.2640) data time 0.0007 (0.0077) model time 0.0000 (0.0000) loss 5.6452 (5.9186) grad_norm 1.2620 (1.6877) loss_scale 8192.0000 (8192.0000) mem 9658MB 
[2024-07-31 08:40:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][60/625] eta 0:02:28 lr 0.001468 wd 0.0500 time 0.2504 (0.2624) data time 0.0008 (0.0066) model time 0.2496 (0.2532) loss 4.8580 (5.8822) grad_norm 1.7362 (1.6826) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:40:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][70/625] eta 0:02:25 lr 0.001468 wd 0.0500 time 0.2531 (0.2613) data time 0.0009 (0.0058) model time 0.2522 (0.2536) loss 6.4585 (5.8895) grad_norm 2.5286 (1.7365) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:40:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][80/625] eta 0:02:21 lr 0.001468 wd 0.0500 time 0.2524 (0.2605) data time 0.0009 (0.0052) model time 0.2515 (0.2535) loss 6.7807 (5.8964) grad_norm 2.0971 (1.7121) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:40:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][90/625] eta 0:02:19 lr 0.001468 wd 0.0500 time 0.2505 (0.2598) data time 0.0011 (0.0047) model time 0.2495 (0.2535) loss 6.2994 (5.8859) grad_norm 2.3283 (1.7623) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:40:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][100/625] eta 0:02:16 lr 0.001468 wd 0.0500 time 0.2645 (0.2595) data time 0.0011 (0.0044) model time 0.2633 (0.2538) loss 5.4730 (5.8498) grad_norm 2.4340 (1.8248) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:40:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][110/625] eta 0:02:13 lr 0.001468 wd 0.0500 time 0.2547 (0.2591) data time 0.0009 (0.0041) model time 0.2537 (0.2539) loss 6.9793 (5.8875) grad_norm 1.8970 (1.8292) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:40:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][120/625] eta 0:02:10 lr 0.001467 wd 0.0500 time 0.2573 
(0.2589) data time 0.0007 (0.0038) model time 0.2566 (0.2541) loss 6.3247 (5.9239) grad_norm 1.2846 (1.8354) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:40:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][130/625] eta 0:02:07 lr 0.001467 wd 0.0500 time 0.2538 (0.2586) data time 0.0008 (0.0036) model time 0.2530 (0.2541) loss 6.3700 (5.9229) grad_norm 2.2094 (1.8561) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:40:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][140/625] eta 0:02:05 lr 0.001467 wd 0.0500 time 0.2596 (0.2583) data time 0.0006 (0.0034) model time 0.2589 (0.2540) loss 5.0431 (5.9403) grad_norm 1.3931 (1.8806) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:40:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][150/625] eta 0:02:02 lr 0.001467 wd 0.0500 time 0.2571 (0.2581) data time 0.0010 (0.0033) model time 0.2561 (0.2541) loss 5.4667 (5.9587) grad_norm 1.9168 (1.8716) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:40:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][160/625] eta 0:01:59 lr 0.001467 wd 0.0500 time 0.2619 (0.2579) data time 0.0008 (0.0031) model time 0.2611 (0.2540) loss 6.3914 (5.9498) grad_norm 1.1812 (1.8644) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:40:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][170/625] eta 0:01:57 lr 0.001467 wd 0.0500 time 0.2595 (0.2577) data time 0.0006 (0.0030) model time 0.2589 (0.2540) loss 5.1292 (5.9551) grad_norm 1.3963 (1.8869) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:41:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][180/625] eta 0:01:54 lr 0.001466 wd 0.0500 time 0.2588 (0.2576) data time 0.0007 (0.0029) model time 0.2581 (0.2541) loss 4.8809 (5.9369) grad_norm 1.3013 (1.8637) loss_scale 8192.0000 (8192.0000) mem 9658MB 
[2024-07-31 08:41:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][190/625] eta 0:01:52 lr 0.001466 wd 0.0500 time 0.2540 (0.2575) data time 0.0010 (0.0028) model time 0.2530 (0.2541) loss 6.5927 (5.9610) grad_norm 1.7434 (1.8616) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:41:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][200/625] eta 0:01:49 lr 0.001466 wd 0.0500 time 0.2529 (0.2575) data time 0.0008 (0.0027) model time 0.2521 (0.2542) loss 6.0474 (5.9790) grad_norm 3.3897 (1.9209) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:41:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][210/625] eta 0:01:47 lr 0.001466 wd 0.0500 time 0.5254 (0.2587) data time 0.0009 (0.0026) model time 0.5246 (0.2559) loss 6.3269 (5.9915) grad_norm 1.9731 (1.9025) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:41:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][220/625] eta 0:01:44 lr 0.001466 wd 0.0500 time 0.2577 (0.2584) data time 0.0006 (0.0026) model time 0.2571 (0.2558) loss 5.8116 (5.9851) grad_norm 1.7103 (1.8934) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:41:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][230/625] eta 0:01:42 lr 0.001466 wd 0.0500 time 0.2560 (0.2584) data time 0.0008 (0.0025) model time 0.2553 (0.2558) loss 5.9309 (6.0035) grad_norm 2.3794 (1.8815) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:41:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][240/625] eta 0:01:39 lr 0.001466 wd 0.0500 time 0.2543 (0.2583) data time 0.0010 (0.0024) model time 0.2534 (0.2558) loss 5.2561 (6.0101) grad_norm 1.2989 (1.8803) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:41:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][250/625] eta 0:01:36 lr 0.001465 wd 0.0500 time 0.2540 
(0.2582) data time 0.0008 (0.0024) model time 0.2533 (0.2558) loss 5.4440 (6.0011) grad_norm 1.4886 (1.8639) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:41:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][260/625] eta 0:01:34 lr 0.001465 wd 0.0500 time 0.2591 (0.2582) data time 0.0005 (0.0023) model time 0.2586 (0.2557) loss 5.6936 (5.9972) grad_norm 1.3128 (1.8598) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:41:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][270/625] eta 0:01:31 lr 0.001465 wd 0.0500 time 0.2549 (0.2581) data time 0.0008 (0.0023) model time 0.2540 (0.2557) loss 5.8990 (5.9828) grad_norm 1.5438 (1.8486) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:41:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][280/625] eta 0:01:29 lr 0.001465 wd 0.0500 time 0.2558 (0.2580) data time 0.0008 (0.0022) model time 0.2550 (0.2556) loss 7.0715 (5.9892) grad_norm 1.9184 (1.8430) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:41:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][290/625] eta 0:01:26 lr 0.001465 wd 0.0500 time 0.2516 (0.2588) data time 0.0007 (0.0022) model time 0.2509 (0.2567) loss 5.8732 (5.9840) grad_norm 1.6133 (1.8800) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:41:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][300/625] eta 0:01:24 lr 0.001465 wd 0.0500 time 0.2553 (0.2587) data time 0.0009 (0.0021) model time 0.2544 (0.2567) loss 6.3636 (5.9807) grad_norm 1.6257 (1.8766) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:41:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][310/625] eta 0:01:21 lr 0.001464 wd 0.0500 time 0.2603 (0.2587) data time 0.0009 (0.0021) model time 0.2595 (0.2567) loss 5.8508 (5.9985) grad_norm 2.4448 (1.8764) loss_scale 8192.0000 (8192.0000) mem 9658MB 
[2024-07-31 08:41:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][320/625] eta 0:01:18 lr 0.001464 wd 0.0500 time 0.2544 (0.2587) data time 0.0007 (0.0021) model time 0.2537 (0.2566) loss 6.9276 (5.9869) grad_norm 2.6944 (1.8800) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:41:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][330/625] eta 0:01:16 lr 0.001464 wd 0.0500 time 0.2591 (0.2586) data time 0.0008 (0.0020) model time 0.2583 (0.2566) loss 4.9008 (5.9797) grad_norm 1.4197 (1.8857) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:41:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][340/625] eta 0:01:13 lr 0.001464 wd 0.0500 time 0.2589 (0.2585) data time 0.0009 (0.0020) model time 0.2580 (0.2565) loss 5.2250 (5.9835) grad_norm 2.4033 (1.8881) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:41:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][350/625] eta 0:01:11 lr 0.001464 wd 0.0500 time 0.2560 (0.2584) data time 0.0010 (0.0020) model time 0.2550 (0.2565) loss 6.8179 (5.9966) grad_norm 2.3118 (1.8808) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:41:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][360/625] eta 0:01:08 lr 0.001464 wd 0.0500 time 0.2542 (0.2583) data time 0.0011 (0.0020) model time 0.2531 (0.2564) loss 6.6537 (6.0050) grad_norm 1.5558 (1.8730) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:41:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][370/625] eta 0:01:05 lr 0.001463 wd 0.0500 time 0.2573 (0.2582) data time 0.0009 (0.0019) model time 0.2564 (0.2563) loss 6.4420 (6.0054) grad_norm 1.5719 (1.8759) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:41:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][380/625] eta 0:01:03 lr 0.001463 wd 0.0500 time 0.2530 
(0.2581) data time 0.0009 (0.0019) model time 0.2522 (0.2562) loss 6.2469 (6.0062) grad_norm 2.0254 (1.8714) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:41:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][390/625] eta 0:01:00 lr 0.001463 wd 0.0500 time 0.2487 (0.2581) data time 0.0008 (0.0019) model time 0.2480 (0.2561) loss 6.2006 (6.0102) grad_norm 1.5611 (1.8750) loss_scale 8192.0000 (8192.0000) mem 9658MB [2024-07-31 08:41:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][400/625] eta 0:00:58 lr 0.001463 wd 0.0500 time 0.2585 (0.2580) data time 0.0008 (0.0019) model time 0.2577 (0.2562) loss 6.8536 (6.0028) grad_norm 1.4918 (inf) loss_scale 4096.0000 (8140.9277) mem 9658MB [2024-07-31 08:42:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][410/625] eta 0:00:55 lr 0.001463 wd 0.0500 time 0.2556 (0.2580) data time 0.0007 (0.0018) model time 0.2548 (0.2561) loss 6.9601 (6.0047) grad_norm 1.3210 (inf) loss_scale 4096.0000 (8042.5109) mem 9658MB [2024-07-31 08:42:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][420/625] eta 0:00:52 lr 0.001463 wd 0.0500 time 0.2573 (0.2579) data time 0.0010 (0.0018) model time 0.2563 (0.2561) loss 6.7212 (6.0111) grad_norm 1.6168 (inf) loss_scale 4096.0000 (7948.7696) mem 9658MB [2024-07-31 08:42:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][430/625] eta 0:00:50 lr 0.001463 wd 0.0500 time 0.2534 (0.2579) data time 0.0009 (0.0018) model time 0.2525 (0.2560) loss 5.3856 (6.0105) grad_norm 1.2987 (inf) loss_scale 4096.0000 (7859.3782) mem 9658MB [2024-07-31 08:42:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][440/625] eta 0:00:47 lr 0.001462 wd 0.0500 time 0.2573 (0.2578) data time 0.0014 (0.0018) model time 0.2559 (0.2560) loss 4.8480 (6.0009) grad_norm 1.2304 (inf) loss_scale 4096.0000 (7774.0408) mem 9658MB [2024-07-31 
08:42:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][450/625] eta 0:00:45 lr 0.001462 wd 0.0500 time 0.2530 (0.2577) data time 0.0011 (0.0018) model time 0.2519 (0.2559) loss 5.5733 (6.0018) grad_norm 3.9434 (inf) loss_scale 4096.0000 (7692.4878) mem 9658MB [2024-07-31 08:42:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][460/625] eta 0:00:42 lr 0.001462 wd 0.0500 time 0.2569 (0.2577) data time 0.0007 (0.0018) model time 0.2563 (0.2558) loss 5.7129 (5.9929) grad_norm 2.8050 (inf) loss_scale 2048.0000 (7605.5879) mem 9658MB [2024-07-31 08:42:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][470/625] eta 0:00:39 lr 0.001462 wd 0.0500 time 0.2497 (0.2576) data time 0.0009 (0.0018) model time 0.2488 (0.2558) loss 6.1769 (5.9965) grad_norm 1.3501 (inf) loss_scale 2048.0000 (7487.5924) mem 9658MB [2024-07-31 08:42:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][480/625] eta 0:00:37 lr 0.001462 wd 0.0500 time 0.2564 (0.2576) data time 0.0008 (0.0017) model time 0.2555 (0.2558) loss 6.2306 (5.9965) grad_norm 1.9901 (inf) loss_scale 2048.0000 (7374.5031) mem 9658MB [2024-07-31 08:42:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][490/625] eta 0:00:34 lr 0.001462 wd 0.0500 time 0.2545 (0.2575) data time 0.0010 (0.0017) model time 0.2535 (0.2557) loss 5.4090 (5.9938) grad_norm 2.7279 (inf) loss_scale 2048.0000 (7266.0204) mem 9658MB [2024-07-31 08:42:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][500/625] eta 0:00:32 lr 0.001461 wd 0.0500 time 0.2546 (0.2575) data time 0.0007 (0.0017) model time 0.2540 (0.2557) loss 6.7834 (5.9902) grad_norm 2.1903 (inf) loss_scale 2048.0000 (7161.8683) mem 9658MB [2024-07-31 08:42:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][510/625] eta 0:00:29 lr 0.001461 wd 0.0500 time 0.2539 (0.2574) data time 0.0010 
(0.0017) model time 0.2529 (0.2557) loss 6.4754 (5.9836) grad_norm 2.7638 (inf) loss_scale 2048.0000 (7061.7926) mem 9658MB [2024-07-31 08:42:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][520/625] eta 0:00:27 lr 0.001461 wd 0.0500 time 0.2571 (0.2574) data time 0.0008 (0.0017) model time 0.2563 (0.2556) loss 5.5390 (5.9832) grad_norm 1.2944 (inf) loss_scale 2048.0000 (6965.5585) mem 9658MB [2024-07-31 08:42:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][530/625] eta 0:00:24 lr 0.001461 wd 0.0500 time 0.2711 (0.2574) data time 0.0007 (0.0017) model time 0.2704 (0.2557) loss 5.9822 (5.9879) grad_norm 2.1963 (inf) loss_scale 2048.0000 (6872.9492) mem 9658MB [2024-07-31 08:42:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][540/625] eta 0:00:21 lr 0.001461 wd 0.0500 time 0.2522 (0.2574) data time 0.0009 (0.0016) model time 0.2513 (0.2556) loss 6.0306 (5.9927) grad_norm 1.3567 (inf) loss_scale 2048.0000 (6783.7634) mem 9658MB [2024-07-31 08:42:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][550/625] eta 0:00:19 lr 0.001461 wd 0.0500 time 0.2545 (0.2573) data time 0.0010 (0.0016) model time 0.2535 (0.2556) loss 4.7599 (5.9891) grad_norm 1.7023 (inf) loss_scale 2048.0000 (6697.8149) mem 9658MB [2024-07-31 08:42:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][560/625] eta 0:00:16 lr 0.001460 wd 0.0500 time 0.2571 (0.2573) data time 0.0006 (0.0016) model time 0.2565 (0.2556) loss 6.0482 (5.9947) grad_norm 2.2532 (inf) loss_scale 2048.0000 (6614.9305) mem 9658MB [2024-07-31 08:42:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][570/625] eta 0:00:14 lr 0.001460 wd 0.0500 time 0.2561 (0.2573) data time 0.0010 (0.0016) model time 0.2552 (0.2556) loss 4.6596 (5.9980) grad_norm 1.5796 (inf) loss_scale 2048.0000 (6534.9492) mem 9658MB [2024-07-31 08:42:44 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [117/300][580/625] eta 0:00:11 lr 0.001460 wd 0.0500 time 0.2517 (0.2572) data time 0.0010 (0.0016) model time 0.2507 (0.2556) loss 5.2771 (5.9991) grad_norm 4.8290 (inf) loss_scale 2048.0000 (6457.7212) mem 9658MB [2024-07-31 08:42:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][590/625] eta 0:00:09 lr 0.001460 wd 0.0500 time 0.2586 (0.2572) data time 0.0006 (0.0016) model time 0.2579 (0.2556) loss 5.4820 (6.0013) grad_norm 1.2984 (inf) loss_scale 2048.0000 (6383.1066) mem 9658MB [2024-07-31 08:42:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][600/625] eta 0:00:06 lr 0.001460 wd 0.0500 time 0.2534 (0.2572) data time 0.0009 (0.0016) model time 0.2525 (0.2556) loss 6.8823 (5.9995) grad_norm 1.1506 (inf) loss_scale 2048.0000 (6310.9750) mem 9658MB [2024-07-31 08:42:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][610/625] eta 0:00:03 lr 0.001460 wd 0.0500 time 0.2525 (0.2572) data time 0.0004 (0.0016) model time 0.2521 (0.2556) loss 6.1331 (5.9919) grad_norm 1.6125 (inf) loss_scale 2048.0000 (6241.2046) mem 9658MB [2024-07-31 08:42:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][620/625] eta 0:00:01 lr 0.001460 wd 0.0500 time 0.2525 (0.2571) data time 0.0003 (0.0016) model time 0.2522 (0.2555) loss 4.5932 (5.9892) grad_norm 1.1003 (inf) loss_scale 2048.0000 (6173.6812) mem 9658MB [2024-07-31 08:42:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 117 training takes 0:02:40 [2024-07-31 08:42:55 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-31 08:42:56 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-31 08:42:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.459 (0.459) Loss 0.6489 (0.6489) Acc@1 86.572 (86.572) Acc@5 97.949 (97.949) Mem 9658MB [2024-07-31 08:42:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.093) Loss 1.0635 (0.7918) Acc@1 75.537 (83.261) Acc@5 93.555 (96.600) Mem 9658MB [2024-07-31 08:42:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.075) Loss 1.1621 (0.9417) Acc@1 73.193 (79.522) Acc@5 93.066 (94.859) Mem 9658MB [2024-07-31 08:42:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.233 Acc@5 94.832 [2024-07-31 08:42:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.2% [2024-07-31 08:42:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.23% [2024-07-31 08:42:57 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-31 08:42:59 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-31 08:43:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.406 (0.406) Loss 0.5762 (0.5762) Acc@1 88.379 (88.379) Acc@5 98.438 (98.438) Mem 9658MB [2024-07-31 08:43:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.087) Loss 0.9419 (0.7214) Acc@1 77.637 (84.544) Acc@5 94.971 (97.181) Mem 9658MB [2024-07-31 08:43:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.072) Loss 1.0889 (0.8612) Acc@1 73.291 (80.829) Acc@5 93.408 (95.580) Mem 9658MB [2024-07-31 08:43:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.500 Acc@5 95.563 [2024-07-31 08:43:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.5% [2024-07-31 08:43:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.50% [2024-07-31 08:43:01 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-31 08:43:01 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-31 08:43:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][0/625] eta 0:07:54 lr 0.001459 wd 0.0500 time 0.7598 (0.7598) data time 0.5177 (0.5177) model time 0.0000 (0.0000) loss 6.7967 (6.7967) grad_norm 1.3635 (1.3635) loss_scale 2048.0000 (2048.0000) mem 9657MB [2024-07-31 08:43:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][10/625] eta 0:03:05 lr 0.001459 wd 0.0500 time 0.2508 (0.3009) data time 0.0007 (0.0479) model time 0.0000 (0.0000) loss 6.0784 (5.8922) grad_norm 1.3670 (1.4028) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:43:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][20/625] eta 0:02:48 lr 0.001459 wd 0.0500 time 0.2567 (0.2792) data time 0.0008 (0.0257) model time 0.0000 (0.0000) loss 6.2426 (5.9412) grad_norm 1.7381 (1.5393) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:43:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][30/625] eta 0:02:41 lr 0.001459 wd 0.0500 time 0.2527 (0.2712) data time 0.0008 (0.0177) model time 0.0000 (0.0000) loss 5.2505 (5.9355) grad_norm 1.6632 (1.7120) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:43:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][40/625] eta 0:02:36 lr 0.001459 wd 0.0500 time 0.2681 (0.2677) data time 0.0008 (0.0136) model time 0.0000 (0.0000) loss 6.1101 (5.9620) grad_norm 2.0911 (1.7661) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:43:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][50/625] eta 0:02:32 lr 0.001459 wd 0.0500 time 0.2608 (0.2655) data time 0.0010 (0.0111) model time 0.0000 (0.0000) loss 5.1446 (5.8845) grad_norm 1.2089 (1.7838) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:43:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][60/625] eta 0:02:29 lr 0.001459 wd 0.0500 time 0.2599 (0.2640) 
data time 0.0010 (0.0095) model time 0.2590 (0.2551) loss 4.3118 (5.8947) grad_norm 1.2673 (1.7502) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:43:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][70/625] eta 0:02:25 lr 0.001458 wd 0.0500 time 0.2575 (0.2627) data time 0.0008 (0.0083) model time 0.2567 (0.2543) loss 6.6397 (5.8796) grad_norm 3.2672 (1.8375) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:43:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][80/625] eta 0:02:22 lr 0.001458 wd 0.0500 time 0.2547 (0.2616) data time 0.0007 (0.0074) model time 0.2540 (0.2540) loss 6.7602 (5.9309) grad_norm 1.4433 (1.8351) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:43:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][90/625] eta 0:02:19 lr 0.001458 wd 0.0500 time 0.2550 (0.2609) data time 0.0009 (0.0067) model time 0.2541 (0.2539) loss 7.0315 (5.9354) grad_norm 1.4000 (1.8406) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:43:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][100/625] eta 0:02:16 lr 0.001458 wd 0.0500 time 0.2468 (0.2602) data time 0.0009 (0.0061) model time 0.2459 (0.2538) loss 6.0466 (5.9883) grad_norm 1.7032 (1.8386) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:43:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][110/625] eta 0:02:13 lr 0.001458 wd 0.0500 time 0.2569 (0.2597) data time 0.0009 (0.0057) model time 0.2560 (0.2538) loss 6.6789 (5.9898) grad_norm 1.1876 (1.8200) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:43:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][120/625] eta 0:02:10 lr 0.001458 wd 0.0500 time 0.2515 (0.2593) data time 0.0014 (0.0053) model time 0.2500 (0.2538) loss 5.4642 (5.9417) grad_norm 2.0875 (1.8280) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 
08:43:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][130/625] eta 0:02:08 lr 0.001457 wd 0.0500 time 0.2529 (0.2589) data time 0.0008 (0.0050) model time 0.2521 (0.2537) loss 6.9649 (5.9364) grad_norm 1.7893 (1.8331) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:43:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][140/625] eta 0:02:05 lr 0.001457 wd 0.0500 time 0.2510 (0.2588) data time 0.0009 (0.0047) model time 0.2501 (0.2539) loss 6.6249 (5.9605) grad_norm 1.5141 (1.8195) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:43:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][150/625] eta 0:02:02 lr 0.001457 wd 0.0500 time 0.2536 (0.2585) data time 0.0008 (0.0044) model time 0.2528 (0.2539) loss 7.0002 (5.9661) grad_norm 1.3942 (1.8110) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:43:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][160/625] eta 0:02:00 lr 0.001457 wd 0.0500 time 0.2522 (0.2583) data time 0.0009 (0.0042) model time 0.2513 (0.2540) loss 6.3822 (5.9732) grad_norm 2.1832 (1.8029) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:43:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][170/625] eta 0:01:57 lr 0.001457 wd 0.0500 time 0.2521 (0.2581) data time 0.0006 (0.0040) model time 0.2515 (0.2540) loss 6.4770 (5.9731) grad_norm 2.0597 (1.8151) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:43:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][180/625] eta 0:01:54 lr 0.001457 wd 0.0500 time 0.2515 (0.2579) data time 0.0009 (0.0038) model time 0.2506 (0.2539) loss 6.7755 (5.9776) grad_norm 1.6686 (1.8068) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:43:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][190/625] eta 0:01:52 lr 0.001456 wd 0.0500 time 0.2576 (0.2578) data 
time 0.0008 (0.0037) model time 0.2568 (0.2540) loss 7.2651 (5.9697) grad_norm 1.9053 (1.8122) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:43:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][200/625] eta 0:01:49 lr 0.001456 wd 0.0500 time 0.2531 (0.2576) data time 0.0009 (0.0036) model time 0.2521 (0.2540) loss 5.5571 (5.9549) grad_norm 1.2291 (1.7942) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:43:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][210/625] eta 0:01:46 lr 0.001456 wd 0.0500 time 0.2487 (0.2575) data time 0.0007 (0.0034) model time 0.2480 (0.2540) loss 5.6452 (5.9679) grad_norm 1.1016 (1.7721) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:43:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][220/625] eta 0:01:44 lr 0.001456 wd 0.0500 time 0.2547 (0.2583) data time 0.0011 (0.0033) model time 0.2536 (0.2551) loss 7.3422 (5.9773) grad_norm 1.3920 (1.7771) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:44:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][230/625] eta 0:01:41 lr 0.001456 wd 0.0500 time 0.2540 (0.2581) data time 0.0009 (0.0032) model time 0.2531 (0.2550) loss 6.2574 (5.9913) grad_norm 1.1271 (1.7737) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:44:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][240/625] eta 0:01:39 lr 0.001456 wd 0.0500 time 0.2524 (0.2580) data time 0.0007 (0.0031) model time 0.2517 (0.2550) loss 5.4104 (5.9820) grad_norm 1.7130 (1.7776) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:44:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][250/625] eta 0:01:36 lr 0.001455 wd 0.0500 time 0.2492 (0.2578) data time 0.0007 (0.0031) model time 0.2485 (0.2549) loss 6.1214 (5.9898) grad_norm 1.7386 (1.7670) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 
08:44:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][260/625] eta 0:01:34 lr 0.001455 wd 0.0500 time 0.2519 (0.2577) data time 0.0009 (0.0030) model time 0.2510 (0.2548) loss 6.6402 (5.9834) grad_norm 1.5050 (1.7740) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:44:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][270/625] eta 0:01:31 lr 0.001455 wd 0.0500 time 0.2507 (0.2576) data time 0.0010 (0.0029) model time 0.2497 (0.2548) loss 6.4801 (5.9918) grad_norm 1.7069 (1.7823) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:44:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][280/625] eta 0:01:28 lr 0.001455 wd 0.0500 time 0.2543 (0.2575) data time 0.0018 (0.0028) model time 0.2525 (0.2547) loss 7.1631 (5.9984) grad_norm 1.5099 (1.7955) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:44:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][290/625] eta 0:01:26 lr 0.001455 wd 0.0500 time 0.2509 (0.2574) data time 0.0009 (0.0028) model time 0.2500 (0.2546) loss 5.6916 (6.0203) grad_norm 1.5663 (1.8010) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:44:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][300/625] eta 0:01:23 lr 0.001455 wd 0.0500 time 0.2595 (0.2573) data time 0.0010 (0.0027) model time 0.2585 (0.2547) loss 6.9244 (6.0244) grad_norm 1.6416 (1.7928) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:44:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][310/625] eta 0:01:21 lr 0.001455 wd 0.0500 time 0.2539 (0.2573) data time 0.0009 (0.0027) model time 0.2530 (0.2547) loss 6.5884 (6.0213) grad_norm 1.8532 (1.8248) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:44:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][320/625] eta 0:01:18 lr 0.001454 wd 0.0500 time 0.2539 (0.2573) data 
time 0.0010 (0.0026) model time 0.2529 (0.2547) loss 4.9723 (6.0225) grad_norm 2.5720 (1.8377) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:44:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][330/625] eta 0:01:15 lr 0.001454 wd 0.0500 time 0.2547 (0.2572) data time 0.0008 (0.0026) model time 0.2539 (0.2547) loss 4.8851 (6.0153) grad_norm 1.4254 (1.8303) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:44:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][340/625] eta 0:01:13 lr 0.001454 wd 0.0500 time 0.2541 (0.2571) data time 0.0009 (0.0025) model time 0.2532 (0.2547) loss 6.9544 (6.0155) grad_norm 1.3014 (1.8288) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:44:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][350/625] eta 0:01:10 lr 0.001454 wd 0.0500 time 0.2642 (0.2571) data time 0.0010 (0.0025) model time 0.2632 (0.2547) loss 6.3384 (6.0300) grad_norm 2.3930 (1.8397) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:44:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][360/625] eta 0:01:08 lr 0.001454 wd 0.0500 time 0.2575 (0.2571) data time 0.0006 (0.0024) model time 0.2569 (0.2547) loss 6.6885 (6.0423) grad_norm 1.9109 (1.8469) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:44:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][370/625] eta 0:01:05 lr 0.001454 wd 0.0500 time 0.2437 (0.2570) data time 0.0007 (0.0024) model time 0.2430 (0.2547) loss 6.5633 (6.0429) grad_norm 1.6571 (1.8403) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:44:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][380/625] eta 0:01:02 lr 0.001453 wd 0.0500 time 0.2543 (0.2570) data time 0.0010 (0.0024) model time 0.2534 (0.2547) loss 5.4971 (6.0411) grad_norm 1.6299 (1.8427) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 
08:44:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][390/625] eta 0:01:00 lr 0.001453 wd 0.0500 time 0.2518 (0.2570) data time 0.0009 (0.0023) model time 0.2509 (0.2547) loss 5.7351 (6.0391) grad_norm 2.4038 (1.8415) loss_scale 2048.0000 (2048.0000) mem 9658MB [2024-07-31 08:44:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-31 08:44:42 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-31 08:44:43 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-31 08:46:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-31 08:46:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-31 08:47:05 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-31 08:47:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-31 08:47:17 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... 
[2024-07-31 08:47:17 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-31 08:47:17 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-31 08:47:17 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 118) [2024-07-31 08:47:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-31 08:47:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][400/625] eta 0:04:35 lr 0.001453 wd 0.0500 time 0.2621 (1.2228) data time 0.0008 (0.0942) model time 0.2613 (1.1286) loss 6.0586 (6.6149) grad_norm 2.5396 (2.2246) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:47:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][410/625] eta 0:02:34 lr 0.001453 wd 0.0500 time 0.2646 (0.7166) data time 0.0011 (0.0452) model time 0.2634 (0.6713) loss 6.1708 (6.3589) grad_norm 1.6073 (2.1530) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:47:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][420/625] eta 0:01:54 lr 0.001453 wd 0.0500 time 0.2566 (0.5593) data time 0.0009 (0.0300) model time 0.2557 (0.5293) loss 6.6326 (6.3888) grad_norm 1.7152 (2.0609) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:47:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][430/625] eta 0:01:34 lr 0.001453 wd 0.0500 time 0.2625 (0.4826) data time 0.0010 (0.0226) model time 0.2615 (0.4601) loss 6.3856 (6.3003) grad_norm 2.7489 (2.1802) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:47:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][440/625] eta 0:01:20 lr 0.001452 wd 0.0500 time 0.2604 (0.4373) data time 0.0010 (0.0182) model time 0.2594 (0.4191) loss 5.4666 (6.2478) grad_norm 1.7814 (2.1924) loss_scale 2048.0000 (2048.0000) mem 9656MB 
[2024-07-31 08:47:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][450/625] eta 0:01:11 lr 0.001452 wd 0.0500 time 0.2597 (0.4074) data time 0.0008 (0.0153) model time 0.2589 (0.3922) loss 4.2439 (6.1942) grad_norm 2.2983 (2.1417) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:47:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][460/625] eta 0:01:03 lr 0.001452 wd 0.0500 time 0.2598 (0.3863) data time 0.0010 (0.0132) model time 0.2588 (0.3731) loss 5.7067 (6.1662) grad_norm 1.6473 (2.0948) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:47:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][470/625] eta 0:00:57 lr 0.001452 wd 0.0500 time 0.3038 (0.3710) data time 0.0010 (0.0117) model time 0.3028 (0.3594) loss 5.5107 (6.1020) grad_norm 1.5361 (2.0631) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:47:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][480/625] eta 0:00:52 lr 0.001452 wd 0.0500 time 0.2589 (0.3588) data time 0.0008 (0.0105) model time 0.2581 (0.3484) loss 6.0786 (6.0763) grad_norm 1.8135 (2.0883) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:47:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][490/625] eta 0:00:47 lr 0.001452 wd 0.0500 time 0.2616 (0.3491) data time 0.0010 (0.0097) model time 0.2606 (0.3394) loss 5.5976 (6.1093) grad_norm 1.1424 (2.0795) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:47:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][500/625] eta 0:00:42 lr 0.001452 wd 0.0500 time 0.2615 (0.3412) data time 0.0007 (0.0089) model time 0.2608 (0.3323) loss 6.5962 (6.1277) grad_norm 1.2025 (2.0478) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:48:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][510/625] eta 0:00:38 lr 0.001451 wd 0.0500 time 0.2796 
(0.3347) data time 0.0010 (0.0082) model time 0.2786 (0.3265) loss 7.0155 (6.1155) grad_norm 1.5288 (2.0417) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:48:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][520/625] eta 0:00:34 lr 0.001451 wd 0.0500 time 0.2606 (0.3289) data time 0.0008 (0.0077) model time 0.2598 (0.3213) loss 5.8954 (6.1074) grad_norm 1.8649 (2.0301) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:48:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][530/625] eta 0:00:30 lr 0.001451 wd 0.0500 time 0.2611 (0.3243) data time 0.0008 (0.0072) model time 0.2604 (0.3171) loss 6.5133 (6.1166) grad_norm 1.3328 (2.0466) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:48:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][540/625] eta 0:00:27 lr 0.001451 wd 0.0500 time 0.2556 (0.3201) data time 0.0010 (0.0068) model time 0.2546 (0.3134) loss 5.6429 (6.1015) grad_norm 1.3252 (2.0232) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:48:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][550/625] eta 0:00:23 lr 0.001451 wd 0.0500 time 0.2669 (0.3166) data time 0.0008 (0.0064) model time 0.2661 (0.3102) loss 6.9640 (6.1147) grad_norm 1.1150 (2.0053) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:48:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][560/625] eta 0:00:20 lr 0.001451 wd 0.0500 time 0.2630 (0.3134) data time 0.0011 (0.0061) model time 0.2620 (0.3073) loss 6.3914 (6.1265) grad_norm 1.9290 (1.9771) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:48:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][570/625] eta 0:00:17 lr 0.001450 wd 0.0500 time 0.2622 (0.3107) data time 0.0009 (0.0058) model time 0.2613 (0.3048) loss 5.6226 (6.0960) grad_norm 2.8269 (1.9766) loss_scale 2048.0000 (2048.0000) mem 9656MB 
[2024-07-31 08:48:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][580/625] eta 0:00:13 lr 0.001450 wd 0.0500 time 0.2678 (0.3081) data time 0.0008 (0.0056) model time 0.2670 (0.3025) loss 6.7774 (6.0938) grad_norm 2.9059 (1.9605) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:48:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][590/625] eta 0:00:10 lr 0.001450 wd 0.0500 time 0.2630 (0.3059) data time 0.0010 (0.0053) model time 0.2621 (0.3005) loss 5.0887 (6.0679) grad_norm 2.6672 (1.9784) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:48:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][600/625] eta 0:00:07 lr 0.001450 wd 0.0500 time 0.2619 (0.3039) data time 0.0009 (0.0051) model time 0.2610 (0.2988) loss 5.5675 (6.0622) grad_norm 1.0316 (1.9653) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:48:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][610/625] eta 0:00:04 lr 0.001450 wd 0.0500 time 0.2636 (0.3022) data time 0.0005 (0.0051) model time 0.2630 (0.2971) loss 6.6102 (6.0570) grad_norm 1.9848 (1.9708) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:48:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][620/625] eta 0:00:01 lr 0.001450 wd 0.0500 time 0.2598 (0.3005) data time 0.0005 (0.0049) model time 0.2594 (0.2956) loss 5.3166 (6.0595) grad_norm 1.4026 (1.9676) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:48:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 118 training takes 0:01:09 [2024-07-31 08:48:31 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-31 08:48:35 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-31 08:48:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.451 (0.451) Loss 0.6406 (0.6406) Acc@1 87.793 (87.793) Acc@5 98.096 (98.096) Mem 9656MB [2024-07-31 08:48:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 1.0771 (0.8248) Acc@1 76.074 (82.950) Acc@5 94.092 (96.649) Mem 9656MB [2024-07-31 08:48:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.2773 (0.9750) Acc@1 70.654 (79.016) Acc@5 91.992 (94.741) Mem 9656MB [2024-07-31 08:48:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.771 Acc@5 94.714 [2024-07-31 08:48:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.8% [2024-07-31 08:48:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.868 (0.868) Loss 0.5767 (0.5767) Acc@1 88.428 (88.428) Acc@5 98.438 (98.438) Mem 9656MB [2024-07-31 08:48:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.136) Loss 0.9429 (0.7214) Acc@1 77.783 (84.601) Acc@5 94.971 (97.212) Mem 9656MB [2024-07-31 08:48:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.098) Loss 1.0859 (0.8609) Acc@1 73.389 (80.876) Acc@5 93.359 (95.615) Mem 9656MB [2024-07-31 08:48:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.550 Acc@5 95.595 [2024-07-31 08:48:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.6% [2024-07-31 08:48:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.55% [2024-07-31 08:48:41 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-07-31 08:48:43 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-07-31 08:48:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][0/625] eta 0:08:21 lr 0.001450 wd 0.0500 time 0.8018 (0.8018) data time 0.4733 (0.4733) model time 0.0000 (0.0000) loss 5.2002 (5.2002) grad_norm 1.9721 (1.9721) loss_scale 2048.0000 (2048.0000) mem 9652MB [2024-07-31 08:48:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][10/625] eta 0:03:11 lr 0.001449 wd 0.0500 time 0.2601 (0.3111) data time 0.0008 (0.0439) model time 0.0000 (0.0000) loss 4.9795 (6.0274) grad_norm 1.9844 (1.7199) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 08:48:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][20/625] eta 0:02:55 lr 0.001449 wd 0.0500 time 0.2604 (0.2897) data time 0.0008 (0.0235) model time 0.0000 (0.0000) loss 4.6147 (5.8659) grad_norm 1.4009 (1.6933) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 08:48:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][30/625] eta 0:02:48 lr 0.001449 wd 0.0500 time 0.2608 (0.2831) data time 0.0008 (0.0163) model time 0.0000 (0.0000) loss 5.8899 (5.7553) grad_norm 1.5929 (1.6217) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 08:48:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][40/625] eta 0:02:42 lr 0.001449 wd 0.0500 time 0.2599 (0.2786) data time 0.0011 (0.0125) model time 0.0000 (0.0000) loss 6.0576 (5.7602) grad_norm 1.8084 (1.6575) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 08:48:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][50/625] eta 0:02:38 lr 0.001449 wd 0.0500 time 0.2609 (0.2758) data time 0.0008 (0.0103) model time 0.0000 (0.0000) loss 4.7543 (5.7867) grad_norm 1.3093 (1.7071) loss_scale 2048.0000 (2048.0000) mem 9655MB 
[2024-07-31 08:49:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][60/625] eta 0:02:35 lr 0.001449 wd 0.0500 time 0.2613 (0.2757) data time 0.0008 (0.0088) model time 0.2605 (0.2742) loss 5.0853 (5.7834) grad_norm 1.8098 (1.7118) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 08:49:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][70/625] eta 0:02:32 lr 0.001448 wd 0.0500 time 0.2611 (0.2741) data time 0.0012 (0.0077) model time 0.2599 (0.2687) loss 6.5564 (5.7640) grad_norm 1.1583 (1.7562) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 08:49:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][80/625] eta 0:02:28 lr 0.001448 wd 0.0500 time 0.2599 (0.2727) data time 0.0011 (0.0069) model time 0.2588 (0.2662) loss 7.0350 (5.8222) grad_norm 1.2586 (1.7628) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 08:49:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][90/625] eta 0:02:25 lr 0.001448 wd 0.0500 time 0.2630 (0.2720) data time 0.0009 (0.0063) model time 0.2621 (0.2661) loss 6.8277 (5.8881) grad_norm 1.2533 (1.7481) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 08:49:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][100/625] eta 0:02:22 lr 0.001448 wd 0.0500 time 0.2658 (0.2715) data time 0.0007 (0.0058) model time 0.2651 (0.2659) loss 6.0018 (5.8832) grad_norm 2.8225 (1.7694) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 08:49:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][110/625] eta 0:02:19 lr 0.001448 wd 0.0500 time 0.2651 (0.2709) data time 0.0011 (0.0054) model time 0.2640 (0.2656) loss 6.6559 (5.9058) grad_norm 2.0598 (1.8095) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 08:49:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][120/625] eta 0:02:16 lr 0.001448 wd 0.0500 time 0.2631 
(0.2705) data time 0.0011 (0.0050) model time 0.2620 (0.2655) loss 6.7382 (5.9206) grad_norm 2.8588 (1.8258) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 08:49:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][130/625] eta 0:02:14 lr 0.001447 wd 0.0500 time 0.2488 (0.2724) data time 0.0010 (0.0048) model time 0.2478 (0.2690) loss 4.2910 (5.9064) grad_norm 2.0650 (1.8193) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 08:49:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][140/625] eta 0:02:11 lr 0.001447 wd 0.0500 time 0.2652 (0.2719) data time 0.0011 (0.0045) model time 0.2641 (0.2684) loss 6.7611 (5.9157) grad_norm 1.2338 (1.8544) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 08:49:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][150/625] eta 0:02:08 lr 0.001447 wd 0.0500 time 0.2623 (0.2713) data time 0.0008 (0.0043) model time 0.2615 (0.2677) loss 4.8244 (5.8873) grad_norm 1.1327 (1.8221) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 08:49:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][160/625] eta 0:02:05 lr 0.001447 wd 0.0500 time 0.2795 (0.2709) data time 0.0010 (0.0041) model time 0.2785 (0.2674) loss 6.2695 (5.8812) grad_norm 2.0973 (1.8309) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 08:49:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][170/625] eta 0:02:03 lr 0.001447 wd 0.0500 time 0.2649 (0.2704) data time 0.0009 (0.0039) model time 0.2641 (0.2669) loss 6.3215 (5.9024) grad_norm 1.5332 (1.8208) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 08:49:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-31 08:49:30 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... 
[2024-07-31 08:49:32 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-31 08:51:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json [2024-07-31 08:51:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300 [2024-07-31 08:51:52 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-31 08:52:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth [2024-07-31 08:52:09 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth.................... [2024-07-31 08:52:09 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model: [2024-07-31 08:52:09 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema: [2024-07-31 08:52:09 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 119) [2024-07-31 08:52:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-31 08:52:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][180/625] eta 0:07:11 lr 0.001447 wd 0.0500 time 0.2514 (0.9699) data time 0.0008 (0.0953) model time 0.2506 (0.8746) loss 6.4048 (6.6718) grad_norm 1.2190 (1.7516) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:52:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][190/625] eta 0:04:25 lr 0.001446 wd 0.0500 time 0.2523 (0.6111) data time 0.0008 (0.0482) model time 0.2515 (0.5629) loss 6.6611 (6.3940) 
grad_norm 1.1263 (1.7649) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:52:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][200/625] eta 0:03:28 lr 0.001446 wd 0.0500 time 0.2507 (0.4913) data time 0.0011 (0.0325) model time 0.2497 (0.4588) loss 6.3669 (6.4144) grad_norm 1.5398 (1.7281) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:52:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][210/625] eta 0:02:59 lr 0.001446 wd 0.0500 time 0.2621 (0.4326) data time 0.0006 (0.0246) model time 0.2614 (0.4080) loss 5.5603 (6.2591) grad_norm 2.7555 (1.7523) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:52:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][220/625] eta 0:02:40 lr 0.001446 wd 0.0500 time 0.2521 (0.3965) data time 0.0008 (0.0199) model time 0.2513 (0.3766) loss 4.8762 (6.1910) grad_norm 1.5375 (1.8830) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:52:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][230/625] eta 0:02:27 lr 0.001446 wd 0.0500 time 0.2529 (0.3725) data time 0.0011 (0.0167) model time 0.2518 (0.3558) loss 6.2213 (6.1725) grad_norm 2.5125 (1.8720) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:52:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][240/625] eta 0:02:16 lr 0.001446 wd 0.0500 time 0.2522 (0.3555) data time 0.0006 (0.0144) model time 0.2516 (0.3410) loss 4.7033 (6.1204) grad_norm 3.2622 (1.9581) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:52:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][250/625] eta 0:02:08 lr 0.001446 wd 0.0500 time 0.2572 (0.3430) data time 0.0009 (0.0128) model time 0.2564 (0.3302) loss 6.9703 (6.1242) grad_norm 2.9811 (1.9799) loss_scale 2048.0000 (2048.0000) mem 9656MB [2024-07-31 08:52:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO 
Train: [119/300][260/625] eta 0:02:01 lr 0.001445 wd 0.0500 time 0.2537 (0.3330) data time 0.0008 (0.0114) model time 0.2530 (0.3215) loss 6.0915 (6.0881) grad_norm 1.5747 (1.9622) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:52:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][270/625] eta 0:01:55 lr 0.001445 wd 0.0500 time 0.2559 (0.3251) data time 0.0010 (0.0104) model time 0.2549 (0.3147) loss 7.1031 (6.1068) grad_norm 1.7293 (1.9420) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:52:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][280/625] eta 0:01:49 lr 0.001445 wd 0.0500 time 0.2494 (0.3187) data time 0.0009 (0.0095) model time 0.2485 (0.3092) loss 6.0651 (6.1169) grad_norm 1.4054 (1.9284) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:52:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][290/625] eta 0:01:44 lr 0.001445 wd 0.0500 time 0.2562 (0.3133) data time 0.0007 (0.0088) model time 0.2555 (0.3045) loss 6.4931 (6.1182) grad_norm 1.5287 (1.9025) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:52:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][300/625] eta 0:01:40 lr 0.001445 wd 0.0500 time 0.2559 (0.3088) data time 0.0007 (0.0082) model time 0.2552 (0.3006) loss 6.1970 (6.0956) grad_norm 1.7929 (1.8874) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:52:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][310/625] eta 0:01:36 lr 0.001445 wd 0.0500 time 0.2634 (0.3049) data time 0.0007 (0.0077) model time 0.2627 (0.2972) loss 4.3721 (6.0949) grad_norm 1.5884 (1.8890) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:52:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][320/625] eta 0:01:31 lr 0.001444 wd 0.0500 time 0.2539 (0.3016) data time 0.0009 (0.0072) model time 0.2530 (0.2943) loss 6.9084 (6.0898) grad_norm 1.2447 (1.8899) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:53:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][330/625] eta 0:01:28 lr 0.001444 wd 0.0500 time 0.2561 (0.2986) data time 0.0008 (0.0068) model time 0.2553 (0.2918) loss 7.1112 (6.0959) grad_norm 1.9539 (1.8814) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:53:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][340/625] eta 0:01:24 lr 0.001444 wd 0.0500 time 0.2564 (0.2961) data time 0.0008 (0.0065) model time 0.2556 (0.2896) loss 4.9207 (6.1021) grad_norm 1.3368 (1.8867) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:53:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][350/625] eta 0:01:20 lr 0.001444 wd 0.0500 time 0.2540 (0.2937) data time 0.0007 (0.0062) model time 0.2533 (0.2875) loss 5.2253 (6.0666) grad_norm 1.3728 (1.8641) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:53:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][360/625] eta 0:01:17 lr 0.001444 wd 0.0500 time 0.2505 (0.2917) data time 0.0007 (0.0059) model time 0.2498 (0.2858) loss 5.1765 (6.0727) grad_norm 2.8493 (1.8536) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:53:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][370/625] eta 0:01:13 lr 0.001444 wd 0.0500 time 0.2541 (0.2899) data time 0.0010 (0.0057) model time 0.2531 (0.2842) loss 6.4001 (6.0567) grad_norm 1.2683 (1.8337) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:53:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][380/625] eta 0:01:10 lr 0.001443 wd 0.0500 time 0.2557 (0.2882) data time 0.0007 (0.0054) model time 0.2550 (0.2828) loss 6.3760 (6.0389) grad_norm 2.1840 (1.8324) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:53:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][390/625] eta 0:01:07 lr 0.001443 wd 0.0500 time 0.2542 (0.2867) data time 0.0009 (0.0052) model time 0.2533 (0.2814) loss 6.0480 (6.0285) grad_norm 1.1553 (1.8449) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:53:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][400/625] eta 0:01:04 lr 0.001443 wd 0.0500 time 0.2507 (0.2853) data time 0.0009 (0.0051) model time 0.2498 (0.2803) loss 6.0101 (6.0405) grad_norm 1.4691 (1.8460) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:53:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][410/625] eta 0:01:01 lr 0.001443 wd 0.0500 time 0.2524 (0.2841) data time 0.0010 (0.0049) model time 0.2514 (0.2792) loss 7.0782 (6.0380) grad_norm 2.5026 (1.8527) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:53:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][420/625] eta 0:00:58 lr 0.001443 wd 0.0500 time 0.2561 (0.2830) data time 0.0006 (0.0047) model time 0.2555 (0.2782) loss 5.0582 (6.0316) grad_norm 1.7619 (1.8605) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:53:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][430/625] eta 0:00:54 lr 0.001443 wd 0.0500 time 0.2512 (0.2819) data time 0.0008 (0.0046) model time 0.2503 (0.2773) loss 5.2625 (6.0204) grad_norm 1.6361 (1.8940) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:53:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][440/625] eta 0:00:51 lr 0.001443 wd 0.0500 time 0.2525 (0.2809) data time 0.0008 (0.0044) model time 0.2518 (0.2764) loss 7.1221 (6.0049) grad_norm 1.9802 (1.9088) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:53:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][450/625] eta 0:00:49 lr 0.001442 wd 0.0500 time 0.2533 (0.2800) data time 0.0009 (0.0043) model time 0.2524 (0.2757) loss 7.1625 (6.0199) grad_norm 2.1176 (1.9167) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:53:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][460/625] eta 0:00:46 lr 0.001442 wd 0.0500 time 0.2530 (0.2792) data time 0.0008 (0.0042) model time 0.2522 (0.2750) loss 5.7639 (6.0189) grad_norm 1.9327 (1.9159) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:53:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][470/625] eta 0:00:43 lr 0.001442 wd 0.0500 time 0.2562 (0.2783) data time 0.0009 (0.0041) model time 0.2553 (0.2742) loss 5.2487 (6.0048) grad_norm 1.4500 (1.9112) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:53:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][480/625] eta 0:00:40 lr 0.001442 wd 0.0500 time 0.2562 (0.2775) data time 0.0008 (0.0040) model time 0.2554 (0.2735) loss 5.5484 (6.0003) grad_norm 1.8561 (1.9088) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:53:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][490/625] eta 0:00:37 lr 0.001442 wd 0.0500 time 0.2620 (0.2769) data time 0.0016 (0.0039) model time 0.2604 (0.2730) loss 5.5837 (6.0197) grad_norm 1.5813 (1.9157) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:53:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][500/625] eta 0:00:34 lr 0.001442 wd 0.0500 time 0.2542 (0.2763) data time 0.0008 (0.0038) model time 0.2534 (0.2724) loss 5.8327 (6.0280) grad_norm 3.3875 (1.9273) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:53:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][510/625] eta 0:00:31 lr 0.001441 wd 0.0500 time 0.2556 (0.2756) data time 0.0008 (0.0037) model time 0.2548 (0.2719) loss 5.8133 (6.0288) grad_norm 1.4286 (1.9195) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:53:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][520/625] eta 0:00:28 lr 0.001441 wd 0.0500 time 0.2527 (0.2750) data time 0.0009 (0.0036) model time 0.2518 (0.2714) loss 5.8277 (6.0349) grad_norm 1.5442 (1.9125) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:53:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][530/625] eta 0:00:26 lr 0.001441 wd 0.0500 time 0.2526 (0.2745) data time 0.0006 (0.0036) model time 0.2520 (0.2709) loss 7.7473 (6.0365) grad_norm 1.4667 (1.9035) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:53:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][540/625] eta 0:00:23 lr 0.001441 wd 0.0500 time 0.2578 (0.2740) data time 0.0008 (0.0035) model time 0.2570 (0.2705) loss 6.7836 (6.0405) grad_norm 1.0622 (1.9046) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:53:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][550/625] eta 0:00:20 lr 0.001441 wd 0.0500 time 0.2588 (0.2735) data time 0.0009 (0.0034) model time 0.2579 (0.2701) loss 5.6954 (6.0368) grad_norm 2.2248 (1.9058) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:53:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][560/625] eta 0:00:17 lr 0.001441 wd 0.0500 time 0.2575 (0.2730) data time 0.0008 (0.0034) model time 0.2567 (0.2697) loss 4.8196 (6.0249) grad_norm 1.9461 (1.8987) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:54:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][570/625] eta 0:00:14 lr 0.001440 wd 0.0500 time 0.2563 (0.2726) data time 0.0009 (0.0033) model time 0.2554 (0.2693) loss 6.2886 (6.0329) grad_norm 1.6504 (1.8926) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:54:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][580/625] eta 0:00:12 lr 0.001440 wd 0.0500 time 0.2589 (0.2722) data time 0.0007 (0.0032) model time 0.2583 (0.2689) loss 5.9998 (6.0355) grad_norm 1.1403 (1.8802) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:54:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][590/625] eta 0:00:09 lr 0.001440 wd 0.0500 time 0.4972 (0.2724) data time 0.0007 (0.0032) model time 0.4965 (0.2692) loss 5.8392 (6.0300) grad_norm 3.0532 (1.8827) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:54:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][600/625] eta 0:00:06 lr 0.001440 wd 0.0500 time 0.2544 (0.2719) data time 0.0011 (0.0031) model time 0.2533 (0.2688) loss 6.0538 (6.0401) grad_norm 2.3005 (1.8782) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:54:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][610/625] eta 0:00:04 lr 0.001440 wd 0.0500 time 0.2525 (0.2715) data time 0.0005 (0.0031) model time 0.2520 (0.2684) loss 4.9324 (6.0439) grad_norm 2.2058 (1.8752) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:54:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][620/625] eta 0:00:01 lr 0.001440 wd 0.0500 time 0.2511 (0.2711) data time 0.0003 (0.0030) model time 0.2508 (0.2681) loss 6.3876 (6.0457) grad_norm 1.4574 (1.8870) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 08:54:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 119 training takes 0:02:03
[2024-07-31 08:54:16 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-31 08:54:18 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-31 08:54:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.384 (0.384) Loss 0.6650 (0.6650) Acc@1 87.744 (87.744) Acc@5 98.193 (98.193) Mem 9656MB
[2024-07-31 08:54:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.086) Loss 1.0742 (0.8272) Acc@1 77.051 (83.168) Acc@5 94.043 (96.751) Mem 9656MB
[2024-07-31 08:54:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.071) Loss 1.1982 (0.9756) Acc@1 73.535 (79.529) Acc@5 92.480 (95.015) Mem 9656MB
[2024-07-31 08:54:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.191 Acc@5 94.962
[2024-07-31 08:54:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.2%
[2024-07-31 08:54:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.723 (0.723) Loss 0.5771 (0.5771) Acc@1 88.477 (88.477) Acc@5 98.535 (98.535) Mem 9656MB
[2024-07-31 08:54:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.124) Loss 0.9399 (0.7216) Acc@1 77.881 (84.628) Acc@5 95.068 (97.257) Mem 9656MB
[2024-07-31 08:54:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.091) Loss 1.0850 (0.8609) Acc@1 73.535 (80.887) Acc@5 93.359 (95.640) Mem 9656MB
[2024-07-31 08:54:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.570 Acc@5 95.611
[2024-07-31 08:54:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.6%
[2024-07-31 08:54:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.57%
[2024-07-31 08:54:23 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-31 08:54:24 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-31 08:54:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][0/625] eta 0:07:25 lr 0.001440 wd 0.0500 time 0.7133 (0.7133) data time 0.3896 (0.3896) model time 0.0000 (0.0000) loss 5.1800 (5.1800) grad_norm 1.9091 (1.9091) loss_scale 2048.0000 (2048.0000) mem 9651MB
[2024-07-31 08:54:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][10/625] eta 0:03:02 lr 0.001439 wd 0.0500 time 0.2536 (0.2960) data time 0.0006 (0.0362) model time 0.0000 (0.0000) loss 5.7241 (5.4766) grad_norm 1.2878 (1.9861) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:54:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][20/625] eta 0:02:47 lr 0.001439 wd 0.0500 time 0.2585 (0.2768) data time 0.0007 (0.0195) model time 0.0000 (0.0000) loss 6.0303 (5.4956) grad_norm 2.9641 (1.9378) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:54:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][30/625] eta 0:02:40 lr 0.001439 wd 0.0500 time 0.2564 (0.2699) data time 0.0007 (0.0135) model time 0.0000 (0.0000) loss 6.9963 (5.7131) grad_norm 1.5365 (2.1544) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:54:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][40/625] eta 0:02:35 lr 0.001439 wd 0.0500 time 0.2539 (0.2663) data time 0.0009 (0.0104) model time 0.0000 (0.0000) loss 6.2043 (5.7955) grad_norm 1.2658 (2.0361) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:54:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][50/625] eta 0:02:31 lr 0.001439 wd 0.0500 time 0.2551 (0.2639) data time 0.0008 (0.0086) model time 0.0000 (0.0000) loss 6.5336 (5.8035) grad_norm 1.5908 (1.9209) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:54:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][60/625] eta 0:02:28 lr 0.001439 wd 0.0500 time 0.2520 (0.2625) data time 0.0010 (0.0073) model time 0.2511 (0.2543) loss 6.0249 (5.8877) grad_norm 2.2101 (1.9169) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:54:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][70/625] eta 0:02:25 lr 0.001438 wd 0.0500 time 0.2538 (0.2616) data time 0.0007 (0.0064) model time 0.2531 (0.2545) loss 6.4473 (5.8464) grad_norm 2.0049 (1.8964) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:54:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][80/625] eta 0:02:22 lr 0.001438 wd 0.0500 time 0.2560 (0.2608) data time 0.0007 (0.0058) model time 0.2553 (0.2546) loss 5.6071 (5.8431) grad_norm 3.0201 (1.9076) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:54:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][90/625] eta 0:02:19 lr 0.001438 wd 0.0500 time 0.2531 (0.2602) data time 0.0007 (0.0052) model time 0.2524 (0.2546) loss 5.0007 (5.8414) grad_norm 1.3972 (1.8721) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:54:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][100/625] eta 0:02:16 lr 0.001438 wd 0.0500 time 0.2521 (0.2598) data time 0.0008 (0.0048) model time 0.2513 (0.2547) loss 5.9220 (5.8936) grad_norm 1.6223 (1.8439) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:54:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][110/625] eta 0:02:13 lr 0.001438 wd 0.0500 time 0.2561 (0.2596) data time 0.0010 (0.0044) model time 0.2552 (0.2549) loss 6.1357 (5.9298) grad_norm 1.7489 (1.8169) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:54:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][120/625] eta 0:02:10 lr 0.001438 wd 0.0500 time 0.2559 (0.2593) data time 0.0010 (0.0042) model time 0.2549 (0.2550) loss 6.5002 (5.9296) grad_norm 2.0288 (1.8080) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:54:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][130/625] eta 0:02:09 lr 0.001437 wd 0.0500 time 0.5043 (0.2608) data time 0.0008 (0.0039) model time 0.5035 (0.2580) loss 6.1202 (5.9342) grad_norm 1.1414 (1.7994) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:55:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][140/625] eta 0:02:06 lr 0.001437 wd 0.0500 time 0.2536 (0.2604) data time 0.0006 (0.0037) model time 0.2530 (0.2574) loss 7.0800 (5.9503) grad_norm 2.2962 (1.7888) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:55:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][150/625] eta 0:02:03 lr 0.001437 wd 0.0500 time 0.2570 (0.2601) data time 0.0007 (0.0035) model time 0.2563 (0.2572) loss 6.4991 (5.9393) grad_norm 2.0771 (1.7733) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:55:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][160/625] eta 0:02:00 lr 0.001437 wd 0.0500 time 0.2483 (0.2598) data time 0.0008 (0.0033) model time 0.2475 (0.2569) loss 6.7093 (5.9444) grad_norm 1.4090 (1.7535) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:55:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][170/625] eta 0:01:58 lr 0.001437 wd 0.0500 time 0.2512 (0.2595) data time 0.0011 (0.0032) model time 0.2501 (0.2567) loss 6.1327 (5.9650) grad_norm 1.5318 (1.7601) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:55:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][180/625] eta 0:01:55 lr 0.001437 wd 0.0500 time 0.2574 (0.2593) data time 0.0008 (0.0031) model time 0.2565 (0.2566) loss 5.4068 (5.9634) grad_norm 1.1862 (1.7665) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:55:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][190/625] eta 0:01:52 lr 0.001437 wd 0.0500 time 0.2551 (0.2593) data time 0.0008 (0.0030) model time 0.2543 (0.2567) loss 5.8023 (5.9413) grad_norm 2.3966 (1.7661) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:55:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][200/625] eta 0:01:50 lr 0.001436 wd 0.0500 time 0.2563 (0.2591) data time 0.0007 (0.0029) model time 0.2556 (0.2566) loss 5.3262 (5.9450) grad_norm 1.8411 (1.7563) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:55:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][210/625] eta 0:01:47 lr 0.001436 wd 0.0500 time 0.2559 (0.2589) data time 0.0007 (0.0028) model time 0.2552 (0.2563) loss 4.7764 (5.9264) grad_norm 1.3766 (1.7512) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:55:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][220/625] eta 0:01:44 lr 0.001436 wd 0.0500 time 0.2565 (0.2587) data time 0.0008 (0.0027) model time 0.2557 (0.2563) loss 5.6595 (5.9421) grad_norm 5.7982 (1.7847) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:55:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][230/625] eta 0:01:42 lr 0.001436 wd 0.0500 time 0.2529 (0.2587) data time 0.0009 (0.0026) model time 0.2520 (0.2563) loss 6.1958 (5.9406) grad_norm 3.4252 (1.8679) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:55:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][240/625] eta 0:01:39 lr 0.001436 wd 0.0500 time 0.2550 (0.2586) data time 0.0009 (0.0025) model time 0.2541 (0.2562) loss 5.5675 (5.9428) grad_norm 2.3019 (1.8972) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:55:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][250/625] eta 0:01:36 lr 0.001436 wd 0.0500 time 0.2542 (0.2584) data time 0.0007 (0.0025) model time 0.2535 (0.2561) loss 6.2323 (5.9399) grad_norm 1.7065 (1.9143) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:55:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][260/625] eta 0:01:34 lr 0.001435 wd 0.0500 time 0.2590 (0.2584) data time 0.0008 (0.0024) model time 0.2583 (0.2561) loss 6.7700 (5.9324) grad_norm 2.0799 (1.9271) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:55:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][270/625] eta 0:01:31 lr 0.001435 wd 0.0500 time 0.2541 (0.2583) data time 0.0006 (0.0023) model time 0.2535 (0.2561) loss 6.1174 (5.9307) grad_norm 3.3283 (1.9244) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:55:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][280/625] eta 0:01:29 lr 0.001435 wd 0.0500 time 0.2490 (0.2581) data time 0.0007 (0.0023) model time 0.2483 (0.2560) loss 6.8870 (5.9441) grad_norm 2.7948 (1.9295) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:55:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][290/625] eta 0:01:26 lr 0.001435 wd 0.0500 time 0.2539 (0.2581) data time 0.0008 (0.0022) model time 0.2531 (0.2560) loss 6.1942 (5.9512) grad_norm 1.4136 (1.9475) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:55:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][300/625] eta 0:01:23 lr 0.001435 wd 0.0500 time 0.2623 (0.2580) data time 0.0007 (0.0022) model time 0.2616 (0.2559) loss 5.5670 (5.9430) grad_norm 1.2231 (1.9266) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:55:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][310/625] eta 0:01:21 lr 0.001435 wd 0.0500 time 0.2522 (0.2579) data time 0.0007 (0.0022) model time 0.2515 (0.2558) loss 5.9788 (5.9501) grad_norm 1.4171 (1.9111) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:55:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][320/625] eta 0:01:18 lr 0.001434 wd 0.0500 time 0.2585 (0.2578) data time 0.0010 (0.0021) model time 0.2575 (0.2557) loss 5.2271 (5.9604) grad_norm 7.7493 (1.9155) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:55:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][330/625] eta 0:01:16 lr 0.001434 wd 0.0500 time 0.2635 (0.2577) data time 0.0008 (0.0021) model time 0.2627 (0.2557) loss 5.3520 (5.9615) grad_norm 1.6658 (1.9343) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:55:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][340/625] eta 0:01:13 lr 0.001434 wd 0.0500 time 0.2539 (0.2577) data time 0.0007 (0.0021) model time 0.2531 (0.2557) loss 5.4004 (5.9726) grad_norm 2.2009 (1.9486) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:55:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][350/625] eta 0:01:10 lr 0.001434 wd 0.0500 time 0.2558 (0.2576) data time 0.0008 (0.0020) model time 0.2550 (0.2556) loss 6.4815 (5.9670) grad_norm 2.4799 (1.9559) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:55:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][360/625] eta 0:01:08 lr 0.001434 wd 0.0500 time 0.2591 (0.2575) data time 0.0011 (0.0020) model time 0.2581 (0.2555) loss 6.3444 (5.9556) grad_norm 1.7391 (1.9515) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:56:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][370/625] eta 0:01:05 lr 0.001434 wd 0.0500 time 0.2571 (0.2575) data time 0.0006 (0.0020) model time 0.2565 (0.2556) loss 4.8269 (5.9574) grad_norm 1.0567 (1.9474) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:56:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][380/625] eta 0:01:03 lr 0.001433 wd 0.0500 time 0.2574 (0.2575) data time 0.0008 (0.0020) model time 0.2566 (0.2556) loss 4.9855 (5.9499) grad_norm 2.4494 (1.9468) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:56:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][390/625] eta 0:01:00 lr 0.001433 wd 0.0500 time 0.2617 (0.2575) data time 0.0008 (0.0019) model time 0.2610 (0.2556) loss 5.0835 (5.9474) grad_norm 1.4829 (1.9391) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:56:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][400/625] eta 0:00:57 lr 0.001433 wd 0.0500 time 0.2551 (0.2574) data time 0.0008 (0.0019) model time 0.2543 (0.2555) loss 6.7301 (5.9584) grad_norm 1.3688 (1.9399) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:56:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][410/625] eta 0:00:55 lr 0.001433 wd 0.0500 time 0.2677 (0.2574) data time 0.0009 (0.0019) model time 0.2668 (0.2555) loss 5.8637 (5.9573) grad_norm 1.4091 (1.9300) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:56:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][420/625] eta 0:00:52 lr 0.001433 wd 0.0500 time 0.2583 (0.2573) data time 0.0006 (0.0019) model time 0.2576 (0.2555) loss 6.0670 (5.9613) grad_norm 1.7319 (1.9258) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:56:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][430/625] eta 0:00:50 lr 0.001433 wd 0.0500 time 0.2534 (0.2573) data time 0.0007 (0.0018) model time 0.2527 (0.2555) loss 4.9121 (5.9553) grad_norm 1.6222 (1.9273) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:56:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][440/625] eta 0:00:47 lr 0.001432 wd 0.0500 time 0.2585 (0.2572) data time 0.0010 (0.0018) model time 0.2575 (0.2554) loss 6.0666 (5.9582) grad_norm 2.7947 (1.9240) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:56:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][450/625] eta 0:00:45 lr 0.001432 wd 0.0500 time 0.2565 (0.2572) data time 0.0007 (0.0018) model time 0.2558 (0.2554) loss 6.1294 (5.9582) grad_norm 2.3882 (1.9272) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:56:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][460/625] eta 0:00:42 lr 0.001432 wd 0.0500 time 0.2559 (0.2571) data time 0.0006 (0.0018) model time 0.2553 (0.2554) loss 7.4288 (5.9623) grad_norm 1.5825 (1.9246) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:56:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][470/625] eta 0:00:39 lr 0.001432 wd 0.0500 time 0.2556 (0.2571) data time 0.0007 (0.0018) model time 0.2549 (0.2553) loss 6.4291 (5.9641) grad_norm 2.0982 (1.9212) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:56:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][480/625] eta 0:00:37 lr 0.001432 wd 0.0500 time 0.2544 (0.2571) data time 0.0008 (0.0018) model time 0.2537 (0.2553) loss 5.0221 (5.9650) grad_norm 1.2692 (1.9250) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:56:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][490/625] eta 0:00:34 lr 0.001432 wd 0.0500 time 0.2548 (0.2570) data time 0.0007 (0.0017) model time 0.2542 (0.2553) loss 6.0743 (5.9592) grad_norm 3.3943 (1.9295) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:56:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][500/625] eta 0:00:32 lr 0.001432 wd 0.0500 time 0.2575 (0.2570) data time 0.0010 (0.0017) model time 0.2565 (0.2553) loss 6.6276 (5.9527) grad_norm 1.7122 (1.9256) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:56:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][510/625] eta 0:00:29 lr 0.001431 wd 0.0500 time 0.2565 (0.2570) data time 0.0009 (0.0017) model time 0.2557 (0.2553) loss 5.6819 (5.9559) grad_norm 3.1187 (1.9288) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:56:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][520/625] eta 0:00:26 lr 0.001431 wd 0.0500 time 0.2609 (0.2570) data time 0.0008 (0.0017) model time 0.2601 (0.2553) loss 6.7128 (5.9650) grad_norm 2.0063 (1.9287) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:56:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][530/625] eta 0:00:24 lr 0.001431 wd 0.0500 time 0.2565 (0.2569) data time 0.0009 (0.0017) model time 0.2556 (0.2553) loss 7.2085 (5.9677) grad_norm 1.8959 (1.9294) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:56:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][540/625] eta 0:00:21 lr 0.001431 wd 0.0500 time 0.2558 (0.2569) data time 0.0008 (0.0017) model time 0.2550 (0.2552) loss 6.7959 (5.9678) grad_norm 2.1031 (1.9321) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:56:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][550/625] eta 0:00:19 lr 0.001431 wd 0.0500 time 0.2617 (0.2569) data time 0.0008 (0.0016) model time 0.2609 (0.2552) loss 4.8415 (5.9652) grad_norm 1.0490 (1.9210) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:56:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][560/625] eta 0:00:16 lr 0.001431 wd 0.0500 time 0.2596 (0.2569) data time 0.0007 (0.0016) model time 0.2589 (0.2552) loss 7.2392 (5.9764) grad_norm 1.3576 (1.9127) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:56:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][570/625] eta 0:00:14 lr 0.001430 wd 0.0500 time 0.2557 (0.2569) data time 0.0006 (0.0016) model time 0.2550 (0.2552) loss 4.6282 (5.9676) grad_norm 1.5620 (1.9106) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:56:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][580/625] eta 0:00:11 lr 0.001430 wd 0.0500 time 0.2552 (0.2568) data time 0.0007 (0.0016) model time 0.2544 (0.2552) loss 6.9805 (5.9679) grad_norm 1.7231 (1.9090) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 08:56:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][590/625] eta 0:00:08 lr 0.001430 wd 0.0500 time 0.2581 (0.2568) data time 0.0010 (0.0016) model time 0.2572 (0.2552) loss 5.6530 (5.9716) grad_norm 1.6749 (1.9065) loss_scale 4096.0000 (2072.2572) mem 9655MB
[2024-07-31 08:56:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][600/625] eta 0:00:06 lr 0.001430 wd 0.0500 time 0.2535 (0.2571) data time 0.0008 (0.0016) model time 0.2527 (0.2556) loss 6.6860 (5.9774) grad_norm 1.4917 (1.9037) loss_scale 4096.0000 (2105.9301) mem 9655MB
[2024-07-31 08:56:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-31 08:56:59 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-31 08:57:00 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-31 09:01:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-31 09:01:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-31 09:01:23 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-31 09:01:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-31 09:01:34 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-31 09:01:35 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-31 09:01:35 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-31 09:01:35 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 120)
[2024-07-31 09:01:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-31 09:01:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][610/625] eta 0:00:18 lr 0.001430 wd 0.0500 time 0.2702 (1.2149) data time 0.0008 (0.0779) model time 0.2694 (1.1370) loss 6.2443 (6.8038) grad_norm 2.4305 (2.1801) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 09:01:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][620/625] eta 0:00:03 lr 0.001430 wd 0.0500 time 0.2506 (0.6485) data time 0.0005 (0.0324) model time 0.2500 (0.6161) loss 5.8125 (6.3898) grad_norm 3.4824 (2.0593) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 09:01:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 120 training takes 0:00:12
[2024-07-31 09:01:51 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-31 09:01:52 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-31 09:01:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.387 (0.387) Loss 0.6846 (0.6846) Acc@1 87.305 (87.305) Acc@5 98.047 (98.047) Mem 9656MB
[2024-07-31 09:01:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.086) Loss 1.1074 (0.8340) Acc@1 76.123 (83.101) Acc@5 93.994 (96.813) Mem 9656MB
[2024-07-31 09:01:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.071) Loss 1.2061 (0.9852) Acc@1 72.217 (79.304) Acc@5 93.262 (95.066) Mem 9656MB
[2024-07-31 09:01:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.983 Acc@5 94.966
[2024-07-31 09:01:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.0%
[2024-07-31 09:01:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.724 (0.724) Loss 0.5781 (0.5781) Acc@1 88.623 (88.623) Acc@5 98.535 (98.535) Mem 9656MB
[2024-07-31 09:01:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.123) Loss 0.9399 (0.7218) Acc@1 77.979 (84.655) Acc@5 94.922 (97.239) Mem 9656MB
[2024-07-31 09:01:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.053 (0.093) Loss 1.0840 (0.8607) Acc@1 73.584 (80.931) Acc@5 93.506 (95.657) Mem 9656MB
[2024-07-31 09:01:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.600 Acc@5 95.625
[2024-07-31 09:01:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.6%
[2024-07-31 09:01:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.60%
[2024-07-31 09:01:58 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-31 09:02:00 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-31 09:02:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][0/625] eta 0:07:11 lr 0.001430 wd 0.0500 time 0.6909 (0.6909) data time 0.3760 (0.3760) model time 0.0000 (0.0000) loss 6.6995 (6.6995) grad_norm 2.0484 (2.0484) loss_scale 4096.0000 (4096.0000) mem 9651MB
[2024-07-31 09:02:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][10/625] eta 0:03:00 lr 0.001429 wd 0.0500 time 0.2536 (0.2933) data time 0.0007 (0.0350) model time 0.0000 (0.0000) loss 6.0436 (6.3435) grad_norm 3.2024 (2.1504) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:02:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][20/625] eta 0:02:47 lr 0.001429 wd 0.0500 time 0.2548 (0.2761) data time 0.0009 (0.0188) model time 0.0000 (0.0000) loss 6.9552 (6.1552) grad_norm 2.0412 (2.1131) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:02:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][30/625] eta 0:02:42 lr 0.001429 wd 0.0500 time 0.2589 (0.2728) data time 0.0006 (0.0130) model time 0.0000 (0.0000) loss 5.8158 (6.1781) grad_norm 1.4562 (1.9864) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:02:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][40/625] eta 0:02:36 lr 0.001429 wd 0.0500 time 0.2548 (0.2683) data time 0.0007 (0.0100) model time 0.0000 (0.0000) loss 6.7448 (6.1400) grad_norm 2.6632 (1.8951) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:02:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][50/625] eta 0:02:33 lr 0.001429 wd 0.0500 time 0.2557 (0.2663) data time 0.0008 (0.0083) model time 0.0000 (0.0000) loss 5.8625 (6.0548) grad_norm 2.3791 (1.9190) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:02:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][60/625] eta 0:02:30 lr 0.001429 wd 0.0500 time 0.2539 (0.2660) data time 0.0008 (0.0075) model time 0.2531 (0.2608) loss 6.2836 (6.0266) grad_norm 1.3720 (1.8678) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:02:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][70/625] eta 0:02:27 lr 0.001428 wd 0.0500 time 0.2577 (0.2649) data time 0.0006 (0.0066) model time 0.2571 (0.2589) loss 4.5900 (5.9831) grad_norm 1.8055 (1.8724) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:02:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][80/625] eta 0:02:23 lr 0.001428 wd 0.0500 time 0.2529 (0.2638) data time 0.0008 (0.0059) model time 0.2521 (0.2577) loss 6.5820 (6.0398) grad_norm 2.0917 (1.9332) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:02:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][90/625] eta 0:02:20 lr 0.001428 wd 0.0500 time 0.2588 (0.2632) data time 0.0011 (0.0054) model time 0.2578 (0.2575) loss 6.1916 (6.0497) grad_norm 2.4100 (1.9461) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:02:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][100/625] eta 0:02:17 lr 0.001428 wd 0.0500 time 0.2565 (0.2627) data time 0.0008 (0.0050) model time 0.2557 (0.2574) loss 6.4054 (6.0702) grad_norm 1.6628 (1.9304) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:02:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][110/625] eta 0:02:15 lr 0.001428 wd 0.0500 time 0.2569 (0.2631) data time 0.0008 (0.0046) model time 0.2561 (0.2587) loss 6.5211 (6.0713) grad_norm 1.5666 (1.9056) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:02:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][120/625] eta 0:02:12 lr 0.001428 wd 0.0500 time 0.2584 
(0.2627) data time 0.0009 (0.0043) model time 0.2574 (0.2586) loss 6.2948 (6.0577) grad_norm 1.3841 (1.9203) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:02:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][130/625] eta 0:02:09 lr 0.001427 wd 0.0500 time 0.2524 (0.2623) data time 0.0009 (0.0041) model time 0.2515 (0.2583) loss 6.6702 (6.0532) grad_norm 2.1775 (1.9438) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:02:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][140/625] eta 0:02:06 lr 0.001427 wd 0.0500 time 0.2586 (0.2618) data time 0.0008 (0.0039) model time 0.2578 (0.2579) loss 6.3613 (6.0651) grad_norm 1.6260 (1.9578) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:02:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][150/625] eta 0:02:04 lr 0.001427 wd 0.0500 time 0.2569 (0.2616) data time 0.0007 (0.0037) model time 0.2562 (0.2579) loss 5.5602 (6.0571) grad_norm 1.5577 (1.9390) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:02:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][160/625] eta 0:02:01 lr 0.001427 wd 0.0500 time 0.2572 (0.2614) data time 0.0009 (0.0035) model time 0.2563 (0.2578) loss 6.8245 (6.0341) grad_norm 2.0019 (1.9267) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:02:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][170/625] eta 0:01:58 lr 0.001427 wd 0.0500 time 0.2527 (0.2610) data time 0.0009 (0.0033) model time 0.2518 (0.2575) loss 7.0471 (6.0361) grad_norm 1.5550 (1.9085) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:02:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][180/625] eta 0:01:56 lr 0.001427 wd 0.0500 time 0.2533 (0.2607) data time 0.0007 (0.0032) model time 0.2526 (0.2573) loss 7.4403 (6.0201) grad_norm 2.1474 (1.9147) loss_scale 4096.0000 (4096.0000) mem 9655MB 
[2024-07-31 09:02:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][190/625] eta 0:01:53 lr 0.001426 wd 0.0500 time 0.2532 (0.2606) data time 0.0009 (0.0031) model time 0.2522 (0.2574) loss 6.1815 (6.0056) grad_norm 1.5514 (1.9185) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:02:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][200/625] eta 0:01:50 lr 0.001426 wd 0.0500 time 0.2569 (0.2604) data time 0.0007 (0.0030) model time 0.2561 (0.2573) loss 7.6185 (5.9960) grad_norm 1.0568 (1.8962) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:02:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][210/625] eta 0:01:48 lr 0.001426 wd 0.0500 time 0.2554 (0.2604) data time 0.0007 (0.0029) model time 0.2547 (0.2573) loss 6.1309 (5.9858) grad_norm 1.6220 (1.8851) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:02:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][220/625] eta 0:01:45 lr 0.001426 wd 0.0500 time 0.2587 (0.2602) data time 0.0006 (0.0028) model time 0.2581 (0.2572) loss 7.2218 (5.9810) grad_norm 1.6159 (1.8824) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:03:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][230/625] eta 0:01:42 lr 0.001426 wd 0.0500 time 0.2589 (0.2600) data time 0.0007 (0.0027) model time 0.2582 (0.2571) loss 6.7632 (5.9696) grad_norm 1.5683 (1.8784) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:03:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][240/625] eta 0:01:40 lr 0.001426 wd 0.0500 time 0.2580 (0.2598) data time 0.0009 (0.0026) model time 0.2570 (0.2570) loss 6.8985 (5.9589) grad_norm 2.0626 (1.8993) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:03:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][250/625] eta 0:01:37 lr 0.001425 wd 0.0500 time 0.2584 
(0.2597) data time 0.0009 (0.0025) model time 0.2574 (0.2570) loss 5.4251 (5.9413) grad_norm 1.4471 (1.8875) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:03:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][260/625] eta 0:01:34 lr 0.001425 wd 0.0500 time 0.2605 (0.2596) data time 0.0007 (0.0025) model time 0.2598 (0.2568) loss 5.1233 (5.9519) grad_norm 2.7272 (1.8904) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:03:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][270/625] eta 0:01:32 lr 0.001425 wd 0.0500 time 0.2562 (0.2595) data time 0.0007 (0.0024) model time 0.2555 (0.2568) loss 6.0416 (5.9580) grad_norm 1.5047 (1.8822) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:03:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][280/625] eta 0:01:29 lr 0.001425 wd 0.0500 time 0.2615 (0.2594) data time 0.0009 (0.0024) model time 0.2606 (0.2568) loss 6.7060 (5.9456) grad_norm 1.8727 (1.8799) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:03:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][290/625] eta 0:01:26 lr 0.001425 wd 0.0500 time 0.2578 (0.2593) data time 0.0009 (0.0023) model time 0.2569 (0.2567) loss 6.7124 (5.9530) grad_norm 1.7723 (1.8694) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:03:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][300/625] eta 0:01:24 lr 0.001425 wd 0.0500 time 0.2589 (0.2592) data time 0.0010 (0.0023) model time 0.2579 (0.2567) loss 5.9464 (5.9697) grad_norm 1.8698 (1.8721) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:03:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][310/625] eta 0:01:21 lr 0.001425 wd 0.0500 time 0.2544 (0.2591) data time 0.0008 (0.0022) model time 0.2536 (0.2566) loss 6.0701 (5.9668) grad_norm 2.1032 (1.8663) loss_scale 4096.0000 (4096.0000) mem 9655MB 
[2024-07-31 09:03:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][320/625] eta 0:01:19 lr 0.001424 wd 0.0500 time 0.2595 (0.2590) data time 0.0008 (0.0022) model time 0.2587 (0.2566) loss 5.9139 (5.9687) grad_norm 1.5414 (1.8553) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:03:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][330/625] eta 0:01:16 lr 0.001424 wd 0.0500 time 0.2634 (0.2590) data time 0.0006 (0.0021) model time 0.2627 (0.2567) loss 6.5530 (5.9700) grad_norm 1.4284 (1.8433) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:03:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][340/625] eta 0:01:13 lr 0.001424 wd 0.0500 time 0.2559 (0.2589) data time 0.0006 (0.0021) model time 0.2553 (0.2566) loss 7.3177 (5.9691) grad_norm 3.4034 (1.8574) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:03:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][350/625] eta 0:01:11 lr 0.001424 wd 0.0500 time 0.2568 (0.2588) data time 0.0009 (0.0021) model time 0.2559 (0.2565) loss 6.6573 (5.9649) grad_norm 1.8455 (1.8641) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:03:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][360/625] eta 0:01:08 lr 0.001424 wd 0.0500 time 0.2521 (0.2587) data time 0.0009 (0.0020) model time 0.2512 (0.2565) loss 6.8640 (5.9701) grad_norm 1.5340 (1.8536) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:03:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][370/625] eta 0:01:05 lr 0.001424 wd 0.0500 time 0.2605 (0.2587) data time 0.0006 (0.0020) model time 0.2600 (0.2565) loss 5.3078 (5.9660) grad_norm 2.7026 (1.8529) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:03:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][380/625] eta 0:01:03 lr 0.001423 wd 0.0500 time 0.2542 
(0.2586) data time 0.0007 (0.0020) model time 0.2534 (0.2564) loss 6.9101 (5.9714) grad_norm 1.5365 (1.8654) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:03:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][390/625] eta 0:01:00 lr 0.001423 wd 0.0500 time 0.2567 (0.2586) data time 0.0010 (0.0019) model time 0.2557 (0.2564) loss 6.0581 (5.9748) grad_norm 2.4766 (1.8692) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:03:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][400/625] eta 0:00:58 lr 0.001423 wd 0.0500 time 0.2599 (0.2592) data time 0.0006 (0.0019) model time 0.2593 (0.2571) loss 6.0451 (5.9808) grad_norm 1.4337 (1.8702) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:03:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][410/625] eta 0:00:55 lr 0.001423 wd 0.0500 time 0.2505 (0.2591) data time 0.0008 (0.0019) model time 0.2497 (0.2571) loss 5.4281 (5.9929) grad_norm 1.1522 (1.8611) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:03:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][420/625] eta 0:00:53 lr 0.001423 wd 0.0500 time 0.2549 (0.2591) data time 0.0006 (0.0019) model time 0.2543 (0.2571) loss 5.9460 (5.9960) grad_norm 1.5013 (1.8539) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:03:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][430/625] eta 0:00:50 lr 0.001423 wd 0.0500 time 0.2554 (0.2591) data time 0.0010 (0.0019) model time 0.2543 (0.2572) loss 4.7769 (5.9893) grad_norm 1.1653 (1.8443) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:03:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][440/625] eta 0:00:47 lr 0.001422 wd 0.0500 time 0.2585 (0.2591) data time 0.0006 (0.0018) model time 0.2579 (0.2571) loss 5.1093 (5.9820) grad_norm 1.1301 (1.8399) loss_scale 4096.0000 (4096.0000) mem 9655MB 
[2024-07-31 09:03:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][450/625] eta 0:00:45 lr 0.001422 wd 0.0500 time 0.2564 (0.2590) data time 0.0009 (0.0018) model time 0.2556 (0.2571) loss 5.6656 (5.9754) grad_norm 1.1798 (1.8374) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:03:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][460/625] eta 0:00:42 lr 0.001422 wd 0.0500 time 0.2538 (0.2590) data time 0.0008 (0.0018) model time 0.2530 (0.2571) loss 6.6029 (5.9781) grad_norm 1.3379 (1.8482) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:04:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][470/625] eta 0:00:40 lr 0.001422 wd 0.0500 time 0.2512 (0.2589) data time 0.0009 (0.0018) model time 0.2503 (0.2570) loss 6.7864 (5.9832) grad_norm 1.5007 (1.8564) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:04:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][480/625] eta 0:00:37 lr 0.001422 wd 0.0500 time 0.2557 (0.2593) data time 0.0008 (0.0017) model time 0.2548 (0.2575) loss 5.5569 (5.9778) grad_norm 1.3315 (1.8533) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:04:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][490/625] eta 0:00:34 lr 0.001422 wd 0.0500 time 0.2576 (0.2593) data time 0.0008 (0.0017) model time 0.2569 (0.2575) loss 6.0080 (5.9869) grad_norm 1.3739 (1.8449) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:04:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][500/625] eta 0:00:32 lr 0.001421 wd 0.0500 time 0.2544 (0.2592) data time 0.0007 (0.0017) model time 0.2537 (0.2574) loss 4.7861 (5.9869) grad_norm 1.3286 (1.8454) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:04:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][510/625] eta 0:00:29 lr 0.001421 wd 0.0500 time 0.2550 
(0.2591) data time 0.0009 (0.0017) model time 0.2541 (0.2574) loss 6.4948 (5.9878) grad_norm 1.6478 (1.8511) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:04:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][520/625] eta 0:00:27 lr 0.001421 wd 0.0500 time 0.2571 (0.2591) data time 0.0012 (0.0017) model time 0.2559 (0.2573) loss 4.5837 (5.9836) grad_norm 1.2546 (1.8476) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:04:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][530/625] eta 0:00:24 lr 0.001421 wd 0.0500 time 0.2563 (0.2591) data time 0.0008 (0.0017) model time 0.2555 (0.2573) loss 6.6063 (5.9878) grad_norm 1.4650 (1.8440) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:04:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][540/625] eta 0:00:22 lr 0.001421 wd 0.0500 time 0.2532 (0.2590) data time 0.0007 (0.0016) model time 0.2526 (0.2573) loss 6.6817 (5.9957) grad_norm 1.9830 (1.8392) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:04:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][550/625] eta 0:00:19 lr 0.001421 wd 0.0500 time 0.2571 (0.2590) data time 0.0009 (0.0016) model time 0.2561 (0.2573) loss 5.4131 (5.9954) grad_norm 2.4142 (1.8375) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:04:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][560/625] eta 0:00:16 lr 0.001420 wd 0.0500 time 0.2580 (0.2590) data time 0.0007 (0.0016) model time 0.2573 (0.2573) loss 6.3226 (5.9969) grad_norm 1.6698 (1.8457) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:04:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][570/625] eta 0:00:14 lr 0.001420 wd 0.0500 time 0.2561 (0.2589) data time 0.0009 (0.0016) model time 0.2552 (0.2573) loss 6.7279 (6.0029) grad_norm 1.7742 (1.8481) loss_scale 4096.0000 (4096.0000) mem 9655MB 
[2024-07-31 09:04:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][580/625] eta 0:00:11 lr 0.001420 wd 0.0500 time 0.2587 (0.2589) data time 0.0007 (0.0016) model time 0.2580 (0.2572) loss 5.8936 (5.9997) grad_norm 1.1961 (1.8460) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:04:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][590/625] eta 0:00:09 lr 0.001420 wd 0.0500 time 0.2509 (0.2589) data time 0.0011 (0.0016) model time 0.2498 (0.2572) loss 4.7868 (5.9994) grad_norm 2.3550 (1.8441) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:04:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][600/625] eta 0:00:06 lr 0.001420 wd 0.0500 time 0.2561 (0.2589) data time 0.0008 (0.0016) model time 0.2553 (0.2572) loss 6.5258 (6.0030) grad_norm 1.2119 (1.8387) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:04:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][610/625] eta 0:00:03 lr 0.001420 wd 0.0500 time 0.2524 (0.2588) data time 0.0006 (0.0016) model time 0.2518 (0.2572) loss 6.9489 (6.0039) grad_norm 4.6316 (1.8422) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:04:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][620/625] eta 0:00:01 lr 0.001420 wd 0.0500 time 0.2542 (0.2587) data time 0.0005 (0.0015) model time 0.2537 (0.2571) loss 4.9668 (6.0005) grad_norm 1.9964 (1.8481) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:04:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 121 training takes 0:02:41 [2024-07-31 09:04:41 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-31 09:04:42 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-31 09:04:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.432 (0.432) Loss 0.6562 (0.6562) Acc@1 87.598 (87.598) Acc@5 98.242 (98.242) Mem 9655MB [2024-07-31 09:04:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.091) Loss 1.0244 (0.8011) Acc@1 77.148 (83.385) Acc@5 94.238 (96.897) Mem 9655MB [2024-07-31 09:04:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.074) Loss 1.1729 (0.9454) Acc@1 72.266 (79.729) Acc@5 92.676 (95.103) Mem 9655MB [2024-07-31 09:04:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.427 Acc@5 95.018 [2024-07-31 09:04:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.4% [2024-07-31 09:04:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.43% [2024-07-31 09:04:44 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-31 09:04:45 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-31 09:04:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.390 (0.390) Loss 0.5781 (0.5781) Acc@1 88.525 (88.525) Acc@5 98.633 (98.633) Mem 9655MB [2024-07-31 09:04:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.086) Loss 0.9399 (0.7219) Acc@1 77.979 (84.637) Acc@5 94.971 (97.266) Mem 9655MB [2024-07-31 09:04:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.071) Loss 1.0830 (0.8605) Acc@1 73.682 (80.943) Acc@5 93.506 (95.680) Mem 9655MB [2024-07-31 09:04:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.604 Acc@5 95.643 [2024-07-31 09:04:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.6% [2024-07-31 09:04:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.60% [2024-07-31 09:04:46 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-31 09:04:47 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-31 09:04:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][0/625] eta 0:07:41 lr 0.001419 wd 0.0500 time 0.7390 (0.7390) data time 0.4978 (0.4978) model time 0.0000 (0.0000) loss 6.0082 (6.0082) grad_norm 1.5869 (1.5869) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:04:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][10/625] eta 0:03:04 lr 0.001419 wd 0.0500 time 0.2525 (0.2997) data time 0.0009 (0.0460) model time 0.0000 (0.0000) loss 4.7145 (5.9389) grad_norm 1.7092 (1.9733) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:04:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][20/625] eta 0:02:48 lr 0.001419 wd 0.0500 time 0.2573 (0.2790) data time 0.0010 (0.0245) model time 0.0000 (0.0000) loss 5.7710 (5.9167) grad_norm 2.2998 (1.9474) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:04:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][30/625] eta 0:02:41 lr 0.001419 wd 0.0500 time 0.2524 (0.2719) data time 0.0006 (0.0168) model time 0.0000 (0.0000) loss 5.7641 (5.9695) grad_norm 1.4959 (1.9339) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:04:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][40/625] eta 0:02:36 lr 0.001419 wd 0.0500 time 0.2587 (0.2683) data time 0.0009 (0.0129) model time 0.0000 (0.0000) loss 6.1024 (5.9676) grad_norm 1.4423 (1.8666) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:05:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][50/625] eta 0:02:33 lr 0.001419 wd 0.0500 time 0.2572 (0.2663) data time 0.0009 (0.0106) model time 0.0000 (0.0000) loss 4.4385 (5.8516) grad_norm 1.7701 (1.9746) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:05:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][60/625] eta 0:02:29 lr 0.001418 wd 0.0500 time 0.2507 (0.2650) 
data time 0.0009 (0.0090) model time 0.2498 (0.2575) loss 5.0318 (5.8711) grad_norm 1.4592 (2.0008) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:05:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][70/625] eta 0:02:26 lr 0.001418 wd 0.0500 time 0.2552 (0.2641) data time 0.0010 (0.0078) model time 0.2543 (0.2577) loss 4.7037 (5.8489) grad_norm 1.8331 (1.9878) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:05:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][80/625] eta 0:02:23 lr 0.001418 wd 0.0500 time 0.2654 (0.2632) data time 0.0008 (0.0070) model time 0.2647 (0.2571) loss 5.0822 (5.8417) grad_norm 1.0690 (1.9646) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:05:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][90/625] eta 0:02:20 lr 0.001418 wd 0.0500 time 0.2586 (0.2623) data time 0.0006 (0.0063) model time 0.2580 (0.2564) loss 7.2502 (5.9214) grad_norm 1.2682 (1.9163) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:05:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][100/625] eta 0:02:17 lr 0.001418 wd 0.0500 time 0.2612 (0.2618) data time 0.0008 (0.0058) model time 0.2604 (0.2564) loss 6.6899 (5.9353) grad_norm 1.8171 (1.9105) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:05:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][110/625] eta 0:02:14 lr 0.001418 wd 0.0500 time 0.2584 (0.2615) data time 0.0008 (0.0053) model time 0.2576 (0.2566) loss 5.6261 (5.9080) grad_norm 1.1275 (1.9041) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:05:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][120/625] eta 0:02:11 lr 0.001417 wd 0.0500 time 0.2599 (0.2612) data time 0.0007 (0.0050) model time 0.2592 (0.2567) loss 5.2730 (5.9253) grad_norm 1.1367 (1.8748) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 
09:05:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][130/625] eta 0:02:09 lr 0.001417 wd 0.0500 time 0.2610 (0.2608) data time 0.0008 (0.0046) model time 0.2603 (0.2565) loss 4.8761 (5.9402) grad_norm 1.6034 (1.8700) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:05:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][140/625] eta 0:02:06 lr 0.001417 wd 0.0500 time 0.2538 (0.2606) data time 0.0008 (0.0044) model time 0.2530 (0.2565) loss 6.4065 (5.9249) grad_norm 1.2959 (1.8619) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:05:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][150/625] eta 0:02:03 lr 0.001417 wd 0.0500 time 0.2546 (0.2603) data time 0.0007 (0.0041) model time 0.2539 (0.2564) loss 6.6048 (5.9218) grad_norm 1.1515 (1.8984) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:05:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][160/625] eta 0:02:00 lr 0.001417 wd 0.0500 time 0.2567 (0.2601) data time 0.0007 (0.0039) model time 0.2560 (0.2564) loss 4.8298 (5.9046) grad_norm 1.2557 (1.9118) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:05:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][170/625] eta 0:01:58 lr 0.001417 wd 0.0500 time 0.2530 (0.2598) data time 0.0009 (0.0038) model time 0.2521 (0.2562) loss 6.7885 (5.9107) grad_norm 1.1841 (1.9301) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:05:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][180/625] eta 0:01:55 lr 0.001417 wd 0.0500 time 0.2547 (0.2597) data time 0.0007 (0.0036) model time 0.2540 (0.2562) loss 5.7100 (5.8998) grad_norm 1.5518 (1.8968) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:05:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][190/625] eta 0:01:52 lr 0.001416 wd 0.0500 time 0.2566 (0.2596) data 
time 0.0009 (0.0034) model time 0.2558 (0.2563) loss 4.6757 (5.8911) grad_norm 1.5809 (1.8750) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:05:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][200/625] eta 0:01:50 lr 0.001416 wd 0.0500 time 0.2602 (0.2594) data time 0.0008 (0.0033) model time 0.2594 (0.2562) loss 7.2164 (5.9149) grad_norm 1.6968 (1.8551) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:05:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][210/625] eta 0:01:47 lr 0.001416 wd 0.0500 time 0.2577 (0.2594) data time 0.0009 (0.0032) model time 0.2568 (0.2563) loss 5.2494 (5.9204) grad_norm 2.0044 (1.8429) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:05:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][220/625] eta 0:01:45 lr 0.001416 wd 0.0500 time 0.2534 (0.2593) data time 0.0007 (0.0031) model time 0.2526 (0.2563) loss 5.3791 (5.9233) grad_norm 2.0935 (1.8367) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:05:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][230/625] eta 0:01:42 lr 0.001416 wd 0.0500 time 0.2513 (0.2592) data time 0.0008 (0.0030) model time 0.2504 (0.2563) loss 4.5821 (5.9316) grad_norm 2.8610 (1.8387) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:05:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][240/625] eta 0:01:39 lr 0.001416 wd 0.0500 time 0.2594 (0.2592) data time 0.0006 (0.0029) model time 0.2588 (0.2564) loss 5.6805 (5.9277) grad_norm 3.7973 (1.8492) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:05:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][250/625] eta 0:01:37 lr 0.001415 wd 0.0500 time 0.2539 (0.2591) data time 0.0006 (0.0028) model time 0.2533 (0.2564) loss 5.2191 (5.9226) grad_norm 1.1942 (1.8411) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 
09:05:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][260/625] eta 0:01:34 lr 0.001415 wd 0.0500 time 0.2569 (0.2590) data time 0.0007 (0.0028) model time 0.2563 (0.2564) loss 5.4350 (5.9333) grad_norm 1.5361 (1.8336) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:05:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][270/625] eta 0:01:31 lr 0.001415 wd 0.0500 time 0.2548 (0.2590) data time 0.0009 (0.0027) model time 0.2538 (0.2564) loss 5.5451 (5.9308) grad_norm 1.4138 (1.8289) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:06:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][280/625] eta 0:01:29 lr 0.001415 wd 0.0500 time 0.2598 (0.2590) data time 0.0008 (0.0026) model time 0.2590 (0.2565) loss 6.2448 (5.9473) grad_norm 2.9606 (1.8415) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:06:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][290/625] eta 0:01:26 lr 0.001415 wd 0.0500 time 0.2541 (0.2590) data time 0.0009 (0.0026) model time 0.2532 (0.2566) loss 6.3067 (5.9489) grad_norm 1.5924 (1.8445) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:06:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][300/625] eta 0:01:24 lr 0.001415 wd 0.0500 time 0.2541 (0.2589) data time 0.0007 (0.0025) model time 0.2534 (0.2566) loss 5.5096 (5.9450) grad_norm 1.6526 (1.8382) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:06:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][310/625] eta 0:01:21 lr 0.001414 wd 0.0500 time 0.2534 (0.2589) data time 0.0008 (0.0025) model time 0.2526 (0.2566) loss 5.7755 (5.9380) grad_norm 2.5447 (1.8632) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:06:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][320/625] eta 0:01:18 lr 0.001414 wd 0.0500 time 0.2640 (0.2589) data 
time 0.0009 (0.0024) model time 0.2631 (0.2566) loss 6.5915 (5.9437) grad_norm 2.3992 (1.8696) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:06:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][330/625] eta 0:01:16 lr 0.001414 wd 0.0500 time 0.2521 (0.2588) data time 0.0009 (0.0024) model time 0.2512 (0.2566) loss 3.8757 (5.9458) grad_norm 2.1481 (1.8646) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:06:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][340/625] eta 0:01:13 lr 0.001414 wd 0.0500 time 0.2586 (0.2588) data time 0.0007 (0.0023) model time 0.2579 (0.2566) loss 7.2280 (5.9437) grad_norm 1.4625 (1.8607) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:06:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][350/625] eta 0:01:11 lr 0.001414 wd 0.0500 time 0.2546 (0.2587) data time 0.0007 (0.0023) model time 0.2539 (0.2566) loss 4.1721 (5.9368) grad_norm 1.6073 (1.8531) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:06:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][360/625] eta 0:01:08 lr 0.001414 wd 0.0500 time 0.2573 (0.2588) data time 0.0009 (0.0022) model time 0.2564 (0.2567) loss 6.2271 (5.9405) grad_norm 1.3660 (1.8480) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:06:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][370/625] eta 0:01:05 lr 0.001413 wd 0.0500 time 0.2573 (0.2588) data time 0.0010 (0.0022) model time 0.2563 (0.2567) loss 5.0788 (5.9549) grad_norm 1.8536 (1.8511) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:06:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][380/625] eta 0:01:03 lr 0.001413 wd 0.0500 time 0.2575 (0.2588) data time 0.0009 (0.0022) model time 0.2566 (0.2567) loss 6.7336 (5.9479) grad_norm 1.6525 (1.8658) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 
09:06:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][390/625] eta 0:01:00 lr 0.001413 wd 0.0500 time 0.2602 (0.2587) data time 0.0009 (0.0021) model time 0.2592 (0.2567) loss 6.7856 (5.9514) grad_norm 1.9172 (1.8618) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:06:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][400/625] eta 0:00:58 lr 0.001413 wd 0.0500 time 0.2589 (0.2587) data time 0.0007 (0.0021) model time 0.2582 (0.2567) loss 5.7120 (5.9562) grad_norm 1.8514 (1.8694) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:06:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][410/625] eta 0:00:55 lr 0.001413 wd 0.0500 time 0.2564 (0.2587) data time 0.0010 (0.0021) model time 0.2554 (0.2567) loss 6.2215 (5.9620) grad_norm 2.4923 (1.8771) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:06:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][420/625] eta 0:00:53 lr 0.001413 wd 0.0500 time 0.2536 (0.2587) data time 0.0009 (0.0021) model time 0.2527 (0.2567) loss 5.7531 (5.9648) grad_norm 1.7431 (1.8791) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:06:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][430/625] eta 0:00:50 lr 0.001412 wd 0.0500 time 0.2543 (0.2586) data time 0.0009 (0.0020) model time 0.2533 (0.2567) loss 6.4159 (5.9630) grad_norm 1.6621 (1.8799) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:06:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][440/625] eta 0:00:47 lr 0.001412 wd 0.0500 time 0.2543 (0.2586) data time 0.0007 (0.0020) model time 0.2536 (0.2567) loss 6.4127 (5.9671) grad_norm 1.7346 (1.8853) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:06:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][450/625] eta 0:00:45 lr 0.001412 wd 0.0500 time 0.2574 (0.2585) data 
time 0.0007 (0.0020) model time 0.2568 (0.2567) loss 6.5101 (5.9667) grad_norm 2.4113 (1.8872) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:06:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][460/625] eta 0:00:42 lr 0.001412 wd 0.0500 time 0.2521 (0.2585) data time 0.0007 (0.0019) model time 0.2514 (0.2566) loss 5.8689 (5.9708) grad_norm 1.6817 (1.8889) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:06:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][470/625] eta 0:00:40 lr 0.001412 wd 0.0500 time 0.2568 (0.2585) data time 0.0008 (0.0019) model time 0.2560 (0.2567) loss 5.7959 (5.9760) grad_norm 4.4907 (1.8912) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:06:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][480/625] eta 0:00:37 lr 0.001412 wd 0.0500 time 0.2629 (0.2585) data time 0.0008 (0.0019) model time 0.2621 (0.2567) loss 6.6660 (5.9845) grad_norm 1.7057 (1.9051) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:06:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][490/625] eta 0:00:34 lr 0.001411 wd 0.0500 time 0.2503 (0.2585) data time 0.0009 (0.0019) model time 0.2494 (0.2567) loss 6.0990 (5.9890) grad_norm 2.0117 (1.9136) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:06:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][500/625] eta 0:00:32 lr 0.001411 wd 0.0500 time 0.2536 (0.2584) data time 0.0007 (0.0019) model time 0.2528 (0.2567) loss 6.1827 (5.9944) grad_norm 3.2422 (1.9199) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:06:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][510/625] eta 0:00:29 lr 0.001411 wd 0.0500 time 0.2593 (0.2584) data time 0.0007 (0.0018) model time 0.2586 (0.2566) loss 6.4720 (5.9944) grad_norm 2.1605 (1.9211) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 
09:07:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][520/625] eta 0:00:27 lr 0.001411 wd 0.0500 time 0.2594 (0.2584) data time 0.0010 (0.0018) model time 0.2583 (0.2566) loss 6.5591 (5.9883) grad_norm 1.5385 (1.9162) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:07:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][530/625] eta 0:00:24 lr 0.001411 wd 0.0500 time 0.2568 (0.2583) data time 0.0008 (0.0018) model time 0.2560 (0.2566) loss 6.3349 (5.9953) grad_norm 2.5273 (1.9265) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:07:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][540/625] eta 0:00:21 lr 0.001411 wd 0.0500 time 0.2592 (0.2583) data time 0.0008 (0.0018) model time 0.2583 (0.2566) loss 7.3236 (6.0091) grad_norm 2.6180 (1.9288) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:07:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][550/625] eta 0:00:19 lr 0.001411 wd 0.0500 time 0.2566 (0.2583) data time 0.0006 (0.0018) model time 0.2560 (0.2566) loss 6.0969 (6.0042) grad_norm 2.0439 (1.9289) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:07:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][560/625] eta 0:00:16 lr 0.001410 wd 0.0500 time 0.2594 (0.2583) data time 0.0006 (0.0018) model time 0.2588 (0.2566) loss 6.6509 (6.0103) grad_norm 1.7601 (1.9264) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:07:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][570/625] eta 0:00:14 lr 0.001410 wd 0.0500 time 0.2603 (0.2583) data time 0.0007 (0.0017) model time 0.2596 (0.2566) loss 6.9929 (6.0091) grad_norm 1.3323 (1.9232) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:07:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][580/625] eta 0:00:11 lr 0.001410 wd 0.0500 time 0.2592 (0.2583) data 
time 0.0008 (0.0017) model time 0.2584 (0.2566) loss 5.1411 (6.0067) grad_norm 1.6150 (1.9178) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:07:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][590/625] eta 0:00:09 lr 0.001410 wd 0.0500 time 0.2545 (0.2582) data time 0.0009 (0.0017) model time 0.2536 (0.2566) loss 6.2067 (6.0037) grad_norm 4.4747 (1.9181) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:07:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][600/625] eta 0:00:06 lr 0.001410 wd 0.0500 time 0.2592 (0.2582) data time 0.0008 (0.0017) model time 0.2584 (0.2566) loss 6.7731 (6.0094) grad_norm 2.5226 (1.9152) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:07:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][610/625] eta 0:00:03 lr 0.001410 wd 0.0500 time 0.2529 (0.2582) data time 0.0005 (0.0017) model time 0.2524 (0.2566) loss 5.8614 (6.0040) grad_norm 1.3265 (1.9068) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:07:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][620/625] eta 0:00:01 lr 0.001409 wd 0.0500 time 0.2554 (0.2581) data time 0.0004 (0.0017) model time 0.2550 (0.2565) loss 6.3819 (6.0110) grad_norm 1.4747 (1.9046) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:07:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 122 training takes 0:02:41 [2024-07-31 09:07:28 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-31 09:07:29 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
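The `lr` column above decays slowly within and across epochs. Below is a minimal sketch, assuming a per-iteration schedule: linear warmup followed by cosine decay, with the warmup treated as a prefix in the style of timm's `CosineLRScheduler(warmup_prefix=True)`. It also assumes the logged value at `[epoch][idx]` corresponds to step `epoch * 625 + idx`. The constants come from the config dump at the top of this log (`TRAIN.BASE_LR`, `MIN_LR`, `WARMUP_LR`, `EPOCHS`, `WARMUP_EPOCHS`), and the 625 steps per epoch come from the `[i/625]` counters. The function name `lr_at` is hypothetical. Whether `main_hfai_mnodes.py` computes the rate this way is an assumption, not something the log confirms.

```python
import math

# Values copied from the config dump at the top of this log (TRAIN.*).
BASE_LR = 0.002
MIN_LR = 2.0e-05
WARMUP_LR = 2.0e-06
EPOCHS = 300
WARMUP_EPOCHS = 20
# Assumption: one scheduler step per iteration, 625 iterations per epoch.
STEPS_PER_EPOCH = 625


def lr_at(epoch: int, idx: int) -> float:
    """Hypothetical helper: per-iteration linear warmup, then cosine decay."""
    t = epoch * STEPS_PER_EPOCH + idx
    warmup_t = WARMUP_EPOCHS * STEPS_PER_EPOCH
    if t < warmup_t:
        # Linear ramp from WARMUP_LR up to BASE_LR.
        return WARMUP_LR + t * (BASE_LR - WARMUP_LR) / warmup_t
    # With a warmup prefix, the cosine period excludes the warmup steps.
    t_cos = t - warmup_t
    t_total = EPOCHS * STEPS_PER_EPOCH - warmup_t
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1 + math.cos(math.pi * t_cos / t_total))


for epoch, idx in [(122, 200), (122, 620), (123, 0), (124, 0)]:
    print(f"[{epoch}][{idx}] lr {lr_at(epoch, idx):.6f}")
```

Under these assumptions, the four printed rates are 0.001416, 0.001409, 0.001409 and 0.001399. These agree with the corresponding `lr` entries in this excerpt at the six decimals the log shows.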
[2024-07-31 09:07:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.406 (0.406) Loss 0.6587 (0.6587) Acc@1 87.793 (87.793) Acc@5 98.145 (98.145) Mem 9655MB [2024-07-31 09:07:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.088) Loss 1.0781 (0.8242) Acc@1 76.416 (83.074) Acc@5 93.945 (96.835) Mem 9655MB [2024-07-31 09:07:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.073) Loss 1.2080 (0.9759) Acc@1 72.705 (79.376) Acc@5 92.578 (94.978) Mem 9655MB [2024-07-31 09:07:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.101 Acc@5 94.940 [2024-07-31 09:07:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.1% [2024-07-31 09:07:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.868 (0.868) Loss 0.5781 (0.5781) Acc@1 88.477 (88.477) Acc@5 98.633 (98.633) Mem 9655MB [2024-07-31 09:07:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.130) Loss 0.9390 (0.7219) Acc@1 78.174 (84.690) Acc@5 94.971 (97.261) Mem 9655MB [2024-07-31 09:07:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.094) Loss 1.0820 (0.8604) Acc@1 73.730 (80.985) Acc@5 93.555 (95.685) Mem 9655MB [2024-07-31 09:07:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.630 Acc@5 95.651 [2024-07-31 09:07:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.6% [2024-07-31 09:07:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.63% [2024-07-31 09:07:33 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-07-31 09:07:33 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-07-31 09:07:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][0/625] eta 0:08:02 lr 0.001409 wd 0.0500 time 0.7723 (0.7723) data time 0.5233 (0.5233) model time 0.0000 (0.0000) loss 6.0070 (6.0070) grad_norm 1.4292 (1.4292) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:07:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][10/625] eta 0:03:05 lr 0.001409 wd 0.0500 time 0.2552 (0.3018) data time 0.0011 (0.0483) model time 0.0000 (0.0000) loss 7.0531 (6.5021) grad_norm 2.5538 (1.9347) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:07:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][20/625] eta 0:02:49 lr 0.001409 wd 0.0500 time 0.2674 (0.2808) data time 0.0010 (0.0258) model time 0.0000 (0.0000) loss 4.8908 (6.3045) grad_norm 2.6846 (1.9854) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:07:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][30/625] eta 0:02:42 lr 0.001409 wd 0.0500 time 0.2525 (0.2730) data time 0.0008 (0.0177) model time 0.0000 (0.0000) loss 5.2991 (6.2027) grad_norm 1.2484 (2.0088) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:07:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][40/625] eta 0:02:37 lr 0.001409 wd 0.0500 time 0.2564 (0.2692) data time 0.0010 (0.0136) model time 0.0000 (0.0000) loss 6.5732 (6.1318) grad_norm 2.0969 (2.0012) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:07:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][50/625] eta 0:02:33 lr 0.001408 wd 0.0500 time 0.2521 (0.2668) data time 0.0008 (0.0111) model time 0.0000 (0.0000) loss 6.1577 (6.0454) grad_norm 2.2835 (1.9201) loss_scale 4096.0000 (4096.0000) mem 9655MB 
[2024-07-31 09:07:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][60/625] eta 0:02:29 lr 0.001408 wd 0.0500 time 0.2630 (0.2652) data time 0.0006 (0.0094) model time 0.2623 (0.2567) loss 5.7144 (6.0427) grad_norm 2.8233 (2.0347) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:07:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][70/625] eta 0:02:26 lr 0.001408 wd 0.0500 time 0.2568 (0.2642) data time 0.0009 (0.0082) model time 0.2558 (0.2568) loss 5.0744 (6.0477) grad_norm 1.9590 (2.0222) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:07:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][80/625] eta 0:02:24 lr 0.001408 wd 0.0500 time 0.2612 (0.2658) data time 0.0006 (0.0073) model time 0.2606 (0.2633) loss 6.2365 (6.0513) grad_norm 2.0635 (2.0059) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:07:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][90/625] eta 0:02:22 lr 0.001408 wd 0.0500 time 0.2581 (0.2669) data time 0.0008 (0.0066) model time 0.2573 (0.2663) loss 7.2051 (6.0553) grad_norm 2.1640 (1.9789) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:08:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][100/625] eta 0:02:19 lr 0.001408 wd 0.0500 time 0.2589 (0.2659) data time 0.0011 (0.0060) model time 0.2578 (0.2641) loss 6.3990 (6.0766) grad_norm 1.8657 (1.9576) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:08:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][110/625] eta 0:02:16 lr 0.001407 wd 0.0500 time 0.2525 (0.2651) data time 0.0010 (0.0056) model time 0.2515 (0.2627) loss 6.1947 (6.0654) grad_norm 3.3468 (1.9474) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:08:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][120/625] eta 0:02:13 lr 0.001407 wd 0.0500 time 0.2533 
(0.2644) data time 0.0008 (0.0052) model time 0.2525 (0.2617) loss 5.3361 (6.0709) grad_norm 2.2452 (1.9327) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:08:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][130/625] eta 0:02:10 lr 0.001407 wd 0.0500 time 0.2558 (0.2638) data time 0.0009 (0.0049) model time 0.2549 (0.2610) loss 7.0256 (6.0867) grad_norm 1.4124 (1.9113) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:08:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][140/625] eta 0:02:07 lr 0.001407 wd 0.0500 time 0.2584 (0.2633) data time 0.0008 (0.0046) model time 0.2576 (0.2604) loss 5.4391 (6.0901) grad_norm 1.4121 (1.8935) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:08:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][150/625] eta 0:02:04 lr 0.001407 wd 0.0500 time 0.2583 (0.2629) data time 0.0008 (0.0044) model time 0.2575 (0.2600) loss 6.0969 (6.0610) grad_norm 1.8988 (1.8854) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:08:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][160/625] eta 0:02:02 lr 0.001407 wd 0.0500 time 0.2554 (0.2625) data time 0.0008 (0.0041) model time 0.2546 (0.2595) loss 6.4654 (6.0463) grad_norm 1.2027 (1.8647) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:08:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][170/625] eta 0:01:59 lr 0.001407 wd 0.0500 time 0.2550 (0.2621) data time 0.0006 (0.0039) model time 0.2544 (0.2592) loss 6.5221 (6.0472) grad_norm 1.6531 (1.8505) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:08:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][180/625] eta 0:01:56 lr 0.001406 wd 0.0500 time 0.2564 (0.2619) data time 0.0009 (0.0038) model time 0.2555 (0.2591) loss 6.4936 (6.0503) grad_norm 1.8615 (1.8603) loss_scale 4096.0000 (4096.0000) mem 9655MB 
[2024-07-31 09:08:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][190/625] eta 0:01:53 lr 0.001406 wd 0.0500 time 0.2595 (0.2616) data time 0.0006 (0.0036) model time 0.2589 (0.2589) loss 6.8814 (6.0669) grad_norm 2.0141 (1.8510) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:08:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][200/625] eta 0:01:51 lr 0.001406 wd 0.0500 time 0.2618 (0.2614) data time 0.0009 (0.0035) model time 0.2609 (0.2587) loss 6.8486 (6.0773) grad_norm 2.4263 (1.8383) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:08:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][210/625] eta 0:01:48 lr 0.001406 wd 0.0500 time 0.2559 (0.2612) data time 0.0011 (0.0034) model time 0.2548 (0.2585) loss 4.4130 (6.0457) grad_norm 1.5635 (1.8443) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:08:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][220/625] eta 0:01:45 lr 0.001406 wd 0.0500 time 0.2568 (0.2610) data time 0.0007 (0.0033) model time 0.2561 (0.2584) loss 6.1305 (6.0275) grad_norm 2.0087 (1.8597) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:08:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][230/625] eta 0:01:43 lr 0.001406 wd 0.0500 time 0.2533 (0.2608) data time 0.0007 (0.0032) model time 0.2526 (0.2583) loss 6.1839 (6.0208) grad_norm 1.5728 (1.8563) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:08:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][240/625] eta 0:01:40 lr 0.001405 wd 0.0500 time 0.2564 (0.2607) data time 0.0008 (0.0031) model time 0.2557 (0.2582) loss 6.9990 (6.0324) grad_norm 2.0986 (1.8837) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:08:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][250/625] eta 0:01:37 lr 0.001405 wd 0.0500 time 0.2545 
(0.2605) data time 0.0009 (0.0030) model time 0.2537 (0.2580) loss 6.0909 (6.0321) grad_norm 1.1490 (1.8900) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:08:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][260/625] eta 0:01:35 lr 0.001405 wd 0.0500 time 0.2595 (0.2604) data time 0.0010 (0.0029) model time 0.2585 (0.2579) loss 6.2622 (6.0264) grad_norm 1.3969 (1.8787) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:08:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][270/625] eta 0:01:32 lr 0.001405 wd 0.0500 time 0.2593 (0.2602) data time 0.0008 (0.0028) model time 0.2584 (0.2578) loss 6.5353 (6.0306) grad_norm 1.7362 (1.8806) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:08:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][280/625] eta 0:01:29 lr 0.001405 wd 0.0500 time 0.2520 (0.2601) data time 0.0008 (0.0027) model time 0.2512 (0.2577) loss 4.5984 (6.0115) grad_norm 1.6155 (1.8867) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:08:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][290/625] eta 0:01:27 lr 0.001405 wd 0.0500 time 0.2615 (0.2600) data time 0.0007 (0.0027) model time 0.2608 (0.2577) loss 4.7396 (6.0044) grad_norm 1.7373 (1.9100) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:08:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][300/625] eta 0:01:24 lr 0.001404 wd 0.0500 time 0.2576 (0.2600) data time 0.0008 (0.0026) model time 0.2568 (0.2577) loss 6.2304 (5.9888) grad_norm 1.4144 (1.9254) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:08:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][310/625] eta 0:01:21 lr 0.001404 wd 0.0500 time 0.2538 (0.2599) data time 0.0008 (0.0026) model time 0.2531 (0.2576) loss 6.5912 (5.9825) grad_norm 1.7093 (1.9252) loss_scale 4096.0000 (4096.0000) mem 9655MB 
[2024-07-31 09:08:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][320/625] eta 0:01:19 lr 0.001404 wd 0.0500 time 0.2589 (0.2598) data time 0.0009 (0.0025) model time 0.2579 (0.2576) loss 6.8404 (5.9927) grad_norm 2.3140 (1.9260) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:08:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][330/625] eta 0:01:16 lr 0.001404 wd 0.0500 time 0.2553 (0.2597) data time 0.0010 (0.0025) model time 0.2543 (0.2575) loss 5.9896 (6.0104) grad_norm 2.9086 (1.9239) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:09:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][340/625] eta 0:01:13 lr 0.001404 wd 0.0500 time 0.2751 (0.2596) data time 0.0006 (0.0024) model time 0.2745 (0.2575) loss 6.2807 (6.0158) grad_norm 1.4121 (1.9219) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:09:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][350/625] eta 0:01:11 lr 0.001404 wd 0.0500 time 0.2519 (0.2596) data time 0.0009 (0.0024) model time 0.2510 (0.2575) loss 5.8449 (6.0130) grad_norm 1.2616 (1.9147) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:09:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][360/625] eta 0:01:08 lr 0.001403 wd 0.0500 time 0.2597 (0.2595) data time 0.0008 (0.0023) model time 0.2590 (0.2574) loss 6.3989 (6.0212) grad_norm 1.7857 (1.9098) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:09:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][370/625] eta 0:01:06 lr 0.001403 wd 0.0500 time 0.2555 (0.2594) data time 0.0008 (0.0023) model time 0.2547 (0.2574) loss 5.4784 (6.0145) grad_norm 1.9872 (1.9010) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:09:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][380/625] eta 0:01:03 lr 0.001403 wd 0.0500 time 0.2574 
(0.2593) data time 0.0008 (0.0022) model time 0.2566 (0.2573) loss 4.3644 (5.9932) grad_norm 2.1003 (1.9079) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:09:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][390/625] eta 0:01:00 lr 0.001403 wd 0.0500 time 0.2563 (0.2593) data time 0.0010 (0.0022) model time 0.2553 (0.2573) loss 5.1278 (5.9825) grad_norm 3.5940 (1.9095) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:09:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][400/625] eta 0:00:58 lr 0.001403 wd 0.0500 time 0.2559 (0.2593) data time 0.0006 (0.0022) model time 0.2552 (0.2573) loss 5.0498 (5.9800) grad_norm 2.6350 (1.9204) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:09:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][410/625] eta 0:00:55 lr 0.001403 wd 0.0500 time 0.2496 (0.2592) data time 0.0009 (0.0021) model time 0.2487 (0.2573) loss 5.2055 (5.9858) grad_norm 1.8766 (1.9185) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:09:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][420/625] eta 0:00:53 lr 0.001402 wd 0.0500 time 0.2589 (0.2592) data time 0.0007 (0.0021) model time 0.2582 (0.2573) loss 7.4129 (5.9890) grad_norm 2.8390 (1.9339) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:09:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][430/625] eta 0:00:50 lr 0.001402 wd 0.0500 time 0.2583 (0.2592) data time 0.0009 (0.0021) model time 0.2574 (0.2573) loss 6.1618 (5.9957) grad_norm 1.2576 (1.9350) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:09:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][440/625] eta 0:00:47 lr 0.001402 wd 0.0500 time 0.2589 (0.2591) data time 0.0008 (0.0020) model time 0.2581 (0.2573) loss 6.9969 (5.9938) grad_norm 1.1727 (1.9310) loss_scale 4096.0000 (4096.0000) mem 9655MB 
[2024-07-31 09:09:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][450/625] eta 0:00:45 lr 0.001402 wd 0.0500 time 0.2585 (0.2592) data time 0.0009 (0.0020) model time 0.2577 (0.2573) loss 7.1773 (5.9902) grad_norm 1.5777 (1.9230) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:09:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][460/625] eta 0:00:42 lr 0.001402 wd 0.0500 time 0.2579 (0.2591) data time 0.0007 (0.0020) model time 0.2572 (0.2573) loss 6.2080 (5.9955) grad_norm 1.4372 (1.9147) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:09:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][470/625] eta 0:00:40 lr 0.001402 wd 0.0500 time 0.2531 (0.2591) data time 0.0009 (0.0020) model time 0.2522 (0.2573) loss 6.9164 (5.9915) grad_norm 1.3329 (1.9166) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:09:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][480/625] eta 0:00:37 lr 0.001401 wd 0.0500 time 0.2576 (0.2590) data time 0.0009 (0.0019) model time 0.2567 (0.2573) loss 6.4222 (5.9867) grad_norm 1.4916 (1.9190) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:09:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][490/625] eta 0:00:34 lr 0.001401 wd 0.0500 time 0.2586 (0.2590) data time 0.0008 (0.0019) model time 0.2578 (0.2572) loss 7.5029 (5.9889) grad_norm 1.3292 (1.9117) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:09:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][500/625] eta 0:00:32 lr 0.001401 wd 0.0500 time 0.2603 (0.2590) data time 0.0009 (0.0019) model time 0.2595 (0.2572) loss 4.8284 (5.9906) grad_norm 3.1139 (1.9153) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:09:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][510/625] eta 0:00:29 lr 0.001401 wd 0.0500 time 0.2548 
(0.2589) data time 0.0019 (0.0019) model time 0.2529 (0.2572) loss 6.4298 (5.9995) grad_norm 1.8924 (1.9096) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:09:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][520/625] eta 0:00:27 lr 0.001401 wd 0.0500 time 0.2565 (0.2589) data time 0.0009 (0.0019) model time 0.2556 (0.2572) loss 6.9487 (5.9955) grad_norm 3.5042 (1.9134) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:09:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][530/625] eta 0:00:24 lr 0.001401 wd 0.0500 time 0.2570 (0.2588) data time 0.0016 (0.0018) model time 0.2554 (0.2571) loss 5.8219 (6.0003) grad_norm 2.2187 (1.9228) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:09:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][540/625] eta 0:00:21 lr 0.001400 wd 0.0500 time 0.2555 (0.2588) data time 0.0008 (0.0018) model time 0.2548 (0.2571) loss 6.7142 (6.0009) grad_norm 3.2597 (1.9248) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:09:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][550/625] eta 0:00:19 lr 0.001400 wd 0.0500 time 0.2548 (0.2588) data time 0.0009 (0.0018) model time 0.2539 (0.2571) loss 5.5476 (6.0025) grad_norm 1.3435 (1.9217) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:09:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][560/625] eta 0:00:16 lr 0.001400 wd 0.0500 time 0.2559 (0.2588) data time 0.0009 (0.0018) model time 0.2550 (0.2571) loss 6.8811 (6.0100) grad_norm 2.3753 (1.9302) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:10:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][570/625] eta 0:00:14 lr 0.001400 wd 0.0500 time 0.2544 (0.2587) data time 0.0006 (0.0018) model time 0.2538 (0.2571) loss 5.3737 (6.0038) grad_norm 3.7836 (1.9345) loss_scale 4096.0000 (4096.0000) mem 9655MB 
[2024-07-31 09:10:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][580/625] eta 0:00:11 lr 0.001400 wd 0.0500 time 0.2596 (0.2587) data time 0.0007 (0.0018) model time 0.2589 (0.2571) loss 5.4474 (6.0004) grad_norm 2.3550 (1.9413) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:10:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][590/625] eta 0:00:09 lr 0.001400 wd 0.0500 time 0.2540 (0.2587) data time 0.0011 (0.0017) model time 0.2529 (0.2570) loss 6.2861 (5.9996) grad_norm 1.3115 (1.9375) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:10:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][600/625] eta 0:00:06 lr 0.001400 wd 0.0500 time 0.2502 (0.2586) data time 0.0007 (0.0017) model time 0.2495 (0.2570) loss 5.4811 (5.9948) grad_norm 1.5375 (1.9359) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:10:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][610/625] eta 0:00:03 lr 0.001399 wd 0.0500 time 0.2567 (0.2586) data time 0.0005 (0.0017) model time 0.2563 (0.2570) loss 6.3857 (6.0014) grad_norm 1.9457 (1.9336) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:10:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][620/625] eta 0:00:01 lr 0.001399 wd 0.0500 time 0.2591 (0.2585) data time 0.0003 (0.0017) model time 0.2588 (0.2569) loss 5.1145 (5.9969) grad_norm 2.0559 (1.9305) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:10:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 123 training takes 0:02:41 [2024-07-31 09:10:15 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-31 09:10:16 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-31 09:10:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.475 (0.475) Loss 0.6582 (0.6582) Acc@1 87.598 (87.598) Acc@5 97.852 (97.852) Mem 9655MB [2024-07-31 09:10:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 1.0635 (0.8247) Acc@1 78.369 (83.323) Acc@5 94.580 (96.791) Mem 9655MB [2024-07-31 09:10:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.077) Loss 1.2256 (0.9751) Acc@1 72.168 (79.567) Acc@5 92.627 (95.054) Mem 9655MB [2024-07-31 09:10:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.257 Acc@5 95.016 [2024-07-31 09:10:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.3% [2024-07-31 09:10:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.753 (0.753) Loss 0.5801 (0.5801) Acc@1 88.477 (88.477) Acc@5 98.584 (98.584) Mem 9655MB [2024-07-31 09:10:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.125) Loss 0.9375 (0.7225) Acc@1 78.223 (84.739) Acc@5 94.922 (97.248) Mem 9655MB [2024-07-31 09:10:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.092) Loss 1.0811 (0.8605) Acc@1 73.877 (81.057) Acc@5 93.506 (95.664) Mem 9655MB [2024-07-31 09:10:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.702 Acc@5 95.629 [2024-07-31 09:10:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.7% [2024-07-31 09:10:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.70% [2024-07-31 09:10:19 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-07-31 09:10:20 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-31 09:10:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][0/625] eta 0:07:00 lr 0.001399 wd 0.0500 time 0.6721 (0.6721) data time 0.4339 (0.4339) model time 0.0000 (0.0000) loss 5.1673 (5.1673) grad_norm 1.8477 (1.8477) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:10:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][10/625] eta 0:03:00 lr 0.001399 wd 0.0500 time 0.2558 (0.2941) data time 0.0008 (0.0404) model time 0.0000 (0.0000) loss 5.3425 (6.0333) grad_norm 1.5145 (1.5997) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:10:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][20/625] eta 0:02:47 lr 0.001399 wd 0.0500 time 0.2580 (0.2768) data time 0.0008 (0.0216) model time 0.0000 (0.0000) loss 5.4179 (5.8465) grad_norm 1.8267 (1.6843) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:10:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][30/625] eta 0:02:40 lr 0.001399 wd 0.0500 time 0.2590 (0.2702) data time 0.0007 (0.0149) model time 0.0000 (0.0000) loss 5.6492 (5.7931) grad_norm 1.3227 (1.6387) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:10:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-31 09:10:30 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-31 09:10:30 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-31 09:18:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-31 09:18:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-31 09:41:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-31 09:41:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-31 09:41:40 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-31 09:42:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-31 09:42:02 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-31 09:42:02 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-31 09:42:02 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-31 09:42:02 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 124)
[2024-07-31 09:42:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-31 09:42:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][40/625] eta 0:15:30 lr 0.001398 wd 0.0500 time 0.2570 (1.5904) data time 0.0013 (0.0932) model time 0.0000 (0.0000) loss 6.5851 (6.5089) grad_norm 1.6280 (1.9360) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 09:42:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][50/625] eta 0:07:42 lr 0.001398 wd 0.0500 time 0.2519 (0.8039) data time 0.0010 (0.0390) model time 0.0000 (0.0000) loss 5.9614 (6.2075) grad_norm 1.7733 (1.8590) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 09:42:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][60/625] eta 0:05:39 lr 0.001398 wd 0.0500 time 0.2597 (0.6013) data time 0.0007 (0.0249) model time 0.2590 (0.2560) loss 7.4462 (6.3324) grad_norm 3.2799 (1.9003) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 09:42:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][70/625] eta 0:04:42 lr 0.001398 wd 0.0500 time 0.2538 (0.5090) data time 0.0008 (0.0185) model time 0.2530 (0.2571) loss 4.9360 (6.2944) grad_norm 1.7314 (1.8262) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 09:42:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][80/625] eta 0:04:07 lr 0.001398 wd 0.0500 time 0.2562 (0.4549) data time 0.0008 (0.0149) model time 0.2554 (0.2558) loss 6.1997 (6.2357) grad_norm 1.3587 (1.8280) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 09:42:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][90/625] eta 0:03:44 lr 0.001398 wd 0.0500 time 0.2529 (0.4196) data time 0.0010 (0.0124) model time 0.2519 (0.2550) loss 6.3663 (6.2014) grad_norm 1.7584 (1.8502) loss_scale 8192.0000 (4599.0175) mem 9656MB
[2024-07-31 09:42:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][100/625] eta 0:03:27 lr 0.001397 wd 0.0500 time 0.2624 (0.3956) data time 0.0009 (0.0107) model time 0.2615 (0.2557) loss 6.8845 (6.1636) grad_norm 1.9898 (1.8894) loss_scale 8192.0000 (5135.2836) mem 9656MB
[2024-07-31 09:42:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][110/625] eta 0:03:14 lr 0.001397 wd 0.0500 time 0.2572 (0.3776) data time 0.0009 (0.0096) model time 0.2563 (0.2555) loss 6.3552 (6.1152) grad_norm 1.7859 (1.8598) loss_scale 8192.0000 (5532.2597) mem 9656MB
[2024-07-31 09:42:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][120/625] eta 0:03:03 lr 0.001397 wd 0.0500 time 0.2588 (0.3634) data time 0.0008 (0.0086) model time 0.2580 (0.2552) loss 6.0183 (6.0951) grad_norm 3.1188 (1.8596) loss_scale 8192.0000 (5837.9770) mem 9656MB
[2024-07-31 09:42:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][130/625] eta 0:02:54 lr 0.001397 wd 0.0500 time 0.2539 (0.3521) data time 0.0009 (0.0078) model time 0.2530 (0.2550) loss 5.9690 (6.0945) grad_norm 1.1500 (1.8852) loss_scale 8192.0000 (6080.6598) mem 9656MB
[2024-07-31 09:42:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][140/625] eta 0:02:46 lr 0.001397 wd 0.0500 time 0.2561 (0.3437) data time 0.0006 (0.0071) model time 0.2554 (0.2556) loss 5.7282 (6.1249) grad_norm 2.0979 (1.8772) loss_scale 8192.0000 (6277.9813) mem 9656MB
[2024-07-31 09:42:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][150/625] eta 0:02:39 lr 0.001397 wd 0.0500 time 0.2549 (0.3361) data time 0.0014 (0.0066) model time 0.2535 (0.2554) loss 6.9518 (6.1163) grad_norm 2.0708 (1.9401) loss_scale 8192.0000 (6441.5726) mem 9656MB
[2024-07-31 09:42:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][160/625] eta 0:02:33 lr 0.001396 wd 0.0500 time 0.2581 (0.3296) data time 0.0007 (0.0062) model time 0.2574 (0.2553) loss 6.3513 (6.0908) grad_norm 1.3860 (1.9540) loss_scale 8192.0000 (6579.4016) mem 9656MB
[2024-07-31 09:42:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][170/625] eta 0:02:27 lr 0.001396 wd 0.0500 time 0.2509 (0.3242) data time 0.0009 (0.0058) model time 0.2500 (0.2552) loss 5.5647 (6.0984) grad_norm 1.4238 (1.9517) loss_scale 8192.0000 (6697.1095) mem 9656MB
[2024-07-31 09:42:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][180/625] eta 0:02:22 lr 0.001396 wd 0.0500 time 0.2577 (0.3196) data time 0.0008 (0.0054) model time 0.2569 (0.2552) loss 5.8565 (6.0836) grad_norm 1.8197 (1.9334) loss_scale 8192.0000 (6798.8027) mem 9656MB
[2024-07-31 09:42:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][190/625] eta 0:02:17 lr 0.001396 wd 0.0500 time 0.2577 (0.3155) data time 0.0009 (0.0052) model time 0.2568 (0.2551) loss 4.9282 (6.0779) grad_norm 2.3198 (1.9425) loss_scale 8192.0000 (6887.5414) mem 9656MB
[2024-07-31 09:42:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][200/625] eta 0:02:12 lr 0.001396 wd 0.0500 time 0.2554 (0.3119) data time 0.0010 (0.0049) model time 0.2544 (0.2551) loss 5.7489 (6.0787) grad_norm 2.2300 (1.9347) loss_scale 8192.0000 (6965.6527) mem 9656MB
[2024-07-31 09:43:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][210/625] eta 0:02:08 lr 0.001396 wd 0.0500 time 0.2514 (0.3087) data time 0.0010 (0.0047) model time 0.2503 (0.2550) loss 5.6368 (6.0502) grad_norm 1.8074 (1.9331) loss_scale 8192.0000 (7034.9379) mem 9656MB
[2024-07-31 09:43:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][220/625] eta 0:02:03 lr 0.001395 wd 0.0500 time 0.2687 (0.3060) data time 0.0007 (0.0045) model time 0.2680 (0.2552) loss 5.7244 (6.0384) grad_norm 1.6824 (1.9295) loss_scale 8192.0000 (7096.8128) mem 9656MB
[2024-07-31 09:43:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][230/625] eta 0:01:59 lr 0.001395 wd 0.0500 time 0.2560 (0.3035) data time 0.0008 (0.0043) model time 0.2552 (0.2552) loss 6.6032 (6.0372) grad_norm 1.2421 (1.9513) loss_scale 8192.0000 (7152.4061) mem 9656MB
[2024-07-31 09:43:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][240/625] eta 0:01:55 lr 0.001395 wd 0.0500 time 0.2540 (0.3012) data time 0.0006 (0.0041) model time 0.2534 (0.2552) loss 6.1397 (6.0162) grad_norm 1.3972 (1.9303) loss_scale 8192.0000 (7202.6280) mem 9656MB
[2024-07-31 09:43:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][250/625] eta 0:01:52 lr 0.001395 wd 0.0500 time 0.2501 (0.2991) data time 0.0014 (0.0040) model time 0.2487 (0.2551) loss 6.3269 (6.0227) grad_norm 1.2694 (inf) loss_scale 4096.0000 (7078.3410) mem 9656MB
[2024-07-31 09:43:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][260/625] eta 0:01:48 lr 0.001395 wd 0.0500 time 0.2577 (0.2972) data time 0.0011 (0.0039) model time 0.2566 (0.2552) loss 6.3657 (6.0306) grad_norm 3.2679 (inf) loss_scale 4096.0000 (6946.9604) mem 9656MB
[2024-07-31 09:43:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][270/625] eta 0:01:44 lr 0.001395 wd 0.0500 time 0.2564 (0.2955) data time 0.0007 (0.0037) model time 0.2557 (0.2552) loss 5.7742 (6.0148) grad_norm 1.9751 (inf) loss_scale 4096.0000 (6826.6667) mem 9656MB
[2024-07-31 09:43:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][280/625] eta 0:01:41 lr 0.001395 wd 0.0500 time 0.2541 (0.2939) data time 0.0011 (0.0036) model time 0.2530 (0.2552) loss 7.2926 (6.0190) grad_norm 1.4374 (inf) loss_scale 4096.0000 (6716.1134) mem 9656MB
[2024-07-31 09:43:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][290/625] eta 0:01:37 lr 0.001394 wd 0.0500 time 0.2569 (0.2925) data time 0.0007 (0.0035) model time 0.2561 (0.2552) loss 4.5356 (6.0057) grad_norm 1.5454 (inf) loss_scale 4096.0000 (6614.1634) mem 9656MB
[2024-07-31 09:43:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][300/625] eta 0:01:34 lr 0.001394 wd 0.0500 time 0.2557 (0.2911) data time 0.0009 (0.0034) model time 0.2548 (0.2552) loss 4.9746 (5.9947) grad_norm 1.1270 (inf) loss_scale 4096.0000 (6519.8502) mem 9656MB
[2024-07-31 09:43:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][310/625] eta 0:01:31 lr 0.001394 wd 0.0500 time 0.2519 (0.2899) data time 0.0010 (0.0033) model time 0.2509 (0.2553) loss 6.3965 (6.0053) grad_norm 1.7514 (inf) loss_scale 4096.0000 (6432.3466) mem 9656MB
[2024-07-31 09:43:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][320/625] eta 0:01:28 lr 0.001394 wd 0.0500 time 0.2559 (0.2887) data time 0.0008 (0.0032) model time 0.2551 (0.2552) loss 7.7432 (6.0062) grad_norm 1.7408 (inf) loss_scale 4096.0000 (6350.9408) mem 9656MB
[2024-07-31 09:43:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][330/625] eta 0:01:24 lr 0.001394 wd 0.0500 time 0.2548 (0.2876) data time 0.0010 (0.0032) model time 0.2539 (0.2553) loss 4.8551 (5.9906) grad_norm 1.2630 (inf) loss_scale 4096.0000 (6275.0168) mem 9656MB
[2024-07-31 09:43:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][340/625] eta 0:01:21 lr 0.001394 wd 0.0500 time 0.2580 (0.2866) data time 0.0008 (0.0031) model time 0.2572 (0.2552) loss 7.0723 (5.9878) grad_norm 3.2401 (inf) loss_scale 4096.0000 (6204.0391) mem 9656MB
[2024-07-31 09:43:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][350/625] eta 0:01:18 lr 0.001393 wd 0.0500 time 0.2571 (0.2856) data time 0.0008 (0.0030) model time 0.2563 (0.2552) loss 6.6930 (6.0041) grad_norm 2.5412 (inf) loss_scale 4096.0000 (6137.5394) mem 9656MB
[2024-07-31 09:43:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][360/625] eta 0:01:15 lr 0.001393 wd 0.0500 time 0.2564 (0.2847) data time 0.0006 (0.0030) model time 0.2559 (0.2552) loss 4.9562 (6.0102) grad_norm 3.3770 (inf) loss_scale 4096.0000 (6075.1070) mem 9656MB
[2024-07-31 09:43:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][370/625] eta 0:01:12 lr 0.001393 wd 0.0500 time 0.2559 (0.2838) data time 0.0010 (0.0029) model time 0.2549 (0.2552) loss 5.6888 (6.0079) grad_norm 1.6411 (inf) loss_scale 4096.0000 (6016.3798) mem 9656MB
[2024-07-31 09:43:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][380/625] eta 0:01:09 lr 0.001393 wd 0.0500 time 0.2679 (0.2831) data time 0.0006 (0.0028) model time 0.2673 (0.2552) loss 5.0627 (6.0112) grad_norm 1.2612 (inf) loss_scale 4096.0000 (5961.0375) mem 9656MB
[2024-07-31 09:43:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][390/625] eta 0:01:06 lr 0.001393 wd 0.0500 time 0.2548 (0.2823) data time 0.0010 (0.0028) model time 0.2537 (0.2552) loss 5.1411 (6.0080) grad_norm 1.4575 (inf) loss_scale 4096.0000 (5908.7955) mem 9656MB
[2024-07-31 09:43:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][400/625] eta 0:01:03 lr 0.001393 wd 0.0500 time 0.2542 (0.2816) data time 0.0008 (0.0027) model time 0.2534 (0.2553) loss 5.1507 (6.0071) grad_norm 1.4163 (inf) loss_scale 4096.0000 (5859.4005) mem 9656MB
[2024-07-31 09:43:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][410/625] eta 0:01:00 lr 0.001392 wd 0.0500 time 0.2588 (0.2809) data time 0.0009 (0.0027) model time 0.2579 (0.2553) loss 6.4165 (6.0124) grad_norm 1.7817 (inf) loss_scale 4096.0000 (5812.6260) mem 9656MB
[2024-07-31 09:43:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][420/625] eta 0:00:57 lr 0.001392 wd 0.0500 time 0.2576 (0.2803) data time 0.0008 (0.0026) model time 0.2568 (0.2553) loss 7.0593 (6.0084) grad_norm 3.2854 (inf) loss_scale 4096.0000 (5768.2687) mem 9656MB
[2024-07-31 09:43:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][430/625] eta 0:00:54 lr 0.001392 wd 0.0500 time 0.2543 (0.2797) data time 0.0008 (0.0026) model time 0.2535 (0.2553) loss 5.4352 (6.0084) grad_norm 1.3444 (inf) loss_scale 4096.0000 (5726.1461) mem 9656MB
[2024-07-31 09:44:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][440/625] eta 0:00:51 lr 0.001392 wd 0.0500 time 0.2569 (0.2792) data time 0.0009 (0.0026) model time 0.2560 (0.2554) loss 5.7317 (6.0111) grad_norm 2.2132 (inf) loss_scale 4096.0000 (5686.0934) mem 9656MB
[2024-07-31 09:44:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][450/625] eta 0:00:48 lr 0.001392 wd 0.0500 time 0.2574 (0.2787) data time 0.0014 (0.0025) model time 0.2560 (0.2554) loss 4.3952 (6.0056) grad_norm 1.5938 (inf) loss_scale 4096.0000 (5647.9616) mem 9656MB
[2024-07-31 09:44:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][460/625] eta 0:00:45 lr 0.001392 wd 0.0500 time 0.2535 (0.2787) data time 0.0009 (0.0025) model time 0.2526 (0.2560) loss 6.5635 (6.0120) grad_norm 1.9233 (inf) loss_scale 4096.0000 (5611.6159) mem 9656MB
[2024-07-31 09:44:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][470/625] eta 0:00:43 lr 0.001391 wd 0.0500 time 0.2588 (0.2783) data time 0.0011 (0.0024) model time 0.2577 (0.2560) loss 6.7869 (6.0212) grad_norm 1.8606 (inf) loss_scale 4096.0000 (5576.9336) mem 9656MB
[2024-07-31 09:44:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][480/625] eta 0:00:40 lr 0.001391 wd 0.0500 time 0.2630 (0.2778) data time 0.0008 (0.0024) model time 0.2622 (0.2561) loss 5.6670 (6.0198) grad_norm 1.2605 (inf) loss_scale 4096.0000 (5543.8031) mem 9656MB
[2024-07-31 09:44:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][490/625] eta 0:00:37 lr 0.001391 wd 0.0500 time 0.2505 (0.2773) data time 0.0010 (0.0024) model time 0.2495 (0.2560) loss 4.1810 (6.0108) grad_norm 2.4255 (inf) loss_scale 4096.0000 (5512.1225) mem 9656MB
[2024-07-31 09:44:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][500/625] eta 0:00:34 lr 0.001391 wd 0.0500 time 0.2608 (0.2769) data time 0.0008 (0.0023) model time 0.2600 (0.2561) loss 5.1171 (6.0032) grad_norm 1.7761 (inf) loss_scale 4096.0000 (5481.7987) mem 9656MB
[2024-07-31 09:44:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][510/625] eta 0:00:31 lr 0.001391 wd 0.0500 time 0.2550 (0.2765) data time 0.0009 (0.0023) model time 0.2541 (0.2560) loss 6.7793 (5.9972) grad_norm 1.4772 (inf) loss_scale 4096.0000 (5452.7463) mem 9656MB
[2024-07-31 09:44:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][520/625] eta 0:00:28 lr 0.001391 wd 0.0500 time 0.2522 (0.2760) data time 0.0010 (0.0023) model time 0.2512 (0.2560) loss 5.5528 (6.0038) grad_norm 1.3771 (inf) loss_scale 4096.0000 (5424.8871) mem 9656MB
[2024-07-31 09:44:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][530/625] eta 0:00:26 lr 0.001390 wd 0.0500 time 0.2568 (0.2756) data time 0.0006 (0.0022) model time 0.2562 (0.2560) loss 5.9156 (6.0010) grad_norm 1.1351 (inf) loss_scale 4096.0000 (5398.1489) mem 9656MB
[2024-07-31 09:44:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][540/625] eta 0:00:23 lr 0.001390 wd 0.0500 time 0.2547 (0.2753) data time 0.0008 (0.0022) model time 0.2538 (0.2560) loss 5.8895 (6.0014) grad_norm 2.1097 (inf) loss_scale 4096.0000 (5372.4655) mem 9656MB
[2024-07-31 09:44:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][550/625] eta 0:00:20 lr 0.001390 wd 0.0500 time 0.2549 (0.2749) data time 0.0018 (0.0022) model time 0.2531 (0.2560) loss 6.6871 (6.0092) grad_norm 2.2279 (inf) loss_scale 4096.0000 (5347.7756) mem 9656MB
[2024-07-31 09:44:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][560/625] eta 0:00:17 lr 0.001390 wd 0.0500 time 0.2596 (0.2746) data time 0.0009 (0.0022) model time 0.2588 (0.2560) loss 6.3126 (6.0031) grad_norm 3.1404 (inf) loss_scale 4096.0000 (5324.0228) mem 9656MB
[2024-07-31 09:44:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][570/625] eta 0:00:15 lr 0.001390 wd 0.0500 time 0.2575 (0.2742) data time 0.0006 (0.0021) model time 0.2569 (0.2560) loss 4.3097 (5.9971) grad_norm 1.5938 (inf) loss_scale 4096.0000 (5301.1546) mem 9656MB
[2024-07-31 09:44:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][580/625] eta 0:00:12 lr 0.001390 wd 0.0500 time 0.2614 (0.2740) data time 0.0007 (0.0021) model time 0.2606 (0.2560) loss 6.7905 (6.0029) grad_norm 3.1534 (inf) loss_scale 4096.0000 (5279.1225) mem 9656MB
[2024-07-31 09:44:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][590/625] eta 0:00:09 lr 0.001389 wd 0.0500 time 0.2564 (0.2737) data time 0.0009 (0.0021) model time 0.2555 (0.2560) loss 4.7951 (6.0066) grad_norm 1.8814 (inf) loss_scale 4096.0000 (5257.8815) mem 9656MB
[2024-07-31 09:44:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][600/625] eta 0:00:06 lr 0.001389 wd 0.0500 time 0.2582 (0.2733) data time 0.0009 (0.0021) model time 0.2573 (0.2560) loss 6.8441 (6.0134) grad_norm 2.3066 (inf) loss_scale 4096.0000 (5237.3898) mem 9656MB
[2024-07-31 09:44:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][610/625] eta 0:00:04 lr 0.001389 wd 0.0500 time 0.2530 (0.2730) data time 0.0004 (0.0021) model time 0.2526 (0.2560) loss 7.3595 (6.0155) grad_norm 2.1786 (inf) loss_scale 4096.0000 (5217.6083) mem 9656MB
[2024-07-31 09:44:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][620/625] eta 0:00:01 lr 0.001389 wd 0.0500 time 0.2542 (0.2727) data time 0.0004 (0.0020) model time 0.2538 (0.2559) loss 6.0201 (6.0211) grad_norm 1.5419 (inf) loss_scale 4096.0000 (5198.5009) mem 9656MB
[2024-07-31 09:44:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 124 training takes 0:02:41
[2024-07-31 09:44:47 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-31 09:44:49 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-31 09:44:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.392 (0.392) Loss 0.6733 (0.6733) Acc@1 87.256 (87.256) Acc@5 98.145 (98.145) Mem 9656MB
[2024-07-31 09:44:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.087) Loss 1.0791 (0.8101) Acc@1 75.977 (83.265) Acc@5 94.287 (96.964) Mem 9656MB
[2024-07-31 09:44:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.072) Loss 1.1846 (0.9539) Acc@1 73.242 (79.843) Acc@5 92.871 (95.196) Mem 9656MB
[2024-07-31 09:44:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.549 Acc@5 95.150
[2024-07-31 09:44:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.5%
[2024-07-31 09:44:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.55%
[2024-07-31 09:44:51 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-31 09:44:53 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-31 09:44:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.389 (0.389) Loss 0.5811 (0.5811) Acc@1 88.428 (88.428) Acc@5 98.584 (98.584) Mem 9656MB
[2024-07-31 09:44:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.086) Loss 0.9370 (0.7226) Acc@1 78.369 (84.770) Acc@5 94.922 (97.230) Mem 9656MB
[2024-07-31 09:44:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.071) Loss 1.0820 (0.8604) Acc@1 73.926 (81.101) Acc@5 93.652 (95.661) Mem 9656MB
[2024-07-31 09:44:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.738 Acc@5 95.635
[2024-07-31 09:44:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.7%
[2024-07-31 09:44:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.74%
[2024-07-31 09:44:55 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-31 09:44:56 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-31 09:44:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][0/625] eta 0:07:09 lr 0.001389 wd 0.0500 time 0.6872 (0.6872) data time 0.3911 (0.3911) model time 0.0000 (0.0000) loss 6.7413 (6.7413) grad_norm 1.8933 (1.8933) loss_scale 4096.0000 (4096.0000) mem 9652MB
[2024-07-31 09:44:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][10/625] eta 0:03:02 lr 0.001389 wd 0.0500 time 0.2599 (0.2970) data time 0.0013 (0.0364) model time 0.0000 (0.0000) loss 6.1288 (6.3026) grad_norm 1.8069 (1.5075) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:45:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][20/625] eta 0:02:47 lr 0.001389 wd 0.0500 time 0.2539 (0.2769) data time 0.0011 (0.0195) model time 0.0000 (0.0000) loss 4.6773 (6.0655) grad_norm 2.3959 (1.6663) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:45:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][30/625] eta 0:02:41 lr 0.001388 wd 0.0500 time 0.2574 (0.2707) data time 0.0008 (0.0135) model time 0.0000 (0.0000) loss 6.0488 (6.1324) grad_norm 2.0307 (1.7923) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:45:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][40/625] eta 0:02:36 lr 0.001388 wd 0.0500 time 0.2717 (0.2675) data time 0.0008 (0.0104) model time 0.0000 (0.0000) loss 6.4371 (6.1597) grad_norm 1.5891 (1.7631) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:45:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][50/625] eta 0:02:35 lr 0.001388 wd 0.0500 time 0.2536 (0.2709) data time 0.0010 (0.0085) model time 0.0000 (0.0000) loss 4.3152 (6.0453) grad_norm 1.4465 (1.6905) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:45:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][60/625] eta 0:02:32 lr 0.001388 wd 0.0500 time 0.2655 (0.2692) data time 0.0008 (0.0073) model time 0.2647 (0.2598) loss 5.1639 (6.0542) grad_norm 1.6178 (1.6971) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:45:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][70/625] eta 0:02:28 lr 0.001388 wd 0.0500 time 0.2551 (0.2678) data time 0.0014 (0.0064) model time 0.2537 (0.2589) loss 6.7414 (6.0086) grad_norm 2.2770 (1.7265) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:45:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][80/625] eta 0:02:25 lr 0.001388 wd 0.0500 time 0.2645 (0.2668) data time 0.0010 (0.0057) model time 0.2635 (0.2589) loss 5.1057 (6.0312) grad_norm 1.4547 (1.7489) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:45:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][90/625] eta 0:02:22 lr 0.001387 wd 0.0500 time 0.2565 (0.2658) data time 0.0008 (0.0052) model time 0.2557 (0.2584) loss 5.8664 (6.0220) grad_norm 1.9734 (1.7231) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:45:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][100/625] eta 0:02:19 lr 0.001387 wd 0.0500 time 0.2618 (0.2650) data time 0.0021 (0.0048) model time 0.2598 (0.2581) loss 6.4782 (5.9739) grad_norm 1.2419 (1.6820) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:45:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][110/625] eta 0:02:16 lr 0.001387 wd 0.0500 time 0.2575 (0.2645) data time 0.0006 (0.0044) model time 0.2569 (0.2581) loss 5.1371 (5.9502) grad_norm 1.4024 (1.7040) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:45:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][120/625] eta 0:02:13 lr 0.001387 wd 0.0500 time 0.2560 (0.2638) data time 0.0007 (0.0041) model time 0.2553 (0.2577) loss 4.8215 (5.9463) grad_norm 1.6709 (1.6971) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:45:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][130/625] eta 0:02:10 lr 0.001387 wd 0.0500 time 0.2558 (0.2632) data time 0.0008 (0.0039) model time 0.2550 (0.2574) loss 5.8313 (5.9569) grad_norm 1.8946 (1.6898) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:45:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][140/625] eta 0:02:07 lr 0.001387 wd 0.0500 time 0.2563 (0.2627) data time 0.0010 (0.0037) model time 0.2553 (0.2571) loss 6.7432 (5.9672) grad_norm 1.9945 (1.7091) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:45:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][150/625] eta 0:02:04 lr 0.001386 wd 0.0500 time 0.2529 (0.2623) data time 0.0010 (0.0035) model time 0.2519 (0.2570) loss 5.2325 (5.9802) grad_norm 2.7125 (1.7865) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:45:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][160/625] eta 0:02:01 lr 0.001386 wd 0.0500 time 0.2520 (0.2619) data time 0.0009 (0.0033) model time 0.2510 (0.2568) loss 6.7451 (5.9875) grad_norm 2.4349 (1.8170) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:45:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][170/625] eta 0:01:59 lr 0.001386 wd 0.0500 time 0.2534 (0.2615) data time 0.0006 (0.0032) model time 0.2527 (0.2567) loss 6.2154 (5.9995) grad_norm 1.9399 (1.8345) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:45:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][180/625] eta 0:01:56 lr 0.001386 wd 0.0500 time 0.2557 (0.2612) data time 0.0009 (0.0030) model time 0.2549 (0.2566) loss 6.1139 (6.0112) grad_norm 1.6261 (1.8610) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:45:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][190/625] eta 0:01:53 lr 0.001386 wd 0.0500 time 0.2516 (0.2609) data time 0.0007 (0.0029) model time 0.2510 (0.2564) loss 5.4035 (6.0131) grad_norm 2.5099 (1.8854) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:45:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][200/625] eta 0:01:50 lr 0.001386 wd 0.0500 time 0.2540 (0.2607) data time 0.0010 (0.0028) model time 0.2530 (0.2563) loss 6.8499 (6.0015) grad_norm 1.7120 (1.9028) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:45:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][210/625] eta 0:01:48 lr 0.001385 wd 0.0500 time 0.2529 (0.2604) data time 0.0006 (0.0027) model time 0.2523 (0.2562) loss 5.1900 (5.9962) grad_norm 2.2180 (1.9081) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:45:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][220/625] eta 0:01:45 lr 0.001385 wd 0.0500 time 0.2542 (0.2602) data time 0.0009 (0.0027) model time 0.2533 (0.2562) loss 6.4460 (5.9769) grad_norm 1.7073 (1.9000) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:45:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][230/625] eta 0:01:42 lr 0.001385 wd 0.0500 time 0.2527 (0.2601) data time 0.0008 (0.0026) model time 0.2519 (0.2562) loss 5.1195 (5.9913) grad_norm 1.5410 (1.8812) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:45:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][240/625] eta 0:01:40 lr 0.001385 wd 0.0500 time 0.2530 (0.2600) data time 0.0008 (0.0025) model time 0.2522 (0.2562) loss 5.6698 (5.9810) grad_norm 1.3767 (1.8806) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:46:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][250/625] eta 0:01:37 lr 0.001385 wd 0.0500 time 0.2577 (0.2598) data time 0.0011 (0.0024) model time 0.2566 (0.2561) loss 4.5510 (5.9759) grad_norm 1.4881 (1.8638) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:46:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][260/625] eta 0:01:34 lr 0.001385 wd 0.0500 time 0.2581 (0.2598) data time 0.0006 (0.0024) model time 0.2574 (0.2562) loss 4.7289 (5.9787) grad_norm 1.3182 (1.8523) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:46:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][270/625] eta 0:01:32 lr 0.001384 wd 0.0500 time 0.2556 (0.2598) data time 0.0009 (0.0023) model time 0.2548 (0.2563) loss 6.2625 (5.9865) grad_norm 1.2228 (1.8597) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:46:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][280/625] eta 0:01:29 lr 0.001384 wd 0.0500 time 0.2567 (0.2597) data time 0.0007 (0.0023) model time 0.2559 (0.2563) loss 5.4277 (5.9954) grad_norm 1.4167 (1.8693) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:46:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][290/625] eta 0:01:26 lr 0.001384 wd 0.0500 time 0.2510 (0.2595) data time 0.0007 (0.0022) model time 0.2503 (0.2562) loss 4.8936 (5.9859) grad_norm 2.4570 (1.8926) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:46:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][300/625] eta 0:01:24 lr 0.001384 wd 0.0500 time 0.2554 (0.2594) data time 0.0006 (0.0022) model time 0.2548 (0.2562) loss 6.5816 (5.9810) grad_norm 1.9796 (1.9020) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:46:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][310/625] eta 0:01:21 lr 0.001384 wd 0.0500 time 0.2575 (0.2594) data time 0.0006 (0.0021) model time 0.2568 (0.2563) loss 6.9731 (5.9817) grad_norm 1.7042 (1.9095) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:46:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][320/625] eta 0:01:19 lr 0.001384 wd 0.0500 time 0.2515 (0.2593) data time 0.0008 (0.0021) model time 0.2507 (0.2562) loss 4.1302 (5.9814) grad_norm 1.4843 (1.8992) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:46:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][330/625] eta 0:01:16 lr 0.001383 wd 0.0500 time 0.2548 (0.2592) data time 0.0010 (0.0021) model time 0.2538 (0.2562) loss 6.7778 (5.9932) grad_norm 1.9582 (1.9019) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:46:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][340/625] eta 0:01:13 lr 0.001383 wd 0.0500 time 0.2559 (0.2591) data time 0.0008 (0.0020) model time 0.2551 (0.2562) loss 6.6185 (6.0022) grad_norm 2.1374 (1.9038) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:46:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][350/625] eta 0:01:11 lr 0.001383 wd 0.0500 time 0.2574 (0.2591) data time 0.0010 (0.0020) model time 0.2564 (0.2562) loss 5.0249 (5.9964) grad_norm 2.8520 (1.9057) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:46:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][360/625] eta 0:01:08 lr 0.001383 wd 0.0500 time 0.2547 (0.2590) data time 0.0008 (0.0020) model time 0.2539 (0.2562) loss 7.6043 (5.9873) grad_norm 1.6922 (1.9080) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:46:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][370/625] eta 0:01:06 lr 0.001383 wd 0.0500 time 0.2569 (0.2590) data time 0.0009 (0.0019) model time 0.2560 (0.2562) loss 6.5233 (5.9886) grad_norm 1.1652 (1.8932) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 09:46:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][380/625] eta 0:01:03 lr 0.001383 wd 0.0500 time 0.2574 (0.2590) data time 0.0007 (0.0019) model time 0.2567 (0.2562) loss 7.3670 (6.0018) grad_norm 1.2060 (1.8807) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31
09:46:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][390/625] eta 0:01:00 lr 0.001382 wd 0.0500 time 0.2558 (0.2589) data time 0.0007 (0.0019) model time 0.2551 (0.2562) loss 7.2026 (6.0078) grad_norm 2.7925 (1.8842) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 09:46:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][400/625] eta 0:00:58 lr 0.001382 wd 0.0500 time 0.2536 (0.2589) data time 0.0009 (0.0019) model time 0.2527 (0.2562) loss 7.2166 (6.0100) grad_norm 1.6702 (inf) loss_scale 2048.0000 (4070.4638) mem 9655MB [2024-07-31 09:46:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][410/625] eta 0:00:55 lr 0.001382 wd 0.0500 time 0.2546 (0.2588) data time 0.0009 (0.0018) model time 0.2537 (0.2562) loss 6.1969 (6.0080) grad_norm 1.1922 (inf) loss_scale 2048.0000 (4021.2555) mem 9655MB [2024-07-31 09:46:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][420/625] eta 0:00:53 lr 0.001382 wd 0.0500 time 0.2598 (0.2588) data time 0.0009 (0.0018) model time 0.2589 (0.2562) loss 6.4542 (6.0130) grad_norm 1.1679 (inf) loss_scale 2048.0000 (3974.3848) mem 9655MB [2024-07-31 09:46:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][430/625] eta 0:00:50 lr 0.001382 wd 0.0500 time 0.2570 (0.2587) data time 0.0008 (0.0018) model time 0.2562 (0.2562) loss 6.5421 (6.0108) grad_norm 1.9321 (inf) loss_scale 2048.0000 (3929.6891) mem 9655MB [2024-07-31 09:46:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][440/625] eta 0:00:47 lr 0.001382 wd 0.0500 time 0.2618 (0.2587) data time 0.0006 (0.0018) model time 0.2612 (0.2562) loss 6.2991 (6.0016) grad_norm 1.7610 (inf) loss_scale 2048.0000 (3887.0204) mem 9655MB [2024-07-31 09:46:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][450/625] eta 0:00:45 lr 0.001381 wd 0.0500 time 0.2532 (0.2586) data time 0.0007 
(0.0018) model time 0.2526 (0.2562) loss 6.2635 (6.0031) grad_norm 1.4575 (inf) loss_scale 2048.0000 (3846.2439) mem 9655MB [2024-07-31 09:46:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][460/625] eta 0:00:42 lr 0.001381 wd 0.0500 time 0.2731 (0.2587) data time 0.0008 (0.0017) model time 0.2722 (0.2563) loss 6.5746 (6.0047) grad_norm 1.7406 (inf) loss_scale 2048.0000 (3807.2364) mem 9655MB [2024-07-31 09:46:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][470/625] eta 0:00:40 lr 0.001381 wd 0.0500 time 0.2562 (0.2586) data time 0.0006 (0.0017) model time 0.2556 (0.2562) loss 6.0857 (6.0035) grad_norm 1.2942 (inf) loss_scale 2048.0000 (3769.8854) mem 9655MB [2024-07-31 09:47:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][480/625] eta 0:00:37 lr 0.001381 wd 0.0500 time 0.2603 (0.2586) data time 0.0008 (0.0017) model time 0.2596 (0.2562) loss 4.7031 (5.9989) grad_norm 2.0124 (inf) loss_scale 2048.0000 (3734.0873) mem 9655MB [2024-07-31 09:47:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][490/625] eta 0:00:34 lr 0.001381 wd 0.0500 time 0.2537 (0.2585) data time 0.0008 (0.0017) model time 0.2529 (0.2562) loss 6.9937 (5.9967) grad_norm 2.8687 (inf) loss_scale 2048.0000 (3699.7475) mem 9655MB [2024-07-31 09:47:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][500/625] eta 0:00:32 lr 0.001381 wd 0.0500 time 0.2527 (0.2585) data time 0.0009 (0.0017) model time 0.2517 (0.2562) loss 5.1723 (5.9927) grad_norm 1.2652 (inf) loss_scale 2048.0000 (3666.7784) mem 9655MB [2024-07-31 09:47:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][510/625] eta 0:00:29 lr 0.001380 wd 0.0500 time 0.2558 (0.2584) data time 0.0008 (0.0017) model time 0.2550 (0.2562) loss 5.1125 (5.9925) grad_norm 0.9846 (inf) loss_scale 2048.0000 (3635.0998) mem 9655MB [2024-07-31 09:47:11 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [125/300][520/625] eta 0:00:27 lr 0.001380 wd 0.0500 time 0.2564 (0.2584) data time 0.0010 (0.0016) model time 0.2555 (0.2562) loss 6.4610 (5.9958) grad_norm 2.8879 (inf) loss_scale 2048.0000 (3604.6372) mem 9655MB [2024-07-31 09:47:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][530/625] eta 0:00:24 lr 0.001380 wd 0.0500 time 0.2538 (0.2584) data time 0.0008 (0.0016) model time 0.2530 (0.2562) loss 5.2503 (5.9965) grad_norm 1.3277 (inf) loss_scale 2048.0000 (3575.3220) mem 9655MB [2024-07-31 09:47:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][540/625] eta 0:00:21 lr 0.001380 wd 0.0500 time 0.2563 (0.2584) data time 0.0007 (0.0016) model time 0.2556 (0.2562) loss 4.7991 (5.9988) grad_norm 1.0959 (inf) loss_scale 2048.0000 (3547.0906) mem 9655MB [2024-07-31 09:47:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][550/625] eta 0:00:19 lr 0.001380 wd 0.0500 time 0.2563 (0.2583) data time 0.0010 (0.0016) model time 0.2553 (0.2561) loss 6.0940 (5.9937) grad_norm 3.1041 (inf) loss_scale 2048.0000 (3519.8838) mem 9655MB [2024-07-31 09:47:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][560/625] eta 0:00:16 lr 0.001380 wd 0.0500 time 0.2564 (0.2583) data time 0.0008 (0.0016) model time 0.2556 (0.2562) loss 6.1263 (5.9899) grad_norm 1.2891 (inf) loss_scale 2048.0000 (3493.6471) mem 9655MB [2024-07-31 09:47:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][570/625] eta 0:00:14 lr 0.001379 wd 0.0500 time 0.2566 (0.2583) data time 0.0007 (0.0016) model time 0.2558 (0.2561) loss 5.5859 (5.9874) grad_norm 1.5249 (inf) loss_scale 2048.0000 (3468.3292) mem 9655MB [2024-07-31 09:47:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][580/625] eta 0:00:11 lr 0.001379 wd 0.0500 time 0.2556 (0.2582) data time 0.0008 (0.0016) model time 0.2547 (0.2561) loss 
6.1769 (5.9907) grad_norm 1.8043 (inf) loss_scale 2048.0000 (3443.8830) mem 9655MB [2024-07-31 09:47:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][590/625] eta 0:00:09 lr 0.001379 wd 0.0500 time 0.2543 (0.2582) data time 0.0007 (0.0016) model time 0.2535 (0.2561) loss 6.0918 (6.0010) grad_norm 2.5099 (inf) loss_scale 2048.0000 (3420.2640) mem 9655MB [2024-07-31 09:47:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][600/625] eta 0:00:06 lr 0.001379 wd 0.0500 time 0.2533 (0.2582) data time 0.0010 (0.0015) model time 0.2523 (0.2561) loss 5.2308 (6.0033) grad_norm 1.2616 (inf) loss_scale 2048.0000 (3397.4309) mem 9655MB [2024-07-31 09:47:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][610/625] eta 0:00:03 lr 0.001379 wd 0.0500 time 0.2517 (0.2582) data time 0.0006 (0.0015) model time 0.2511 (0.2561) loss 4.2348 (6.0060) grad_norm 1.3569 (inf) loss_scale 2048.0000 (3375.3453) mem 9655MB [2024-07-31 09:47:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][620/625] eta 0:00:01 lr 0.001379 wd 0.0500 time 0.2532 (0.2584) data time 0.0004 (0.0015) model time 0.2529 (0.2563) loss 7.1353 (6.0115) grad_norm 1.3784 (inf) loss_scale 2048.0000 (3353.9710) mem 9655MB [2024-07-31 09:47:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 125 training takes 0:02:41 [2024-07-31 09:47:37 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-31 09:47:38 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-31 09:47:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.539 (0.539) Loss 0.6567 (0.6567) Acc@1 87.500 (87.500) Acc@5 97.949 (97.949) Mem 9655MB
[2024-07-31 09:47:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.100) Loss 1.0576 (0.8165) Acc@1 77.295 (83.225) Acc@5 93.994 (96.635) Mem 9655MB
[2024-07-31 09:47:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.2383 (0.9741) Acc@1 71.680 (79.236) Acc@5 92.334 (94.861) Mem 9655MB
[2024-07-31 09:47:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.917 Acc@5 94.858
[2024-07-31 09:47:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.9%
[2024-07-31 09:47:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.715 (0.715) Loss 0.5820 (0.5820) Acc@1 88.477 (88.477) Acc@5 98.584 (98.584) Mem 9655MB
[2024-07-31 09:47:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.123) Loss 0.9355 (0.7231) Acc@1 78.467 (84.792) Acc@5 94.971 (97.248) Mem 9655MB
[2024-07-31 09:47:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.091) Loss 1.0811 (0.8605) Acc@1 73.730 (81.104) Acc@5 93.701 (95.687) Mem 9655MB
[2024-07-31 09:47:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.742 Acc@5 95.663
[2024-07-31 09:47:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.7%
[2024-07-31 09:47:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.74%
[2024-07-31 09:47:42 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-31 09:47:43 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-31 09:47:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][0/625] eta 0:07:43 lr 0.001379 wd 0.0500 time 0.7420 (0.7420) data time 0.5063 (0.5063) model time 0.0000 (0.0000) loss 6.8164 (6.8164) grad_norm 2.4077 (2.4077) loss_scale 2048.0000 (2048.0000) mem 9654MB
[2024-07-31 09:47:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][10/625] eta 0:03:04 lr 0.001378 wd 0.0500 time 0.2548 (0.3004) data time 0.0009 (0.0468) model time 0.0000 (0.0000) loss 5.8593 (5.8600) grad_norm 1.2990 (1.6673) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 09:47:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][20/625] eta 0:02:49 lr 0.001378 wd 0.0500 time 0.2598 (0.2803) data time 0.0009 (0.0254) model time 0.0000 (0.0000) loss 6.1931 (5.8411) grad_norm 2.0693 (1.9928) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 09:47:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-31 09:47:50 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-31 09:47:51 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-31 09:52:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-31 09:52:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-31 10:00:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-31 10:00:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-31 10:00:50 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-31 10:01:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-31 10:01:06 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-31 10:01:06 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-31 10:01:06 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-31 10:01:06 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 126)
[2024-07-31 10:01:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-31 10:01:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][30/625] eta 0:30:24 lr 0.001378 wd 0.0500 time 0.2750 (3.0669) data time 0.0008 (0.2426) model time 0.0000 (0.0000) loss 5.5018 (6.3684) grad_norm 1.0560 (1.2454) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:01:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][40/625] eta 0:08:51 lr 0.001378 wd 0.0500 time 0.2561 (0.9082) data time 0.0009 (0.0571) model time 0.0000 (0.0000) loss 6.2645 (6.4134) grad_norm 4.1197 (1.9349) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:01:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][50/625] eta 0:06:00 lr 0.001378 wd 0.0500 time 0.2557 (0.6272) data time 0.0010 (0.0327) model time 0.0000 (0.0000) loss 7.1023 (6.4262) grad_norm 1.9188 (2.0530) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:01:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][60/625] eta 0:04:51 lr 0.001378 wd 0.0500 time 0.2582 (0.5156) data time 0.0008 (0.0231) model time 0.2574 (0.2580) loss 7.0994 (6.4020) grad_norm 1.7910 (2.0236) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:01:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][70/625] eta 0:04:12 lr 0.001377 wd 0.0500 time 0.2576 (0.4556) data time 0.0010 (0.0180) model time 0.2566 (0.2573) loss 6.0850 (6.3271) grad_norm 3.3949 (2.0873) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:01:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][80/625] eta 0:03:48 lr 0.001377 wd 0.0500 time 0.2571 (0.4185) data time 0.0010 (0.0148) model time 0.2561 (0.2574) loss 6.3176 (6.2942) grad_norm 2.1600 (2.0829) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:01:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][90/625] eta 0:03:30 lr 0.001377 wd 0.0500 time 0.2583 (0.3933) data time 0.0009 (0.0126) model time 0.2574 (0.2577) loss 6.1431 (6.2348) grad_norm 2.3976 (2.1460) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:01:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][100/625] eta 0:03:16 lr 0.001377 wd 0.0500 time 0.2576 (0.3750) data time 0.0010 (0.0110) model time 0.2566 (0.2581) loss 6.6444 (6.1806) grad_norm 1.9053 (2.1197) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:01:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][110/625] eta 0:03:06 lr 0.001377 wd 0.0500 time 0.2625 (0.3613) data time 0.0009 (0.0098) model time 0.2616 (0.2584) loss 4.5520 (6.1402) grad_norm 1.7973 (2.0413) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:01:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][120/625] eta 0:02:56 lr 0.001377 wd 0.0500 time 0.2557 (0.3504) data time 0.0015 (0.0088) model time 0.2542 (0.2585) loss 6.4871 (6.1208) grad_norm 5.6220 (2.0334) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:01:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][130/625] eta 0:02:49 lr 0.001376 wd 0.0500 time 0.2725 (0.3416) data time 0.0007 (0.0081) model time 0.2718 (0.2586) loss 6.4027 (6.1650) grad_norm 1.5445 (2.0362) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:01:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][140/625] eta 0:02:42 lr 0.001376 wd 0.0500 time 0.2575 (0.3345) data time 0.0010 (0.0075) model time 0.2565 (0.2586) loss 5.5648 (6.1419) grad_norm 1.4206 (1.9931) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:01:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][150/625] eta 0:02:36 lr 0.001376 wd 0.0500 time 0.2582 (0.3284) data time 0.0012 (0.0069) model time 0.2571 (0.2587) loss 5.9789 (6.1409) grad_norm 1.4645 (1.9978) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:01:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][160/625] eta 0:02:30 lr 0.001376 wd 0.0500 time 0.2616 (0.3233) data time 0.0010 (0.0065) model time 0.2606 (0.2588) loss 6.3589 (6.1321) grad_norm 1.4681 (1.9735) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:01:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][170/625] eta 0:02:25 lr 0.001376 wd 0.0500 time 0.2585 (0.3188) data time 0.0009 (0.0061) model time 0.2576 (0.2587) loss 7.1339 (6.1179) grad_norm 2.5110 (1.9658) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:01:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][180/625] eta 0:02:20 lr 0.001376 wd 0.0500 time 0.2599 (0.3150) data time 0.0007 (0.0058) model time 0.2591 (0.2588) loss 6.1848 (6.1155) grad_norm 2.4789 (1.9793) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:02:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][190/625] eta 0:02:15 lr 0.001375 wd 0.0500 time 0.2592 (0.3118) data time 0.0010 (0.0055) model time 0.2582 (0.2590) loss 5.1992 (6.1081) grad_norm 1.3009 (1.9957) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:02:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][200/625] eta 0:02:11 lr 0.001375 wd 0.0500 time 0.2642 (0.3091) data time 0.0011 (0.0052) model time 0.2631 (0.2593) loss 7.0999 (6.1091) grad_norm 1.3825 (1.9813) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:02:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][210/625] eta 0:02:07 lr 0.001375 wd 0.0500 time 0.2604 (0.3064) data time 0.0012 (0.0050) model time 0.2592 (0.2593) loss 6.3741 (6.0934) grad_norm 1.8648 (1.9796) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:02:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][220/625] eta 0:02:03 lr 0.001375 wd 0.0500 time 0.2545 (0.3041) data time 0.0009 (0.0048) model time 0.2536 (0.2594) loss 5.1707 (6.0807) grad_norm 1.4581 (1.9567) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:02:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][230/625] eta 0:01:59 lr 0.001375 wd 0.0500 time 0.2659 (0.3020) data time 0.0008 (0.0046) model time 0.2651 (0.2595) loss 5.1382 (6.0626) grad_norm 1.7015 (1.9387) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:02:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][240/625] eta 0:01:55 lr 0.001375 wd 0.0500 time 0.2663 (0.3002) data time 0.0010 (0.0044) model time 0.2653 (0.2596) loss 5.3824 (6.0597) grad_norm 1.6864 (1.9555) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:02:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][250/625] eta 0:01:51 lr 0.001374 wd 0.0500 time 0.2607 (0.2984) data time 0.0007 (0.0043) model time 0.2600 (0.2596) loss 5.0157 (6.0530) grad_norm 2.9222 (1.9533) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:02:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][260/625] eta 0:01:48 lr 0.001374 wd 0.0500 time 0.2625 (0.2970) data time 0.0009 (0.0041) model time 0.2617 (0.2598) loss 5.0827 (6.0422) grad_norm 1.8286 (1.9575) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:02:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][270/625] eta 0:01:44 lr 0.001374 wd 0.0500 time 0.2700 (0.2956) data time 0.0009 (0.0040) model time 0.2691 (0.2599) loss 7.0355 (6.0478) grad_norm 1.5670 (1.9651) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:02:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][280/625] eta 0:01:41 lr 0.001374 wd 0.0500 time 0.2633 (0.2943) data time 0.0009 (0.0039) model time 0.2624 (0.2600) loss 6.7587 (6.0356) grad_norm 1.4719 (1.9746) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:02:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][290/625] eta 0:01:38 lr 0.001374 wd 0.0500 time 0.2616 (0.2930) data time 0.0008 (0.0038) model time 0.2608 (0.2600) loss 5.7116 (6.0233) grad_norm 2.3383 (1.9771) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:02:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][300/625] eta 0:01:34 lr 0.001374 wd 0.0500 time 0.2598 (0.2919) data time 0.0010 (0.0037) model time 0.2587 (0.2601) loss 6.3009 (6.0174) grad_norm 3.1401 (1.9984) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:02:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][310/625] eta 0:01:31 lr 0.001373 wd 0.0500 time 0.2629 (0.2908) data time 0.0009 (0.0036) model time 0.2620 (0.2601) loss 6.5711 (6.0183) grad_norm 3.1164 (2.0180) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:02:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][320/625] eta 0:01:28 lr 0.001373 wd 0.0500 time 0.2566 (0.2898) data time 0.0012 (0.0035) model time 0.2554 (0.2600) loss 5.0118 (6.0105) grad_norm 1.3400 (2.0125) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:02:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][330/625] eta 0:01:25 lr 0.001373 wd 0.0500 time 0.2673 (0.2889) data time 0.0008 (0.0034) model time 0.2666 (0.2601) loss 5.8840 (5.9936) grad_norm 1.3283 (2.0056) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:02:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][340/625] eta 0:01:22 lr 0.001373 wd 0.0500 time 0.2602 (0.2880) data time 0.0008 (0.0033) model time 0.2593 (0.2601) loss 7.1368 (5.9992) grad_norm 1.7267 (1.9927) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:02:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][350/625] eta 0:01:18 lr 0.001373 wd 0.0500 time 0.2596 (0.2872) data time 0.0011 (0.0033) model time 0.2585 (0.2602) loss 6.5271 (6.0126) grad_norm 1.7313 (1.9806) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:02:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][360/625] eta 0:01:15 lr 0.001373 wd 0.0500 time 0.2623 (0.2865) data time 0.0010 (0.0032) model time 0.2613 (0.2602) loss 6.4166 (6.0141) grad_norm 1.1037 (1.9707) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:02:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][370/625] eta 0:01:12 lr 0.001372 wd 0.0500 time 0.2644 (0.2859) data time 0.0009 (0.0031) model time 0.2634 (0.2604) loss 6.6287 (6.0206) grad_norm 1.2751 (1.9594) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:02:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][380/625] eta 0:01:09 lr 0.001372 wd 0.0500 time 0.2572 (0.2853) data time 0.0009 (0.0031) model time 0.2563 (0.2604) loss 6.6397 (6.0218) grad_norm 1.8471 (1.9563) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:02:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][390/625] eta 0:01:06 lr 0.001372 wd 0.0500 time 0.2632 (0.2846) data time 0.0008 (0.0030) model time 0.2624 (0.2604) loss 5.8566 (6.0220) grad_norm 3.9346 (1.9797) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:02:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][400/625] eta 0:01:03 lr 0.001372 wd 0.0500 time 0.2646 (0.2840) data time 0.0007 (0.0030) model time 0.2638 (0.2605) loss 7.1226 (6.0161) grad_norm 1.6769 (1.9916) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:02:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][410/625] eta 0:01:01 lr 0.001372 wd 0.0500 time 0.2643 (0.2839) data time 0.0010 (0.0029) model time 0.2632 (0.2610) loss 5.0633 (6.0168) grad_norm 1.5746 (1.9811) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:03:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][420/625] eta 0:00:58 lr 0.001372 wd 0.0500 time 0.2583 (0.2835) data time 0.0011 (0.0029) model time 0.2572 (0.2611) loss 7.0515 (6.0116) grad_norm 1.2835 (1.9739) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:03:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][430/625] eta 0:00:55 lr 0.001371 wd 0.0500 time 0.2578 (0.2831) data time 0.0018 (0.0028) model time 0.2560 (0.2612) loss 4.9021 (6.0181) grad_norm 1.4255 (1.9651) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:03:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][440/625] eta 0:00:52 lr 0.001371 wd 0.0500 time 0.2565 (0.2826) data time 0.0012 (0.0028) model time 0.2553 (0.2612) loss 6.8799 (6.0236) grad_norm 1.1232 (1.9561) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:03:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][450/625] eta 0:00:49 lr 0.001371 wd 0.0500 time 0.2591 (0.2827) data time 0.0009 (0.0028) model time 0.2582 (0.2619) loss 5.2723 (6.0199) grad_norm 4.1189 (1.9557) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:03:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][460/625] eta 0:00:46 lr 0.001371 wd 0.0500 time 0.2678 (0.2823) data time 0.0007 (0.0027) model time 0.2672 (0.2619) loss 7.6630 (6.0330) grad_norm 2.4521 (1.9523) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:03:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][470/625] eta 0:00:43 lr 0.001371 wd 0.0500 time 0.2820 (0.2823) data time 0.0007 (0.0027) model time 0.2813 (0.2624) loss 5.8724 (6.0365) grad_norm 1.3677 (1.9447) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:03:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][480/625] eta 0:00:40 lr 0.001371 wd 0.0500 time 0.2576 (0.2819) data time 0.0009 (0.0027) model time 0.2567 (0.2624) loss 4.8268 (6.0308) grad_norm 1.9316 (1.9434) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:03:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][490/625] eta 0:00:38 lr 0.001370 wd 0.0500 time 0.2589 (0.2815) data time 0.0010 (0.0026) model time 0.2579 (0.2624) loss 5.6438 (6.0291) grad_norm 2.4848 (1.9469) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:03:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][500/625] eta 0:00:35 lr 0.001370 wd 0.0500 time 0.2579 (0.2811) data time 0.0008 (0.0026) model time 0.2571 (0.2623) loss 4.8833 (6.0179) grad_norm 2.0139 (1.9462) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:03:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][510/625] eta 0:00:32 lr 0.001370 wd 0.0500 time 0.2704 (0.2807) data time 0.0008 (0.0026) model time 0.2696 (0.2623) loss 7.4205 (6.0193) grad_norm 1.5684 (1.9390) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:03:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][520/625] eta 0:00:29 lr 0.001370 wd 0.0500 time 0.2590 (0.2803) data time 0.0008 (0.0025) model time 0.2582 (0.2623) loss 5.1143 (6.0200) grad_norm 2.3631 (1.9444) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:03:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][530/625] eta 0:00:26 lr 0.001370 wd 0.0500 time 0.2639 (0.2800) data time 0.0007 (0.0025) model time 0.2633 (0.2623) loss 4.9546 (6.0146) grad_norm 1.5163 (1.9461) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:03:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][540/625] eta 0:00:23 lr 0.001370 wd 0.0500 time 0.2617 (0.2797) data time 0.0008 (0.0025) model time 0.2609 (0.2623) loss 6.5102 (6.0245) grad_norm 1.5366 (1.9456) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:03:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][550/625] eta 0:00:20 lr 0.001369 wd 0.0500 time 0.2622 (0.2794) data time 0.0009 (0.0024) model time 0.2612 (0.2623) loss 6.0165 (6.0186) grad_norm 2.8223 (1.9414) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:03:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][560/625] eta 0:00:18 lr 0.001369 wd 0.0500 time 0.2582 (0.2790) data time 0.0008 (0.0024) model time 0.2574 (0.2623) loss 6.4666 (6.0182) grad_norm 1.0445 (1.9367) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:03:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][570/625] eta 0:00:15 lr 0.001369 wd 0.0500 time 0.2605 (0.2787) data time 0.0009 (0.0024) model time 0.2596 (0.2622) loss 5.5571 (6.0147) grad_norm 1.9947 (1.9403) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:03:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][580/625] eta 0:00:12 lr 0.001369 wd 0.0500 time 0.2622 (0.2784) data time 0.0010 (0.0024) model time 0.2612 (0.2622) loss 6.7822 (6.0187) grad_norm 1.3254 (1.9314) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:03:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-31 10:03:47 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-31 10:03:48 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-31 10:08:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-31 10:08:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-31 10:08:35 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-31 10:08:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-31 10:08:47 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-31 10:08:47 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-31 10:08:47 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-31 10:08:47 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 126)
[2024-07-31 10:08:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-31 10:08:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][590/625] eta 0:01:32 lr 0.001369 wd 0.0500 time 0.2567 (2.6491) data time 0.0006 (0.2766) model time 0.2561 (2.3725) loss 5.6051 (6.7367) grad_norm 1.6042 (2.4455) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:09:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][600/625] eta 0:00:20 lr 0.001369 wd 0.0500 time 0.2720 (0.8079) data time 0.0010 (0.0646) model time 0.2710 (0.7433) loss 6.6078 (6.5435) grad_norm 1.2224 (1.8739) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:09:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][610/625] eta 0:00:08 lr 0.001368 wd 0.0500 time 0.2502 (0.5667) data time 0.0004 (0.0370) model time 0.2498 (0.5297) loss 7.1326 (6.5478) grad_norm 2.5590 (1.8943) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:09:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][620/625] eta 0:00:02 lr 0.001368 wd 0.0500 time 0.2495 (0.4722) data time 0.0004 (0.0261) model time 0.2491 (0.4461) loss 6.9944 (6.4692) grad_norm 2.3321 (1.8383) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:09:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 126 training takes 0:00:16
[2024-07-31 10:09:08 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-31 10:09:10 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-31 10:09:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.395 (0.395) Loss 0.6455 (0.6455) Acc@1 88.135 (88.135) Acc@5 97.852 (97.852) Mem 9656MB
[2024-07-31 10:09:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.092) Loss 1.0605 (0.8018) Acc@1 77.197 (83.598) Acc@5 93.945 (96.804) Mem 9656MB
[2024-07-31 10:09:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.074) Loss 1.1973 (0.9530) Acc@1 72.168 (79.781) Acc@5 92.871 (95.020) Mem 9656MB
[2024-07-31 10:09:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.563 Acc@5 94.996
[2024-07-31 10:09:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.6%
[2024-07-31 10:09:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.56%
[2024-07-31 10:09:13 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-07-31 10:09:14 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-07-31 10:09:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.394 (0.394) Loss 0.5815 (0.5815) Acc@1 88.428 (88.428) Acc@5 98.584 (98.584) Mem 9656MB
[2024-07-31 10:09:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.087) Loss 0.9355 (0.7231) Acc@1 78.711 (84.801) Acc@5 94.971 (97.257) Mem 9656MB
[2024-07-31 10:09:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.071) Loss 1.0820 (0.8605) Acc@1 73.926 (81.108) Acc@5 93.652 (95.701) Mem 9656MB
[2024-07-31 10:09:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.756 Acc@5 95.673
[2024-07-31 10:09:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.8%
[2024-07-31 10:09:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.76%
[2024-07-31 10:09:16 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-31 10:09:19 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-31 10:09:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][0/625] eta 0:07:11 lr 0.001368 wd 0.0500 time 0.6905 (0.6905) data time 0.3905 (0.3905) model time 0.0000 (0.0000) loss 6.0316 (6.0316) grad_norm 1.4801 (1.4801) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:09:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][10/625] eta 0:03:00 lr 0.001368 wd 0.0500 time 0.2570 (0.2934) data time 0.0007 (0.0364) model time 0.0000 (0.0000) loss 5.9562 (6.1375) grad_norm 2.2810 (2.2524) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:09:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][20/625] eta 0:02:47 lr 0.001368 wd 0.0500 time 0.2541 (0.2774) data time 0.0010 (0.0195) model time 0.0000 (0.0000) loss 5.8497 (6.0651) grad_norm 4.2250 (2.2167) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:09:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][30/625] eta 0:02:42 lr 0.001368 wd 0.0500 time 0.2769 (0.2728) data time 0.0006 (0.0135) model time 0.0000 (0.0000) loss 4.5834 (5.9666) grad_norm 1.8970 (2.3726) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:09:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][40/625] eta 0:02:36 lr 0.001368 wd 0.0500 time 0.2556 (0.2682) data time 0.0007 (0.0105) model time 0.0000 (0.0000) loss 5.3771 (5.9084) grad_norm 3.3178 (2.3216) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:09:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][50/625] eta 0:02:32 lr 0.001367 wd 0.0500 time 0.2572 (0.2656) data time 0.0008 (0.0086) model time 0.0000 (0.0000) loss 6.2173 (5.8746) grad_norm 1.6482 (2.2338) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:09:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][60/625] eta 0:02:29 lr 0.001367 wd 0.0500 time 0.2676 (0.2647) data time 0.0009 (0.0074) model time 0.2666 (0.2587) loss 7.1194 (5.9666) grad_norm 2.4797 (2.1750) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:09:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][70/625] eta 0:02:26 lr 0.001367 wd 0.0500 time 0.2563 (0.2634) data time 0.0007 (0.0065) model time 0.2556 (0.2567) loss 4.7078 (6.0285) grad_norm 1.4045 (2.1162) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:09:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][80/625] eta 0:02:22 lr 0.001367 wd 0.0500 time 0.2523 (0.2623) data time 0.0017 (0.0058) model time 0.2506 (0.2556) loss 6.0427 (6.0352) grad_norm 1.8712 (2.0430) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:09:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][90/625] eta 0:02:19 lr 0.001367 wd 0.0500 time 0.2628 (0.2615) data time 0.0006 (0.0053) model time 0.2622 (0.2552) loss 5.8085 (6.0368) grad_norm 1.4628 (2.0679) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:09:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][100/625] eta 0:02:16 lr 0.001367 wd 0.0500 time 0.2543 (0.2608) data time 0.0011 (0.0048) model time 0.2532 (0.2549) loss 6.5576 (6.0465) grad_norm 2.9241 (2.0661) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:09:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][110/625] eta 0:02:14 lr 0.001366 wd 0.0500 time 0.2540 (0.2603) data time 0.0009 (0.0045) model time 0.2531 (0.2548) loss 6.2347 (6.0330) grad_norm 1.6007 (2.0387) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:09:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][120/625] eta 0:02:11 lr 0.001366 wd 0.0500 time 0.2559 (0.2601) data time 0.0007 (0.0042) model time 0.2552 (0.2551) loss 5.3087 (6.0301) grad_norm 1.7284 (2.0336) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:09:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][130/625] eta 0:02:08 lr 0.001366 wd 0.0500 time 0.2620 (0.2598) data time 0.0010 (0.0040) model time 0.2610 (0.2550) loss 6.3894 (6.0434) grad_norm 1.2375 (2.0258) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:09:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][140/625] eta 0:02:05 lr 0.001366 wd 0.0500 time 0.2549 (0.2595) data time 0.0007 (0.0037) model time 0.2542 (0.2550) loss 5.2674 (6.0095) grad_norm 1.3696 (2.0080) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:09:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][150/625] eta 0:02:03 lr 0.001366 wd 0.0500 time 0.2658 (0.2593) data time 0.0007 (0.0036) model time 0.2652 (0.2551) loss 7.1720 (6.0185) grad_norm 1.7424 (1.9950) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:10:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][160/625] eta 0:02:00 lr 0.001366 wd 0.0500 time 0.2577 (0.2591) data time 0.0012 (0.0034) model time 0.2565 (0.2551) loss 4.7813 (6.0031) grad_norm 1.3685 (1.9816) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:10:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][170/625] eta 0:01:57 lr 0.001365 wd 0.0500 time 0.2569 (0.2590) data time 0.0011 (0.0033) model time 0.2558 (0.2553) loss 6.1854 (5.9883) grad_norm 1.2051 (1.9688) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:10:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][180/625] eta 0:01:55 lr 0.001365 wd 0.0500 time 0.2544 (0.2588) data time 0.0010 (0.0031) model time 0.2534 (0.2552) loss 5.6440 (5.9701) grad_norm 1.4767 (1.9412) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:10:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][190/625] eta 0:01:52 lr 0.001365 wd 0.0500 time 0.2554 (0.2587) data time 0.0009 (0.0030) model time 0.2545 (0.2551) loss 6.4506 (5.9795) grad_norm 1.6687 (1.9272) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:10:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][200/625] eta 0:01:49 lr 0.001365 wd 0.0500 time 0.2584 (0.2586) data time 0.0007 (0.0029) model time 0.2577 (0.2552) loss 6.0837 (5.9668) grad_norm 1.8814 (1.9435) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:10:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][210/625] eta 0:01:47 lr 0.001365 wd 0.0500 time 0.2628 (0.2584) data time 0.0010 (0.0028) model time 0.2619 (0.2552) loss 4.8345 (5.9712) grad_norm 1.3260 (1.9312) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:10:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][220/625] eta 0:01:44 lr 0.001365 wd 0.0500 time 0.2581 (0.2584) data time 0.0008 (0.0027) model time 0.2573 (0.2552) loss 6.0709 (5.9556) grad_norm 2.2620 (1.9258) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:10:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][230/625] eta 0:01:42 lr 0.001364 wd 0.0500 time 0.2603 (0.2584) data time 0.0006 (0.0027) model time 0.2597 (0.2554) loss 6.4266 (5.9479) grad_norm 1.5143 (1.9414) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:10:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][240/625] eta 0:01:39 lr 0.001364 wd 0.0500 time 0.2483 (0.2583) data time 0.0011 (0.0026) model time 0.2472 (0.2554) loss 5.1540 (5.9543) grad_norm 3.1652 (1.9445) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:10:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][250/625] eta 0:01:36 lr 0.001364 wd 0.0500 time 0.2539 (0.2583) data time 0.0008 (0.0025) model time 0.2531 (0.2555) loss 7.0208 (5.9658) grad_norm 1.8343 (1.9505) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:10:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-31 10:10:25 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-31 10:10:25 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-31 10:12:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-31 10:12:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-31 10:12:30 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-31 10:12:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-31 10:12:45 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-31 10:12:45 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-31 10:12:45 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-31 10:12:45 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 127)
[2024-07-31 10:12:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-31 10:13:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][260/625] eta 0:08:41 lr 0.001364 wd 0.0500 time 0.2643 (1.4296) data time 0.0007 (0.1280) model time 0.2636 (1.3017) loss 6.2911 (6.3461) grad_norm 1.6809 (2.2574) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:13:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][270/625] eta 0:04:37 lr 0.001364 wd 0.0500 time 0.2548 (0.7808) data time 0.0011 (0.0575) model time 0.2537 (0.7233) loss 6.5639 (6.3063) grad_norm 1.6891 (2.0550) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:13:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][280/625] eta 0:03:25 lr 0.001364 wd 0.0500 time 0.2588 (0.5952) data time 0.0010 (0.0373) model time 0.2579 (0.5578) loss 6.6621 (6.3564) grad_norm 1.2750 (1.8875) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:13:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][290/625] eta 0:02:49 lr 0.001363 wd 0.0500 time 0.2641 (0.5072) data time 0.0011 (0.0278) model time 0.2630 (0.4794) loss 6.2445 (6.2662) grad_norm 1.4943 (1.7834) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:13:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][300/625] eta 0:02:28 lr 0.001363 wd 0.0500 time 0.2669 (0.4561) data time 0.0011 (0.0223) model time 0.2658 (0.4339) loss 6.6252 (6.2029) grad_norm 2.1513 (1.8181) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:13:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][310/625] eta 0:02:13 lr 0.001363 wd 0.0500 time 0.2629 (0.4227) data time 0.0010 (0.0186) model time 0.2619 (0.4041) loss 5.2537 (6.1730) grad_norm 1.8726 (1.8025) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:13:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][320/625] eta 0:02:01 lr 0.001363 wd 0.0500 time 0.2633 (0.3991) data time 0.0008 (0.0161) model time 0.2625 (0.3831) loss 5.1278 (6.1346) grad_norm 1.6032 (1.8303) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:13:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][330/625] eta 0:01:52 lr 0.001363 wd 0.0500 time 0.2599 (0.3812) data time 0.0008 (0.0141) model time 0.2591 (0.3671) loss 4.8274 (6.1066) grad_norm 1.1907 (1.7995) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:13:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][340/625] eta 0:01:44 lr 0.001363 wd 0.0500 time 0.2561 (0.3676) data time 0.0013 (0.0127) model time 0.2548 (0.3550) loss 5.9881 (6.0698) grad_norm 1.4076 (1.8231) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:13:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][350/625] eta 0:01:38 lr 0.001362 wd 0.0500 time 0.2594 (0.3569) data time 0.0008 (0.0115) model time 0.2587 (0.3454) loss 6.5728 (6.0713) grad_norm 1.3270 (1.8137) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:13:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][360/625] eta 0:01:32 lr 0.001362 wd 0.0500 time 0.2618 (0.3481) data time 0.0009 (0.0105) model time 0.2609 (0.3376) loss 5.6931 (6.0871) grad_norm 1.7633 (1.8126) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:13:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][370/625] eta 0:01:26 lr 0.001362 wd 0.0500 time 0.2590 (0.3407) data time 0.0031 (0.0097) model time 0.2559 (0.3310) loss 5.9993 (6.0913) grad_norm 1.4658 (1.7850) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:13:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][380/625] eta 0:01:22 lr 0.001362 wd 0.0500 time 0.2545 (0.3351) data time 0.0009 (0.0090) model time 0.2536 (0.3260) loss 5.5602 (6.0703) grad_norm 1.2658 (1.7897) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:13:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][390/625] eta 0:01:17 lr 0.001362 wd 0.0500 time 0.2597 (0.3297) data time 0.0010 (0.0085) model time 0.2588 (0.3212) loss 5.6396 (6.0717) grad_norm 1.4276 (1.7748) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:13:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-31 10:13:37 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-31 10:13:39 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-31 10:18:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-31 10:18:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-31 10:18:18 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-31 10:18:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-31 10:18:29 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-31 10:18:29 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-31 10:18:29 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-31 10:18:30 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 127)
[2024-07-31 10:18:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-31 10:18:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][400/625] eta 0:27:25 lr 0.001362 wd 0.0500 time 7.3118 (7.3118) data time 0.6254 (0.6254) model time 6.6863 (6.6863) loss 7.1012 (7.1012) grad_norm 1.9183 (1.9183) loss_scale 2048.0000 (2048.0000) mem 10976MB
[2024-07-31 10:18:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][410/625] eta 0:03:25 lr 0.001361 wd 0.0500 time 0.2721 (0.9581) data time 0.0009 (0.0579) model time 0.2711 (0.9002) loss 5.8229 (6.5201) grad_norm 2.2040 (1.8511) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:18:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][420/625] eta 0:02:08 lr 0.001361 wd 0.0500 time 0.2581 (0.6274) data time 0.0010 (0.0309) model time 0.2571 (0.5964) loss 5.8847 (6.3308) grad_norm 1.3615 (1.9040) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:18:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][430/625] eta 0:01:39 lr 0.001361 wd 0.0500 time 0.2670 (0.5101) data time 0.0008 (0.0213) model time 0.2662 (0.4888) loss 4.7340 (6.3644) grad_norm 2.7878 (2.0109) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:18:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][440/625] eta 0:01:23 lr 0.001361 wd 0.0500 time 0.2651 (0.4496) data time 0.0011 (0.0164) model time 0.2639 (0.4332) loss 5.7355 (6.2331) grad_norm 1.7787 (2.0330) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:18:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][450/625] eta 0:01:12 lr 0.001361 wd 0.0500 time 0.2729 (0.4135) data time 0.0007 (0.0134) model time 0.2722 (0.4000) loss 6.6076 (6.2334) grad_norm 1.9344 (2.0057) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:18:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][460/625] eta 0:01:04 lr 0.001361 wd 0.0500 time 0.2601 (0.3891) data time 0.0010 (0.0114) model time 0.2591 (0.3777) loss 5.2412 (6.1726) grad_norm 1.1588 (1.9491) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:19:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][470/625] eta 0:00:57 lr 0.001360 wd 0.0500 time 0.2689 (0.3715) data time 0.0010 (0.0100) model time 0.2678 (0.3615) loss 5.5965 (6.1299) grad_norm 1.5193 (1.9060) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:19:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][480/625] eta 0:00:51 lr 0.001360 wd 0.0500 time 0.2603 (0.3579) data time 0.0010 (0.0089) model time 0.2593 (0.3490) loss 5.4695 (6.1184) grad_norm 4.5402 (1.9271) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:19:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][490/625] eta 0:00:46 lr 0.001360 wd 0.0500 time 0.2619 (0.3474) data time 0.0007 (0.0081) model time 0.2612 (0.3393) loss 6.5643 (6.1108) grad_norm 2.8501 (1.9747) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:19:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][500/625] eta 0:00:42 lr 0.001360 wd 0.0500 time 0.2544 (0.3390) data time 0.0010 (0.0074) model time 0.2534 (0.3316) loss 5.9179 (6.1272) grad_norm 1.1962 (1.9356) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:19:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][510/625] eta 0:00:38 lr 0.001360 wd 0.0500 time 0.2618 (0.3321) data time 0.0011 (0.0068) model time 0.2607 (0.3253) loss 4.9567 (6.1171) grad_norm 3.1781 (1.9271) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:19:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][520/625] eta 0:00:34 lr 0.001360 wd 0.0500 time 0.2605 (0.3266) data time 0.0009 (0.0064) model time 0.2596 (0.3202) loss 4.3500 (6.1210) grad_norm 1.9875 (1.9149) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:19:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][530/625] eta 0:00:30 lr 0.001359 wd 0.0500 time 0.2816 (0.3219) data time 0.0013 (0.0060) model time 0.2802 (0.3160) loss 6.8878 (6.1240) grad_norm 2.4632 (1.9255) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:19:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][540/625] eta 0:00:27 lr 0.001359 wd 0.0500 time 0.2585 (0.3178) data time 0.0011 (0.0056) model time 0.2574 (0.3122) loss 5.7425 (6.1232) grad_norm 1.6143 (1.9085) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:19:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][550/625] eta 0:00:23 lr 0.001359 wd 0.0500 time 0.2624 (0.3143) data time 0.0013 (0.0053) model time 0.2610 (0.3089) loss 5.3941 (6.1097) grad_norm 1.4104 (1.8886) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:19:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][560/625] eta 0:00:20 lr 0.001359 wd 0.0500 time 0.2598 (0.3110) data time 0.0010 (0.0051) model time 0.2588 (0.3060) loss 6.9304 (6.1202) grad_norm 2.0190 (1.8707) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:19:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][570/625] eta 0:00:16 lr 0.001359 wd 0.0500 time 0.2592 (0.3083) data time 0.0010 (0.0048) model time 0.2582 (0.3034) loss 6.4739 (6.1214) grad_norm 1.2930 (1.8473) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:19:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][580/625] eta 0:00:13 lr 0.001359 wd 0.0500 time 0.2612 (0.3058) data time 0.0011 (0.0046) model time 0.2601 (0.3012) loss 7.1235 (6.1050) grad_norm 1.6317 (1.8621) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:19:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][590/625] eta 0:00:10 lr 0.001358 wd 0.0500 time 0.2707 (0.3037) data time 0.0010 (0.0045) model time 0.2697 (0.2992) loss 5.8781 (6.1097) grad_norm 1.1179 (1.8576) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:19:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][600/625] eta 0:00:07 lr 0.001358 wd 0.0500 time 0.2644 (0.3018) data time 0.0010 (0.0043) model time 0.2634 (0.2975) loss 5.2817 (6.0955) grad_norm 2.5143 (1.8994) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:19:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][610/625] eta 0:00:04 lr 0.001358 wd 0.0500 time 0.2663 (0.3000) data time 0.0007 (0.0042) model time 0.2656 (0.2958) loss 7.3833 (6.0861) grad_norm 1.1400 (1.8972) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:19:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][620/625] eta 0:00:01 lr 0.001358 wd 0.0500 time 0.2605 (0.2983) data time 0.0005 (0.0040) model time 0.2600 (0.2943) loss 6.1582 (6.0727) grad_norm 1.8976 (1.9058) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:19:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 127 training takes 0:01:06
[2024-07-31 10:19:42 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-31 10:19:44 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-31 10:19:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.445 (0.445) Loss 0.6470 (0.6470) Acc@1 87.695 (87.695) Acc@5 97.998 (97.998) Mem 9656MB
[2024-07-31 10:19:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.058 (0.096) Loss 1.0586 (0.8121) Acc@1 76.807 (83.576) Acc@5 94.238 (96.919) Mem 9656MB
[2024-07-31 10:19:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.076) Loss 1.1641 (0.9622) Acc@1 73.438 (79.657) Acc@5 92.969 (95.229) Mem 9656MB
[2024-07-31 10:19:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.435 Acc@5 95.210
[2024-07-31 10:19:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.4%
[2024-07-31 10:19:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.855 (0.855) Loss 0.5815 (0.5815) Acc@1 88.525 (88.525) Acc@5 98.584 (98.584) Mem 9656MB
[2024-07-31 10:19:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.137) Loss 0.9355 (0.7232) Acc@1 78.906 (84.823) Acc@5 95.068 (97.283) Mem 9656MB
[2024-07-31 10:19:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.098) Loss 1.0811 (0.8606) Acc@1 73.877 (81.136) Acc@5 93.701 (95.715) Mem 9656MB
[2024-07-31 10:19:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.802 Acc@5 95.681
[2024-07-31 10:19:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.8%
[2024-07-31 10:19:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.80%
[2024-07-31 10:19:50 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-31 10:19:52 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-31 10:19:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][0/625] eta 0:08:53 lr 0.001358 wd 0.0500 time 0.8534 (0.8534) data time 0.4767 (0.4767) model time 0.0000 (0.0000) loss 5.3577 (5.3577) grad_norm 1.7219 (1.7219) loss_scale 2048.0000 (2048.0000) mem 9651MB
[2024-07-31 10:19:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][10/625] eta 0:03:14 lr 0.001358 wd 0.0500 time 0.2730 (0.3170) data time 0.0008 (0.0442) model time 0.0000 (0.0000) loss 6.8147 (5.9498) grad_norm 1.2876 (1.6810) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:19:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][20/625] eta 0:02:56 lr 0.001358 wd 0.0500 time 0.2735 (0.2916) data time 0.0008 (0.0236) model time 0.0000 (0.0000) loss 5.0121 (5.8678) grad_norm 1.1532 (1.5522) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:20:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][30/625] eta 0:02:47 lr 0.001357 wd 0.0500 time 0.2597 (0.2822) data time 0.0014 (0.0163) model time 0.0000 (0.0000) loss 4.6569 (5.7926) grad_norm 2.6283 (1.6324) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:20:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][40/625] eta 0:02:42 lr 0.001357 wd 0.0500 time 0.2616 (0.2772) data time 0.0007 (0.0126) model time 0.0000 (0.0000) loss 5.2035 (5.7558) grad_norm 1.4162 (1.6483) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:20:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][50/625] eta 0:02:38 lr 0.001357 wd 0.0500 time 0.2585 (0.2748) data time 0.0014 (0.0104) model time 0.0000 (0.0000) loss 6.8945 (5.8436) grad_norm 1.4526 (1.6610) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:20:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][60/625] eta 0:02:34 lr 0.001357 wd 0.0500 time 0.2608 (0.2728) data time 0.0011 (0.0089) model time 0.2597 (0.2617) loss 5.3176 (5.8493) grad_norm 2.7014 (1.7566) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:20:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][70/625] eta 0:02:30 lr 0.001357 wd 0.0500 time 0.2573 (0.2713) data time 0.0012 (0.0078) model time 0.2561 (0.2614) loss 5.1768 (5.8557) grad_norm 1.8792 (1.8197) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:20:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][80/625] eta 0:02:27 lr 0.001357 wd 0.0500 time 0.2572 (0.2704) data time 0.0009 (0.0070) model time 0.2563 (0.2617) loss 4.6778 (5.8250) grad_norm 1.4163 (1.8115) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:20:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][90/625] eta 0:02:24 lr 0.001356 wd 0.0500 time 0.2622 (0.2697) data time 0.0007 (0.0064) model time 0.2615 (0.2619) loss 6.7943 (5.8905) grad_norm 2.0822 (1.8284) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:20:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][100/625] eta 0:02:21 lr 0.001356 wd 0.0500 time 0.2574 (0.2690) data time 0.0011 (0.0059) model time 0.2563 (0.2619) loss 7.0737 (5.9452) grad_norm 2.3382 (1.8384) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:20:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][110/625] eta 0:02:18 lr 0.001356 wd 0.0500 time 0.2594 (0.2685) data time 0.0012 
(0.0055) model time 0.2582 (0.2619) loss 5.9905 (5.9335) grad_norm 2.3293 (1.8321) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:20:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][120/625] eta 0:02:15 lr 0.001356 wd 0.0500 time 0.2612 (0.2682) data time 0.0011 (0.0051) model time 0.2601 (0.2620) loss 6.7382 (5.9476) grad_norm 1.7594 (1.8478) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:20:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][130/625] eta 0:02:12 lr 0.001356 wd 0.0500 time 0.2575 (0.2678) data time 0.0010 (0.0048) model time 0.2565 (0.2621) loss 4.7330 (5.9497) grad_norm 1.9457 (1.8686) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:20:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][140/625] eta 0:02:09 lr 0.001356 wd 0.0500 time 0.2729 (0.2675) data time 0.0010 (0.0045) model time 0.2719 (0.2622) loss 6.8853 (5.9644) grad_norm 1.1297 (1.8541) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:20:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][150/625] eta 0:02:07 lr 0.001355 wd 0.0500 time 0.2683 (0.2691) data time 0.0008 (0.0043) model time 0.2674 (0.2649) loss 4.9266 (5.9635) grad_norm 2.5145 (1.8490) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:20:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][160/625] eta 0:02:05 lr 0.001355 wd 0.0500 time 0.2670 (0.2690) data time 0.0009 (0.0044) model time 0.2661 (0.2648) loss 6.8303 (5.9498) grad_norm 1.4131 (1.8594) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:20:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][170/625] eta 0:02:02 lr 0.001355 wd 0.0500 time 0.2635 (0.2688) data time 0.0007 (0.0042) model time 0.2628 (0.2646) loss 5.9836 (5.9507) grad_norm 1.4618 (1.8633) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:20:41 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][180/625] eta 0:01:59 lr 0.001355 wd 0.0500 time 0.2640 (0.2685) data time 0.0014 (0.0040) model time 0.2625 (0.2644) loss 7.0833 (5.9677) grad_norm 1.2933 (1.8701) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:20:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][190/625] eta 0:01:56 lr 0.001355 wd 0.0500 time 0.2602 (0.2682) data time 0.0008 (0.0039) model time 0.2593 (0.2642) loss 5.6252 (5.9731) grad_norm 1.4208 (1.8483) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:20:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][200/625] eta 0:01:53 lr 0.001355 wd 0.0500 time 0.2707 (0.2680) data time 0.0015 (0.0037) model time 0.2692 (0.2642) loss 6.6678 (5.9862) grad_norm 1.6055 (1.8364) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:20:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][210/625] eta 0:01:51 lr 0.001354 wd 0.0500 time 0.2642 (0.2678) data time 0.0010 (0.0036) model time 0.2631 (0.2641) loss 5.0161 (6.0009) grad_norm 1.6126 (1.8219) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:20:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][220/625] eta 0:01:48 lr 0.001354 wd 0.0500 time 0.2613 (0.2676) data time 0.0008 (0.0035) model time 0.2606 (0.2640) loss 5.6274 (6.0005) grad_norm 2.3323 (1.8357) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:20:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][230/625] eta 0:01:45 lr 0.001354 wd 0.0500 time 0.2595 (0.2674) data time 0.0009 (0.0034) model time 0.2586 (0.2638) loss 5.6111 (5.9954) grad_norm 1.5580 (1.8399) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:20:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][240/625] eta 0:01:42 lr 0.001354 wd 0.0500 time 0.2752 (0.2673) data time 
0.0009 (0.0033) model time 0.2744 (0.2638) loss 4.5132 (5.9823) grad_norm 1.6429 (1.8363) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:20:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][250/625] eta 0:01:40 lr 0.001354 wd 0.0500 time 0.2637 (0.2670) data time 0.0007 (0.0032) model time 0.2630 (0.2637) loss 5.5621 (5.9635) grad_norm 1.2486 (1.8655) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:21:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][260/625] eta 0:01:37 lr 0.001354 wd 0.0500 time 0.2689 (0.2673) data time 0.0012 (0.0032) model time 0.2678 (0.2640) loss 7.0969 (5.9774) grad_norm 1.6442 (1.8709) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:21:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][270/625] eta 0:01:34 lr 0.001353 wd 0.0500 time 0.2646 (0.2673) data time 0.0008 (0.0031) model time 0.2637 (0.2641) loss 5.4589 (5.9734) grad_norm 2.1473 (1.8913) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:21:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][280/625] eta 0:01:32 lr 0.001353 wd 0.0500 time 0.2655 (0.2681) data time 0.0010 (0.0031) model time 0.2645 (0.2652) loss 5.5801 (5.9707) grad_norm 1.3506 (1.8954) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:21:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][290/625] eta 0:01:29 lr 0.001353 wd 0.0500 time 0.2664 (0.2680) data time 0.0011 (0.0030) model time 0.2653 (0.2651) loss 6.2183 (5.9903) grad_norm 1.7351 (1.9004) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:21:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][300/625] eta 0:01:27 lr 0.001353 wd 0.0500 time 0.2668 (0.2679) data time 0.0007 (0.0030) model time 0.2661 (0.2650) loss 4.8489 (5.9861) grad_norm 1.2131 (1.8937) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:21:15 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][310/625] eta 0:01:24 lr 0.001353 wd 0.0500 time 0.2621 (0.2677) data time 0.0011 (0.0029) model time 0.2610 (0.2649) loss 5.9023 (5.9820) grad_norm 1.1693 (1.8912) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:21:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][320/625] eta 0:01:21 lr 0.001353 wd 0.0500 time 0.2637 (0.2677) data time 0.0009 (0.0028) model time 0.2629 (0.2649) loss 5.8874 (5.9797) grad_norm 1.3704 (1.9204) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:21:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][330/625] eta 0:01:18 lr 0.001352 wd 0.0500 time 0.2635 (0.2676) data time 0.0010 (0.0028) model time 0.2625 (0.2648) loss 6.8315 (5.9917) grad_norm 3.0045 (1.9321) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:21:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][340/625] eta 0:01:16 lr 0.001352 wd 0.0500 time 0.2594 (0.2675) data time 0.0009 (0.0028) model time 0.2585 (0.2648) loss 4.8206 (5.9880) grad_norm 2.3137 (1.9315) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:21:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][350/625] eta 0:01:13 lr 0.001352 wd 0.0500 time 0.2644 (0.2674) data time 0.0009 (0.0027) model time 0.2635 (0.2647) loss 4.9259 (5.9848) grad_norm 2.2609 (1.9342) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:21:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][360/625] eta 0:01:10 lr 0.001352 wd 0.0500 time 0.2596 (0.2673) data time 0.0012 (0.0027) model time 0.2584 (0.2646) loss 6.3193 (5.9942) grad_norm 1.2431 (1.9275) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:21:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][370/625] eta 0:01:08 lr 0.001352 wd 0.0500 time 0.2701 (0.2672) data time 
0.0007 (0.0026) model time 0.2694 (0.2646) loss 5.0044 (5.9958) grad_norm 1.4952 (1.9209) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:21:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][380/625] eta 0:01:05 lr 0.001352 wd 0.0500 time 0.2630 (0.2672) data time 0.0008 (0.0026) model time 0.2622 (0.2646) loss 5.7215 (5.9952) grad_norm 1.5376 (1.9112) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:21:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][390/625] eta 0:01:02 lr 0.001351 wd 0.0500 time 0.2653 (0.2671) data time 0.0010 (0.0026) model time 0.2643 (0.2645) loss 5.8762 (6.0022) grad_norm 2.7685 (1.9126) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:21:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][400/625] eta 0:01:00 lr 0.001351 wd 0.0500 time 0.2616 (0.2670) data time 0.0009 (0.0025) model time 0.2606 (0.2645) loss 5.8437 (6.0171) grad_norm 2.1873 (1.9084) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:21:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][410/625] eta 0:00:57 lr 0.001351 wd 0.0500 time 0.2638 (0.2671) data time 0.0008 (0.0025) model time 0.2630 (0.2646) loss 5.8279 (6.0182) grad_norm 1.9152 (1.9027) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:21:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][420/625] eta 0:00:54 lr 0.001351 wd 0.0500 time 0.2657 (0.2671) data time 0.0010 (0.0025) model time 0.2647 (0.2646) loss 6.1361 (6.0158) grad_norm 2.2649 (1.9063) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:21:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][430/625] eta 0:00:52 lr 0.001351 wd 0.0500 time 0.2654 (0.2670) data time 0.0010 (0.0025) model time 0.2643 (0.2646) loss 4.6743 (6.0120) grad_norm 3.7522 (1.9267) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:21:50 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][440/625] eta 0:00:49 lr 0.001351 wd 0.0500 time 0.2603 (0.2670) data time 0.0009 (0.0024) model time 0.2593 (0.2646) loss 6.4881 (6.0044) grad_norm 1.7293 (1.9271) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:21:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][450/625] eta 0:00:46 lr 0.001350 wd 0.0500 time 0.2720 (0.2669) data time 0.0008 (0.0024) model time 0.2712 (0.2646) loss 5.3758 (6.0120) grad_norm 1.9998 (1.9285) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:21:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][460/625] eta 0:00:44 lr 0.001350 wd 0.0500 time 0.2591 (0.2668) data time 0.0009 (0.0024) model time 0.2582 (0.2645) loss 5.4364 (6.0073) grad_norm 1.0963 (1.9208) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:21:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][470/625] eta 0:00:41 lr 0.001350 wd 0.0500 time 0.2610 (0.2668) data time 0.0008 (0.0023) model time 0.2602 (0.2645) loss 5.4761 (6.0027) grad_norm 1.4020 (1.9150) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:22:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][480/625] eta 0:00:38 lr 0.001350 wd 0.0500 time 0.2676 (0.2672) data time 0.0011 (0.0024) model time 0.2665 (0.2649) loss 6.2571 (6.0077) grad_norm 1.6876 (1.9106) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:22:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][490/625] eta 0:00:36 lr 0.001350 wd 0.0500 time 0.2624 (0.2671) data time 0.0010 (0.0023) model time 0.2614 (0.2648) loss 6.3088 (5.9971) grad_norm 2.7807 (1.9149) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:22:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][500/625] eta 0:00:33 lr 0.001350 wd 0.0500 time 0.2632 (0.2671) data time 
0.0011 (0.0023) model time 0.2621 (0.2649) loss 6.4382 (5.9979) grad_norm 1.6090 (1.9143) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:22:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][510/625] eta 0:00:30 lr 0.001349 wd 0.0500 time 0.2652 (0.2671) data time 0.0008 (0.0023) model time 0.2645 (0.2649) loss 6.2634 (6.0034) grad_norm 1.3999 (1.9190) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:22:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][520/625] eta 0:00:28 lr 0.001349 wd 0.0500 time 0.2636 (0.2672) data time 0.0008 (0.0023) model time 0.2628 (0.2649) loss 6.5735 (6.0072) grad_norm 1.2408 (1.9148) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-07-31 10:22:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][530/625] eta 0:00:25 lr 0.001349 wd 0.0500 time 0.2631 (0.2671) data time 0.0010 (0.0023) model time 0.2621 (0.2649) loss 7.0907 (6.0065) grad_norm 1.2159 (1.9190) loss_scale 4096.0000 (2086.5687) mem 9655MB [2024-07-31 10:22:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][540/625] eta 0:00:22 lr 0.001349 wd 0.0500 time 0.2715 (0.2671) data time 0.0010 (0.0022) model time 0.2705 (0.2649) loss 6.6489 (6.0133) grad_norm 1.4780 (1.9304) loss_scale 4096.0000 (2123.7116) mem 9655MB [2024-07-31 10:22:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][550/625] eta 0:00:20 lr 0.001349 wd 0.0500 time 0.2660 (0.2671) data time 0.0010 (0.0022) model time 0.2650 (0.2649) loss 6.3921 (6.0155) grad_norm 2.2073 (1.9321) loss_scale 4096.0000 (2159.5064) mem 9655MB [2024-07-31 10:22:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][560/625] eta 0:00:17 lr 0.001349 wd 0.0500 time 0.2739 (0.2671) data time 0.0010 (0.0022) model time 0.2729 (0.2649) loss 5.9619 (6.0159) grad_norm 1.4217 (1.9263) loss_scale 4096.0000 (2194.0250) mem 9655MB [2024-07-31 10:22:25 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][570/625] eta 0:00:14 lr 0.001348 wd 0.0500 time 0.2618 (0.2671) data time 0.0009 (0.0022) model time 0.2609 (0.2649) loss 5.4411 (6.0154) grad_norm 2.8357 (1.9214) loss_scale 4096.0000 (2227.3345) mem 9655MB [2024-07-31 10:22:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][580/625] eta 0:00:12 lr 0.001348 wd 0.0500 time 0.2605 (0.2670) data time 0.0008 (0.0022) model time 0.2598 (0.2649) loss 4.7975 (6.0127) grad_norm 1.7044 (1.9222) loss_scale 4096.0000 (2259.4974) mem 9655MB [2024-07-31 10:22:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][590/625] eta 0:00:09 lr 0.001348 wd 0.0500 time 0.2688 (0.2670) data time 0.0008 (0.0022) model time 0.2680 (0.2649) loss 6.7529 (6.0101) grad_norm 1.6325 (1.9201) loss_scale 4096.0000 (2290.5719) mem 9655MB [2024-07-31 10:22:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][600/625] eta 0:00:06 lr 0.001348 wd 0.0500 time 0.2642 (0.2670) data time 0.0015 (0.0022) model time 0.2627 (0.2649) loss 4.8011 (6.0051) grad_norm 1.8954 (1.9181) loss_scale 4096.0000 (2320.6123) mem 9655MB [2024-07-31 10:22:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][610/625] eta 0:00:04 lr 0.001348 wd 0.0500 time 0.2593 (0.2670) data time 0.0005 (0.0021) model time 0.2588 (0.2649) loss 8.1422 (6.0035) grad_norm 2.0017 (1.9186) loss_scale 4096.0000 (2349.6694) mem 9655MB [2024-07-31 10:22:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][620/625] eta 0:00:01 lr 0.001348 wd 0.0500 time 0.2611 (0.2669) data time 0.0005 (0.0021) model time 0.2606 (0.2648) loss 6.1800 (6.0018) grad_norm 2.1237 (1.9210) loss_scale 4096.0000 (2377.7907) mem 9655MB [2024-07-31 10:22:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 128 training takes 0:02:46 [2024-07-31 10:22:39 vssd_mesa_retrain_tiny_e300] (utils.py 
118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-31 10:22:40 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-31 10:22:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.476 (0.476) Loss 0.6846 (0.6846) Acc@1 87.842 (87.842) Acc@5 97.852 (97.852) Mem 9655MB [2024-07-31 10:22:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.099) Loss 1.0713 (0.8176) Acc@1 77.490 (83.469) Acc@5 94.043 (96.911) Mem 9655MB [2024-07-31 10:22:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.2021 (0.9657) Acc@1 72.900 (79.629) Acc@5 92.432 (95.189) Mem 9655MB [2024-07-31 10:22:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.341 Acc@5 95.134 [2024-07-31 10:22:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.3% [2024-07-31 10:22:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.802 (0.802) Loss 0.5820 (0.5820) Acc@1 88.525 (88.525) Acc@5 98.633 (98.633) Mem 9655MB [2024-07-31 10:22:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.134) Loss 0.9346 (0.7235) Acc@1 78.906 (84.877) Acc@5 95.068 (97.297) Mem 9655MB [2024-07-31 10:22:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.096) Loss 1.0811 (0.8603) Acc@1 74.023 (81.197) Acc@5 93.750 (95.722) Mem 9655MB [2024-07-31 10:22:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.852 Acc@5 95.689 [2024-07-31 10:22:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.9% [2024-07-31 10:22:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max 
accuracy ema: 80.85% [2024-07-31 10:22:44 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-31 10:22:44 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-07-31 10:22:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][0/625] eta 0:07:54 lr 0.001347 wd 0.0500 time 0.7589 (0.7589) data time 0.5040 (0.5040) model time 0.0000 (0.0000) loss 6.6323 (6.6323) grad_norm 1.4120 (1.4120) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:22:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][10/625] eta 0:03:09 lr 0.001347 wd 0.0500 time 0.2668 (0.3077) data time 0.0010 (0.0469) model time 0.0000 (0.0000) loss 6.2786 (6.0963) grad_norm 1.7565 (1.9611) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:22:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][20/625] eta 0:02:53 lr 0.001347 wd 0.0500 time 0.2627 (0.2868) data time 0.0008 (0.0251) model time 0.0000 (0.0000) loss 6.0494 (6.0562) grad_norm 2.1191 (2.0730) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:22:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][30/625] eta 0:02:45 lr 0.001347 wd 0.0500 time 0.2586 (0.2789) data time 0.0009 (0.0173) model time 0.0000 (0.0000) loss 6.7733 (6.0015) grad_norm 1.9255 (2.0767) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:22:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][40/625] eta 0:02:41 lr 0.001347 wd 0.0500 time 0.2679 (0.2758) data time 0.0015 (0.0134) model time 0.0000 (0.0000) loss 5.3054 (5.9864) grad_norm 2.5127 (2.2689) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:22:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][50/625] eta 0:02:37 
lr 0.001347 wd 0.0500 time 0.2585 (0.2733) data time 0.0011 (0.0110) model time 0.0000 (0.0000) loss 6.4646 (5.9922) grad_norm 3.3694 (2.2995) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:23:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][60/625] eta 0:02:33 lr 0.001346 wd 0.0500 time 0.2676 (0.2717) data time 0.0010 (0.0093) model time 0.2667 (0.2629) loss 5.7061 (5.9938) grad_norm 2.1784 (2.1959) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:23:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][70/625] eta 0:02:30 lr 0.001346 wd 0.0500 time 0.2615 (0.2705) data time 0.0008 (0.0082) model time 0.2607 (0.2623) loss 6.5424 (6.0162) grad_norm 2.5631 (2.1734) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:23:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][80/625] eta 0:02:27 lr 0.001346 wd 0.0500 time 0.2623 (0.2698) data time 0.0009 (0.0073) model time 0.2614 (0.2628) loss 6.6833 (6.0511) grad_norm 1.0060 (2.1124) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:23:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][90/625] eta 0:02:23 lr 0.001346 wd 0.0500 time 0.2641 (0.2691) data time 0.0008 (0.0066) model time 0.2633 (0.2628) loss 6.4419 (6.0343) grad_norm 1.3329 (2.0870) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:23:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][100/625] eta 0:02:20 lr 0.001346 wd 0.0500 time 0.2692 (0.2685) data time 0.0009 (0.0061) model time 0.2683 (0.2625) loss 4.5539 (5.9774) grad_norm 1.2679 (2.0448) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:23:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][110/625] eta 0:02:18 lr 0.001346 wd 0.0500 time 0.2610 (0.2681) data time 0.0010 (0.0057) model time 0.2600 (0.2625) loss 6.7928 (5.9875) grad_norm 1.4897 (2.0092) loss_scale 
4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:23:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][120/625] eta 0:02:15 lr 0.001345 wd 0.0500 time 0.2629 (0.2678) data time 0.0014 (0.0053) model time 0.2616 (0.2627) loss 5.1983 (6.0054) grad_norm 3.0408 (2.0237) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:23:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][130/625] eta 0:02:12 lr 0.001345 wd 0.0500 time 0.2587 (0.2675) data time 0.0011 (0.0050) model time 0.2576 (0.2626) loss 6.6946 (6.0254) grad_norm 1.3965 (2.0043) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:23:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][140/625] eta 0:02:09 lr 0.001345 wd 0.0500 time 0.2655 (0.2673) data time 0.0011 (0.0047) model time 0.2644 (0.2628) loss 7.3054 (6.0371) grad_norm 2.2115 (1.9903) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:23:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][150/625] eta 0:02:06 lr 0.001345 wd 0.0500 time 0.2605 (0.2670) data time 0.0013 (0.0045) model time 0.2592 (0.2626) loss 5.6953 (6.0354) grad_norm 1.3779 (1.9732) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:23:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][160/625] eta 0:02:04 lr 0.001345 wd 0.0500 time 0.2615 (0.2667) data time 0.0010 (0.0043) model time 0.2605 (0.2625) loss 6.5907 (6.0513) grad_norm 1.9051 (2.0018) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:23:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][170/625] eta 0:02:01 lr 0.001345 wd 0.0500 time 0.2592 (0.2664) data time 0.0009 (0.0041) model time 0.2583 (0.2624) loss 6.6955 (6.0512) grad_norm 1.8969 (2.0158) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:23:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][180/625] eta 0:01:58 lr 
0.001344 wd 0.0500 time 0.2885 (0.2664) data time 0.0011 (0.0039) model time 0.2875 (0.2626) loss 6.8518 (6.0431) grad_norm 1.3580 (2.0098) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:23:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][190/625] eta 0:01:55 lr 0.001344 wd 0.0500 time 0.2599 (0.2662) data time 0.0008 (0.0038) model time 0.2591 (0.2626) loss 6.1223 (6.0611) grad_norm 1.7537 (1.9979) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:23:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][200/625] eta 0:01:53 lr 0.001344 wd 0.0500 time 0.2703 (0.2662) data time 0.0010 (0.0036) model time 0.2693 (0.2626) loss 4.7269 (6.0572) grad_norm 2.6704 (2.0013) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:23:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][210/625] eta 0:01:50 lr 0.001344 wd 0.0500 time 0.2576 (0.2661) data time 0.0011 (0.0035) model time 0.2565 (0.2627) loss 5.3804 (6.0660) grad_norm 1.7896 (1.9885) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:23:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][220/625] eta 0:01:47 lr 0.001344 wd 0.0500 time 0.2781 (0.2662) data time 0.0011 (0.0034) model time 0.2770 (0.2630) loss 6.7484 (6.0613) grad_norm 1.1134 (1.9708) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:23:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][230/625] eta 0:01:45 lr 0.001344 wd 0.0500 time 0.2608 (0.2662) data time 0.0015 (0.0033) model time 0.2594 (0.2630) loss 4.2771 (6.0590) grad_norm 1.8866 (1.9588) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:23:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][240/625] eta 0:01:42 lr 0.001343 wd 0.0500 time 0.2669 (0.2661) data time 0.0010 (0.0032) model time 0.2659 (0.2630) loss 5.9578 (6.0562) grad_norm 2.1098 (1.9531) loss_scale 
4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:23:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][250/625] eta 0:01:40 lr 0.001343 wd 0.0500 time 0.2628 (0.2670) data time 0.0010 (0.0032) model time 0.2619 (0.2642) loss 6.7697 (6.0542) grad_norm 1.6003 (1.9410) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:23:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][260/625] eta 0:01:37 lr 0.001343 wd 0.0500 time 0.2651 (0.2669) data time 0.0008 (0.0031) model time 0.2643 (0.2642) loss 6.3507 (6.0588) grad_norm 1.2143 (1.9376) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:23:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][270/625] eta 0:01:34 lr 0.001343 wd 0.0500 time 0.2607 (0.2671) data time 0.0011 (0.0031) model time 0.2596 (0.2644) loss 5.0821 (6.0650) grad_norm 1.2064 (1.9234) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:23:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][280/625] eta 0:01:32 lr 0.001343 wd 0.0500 time 0.2588 (0.2670) data time 0.0009 (0.0030) model time 0.2579 (0.2644) loss 6.4468 (6.0675) grad_norm 1.2975 (1.9166) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:24:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][290/625] eta 0:01:29 lr 0.001343 wd 0.0500 time 0.2583 (0.2669) data time 0.0011 (0.0029) model time 0.2572 (0.2643) loss 4.5556 (6.0546) grad_norm 1.7557 (1.9055) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:24:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][300/625] eta 0:01:26 lr 0.001342 wd 0.0500 time 0.2606 (0.2669) data time 0.0012 (0.0029) model time 0.2594 (0.2644) loss 5.6972 (6.0566) grad_norm 2.2068 (1.9067) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:24:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][310/625] eta 0:01:24 lr 
0.001342 wd 0.0500 time 0.2619 (0.2668) data time 0.0007 (0.0028) model time 0.2612 (0.2643) loss 6.5546 (6.0582) grad_norm 1.9637 (1.9103) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:24:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][320/625] eta 0:01:21 lr 0.001342 wd 0.0500 time 0.2637 (0.2668) data time 0.0010 (0.0028) model time 0.2627 (0.2644) loss 6.1210 (6.0568) grad_norm 2.8816 (1.9171) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:24:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][330/625] eta 0:01:18 lr 0.001342 wd 0.0500 time 0.2589 (0.2668) data time 0.0011 (0.0027) model time 0.2578 (0.2644) loss 6.0237 (6.0735) grad_norm 2.2195 (1.9159) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:24:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][340/625] eta 0:01:16 lr 0.001342 wd 0.0500 time 0.2647 (0.2668) data time 0.0010 (0.0027) model time 0.2637 (0.2644) loss 4.6269 (6.0785) grad_norm 1.3193 (1.9129) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:24:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][350/625] eta 0:01:13 lr 0.001342 wd 0.0500 time 0.2629 (0.2668) data time 0.0008 (0.0026) model time 0.2621 (0.2644) loss 6.6316 (6.0861) grad_norm 2.3120 (1.9129) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:24:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][360/625] eta 0:01:10 lr 0.001341 wd 0.0500 time 0.2630 (0.2668) data time 0.0009 (0.0026) model time 0.2621 (0.2645) loss 5.4168 (6.0857) grad_norm 1.4980 (1.9054) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:24:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][370/625] eta 0:01:08 lr 0.001341 wd 0.0500 time 0.2639 (0.2667) data time 0.0010 (0.0025) model time 0.2629 (0.2645) loss 5.2854 (6.0776) grad_norm 1.3449 (1.8993) loss_scale 
4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:24:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][380/625] eta 0:01:05 lr 0.001341 wd 0.0500 time 0.2627 (0.2667) data time 0.0010 (0.0025) model time 0.2618 (0.2645) loss 6.7147 (6.0788) grad_norm 1.1204 (1.8936) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:24:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][390/625] eta 0:01:02 lr 0.001341 wd 0.0500 time 0.2592 (0.2666) data time 0.0011 (0.0025) model time 0.2581 (0.2644) loss 6.9623 (6.0809) grad_norm 1.6767 (1.8908) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:24:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][400/625] eta 0:00:59 lr 0.001341 wd 0.0500 time 0.2618 (0.2666) data time 0.0008 (0.0024) model time 0.2610 (0.2644) loss 5.5602 (6.0723) grad_norm 1.7717 (1.9053) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:24:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][410/625] eta 0:00:57 lr 0.001341 wd 0.0500 time 0.2642 (0.2665) data time 0.0010 (0.0024) model time 0.2632 (0.2644) loss 6.4532 (6.0746) grad_norm 1.2359 (1.8984) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:24:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][420/625] eta 0:00:54 lr 0.001340 wd 0.0500 time 0.2667 (0.2665) data time 0.0007 (0.0024) model time 0.2660 (0.2643) loss 5.0846 (6.0726) grad_norm 1.4363 (1.8849) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:24:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][430/625] eta 0:00:51 lr 0.001340 wd 0.0500 time 0.2642 (0.2664) data time 0.0008 (0.0023) model time 0.2634 (0.2642) loss 6.5504 (6.0785) grad_norm 1.3314 (1.8782) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:24:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][440/625] eta 0:00:49 lr 
0.001340 wd 0.0500 time 0.2554 (0.2664) data time 0.0010 (0.0023) model time 0.2544 (0.2643) loss 7.1770 (6.0809) grad_norm 1.4868 (1.8758) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:24:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][450/625] eta 0:00:46 lr 0.001340 wd 0.0500 time 0.2627 (0.2663) data time 0.0008 (0.0023) model time 0.2619 (0.2643) loss 6.5394 (6.0814) grad_norm 1.7148 (1.8927) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:24:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][460/625] eta 0:00:43 lr 0.001340 wd 0.0500 time 0.2583 (0.2663) data time 0.0011 (0.0023) model time 0.2572 (0.2642) loss 4.7291 (6.0767) grad_norm 2.5022 (1.9176) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:24:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][470/625] eta 0:00:41 lr 0.001340 wd 0.0500 time 0.2645 (0.2662) data time 0.0010 (0.0022) model time 0.2635 (0.2642) loss 4.9698 (6.0637) grad_norm 1.4104 (1.9283) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:24:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][480/625] eta 0:00:38 lr 0.001339 wd 0.0500 time 0.2610 (0.2662) data time 0.0007 (0.0022) model time 0.2602 (0.2641) loss 5.6567 (6.0608) grad_norm 1.8845 (1.9277) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:24:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][490/625] eta 0:00:35 lr 0.001339 wd 0.0500 time 0.2661 (0.2661) data time 0.0007 (0.0022) model time 0.2654 (0.2641) loss 6.6274 (6.0635) grad_norm 1.8826 (1.9196) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:24:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][500/625] eta 0:00:33 lr 0.001339 wd 0.0500 time 0.2632 (0.2661) data time 0.0010 (0.0022) model time 0.2622 (0.2641) loss 5.9340 (6.0630) grad_norm 1.0818 (1.9127) loss_scale 
4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:25:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][510/625] eta 0:00:30 lr 0.001339 wd 0.0500 time 0.2634 (0.2661) data time 0.0010 (0.0022) model time 0.2624 (0.2641) loss 6.7117 (6.0644) grad_norm 2.0703 (1.9057) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:25:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][520/625] eta 0:00:27 lr 0.001339 wd 0.0500 time 0.2655 (0.2662) data time 0.0009 (0.0021) model time 0.2646 (0.2642) loss 7.1378 (6.0729) grad_norm 2.9240 (1.9024) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:25:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][530/625] eta 0:00:25 lr 0.001339 wd 0.0500 time 0.2621 (0.2662) data time 0.0010 (0.0021) model time 0.2612 (0.2642) loss 6.1308 (6.0777) grad_norm 2.3409 (1.9055) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:25:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][540/625] eta 0:00:22 lr 0.001338 wd 0.0500 time 0.2593 (0.2669) data time 0.0007 (0.0021) model time 0.2586 (0.2651) loss 6.9635 (6.0749) grad_norm 2.8536 (1.9104) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:25:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][550/625] eta 0:00:20 lr 0.001338 wd 0.0500 time 0.2602 (0.2669) data time 0.0011 (0.0021) model time 0.2591 (0.2651) loss 6.8315 (6.0780) grad_norm 1.2851 (1.9061) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:25:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][560/625] eta 0:00:17 lr 0.001338 wd 0.0500 time 0.2641 (0.2669) data time 0.0010 (0.0021) model time 0.2631 (0.2650) loss 6.3944 (6.0836) grad_norm 1.3699 (1.9025) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:25:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][570/625] eta 0:00:14 lr 
0.001338 wd 0.0500 time 0.2600 (0.2669) data time 0.0021 (0.0021) model time 0.2579 (0.2651) loss 5.1330 (6.0796) grad_norm 1.2301 (1.8988) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:25:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][580/625] eta 0:00:12 lr 0.001338 wd 0.0500 time 0.2648 (0.2669) data time 0.0008 (0.0021) model time 0.2640 (0.2651) loss 6.0358 (6.0751) grad_norm 1.1636 (1.8930) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:25:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][590/625] eta 0:00:09 lr 0.001338 wd 0.0500 time 0.2655 (0.2669) data time 0.0007 (0.0021) model time 0.2648 (0.2651) loss 4.9740 (6.0725) grad_norm 1.6706 (1.8856) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:25:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][600/625] eta 0:00:06 lr 0.001337 wd 0.0500 time 0.2652 (0.2668) data time 0.0008 (0.0020) model time 0.2644 (0.2650) loss 5.5370 (6.0733) grad_norm 1.3006 (1.8828) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:25:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][610/625] eta 0:00:04 lr 0.001337 wd 0.0500 time 0.2616 (0.2668) data time 0.0007 (0.0020) model time 0.2609 (0.2650) loss 6.4060 (6.0796) grad_norm 1.9478 (1.8871) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:25:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][620/625] eta 0:00:01 lr 0.001337 wd 0.0500 time 0.2642 (0.2668) data time 0.0005 (0.0020) model time 0.2637 (0.2650) loss 4.8471 (6.0775) grad_norm 1.7884 (1.8890) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:25:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 129 training takes 0:02:46 [2024-07-31 10:25:31 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... 
[2024-07-31 10:25:32 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-07-31 10:25:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.518 (0.518) Loss 0.6655 (0.6655) Acc@1 88.037 (88.037) Acc@5 97.949 (97.949) Mem 9655MB [2024-07-31 10:25:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.108) Loss 1.1006 (0.8132) Acc@1 76.074 (83.629) Acc@5 93.555 (96.955) Mem 9655MB [2024-07-31 10:25:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.083) Loss 1.2334 (0.9581) Acc@1 71.094 (79.908) Acc@5 92.578 (95.222) Mem 9655MB [2024-07-31 10:25:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.611 Acc@5 95.168 [2024-07-31 10:25:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.6% [2024-07-31 10:25:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.61% [2024-07-31 10:25:34 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-07-31 10:25:35 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-07-31 10:25:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.514 (0.514) Loss 0.5820 (0.5820) Acc@1 88.623 (88.623) Acc@5 98.633 (98.633) Mem 9655MB [2024-07-31 10:25:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.100) Loss 0.9351 (0.7236) Acc@1 79.004 (84.917) Acc@5 95.166 (97.288) Mem 9655MB [2024-07-31 10:25:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0801 (0.8601) Acc@1 73.828 (81.217) Acc@5 93.799 (95.717) Mem 9655MB [2024-07-31 10:25:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.866 Acc@5 95.683 [2024-07-31 10:25:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.9% [2024-07-31 10:25:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.87% [2024-07-31 10:25:37 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-07-31 10:25:38 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-07-31 10:25:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][0/625] eta 0:07:51 lr 0.001337 wd 0.0500 time 0.7543 (0.7543) data time 0.4717 (0.4717) model time 0.0000 (0.0000) loss 5.2260 (5.2260) grad_norm 2.2824 (2.2824) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:25:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][10/625] eta 0:03:08 lr 0.001337 wd 0.0500 time 0.2623 (0.3071) data time 0.0009 (0.0439) model time 0.0000 (0.0000) loss 5.9317 (5.4184) grad_norm 1.9994 (1.8173) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:25:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][20/625] eta 0:02:53 lr 0.001337 wd 0.0500 time 0.2618 (0.2870) data time 0.0009 (0.0235) model time 0.0000 (0.0000) loss 6.2742 (5.5253) grad_norm 2.1139 (1.9521) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:25:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][30/625] eta 0:02:46 lr 0.001336 wd 0.0500 time 0.2655 (0.2796) data time 0.0010 (0.0163) model time 0.0000 (0.0000) loss 6.8518 (5.7309) grad_norm 1.1325 (1.7762) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:25:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][40/625] eta 0:02:41 lr 0.001336 wd 0.0500 time 0.2634 (0.2759) data time 0.0012 (0.0126) model time 0.0000 (0.0000) loss 5.9463 (5.8161) grad_norm 1.9718 (1.8762) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:25:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][50/625] eta 0:02:37 lr 0.001336 wd 0.0500 time 0.2628 (0.2735) data time 0.0008 (0.0103) model time 0.0000 (0.0000) loss 7.2062 (5.9241) grad_norm 1.9600 (2.0017) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:25:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][60/625] eta 0:02:33 lr 0.001336 wd 0.0500 time 0.2672 (0.2720) 
data time 0.0010 (0.0088) model time 0.2662 (0.2635) loss 7.1023 (5.8943) grad_norm 2.1945 (1.9826) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:25:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][70/625] eta 0:02:30 lr 0.001336 wd 0.0500 time 0.2630 (0.2709) data time 0.0010 (0.0077) model time 0.2620 (0.2632) loss 5.6228 (5.8755) grad_norm 1.3157 (1.9695) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:26:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][80/625] eta 0:02:27 lr 0.001336 wd 0.0500 time 0.2622 (0.2703) data time 0.0012 (0.0069) model time 0.2610 (0.2638) loss 6.6542 (5.8983) grad_norm 2.0321 (1.9493) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:26:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][90/625] eta 0:02:24 lr 0.001335 wd 0.0500 time 0.2590 (0.2697) data time 0.0011 (0.0063) model time 0.2580 (0.2636) loss 6.2218 (5.8487) grad_norm 2.8679 (1.9698) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:26:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][100/625] eta 0:02:21 lr 0.001335 wd 0.0500 time 0.2741 (0.2692) data time 0.0007 (0.0058) model time 0.2734 (0.2637) loss 5.2410 (5.8620) grad_norm 1.3008 (1.9355) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:26:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][110/625] eta 0:02:18 lr 0.001335 wd 0.0500 time 0.2598 (0.2687) data time 0.0008 (0.0054) model time 0.2590 (0.2635) loss 6.7679 (5.8681) grad_norm 1.4720 (1.9085) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:26:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][120/625] eta 0:02:15 lr 0.001335 wd 0.0500 time 0.2633 (0.2684) data time 0.0009 (0.0050) model time 0.2624 (0.2636) loss 5.7470 (5.8824) grad_norm 5.0255 (1.9106) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 
10:26:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][130/625] eta 0:02:12 lr 0.001335 wd 0.0500 time 0.2625 (0.2680) data time 0.0010 (0.0047) model time 0.2615 (0.2634) loss 5.3102 (5.9318) grad_norm 3.7242 (1.9794) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:26:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][140/625] eta 0:02:09 lr 0.001335 wd 0.0500 time 0.2604 (0.2677) data time 0.0010 (0.0044) model time 0.2594 (0.2634) loss 6.5389 (5.9312) grad_norm 1.6701 (1.9669) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:26:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][150/625] eta 0:02:07 lr 0.001334 wd 0.0500 time 0.2636 (0.2675) data time 0.0009 (0.0042) model time 0.2627 (0.2633) loss 6.3490 (5.9314) grad_norm 1.4568 (1.9529) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:26:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][160/625] eta 0:02:04 lr 0.001334 wd 0.0500 time 0.2631 (0.2673) data time 0.0010 (0.0040) model time 0.2621 (0.2633) loss 6.8998 (5.9592) grad_norm 2.9585 (1.9566) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:26:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][170/625] eta 0:02:01 lr 0.001334 wd 0.0500 time 0.2598 (0.2672) data time 0.0011 (0.0039) model time 0.2588 (0.2634) loss 4.9089 (5.9311) grad_norm 3.0023 (1.9856) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:26:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][180/625] eta 0:01:58 lr 0.001334 wd 0.0500 time 0.2649 (0.2671) data time 0.0009 (0.0037) model time 0.2639 (0.2635) loss 5.5741 (5.9078) grad_norm 1.4235 (1.9734) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:26:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][190/625] eta 0:01:56 lr 0.001334 wd 0.0500 time 0.2660 (0.2669) data 
time 0.0008 (0.0036) model time 0.2652 (0.2634) loss 5.8161 (5.8904) grad_norm 2.0069 (1.9762) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:26:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][200/625] eta 0:01:53 lr 0.001334 wd 0.0500 time 0.2633 (0.2667) data time 0.0007 (0.0034) model time 0.2626 (0.2633) loss 6.7866 (5.9033) grad_norm 1.6933 (1.9820) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:26:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][210/625] eta 0:01:50 lr 0.001333 wd 0.0500 time 0.2620 (0.2666) data time 0.0007 (0.0033) model time 0.2613 (0.2633) loss 6.7428 (5.9047) grad_norm 1.8198 (1.9565) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:26:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][220/625] eta 0:01:47 lr 0.001333 wd 0.0500 time 0.2689 (0.2666) data time 0.0007 (0.0032) model time 0.2682 (0.2634) loss 6.7827 (5.9129) grad_norm 1.3452 (1.9373) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:26:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][230/625] eta 0:01:45 lr 0.001333 wd 0.0500 time 0.2629 (0.2664) data time 0.0010 (0.0031) model time 0.2619 (0.2633) loss 5.7272 (5.9285) grad_norm 2.5866 (1.9348) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:26:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][240/625] eta 0:01:42 lr 0.001333 wd 0.0500 time 0.2629 (0.2663) data time 0.0012 (0.0030) model time 0.2616 (0.2633) loss 6.2824 (5.9223) grad_norm 2.9286 (1.9488) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:26:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][250/625] eta 0:01:39 lr 0.001333 wd 0.0500 time 0.2623 (0.2662) data time 0.0009 (0.0030) model time 0.2614 (0.2632) loss 6.6282 (5.9287) grad_norm 1.1206 (1.9374) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 
10:26:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][260/625] eta 0:01:37 lr 0.001333 wd 0.0500 time 0.2709 (0.2661) data time 0.0010 (0.0029) model time 0.2698 (0.2632) loss 7.0426 (5.9451) grad_norm 1.9887 (1.9384) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:26:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][270/625] eta 0:01:34 lr 0.001332 wd 0.0500 time 0.2644 (0.2661) data time 0.0010 (0.0028) model time 0.2634 (0.2633) loss 4.4968 (5.9479) grad_norm 1.7782 (1.9389) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:26:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][280/625] eta 0:01:31 lr 0.001332 wd 0.0500 time 0.2846 (0.2661) data time 0.0012 (0.0028) model time 0.2834 (0.2634) loss 5.3992 (5.9372) grad_norm 1.3506 (1.9334) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:26:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][290/625] eta 0:01:29 lr 0.001332 wd 0.0500 time 0.2853 (0.2661) data time 0.0010 (0.0027) model time 0.2843 (0.2634) loss 6.1769 (5.9493) grad_norm 2.0096 (1.9246) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:26:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][300/625] eta 0:01:26 lr 0.001332 wd 0.0500 time 0.2665 (0.2661) data time 0.0013 (0.0027) model time 0.2652 (0.2634) loss 6.8332 (5.9581) grad_norm 2.2800 (1.9235) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:27:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][310/625] eta 0:01:24 lr 0.001332 wd 0.0500 time 0.2627 (0.2667) data time 0.0008 (0.0026) model time 0.2620 (0.2643) loss 6.7587 (5.9554) grad_norm 2.2400 (1.9229) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:27:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][320/625] eta 0:01:21 lr 0.001332 wd 0.0500 time 0.2636 (0.2667) data 
time 0.0011 (0.0026) model time 0.2625 (0.2643) loss 5.9500 (5.9613) grad_norm 1.1863 (1.9120) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:27:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][330/625] eta 0:01:18 lr 0.001331 wd 0.0500 time 0.2629 (0.2665) data time 0.0010 (0.0025) model time 0.2619 (0.2642) loss 6.6884 (5.9587) grad_norm 1.1421 (1.9132) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:27:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][340/625] eta 0:01:15 lr 0.001331 wd 0.0500 time 0.2598 (0.2664) data time 0.0010 (0.0025) model time 0.2588 (0.2641) loss 5.8616 (5.9604) grad_norm 1.8483 (1.9200) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:27:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][350/625] eta 0:01:13 lr 0.001331 wd 0.0500 time 0.2668 (0.2664) data time 0.0008 (0.0025) model time 0.2661 (0.2641) loss 6.3902 (5.9672) grad_norm 2.7436 (1.9245) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:27:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][360/625] eta 0:01:10 lr 0.001331 wd 0.0500 time 0.2640 (0.2664) data time 0.0007 (0.0024) model time 0.2633 (0.2642) loss 4.8008 (5.9667) grad_norm 2.8398 (1.9425) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:27:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][370/625] eta 0:01:07 lr 0.001331 wd 0.0500 time 0.2649 (0.2664) data time 0.0011 (0.0024) model time 0.2638 (0.2642) loss 5.6646 (5.9610) grad_norm 2.1189 (1.9460) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:27:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][380/625] eta 0:01:05 lr 0.001331 wd 0.0500 time 0.2585 (0.2663) data time 0.0012 (0.0024) model time 0.2573 (0.2641) loss 6.3078 (5.9516) grad_norm 1.4573 (1.9354) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 
10:27:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][390/625] eta 0:01:02 lr 0.001330 wd 0.0500 time 0.2643 (0.2662) data time 0.0008 (0.0023) model time 0.2635 (0.2640) loss 5.7275 (5.9536) grad_norm 1.3916 (1.9292) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:27:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][400/625] eta 0:00:59 lr 0.001330 wd 0.0500 time 0.2700 (0.2662) data time 0.0009 (0.0023) model time 0.2691 (0.2640) loss 6.6660 (5.9610) grad_norm 1.8736 (1.9280) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:27:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][410/625] eta 0:00:57 lr 0.001330 wd 0.0500 time 0.2623 (0.2661) data time 0.0011 (0.0023) model time 0.2613 (0.2639) loss 6.3201 (5.9698) grad_norm 1.3091 (1.9206) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:27:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][420/625] eta 0:00:54 lr 0.001330 wd 0.0500 time 0.2604 (0.2661) data time 0.0010 (0.0022) model time 0.2594 (0.2639) loss 6.5342 (5.9653) grad_norm 1.4753 (1.9123) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:27:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][430/625] eta 0:00:51 lr 0.001330 wd 0.0500 time 0.2642 (0.2660) data time 0.0009 (0.0022) model time 0.2633 (0.2639) loss 6.7006 (5.9723) grad_norm 1.3530 (1.9163) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:27:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][440/625] eta 0:00:49 lr 0.001330 wd 0.0500 time 0.2650 (0.2660) data time 0.0009 (0.0022) model time 0.2641 (0.2639) loss 5.2927 (5.9678) grad_norm 2.0079 (1.9302) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:27:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][450/625] eta 0:00:46 lr 0.001329 wd 0.0500 time 0.2615 (0.2660) data 
time 0.0009 (0.0022) model time 0.2606 (0.2639) loss 4.5186 (5.9561) grad_norm 1.2402 (1.9313) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:27:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][460/625] eta 0:00:43 lr 0.001329 wd 0.0500 time 0.2607 (0.2660) data time 0.0009 (0.0021) model time 0.2598 (0.2639) loss 5.7032 (5.9504) grad_norm 1.7440 (1.9248) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:27:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][470/625] eta 0:00:41 lr 0.001329 wd 0.0500 time 0.2567 (0.2659) data time 0.0009 (0.0021) model time 0.2558 (0.2639) loss 5.2601 (5.9481) grad_norm 2.8318 (1.9271) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:27:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][480/625] eta 0:00:38 lr 0.001329 wd 0.0500 time 0.2639 (0.2659) data time 0.0007 (0.0021) model time 0.2632 (0.2639) loss 6.5658 (5.9410) grad_norm 1.1415 (1.9263) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:27:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][490/625] eta 0:00:35 lr 0.001329 wd 0.0500 time 0.2680 (0.2659) data time 0.0011 (0.0021) model time 0.2669 (0.2639) loss 4.4187 (5.9394) grad_norm 1.2287 (1.9201) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:27:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][500/625] eta 0:00:33 lr 0.001329 wd 0.0500 time 0.2614 (0.2659) data time 0.0012 (0.0021) model time 0.2601 (0.2639) loss 6.6406 (5.9421) grad_norm 1.7279 (1.9153) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:27:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][510/625] eta 0:00:30 lr 0.001328 wd 0.0500 time 0.2627 (0.2659) data time 0.0010 (0.0020) model time 0.2617 (0.2639) loss 5.1383 (5.9399) grad_norm 1.8169 (1.9112) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 
10:27:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][520/625] eta 0:00:27 lr 0.001328 wd 0.0500 time 0.2629 (0.2658) data time 0.0008 (0.0020) model time 0.2620 (0.2639) loss 6.0542 (5.9387) grad_norm 1.7000 (1.9123) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:27:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][530/625] eta 0:00:25 lr 0.001328 wd 0.0500 time 0.2626 (0.2657) data time 0.0010 (0.0020) model time 0.2616 (0.2638) loss 7.3465 (5.9448) grad_norm 2.0187 (1.9099) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:28:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][540/625] eta 0:00:22 lr 0.001328 wd 0.0500 time 0.2619 (0.2657) data time 0.0008 (0.0020) model time 0.2611 (0.2637) loss 6.1121 (5.9459) grad_norm 1.4714 (1.9090) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:28:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][550/625] eta 0:00:19 lr 0.001328 wd 0.0500 time 0.2607 (0.2656) data time 0.0011 (0.0020) model time 0.2596 (0.2637) loss 6.1668 (5.9508) grad_norm 1.9315 (1.9058) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:28:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][560/625] eta 0:00:17 lr 0.001328 wd 0.0500 time 0.2643 (0.2656) data time 0.0010 (0.0020) model time 0.2633 (0.2637) loss 6.4440 (5.9568) grad_norm 2.4901 (1.9023) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:28:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][570/625] eta 0:00:14 lr 0.001327 wd 0.0500 time 0.2594 (0.2658) data time 0.0009 (0.0020) model time 0.2585 (0.2639) loss 6.2356 (5.9576) grad_norm 2.1708 (1.9031) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:28:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][580/625] eta 0:00:11 lr 0.001327 wd 0.0500 time 0.2590 (0.2661) data 
time 0.0011 (0.0019) model time 0.2579 (0.2642) loss 5.2117 (5.9603) grad_norm 2.1165 (1.9022) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:28:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][590/625] eta 0:00:09 lr 0.001327 wd 0.0500 time 0.2646 (0.2660) data time 0.0008 (0.0019) model time 0.2639 (0.2642) loss 5.6632 (5.9586) grad_norm 2.1526 (1.9026) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:28:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][600/625] eta 0:00:06 lr 0.001327 wd 0.0500 time 0.2623 (0.2660) data time 0.0008 (0.0019) model time 0.2615 (0.2641) loss 6.3593 (5.9593) grad_norm 2.4390 (1.9084) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:28:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][610/625] eta 0:00:03 lr 0.001327 wd 0.0500 time 0.2613 (0.2659) data time 0.0005 (0.0019) model time 0.2608 (0.2641) loss 6.1164 (5.9574) grad_norm 1.4943 (1.9077) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:28:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][620/625] eta 0:00:01 lr 0.001327 wd 0.0500 time 0.2614 (0.2659) data time 0.0005 (0.0019) model time 0.2609 (0.2640) loss 6.8579 (5.9616) grad_norm 1.5302 (1.9027) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-07-31 10:28:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 130 training takes 0:02:46 [2024-07-31 10:28:24 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-07-31 10:28:25 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-07-31 10:28:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.631 (0.631) Loss 0.6426 (0.6426) Acc@1 87.207 (87.207) Acc@5 97.852 (97.852) Mem 9655MB
[2024-07-31 10:28:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.060 (0.111) Loss 1.0322 (0.7988) Acc@1 77.051 (83.545) Acc@5 94.971 (96.973) Mem 9655MB
[2024-07-31 10:28:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.084) Loss 1.1758 (0.9479) Acc@1 73.193 (79.771) Acc@5 92.871 (95.194) Mem 9655MB
[2024-07-31 10:28:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.561 Acc@5 95.148
[2024-07-31 10:28:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.6%
[2024-07-31 10:28:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.826 (0.826) Loss 0.5830 (0.5830) Acc@1 88.574 (88.574) Acc@5 98.633 (98.633) Mem 9655MB
[2024-07-31 10:28:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.138) Loss 0.9351 (0.7242) Acc@1 79.102 (84.872) Acc@5 95.117 (97.283) Mem 9655MB
[2024-07-31 10:28:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.098) Loss 1.0801 (0.8602) Acc@1 73.730 (81.203) Acc@5 93.896 (95.724) Mem 9655MB
[2024-07-31 10:28:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.858 Acc@5 95.691
[2024-07-31 10:28:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.9%
[2024-07-31 10:28:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][0/625] eta 0:12:07 lr 0.001326 wd 0.0500 time 1.1638 (1.1638) data time 0.6663 (0.6663) model time 0.0000 (0.0000) loss 5.7284 (5.7284) grad_norm 1.6939 (1.6939) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:28:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][10/625] eta 0:03:33 lr 0.001326 wd 0.0500 time 0.2644 (0.3469) data time 0.0010 (0.0616) model time 0.0000 (0.0000) loss 6.7816 (6.0865) grad_norm 2.4372 (1.8155) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:28:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][20/625] eta 0:03:05 lr 0.001326 wd 0.0500 time 0.2608 (0.3074) data time 0.0014 (0.0328) model time 0.0000 (0.0000) loss 6.1067 (5.9333) grad_norm 1.1885 (1.6475) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:28:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][30/625] eta 0:02:54 lr 0.001326 wd 0.0500 time 0.2642 (0.2933) data time 0.0009 (0.0226) model time 0.0000 (0.0000) loss 6.7995 (5.9341) grad_norm 1.3772 (1.7344) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:28:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][40/625] eta 0:02:47 lr 0.001326 wd 0.0500 time 0.2649 (0.2856) data time 0.0008 (0.0174) model time 0.0000 (0.0000) loss 5.5754 (6.0602) grad_norm 2.7752 (1.9390) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:28:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][50/625] eta 0:02:41 lr 0.001326 wd 0.0500 time 0.2598 (0.2813) data time 0.0010 (0.0142) model time 0.0000 (0.0000) loss 6.2953 (6.0086) grad_norm 1.4607 (1.9891) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:28:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][60/625] eta 0:02:37 lr 0.001325 wd 0.0500 time 0.2620 (0.2780) data time 0.0010 (0.0121) model time 0.2610 (0.2604) loss 6.5646 (6.0248) grad_norm 2.0507 (2.0013) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:28:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][70/625] eta 0:02:33 lr 0.001325 wd 0.0500 time 0.2704 (0.2760) data time 0.0008 (0.0105) model time 0.2696 (0.2613) loss 6.4182 (6.0549) grad_norm 1.7278 (1.9703) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:28:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][80/625] eta 0:02:29 lr 0.001325 wd 0.0500 time 0.2620 (0.2746) data time 0.0012 (0.0094) model time 0.2608 (0.2621) loss 4.9773 (6.0297) grad_norm 2.4671 (1.9900) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:28:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][90/625] eta 0:02:26 lr 0.001325 wd 0.0500 time 0.2573 (0.2732) data time 0.0010 (0.0085) model time 0.2563 (0.2618) loss 6.3859 (6.0758) grad_norm 2.9156 (2.0195) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:28:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][100/625] eta 0:02:22 lr 0.001325 wd 0.0500 time 0.2616 (0.2722) data time 0.0013 (0.0077) model time 0.2603 (0.2618) loss 6.2689 (6.0405) grad_norm 1.3632 (1.9666) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:28:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][110/625] eta 0:02:19 lr 0.001325 wd 0.0500 time 0.2645 (0.2714) data time 0.0007 (0.0071) model time 0.2637 (0.2618) loss 5.0171 (6.0326) grad_norm 2.0150 (1.9745) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:29:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][120/625] eta 0:02:16 lr 0.001324 wd 0.0500 time 0.2664 (0.2707) data time 0.0011 (0.0067) model time 0.2653 (0.2618) loss 4.9414 (6.0307) grad_norm 1.7921 (1.9520) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:29:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][130/625] eta 0:02:13 lr 0.001324 wd 0.0500 time 0.2656 (0.2700) data time 0.0007 (0.0062) model time 0.2649 (0.2616) loss 5.2332 (6.0140) grad_norm 1.0198 (1.9169) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:29:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][140/625] eta 0:02:10 lr 0.001324 wd 0.0500 time 0.2601 (0.2694) data time 0.0010 (0.0059) model time 0.2591 (0.2615) loss 6.1699 (6.0097) grad_norm 1.3436 (1.8984) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:29:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][150/625] eta 0:02:07 lr 0.001324 wd 0.0500 time 0.2623 (0.2691) data time 0.0011 (0.0056) model time 0.2612 (0.2616) loss 6.6974 (5.9954) grad_norm 1.3708 (1.8763) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:29:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][160/625] eta 0:02:05 lr 0.001324 wd 0.0500 time 0.2639 (0.2688) data time 0.0010 (0.0053) model time 0.2630 (0.2619) loss 6.5207 (5.9930) grad_norm 1.7347 (1.8773) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:29:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][170/625] eta 0:02:02 lr 0.001324 wd 0.0500 time 0.2609 (0.2685) data time 0.0011 (0.0050) model time 0.2598 (0.2618) loss 6.4363 (5.9968) grad_norm 2.1505 (1.8801) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:29:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][180/625] eta 0:01:59 lr 0.001323 wd 0.0500 time 0.2650 (0.2681) data time 0.0009 (0.0048) model time 0.2641 (0.2618) loss 7.2062 (6.0105) grad_norm 1.9563 (1.8904) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:29:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][190/625] eta 0:01:56 lr 0.001323 wd 0.0500 time 0.2664 (0.2679) data time 0.0010 (0.0046) model time 0.2654 (0.2619) loss 6.5091 (5.9954) grad_norm 1.3312 (1.8708) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:29:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][200/625] eta 0:01:53 lr 0.001323 wd 0.0500 time 0.2629 (0.2679) data time 0.0010 (0.0045) model time 0.2619 (0.2622) loss 5.9853 (5.9883) grad_norm 2.4567 (1.8658) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:29:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][210/625] eta 0:01:51 lr 0.001323 wd 0.0500 time 0.2609 (0.2676) data time 0.0009 (0.0043) model time 0.2599 (0.2621) loss 5.3730 (5.9754) grad_norm 1.2709 (1.8539) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:29:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][220/625] eta 0:01:48 lr 0.001323 wd 0.0500 time 0.2681 (0.2676) data time 0.0007 (0.0042) model time 0.2673 (0.2623) loss 5.3626 (5.9850) grad_norm 1.6118 (1.8549) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:29:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][230/625] eta 0:01:45 lr 0.001323 wd 0.0500 time 0.2572 (0.2673) data time 0.0012 (0.0040) model time 0.2560 (0.2622) loss 7.2079 (5.9992) grad_norm 2.3987 (1.8527) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:29:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][240/625] eta 0:01:42 lr 0.001322 wd 0.0500 time 0.2649 (0.2671) data time 0.0010 (0.0039) model time 0.2639 (0.2622) loss 5.6432 (6.0025) grad_norm 1.9284 (1.8383) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:29:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][250/625] eta 0:01:40 lr 0.001322 wd 0.0500 time 0.2642 (0.2670) data time 0.0007 (0.0038) model time 0.2634 (0.2622) loss 4.1452 (6.0064) grad_norm 2.0676 (1.8302) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:29:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][260/625] eta 0:01:37 lr 0.001322 wd 0.0500 time 0.2630 (0.2668) data time 0.0010 (0.0037) model time 0.2620 (0.2622) loss 6.3854 (5.9984) grad_norm 1.7775 (1.8148) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-07-31 10:29:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-31 10:29:39 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-31 10:29:40 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-31 10:31:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-31 10:31:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-31 10:32:04 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-31 10:32:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-31 10:32:18 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-31 10:32:18 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-31 10:32:18 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-31 10:32:18 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 131)
[2024-07-31 10:32:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-31 10:32:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][270/625] eta 0:08:30 lr 0.001322 wd 0.0500 time 0.2583 (1.4372) data time 0.0012 (0.1302) model time 0.2570 (1.3070) loss 6.4255 (6.7587) grad_norm 2.9322 (2.2939) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 10:32:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][280/625] eta 0:04:16 lr 0.001322 wd 0.0500 time 0.2574 (0.7443) data time 0.0009 (0.0542) model time 0.2565 (0.6902) loss 5.7033 (6.3184) grad_norm 1.6378 (1.8896) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 10:32:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][290/625] eta 0:03:09 lr 0.001322 wd 0.0500 time 0.2586 (0.5648) data time 0.0008 (0.0345) model time 0.2578 (0.5304) loss 6.7472 (6.3222) grad_norm 1.7310 (1.8101) loss_scale 4096.0000 (4096.0000) mem 9656MB
[2024-07-31 10:32:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][300/625] eta 0:02:36 lr 0.001321 wd 0.0500 time 0.2563 (0.4822) data time 0.0010 (0.0254) model time 0.2552 (0.4568) loss 5.8805 (6.2958) grad_norm 1.3875 (inf) loss_scale 2048.0000 (3708.5405) mem 9656MB
[2024-07-31 10:32:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][310/625] eta 0:02:17 lr 0.001321 wd 0.0500 time 0.2583 (0.4356) data time 0.0009 (0.0203) model time 0.2574 (0.4154) loss 6.0444 (6.2358) grad_norm 2.5752 (inf) loss_scale 2048.0000 (3355.2340) mem 9656MB
[2024-07-31 10:32:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][320/625] eta 0:02:03 lr 0.001321 wd 0.0500 time 0.2614 (0.4053) data time 0.0011 (0.0169) model time 0.2603 (0.3884) loss 6.2307 (6.1749) grad_norm 1.5378 (inf) loss_scale 2048.0000 (3125.8947) mem 9656MB
[2024-07-31 10:32:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][330/625] eta 0:01:53 lr 0.001321 wd 0.0500 time 0.2630 (0.3838) data time 0.0011 (0.0145) model time 0.2619 (0.3693) loss 6.6418 (6.1551) grad_norm 2.3134 (inf) loss_scale 2048.0000 (2965.0149) mem 9656MB
[2024-07-31 10:32:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][340/625] eta 0:01:44 lr 0.001321 wd 0.0500 time 0.2572 (0.3679) data time 0.0015 (0.0128) model time 0.2558 (0.3552) loss 5.8490 (6.1119) grad_norm 3.4465 (inf) loss_scale 2048.0000 (2845.9221) mem 9656MB
[2024-07-31 10:32:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-31 10:32:52 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-31 10:32:54 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-31 10:34:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-31 10:34:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-31 10:35:10 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-31 10:35:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-31 10:35:20 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-31 10:35:20 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-31 10:35:20 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-31 10:35:20 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 131)
[2024-07-31 10:35:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-31 10:35:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][350/625] eta 0:08:39 lr 0.001321 wd 0.0500 time 0.2481 (1.8893) data time 0.0012 (0.1027) model time 0.2469 (1.7866) loss 5.9976 (6.3890) grad_norm 1.2837 (2.1101) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:35:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][360/625] eta 0:04:06 lr 0.001320 wd 0.0500 time 0.2507 (0.9283) data time 0.0010 (0.0429) model time 0.2497 (0.8854) loss 5.5987 (6.2990) grad_norm 2.6491 (1.9993) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:35:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][370/625] eta 0:02:53 lr 0.001320 wd 0.0500 time 0.2455 (0.6809) data time 0.0008 (0.0274) model time 0.2448 (0.6535) loss 6.8159 (6.3502) grad_norm 1.8839 (2.1264) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:35:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][380/625] eta 0:02:18 lr 0.001320 wd 0.0500 time 0.2543 (0.5660) data time 0.0009 (0.0203) model time 0.2534 (0.5457) loss 4.8793 (6.2821) grad_norm 1.9106 (2.2529) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:35:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][390/625] eta 0:01:57 lr 0.001320 wd 0.0500 time 0.3216 (0.4997) data time 0.0011 (0.0162) model time 0.3206 (0.4834) loss 6.6180 (6.1929) grad_norm 1.6347 (2.1570) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:35:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][400/625] eta 0:01:43 lr 0.001320 wd 0.0500 time 0.2527 (0.4593) data time 0.0009 (0.0136) model time 0.2519 (0.4457) loss 5.8259 (6.1731) grad_norm 1.6850 (2.0806) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:35:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][410/625] eta 0:01:32 lr 0.001320 wd 0.0500 time 0.2695 (0.4284) data time 0.0008 (0.0117) model time 0.2687 (0.4167) loss 6.4741 (6.1597) grad_norm 1.5035 (1.9680) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:35:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][420/625] eta 0:01:23 lr 0.001319 wd 0.0500 time 0.2745 (0.4080) data time 0.0009 (0.0105) model time 0.2736 (0.3976) loss 6.1483 (6.1233) grad_norm 1.1953 (1.9764) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:35:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][430/625] eta 0:01:16 lr 0.001319 wd 0.0500 time 0.3314 (0.3919) data time 0.0009 (0.0094) model time 0.3305 (0.3825) loss 6.1850 (6.1134) grad_norm 3.2448 (2.0205) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:36:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][440/625] eta 0:01:09 lr 0.001319 wd 0.0500 time 0.2462 (0.3770) data time 0.0009 (0.0085) model time 0.2453 (0.3684) loss 6.0550 (6.1265) grad_norm 2.5016 (2.0235) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:36:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][450/625] eta 0:01:04 lr 0.001319 wd 0.0500 time 0.2427 (0.3666) data time 0.0007 (0.0078) model time 0.2421 (0.3587) loss 5.8778 (6.1464) grad_norm 1.3163 (1.9887) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:36:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-31 10:36:06 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-31 10:36:07 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-31 10:52:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-31 10:52:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-31 10:52:53 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-31 10:53:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-31 10:53:07 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-31 10:53:08 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-31 10:53:08 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-31 10:53:08 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 131)
[2024-07-31 10:53:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-31 10:54:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-31 10:54:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-31 10:54:53 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-31 10:55:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-31 10:55:07 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-31 10:55:07 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-31 10:55:07 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-31 10:55:07 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 131)
[2024-07-31 10:55:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-31 10:55:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][460/625] eta 0:06:19 lr 0.001319 wd 0.0500 time 0.2555 (2.2987) data time 0.0006 (0.2480) model time 0.2549 (2.0506) loss 5.4497 (6.1479) grad_norm 2.0872 (1.6098) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:55:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][470/625] eta 0:01:52 lr 0.001319 wd 0.0500 time 0.2538 (0.7255) data time 0.0009 (0.0580) model time 0.2528 (0.6675) loss 6.5482 (6.3401) grad_norm 1.1845 (1.6252) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:55:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][480/625] eta 0:01:15 lr 0.001318 wd 0.0500 time 0.2645 (0.5215) data time 0.0009 (0.0332) model time 0.2636 (0.4883) loss 7.0941 (6.3674) grad_norm 1.1259 (1.6054) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:55:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][490/625] eta 0:00:59 lr 0.001318 wd 0.0500 time 0.2572 (0.4400) data time 0.0007 (0.0235) model time 0.2564 (0.4165) loss 7.0993 (6.3312) grad_norm 1.8184 (1.7120) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:55:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][500/625] eta 0:00:49 lr 0.001318 wd 0.0500 time 0.2561 (0.3964) data time 0.0009 (0.0183) model time 0.2552 (0.3781) loss 6.5451 (6.2727) grad_norm 2.0025 (1.6923) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:55:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][510/625] eta 0:00:42 lr 0.001318 wd 0.0500 time 0.2528 (0.3697) data time 0.0010 (0.0150) model time 0.2518 (0.3547) loss 6.4634 (6.2050) grad_norm 1.3231 (1.6903) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:55:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][520/625] eta 0:00:36 lr 0.001318 wd 0.0500 time 0.2535 (0.3513) data time 0.0009 (0.0128) model time 0.2526 (0.3385) loss 5.8974 (6.1782) grad_norm 2.3662 (1.7086) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:55:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][530/625] eta 0:00:32 lr 0.001318 wd 0.0500 time 0.2538 (0.3381) data time 0.0009 (0.0112) model time 0.2529 (0.3269) loss 7.4169 (6.1397) grad_norm 1.6897 (1.7389) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:55:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][540/625] eta 0:00:27 lr 0.001317 wd 0.0500 time 0.2574 (0.3280) data time 0.0008 (0.0100) model time 0.2566 (0.3180) loss 4.6930 (6.0954) grad_norm 1.9003 (1.7907) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:55:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][550/625] eta 0:00:23 lr 0.001317 wd 0.0500 time 0.2569 (0.3199) data time 0.0007 (0.0090) model time 0.2561 (0.3109) loss 6.7615 (6.0952) grad_norm 1.8206 (1.7829) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:55:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][560/625] eta 0:00:20 lr 0.001317 wd 0.0500 time 0.2530 (0.3137) data time 0.0008 (0.0083) model time 0.2522 (0.3054) loss 6.6882 (6.1355) grad_norm 2.1549 (1.7662) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:55:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][570/625] eta 0:00:16 lr 0.001317 wd 0.0500 time 0.2523 (0.3084) data time 0.0007 (0.0076) model time 0.2516 (0.3008) loss 5.3840 (6.1171) grad_norm 1.4778 (1.7428) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:55:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][580/625] eta 0:00:13 lr 0.001317 wd 0.0500 time 0.2577 (0.3041) data time 0.0009 (0.0071) model time 0.2568 (0.2970) loss 6.3720 (6.1315) grad_norm 1.7392 (1.7243) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:55:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][590/625] eta 0:00:10 lr 0.001317 wd 0.0500 time 0.2485 (0.3003) data time 0.0009 (0.0066) model time 0.2477 (0.2937) loss 6.6958 (6.1475) grad_norm 1.7821 (1.7151) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:55:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][600/625] eta 0:00:07 lr 0.001316 wd 0.0500 time 0.2530 (0.2971) data time 0.0011 (0.0062) model time 0.2519 (0.2909) loss 7.0632 (6.1348) grad_norm 2.4957 (1.7548) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:55:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][610/625] eta 0:00:04 lr 0.001316 wd 0.0500 time 0.2511 (0.2944) data time 0.0004 (0.0059) model time 0.2507 (0.2885) loss 5.8168 (6.1171) grad_norm 2.0226 (1.7926) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:55:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][620/625] eta 0:00:01 lr 0.001316 wd 0.0500 time 0.2528 (0.2918) data time 0.0004 (0.0056) model time 0.2524 (0.2861) loss 5.0166 (6.1207) grad_norm 1.7875 (1.8039) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:56:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 131 training takes 0:00:48
[2024-07-31 10:56:00 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-31 10:56:02 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-31 10:56:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.393 (0.393) Loss 0.6660 (0.6660) Acc@1 87.451 (87.451) Acc@5 98.193 (98.193) Mem 9656MB
[2024-07-31 10:56:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.087) Loss 1.0527 (0.8185) Acc@1 77.197 (83.376) Acc@5 94.971 (96.893) Mem 9656MB
[2024-07-31 10:56:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.072) Loss 1.2021 (0.9670) Acc@1 73.242 (79.648) Acc@5 92.139 (95.180) Mem 9656MB
[2024-07-31 10:56:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.303 Acc@5 95.114
[2024-07-31 10:56:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.3%
[2024-07-31 10:56:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.703 (0.703) Loss 0.5835 (0.5835) Acc@1 88.623 (88.623) Acc@5 98.633 (98.633) Mem 9656MB
[2024-07-31 10:56:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.121) Loss 0.9355 (0.7250) Acc@1 79.150 (84.899) Acc@5 95.166 (97.283) Mem 9656MB
[2024-07-31 10:56:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.089) Loss 1.0791 (0.8607) Acc@1 73.730 (81.224) Acc@5 93.848 (95.738) Mem 9656MB
[2024-07-31 10:56:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.880 Acc@5 95.707
[2024-07-31 10:56:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.9%
[2024-07-31 10:56:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.88%
[2024-07-31 10:56:07 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-07-31 10:56:08 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-07-31 10:56:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][0/625] eta 0:07:56 lr 0.001316 wd 0.0500 time 0.7618 (0.7618) data time 0.4532 (0.4532) model time 0.0000 (0.0000) loss 6.5991 (6.5991) grad_norm 3.2256 (3.2256) loss_scale 2048.0000 (2048.0000) mem 9652MB
[2024-07-31 10:56:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][10/625] eta 0:03:05 lr 0.001316 wd 0.0500 time 0.2525 (0.3012) data time 0.0008 (0.0421) model time 0.0000 (0.0000) loss 5.3480 (5.8470) grad_norm 1.9713 (2.4043) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:56:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][20/625] eta 0:02:48 lr 0.001316 wd 0.0500 time 0.2556 (0.2790) data time 0.0006 (0.0226) model time 0.0000 (0.0000) loss 6.7616 (5.9495) grad_norm 1.9314 (2.1046) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:56:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][30/625] eta 0:02:41 lr 0.001315 wd 0.0500 time 0.2535 (0.2708) data time 0.0011 (0.0156) model time 0.0000 (0.0000) loss 5.1250 (5.8676) grad_norm 1.4561 (2.1120) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:56:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][40/625] eta 0:02:36 lr 0.001315 wd 0.0500 time 0.2611 (0.2671) data time 0.0010 (0.0121) model time 0.0000 (0.0000) loss 6.9983 (5.8399) grad_norm 3.0053 (2.1170) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:56:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][50/625] eta 0:02:32 lr 0.001315 wd 0.0500 time 0.2522 (0.2645) data time 0.0013 (0.0099) model time 0.0000 (0.0000) loss 5.7939 (5.8473) grad_norm 1.0630 (2.0451) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:56:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][60/625] eta 0:02:28 lr 0.001315 wd 0.0500 time 0.2588 (0.2630) data time 0.0011 (0.0085) model time 0.2577 (0.2541) loss 6.9162 (5.9077) grad_norm 1.6840 (2.0345) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:56:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][70/625] eta 0:02:25 lr 0.001315 wd 0.0500 time 0.2533 (0.2618) data time 0.0007 (0.0074) model time 0.2526 (0.2540) loss 5.8095 (5.8813) grad_norm 1.4505 (2.0139) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:56:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][80/625] eta 0:02:22 lr 0.001315 wd 0.0500 time 0.2565 (0.2611) data time 0.0011 (0.0066) model time 0.2554 (0.2542) loss 5.0964 (5.8841) grad_norm 2.1253 (2.0467) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:56:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][90/625] eta 0:02:19 lr 0.001314 wd 0.0500 time 0.2526 (0.2605) data time 0.0009 (0.0060) model time 0.2517 (0.2544) loss 6.5188 (5.8721) grad_norm 1.6767 (2.0252) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:56:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][100/625] eta 0:02:16 lr 0.001314 wd 0.0500 time 0.2523 (0.2601) data time 0.0008 (0.0055) model time 0.2516 (0.2546) loss 6.4009 (5.8584) grad_norm 1.4846 (1.9730) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:56:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][110/625] eta 0:02:13 lr 0.001314 wd 0.0500 time 0.2532 (0.2598) data time 0.0013 (0.0051) model time 0.2520 (0.2547) loss 5.2688 (5.8846) grad_norm 1.4354 (1.9589) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:56:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][120/625] eta 0:02:10 lr 0.001314 wd 0.0500 time 0.2493 (0.2593) data time 0.0007 (0.0048) model time 0.2486 (0.2544) loss 6.8470 (5.9129) grad_norm 1.6889 (1.9793) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:56:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][130/625] eta 0:02:08 lr 0.001314 wd 0.0500 time 0.2577 (0.2589) data time 0.0020 (0.0045) model time 0.2557 (0.2543) loss 4.9230 (5.8741) grad_norm 1.6940 (1.9675) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:56:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][140/625] eta 0:02:05 lr 0.001314 wd 0.0500 time 0.2511 (0.2588) data time 0.0007 (0.0042) model time 0.2503 (0.2545) loss 4.5490 (5.8673) grad_norm 2.7187 (1.9620) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:56:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][150/625] eta 0:02:02 lr 0.001313 wd 0.0500 time 0.2544 (0.2585) data time 0.0008 (0.0040) model time 0.2536 (0.2544) loss 5.6369 (5.9048) grad_norm 2.1269 (1.9539) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:56:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][160/625] eta 0:02:00 lr 0.001313 wd 0.0500 time 0.2537 (0.2583) data time 0.0010 (0.0039) model time 0.2528 (0.2544) loss 6.2685 (5.9225) grad_norm 1.8619 (1.9328) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:56:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][170/625] eta 0:01:57 lr 0.001313 wd 0.0500 time 0.2522 (0.2581) data time 0.0010 (0.0037) model time 0.2512 (0.2543) loss 6.2033 (5.9156) grad_norm 2.0920 (1.9398) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:56:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][180/625] eta 0:01:54 lr 0.001313 wd 0.0500 time 0.2514 (0.2579) data time 0.0009 (0.0035) model time 0.2505 (0.2543) loss 5.7721 (5.9226) grad_norm 2.0370 (1.9401) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:56:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][190/625] eta 0:01:52 lr 0.001313 wd 0.0500 time 0.2572 (0.2578) data time 0.0009 (0.0034) model time 0.2563 (0.2543) loss 6.4813 (5.9325) grad_norm 2.1103 (1.9539) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:57:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][200/625] eta 0:01:49 lr 0.001313 wd 0.0500 time 0.2528 (0.2576) data time 0.0008 (0.0033) model time 0.2520 (0.2543) loss 5.3787 (5.9249) grad_norm 2.0306 (1.9627) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-07-31 10:57:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-31 10:57:03 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-31 10:57:03 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-31 10:59:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-31 10:59:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-31 10:59:03 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-31 10:59:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-07-31 10:59:21 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-07-31 10:59:21 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-07-31 10:59:21 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-31 10:59:21 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 132)
[2024-07-31 10:59:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-31 10:59:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][210/625] eta 0:46:03 lr 0.001312 wd 0.0500 time 6.6588 (6.6588) data time 0.9557 (0.9557) model time 5.7031 (5.7031) loss 6.6736 (6.6736) grad_norm 1.3066 (1.3066) loss_scale 2048.0000 (2048.0000) mem 10976MB
[2024-07-31 10:59:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][220/625] eta 0:05:57 lr 0.001312 wd 0.0500 time 0.2574 (0.8822) data time 0.0009 (0.0877) model time 0.2565 (0.7944) loss 4.9251 (6.5179) grad_norm 1.6078 (1.7502) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:59:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][230/625] eta 0:03:50 lr 0.001312 wd 0.0500 time 0.2532 (0.5827) data time 0.0008 (0.0464) model time 0.2523 (0.5363) loss 6.3777 (6.3209) grad_norm 1.6197 (1.7312) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:59:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][240/625] eta 0:03:03 lr 0.001312 wd 0.0500 time 0.2510 (0.4766) data time 0.0007 (0.0317) model time 0.2503 (0.4448) loss 5.2438 (6.3634) grad_norm 2.4574 (1.8132) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:59:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][250/625] eta 0:02:38 lr 0.001312 wd 0.0500 time 0.2488 (0.4221) data time 0.0010 (0.0242) model time 0.2477 (0.3978) loss 6.0855 (6.2694) grad_norm 1.7218 (1.9330) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:59:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][260/625] eta 0:02:22 lr 0.001312 wd 0.0500 time 0.2551 (0.3892) data time 0.0008 (0.0197) model time 0.2543 (0.3695) loss 6.4518 (6.2203) grad_norm 1.5646 (1.9323) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:59:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][270/625] eta 0:02:10 lr 0.001311 wd 0.0500 time 0.2547 (0.3671) data time 0.0012 (0.0166) model time 0.2535 (0.3505) loss 5.1797 (6.1268) grad_norm 1.8129 (1.9296) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:59:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][280/625] eta 0:02:01 lr 0.001311 wd 0.0500 time 0.2564 (0.3513) data time 0.0008 (0.0144) model time 0.2555 (0.3369) loss 6.1177 (6.1074) grad_norm 1.7976 (1.9022) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-07-31 10:59:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-31 10:59:51 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-07-31 10:59:52 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-07-31 11:31:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-07-31 11:31:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-07-31 11:31:07 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-08-04 03:47:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/config.json
[2024-08-04 03:47:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_tiny_e300
[2024-08-04 03:48:05 vssd_mesa_retrain_tiny_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-08-04 03:48:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth
[2024-08-04 03:48:19 vssd_mesa_retrain_tiny_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth....................
[2024-08-04 03:48:19 vssd_mesa_retrain_tiny_e300] (utils.py 30): INFO resuming model:
[2024-08-04 03:48:19 vssd_mesa_retrain_tiny_e300] (utils.py 37): INFO resuming model_ema:
[2024-08-04 03:48:19 vssd_mesa_retrain_tiny_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth' (epoch 132)
[2024-08-04 03:48:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-08-04 03:48:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][290/625] eta 0:07:45 lr 0.001311 wd 0.0500 time 0.2520 (1.3901) data time 0.0008 (0.1083) model time 0.2513 (1.2818) loss 6.5444 (6.6934) grad_norm 1.3858 (1.7882) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:48:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][300/625] eta 0:03:41 lr 0.001311 wd 0.0500 time 0.2546 (0.6804) data time 0.0008 (0.0412) model time 0.2538 (0.6391) loss 6.9162 (6.4294) grad_norm 1.6253 (1.5094) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:48:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][310/625] eta 0:02:42 lr 0.001311 wd 0.0500 time 0.2718 (0.5167) data time 0.0006 (0.0257) model time 0.2712 (0.4910) loss 5.6558 (6.4286) grad_norm 0.9937 (1.5683) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:48:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][320/625] eta 0:02:15 lr 0.001311 wd 0.0500 time 0.2674 (0.4440) data time 0.0010 (0.0188) model time 0.2664 (0.4251) loss 6.0529 (6.4349) grad_norm 1.9115 (1.6451) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:48:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][330/625] eta 0:01:58 lr 0.001310 wd 0.0500 time 0.2559 (0.4027) data time 0.0009 (0.0150) model time 0.2550 (0.3878) loss 5.6269 (6.3573) grad_norm 3.1958 (1.7444) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:48:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][340/625] eta 0:01:47 lr 0.001310 wd 0.0500 time 0.2512 (0.3763) data time 0.0008 (0.0125) model time 0.2504 (0.3639) loss 6.1166 (6.2936) grad_norm 2.4123 (1.8938) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:48:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][350/625] eta 0:01:38 lr 0.001310 wd 0.0500 time 0.2503 (0.3577) data time 0.0008 (0.0107) model time 0.2495 (0.3470) loss 5.3613 (6.2575) grad_norm 2.6648 (1.9004) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:48:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][360/625] eta 0:01:31 lr 0.001310 wd 0.0500 time 0.2516 (0.3441) data time 0.0010 (0.0094) model time 0.2506 (0.3347) loss 6.0173 (6.1938) grad_norm 2.0874 (1.9033) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:48:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][370/625] eta 0:01:25 lr 0.001310 wd 0.0500 time 0.2546 (0.3337) data time 0.0007 (0.0084) model time 0.2540 (0.3253) loss 5.6153 (6.1530) grad_norm 1.3968 (1.9022) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:48:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][380/625] eta 0:01:19 lr 0.001309 wd 0.0500 time 0.2528 (0.3255) data time 0.0007 (0.0077) model time 0.2521 (0.3178) loss 6.1369 (6.1326) grad_norm 1.3604 (1.9347) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:48:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][390/625] eta 0:01:14 lr 0.001309 wd 0.0500 time 0.2584 (0.3187) data time 0.0008 (0.0070) model time 0.2576 (0.3117) loss 7.1524 (6.1843) grad_norm 1.4470 (1.9138) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:48:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][400/625] eta 0:01:10 lr 0.001309 wd 0.0500 time 0.2621 (0.3132) data time 0.0006 (0.0065) model time 0.2615 (0.3067) loss 7.1396 (6.1541) grad_norm 1.4131 (1.8709) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:49:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][410/625] eta 0:01:06 lr 0.001309 wd 0.0500 time 0.2544 (0.3086) data time 0.0008 (0.0060) model time 0.2536 (0.3025) loss 4.7281 (6.1318) grad_norm 2.0534 (1.8628) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:49:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][420/625] eta 0:01:02 lr 0.001309 wd 0.0500 time 0.2576 (0.3047) data time 0.0008 (0.0057) model time 0.2568 (0.2991) loss 5.4423 (6.1407) grad_norm 2.4226 (1.9314) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:49:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][430/625] eta 0:00:58 lr 0.001309 wd 0.0500 time 0.2557 (0.3013) data time 0.0006 (0.0053) model time 0.2551 (0.2959) loss 5.2873 (6.1202) grad_norm 1.1574 (1.9052) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:49:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][440/625] eta 0:00:55 lr 0.001308 wd 0.0500 time 0.2564 (0.2983) data time 0.0008 (0.0051) model time 0.2557 (0.2932) loss 6.7139 (6.1310) grad_norm 2.0141 (1.8961) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:49:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][450/625] eta 0:00:51 lr 0.001308 wd 0.0500 time 0.2592 (0.2957) data time 0.0009 (0.0048) model time 0.2582 (0.2909) loss 6.6371 (6.1372) grad_norm 1.0580 (1.8802) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:49:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][460/625] eta 0:00:48 lr 0.001308 wd 0.0500 time 0.2557 (0.2935) data time 0.0006 (0.0046) model time 0.2551 (0.2889) loss 5.9283 (6.1239) grad_norm 1.0521 (1.8690) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:49:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][470/625] eta 0:00:45 lr 0.001308 wd 0.0500 time 0.2575 (0.2914) data time 0.0008 (0.0044) model time 0.2567 (0.2870) loss 5.2578 (6.1133) grad_norm 2.7807 (1.8625) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:49:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][480/625] eta 0:00:42 lr 0.001308 wd 0.0500 time 0.2575 (0.2897) data time 0.0011 (0.0042) model time 0.2565 (0.2854) loss 5.1560 (6.0965) grad_norm 1.6029 (1.8579) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:49:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][490/625] eta 0:00:38 lr 0.001308 wd 0.0500 time 0.2588 (0.2880) data time 0.0011 (0.0041) model time 0.2578 (0.2839) loss 5.8261 (6.0841) grad_norm 2.8027 (1.8536) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:49:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][500/625] eta 0:00:35 lr 0.001307 wd 0.0500 time 0.2559 (0.2865) data time 0.0008 (0.0039) model time 0.2552 (0.2826) loss 4.9320 (6.0668) grad_norm 1.2906 (1.8713) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:49:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][510/625] eta 0:00:32 lr 0.001307 wd 0.0500 time 0.2546 (0.2852) data time 0.0006 (0.0038) model time 0.2540 (0.2814) loss 5.7449 (6.0804) grad_norm 2.4971 (1.8671) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:49:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][520/625] eta 0:00:29 lr 0.001307 wd 0.0500 time 0.2614 (0.2840) data time 0.0006 (0.0037) model time 0.2608 (0.2803) loss 6.9184 (6.0678) grad_norm 1.4378 (1.8630) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:49:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][530/625] eta 0:00:26 lr 0.001307 wd 0.0500 time 0.2536 (0.2828) data time 0.0013 (0.0036) model time 0.2523 (0.2792) loss 4.9189 (6.0664) grad_norm 2.1686 (1.8606) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:49:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][540/625] eta 0:00:23 lr 0.001307 wd 0.0500 time 0.2549 (0.2818) data time 0.0009 (0.0035) model time 0.2540 (0.2783) loss 4.6998 (6.0481) grad_norm 1.4788 (1.8635) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:49:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][550/625] eta 0:00:21 lr 0.001307 wd 0.0500 time 0.2518 (0.2808) data time 0.0008 (0.0034) model time 0.2510 (0.2774) loss 6.1941 (6.0419) grad_norm 1.9723 (1.8557) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:49:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][560/625] eta 0:00:18 lr 0.001306 wd 0.0500 time 0.2549 (0.2799) data time 0.0010 (0.0033) model time 0.2539 (0.2766) loss 6.9715 (6.0415) grad_norm 1.4879 (1.8525) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:49:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][570/625] eta 0:00:15 lr 0.001306 wd 0.0500 time 0.2564 (0.2791) data time 0.0009 (0.0032) model time 0.2555 (0.2759) loss 5.1118 (6.0385) grad_norm 1.2297 (1.8400) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:49:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][580/625] eta 0:00:12 lr 0.001306 wd 0.0500 time 0.2590 (0.2783) data time 0.0007 (0.0031) model time 0.2583 (0.2751) loss 4.4334 (6.0256) grad_norm 1.3372 (1.8361) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:49:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][590/625] eta 0:00:09 lr 0.001306 wd 0.0500 time 0.2501 (0.2775) data time 0.0010 (0.0031) model time 0.2491 (0.2744) loss 4.1954 (6.0132) grad_norm 2.5617 (1.8306) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:49:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][600/625] eta 0:00:06 lr 0.001306 wd 0.0500 time 0.2551 (0.2768) data time 0.0009 (0.0030) model time 0.2543 (0.2738) loss 6.8656 (6.0277) grad_norm 1.4757 (1.8267) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:49:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][610/625] eta 0:00:04 lr 0.001306 wd 0.0500 time 0.2547 (0.2762) data time 0.0006 (0.0030) model time 0.2541 (0.2732) loss 6.6849 (6.0396) grad_norm 1.1813 (1.8306) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:49:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][620/625] eta 0:00:01 lr 0.001305 wd 0.0500 time 0.2517 (0.2755) data time 0.0006 (0.0029) model time 0.2511 (0.2726) loss 5.6093 (6.0323) grad_norm 2.2609 (1.8229) loss_scale 2048.0000 (2048.0000) mem 9656MB
[2024-08-04 03:49:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 132 training takes 0:01:33
[2024-08-04 03:49:56 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 03:49:58 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 03:49:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.395 (0.395) Loss 0.6621 (0.6621) Acc@1 86.865 (86.865) Acc@5 97.803 (97.803) Mem 9656MB
[2024-08-04 03:49:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.088) Loss 1.0176 (0.8046) Acc@1 77.002 (83.425) Acc@5 94.531 (96.817) Mem 9656MB
[2024-08-04 03:49:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.072) Loss 1.2070 (0.9558) Acc@1 72.510 (79.669) Acc@5 93.115 (95.168) Mem 9656MB
[2024-08-04 03:50:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.385 Acc@5 95.146
[2024-08-04 03:50:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.4%
[2024-08-04 03:50:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.784 (0.784) Loss 0.5835 (0.5835) Acc@1 88.623 (88.623) Acc@5 98.633 (98.633) Mem 9656MB
[2024-08-04 03:50:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.126) Loss 0.9360 (0.7252) Acc@1 79.053 (84.908) Acc@5 95.166 (97.301) Mem 9656MB
[2024-08-04 03:50:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0801 (0.8607) Acc@1 73.877 (81.238) Acc@5 93.848 (95.747) Mem 9656MB
[2024-08-04 03:50:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.888 Acc@5 95.721
[2024-08-04 03:50:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.9%
[2024-08-04 03:50:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.89%
[2024-08-04 03:50:03 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 03:50:03 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 03:50:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][0/625] eta 0:09:25 lr 0.001305 wd 0.0500 time 0.9046 (0.9046) data time 0.4059 (0.4059) model time 0.0000 (0.0000) loss 6.2311 (6.2311) grad_norm 1.6556 (1.6556) loss_scale 2048.0000 (2048.0000) mem 9651MB
[2024-08-04 03:50:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][10/625] eta 0:03:13 lr 0.001305 wd 0.0500 time 0.2541 (0.3145) data time 0.0008 (0.0377) model time 0.0000 (0.0000) loss 6.6336 (6.1338) grad_norm 1.5265 (1.6102) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:50:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][20/625] eta 0:02:53 lr 0.001305 wd 0.0500 time 0.2528 (0.2861) data time 0.0009 (0.0202) model time 0.0000 (0.0000) loss 5.4774 (6.1053) grad_norm 1.5710 (1.9914) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:50:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][30/625] eta 0:02:44 lr 0.001305 wd 0.0500 time 0.2591 (0.2764) data time 0.0008 (0.0139) model time 0.0000 (0.0000) loss 5.0481 (6.0489) grad_norm 1.7165 (1.9345) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:50:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][40/625] eta 0:02:38 lr 0.001305 wd 0.0500 time 0.2593 (0.2713) data time 0.0006 (0.0108) model time 0.0000 (0.0000) loss 4.6736 (6.0142) grad_norm 1.9863 (1.9529) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:50:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][50/625] eta 0:02:34 lr 0.001304 wd 0.0500 time 0.2580 (0.2681) data time 0.0014 (0.0088) model time 0.0000 (0.0000) loss 6.7791 (5.9961) grad_norm 2.8747 (2.0143) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:50:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][60/625] eta 0:02:30 lr 0.001304 wd 0.0500 time 0.2512 (0.2659) data time 0.0009 (0.0076) model time 0.2503 (0.2538) loss 6.7997 (6.0202) grad_norm 2.8010 (2.0157) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:50:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][70/625] eta 0:02:26 lr 0.001304 wd 0.0500 time 0.2547 (0.2645) data time 0.0008 (0.0066) model time 0.2539 (0.2542) loss 6.0885 (6.0370) grad_norm 1.2620 (1.9870) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:50:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][80/625] eta 0:02:23 lr 0.001304 wd 0.0500 time 0.2568 (0.2634) data time 0.0007 (0.0059) model time 0.2561 (0.2546) loss 6.7031 (6.0394) grad_norm 3.2349 (2.0269) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:50:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][90/625] eta 0:02:20 lr 0.001304 wd 0.0500 time 0.2540 (0.2625) data time 0.0007 (0.0053) model time 0.2533 (0.2546) loss 5.9445 (6.0589) grad_norm 3.6502 (2.1577) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:50:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][100/625] eta 0:02:17 lr 0.001304 wd 0.0500 time 0.2546 (0.2619) data time 0.0008 (0.0049) model time 0.2538 (0.2546) loss 7.3478 (6.0589) grad_norm 1.3567 (2.1016) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:50:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][110/625] eta 0:02:14 lr 0.001303 wd 0.0500 time 0.2591 (0.2614) data time 0.0008 (0.0045) model time 0.2583 (0.2547) loss 5.7397 (6.0535) grad_norm 1.5602 (2.0461) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:50:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][120/625] eta 0:02:11 lr 0.001303 wd 0.0500 time 0.2558 (0.2608) data time 0.0009 (0.0042) model time 0.2549 (0.2546) loss 6.1969 (6.0255) grad_norm 1.3412 (2.0271) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:50:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][130/625] eta 0:02:08 lr 0.001303 wd 0.0500 time 0.2522 (0.2604) data time 0.0008 (0.0040) model time 0.2513 (0.2546) loss 5.7923 (5.9817) grad_norm 1.2109 (2.0294) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:50:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][140/625] eta 0:02:06 lr 0.001303 wd 0.0500 time 0.2599 (0.2602) data time 0.0008 (0.0038) model time 0.2591 (0.2548) loss 6.1514 (5.9681) grad_norm 1.3015 (2.0205) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:50:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][150/625] eta 0:02:03 lr 0.001303 wd 0.0500 time 0.2550 (0.2599) data time 0.0010 (0.0036) model time 0.2540 (0.2548) loss 6.6212 (5.9906) grad_norm 1.4333 (1.9996) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:50:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][160/625] eta 0:02:00 lr 0.001303 wd 0.0500 time 0.2555 (0.2597) data time 0.0011 (0.0034) model time 0.2544 (0.2549) loss 6.0090 (5.9984) grad_norm 2.1097 (1.9942) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:50:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][170/625] eta 0:01:58 lr 0.001302 wd 0.0500 time 0.2561 (0.2595) data time 0.0007 (0.0033) model time 0.2555 (0.2550) loss 6.5668 (6.0016) grad_norm 1.8690 (2.0235) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:50:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][180/625] eta 0:01:55 lr 0.001302 wd 0.0500 time 0.2535 (0.2595) data time 0.0008 (0.0032) model time 0.2528 (0.2551) loss 5.4849 (6.0102) grad_norm 1.8050 (2.0194) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:50:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][190/625] eta 0:01:52 lr 0.001302 wd 0.0500 time 0.2539 (0.2593) data time 0.0009 (0.0030) model time 0.2530 (0.2551) loss 5.3378 (5.9908) grad_norm 1.1327 (2.0034) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:50:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][200/625] eta 0:01:50 lr 0.001302 wd 0.0500 time 0.5131 (0.2604) data time 0.0007 (0.0029) model time 0.5123 (0.2568) loss 6.7544 (5.9848) grad_norm 1.2355 (1.9796) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:50:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][210/625] eta 0:01:48 lr 0.001302 wd 0.0500 time 0.2576 (0.2602) data time 0.0007 (0.0028) model time 0.2569 (0.2568) loss 6.7651 (5.9829) grad_norm 1.7475 (1.9678) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:51:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][220/625] eta 0:01:45 lr 0.001302 wd 0.0500 time 0.2580 (0.2601) data time 0.0010 (0.0028) model time 0.2570 (0.2568) loss 5.7407 (5.9942) grad_norm 4.5902 (1.9890) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:51:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][230/625] eta 0:01:42 lr 0.001301 wd 0.0500 time 0.2540 (0.2599) data time 0.0010 (0.0027) model time 0.2530 (0.2566) loss 5.9945 (5.9980) grad_norm 1.4575 (1.9980) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:51:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][240/625] eta 0:01:39 lr 0.001301 wd 0.0500 time 0.2557 (0.2597) data time 0.0007 (0.0026) model time 0.2550 (0.2565) loss 7.2109 (6.0026) grad_norm 2.2035 (1.9906) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:51:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][250/625] eta 0:01:37 lr 0.001301 wd 0.0500 time 0.2572 (0.2595) data time 0.0009 (0.0026) model time 0.2563 (0.2564) loss 6.3328 (6.0093) grad_norm 1.9764 (1.9786) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:51:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][260/625] eta 0:01:34 lr 0.001301 wd 0.0500 time 0.2577 (0.2594) data time 0.0011 (0.0025) model time 0.2567 (0.2563) loss 4.6509 (6.0053) grad_norm 1.9861 (1.9646) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:51:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][270/625] eta 0:01:32 lr 0.001301 wd 0.0500 time 0.2545 (0.2593) data time 0.0010 (0.0025) model time 0.2535 (0.2563) loss 6.3643 (6.0083) grad_norm 2.3895 (1.9549) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:51:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][280/625] eta 0:01:29 lr 0.001301 wd 0.0500 time 0.2508 (0.2592) data time 0.0010 (0.0024) model time 0.2498 (0.2562) loss 6.9771 (6.0157) grad_norm 1.6631 (1.9414) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:51:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][290/625] eta 0:01:26 lr 0.001300 wd 0.0500 time 0.2525 (0.2591) data time 0.0010 (0.0024) model time 0.2515 (0.2562) loss 6.7261 (6.0209) grad_norm 1.5435 (1.9275) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 03:51:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][300/625] eta 0:01:24 lr 0.001300 wd 0.0500 time 0.2577 (0.2590) data time 0.0007 (0.0023) model time 0.2570 (0.2561) loss 4.5527 (6.0131) grad_norm 2.6236 (inf) loss_scale 1024.0000 (2020.7841) mem 9655MB
[2024-08-04 03:51:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][310/625] eta 0:01:21 lr 0.001300 wd 0.0500 time 0.2551 (0.2588) data time 0.0008 (0.0023) model time 0.2543 (0.2561) loss 7.4621 (6.0104) grad_norm 2.4218 (inf) loss_scale 1024.0000 (1988.7331) mem 9655MB
[2024-08-04 03:51:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][320/625] eta 0:01:18 lr 0.001300 wd 0.0500 time 0.2489 (0.2588) data time 0.0010 (0.0022) model time 0.2479 (0.2561) loss 4.6053 (5.9963) grad_norm 2.1821 (inf) loss_scale 1024.0000 (1958.6791) mem 9655MB
[2024-08-04 03:51:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][330/625] eta 0:01:16 lr 0.001300 wd 0.0500 time 0.2570 (0.2587) data time 0.0007 (0.0022) model time 0.2563 (0.2560) loss 6.8960 (6.0066) grad_norm 2.0679 (inf) loss_scale 1024.0000 (1930.4411) mem 9655MB
[2024-08-04 03:51:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][340/625] eta 0:01:13 lr 0.001300 wd 0.0500 time 0.2556 (0.2587) data time 0.0008 (0.0021) model time 0.2548 (0.2560) loss 6.9206 (6.0054) grad_norm 2.0513 (inf) loss_scale 1024.0000 (1903.8592) mem 9655MB
[2024-08-04 03:51:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][350/625] eta 0:01:11 lr 0.001299 wd 0.0500 time 0.2549 (0.2586) data time 0.0010 (0.0021) model time 0.2538 (0.2560) loss 5.3210 (5.9930) grad_norm 2.2952 (inf) loss_scale 1024.0000 (1878.7920) mem 9655MB
[2024-08-04 03:51:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][360/625] eta 0:01:08 lr 0.001299 wd 0.0500 time 0.2550 (0.2585) data time 0.0008 (0.0021) model time 0.2541 (0.2560) loss 6.5533 (5.9902) grad_norm 3.0302 (inf) loss_scale 1024.0000 (1855.1136) mem 9655MB
[2024-08-04 03:51:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][370/625] eta 0:01:05 lr 0.001299 wd 0.0500 time 0.2552 (0.2585) data time 0.0008 (0.0021) model time 0.2544 (0.2560) loss 4.4178 (5.9870) grad_norm 1.5559 (inf) loss_scale 1024.0000 (1832.7116) mem 9655MB
[2024-08-04 03:51:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][380/625] eta 0:01:03 lr 0.001299 wd 0.0500 time 0.2587 (0.2584) data time 0.0009 (0.0020) model time 0.2579 (0.2560) loss 6.7496 (5.9800) grad_norm 1.5273 (inf) loss_scale 1024.0000 (1811.4856) mem 9655MB
[2024-08-04 03:51:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][390/625] eta 0:01:00 lr 0.001299 wd 0.0500 time 0.2620 (0.2584) data time 0.0011 (0.0020) model time 0.2608 (0.2560) loss 6.0536 (5.9823) grad_norm 1.8607 (inf) loss_scale 1024.0000 (1791.3453) mem 9655MB
[2024-08-04 03:51:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][400/625] eta 0:00:58 lr 0.001299 wd 0.0500 time 0.2560 (0.2583) data time 0.0009 (0.0020) model time 0.2551 (0.2559) loss 7.0868 (5.9983) grad_norm 1.8337 (inf) loss_scale 1024.0000 (1772.2095) mem 9655MB
[2024-08-04 03:51:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][410/625] eta 0:00:55 lr 0.001298 wd 0.0500 time 0.2575 (0.2583) data time 0.0008 (0.0020) model time 0.2567 (0.2559) loss 5.9872 (5.9998) grad_norm 1.3408 (inf) loss_scale 1024.0000 (1754.0049) mem 9655MB
[2024-08-04 03:51:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][420/625] eta 0:00:52 lr 0.001298 wd 0.0500 time 0.2679 (0.2583) data time 0.0008 (0.0019) model time 0.2671 (0.2559) loss 6.4360 (6.0012) grad_norm 1.9709 (inf) loss_scale 1024.0000 (1736.6651) mem 9655MB
[2024-08-04 03:51:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][430/625] eta 0:00:50 lr 0.001298 wd 0.0500 time 0.2578 (0.2582) data time 0.0006 (0.0019) model time 0.2572 (0.2559) loss 6.6623 (6.0065) grad_norm 2.7975 (inf) loss_scale 1024.0000 (1720.1299) mem 9655MB
[2024-08-04 03:51:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][440/625] eta 0:00:47 lr 0.001298 wd 0.0500 time 0.2536 (0.2581) data time 0.0007 (0.0019) model time 0.2529 (0.2559) loss 5.9358 (6.0099) grad_norm 2.1927 (inf) loss_scale 1024.0000 (1704.3447) mem 9655MB
[2024-08-04 03:52:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][450/625] eta 0:00:45 lr 0.001298 wd 0.0500 time 0.2536 (0.2581) data time 0.0010 (0.0019) model time 0.2526 (0.2558) loss 5.2627 (6.0009) grad_norm 1.3149 (inf) loss_scale 1024.0000 (1689.2594) mem 9655MB
[2024-08-04 03:52:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][460/625] eta 0:00:42 lr 0.001298 wd 0.0500 time 0.2575 (0.2581) data time 0.0016 (0.0018) model time 0.2559 (0.2558) loss 5.8737 (6.0001) grad_norm 1.4675 (inf) loss_scale 1024.0000 (1674.8286) mem 9655MB
[2024-08-04 03:52:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][470/625] eta 0:00:39 lr 0.001297 wd 0.0500 time 0.2561 (0.2580) data time 0.0007 (0.0018) model time 0.2554 (0.2558) loss 4.8702 (5.9827) grad_norm 1.8935 (inf) loss_scale 1024.0000 (1661.0106) mem 9655MB
[2024-08-04 03:52:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][480/625] eta 0:00:37 lr 0.001297 wd 0.0500 time 0.2561 (0.2580) data time 0.0007 (0.0018) model time 0.2554 (0.2558) loss 6.4575 (5.9891) grad_norm 2.3765 (inf) loss_scale 1024.0000 (1647.7672) mem 9655MB
[2024-08-04 03:52:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][490/625] eta 0:00:34 lr 0.001297 wd 0.0500 time 0.2523 (0.2579) data time 0.0009 (0.0018) model time 0.2513 (0.2558) loss 5.2439 (5.9787) grad_norm 3.3116 (inf) loss_scale 1024.0000 (1635.0631) mem 9655MB
[2024-08-04 03:52:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][500/625] eta 0:00:32 lr 0.001297 wd 0.0500 time 0.2694 (0.2579) data time 0.0007 (0.0018) model time 0.2687 (0.2558) loss 5.7994 (5.9770) grad_norm 2.5545 (inf) loss_scale 1024.0000 (1622.8663) mem 9655MB
[2024-08-04 03:52:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][510/625] eta 0:00:29 lr 0.001297 wd 0.0500 time 0.2603 (0.2579) data time 0.0007 (0.0018) model time 0.2596 (0.2558) loss 6.5294 (5.9768) grad_norm 1.7521 (inf) loss_scale 1024.0000 (1611.1468) mem 9655MB
[2024-08-04 03:52:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][520/625] eta 0:00:27 lr 0.001297 wd 0.0500 time 0.2557 (0.2579) data time 0.0011 (0.0017) model time 0.2546 (0.2558) loss 6.9665 (5.9801) grad_norm 2.2117 (inf) loss_scale 1024.0000 (1599.8772) mem 9655MB
[2024-08-04 03:52:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][530/625] eta 0:00:24 lr 0.001296 wd 0.0500 time 0.2607 (0.2578) data time 0.0012 (0.0017) model time 0.2595 (0.2558) loss 5.5801 (5.9834) grad_norm 1.2578 (inf) loss_scale 1024.0000 (1589.0320) mem 9655MB
[2024-08-04 03:52:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][540/625] eta 0:00:21 lr 0.001296 wd 0.0500 time 0.2554 (0.2579) data time 0.0007 (0.0017) model time 0.2547 (0.2558) loss 6.2122 (5.9785) grad_norm 1.5462 (inf) loss_scale 1024.0000 (1578.5878) mem 9655MB
[2024-08-04 03:52:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][550/625] eta 0:00:19 lr 0.001296 wd 0.0500 time 0.2573 (0.2578) data time 0.0008 (0.0017) model time 0.2564 (0.2558) loss 4.9337 (5.9722) grad_norm 2.2296 (inf) loss_scale 1024.0000 (1568.5227) mem 9655MB
[2024-08-04 03:52:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][560/625] eta 0:00:16 lr 0.001296 wd 0.0500 time 0.2542 (0.2578) data time 0.0010 (0.0017) model time 0.2532 (0.2558) loss 6.7230 (5.9707) grad_norm 1.8969 (inf) loss_scale 1024.0000 (1558.8164) mem 9655MB
[2024-08-04 03:52:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][570/625] eta 0:00:14 lr 0.001296 wd 0.0500 time 0.2546 (0.2578) data time 0.0010 (0.0017) model time 0.2535 (0.2558) loss 5.1393 (5.9713) grad_norm 3.5186 (inf) loss_scale 1024.0000 (1549.4501) mem 9655MB
[2024-08-04 03:52:33 vssd_mesa_retrain_tiny_e300]
(main_hfai_mnodes.py 367): INFO Train: [133/300][580/625] eta 0:00:11 lr 0.001295 wd 0.0500 time 0.2530 (0.2578) data time 0.0007 (0.0017) model time 0.2523 (0.2558) loss 6.5532 (5.9733) grad_norm 2.7314 (inf) loss_scale 1024.0000 (1540.4062) mem 9655MB [2024-08-04 03:52:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][590/625] eta 0:00:09 lr 0.001295 wd 0.0500 time 0.2561 (0.2578) data time 0.0012 (0.0017) model time 0.2550 (0.2558) loss 6.7532 (5.9799) grad_norm 1.8547 (inf) loss_scale 1024.0000 (1531.6684) mem 9655MB [2024-08-04 03:52:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][600/625] eta 0:00:06 lr 0.001295 wd 0.0500 time 0.2529 (0.2577) data time 0.0007 (0.0016) model time 0.2522 (0.2558) loss 6.0720 (5.9793) grad_norm 1.7942 (inf) loss_scale 1024.0000 (1523.2213) mem 9655MB [2024-08-04 03:52:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][610/625] eta 0:00:03 lr 0.001295 wd 0.0500 time 0.2518 (0.2577) data time 0.0007 (0.0016) model time 0.2512 (0.2558) loss 4.9712 (5.9676) grad_norm 1.4097 (inf) loss_scale 1024.0000 (1515.0507) mem 9655MB [2024-08-04 03:52:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][620/625] eta 0:00:01 lr 0.001295 wd 0.0500 time 0.2546 (0.2576) data time 0.0005 (0.0016) model time 0.2541 (0.2557) loss 6.1457 (5.9678) grad_norm 2.3622 (inf) loss_scale 1024.0000 (1507.1433) mem 9655MB [2024-08-04 03:52:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 133 training takes 0:02:41 [2024-08-04 03:52:44 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 03:52:45 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 03:52:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.559 (0.559) Loss 0.6489 (0.6489) Acc@1 88.135 (88.135) Acc@5 98.145 (98.145) Mem 9655MB
[2024-08-04 03:52:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.102) Loss 1.0566 (0.8180) Acc@1 77.051 (83.261) Acc@5 94.092 (96.848) Mem 9655MB
[2024-08-04 03:52:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.2031 (0.9574) Acc@1 72.949 (79.771) Acc@5 92.676 (95.106) Mem 9655MB
[2024-08-04 03:52:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.441 Acc@5 95.126
[2024-08-04 03:52:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.4%
[2024-08-04 03:52:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.743 (0.743) Loss 0.5845 (0.5845) Acc@1 88.623 (88.623) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 03:52:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.126) Loss 0.9355 (0.7255) Acc@1 79.150 (84.970) Acc@5 95.264 (97.319) Mem 9655MB
[2024-08-04 03:52:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0801 (0.8606) Acc@1 73.877 (81.292) Acc@5 93.799 (95.733) Mem 9655MB
[2024-08-04 03:52:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.940 Acc@5 95.715
[2024-08-04 03:52:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.9%
[2024-08-04 03:52:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.94%
[2024-08-04 03:52:49 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 03:52:49 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 03:52:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][0/625] eta 0:08:04 lr 0.001295 wd 0.0500 time 0.7752 (0.7752) data time 0.5301 (0.5301) model time 0.0000 (0.0000) loss 6.5480 (6.5480) grad_norm 1.0806 (1.0806) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:52:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][10/625] eta 0:03:06 lr 0.001295 wd 0.0500 time 0.2560 (0.3035) data time 0.0016 (0.0492) model time 0.0000 (0.0000) loss 6.4500 (6.4370) grad_norm 2.1084 (1.7572) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:52:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][20/625] eta 0:02:50 lr 0.001294 wd 0.0500 time 0.2699 (0.2818) data time 0.0010 (0.0263) model time 0.0000 (0.0000) loss 5.4294 (6.1759) grad_norm 1.7651 (1.8607) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:52:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][30/625] eta 0:02:42 lr 0.001294 wd 0.0500 time 0.2540 (0.2736) data time 0.0010 (0.0181) model time 0.0000 (0.0000) loss 5.8890 (6.1951) grad_norm 2.1131 (1.9387) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:53:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][40/625] eta 0:02:37 lr 0.001294 wd 0.0500 time 0.2551 (0.2694) data time 0.0009 (0.0139) model time 0.0000 (0.0000) loss 6.6815 (6.0895) grad_norm 1.4715 (1.8945) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:53:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][50/625] eta 0:02:33 lr 0.001294 wd 0.0500 time 0.2543 (0.2668) data time 0.0008 (0.0113) model time 0.0000 (0.0000) loss 5.1505 (6.1423) grad_norm 1.7515 (1.9973) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:53:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][60/625] eta 0:02:29 lr 0.001294 wd 0.0500 time 0.2688 (0.2652) data time 0.0009 (0.0096) model time 0.2679 (0.2560) loss 6.1073 (6.0442) grad_norm 1.8018 (1.9649) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:53:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][70/625] eta 0:02:26 lr 0.001294 wd 0.0500 time 0.2529 (0.2638) data time 0.0008 (0.0084) model time 0.2520 (0.2554) loss 6.8883 (6.0294) grad_norm 2.1446 (1.8978) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:53:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][80/625] eta 0:02:23 lr 0.001293 wd 0.0500 time 0.2541 (0.2629) data time 0.0009 (0.0075) model time 0.2532 (0.2553) loss 6.3222 (6.0686) grad_norm 2.3962 (1.9475) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:53:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][90/625] eta 0:02:20 lr 0.001293 wd 0.0500 time 0.2513 (0.2621) data time 0.0011 (0.0068) model time 0.2502 (0.2552) loss 5.8833 (6.0616) grad_norm 2.7914 (1.9803) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:53:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][100/625] eta 0:02:17 lr 0.001293 wd 0.0500 time 0.2548 (0.2615) data time 0.0008 (0.0062) model time 0.2540 (0.2552) loss 5.3030 (6.0693) grad_norm 1.1514 (1.9678) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:53:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][110/625] eta 0:02:14 lr 0.001293 wd 0.0500 time 0.2554 (0.2610) data time 0.0010 (0.0057) model time 0.2543 (0.2550) loss 5.4191 (6.0263) grad_norm 2.3614 (2.0287) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:53:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][120/625] eta 0:02:11 lr 0.001293 wd 0.0500 time 0.2566 (0.2605) data time 0.0006 (0.0053) model time 0.2560 (0.2550) loss 6.0184 (6.0452) grad_norm 1.9232 (2.0889) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:53:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][130/625] eta 0:02:08 lr 0.001293 wd 0.0500 time 0.2485 (0.2601) data time 0.0009 (0.0050) model time 0.2477 (0.2549) loss 6.9710 (6.0397) grad_norm 2.7150 (2.1039) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:53:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][140/625] eta 0:02:06 lr 0.001292 wd 0.0500 time 0.2562 (0.2598) data time 0.0013 (0.0047) model time 0.2549 (0.2549) loss 6.9157 (6.0527) grad_norm 2.9946 (2.0970) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:53:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][150/625] eta 0:02:03 lr 0.001292 wd 0.0500 time 0.2540 (0.2595) data time 0.0009 (0.0044) model time 0.2532 (0.2549) loss 6.5856 (6.0604) grad_norm 2.4931 (2.0919) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:53:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][160/625] eta 0:02:00 lr 0.001292 wd 0.0500 time 0.2566 (0.2593) data time 0.0009 (0.0042) model time 0.2557 (0.2549) loss 7.0350 (6.0708) grad_norm 1.5262 (2.0604) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:53:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][170/625] eta 0:01:57 lr 0.001292 wd 0.0500 time 0.2603 (0.2592) data time 0.0008 (0.0040) model time 0.2595 (0.2551) loss 6.5602 (6.0588) grad_norm 1.6373 (2.0326) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:53:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][180/625] eta 0:01:55 lr 0.001292 wd 0.0500 time 0.2577 (0.2590) data time 0.0007 (0.0039) model time 0.2570 (0.2550) loss 5.0033 (6.0653) grad_norm 1.5868 (2.0053) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:53:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][190/625] eta 0:01:52 lr 0.001291 wd 0.0500 time 0.2580 (0.2590) data time 0.0006 (0.0037) model time 0.2574 (0.2552) loss 7.1599 (6.0734) grad_norm 3.1525 (2.0148) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:53:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][200/625] eta 0:01:50 lr 0.001291 wd 0.0500 time 0.2547 (0.2589) data time 0.0007 (0.0036) model time 0.2540 (0.2552) loss 6.9690 (6.0601) grad_norm 3.3803 (2.0573) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:53:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][210/625] eta 0:01:47 lr 0.001291 wd 0.0500 time 0.2703 (0.2588) data time 0.0010 (0.0035) model time 0.2693 (0.2553) loss 7.0240 (6.0831) grad_norm 1.6279 (2.0454) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:53:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][220/625] eta 0:01:44 lr 0.001291 wd 0.0500 time 0.2651 (0.2587) data time 0.0008 (0.0033) model time 0.2643 (0.2554) loss 6.0216 (6.1026) grad_norm 2.5243 (2.0385) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:53:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][230/625] eta 0:01:42 lr 0.001291 wd 0.0500 time 0.2584 (0.2587) data time 0.0010 (0.0032) model time 0.2574 (0.2555) loss 6.4791 (6.0916) grad_norm 1.6759 (2.0370) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:53:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][240/625] eta 0:01:39 lr 0.001291 wd 0.0500 time 0.2558 (0.2586) data time 0.0007 (0.0031) model time 0.2550 (0.2555) loss 5.9255 (6.1047) grad_norm 1.3195 (2.0217) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:53:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][250/625] eta 0:01:36 lr 0.001290 wd 0.0500 time 0.2545 (0.2585) data time 0.0009 (0.0031) model time 0.2536 (0.2555) loss 5.0532 (6.1000) grad_norm 1.5731 (2.0077) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:53:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][260/625] eta 0:01:34 lr 0.001290 wd 0.0500 time 0.2648 (0.2585) data time 0.0007 (0.0030) model time 0.2641 (0.2556) loss 4.8262 (6.0925) grad_norm 2.8029 (2.0134) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:54:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][270/625] eta 0:01:31 lr 0.001290 wd 0.0500 time 0.2684 (0.2586) data time 0.0008 (0.0029) model time 0.2676 (0.2558) loss 4.4750 (6.0874) grad_norm 1.1763 (2.0076) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:54:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][280/625] eta 0:01:29 lr 0.001290 wd 0.0500 time 0.2555 (0.2586) data time 0.0010 (0.0028) model time 0.2545 (0.2558) loss 6.3715 (6.0925) grad_norm 1.6937 (2.0046) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:54:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][290/625] eta 0:01:26 lr 0.001290 wd 0.0500 time 0.2536 (0.2586) data time 0.0008 (0.0028) model time 0.2529 (0.2559) loss 6.6834 (6.0826) grad_norm 1.7928 (1.9978) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:54:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][300/625] eta 0:01:24 lr 0.001290 wd 0.0500 time 0.2534 (0.2586) data time 0.0007 (0.0027) model time 0.2527 (0.2559) loss 6.9073 (6.0926) grad_norm 2.5812 (1.9985) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:54:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][310/625] eta 0:01:21 lr 0.001289 wd 0.0500 time 0.2528 (0.2585) data time 0.0011 (0.0027) model time 0.2517 (0.2559) loss 6.5156 (6.0930) grad_norm 1.4060 (1.9821) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:54:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][320/625] eta 0:01:18 lr 0.001289 wd 0.0500 time 0.2595 (0.2585) data time 0.0007 (0.0026) model time 0.2588 (0.2559) loss 6.7445 (6.0991) grad_norm 2.5211 (1.9752) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:54:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][330/625] eta 0:01:16 lr 0.001289 wd 0.0500 time 0.2514 (0.2584) data time 0.0007 (0.0026) model time 0.2507 (0.2559) loss 5.7526 (6.0998) grad_norm 1.0981 (1.9647) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:54:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][340/625] eta 0:01:13 lr 0.001289 wd 0.0500 time 0.2583 (0.2584) data time 0.0008 (0.0025) model time 0.2574 (0.2560) loss 6.1920 (6.1014) grad_norm 2.0610 (1.9547) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:54:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][350/625] eta 0:01:11 lr 0.001289 wd 0.0500 time 0.2552 (0.2583) data time 0.0010 (0.0025) model time 0.2542 (0.2559) loss 5.3215 (6.0862) grad_norm 2.0519 (1.9482) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:54:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][360/625] eta 0:01:08 lr 0.001289 wd 0.0500 time 0.2557 (0.2583) data time 0.0009 (0.0024) model time 0.2548 (0.2559) loss 6.4022 (6.0785) grad_norm 3.5295 (1.9454) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:54:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][370/625] eta 0:01:05 lr 0.001288 wd 0.0500 time 0.2552 (0.2583) data time 0.0009 (0.0024) model time 0.2543 (0.2560) loss 5.4091 (6.0723) grad_norm 1.2647 (1.9356) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:54:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][380/625] eta 0:01:03 lr 0.001288 wd 0.0500 time 0.2563 (0.2582) data time 0.0009 (0.0024) model time 0.2554 (0.2560) loss 6.3207 (6.0779) grad_norm 2.4322 (1.9299) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:54:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][390/625] eta 0:01:00 lr 0.001288 wd 0.0500 time 0.2520 (0.2582) data time 0.0009 (0.0023) model time 0.2510 (0.2559) loss 6.4917 (6.0790) grad_norm 1.5656 (1.9244) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:54:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][400/625] eta 0:00:58 lr 0.001288 wd 0.0500 time 0.2548 (0.2582) data time 0.0009 (0.0023) model time 0.2539 (0.2560) loss 6.7792 (6.0914) grad_norm 1.1934 (1.9266) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:54:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][410/625] eta 0:00:55 lr 0.001288 wd 0.0500 time 0.2548 (0.2592) data time 0.0010 (0.0023) model time 0.2538 (0.2571) loss 5.7905 (6.0916) grad_norm 1.3140 (1.9273) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:54:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][420/625] eta 0:00:53 lr 0.001288 wd 0.0500 time 0.2581 (0.2591) data time 0.0007 (0.0022) model time 0.2574 (0.2571) loss 6.0456 (6.0884) grad_norm 1.8827 (1.9285) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:54:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][430/625] eta 0:00:50 lr 0.001287 wd 0.0500 time 0.2559 (0.2595) data time 0.0007 (0.0022) model time 0.2552 (0.2576) loss 5.3651 (6.0851) grad_norm 2.6170 (1.9363) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:54:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][440/625] eta 0:00:48 lr 0.001287 wd 0.0500 time 0.2712 (0.2595) data time 0.0009 (0.0022) model time 0.2703 (0.2576) loss 6.4297 (6.0952) grad_norm 1.9051 (1.9361) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:54:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][450/625] eta 0:00:45 lr 0.001287 wd 0.0500 time 0.2550 (0.2594) data time 0.0006 (0.0021) model time 0.2543 (0.2575) loss 4.5980 (6.0985) grad_norm 2.8228 (1.9339) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:54:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][460/625] eta 0:00:42 lr 0.001287 wd 0.0500 time 0.2602 (0.2594) data time 0.0007 (0.0021) model time 0.2594 (0.2575) loss 5.3466 (6.0953) grad_norm 1.2260 (1.9386) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:54:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][470/625] eta 0:00:40 lr 0.001287 wd 0.0500 time 0.2589 (0.2593) data time 0.0009 (0.0021) model time 0.2581 (0.2574) loss 6.4468 (6.0944) grad_norm 1.6525 (1.9390) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:54:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][480/625] eta 0:00:37 lr 0.001287 wd 0.0500 time 0.2553 (0.2592) data time 0.0008 (0.0021) model time 0.2544 (0.2573) loss 7.1352 (6.0975) grad_norm 2.5498 (1.9474) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:54:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][490/625] eta 0:00:34 lr 0.001286 wd 0.0500 time 0.2557 (0.2591) data time 0.0008 (0.0020) model time 0.2549 (0.2573) loss 6.6835 (6.0949) grad_norm 1.5074 (1.9474) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:54:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][500/625] eta 0:00:32 lr 0.001286 wd 0.0500 time 0.2547 (0.2591) data time 0.0010 (0.0020) model time 0.2537 (0.2572) loss 4.8666 (6.1023) grad_norm 3.2793 (1.9480) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:55:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][510/625] eta 0:00:29 lr 0.001286 wd 0.0500 time 0.2614 (0.2590) data time 0.0008 (0.0020) model time 0.2606 (0.2572) loss 5.1064 (6.0990) grad_norm 1.1470 (1.9436) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:55:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][520/625] eta 0:00:27 lr 0.001286 wd 0.0500 time 0.2615 (0.2590) data time 0.0009 (0.0020) model time 0.2606 (0.2571) loss 6.4209 (6.0913) grad_norm 2.1664 (1.9398) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:55:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][530/625] eta 0:00:24 lr 0.001286 wd 0.0500 time 0.2531 (0.2589) data time 0.0010 (0.0020) model time 0.2521 (0.2571) loss 6.6697 (6.0840) grad_norm 2.0813 (1.9332) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:55:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][540/625] eta 0:00:22 lr 0.001286 wd 0.0500 time 0.2587 (0.2589) data time 0.0008 (0.0019) model time 0.2579 (0.2571) loss 6.5812 (6.0851) grad_norm 2.3340 (1.9372) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:55:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][550/625] eta 0:00:19 lr 0.001285 wd 0.0500 time 0.2540 (0.2588) data time 0.0009 (0.0019) model time 0.2531 (0.2570) loss 6.2481 (6.0897) grad_norm 1.7843 (1.9393) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:55:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][560/625] eta 0:00:16 lr 0.001285 wd 0.0500 time 0.2584 (0.2587) data time 0.0008 (0.0019) model time 0.2575 (0.2570) loss 6.8118 (6.0937) grad_norm 2.0407 (1.9411) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:55:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][570/625] eta 0:00:14 lr 0.001285 wd 0.0500 time 0.2544 (0.2587) data time 0.0011 (0.0019) model time 0.2533 (0.2569) loss 6.6272 (6.0859) grad_norm 1.6929 (1.9451) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:55:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][580/625] eta 0:00:11 lr 0.001285 wd 0.0500 time 0.2671 (0.2587) data time 0.0007 (0.0019) model time 0.2665 (0.2569) loss 5.6218 (6.0776) grad_norm 1.6786 (1.9461) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:55:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][590/625] eta 0:00:09 lr 0.001285 wd 0.0500 time 0.2532 (0.2586) data time 0.0009 (0.0019) model time 0.2523 (0.2569) loss 6.1994 (6.0762) grad_norm 1.6811 (1.9394) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:55:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][600/625] eta 0:00:06 lr 0.001284 wd 0.0500 time 0.2550 (0.2586) data time 0.0008 (0.0018) model time 0.2542 (0.2569) loss 5.8208 (6.0669) grad_norm 2.1554 (1.9379) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:55:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][610/625] eta 0:00:03 lr 0.001284 wd 0.0500 time 0.2520 (0.2586) data time 0.0003 (0.0018) model time 0.2517 (0.2569) loss 5.9506 (6.0681) grad_norm 1.6883 (1.9313) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:55:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][620/625] eta 0:00:01 lr 0.001284 wd 0.0500 time 0.2543 (0.2585) data time 0.0003 (0.0018) model time 0.2540 (0.2568) loss 6.5987 (6.0664) grad_norm 3.7803 (1.9360) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:55:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 134 training takes 0:02:41
[2024-08-04 03:55:31 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 03:55:31 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 03:55:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.542 (0.542) Loss 0.6553 (0.6553) Acc@1 87.207 (87.207) Acc@5 98.047 (98.047) Mem 9655MB
[2024-08-04 03:55:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.102) Loss 1.0449 (0.8088) Acc@1 76.904 (83.620) Acc@5 94.531 (96.924) Mem 9655MB
[2024-08-04 03:55:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 1.2100 (0.9616) Acc@1 72.363 (79.818) Acc@5 92.627 (95.133) Mem 9655MB
[2024-08-04 03:55:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.545 Acc@5 95.168
[2024-08-04 03:55:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.5%
[2024-08-04 03:55:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.761 (0.761) Loss 0.5845 (0.5845) Acc@1 88.623 (88.623) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 03:55:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.128) Loss 0.9355 (0.7257) Acc@1 79.102 (84.983) Acc@5 95.312 (97.323) Mem 9655MB
[2024-08-04 03:55:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 1.0791 (0.8608) Acc@1 73.975 (81.280) Acc@5 93.799 (95.736) Mem 9655MB
[2024-08-04 03:55:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.930 Acc@5 95.713
[2024-08-04 03:55:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.9%
[2024-08-04 03:55:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][0/625] eta 0:11:05 lr 0.001284 wd 0.0500 time 1.0641 (1.0641) data time 0.5672 (0.5672) model time 0.0000 (0.0000) loss 5.5890 (5.5890) grad_norm 1.2225 (1.2225) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:55:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][10/625] eta 0:03:22 lr 0.001284 wd 0.0500 time 0.2564 (0.3290) data time 0.0008 (0.0524) model time 0.0000 (0.0000) loss 6.2825 (6.2936) grad_norm 1.9111 (1.6657) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:55:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][20/625] eta 0:02:57 lr 0.001284 wd 0.0500 time 0.2503 (0.2939) data time 0.0010 (0.0279) model time 0.0000 (0.0000) loss 6.1862 (6.2603) grad_norm 2.3067 (1.8534) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:55:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][30/625] eta 0:02:47 lr 0.001284 wd 0.0500 time 0.2539 (0.2817) data time 0.0010 (0.0193) model time 0.0000 (0.0000) loss 5.5602 (6.1420) grad_norm 1.3077 (1.8347) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:55:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][40/625] eta 0:02:41 lr 0.001283 wd 0.0500 time 0.2551 (0.2755) data time 0.0012 (0.0148) model time 0.0000 (0.0000) loss 6.2729 (6.1164) grad_norm 2.0830 (1.9537) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:55:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][50/625] eta 0:02:36 lr 0.001283 wd 0.0500 time 0.2587 (0.2718) data time 0.0007 (0.0121) model time 0.0000 (0.0000) loss 6.1409 (6.1045) grad_norm 1.3452 (1.9397) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:55:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][60/625] eta 0:02:32 lr 0.001283 wd 0.0500 time 0.2567 (0.2696) data time 0.0010 (0.0102) model time 0.2557 (0.2576) loss 6.0387 (5.9953) grad_norm 1.3736 (1.8634) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:55:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][70/625] eta 0:02:28 lr 0.001283 wd 0.0500 time 0.2507 (0.2677) data time 0.0010 (0.0090) model time 0.2497 (0.2563) loss 5.1212 (5.9305) grad_norm 1.5538 (1.8141) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:55:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][80/625] eta 0:02:25 lr 0.001283 wd 0.0500 time 0.2506 (0.2664) data time 0.0011 (0.0080) model time 0.2495 (0.2563) loss 6.5069 (5.9355) grad_norm 1.8335 (1.8467) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:56:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][90/625] eta 0:02:21 lr 0.001283 wd 0.0500 time 0.2543 (0.2652) data time 0.0009 (0.0072) model time 0.2534 (0.2558) loss 5.9771 (5.9381) grad_norm 2.3659 (1.9371) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:56:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][100/625] eta 0:02:18 lr 0.001282 wd 0.0500 time 0.2612 (0.2644) data time 0.0007 (0.0066) model time 0.2605 (0.2558) loss 5.1327 (5.9380) grad_norm 1.6329 (1.9362) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:56:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][110/625] eta 0:02:15 lr 0.001282 wd 0.0500 time 0.2551 (0.2636) data time 0.0006 (0.0061) model time 0.2545 (0.2557) loss 7.4041 (5.9970) grad_norm 1.6738 (1.9037) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:56:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][120/625] eta 0:02:12 lr 0.001282 wd 0.0500 time 0.2529 (0.2630) data time 0.0007 (0.0056) model time 0.2522 (0.2556) loss 5.9090 (5.9759) grad_norm 1.0380 (1.8964) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:56:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][130/625] eta 0:02:09 lr 0.001282 wd 0.0500 time 0.2582 (0.2624) data time 0.0008 (0.0053) model time 0.2574 (0.2555) loss 5.1711 (5.9617) grad_norm 1.9635 (1.8999) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:56:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][140/625] eta 0:02:07 lr 0.001282 wd 0.0500 time 0.2561 (0.2621) data time 0.0011 (0.0050) model time 0.2550 (0.2557) loss 6.4575 (5.9862) grad_norm 2.0643 (1.9184) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:56:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][150/625] eta 0:02:04 lr 0.001282 wd 0.0500 time 0.2578 (0.2617) data time 0.0007 (0.0047) model time 0.2571 (0.2557) loss 5.6478 (5.9838) grad_norm 1.8986 (1.9244) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:56:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][160/625] eta 0:02:01 lr 0.001281 wd 0.0500 time 0.2460 (0.2615) data time 0.0012 (0.0045) model time 0.2448 (0.2557) loss 5.9318 (5.9721) grad_norm 2.1406 (1.9346) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:56:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][170/625] eta 0:01:58 lr 0.001281 wd 0.0500 time 0.2515 (0.2611) data time 0.0008 (0.0043) model time 0.2507 (0.2556) loss 6.5063 (5.9711) grad_norm 2.6699 (1.9444) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:56:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][180/625] eta 0:01:56 lr 0.001281 wd 0.0500 time 0.2564 (0.2608) data time 0.0010 (0.0041) model time 0.2554 (0.2556) loss 5.6102 (5.9824) grad_norm 1.3270 (1.9533) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:56:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][190/625] eta 0:01:53 lr 0.001281 wd 0.0500 time 0.2572 (0.2606) data time 0.0006 (0.0039) model time 0.2566 (0.2556) loss 6.6876 (5.9910) grad_norm 1.3628 (1.9398) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:56:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][200/625] eta 0:01:50 lr 0.001281 wd 0.0500 time 0.2619 (0.2604) data time 0.0006 (0.0038) model time 0.2613
(0.2556) loss 5.7545 (5.9842) grad_norm 1.9163 (1.9387) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:56:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][210/625] eta 0:01:47 lr 0.001280 wd 0.0500 time 0.2580 (0.2602) data time 0.0011 (0.0036) model time 0.2570 (0.2556) loss 5.7630 (6.0032) grad_norm 1.6579 (1.9183) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:56:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][220/625] eta 0:01:45 lr 0.001280 wd 0.0500 time 0.2535 (0.2601) data time 0.0008 (0.0035) model time 0.2526 (0.2556) loss 5.0821 (5.9921) grad_norm 1.3673 (1.9207) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:56:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][230/625] eta 0:01:42 lr 0.001280 wd 0.0500 time 0.2587 (0.2599) data time 0.0008 (0.0034) model time 0.2579 (0.2556) loss 6.9862 (5.9906) grad_norm 1.1340 (1.9079) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:56:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][240/625] eta 0:01:40 lr 0.001280 wd 0.0500 time 0.2540 (0.2598) data time 0.0010 (0.0033) model time 0.2531 (0.2556) loss 6.8989 (6.0032) grad_norm 1.8842 (1.9056) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:56:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][250/625] eta 0:01:37 lr 0.001280 wd 0.0500 time 0.2563 (0.2596) data time 0.0007 (0.0032) model time 0.2556 (0.2556) loss 5.0689 (5.9995) grad_norm 2.1444 (1.9004) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:56:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][260/625] eta 0:01:34 lr 0.001280 wd 0.0500 time 0.2625 (0.2595) data time 0.0009 (0.0031) model time 0.2616 (0.2556) loss 4.8905 (5.9927) grad_norm 1.5971 (1.8993) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:56:46 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [135/300][270/625] eta 0:01:32 lr 0.001279 wd 0.0500 time 0.2608 (0.2594) data time 0.0006 (0.0030) model time 0.2603 (0.2556) loss 6.2420 (5.9894) grad_norm 4.9827 (1.9160) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:56:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][280/625] eta 0:01:29 lr 0.001279 wd 0.0500 time 0.2575 (0.2592) data time 0.0008 (0.0030) model time 0.2567 (0.2555) loss 6.3886 (5.9803) grad_norm 1.1851 (1.9097) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:56:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][290/625] eta 0:01:26 lr 0.001279 wd 0.0500 time 0.2636 (0.2592) data time 0.0010 (0.0029) model time 0.2626 (0.2556) loss 6.5869 (5.9926) grad_norm 1.5833 (1.9022) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:56:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][300/625] eta 0:01:24 lr 0.001279 wd 0.0500 time 0.2613 (0.2591) data time 0.0011 (0.0028) model time 0.2603 (0.2556) loss 5.2500 (5.9865) grad_norm 2.3034 (1.9320) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:56:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][310/625] eta 0:01:21 lr 0.001279 wd 0.0500 time 0.2582 (0.2590) data time 0.0006 (0.0028) model time 0.2576 (0.2555) loss 6.0157 (5.9793) grad_norm 1.4212 (1.9259) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:56:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][320/625] eta 0:01:18 lr 0.001279 wd 0.0500 time 0.2549 (0.2589) data time 0.0008 (0.0027) model time 0.2542 (0.2556) loss 6.5407 (5.9872) grad_norm 2.7697 (1.9305) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:57:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][330/625] eta 0:01:16 lr 0.001278 wd 0.0500 time 0.2619 (0.2589) data time 0.0009 (0.0026) model time 0.2610 
(0.2556) loss 5.6733 (5.9731) grad_norm 1.3456 (1.9226) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:57:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][340/625] eta 0:01:13 lr 0.001278 wd 0.0500 time 0.2572 (0.2588) data time 0.0009 (0.0026) model time 0.2563 (0.2556) loss 5.7106 (5.9750) grad_norm 1.3862 (1.9123) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:57:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][350/625] eta 0:01:11 lr 0.001278 wd 0.0500 time 0.2569 (0.2588) data time 0.0008 (0.0025) model time 0.2561 (0.2556) loss 6.2596 (5.9754) grad_norm 1.3073 (1.9077) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:57:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][360/625] eta 0:01:08 lr 0.001278 wd 0.0500 time 0.2576 (0.2587) data time 0.0009 (0.0025) model time 0.2568 (0.2556) loss 5.8212 (5.9678) grad_norm 2.0062 (1.9028) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:57:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][370/625] eta 0:01:05 lr 0.001278 wd 0.0500 time 0.2538 (0.2586) data time 0.0008 (0.0025) model time 0.2530 (0.2556) loss 6.0558 (5.9627) grad_norm 1.2279 (1.8992) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:57:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][380/625] eta 0:01:03 lr 0.001278 wd 0.0500 time 0.2544 (0.2586) data time 0.0011 (0.0024) model time 0.2533 (0.2556) loss 5.8020 (5.9553) grad_norm 2.6397 (1.9154) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:57:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][390/625] eta 0:01:00 lr 0.001277 wd 0.0500 time 0.2541 (0.2585) data time 0.0009 (0.0024) model time 0.2531 (0.2556) loss 4.8695 (5.9536) grad_norm 1.7068 (1.9168) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:57:19 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [135/300][400/625] eta 0:00:58 lr 0.001277 wd 0.0500 time 0.2484 (0.2584) data time 0.0009 (0.0023) model time 0.2475 (0.2555) loss 6.5037 (5.9512) grad_norm 3.1070 (1.9156) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:57:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][410/625] eta 0:00:55 lr 0.001277 wd 0.0500 time 0.2532 (0.2584) data time 0.0010 (0.0023) model time 0.2522 (0.2555) loss 6.6048 (5.9509) grad_norm 1.4854 (1.9094) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:57:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][420/625] eta 0:00:52 lr 0.001277 wd 0.0500 time 0.2566 (0.2583) data time 0.0009 (0.0023) model time 0.2557 (0.2555) loss 6.2032 (5.9552) grad_norm 1.3156 (1.9087) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:57:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][430/625] eta 0:00:50 lr 0.001277 wd 0.0500 time 0.2532 (0.2583) data time 0.0010 (0.0022) model time 0.2522 (0.2555) loss 5.8597 (5.9556) grad_norm 4.4167 (1.9101) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:57:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][440/625] eta 0:00:47 lr 0.001277 wd 0.0500 time 0.2551 (0.2582) data time 0.0009 (0.0022) model time 0.2542 (0.2555) loss 6.5271 (5.9628) grad_norm 1.5864 (1.9130) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:57:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][450/625] eta 0:00:45 lr 0.001276 wd 0.0500 time 0.2526 (0.2581) data time 0.0011 (0.0022) model time 0.2516 (0.2554) loss 6.4338 (5.9662) grad_norm 1.5282 (1.9059) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:57:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][460/625] eta 0:00:42 lr 0.001276 wd 0.0500 time 0.2566 (0.2589) data time 0.0007 (0.0022) model time 0.2559 
(0.2564) loss 5.3613 (5.9600) grad_norm 2.4520 (1.9117) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:57:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][470/625] eta 0:00:40 lr 0.001276 wd 0.0500 time 0.2668 (0.2589) data time 0.0008 (0.0021) model time 0.2660 (0.2564) loss 5.5056 (5.9572) grad_norm 2.4125 (1.9131) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:57:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][480/625] eta 0:00:37 lr 0.001276 wd 0.0500 time 0.2565 (0.2589) data time 0.0010 (0.0021) model time 0.2556 (0.2564) loss 6.3803 (5.9588) grad_norm 1.5276 (1.9129) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:57:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][490/625] eta 0:00:34 lr 0.001276 wd 0.0500 time 0.2519 (0.2588) data time 0.0008 (0.0021) model time 0.2512 (0.2564) loss 5.3475 (5.9619) grad_norm 1.7602 (1.9106) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:57:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][500/625] eta 0:00:32 lr 0.001276 wd 0.0500 time 0.2572 (0.2588) data time 0.0008 (0.0021) model time 0.2564 (0.2564) loss 6.0529 (5.9711) grad_norm 1.4668 (1.9072) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:57:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][510/625] eta 0:00:29 lr 0.001275 wd 0.0500 time 0.2581 (0.2587) data time 0.0007 (0.0020) model time 0.2574 (0.2564) loss 5.7212 (5.9643) grad_norm 2.1649 (1.9078) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:57:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][520/625] eta 0:00:27 lr 0.001275 wd 0.0500 time 0.2553 (0.2587) data time 0.0011 (0.0020) model time 0.2542 (0.2564) loss 6.6204 (5.9675) grad_norm 1.6036 (1.9049) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:57:53 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [135/300][530/625] eta 0:00:24 lr 0.001275 wd 0.0500 time 0.2539 (0.2587) data time 0.0010 (0.0020) model time 0.2529 (0.2563) loss 6.4544 (5.9684) grad_norm 2.7642 (1.9144) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:57:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][540/625] eta 0:00:21 lr 0.001275 wd 0.0500 time 0.2556 (0.2586) data time 0.0006 (0.0020) model time 0.2549 (0.2563) loss 6.3251 (5.9676) grad_norm 1.7171 (1.9142) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:57:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][550/625] eta 0:00:19 lr 0.001275 wd 0.0500 time 0.2564 (0.2586) data time 0.0006 (0.0020) model time 0.2558 (0.2563) loss 5.0274 (5.9752) grad_norm 1.3058 (1.9148) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:58:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][560/625] eta 0:00:16 lr 0.001274 wd 0.0500 time 0.2569 (0.2585) data time 0.0008 (0.0020) model time 0.2561 (0.2562) loss 6.1166 (5.9767) grad_norm 1.7078 (1.9086) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:58:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][570/625] eta 0:00:14 lr 0.001274 wd 0.0500 time 0.2568 (0.2585) data time 0.0008 (0.0019) model time 0.2560 (0.2562) loss 6.2226 (5.9788) grad_norm 1.2735 (1.9063) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:58:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][580/625] eta 0:00:11 lr 0.001274 wd 0.0500 time 0.2568 (0.2584) data time 0.0008 (0.0019) model time 0.2560 (0.2562) loss 7.2256 (5.9834) grad_norm 2.3888 (1.9143) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:58:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][590/625] eta 0:00:09 lr 0.001274 wd 0.0500 time 0.2517 (0.2584) data time 0.0013 (0.0019) model time 0.2505 
(0.2562) loss 5.7126 (5.9835) grad_norm 2.2865 (1.9128) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:58:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][600/625] eta 0:00:06 lr 0.001274 wd 0.0500 time 0.2570 (0.2583) data time 0.0007 (0.0019) model time 0.2563 (0.2562) loss 7.0471 (5.9948) grad_norm 1.3655 (1.9108) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:58:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][610/625] eta 0:00:03 lr 0.001274 wd 0.0500 time 0.2515 (0.2583) data time 0.0006 (0.0019) model time 0.2509 (0.2561) loss 6.4725 (5.9914) grad_norm 2.9015 (1.9149) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:58:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][620/625] eta 0:00:01 lr 0.001273 wd 0.0500 time 0.2533 (0.2582) data time 0.0004 (0.0019) model time 0.2529 (0.2561) loss 4.9942 (5.9897) grad_norm 1.6202 (1.9143) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 03:58:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 135 training takes 0:02:41 [2024-08-04 03:58:17 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 03:58:17 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 03:58:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.461 (0.461) Loss 0.6401 (0.6401) Acc@1 88.135 (88.135) Acc@5 98.242 (98.242) Mem 9655MB
[2024-08-04 03:58:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.093) Loss 1.0244 (0.7979) Acc@1 77.295 (83.709) Acc@5 94.531 (97.026) Mem 9655MB
[2024-08-04 03:58:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.075) Loss 1.1602 (0.9382) Acc@1 74.023 (80.129) Acc@5 93.457 (95.268) Mem 9655MB
[2024-08-04 03:58:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.884 Acc@5 95.282
[2024-08-04 03:58:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.9%
[2024-08-04 03:58:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.88%
[2024-08-04 03:58:19 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-08-04 03:58:20 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-08-04 03:58:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.539 (0.539) Loss 0.5850 (0.5850) Acc@1 88.574 (88.574) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 03:58:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.102) Loss 0.9360 (0.7260) Acc@1 79.150 (84.974) Acc@5 95.312 (97.328) Mem 9655MB
[2024-08-04 03:58:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 1.0781 (0.8610) Acc@1 73.975 (81.299) Acc@5 93.701 (95.731) Mem 9655MB
[2024-08-04 03:58:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.952 Acc@5 95.709
[2024-08-04 03:58:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.0%
[2024-08-04 03:58:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.95%
[2024-08-04 03:58:22 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 03:58:22 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 03:58:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][0/625] eta 0:07:11 lr 0.001273 wd 0.0500 time 0.6905 (0.6905) data time 0.4511 (0.4511) model time 0.0000 (0.0000) loss 5.0936 (5.0936) grad_norm 1.6012 (1.6012) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:58:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][10/625] eta 0:03:01 lr 0.001273 wd 0.0500 time 0.2597 (0.2945) data time 0.0010 (0.0419) model time 0.0000 (0.0000) loss 6.5828 (5.7161) grad_norm 2.3420 (1.8985) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:58:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][20/625] eta 0:02:47 lr 0.001273 wd 0.0500 time 0.2573 (0.2761) data time 0.0009 (0.0224) model time 0.0000 (0.0000) loss 6.5789 (5.7121) grad_norm 1.4759 (1.7713) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:58:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][30/625] eta 0:02:40 lr 0.001273 wd 0.0500 time 0.2558 (0.2694) data time 0.0008 (0.0155) model time 0.0000 (0.0000) loss 5.9601 (5.7878) grad_norm 2.1772 (1.6834) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:58:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][40/625] eta 0:02:35 lr 0.001273 wd 0.0500 time 0.2549 (0.2664) data time 0.0007 (0.0119) model time 0.0000 (0.0000) loss 7.0702 (5.8572) grad_norm 1.5481 (1.6761) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:58:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][50/625] eta 0:02:31 lr 0.001273 wd 0.0500 time 0.2553 (0.2643) data time 0.0012 (0.0098) model time 0.0000 (0.0000) loss 6.7642 (5.9118) grad_norm 3.2504 (1.8140) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:58:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][60/625] eta 0:02:28 lr 0.001272 wd 0.0500 time 0.2569 (0.2629) data time 0.0007 (0.0083) model time 0.2562 (0.2548) loss 5.9313 (5.9508) grad_norm 1.5596 (1.8717) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:58:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][70/625] eta 0:02:25 lr 0.001272 wd 0.0500 time 0.2571 (0.2619) data time 0.0006 (0.0073) model time 0.2565 (0.2549) loss 6.6305 (5.9348) grad_norm 1.5587 (1.8517) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:58:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][80/625] eta 0:02:22 lr 0.001272 wd 0.0500 time 0.2566 (0.2611) data time 0.0009 (0.0065) model time 0.2557 (0.2550) loss 6.4315 (5.9285) grad_norm 1.3289 (1.8410) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:58:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][90/625] eta 0:02:19 lr 0.001272 wd 0.0500 time 0.2524 (0.2605) data time 0.0010 (0.0059) model time 0.2514 (0.2548) loss 5.8742 (5.9335) grad_norm 4.4229 (1.8642) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:58:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][100/625] eta 0:02:16 lr 0.001272 wd 0.0500 time 0.2600 (0.2603) data time 0.0007 (0.0054) model time 0.2593 (0.2553) loss 5.8837 (5.9307) grad_norm 1.2942 (1.8501) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:58:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][110/625] eta 0:02:13 lr 0.001271 wd 0.0500 time 0.2534 (0.2598) data time 0.0009 (0.0050) model time 0.2525 (0.2552) loss 6.2975 (5.9818) grad_norm 1.8358 (1.8492) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:58:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][120/625] eta 0:02:11 lr 0.001271 wd 0.0500 time 0.2565 (0.2596) data time 0.0007 (0.0046) model time 0.2558 (0.2552) loss 6.8675 (6.0013) grad_norm 1.3042 (1.8354) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:58:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][130/625] eta 0:02:08 lr 0.001271 wd 0.0500 time 0.2560 (0.2593) data time 0.0008 (0.0044) model time 0.2551 (0.2553) loss 5.8340 (5.9833) grad_norm 1.3890 (1.8366) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:58:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][140/625] eta 0:02:05 lr 0.001271 wd 0.0500 time 0.2587 (0.2591) data time 0.0010 (0.0041) model time 0.2577 (0.2552) loss 5.3136 (5.9903) grad_norm 1.5099 (1.8406) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:59:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][150/625] eta 0:02:03 lr 0.001271 wd 0.0500 time 0.2572 (0.2590) data time 0.0008 (0.0039) model time 0.2564 (0.2554) loss 7.0588 (5.9998) grad_norm 1.5637 (1.8432) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:59:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][160/625] eta 0:02:00 lr 0.001271 wd 0.0500 time 0.2611 (0.2589) data time 0.0006 (0.0037) model time 0.2605 (0.2554) loss 6.0988 (5.9899) grad_norm 1.0297 (1.8327) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:59:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][170/625] eta 0:01:57 lr 0.001270 wd 0.0500 time 0.2531 (0.2586) data time 0.0009 (0.0036) model time 0.2521 (0.2553) loss 6.2636 (5.9901) grad_norm 1.3811 (1.8431) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:59:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][180/625] eta 0:01:55 lr 0.001270 wd 0.0500 time 0.2589 (0.2585) data time 0.0008 (0.0034) model time 0.2581 (0.2553) loss 6.8169 (5.9826) grad_norm 1.4043 (1.8454) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:59:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][190/625] eta 0:01:52 lr 0.001270 wd 0.0500 time 0.2542 (0.2584) data time 0.0007 (0.0033) model time 0.2534 (0.2554) loss 6.7790 (6.0017) grad_norm 2.7286 (1.8713) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:59:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][200/625] eta 0:01:49 lr 0.001270 wd 0.0500 time 0.2725 (0.2584) data time 0.0009 (0.0032) model time 0.2716 (0.2555) loss 4.2544 (5.9825) grad_norm 2.0489 (1.8741) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:59:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][210/625] eta 0:01:47 lr 0.001270 wd 0.0500 time 0.2587 (0.2583) data time 0.0008 (0.0031) model time 0.2579 (0.2555) loss 4.6624 (5.9774) grad_norm 1.4366 (1.8858) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:59:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][220/625] eta 0:01:44 lr 0.001270 wd 0.0500 time 0.2563 (0.2582) data time 0.0009 (0.0030) model time 0.2554 (0.2554) loss 6.7115 (5.9923) grad_norm 4.3276 (1.9297) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:59:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][230/625] eta 0:01:41 lr 0.001269 wd 0.0500 time 0.2588 (0.2581) data time 0.0008 (0.0029) model time 0.2580 (0.2555) loss 5.4186 (5.9857) grad_norm 1.3311 (1.9468) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:59:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][240/625] eta 0:01:39 lr 0.001269 wd 0.0500 time 0.2560 (0.2581) data time 0.0009 (0.0028) model time 0.2551 (0.2554) loss 6.6435 (6.0103) grad_norm 1.6067 (1.9463) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:59:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][250/625] eta 0:01:36 lr 0.001269 wd 0.0500 time 0.2622 (0.2580) data time 0.0006 (0.0028) model time 0.2615 (0.2554) loss 4.7299 (5.9884) grad_norm 1.5990 (1.9316) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:59:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][260/625] eta 0:01:34 lr 0.001269 wd 0.0500 time 0.2518 (0.2579) data time 0.0008 (0.0027) model time 0.2510 (0.2554) loss 5.7881 (5.9977) grad_norm 2.3083 (1.9428) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:59:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][270/625] eta 0:01:31 lr 0.001269 wd 0.0500 time 0.2556 (0.2579) data time 0.0008 (0.0026) model time 0.2548 (0.2554) loss 6.8535 (6.0039) grad_norm 2.9578 (1.9484) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:59:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][280/625] eta 0:01:28 lr 0.001269 wd 0.0500 time 0.2529 (0.2578) data time 0.0010 (0.0026) model time 0.2519 (0.2554) loss 5.5756 (6.0095) grad_norm 1.6935 (1.9389) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:59:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][290/625] eta 0:01:26 lr 0.001268 wd 0.0500 time 0.2599 (0.2578) data time 0.0005 (0.0025) model time 0.2594 (0.2554) loss 5.4293 (6.0011) grad_norm 2.3240 (1.9483) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:59:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][300/625] eta 0:01:23 lr 0.001268 wd 0.0500 time 0.2580 (0.2578) data time 0.0008 (0.0025) model time 0.2572 (0.2555) loss 5.1577 (6.0022) grad_norm 2.1473 (1.9442) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:59:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][310/625] eta 0:01:21 lr 0.001268 wd 0.0500 time 0.2566 (0.2577) data time 0.0007 (0.0024) model time 0.2559 (0.2555) loss 6.3818 (6.0118) grad_norm 1.3328 (1.9538) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:59:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][320/625] eta 0:01:18 lr 0.001268 wd 0.0500 time 0.2549 (0.2577) data time 0.0010 (0.0024) model time 0.2539 (0.2554) loss 5.1342 (6.0117) grad_norm 1.8146 (1.9457) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:59:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][330/625] eta 0:01:15 lr 0.001268 wd 0.0500 time 0.2612 (0.2576) data time 0.0011 (0.0023) model time 0.2601 (0.2554) loss 5.3542 (6.0138) grad_norm 1.6100 (1.9339) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:59:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][340/625] eta 0:01:13 lr 0.001268 wd 0.0500 time 0.2544 (0.2576) data time 0.0011 (0.0023) model time 0.2533 (0.2555) loss 6.5665 (6.0308) grad_norm 1.3012 (1.9257) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:59:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][350/625] eta 0:01:10 lr 0.001267 wd 0.0500 time 0.2614 (0.2576) data time 0.0009 (0.0022) model time 0.2605 (0.2555) loss 6.2469 (6.0445) grad_norm 1.4019 (1.9228) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:59:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][360/625] eta 0:01:08 lr 0.001267 wd 0.0500 time 0.2557 (0.2576) data time 0.0010 (0.0022) model time 0.2547 (0.2555) loss 6.7323 (6.0479) grad_norm 1.8190 (1.9270) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 03:59:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][370/625] eta 0:01:05 lr 0.001267 wd 0.0500 time 0.2581 (0.2575) data time 0.0009 (0.0022) model time 0.2572 (0.2555) loss 6.4720 (6.0464) grad_norm 3.0939 (1.9395) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:00:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][380/625] eta 0:01:03 lr 0.001267 wd 0.0500 time 0.2611 (0.2575) data time 0.0011 (0.0021) model time 0.2599 (0.2555) loss 6.4764 (6.0424) grad_norm 2.4402 (1.9349) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:00:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][390/625] eta 0:01:00 lr 0.001267 wd 0.0500 time 0.2594 (0.2575) data time 0.0011 (0.0021) model time 0.2583 (0.2556) loss 5.7597 (6.0440) grad_norm 2.2655 (1.9412) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:00:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][400/625] eta 0:00:57 lr 0.001267 wd 0.0500 time 0.2555 (0.2575) data time 0.0011 (0.0021) model time 0.2544 (0.2555) loss 5.6435 (6.0431) grad_norm 2.1309 (1.9441) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:00:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][410/625] eta 0:00:55 lr 0.001266 wd 0.0500 time 0.2537 (0.2575) data time 0.0015 (0.0021) model time 0.2522 (0.2555) loss 6.2056 (6.0471) grad_norm 1.5097 (1.9411) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:00:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][420/625] eta 0:00:52 lr 0.001266 wd 0.0500 time 0.2568 (0.2579) data time 0.0010 (0.0020) model time 0.2559 (0.2560) loss 6.7243 (6.0471) grad_norm 1.1740 (1.9310) loss_scale 2048.0000 (1031.2969) mem 9655MB
[2024-08-04 04:00:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][430/625] eta 0:00:50 lr 0.001266 wd 0.0500 time 0.2525 (0.2578) data time 0.0007 (0.0020) model time 0.2518 (0.2560) loss 4.7822 (6.0473) grad_norm 1.5631 (1.9248) loss_scale 2048.0000 (1054.8863) mem 9655MB
[2024-08-04 04:00:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][440/625] eta 0:00:47 lr 0.001266 wd 0.0500 time 0.2550 (0.2578) data time 0.0010 (0.0020) model time 0.2540 (0.2560) loss 6.7981 (6.0426) grad_norm 1.5087 (1.9237) loss_scale 2048.0000 (1077.4059) mem 9655MB
[2024-08-04 04:00:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][450/625] eta 0:00:45 lr 0.001266 wd 0.0500 time 0.2600 (0.2578) data time 0.0008 (0.0020) model time 0.2592 (0.2560) loss 6.9123 (6.0474) grad_norm 1.6301 (1.9190) loss_scale 2048.0000 (1098.9268) mem 9655MB
[2024-08-04 04:00:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][460/625] eta 0:00:42 lr 0.001265 wd 0.0500 time 0.2559 (0.2577) data time 0.0009 (0.0019) model time 0.2550 (0.2560) loss 6.5648 (6.0497) grad_norm 1.4028 (1.9136) loss_scale 2048.0000 (1119.5141) mem 9655MB
[2024-08-04 04:00:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][470/625] eta 0:00:39 lr 0.001265 wd 0.0500 time 0.2544 (0.2577) data time 0.0007 (0.0019) model time 0.2538 (0.2559) loss 6.3878 (6.0519) grad_norm 1.7291 (1.9081) loss_scale 2048.0000 (1139.2272) mem 9655MB
[2024-08-04 04:00:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][480/625] eta 0:00:37 lr 0.001265 wd 0.0500 time 0.2569 (0.2577) data time 0.0011 (0.0019) model time 0.2558 (0.2559) loss 5.6848 (6.0537) grad_norm 1.9469 (1.9101) loss_scale 2048.0000 (1158.1206) mem 9655MB
[2024-08-04 04:00:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][490/625] eta 0:00:34 lr 0.001265 wd 0.0500 time 0.2518 (0.2576) data time 0.0007 (0.0019) model time 0.2511 (0.2559) loss 6.4549 (6.0636) grad_norm 1.1925 (1.9095) loss_scale 2048.0000 (1176.2444) mem 9655MB
[2024-08-04 04:00:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][500/625] eta 0:00:32 lr 0.001265 wd 0.0500 time 0.2501 (0.2579) data time 0.0007 (0.0019) model time 0.2494 (0.2562) loss 7.0268 (6.0679) grad_norm 2.9741 (1.9397) loss_scale 2048.0000 (1193.6447) mem 9655MB
[2024-08-04 04:00:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][510/625] eta 0:00:29 lr 0.001265 wd 0.0500 time 0.2552 (0.2579) data time 0.0008 (0.0019) model time 0.2544 (0.2562) loss 5.7511 (6.0688) grad_norm 1.3308 (1.9374) loss_scale 2048.0000 (1210.3640) mem 9655MB
[2024-08-04 04:00:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][520/625] eta 0:00:27 lr 0.001264 wd 0.0500 time 0.2545 (0.2578) data time 0.0009 (0.0018) model time 0.2536 (0.2562) loss 5.1067 (6.0639) grad_norm 2.2144 (1.9371) loss_scale 2048.0000 (1226.4415) mem 9655MB
[2024-08-04 04:00:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][530/625] eta 0:00:24 lr 0.001264 wd 0.0500 time 0.2568 (0.2578) data time 0.0009 (0.0018) model time 0.2559 (0.2562) loss 6.8043 (6.0652) grad_norm 1.8520 (1.9420) loss_scale 2048.0000 (1241.9134) mem 9655MB
[2024-08-04 04:00:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][540/625] eta 0:00:21 lr 0.001264 wd 0.0500 time 0.2571 (0.2578) data time 0.0009 (0.0018) model time 0.2562 (0.2562) loss 6.0628 (6.0558) grad_norm 1.1873 (1.9396) loss_scale 2048.0000 (1256.8133) mem 9655MB
[2024-08-04 04:00:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][550/625] eta 0:00:19 lr 0.001264 wd 0.0500 time 0.2579 (0.2578) data time 0.0007 (0.0018) model time 0.2573 (0.2562) loss 7.3559 (6.0606) grad_norm 1.3841 (1.9381) loss_scale 2048.0000 (1271.1724) mem 9655MB
[2024-08-04 04:00:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][560/625] eta 0:00:16 lr 0.001264 wd 0.0500 time 0.2555 (0.2578) data time 0.0007 (0.0018) model time 0.2548 (0.2562) loss 5.7229 (6.0596) grad_norm 3.4422 (1.9434) loss_scale 2048.0000 (1285.0196) mem 9655MB
[2024-08-04 04:00:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][570/625] eta 0:00:14 lr 0.001264 wd 0.0500 time 0.2571 (0.2578) data time 0.0010 (0.0018) model time 0.2561 (0.2562) loss 5.6960 (6.0569) grad_norm 1.7485 (1.9512) loss_scale 2048.0000 (1298.3818) mem 9655MB
[2024-08-04 04:00:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][580/625] eta 0:00:11 lr 0.001263 wd 0.0500 time 0.2585 (0.2578) data time 0.0010 (0.0017) model time 0.2575 (0.2562) loss 5.2361 (6.0532) grad_norm 1.6863 (1.9459) loss_scale 2048.0000 (1311.2840) mem 9655MB
[2024-08-04 04:00:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][590/625] eta 0:00:09 lr 0.001263 wd 0.0500 time 0.2564 (0.2578) data time 0.0007 (0.0017) model time 0.2556 (0.2562) loss 6.2474 (6.0557) grad_norm 1.8012 (1.9448) loss_scale 2048.0000 (1323.7496) mem 9655MB
[2024-08-04 04:00:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][600/625] eta 0:00:06 lr 0.001263 wd 0.0500 time 0.2581 (0.2578) data time 0.0008 (0.0017) model time 0.2574 (0.2563) loss 6.8430 (6.0555) grad_norm 1.9657 (1.9465) loss_scale 2048.0000 (1335.8003) mem 9655MB
[2024-08-04 04:01:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][610/625] eta 0:00:03 lr 0.001263 wd 0.0500 time 0.2527 (0.2582) data time 0.0006 (0.0017) model time 0.2521 (0.2566) loss 7.0830 (6.0575) grad_norm 1.6073 (1.9440) loss_scale 2048.0000 (1347.4566) mem 9655MB
[2024-08-04 04:01:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][620/625] eta 0:00:01 lr 0.001263 wd 0.0500 time 0.2602 (0.2581) data time 0.0005 (0.0017) model time 0.2597 (0.2566) loss 6.4690 (6.0546) grad_norm 2.1885 (1.9457) loss_scale 2048.0000 (1358.7375) mem 9655MB
[2024-08-04 04:01:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 136 training takes 0:02:41
[2024-08-04 04:01:03 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 04:01:04 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 04:01:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.411 (0.411) Loss 0.6533 (0.6533) Acc@1 88.477 (88.477) Acc@5 97.998 (97.998) Mem 9655MB
[2024-08-04 04:01:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.091) Loss 1.0449 (0.8100) Acc@1 76.123 (83.807) Acc@5 94.824 (96.928) Mem 9655MB
[2024-08-04 04:01:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.074) Loss 1.1826 (0.9589) Acc@1 73.438 (79.983) Acc@5 93.408 (95.233) Mem 9655MB
[2024-08-04 04:01:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.724 Acc@5 95.204
[2024-08-04 04:01:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.7%
[2024-08-04 04:01:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.725 (0.725) Loss 0.5845 (0.5845) Acc@1 88.574 (88.574) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 04:01:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.124) Loss 0.9360 (0.7262) Acc@1 79.199 (84.983) Acc@5 95.264 (97.337) Mem 9655MB
[2024-08-04 04:01:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.091) Loss 1.0781 (0.8611) Acc@1 74.219 (81.348) Acc@5 93.799 (95.754) Mem 9655MB
[2024-08-04 04:01:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.006 Acc@5 95.745
[2024-08-04 04:01:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.0%
[2024-08-04 04:01:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.01%
[2024-08-04 04:01:08 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 04:01:08 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 04:01:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][0/625] eta 0:07:11 lr 0.001263 wd 0.0500 time 0.6901 (0.6901) data time 0.4392 (0.4392) model time 0.0000 (0.0000) loss 5.1728 (5.1728) grad_norm 1.7351 (1.7351) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:01:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][10/625] eta 0:03:01 lr 0.001262 wd 0.0500 time 0.2548 (0.2956) data time 0.0009 (0.0408) model time 0.0000 (0.0000) loss 5.3925 (5.7138) grad_norm 1.8437 (2.1404) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:01:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][20/625] eta 0:02:47 lr 0.001262 wd 0.0500 time 0.2581 (0.2769) data time 0.0009 (0.0218) model time 0.0000 (0.0000) loss 5.0113 (5.5328) grad_norm 1.4274 (2.1144) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:01:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][30/625] eta 0:02:40 lr 0.001262 wd 0.0500 time 0.2522 (0.2699) data time 0.0007 (0.0151) model time 0.0000 (0.0000) loss 6.1258 (5.6655) grad_norm 2.1456 (2.1290) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:01:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][40/625] eta 0:02:35 lr 0.001262 wd 0.0500 time 0.2558 (0.2666) data time 0.0008 (0.0116) model time 0.0000 (0.0000) loss 6.2688 (5.7139) grad_norm 1.2648 (2.2429) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:01:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][50/625] eta 0:02:32 lr 0.001262 wd 0.0500 time 0.2574 (0.2650) data time 0.0010 (0.0095) model time 0.0000 (0.0000) loss 6.7070 (5.7015) grad_norm 2.4305 (2.3005) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:01:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][60/625] eta 0:02:28 lr 0.001262 wd 0.0500 time 0.2574 (0.2636) data time 0.0011 (0.0081) model time 0.2562 (0.2552) loss 5.4828 (5.7965) grad_norm 1.6964 (2.1764) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:01:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][70/625] eta 0:02:25 lr 0.001261 wd 0.0500 time 0.2581 (0.2625) data time 0.0008 (0.0071) model time 0.2573 (0.2552) loss 6.6992 (5.8030) grad_norm 1.9803 (2.1109) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:01:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][80/625] eta 0:02:22 lr 0.001261 wd 0.0500 time 0.2541 (0.2616) data time 0.0009 (0.0064) model time 0.2532 (0.2549) loss 5.5381 (5.8184) grad_norm 1.6557 (2.0585) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:01:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][90/625] eta 0:02:19 lr 0.001261 wd 0.0500 time 0.2526 (0.2610) data time 0.0007 (0.0058) model time 0.2520 (0.2549) loss 6.2016 (5.8549) grad_norm 2.8996 (2.0672) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:01:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][100/625] eta 0:02:16 lr 0.001261 wd 0.0500 time 0.2570 (0.2604) data time 0.0006 (0.0053) model time 0.2564 (0.2548) loss 6.1983 (5.9158) grad_norm 1.6623 (2.0488) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:01:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][110/625] eta 0:02:13 lr 0.001261 wd 0.0500 time 0.2513 (0.2601) data time 0.0007 (0.0049) model time 0.2505 (0.2550) loss 5.7895 (5.8816) grad_norm 3.0142 (2.0629) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:01:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][120/625] eta 0:02:11 lr 0.001261 wd 0.0500 time 0.2568 (0.2598) data time 0.0008 (0.0045) model time 0.2560 (0.2551) loss 6.0188 (5.8898) grad_norm 1.2953 (2.0753) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:01:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][130/625] eta 0:02:08 lr 0.001260 wd 0.0500 time 0.2562 (0.2595) data time 0.0007 (0.0043) model time 0.2555 (0.2551) loss 6.8162 (5.9118) grad_norm 1.6562 (2.0654) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:01:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][140/625] eta 0:02:05 lr 0.001260 wd 0.0500 time 0.2586 (0.2593) data time 0.0008 (0.0040) model time 0.2578 (0.2551) loss 4.9975 (5.9192) grad_norm 1.9350 (2.0407) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:01:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][150/625] eta 0:02:03 lr 0.001260 wd 0.0500 time 0.2548 (0.2591) data time 0.0010 (0.0038) model time 0.2538 (0.2552) loss 7.3014 (5.9167) grad_norm 2.8019 (2.0562) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:01:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][160/625] eta 0:02:00 lr 0.001260 wd 0.0500 time 0.2576 (0.2589) data time 0.0006 (0.0037) model time 0.2570 (0.2552) loss 5.4014 (5.9321) grad_norm 2.0675 (2.1355) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:01:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][170/625] eta 0:01:57 lr 0.001260 wd 0.0500 time 0.2558 (0.2587) data time 0.0007 (0.0035) model time 0.2551 (0.2551) loss 6.7590 (5.9429) grad_norm 2.9333 (2.1393) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:01:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][180/625] eta 0:01:55 lr 0.001260 wd 0.0500 time 0.2543 (0.2585) data time 0.0009 (0.0034) model time 0.2534 (0.2550) loss 6.4177 (5.9488) grad_norm 1.4260 (2.1387) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:01:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][190/625] eta 0:01:52 lr 0.001259 wd 0.0500 time 0.2582 (0.2584) data time 0.0009 (0.0032) model time 0.2573 (0.2551) loss 6.3048 (5.9362) grad_norm 1.3397 (2.1282) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:02:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][200/625] eta 0:01:49 lr 0.001259 wd 0.0500 time 0.2562 (0.2584) data time 0.0007 (0.0031) model time 0.2556 (0.2552) loss 4.4343 (5.9471) grad_norm 1.2373 (2.0916) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:02:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][210/625] eta 0:01:47 lr 0.001259 wd 0.0500 time 0.2551 (0.2583) data time 0.0008 (0.0030) model time 0.2542 (0.2552) loss 6.0909 (5.9557) grad_norm 1.3220 (2.0881) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:02:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][220/625] eta 0:01:44 lr 0.001259 wd 0.0500 time 0.2575 (0.2582) data time 0.0008 (0.0029) model time 0.2567 (0.2552) loss 5.7884 (5.9575) grad_norm 2.2400 (2.0988) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:02:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][230/625] eta 0:01:41 lr 0.001259 wd 0.0500 time 0.2578 (0.2580) data time 0.0009 (0.0028) model time 0.2569 (0.2551) loss 6.4316 (5.9616) grad_norm 2.1755 (2.0961) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:02:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][240/625] eta 0:01:39 lr 0.001259 wd 0.0500 time 0.2532 (0.2579) data time 0.0008 (0.0028) model time 0.2524 (0.2551) loss 5.2251 (5.9627) grad_norm 1.3846 (2.0667) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:02:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][250/625] eta 0:01:36 lr 0.001258 wd 0.0500 time 0.2509 (0.2578) data time 0.0010 (0.0027) model time 0.2499 (0.2550) loss 6.5518 (5.9606) grad_norm 1.8819 (2.0725) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:02:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][260/625] eta 0:01:34 lr 0.001258 wd 0.0500 time 0.2554 (0.2578) data time 0.0007 (0.0026) model time 0.2547 (0.2551) loss 6.2051 (5.9670) grad_norm 2.7684 (2.0946) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:02:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][270/625] eta 0:01:31 lr 0.001258 wd 0.0500 time 0.2567 (0.2577) data time 0.0010 (0.0026) model time 0.2556 (0.2551) loss 6.0633 (5.9606) grad_norm 1.3997 (2.0995) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:02:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][280/625] eta 0:01:28 lr 0.001258 wd 0.0500 time 0.2649 (0.2577) data time 0.0011 (0.0025) model time 0.2639 (0.2551) loss 5.5415 (5.9688) grad_norm 1.6014 (2.0938) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:02:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][290/625] eta 0:01:26 lr 0.001258 wd 0.0500 time 0.2559 (0.2577) data time 0.0011 (0.0025) model time 0.2548 (0.2551) loss 6.1064 (5.9700) grad_norm 1.7723 (2.1001) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:02:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][300/625] eta 0:01:23 lr 0.001257 wd 0.0500 time 0.2561 (0.2576) data time 0.0014 (0.0024) model time 0.2547 (0.2551) loss 5.1501 (5.9583) grad_norm 2.0463 (2.0922) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:02:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][310/625] eta 0:01:21 lr 0.001257 wd 0.0500 time 0.2594 (0.2576) data time 0.0007 (0.0024) model time 0.2587 (0.2551) loss 5.5566 (5.9626) grad_norm 1.3452 (2.0739) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:02:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][320/625] eta 0:01:18 lr 0.001257 wd 0.0500 time 0.2562 (0.2575) data time 0.0008 (0.0023) model time 0.2555 (0.2551) loss 5.4548 (5.9614) grad_norm 1.5573 (2.0678) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:02:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][330/625] eta 0:01:15 lr 0.001257 wd 0.0500 time 0.2578 (0.2575) data time 0.0010 (0.0023) model time 0.2568 (0.2551) loss 6.1177 (5.9662) grad_norm 1.4019 (2.0549) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:02:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][340/625] eta 0:01:13 lr 0.001257 wd 0.0500 time 0.2527 (0.2575) data time 0.0011 (0.0023) model time 0.2516 (0.2552) loss 6.1126 (5.9699) grad_norm 1.1242 (2.0491) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:02:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][350/625] eta 0:01:10 lr 0.001257 wd 0.0500 time 0.2593 (0.2575) data time 0.0008 (0.0022) model time 0.2585 (0.2552) loss 5.2454 (5.9555) grad_norm 2.2738 (2.0468) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:02:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][360/625] eta 0:01:08 lr 0.001256 wd 0.0500 time 0.2592 (0.2575) data time 0.0007 (0.0022) model time 0.2585 (0.2552) loss 7.2540 (5.9676) grad_norm 2.2511 (2.0539) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:02:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][370/625] eta 0:01:05 lr 0.001256 wd 0.0500 time 0.2521 (0.2574) data time 0.0008 (0.0022) model time 0.2513 (0.2552) loss 6.4714 (5.9721) grad_norm 1.9689 (2.0467) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:02:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][380/625] eta 0:01:03 lr 0.001256 wd 0.0500 time 0.2561 (0.2574) data time 0.0008 (0.0021) model time 0.2553 (0.2552) loss 6.7964 (5.9760) grad_norm 1.2913 (2.0506) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:02:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][390/625] eta 0:01:00 lr 0.001256 wd 0.0500 time 0.2573 (0.2578) data time 0.0009 (0.0021) model time 0.2563 (0.2558) loss 6.9740 (5.9846) grad_norm 1.3110 (2.0397) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:02:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][400/625] eta 0:00:58 lr 0.001256 wd 0.0500 time 0.2582 (0.2578) data time 0.0007 (0.0021) model time 0.2575 (0.2558) loss 7.0239 (6.0044) grad_norm 1.3679 (2.0394) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:02:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][410/625] eta 0:00:55 lr 0.001256 wd 0.0500 time 0.2555 (0.2578) data time 0.0007 (0.0020) model time 0.2547 (0.2558) loss 6.6615 (6.0078) grad_norm 1.5102 (2.0327) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:02:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][420/625] eta 0:00:52 lr 0.001255 wd 0.0500 time 0.2688 (0.2577) data time 0.0006 (0.0020) model time 0.2682 (0.2558) loss 4.8202 (6.0126) grad_norm 1.2985 (2.0320) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:02:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][430/625] eta 0:00:50 lr 0.001255 wd 0.0500 time 0.2590 (0.2577) data time 0.0008 (0.0020) model time 0.2582 (0.2558) loss 5.3813 (6.0103) grad_norm 2.9311 (2.0315) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:03:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][440/625] eta 0:00:47 lr 0.001255 wd 0.0500 time 0.2562 (0.2577) data time 0.0010 (0.0020) model time 0.2552 (0.2558) loss 6.6702 (6.0105) grad_norm 2.6507 (2.0415) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:03:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][450/625] eta 0:00:45 lr 0.001255 wd 0.0500 time 0.2535 (0.2577) data time 0.0009 (0.0019) model time 0.2526 (0.2557) loss 6.1164 (6.0099) grad_norm 2.3691 (2.0398) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:03:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][460/625] eta 0:00:42 lr 0.001255 wd 0.0500 time 0.2770 (0.2577) data time 0.0007 (0.0019) model time 0.2763 (0.2558) loss 4.9033 (6.0143) grad_norm 1.9408 (2.0364) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:03:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][470/625] eta 0:00:39 lr 0.001255 wd 0.0500 time 0.2540 (0.2576) data time 0.0007 (0.0019) model time 0.2532 (0.2558) loss 6.6411 (6.0227) grad_norm 1.8780 (2.0252) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:03:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][480/625] eta 0:00:37 lr 0.001254 wd 0.0500 time 0.2543 (0.2576) data time 0.0009 (0.0019) model time 0.2535 (0.2558) loss 6.4389 (6.0280) grad_norm 1.5873 (2.0231) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:03:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][490/625] eta 0:00:34 lr 0.001254 wd 0.0500 time 0.2553 (0.2576) data time 0.0008 (0.0019) model time 0.2545 (0.2558) loss 5.0659 (6.0168) grad_norm 1.1057 (2.0194) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:03:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][500/625] eta 0:00:32 lr 0.001254 wd 0.0500 time 0.2547 (0.2576) data time 0.0007 (0.0018) model time 0.2540 (0.2558) loss 6.7909 (6.0109) grad_norm 1.8072 (2.0120) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:03:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][510/625] eta 0:00:29 lr 0.001254 wd 0.0500 time 0.2535 (0.2576) data time 0.0008 (0.0018) model time 0.2527 (0.2558) loss 6.6769 (6.0149) grad_norm 1.6113 (2.0055) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:03:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][520/625] eta 0:00:27 lr 0.001254 wd 0.0500 time 0.2554 (0.2575) data time 0.0008 (0.0018) model time 0.2546 (0.2557) loss 6.5933 (6.0135) grad_norm 2.6148 (2.0104) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:03:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][530/625] eta 0:00:24 lr 0.001254 wd 0.0500 time 0.2512 (0.2575) data time 0.0010 (0.0018) model time 0.2501 (0.2557) loss 6.4268 (6.0154) grad_norm 1.1959 (2.0088) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:03:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][540/625] eta 0:00:21 lr 0.001253 wd 0.0500 time 0.2546 (0.2575) data time 0.0010 (0.0018) model time 0.2536 (0.2557) loss 6.6102 (6.0205) grad_norm 1.7582 (2.0031) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:03:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][550/625] eta 0:00:19 lr 0.001253 wd 0.0500 time 0.2569 (0.2574) data time 0.0008 (0.0017) model time 0.2561 (0.2557) loss 6.9776 (6.0248) grad_norm 2.0790 (2.0109) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:03:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][560/625] eta 0:00:16 lr 0.001253 wd 0.0500 time 0.2561 (0.2574) data time 0.0008 (0.0017) model time 0.2553 (0.2557) loss 5.2872 (6.0203) grad_norm 1.9928 (2.0084) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:03:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][570/625] eta 0:00:14 lr 0.001253 wd 0.0500 time 0.2530 (0.2574) data time 0.0011 (0.0017) model time 0.2519 (0.2557) loss 6.7068 (6.0266) grad_norm 1.2902 (2.0083) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:03:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][580/625] eta 0:00:11 lr 0.001253 wd 0.0500 time 0.2582 (0.2574) data time 0.0008 (0.0017) model time 0.2573 (0.2557) loss 4.9287 (6.0254) grad_norm 1.5164 (2.0093) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:03:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][590/625] eta 0:00:09 lr 0.001253 wd 0.0500 time 0.2621 (0.2573) data time 0.0009 (0.0017) model time 0.2612 (0.2557) loss 4.9890 (6.0224) grad_norm 1.4775 (2.0038) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:03:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][600/625] eta 0:00:06 lr 0.001252 wd 0.0500 time 0.2552 (0.2573) data time 0.0009 (0.0017) model time 0.2543 (0.2557) loss 5.7119 (6.0195) grad_norm 1.9651 (1.9972) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:03:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][610/625] eta 0:00:03 lr 0.001252 wd 0.0500 time 0.2542 (0.2573) data time 0.0004 (0.0017) model time 0.2538 (0.2556) loss 5.0408 (6.0190) grad_norm 3.5050 (1.9976) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:03:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][620/625] eta 0:00:01 lr 0.001252 wd 0.0500 time 0.2541 (0.2572) data time 0.0006 (0.0017) model time 0.2536 (0.2556) loss 6.3984 (6.0193) grad_norm 1.9458 (1.9952) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:03:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 137 training takes 0:02:40
[2024-08-04 04:03:49 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 04:03:50 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 04:03:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.472 (0.472) Loss 0.6782 (0.6782) Acc@1 87.549 (87.549) Acc@5 97.998 (97.998) Mem 9655MB
[2024-08-04 04:03:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 1.0332 (0.8020) Acc@1 76.270 (83.802) Acc@5 94.971 (96.968) Mem 9655MB
[2024-08-04 04:03:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.1982 (0.9495) Acc@1 72.119 (80.166) Acc@5 92.871 (95.308) Mem 9655MB
[2024-08-04 04:03:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.828 Acc@5 95.242
[2024-08-04 04:03:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.8%
[2024-08-04 04:03:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.755 (0.755) Loss 0.5850 (0.5850) Acc@1 88.525 (88.525) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 04:03:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.125) Loss 0.9370 (0.7266) Acc@1 79.248 (84.996) Acc@5 95.410 (97.377) Mem 9655MB
[2024-08-04 04:03:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0771 (0.8610) Acc@1 74.170 (81.352) Acc@5 93.750 (95.785) Mem 9655MB
[2024-08-04 04:03:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.030 Acc@5 95.781
[2024-08-04 04:03:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.0%
[2024-08-04 04:03:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.03%
[2024-08-04 04:03:54 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 04:03:54 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 04:03:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][0/625] eta 0:07:24 lr 0.001252 wd 0.0500 time 0.7116 (0.7116) data time 0.4616 (0.4616) model time 0.0000 (0.0000) loss 5.2409 (5.2409) grad_norm 1.3321 (1.3321) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:03:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][10/625] eta 0:03:11 lr 0.001252 wd 0.0500 time 0.2540 (0.3119) data time 0.0010 (0.0428) model time 0.0000 (0.0000) loss 6.1173 (5.9091) grad_norm 1.3326 (1.3723) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:04:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][20/625] eta 0:02:52 lr 0.001252 wd 0.0500 time 0.2577 (0.2852) data time 0.0009 (0.0229) model time 0.0000 (0.0000) loss 5.2469 (5.8133) grad_norm 2.7640 (1.6687) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:04:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][30/625] eta 0:02:44 lr 0.001251 wd 0.0500 time 0.2559 (0.2758) data time 0.0008 (0.0158) model time 0.0000 (0.0000) loss 6.0669 (5.8577) grad_norm 1.4021 (1.8049) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:04:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][40/625] eta 0:02:38 lr 0.001251 wd 0.0500 time 0.2560 (0.2709) data time 0.0007 (0.0121) model time 0.0000 (0.0000) loss 6.7726 (5.8968) grad_norm 2.6709 (1.8056) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:04:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][50/625] eta 0:02:34 lr 0.001251 wd 0.0500 time 0.2553 (0.2681) data time 0.0006 (0.0100) model time 0.0000 (0.0000) loss 6.1334 (5.9118) grad_norm 1.7191 (1.8025) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:04:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][60/625] eta 0:02:31 lr 0.001251 wd 0.0500 time 0.4090 (0.2685) data time 0.0008 (0.0085) model time 0.4082 (0.2702) loss 4.6424 (5.8821) grad_norm 1.1436 (1.8044) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:04:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][70/625] eta 0:02:28 lr 0.001251 wd 0.0500 time 0.2569 (0.2670) data time 0.0006 (0.0074) model time 0.2563 (0.2634) loss 5.7389 (5.9023) grad_norm 2.8568 (1.8309) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:04:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][80/625] eta 0:02:24 lr 0.001251 wd 0.0500 time 0.2572 (0.2657) data time 0.0010 (0.0066) model time 0.2562 (0.2607) loss 6.2396 (5.8912) grad_norm 1.8379 (1.8379) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:04:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][90/625] eta 0:02:21 lr 0.001250 wd 0.0500 time 0.2616 (0.2648) data time 0.0011 (0.0060) model time 0.2605 (0.2597) loss 6.8793 (5.8818) grad_norm 4.0414 (1.9608) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:04:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][100/625] eta 0:02:18 lr 0.001250 wd 0.0500 time 0.2594 (0.2640) data time 0.0007 (0.0055) model time 0.2588 (0.2588) loss 4.7870 (5.9376) grad_norm 1.2476 (1.9970) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:04:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][110/625] eta 0:02:15 lr 0.001250 wd 0.0500 time 0.2593 (0.2632) data time 0.0008 (0.0051) model time 0.2585 (0.2581) loss 6.2719 (5.9640) grad_norm 1.3034 (1.9597) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:04:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][120/625] eta 0:02:12 lr 0.001250 wd 0.0500 time 0.2594 (0.2626) data time 0.0008 (0.0048) model time 0.2586 (0.2577) loss 6.0135 (5.9804) grad_norm 1.6801 (1.9363) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:04:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][130/625] eta 0:02:09 lr 0.001250 wd 0.0500 time 0.2515 (0.2621) data time 0.0008 (0.0045) model time 0.2507 (0.2574) loss 5.2468 (5.9883) grad_norm 1.2133 (1.8915) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:04:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][140/625] eta 0:02:06 lr 0.001249 wd 0.0500 time 0.2522 (0.2618) data time 0.0008 (0.0042) model time 0.2514 (0.2573) loss 6.4112 (5.9929) grad_norm 1.9259 (1.8716) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:04:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][150/625] eta 0:02:04 lr 0.001249 wd 0.0500 time 0.2528 (0.2614) data time 0.0012 (0.0041) model time 0.2515 (0.2569) loss 6.7657 (6.0069) grad_norm 1.4540 (1.8604) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:04:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][160/625] eta 0:02:01 lr 0.001249 wd 0.0500 time 0.2542 (0.2612) data time 0.0014 (0.0039) model time 0.2528 (0.2569) loss 6.8518 (6.0175) grad_norm 1.0997 (1.8873) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:04:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][170/625] eta 0:01:58 lr 0.001249 wd 0.0500 time 0.2520 (0.2609) data time 0.0009 (0.0037) model time 0.2511 (0.2568) loss 5.6581 (6.0350) grad_norm 2.1139 (1.8738) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:04:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][180/625] eta 0:01:56 lr 0.001249 wd 0.0500 time 0.2606 (0.2607) data time 0.0010 (0.0036) model time 0.2596 (0.2568) loss 5.7488 (6.0239) grad_norm 1.4802 (1.8598) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:04:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][190/625] eta 0:01:53 lr 0.001249 wd 0.0500 time 0.2545 (0.2605) data time 0.0009 (0.0034) model time 0.2536 (0.2567) loss 7.2675 (6.0329) grad_norm 1.1829 (1.8531) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:04:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][200/625] eta 0:01:50 lr 0.001248 wd 0.0500 time 0.2599 (0.2603) data time 0.0007 (0.0033) model time 0.2592 (0.2567) loss 6.7876 (6.0377) grad_norm 2.6930 (1.8579) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:04:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][210/625] eta 0:01:47 lr 0.001248 wd 0.0500 time 0.2558 (0.2601) data time 0.0006 (0.0032) model time 0.2552 (0.2566) loss 6.4641 (6.0387) grad_norm 1.9629 (1.8443) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:04:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][220/625] eta 0:01:45 lr 0.001248 wd 0.0500 time 0.2538 (0.2599) data time 0.0007 (0.0031) model time 0.2531 (0.2565) loss 5.0416 (6.0337) grad_norm 1.3615 (1.8308) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:04:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][230/625] eta 0:01:42 lr 0.001248 wd 0.0500 time 0.2569 (0.2598) data time 0.0008 (0.0030) model time 0.2561 (0.2564) loss 4.8724 (6.0323) grad_norm 1.8966 (1.8412) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:04:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][240/625] eta 0:01:39 lr 0.001248 wd 0.0500 time 0.2605 (0.2596) data time 0.0006 (0.0029) model time 0.2598 (0.2563) loss 4.8220 (6.0156) grad_norm 1.7424 (1.8719) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:04:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][250/625] eta 0:01:37 lr 0.001248 wd 0.0500 time 0.2552 (0.2594) data time 0.0007 (0.0028) model time 0.2544 (0.2563) loss 6.7996 (6.0214) grad_norm 1.9827 (1.8823) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:05:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][260/625] eta 0:01:34 lr 0.001247 wd 0.0500 time 0.2550 (0.2593) data time 0.0009 (0.0027) model time 0.2541 (0.2562) loss 6.1163 (6.0261) grad_norm 2.2070 (1.8860) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:05:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][270/625] eta 0:01:32 lr 0.001247 wd 0.0500 time 0.2549 (0.2592) data time 0.0011 (0.0027) model time 0.2539 (0.2561) loss 5.5882 (6.0294) grad_norm 2.2113 (1.9003) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:05:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][280/625] eta 0:01:29 lr 0.001247 wd 0.0500 time 0.2545 (0.2591) data time 0.0012 (0.0026) model time 0.2533 (0.2561) loss 4.4706 (6.0396) grad_norm 2.0443 (1.9065) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:05:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][290/625] eta 0:01:26 lr 0.001247 wd 0.0500 time 0.2585 (0.2590) data time 0.0007 (0.0026) model time 0.2578 (0.2561) loss 6.3591 (6.0309) grad_norm 1.3077 (1.8972) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:05:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][300/625] eta 0:01:24 lr 0.001247 wd 0.0500 time 0.2610 (0.2590) data time 0.0006 (0.0025) model time 0.2604 (0.2562) loss 6.9875 (6.0366) grad_norm 2.3685 (1.8874) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:05:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][310/625] eta 0:01:21 lr 0.001247 wd 0.0500 time 0.2539 (0.2589) data time 0.0009 (0.0024) model time 0.2530 (0.2561) loss 6.9205 (6.0334) grad_norm 1.2130 (1.8757) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:05:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][320/625] eta 0:01:18 lr 0.001246 wd 0.0500 time 0.2541 (0.2589) data time 0.0010 (0.0024) model time 0.2531 (0.2562) loss 7.0610 (6.0310) grad_norm 2.2001 (1.8702) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:05:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][330/625] eta 0:01:16 lr 0.001246 wd 0.0500 time 0.2569 (0.2588) data time 0.0006 (0.0024) model time 0.2563 (0.2562) loss 6.0058 (6.0317) grad_norm 2.5137 (1.8712) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:05:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][340/625] eta 0:01:13 lr 0.001246 wd 0.0500 time 0.2595 (0.2588) data time 0.0010 (0.0023) model time 0.2585 (0.2563) loss 6.8038 (6.0340) grad_norm 2.7454 (1.8956) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:05:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][350/625] eta 0:01:11 lr 0.001246 wd 0.0500 time 0.2566 (0.2588) data time 0.0006 (0.0023) model time 0.2560 (0.2562) loss 6.1562 (6.0260) grad_norm 2.2746 (1.9002) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:05:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][360/625] eta 0:01:08 lr 0.001246 wd 0.0500 time 0.2606 (0.2588) data time 0.0009 (0.0022) model time 0.2597 (0.2563) loss 6.2256 (6.0342) grad_norm 1.6559 (1.8927) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:05:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][370/625] eta 0:01:06 lr 0.001246 wd 0.0500 time 0.4469 (0.2593) data time 0.0006 (0.0022) model time 0.4464 (0.2569) loss 4.3779 (6.0306) grad_norm 1.9784 (1.8956) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:05:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][380/625] eta 0:01:03 lr 0.001245 wd 0.0500 time 0.2632 (0.2592) data time 0.0006 (0.0022) model time 0.2626 (0.2568) loss 4.9764 (6.0296) grad_norm 1.5565 (1.9071) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:05:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][390/625] eta 0:01:00 lr 0.001245 wd 0.0500 time 0.2545 (0.2591) data time 0.0008 (0.0021) model time 0.2537 (0.2568) loss 5.9519 (6.0368) grad_norm 2.0644 (1.9006) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:05:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][400/625] eta 0:00:58 lr 0.001245 wd 0.0500 time 0.2566 (0.2590) data time 0.0007 (0.0021) model time 0.2559 (0.2567) loss 4.8482 (6.0314) grad_norm 1.2551 (1.9001) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:05:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][410/625] eta 0:00:55 lr 0.001245 wd 0.0500 time 0.2557 (0.2590) data time 0.0009 (0.0021) model time 0.2548 (0.2567) loss 5.3480 (6.0328) grad_norm 2.8078 (1.9044) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:05:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][420/625] eta 0:00:53 lr 0.001245 wd 0.0500 time 0.2573 (0.2589) data time 0.0008 (0.0020) model time 0.2565 (0.2567) loss 6.6812 (6.0299) grad_norm 2.0745 (1.9031) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:05:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][430/625] eta 0:00:50 lr 0.001244 wd 0.0500 time 0.2578 (0.2588) data time 0.0008 (0.0020) model time 0.2571 (0.2566) loss 5.5679 (6.0281) grad_norm 1.8439 (1.9081) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:05:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][440/625] eta 0:00:47 lr 0.001244 wd 0.0500 time 0.2546 (0.2588) data time 0.0013 (0.0020) model time 0.2533 (0.2566) loss 6.5744 (6.0252) grad_norm 1.6685 (1.9126) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:05:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][450/625] eta 0:00:45 lr 0.001244 wd 0.0500 time 0.2547 (0.2587) data time 0.0008 (0.0020) model time 0.2539 (0.2566) loss 6.6429 (6.0296) grad_norm 1.4035 (1.9179) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:05:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][460/625] eta 0:00:42 lr 0.001244 wd 0.0500 time 0.2599 (0.2587) data time 0.0006 (0.0019) model time 0.2593 (0.2566) loss 4.9403 (6.0227) grad_norm 2.4391 (1.9138) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:05:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][470/625] eta 0:00:40 lr 0.001244 wd 0.0500 time 0.2541 (0.2587) data time 0.0009 (0.0019) model time 0.2532 (0.2566) loss 5.3444 (6.0246) grad_norm 2.5902 (1.9111) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:05:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][480/625] eta 0:00:37 lr 0.001244 wd 0.0500 time 0.2536 (0.2586) data time 0.0006 (0.0019) model time 0.2530 (0.2566) loss 6.3183 (6.0225) grad_norm 1.6841 (1.9050) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:06:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][490/625] eta 0:00:34 lr 0.001243 wd 0.0500 time 0.2529 (0.2586) data time 0.0011 (0.0019) model time 0.2518 (0.2565) loss 6.9856 (6.0174) grad_norm 1.6485 (1.9013) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:06:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][500/625] eta 0:00:32 lr 0.001243 wd 0.0500 time 0.2584 (0.2586) data time 0.0008 (0.0019) model time 0.2576 (0.2566) loss 6.7294 (6.0227) grad_norm 1.3730 (1.9004) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:06:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][510/625] eta 0:00:29 lr 0.001243 wd 0.0500 time 0.2601 (0.2586) data time 0.0010 (0.0018) model time 0.2591 (0.2566) loss 5.9470 (6.0249) grad_norm 1.1543 (1.9018) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:06:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][520/625] eta 0:00:27 lr 0.001243 wd 0.0500 time 0.2607 (0.2585) data time 0.0006 (0.0018) model time 0.2601 (0.2566) loss 7.0639 (6.0260) grad_norm 2.3109 (1.8952) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:06:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][530/625] eta 0:00:24 lr 0.001243 wd 0.0500 time 0.2570 (0.2585) data time 0.0006 (0.0018) model time 0.2564 (0.2566) loss 5.3079 (6.0273) grad_norm 2.0837 (1.8921) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:06:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][540/625] eta 0:00:21 lr 0.001243 wd 0.0500 time 0.2563 (0.2585) data time 0.0010 (0.0018) model time 0.2553 (0.2566) loss 4.6773 (6.0253) grad_norm 1.5774 (1.8904) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:06:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][550/625] eta 0:00:19 lr 0.001242 wd 0.0500 time 0.2528 (0.2584) data time 0.0009 (0.0018) model time 0.2519 (0.2565) loss 6.2185 (6.0219) grad_norm 2.2008 (1.8843) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:06:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][560/625] eta 0:00:16 lr 0.001242 wd 0.0500 time 0.2588 (0.2584) data time 0.0006 (0.0018) model time 0.2582 (0.2565) loss 5.8253 (6.0248) grad_norm 2.6272 (1.8956) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:06:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][570/625] eta 0:00:14 lr 0.001242 wd 0.0500 time 0.2573 (0.2584) data time 0.0006 (0.0017) model time 0.2567 (0.2565) loss 6.2138 (6.0201) grad_norm 3.1057 (1.8994) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:06:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][580/625] eta 0:00:11 lr 0.001242 wd 0.0500 time 0.2571 (0.2584) data time 0.0010 (0.0017) model time 0.2561 (0.2565) loss 6.0788 (6.0248) grad_norm 1.7914 (1.8949) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:06:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][590/625] eta 0:00:09 lr 0.001242 wd 0.0500 time 0.2567 (0.2584) data time 0.0008 (0.0017) model time 0.2559 (0.2565) loss 7.5632 (6.0308) grad_norm 2.3646 (1.9144) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:06:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][600/625] eta 0:00:06 lr 0.001242 wd 0.0500 time 0.2583 (0.2583) data time 0.0009 (0.0017) model time 0.2574 (0.2565) loss 5.4055 (6.0290) grad_norm 1.7364 (1.9162) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:06:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][610/625] eta 0:00:03 lr 0.001241 wd 0.0500 time 0.2542 (0.2584) data time 0.0004 (0.0017) model time 0.2538 (0.2565) loss 4.5542 (6.0232) grad_norm 1.5483 (1.9095) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:06:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][620/625] eta 0:00:01 lr 0.001241 wd 0.0500 time 0.2544 (0.2583) data time 0.0005 (0.0017) model time 0.2539 (0.2565) loss 7.0980 (6.0178) grad_norm 1.4939 (1.9042) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:06:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 138 training takes 0:02:41
[2024-08-04 04:06:35 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 04:06:36 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 04:06:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.500 (0.500) Loss 0.6611 (0.6611) Acc@1 87.695 (87.695) Acc@5 98.047 (98.047) Mem 9655MB
[2024-08-04 04:06:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 1.0488 (0.8093) Acc@1 77.344 (83.940) Acc@5 94.727 (96.968) Mem 9655MB
[2024-08-04 04:06:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.1748 (0.9558) Acc@1 73.926 (80.180) Acc@5 92.920 (95.231) Mem 9655MB
[2024-08-04 04:06:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.888 Acc@5 95.210
[2024-08-04 04:06:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.9%
[2024-08-04 04:06:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.89%
[2024-08-04 04:06:38 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-08-04 04:06:38 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-08-04 04:06:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.487 (0.487) Loss 0.5864 (0.5864) Acc@1 88.574 (88.574) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 04:06:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.095) Loss 0.9370 (0.7271) Acc@1 79.199 (85.014) Acc@5 95.459 (97.399) Mem 9655MB
[2024-08-04 04:06:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0771 (0.8613) Acc@1 74.414 (81.417) Acc@5 93.652 (95.810) Mem 9655MB
[2024-08-04 04:06:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.080 Acc@5 95.807
[2024-08-04 04:06:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.1%
[2024-08-04 04:06:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.08%
[2024-08-04 04:06:40 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 04:06:41 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 04:06:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][0/625] eta 0:06:59 lr 0.001241 wd 0.0500 time 0.6706 (0.6706) data time 0.4179 (0.4179) model time 0.0000 (0.0000) loss 6.3033 (6.3033) grad_norm 1.4038 (1.4038) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:06:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][10/625] eta 0:02:59 lr 0.001241 wd 0.0500 time 0.2543 (0.2924) data time 0.0006 (0.0389) model time 0.0000 (0.0000) loss 6.0517 (5.8333) grad_norm 2.2769 (1.5586) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:06:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][20/625] eta 0:02:45 lr 0.001241 wd 0.0500 time 0.2532 (0.2742) data time 0.0007 (0.0209) model time 0.0000 (0.0000) loss 6.3379 (5.7916) grad_norm 2.3844 (1.5315) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:06:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][30/625] eta 0:02:39 lr 0.001241 wd 0.0500 time 0.2561 (0.2684) data time 0.0007 (0.0144) model time 0.0000 (0.0000) loss 5.1396 (5.8192) grad_norm 1.3286 (1.6304) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:06:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][40/625] eta 0:02:35 lr 0.001240 wd 0.0500 time 0.2551 (0.2651) data time 0.0008 (0.0111) model time 0.0000 (0.0000) loss 5.9431 (5.9055) grad_norm 1.2352 (1.6108) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:06:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][50/625] eta 0:02:31 lr 0.001240 wd 0.0500 time 0.2547 (0.2632) data time 0.0011 (0.0091) model time 0.0000 (0.0000) loss 5.0339 (5.9181) grad_norm 1.3403 (1.6138) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:06:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][60/625] eta 0:02:27 lr 0.001240 wd 0.0500 time 0.2531 (0.2618) data time 0.0007 (0.0078) model time 0.2524 (0.2539) loss 6.6748 (5.9053) grad_norm 2.5140 (1.7842) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:06:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][70/625] eta 0:02:24 lr 0.001240 wd 0.0500 time 0.2528 (0.2612) data time 0.0010 (0.0068) model time 0.2518 (0.2551) loss 4.9688 (5.8779) grad_norm 1.6762 (1.7553) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:07:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][80/625] eta 0:02:22 lr 0.001240 wd 0.0500 time 0.2561 (0.2607) data time 0.0009 (0.0061) model time 0.2552 (0.2555) loss 7.0324 (5.9081) grad_norm 1.7448 (1.7258) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:07:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][90/625] eta 0:02:19 lr 0.001240 wd 0.0500 time 0.2552 (0.2602) data time 0.0009 (0.0055) model time 0.2543 (0.2555) loss 6.0390 (5.9449) grad_norm 1.9619 (1.7741) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:07:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][100/625] eta 0:02:16 lr 0.001239 wd 0.0500 time 0.2555 (0.2599) data time 0.0007 (0.0051) model time 0.2548 (0.2555) loss 6.5488 (5.9396) grad_norm 2.0676 (1.7820) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:07:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][110/625] eta 0:02:13 lr 0.001239 wd 0.0500 time 0.2520 (0.2595) data time 0.0007 (0.0047) model time 0.2514 (0.2554) loss 5.6435 (5.9193) grad_norm 2.0497 (1.7845) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:07:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][120/625] eta 0:02:10 lr 0.001239 wd 0.0500 time 0.2508 (0.2592) data time 0.0009 (0.0044) model time 0.2499 (0.2554) loss 6.9839 (5.9455) grad_norm 1.5325 (1.7820) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:07:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][130/625] eta 0:02:08 lr 0.001239 wd 0.0500 time 0.2569 (0.2590) data time 0.0008 (0.0041) model time 0.2561 (0.2553) loss 6.4609 (5.9639) grad_norm 1.6288 (1.7831) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:07:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][140/625] eta 0:02:05 lr 0.001239 wd 0.0500 time 0.2598 (0.2588) data time 0.0006 (0.0039) model time 0.2592 (0.2553) loss 5.2072 (5.9535) grad_norm 1.9463 (1.8123) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:07:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][150/625] eta 0:02:02 lr 0.001239 wd 0.0500 time 0.2536 (0.2586) data time 0.0018 (0.0037) model time 0.2518 (0.2553) loss 7.2554 (5.9786) grad_norm 1.8792 (1.8652) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:07:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][160/625] eta 0:02:00 lr 0.001238 wd 0.0500 time 0.2532 (0.2584) data time 0.0007 (0.0035) model time 0.2525 (0.2552) loss 6.6276 (5.9866) grad_norm 1.8754 (1.8917) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:07:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][170/625] eta 0:01:57 lr 0.001238 wd 0.0500 time 0.2625 (0.2583) data time 0.0007 (0.0034) model time 0.2618 (0.2553) loss 4.8125 (5.9937) grad_norm 5.2843 (1.9236) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:07:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][180/625] eta 0:01:54 lr 0.001238 wd 0.0500 time 0.2576 (0.2583) data time 0.0006 (0.0033) model time 0.2570 (0.2554) loss 4.5783 (5.9738) grad_norm 2.1204 (1.9297) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:07:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][190/625] eta 0:01:52 lr 0.001238 wd 0.0500 time 0.2522 (0.2582) data time 0.0011 (0.0031) model time 0.2511 (0.2555) loss 6.1271 (5.9882) grad_norm 1.4655 (1.9257) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:07:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][200/625] eta 0:01:49 lr 0.001238 wd 0.0500 time 0.2564 (0.2582) data time 0.0008 (0.0030) model time 0.2555 (0.2556) loss 6.9697 (5.9924) grad_norm 1.4039 (1.8987) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:07:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][210/625] eta 0:01:47 lr 0.001237 wd 0.0500 time 0.2592 (0.2581) data time 0.0006 (0.0029) model time 0.2587 (0.2555) loss 4.4236 (5.9858) grad_norm 1.5694 (1.8907) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:07:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][220/625] eta 0:01:44 lr 0.001237 wd 0.0500 time 0.2580 (0.2580) data time 0.0008 (0.0028) model time 0.2572 (0.2555) loss 7.0696 (5.9922) grad_norm 2.1020 (1.8876) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:07:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][230/625] eta 0:01:41 lr 0.001237 wd 0.0500 time 0.2548 (0.2580) data time 0.0008 (0.0028) model time 0.2540 (0.2556) loss 6.1402 (5.9988) grad_norm 1.6694 (1.8987) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:07:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][240/625] eta 0:01:39 lr 0.001237 wd 0.0500 time 0.2552 (0.2579) data time 0.0008 (0.0027) model time 0.2544 (0.2556) loss 5.2796 (5.9852) grad_norm 2.1843 (1.8899) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:07:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][250/625] eta 0:01:36 lr 0.001237 wd 0.0500 time 0.2541 (0.2578) data time 0.0009 (0.0026) model time 0.2532 (0.2555) loss 6.6924 (5.9779) grad_norm 2.1436 (1.8845) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:07:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][260/625] eta 0:01:34 lr 0.001237 wd 0.0500 time 0.2601 (0.2578) data time 0.0008 (0.0025) model time 0.2593 (0.2555) loss 7.1526 (5.9733) grad_norm 1.9981 (1.8880) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:07:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][270/625] eta 0:01:31 lr 0.001236 wd 0.0500 time 0.2530 (0.2577) data time 0.0008 (0.0025) model time 0.2522 (0.2555) loss 4.9985 (5.9682) grad_norm 1.8322 (1.9093) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:07:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][280/625] eta 0:01:29 lr 0.001236 wd 0.0500 time 0.2377 (0.2583) data time 0.0009 (0.0024) model time 0.2367 (0.2563) loss 6.1421 (5.9685) grad_norm 3.3377 (1.9341) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:07:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][290/625] eta 0:01:26 lr 0.001236 wd 0.0500 time 0.2568 (0.2582) data time 0.0013 (0.0024) model time 0.2555 (0.2562) loss 7.6279 (5.9604) grad_norm 1.3953 (1.9277) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:07:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][300/625] eta 0:01:23 lr 0.001236 wd 0.0500 time 0.2754 (0.2582) data time 0.0009 (0.0023) model time 0.2746 (0.2562) loss 6.1566 (5.9608) grad_norm 1.1930 (1.9242) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:08:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][310/625] eta 0:01:21 lr 0.001236 wd 0.0500 time 0.2543 (0.2582) data time 0.0010 (0.0023) model time 0.2533 (0.2563) loss 7.4833 (5.9706) grad_norm 1.4514 (1.9156) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:08:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][320/625] eta 0:01:18 lr 0.001236 wd 0.0500 time 0.2559 (0.2581) data time 0.0010 (0.0022) model time 0.2549 (0.2562) loss 6.4907 (5.9741) grad_norm 2.0051 (1.9120) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:08:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][330/625] eta 0:01:16 lr 0.001235 wd 0.0500 time 0.2559 (0.2587) data time 0.0007 (0.0022) model time 0.2552 (0.2569) loss 6.2277 (5.9734) grad_norm 2.3222 (1.9090) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:08:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][340/625] eta 0:01:13 lr 0.001235 wd 0.0500 time 0.2589 (0.2586) data time 0.0006 (0.0022) model time 0.2584 (0.2568) loss 6.6238 (5.9689) grad_norm 1.2154 (1.9048) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:08:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][350/625] eta 0:01:11 lr 0.001235 wd 0.0500 time 0.2585 (0.2585) data time 0.0008 (0.0021) model time 0.2577 (0.2568) loss 6.8760 (5.9828) grad_norm 1.2728 (1.8944) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:08:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][360/625] eta 0:01:08 lr 0.001235 wd 0.0500 time 0.2551 (0.2584) data time 0.0007 (0.0021) model time 0.2545 (0.2567) loss 4.4388 (5.9801) grad_norm 1.3525 (1.8876) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:08:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][370/625] eta 0:01:06 lr 0.001235 wd 0.0500 time 0.2580 (0.2589) data time 0.0010 (0.0021) model time 0.2571 (0.2573) loss 4.8127 (5.9738) grad_norm 1.2873 (1.8751) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:08:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][380/625] eta 0:01:03 lr 0.001235 wd 0.0500 time 0.2537 (0.2589) data time 0.0007 (0.0020) model time 0.2530 (0.2573) loss 7.0456 (5.9854) grad_norm 1.6655 (1.8748) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:08:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][390/625] eta 0:01:00 lr 0.001234 wd 0.0500 time 0.2581 (0.2589) data time 0.0006 (0.0020) model time 0.2575 (0.2573) loss 7.0423 (5.9896) grad_norm 1.7540 (1.8893) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:08:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][400/625] eta 0:00:58 lr 0.001234 wd 0.0500 time 0.2542 (0.2588) data time 0.0007 (0.0020) model time 0.2535 (0.2572) loss 4.5896 (5.9790) grad_norm 2.4987 (1.9068) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:08:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][410/625] eta 0:00:55 lr 0.001234 wd 0.0500 time 0.2535 (0.2587) data time 0.0008 (0.0020) model time 0.2527 (0.2571) loss 5.9513 (5.9806) grad_norm 2.1893 (1.9154) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:08:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][420/625] eta 0:00:53 lr 0.001234 wd 0.0500 time 0.2555 (0.2587) data time 0.0010 (0.0019) model time 0.2544 (0.2571) loss 5.8800 (5.9916) grad_norm 1.4928 (1.9053) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:08:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][430/625] eta 0:00:50 lr 0.001234 wd 0.0500 time 0.2546 (0.2586) data time 0.0007 (0.0019) model time 0.2539 (0.2571) loss 6.7124 (5.9892) grad_norm 1.0891 (1.9037) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:08:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][440/625] eta 0:00:47 lr 0.001233 wd 0.0500 time 0.2598 (0.2585) data time 0.0007 (0.0019) model time 0.2591 (0.2570) loss 4.8264 (5.9966) grad_norm 1.9744 (1.9061) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:08:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][450/625] eta 0:00:45 lr 0.001233 wd 0.0500 time 0.2544 (0.2585) data time 0.0013 (0.0019) model time 0.2531 (0.2569) loss 6.5884 (5.9947) grad_norm 1.7394 (1.9009) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:08:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][460/625] eta 0:00:42 lr 0.001233 wd 0.0500 time 0.2602 (0.2584) data time 0.0008 (0.0019) model time 0.2594 (0.2569) loss 6.2704 (5.9978) grad_norm 1.7632 (1.8956) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:08:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][470/625] eta 0:00:40 lr 0.001233 wd 0.0500 time 0.2521 (0.2584) data time 0.0009 (0.0018) model time 0.2512 (0.2568) loss 5.2559 (5.9975) grad_norm 1.5885 (1.8898) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:08:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][480/625] eta 0:00:37 lr 0.001233 wd 0.0500 time 0.2527 (0.2583) data time 0.0010 (0.0018) model time 0.2518 (0.2568) loss 4.7075 (6.0017) grad_norm 1.4601 (1.8848) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:08:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][490/625] eta 0:00:34 lr 0.001233 wd 0.0500 time 0.2516 (0.2583) data time 0.0009 (0.0018) model time 0.2507 (0.2567) loss 6.4985 (6.0071) grad_norm 1.7708 (1.8839) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:08:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][500/625] eta 0:00:32 lr 0.001232 wd 0.0500 time 0.2518 (0.2582) data time 0.0006 (0.0018) model time 0.2511 (0.2567) loss 7.1118 (6.0120) grad_norm 2.8232 (1.8900) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:08:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][510/625] eta 0:00:29 lr 0.001232 wd 0.0500 time 0.2518 (0.2582) data time 0.0010 (0.0018) model time 0.2509 (0.2567) loss 6.0531 (6.0086) grad_norm 3.0456 (1.8984) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:08:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][520/625] eta 0:00:27 lr 0.001232 wd 0.0500 time 0.2544 (0.2582) data time 0.0009 (0.0018) model time 0.2535 (0.2566) loss 5.9743 (6.0029) grad_norm 1.5272 (1.8966) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:08:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][530/625] eta 0:00:24 lr 0.001232 wd 0.0500 time 0.2587 (0.2581) data time 0.0016 (0.0017) model time 0.2571 (0.2566) loss 5.8036 (6.0054) grad_norm 1.5936 (1.8990) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:09:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][540/625] eta 0:00:21 lr 0.001232 wd 0.0500 time 0.2581 (0.2581) data time 0.0006 (0.0017) model time 0.2575 (0.2566) loss 5.0803 (6.0065) grad_norm 1.8731 (1.9088) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:09:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][550/625] eta 0:00:19 lr 0.001232 wd 0.0500 time 0.2571 (0.2581) data time 0.0007 (0.0017) model time 0.2563 (0.2566) loss 6.5223 (6.0120) grad_norm 2.0341 (1.9187) loss_scale 4096.0000 (2077.7350) mem 9655MB
[2024-08-04 04:09:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][560/625] eta 0:00:16 lr 0.001231 wd 0.0500 time 0.2539 (0.2581) data time 0.0007 (0.0017) model time 0.2532 (0.2566) loss 5.6005 (6.0138) grad_norm 1.6036 (1.9219) loss_scale 4096.0000 (2113.7112) mem 9655MB
[2024-08-04 04:09:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][570/625] eta 0:00:14 lr 0.001231 wd 0.0500 time 0.2567 (0.2580) data time 0.0008 (0.0017) model time 0.2559 (0.2566) loss 7.0683 (6.0215) grad_norm 1.6963 (1.9195) loss_scale 4096.0000 (2148.4273) mem 9655MB
[2024-08-04 04:09:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][580/625] eta 0:00:11 lr 0.001231 wd 0.0500 time 0.2632 (0.2580) data time 0.0008 (0.0017) model time 0.2624 (0.2565) loss 5.5880 (6.0245) grad_norm 2.6508 (1.9158) loss_scale 4096.0000 (2181.9484) mem 9655MB
[2024-08-04 04:09:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][590/625] eta 0:00:09 lr 0.001231 wd 0.0500 time 0.2603 (0.2580) data time 0.0005 (0.0017) model time 0.2598 (0.2565) loss 5.3422 (6.0240) grad_norm 2.9229 (1.9194) loss_scale 4096.0000 (2214.3350) mem 9655MB
[2024-08-04 04:09:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][600/625] eta 0:00:06 lr 0.001231 wd 0.0500 time 0.2577 (0.2580) data time 0.0010 (0.0017) model time 0.2567 (0.2565) loss 6.8426 (6.0267) grad_norm 1.5947 (1.9201) loss_scale 4096.0000 (2245.6439) mem 9655MB
[2024-08-04 04:09:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][610/625] eta 0:00:03 lr 0.001231 wd 0.0500 time 0.2505 (0.2580) data time 0.0004 (0.0016) model time 0.2501 (0.2565) loss 6.3281 (6.0359) grad_norm 1.2145 (1.9179) loss_scale 4096.0000 (2275.9280) mem 9655MB
[2024-08-04 04:09:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][620/625] eta 0:00:01 lr 0.001230 wd 0.0500 time 0.2533 (0.2579) data time 0.0003 (0.0016) model time 0.2530 (0.2565) loss 5.9948 (6.0339) grad_norm 1.5189 (1.9198) loss_scale 4096.0000 (2305.2367) mem 9655MB
[2024-08-04 04:09:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 139 training takes 0:02:41
[2024-08-04 04:09:22 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 04:09:22 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 04:09:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.518 (0.518) Loss 0.6528 (0.6528) Acc@1 87.842 (87.842) Acc@5 98.047 (98.047) Mem 9655MB
[2024-08-04 04:09:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.099) Loss 1.0352 (0.7971) Acc@1 76.758 (83.714) Acc@5 94.824 (96.884) Mem 9655MB
[2024-08-04 04:09:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.2070 (0.9461) Acc@1 72.998 (79.964) Acc@5 92.041 (95.213) Mem 9655MB
[2024-08-04 04:09:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.752 Acc@5 95.226
[2024-08-04 04:09:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.8%
[2024-08-04 04:09:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.783 (0.783) Loss 0.5874 (0.5874) Acc@1 88.623 (88.623) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 04:09:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.129) Loss 0.9380 (0.7275) Acc@1 79.346 (85.063) Acc@5 95.459 (97.425) Mem 9655MB
[2024-08-04 04:09:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.094) Loss 1.0771 (0.8614) Acc@1 74.414 (81.441) Acc@5 93.604 (95.819) Mem 9655MB
[2024-08-04 04:09:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.112 Acc@5 95.809
[2024-08-04 04:09:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.1%
[2024-08-04 04:09:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.11%
[2024-08-04 04:09:26 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 04:09:27 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 04:09:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][0/625] eta 0:07:15 lr 0.001230 wd 0.0500 time 0.6970 (0.6970) data time 0.4548 (0.4548) model time 0.0000 (0.0000) loss 6.0253 (6.0253) grad_norm 3.3460 (3.3460) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:09:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][10/625] eta 0:03:01 lr 0.001230 wd 0.0500 time 0.2543 (0.2952) data time 0.0007 (0.0422) model time 0.0000 (0.0000) loss 6.9836 (6.0523) grad_norm 2.6437 (2.2572) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:09:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][20/625] eta 0:02:47 lr 0.001230 wd 0.0500 time 0.2598 (0.2769) data time 0.0010 (0.0225) model time 0.0000 (0.0000) loss 6.5193 (6.0488) grad_norm 1.4413 (2.0462) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:09:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][30/625] eta 0:02:40 lr 0.001230 wd 0.0500 time 0.2591 (0.2702) data time 0.0010 (0.0155) model time 0.0000 (0.0000) loss 6.1307 (6.0326) grad_norm 3.2002 (1.9897) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:09:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][40/625] eta 0:02:35 lr 0.001230 wd 0.0500 time 0.2549 (0.2663) data time 0.0008 (0.0120) model time 0.0000 (0.0000) loss 5.6514 (6.0310) grad_norm 1.2674 (2.0206) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:09:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][50/625] eta 0:02:31 lr 0.001229 wd 0.0500 time 0.2578 (0.2641) data time 0.0008 (0.0098) model time 0.0000 (0.0000) loss 6.2322 (6.0788) grad_norm 3.6615 (2.0669) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:09:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][60/625] eta 0:02:28 lr 0.001229 wd 0.0500 time 0.2582 (0.2626) data time 0.0006 (0.0083) model time 0.2576 (0.2543) loss 7.4366 (6.0386) grad_norm 1.3626 (1.9894) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:09:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][70/625] eta 0:02:25 lr 0.001229 wd 0.0500 time 0.2570 (0.2617) data time 0.0010 (0.0073) model time 0.2560 (0.2546) loss 6.7373 (6.0119) grad_norm 1.6738 (2.0067) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:09:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][80/625] eta 0:02:22 lr 0.001229 wd 0.0500 time 0.2568 (0.2610) data time 0.0008 (0.0065) model time 0.2560 (0.2548) loss 4.5427 (6.0114) grad_norm 1.9659 (1.9933) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:09:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][90/625] eta 0:02:19 lr 0.001229 wd 0.0500 time 0.2571 (0.2606) data time 0.0006 (0.0059) model time 0.2565 (0.2553) loss 5.8102 (6.0280) grad_norm 2.5884 (1.9580) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:09:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][100/625] eta 0:02:16 lr 0.001229 wd 0.0500 time 0.2535 (0.2601) data time 0.0009 (0.0054) model time 0.2526 (0.2551) loss 6.8617 (6.0061) grad_norm 1.4945 (1.9566) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:09:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][110/625] eta 0:02:13 lr 0.001228 wd 0.0500 time 0.2485 (0.2597) data time 0.0011 (0.0050) model time 0.2474 (0.2549) loss 6.6884 (5.9880) grad_norm 1.6923 (1.9773) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:09:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][120/625] eta 0:02:10 lr 0.001228 wd 0.0500 time 0.2583 (0.2594) data time 0.0009 (0.0047) model time 0.2575 (0.2550) loss 6.3125 (5.9936) grad_norm 1.4557 (1.9607) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:10:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][130/625] eta 0:02:08 lr 0.001228 wd 0.0500 time 0.2579 (0.2592) data time 0.0006 (0.0044) model time 0.2573 (0.2551) loss 4.9351 (6.0000) grad_norm 2.0590 (1.9553) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:10:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][140/625] eta 0:02:05 lr 0.001228 wd 0.0500 time 0.2561 (0.2589) data time 0.0009 (0.0042) model time 0.2552 (0.2549) loss 4.9359 (5.9912) grad_norm 1.7063 (1.9858) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:10:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][150/625] eta 0:02:02 lr 0.001228 wd 0.0500 time 0.2516 (0.2587) data time 0.0010 (0.0040) model time 0.2506 (0.2549) loss 5.3618 (5.9698) grad_norm 1.5971 (2.0510) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:10:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][160/625] eta 0:02:00 lr 0.001228 wd 0.0500 time 0.2569 (0.2586) data time 0.0008 (0.0038) model time 0.2561 (0.2550) loss 5.9016 (5.9575) grad_norm 2.4285 (2.0663) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:10:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][170/625] eta 0:01:57 lr 0.001227 wd 0.0500 time 0.2589 (0.2584) data time 0.0008 (0.0036) model time 0.2582 (0.2550) loss 6.8112 (5.9579) grad_norm 1.3739 (2.0481) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:10:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][180/625] eta 0:01:54 lr 0.001227 wd 0.0500 time 0.2569 (0.2583) data time 0.0007 (0.0035) model time 0.2562 (0.2550) loss 5.1844 (5.9533) grad_norm 1.3115 (2.0165) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:10:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][190/625] eta 0:01:52 lr 0.001227 wd 0.0500 time 0.2575 (0.2581) data time 0.0007 (0.0033) model time 0.2569 (0.2550) loss 6.7106 (5.9574) grad_norm 1.9680 (2.0010) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:10:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][200/625] eta 0:01:49 lr 0.001227 wd 0.0500 time 0.2547 (0.2580) data time 0.0007 (0.0032) model time 0.2540 (0.2550) loss 5.9242 (5.9496) grad_norm 1.7588 (2.0164) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:10:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][210/625] eta 0:01:47 lr 0.001227 wd 0.0500 time 0.2538 (0.2580) data time 0.0011 (0.0031) model time 0.2527 (0.2550) loss 4.8105 (5.9349) grad_norm 1.3577 (1.9989) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:10:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][220/625] eta 0:01:44 lr 0.001226 wd 0.0500 time 0.2568 (0.2579) data time 0.0007 (0.0030) model time 0.2560 (0.2551) loss 6.6379 (5.9566) grad_norm 1.5798 (1.9950) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:10:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][230/625] eta 0:01:41 lr 0.001226 wd 0.0500 time 0.2551 (0.2580) data time 0.0007 (0.0029) model time 0.2544 (0.2552) loss 7.3058 (5.9673) grad_norm 2.4708 (2.0097) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:10:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][240/625] eta 0:01:39 lr 0.001226 wd 0.0500 time 0.2598 (0.2579) data time 0.0008 (0.0028) model time 0.2591 (0.2552) loss 5.9753 (5.9692) grad_norm 1.6387 (2.0148) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:10:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][250/625] eta 0:01:36 lr 0.001226 wd 0.0500 time 0.2535 (0.2578) data time 0.0009 (0.0028) model time 0.2526 (0.2552) loss 6.6761 (5.9657) grad_norm 2.1232 (2.0154) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:10:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][260/625] eta 0:01:34 lr 0.001226 wd 0.0500 time 0.2615 (0.2584) data time 0.0007 (0.0027) model time 0.2608 (0.2560) loss 6.0337 (5.9677) grad_norm 2.1277 (2.0249) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:10:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][270/625] eta 0:01:31 lr 0.001226 wd 0.0500 time 0.2551 (0.2583) data time 0.0008 (0.0026) model time 0.2543 (0.2560) loss 5.6207 (5.9858) grad_norm 1.9290 (2.0117) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:10:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][280/625] eta 0:01:29 lr 0.001225 wd 0.0500 time 0.2580 (0.2583) data time 0.0007 (0.0026) model time 0.2573 (0.2561) loss 7.1244 (5.9848) grad_norm 1.6874 (1.9978) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:10:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][290/625] eta 0:01:26 lr 0.001225 wd 0.0500 time 0.2537 (0.2583) data time 0.0007 (0.0025) model time 0.2530 (0.2561) loss 5.0724 (5.9956) grad_norm 1.7036 (1.9998) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:10:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][300/625] eta 0:01:23 lr 0.001225 wd 0.0500 time 0.2551 (0.2582) data time 0.0008 (0.0025) model time 0.2542 (0.2561) loss 6.6974 (5.9964) grad_norm 1.8132 (1.9909) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:10:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][310/625] eta 0:01:21 lr 0.001225 wd 0.0500 time 0.2548 (0.2582) data time 0.0009 (0.0024) model time 0.2539 (0.2560) loss 6.1206 (5.9920) grad_norm 2.0513 (1.9965) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:10:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][320/625] eta 0:01:18 lr 0.001225 wd 0.0500 time 0.2580 (0.2581) data time 0.0008 (0.0024) model time 0.2572 (0.2560) loss 5.8046 (5.9817) grad_norm 2.7581 (1.9962) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:10:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][330/625] eta 0:01:16 lr 0.001225 wd 0.0500 time 0.2519 (0.2580) data time 0.0006 (0.0023) model time 0.2513 (0.2560) loss 6.2424 (5.9844) grad_norm 2.6846 (2.0033) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:10:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][340/625] eta 0:01:13 lr 0.001224 wd 0.0500 time 0.2559 (0.2580) data time 0.0010 (0.0023) model time 0.2549 (0.2560) loss 6.8928 (5.9866) grad_norm 1.2499 (1.9897) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:10:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][350/625] eta 0:01:10 lr 0.001224 wd 0.0500 time 0.2548 (0.2580) data time 0.0012 (0.0022) model time 0.2537 (0.2560) loss 5.4211 (5.9928) grad_norm 2.5648 (1.9915) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:11:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][360/625] eta 0:01:08 lr 0.001224 wd 0.0500 time 0.2561 (0.2579) data time 0.0008 (0.0022) model time 0.2554 (0.2559) loss 6.0570 (5.9864) grad_norm 1.4936 (1.9857) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:11:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][370/625] eta 0:01:05 lr 0.001224 wd 0.0500 time 0.2508 (0.2578) data time 0.0008 (0.0022) model time 0.2500 (0.2559) loss 6.0099 (5.9845) grad_norm 2.2299 (1.9777) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:11:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][380/625] eta 0:01:03 lr 0.001224 wd 0.0500 time 0.2580 (0.2579) data time 0.0009 (0.0021) model time 0.2571 (0.2559) loss 6.7083 (5.9821) grad_norm 1.6886 (1.9765) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:11:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][390/625] eta 0:01:00 lr 0.001224 wd 0.0500 time 0.2572 (0.2578) data time 0.0007 (0.0021) model time 0.2565 (0.2559) loss 4.8420 (5.9880) grad_norm 1.7763 (1.9804) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:11:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][400/625] eta 0:00:58 lr 0.001223 wd 0.0500 time 0.2561 (0.2578) data time 0.0010 (0.0021) model time 0.2551 (0.2559) loss 4.6554 (5.9874) grad_norm 1.6188 (1.9876) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:11:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][410/625] eta 0:00:55 lr 0.001223 wd 0.0500 time 0.2538 (0.2578) data time 0.0008 (0.0021) model time 0.2531 (0.2559) loss 6.3385 (5.9821) grad_norm 2.0855 (1.9826) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:11:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][420/625] eta 0:00:52 lr 0.001223 wd 0.0500 time 0.2574 (0.2577) data time 0.0018 (0.0020) model time 0.2556 (0.2559) loss 5.7027 (5.9813) grad_norm 2.6935 (1.9872) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:11:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][430/625] eta 0:00:50 lr 0.001223 wd 0.0500 time 0.2559 (0.2577) data time 0.0010 (0.0020) model time 0.2549 (0.2559) loss 5.6205 (5.9862) grad_norm 1.8647 (1.9934) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:11:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][440/625] eta 0:00:47 lr 0.001223 wd 0.0500 time 0.2552 (0.2577) data time 0.0013 (0.0020) model time 0.2539 (0.2559) loss 6.7191 (5.9853) grad_norm 1.2297 (1.9898) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:11:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][450/625] eta 0:00:45 lr 0.001222 wd 0.0500 time 0.2601 (0.2577) data time 0.0009 (0.0020) model time 0.2592 (0.2559) loss 6.2158 (5.9905) grad_norm 1.5018 (1.9759) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:11:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][460/625] eta 0:00:42 lr 0.001222 wd 0.0500 time 0.2589 (0.2576) data time 0.0006 (0.0019) model time 0.2584 (0.2559) loss 6.2771 (5.9917) grad_norm 1.5105 (1.9661) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:11:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][470/625] eta 0:00:39 lr 0.001222 wd 0.0500 time 0.2543 (0.2576) data time 0.0008 (0.0019) model time 0.2535 (0.2559) loss 5.5708 (5.9931) grad_norm 1.4685 (1.9634) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:11:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][480/625] eta 0:00:37 lr 0.001222 wd 0.0500 time 0.2576 (0.2576) data time 0.0008 (0.0019) model time 0.2568 (0.2559) loss 6.6568 (5.9900) grad_norm 2.2164 (1.9582) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:11:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][490/625] eta 0:00:34 lr 0.001222 wd 0.0500 time 0.2548 (0.2576) data time 0.0008 (0.0019) model time 0.2540 (0.2558) loss 4.8889 (5.9937) grad_norm 2.0464 (1.9635) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:11:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][500/625] eta 0:00:32 lr 0.001222 wd 0.0500 time 0.2687 (0.2575) data time 0.0007 (0.0019) model time 0.2680 (0.2558) loss 6.6965 (6.0038) grad_norm 1.9577 (1.9589) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:11:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][510/625] eta 0:00:29 lr 0.001221 wd 0.0500 time 0.2589 (0.2575) data time 0.0008 (0.0018) model time 0.2581 (0.2558) loss 4.6913 (6.0014) grad_norm 1.1261 (1.9527) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:11:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][520/625] eta 0:00:27 lr 0.001221 wd 0.0500 time 0.2566 (0.2575) data time 0.0011 (0.0018) model time 0.2555 (0.2558) loss 6.4906 (6.0028) grad_norm 2.3241 (1.9498) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:11:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][530/625] eta 0:00:24 lr 0.001221 wd 0.0500 time 0.2591 (0.2575) data time 0.0008 (0.0018) model time 0.2583 (0.2558) loss 6.1491 (6.0090) grad_norm 1.9274 (1.9561) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:11:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][540/625] eta 0:00:21 lr 0.001221 wd 0.0500 time 0.2585 (0.2574) data time 0.0008 (0.0018) model time 0.2577 (0.2558) loss 4.7699 (6.0081) grad_norm 1.3452 (1.9637) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:11:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][550/625] eta 0:00:19 lr 0.001221 wd 0.0500 time 0.2580 (0.2574) data time 0.0009 (0.0018) model time 0.2571 (0.2558) loss 6.9657 (6.0127) grad_norm 2.2007 (1.9700) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:11:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][560/625] eta 0:00:16 lr 0.001221 wd 0.0500 time 0.2575 (0.2574) data time 0.0007 (0.0018) model time 0.2568 (0.2558) loss 7.1994 (6.0137) grad_norm 1.8667 (1.9743) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:11:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][570/625] eta 0:00:14 lr 0.001220 wd 0.0500 time 0.2616 (0.2574) data time 0.0006 (0.0017) model time 0.2610 (0.2558) loss 8.1472 (6.0177) grad_norm 2.4579 (1.9763) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:11:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][580/625] eta 0:00:11 lr 0.001220 wd 0.0500 time 0.2552 (0.2574) data time 0.0007 (0.0017) model time 0.2546 (0.2558) loss 6.9568 (6.0167) grad_norm 3.3663 (1.9759) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:11:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][590/625] eta 0:00:09 lr 0.001220 wd 0.0500 time 0.2605 (0.2574) data time 0.0008 (0.0017) model time 0.2597 (0.2558) loss 4.5916 (6.0202) grad_norm 1.9048 (1.9925) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:12:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][600/625] eta 0:00:06 lr 0.001220 wd 0.0500 time 0.2691 (0.2574) data time 0.0008 (0.0017) model time 0.2683 (0.2558) loss 6.2155 (6.0208) grad_norm 2.8337 (1.9934) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:12:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][610/625] eta 0:00:03 lr 0.001220 wd 0.0500 time 0.2526 (0.2574) data time 0.0006 (0.0017) model time 0.2520 (0.2558) loss 5.1547 (6.0205) grad_norm 1.8937 (1.9931) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:12:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][620/625] eta 0:00:01 lr 0.001220 wd 0.0500 time 0.2519 (0.2573) data time 0.0004 (0.0017) model time 0.2516 (0.2558) loss 6.5641 (6.0184) grad_norm 2.4396 (1.9908) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:12:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 140 training takes 0:02:40
[2024-08-04 04:12:08 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 04:12:08 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 04:12:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.502 (0.502) Loss 0.6836 (0.6836) Acc@1 87.988 (87.988) Acc@5 97.949 (97.949) Mem 9655MB
[2024-08-04 04:12:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 1.0723 (0.8183) Acc@1 78.076 (83.940) Acc@5 94.727 (97.013) Mem 9655MB
[2024-08-04 04:12:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.2344 (0.9641) Acc@1 72.900 (80.299) Acc@5 92.432 (95.268) Mem 9655MB
[2024-08-04 04:12:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.966 Acc@5 95.216
[2024-08-04 04:12:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.0%
[2024-08-04 04:12:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.97%
[2024-08-04 04:12:10 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-08-04 04:12:10 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-08-04 04:12:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.500 (0.500) Loss 0.5874 (0.5874) Acc@1 88.818 (88.818) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 04:12:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 0.9375 (0.7275) Acc@1 79.541 (85.121) Acc@5 95.557 (97.425) Mem 9655MB
[2024-08-04 04:12:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0781 (0.8613) Acc@1 74.365 (81.485) Acc@5 93.750 (95.833) Mem 9655MB
[2024-08-04 04:12:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.152 Acc@5 95.827
[2024-08-04 04:12:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.2%
[2024-08-04 04:12:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.15%
[2024-08-04 04:12:12 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 04:12:13 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 04:12:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][0/625] eta 0:07:32 lr 0.001219 wd 0.0500 time 0.7246 (0.7246) data time 0.4837 (0.4837) model time 0.0000 (0.0000) loss 6.6316 (6.6316) grad_norm 1.8843 (1.8843) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:12:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][10/625] eta 0:03:15 lr 0.001219 wd 0.0500 time 0.2575 (0.3172) data time 0.0009 (0.0447) model time 0.0000 (0.0000) loss 6.8256 (6.4806) grad_norm 1.8675 (2.4015) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:12:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][20/625] eta 0:02:53 lr 0.001219 wd 0.0500 time 0.2526 (0.2874) data time 0.0008 (0.0238) model time 0.0000 (0.0000) loss 4.7874 (6.0241) grad_norm 1.5392 (2.3290) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:12:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][30/625] eta 0:02:44 lr 0.001219 wd 0.0500 time 0.2528 (0.2770) data time 0.0010 (0.0164) model time 0.0000 (0.0000) loss 6.1397 (5.9926) grad_norm 1.1906 (2.0766) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:12:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][40/625] eta 0:02:39 lr 0.001219 wd 0.0500 time 0.2560 (0.2718) data time 0.0009 (0.0127) model time 0.0000 (0.0000) loss 5.2407 (6.0227) grad_norm 2.4176 (2.0014) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:12:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][50/625] eta 0:02:36 lr 0.001219 wd 0.0500 time 0.2561 (0.2729) data time 0.0009 (0.0104) model time 0.0000 (0.0000) loss 5.6157 (5.9939) grad_norm 1.2017 (1.9778) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:12:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][60/625] eta 0:02:32 lr 0.001218 wd 0.0500 time 0.2557 (0.2701) data time 0.0007 (0.0088) model time 0.2550 (0.2552) loss 6.3657 (6.0173) grad_norm 2.0606 (1.9434) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:12:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][70/625] eta 0:02:28 lr 0.001218 wd 0.0500 time 0.2505 (0.2682) data time 0.0010 (0.0077) model time 0.2495 (0.2553) loss 5.0501 (5.9831) grad_norm 2.0016 (1.9449) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:12:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][80/625] eta 0:02:25 lr 0.001218 wd 0.0500 time 0.2553 (0.2666) data time 0.0009 (0.0069) model time 0.2545 (0.2549) loss 5.0711 (6.0075) grad_norm 1.9053 (1.9488) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:12:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][90/625] eta 0:02:22 lr 0.001218 wd 0.0500 time 0.2578 (0.2656) data time 0.0006 (0.0062) model time 0.2572 (0.2553) loss 6.4384 (6.0005) grad_norm 1.7433 (2.0147) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:12:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][100/625] eta 0:02:18 lr 0.001218 wd 0.0500 time 0.2542 (0.2645) data time 0.0009 (0.0057) model time 0.2533 (0.2551) loss 6.7520 (6.0078) grad_norm 2.9170 (2.0138) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:12:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][110/625] eta 0:02:15 lr 0.001218 wd 0.0500 time 0.2558 (0.2638) data time 0.0008 (0.0053) model time 0.2550 (0.2551) loss 5.9214 (6.0276) grad_norm 1.2570 (2.0001) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:12:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][120/625] eta 0:02:12 lr 0.001217 wd 0.0500 time 0.2555 (0.2632) data time 0.0008 (0.0049) model time 0.2547 (0.2551) loss 6.1968 (6.0260) grad_norm 2.0213 (1.9946) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:12:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][130/625] eta 0:02:09 lr 0.001217 wd 0.0500 time 0.2610 (0.2626) data time 0.0007 (0.0046) model time 0.2602 (0.2551) loss 5.2826 (6.0538) grad_norm 1.1993 (1.9900) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:12:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][140/625] eta 0:02:07 lr 0.001217 wd 0.0500 time 0.2572 (0.2621) data time 0.0007 (0.0044) model time 0.2565 (0.2551) loss 5.9369 (6.0365) grad_norm 1.3497 (1.9772) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:12:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][150/625] eta 0:02:04 lr 0.001217 wd 0.0500 time 0.2558 (0.2617) data time 0.0008 (0.0041) model time 0.2550 (0.2550) loss 6.2821 (6.0358) grad_norm 1.5440 (2.0115) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:12:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][160/625] eta 0:02:01 lr 0.001217 wd 0.0500 time 0.2577 (0.2623) data time 0.0006 (0.0039) model time 0.2570 (0.2565) loss 6.0188 (6.0212) grad_norm 1.9731 (2.0091) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:12:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][170/625] eta 0:01:59 lr 0.001216 wd 0.0500 time 0.2532 (0.2621) data time 0.0007 (0.0038) model time 0.2525 (0.2565) loss 5.4508 (6.0142) grad_norm 1.7212 (2.0511) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:13:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][180/625] eta 0:01:56 lr 0.001216 wd 0.0500 time 0.2602 (0.2617) data time 0.0007 (0.0036) model time 0.2595 (0.2564) loss 6.4223 (6.0041) grad_norm 1.5547 (2.0493) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:13:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][190/625] eta 0:01:53 lr 0.001216 wd 0.0500 time 0.2488 (0.2614) data time 0.0008 (0.0035) model time 0.2480 (0.2563) loss 5.6558 (5.9990) grad_norm inf (inf) loss_scale 2048.0000 (4085.2775) mem 9655MB
[2024-08-04 04:13:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][200/625] eta 0:01:50 lr 0.001216 wd 0.0500 time 0.2587 (0.2612) data time 0.0008 (0.0033) model time 0.2579 (0.2562) loss 5.8723 (5.9881) grad_norm 2.1601 (inf) loss_scale 2048.0000 (3983.9204) mem 9655MB
[2024-08-04 04:13:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][210/625] eta 0:01:48 lr 0.001216 wd 0.0500 time 0.2554 (0.2609) data time 0.0010 (0.0032) model time 0.2544 (0.2562) loss 6.1560 (5.9766) grad_norm 2.3176 (inf) loss_scale 2048.0000 (3892.1706) mem 9655MB
[2024-08-04 04:13:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][220/625] eta 0:01:45 lr 0.001216 wd 0.0500 time 0.2569 (0.2608) data time 0.0005 (0.0031) model time 0.2564 (0.2562) loss 4.5606 (5.9713) grad_norm 1.6764 (inf) loss_scale 2048.0000 (3808.7240) mem 9655MB
[2024-08-04 04:13:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][230/625] eta 0:01:42 lr 0.001215 wd 0.0500 time 0.2546 (0.2605) data time 0.0008 (0.0030) model time 0.2538 (0.2561) loss 4.8841 (5.9854) grad_norm 2.6522 (inf) loss_scale 2048.0000 (3732.5022) mem 9655MB
[2024-08-04 04:13:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][240/625] eta 0:01:40 lr 0.001215 wd 0.0500 time 0.2562 (0.2603) data time 0.0009 (0.0029) model time 0.2554 (0.2560) loss 7.1124 (5.9799) grad_norm 2.3866 (inf) loss_scale 2048.0000 (3662.6058) mem 9655MB
[2024-08-04 04:13:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][250/625] eta 0:01:37 lr 0.001215 wd 0.0500 time 0.2596 (0.2601) data time 0.0006 (0.0029) model time 0.2590 (0.2559) loss 6.3436 (5.9815) grad_norm 1.9324 (inf) loss_scale 2048.0000 (3598.2789) mem 9655MB
[2024-08-04 04:13:21
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][260/625] eta 0:01:34 lr 0.001215 wd 0.0500 time 0.2560 (0.2600) data time 0.0009 (0.0028) model time 0.2550 (0.2559) loss 4.0598 (5.9865) grad_norm 1.8510 (inf) loss_scale 2048.0000 (3538.8812) mem 9655MB [2024-08-04 04:13:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][270/625] eta 0:01:32 lr 0.001215 wd 0.0500 time 0.2565 (0.2599) data time 0.0011 (0.0027) model time 0.2554 (0.2559) loss 5.1672 (5.9822) grad_norm 1.9802 (inf) loss_scale 2048.0000 (3483.8672) mem 9655MB [2024-08-04 04:13:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][280/625] eta 0:01:29 lr 0.001215 wd 0.0500 time 0.2582 (0.2598) data time 0.0008 (0.0027) model time 0.2574 (0.2559) loss 5.4359 (5.9813) grad_norm 2.3174 (inf) loss_scale 2048.0000 (3432.7687) mem 9655MB [2024-08-04 04:13:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][290/625] eta 0:01:26 lr 0.001214 wd 0.0500 time 0.2624 (0.2596) data time 0.0006 (0.0026) model time 0.2618 (0.2559) loss 6.0461 (5.9874) grad_norm 1.5678 (inf) loss_scale 2048.0000 (3385.1821) mem 9655MB [2024-08-04 04:13:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][300/625] eta 0:01:24 lr 0.001214 wd 0.0500 time 0.2541 (0.2595) data time 0.0008 (0.0025) model time 0.2533 (0.2559) loss 6.8771 (6.0019) grad_norm 1.4550 (inf) loss_scale 2048.0000 (3340.7575) mem 9655MB [2024-08-04 04:13:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][310/625] eta 0:01:21 lr 0.001214 wd 0.0500 time 0.2600 (0.2594) data time 0.0007 (0.0025) model time 0.2592 (0.2559) loss 6.4170 (5.9942) grad_norm 1.7928 (inf) loss_scale 2048.0000 (3299.1897) mem 9655MB [2024-08-04 04:13:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][320/625] eta 0:01:19 lr 0.001214 wd 0.0500 time 0.2545 (0.2594) data time 0.0010 (0.0024) model 
time 0.2535 (0.2559) loss 5.3257 (5.9994) grad_norm 2.0084 (inf) loss_scale 2048.0000 (3260.2118) mem 9655MB [2024-08-04 04:13:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][330/625] eta 0:01:16 lr 0.001214 wd 0.0500 time 0.2555 (0.2593) data time 0.0010 (0.0024) model time 0.2545 (0.2558) loss 6.3457 (5.9894) grad_norm 1.4094 (inf) loss_scale 2048.0000 (3223.5891) mem 9655MB [2024-08-04 04:13:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][340/625] eta 0:01:13 lr 0.001214 wd 0.0500 time 0.2523 (0.2591) data time 0.0010 (0.0024) model time 0.2513 (0.2558) loss 6.0333 (5.9901) grad_norm 2.3790 (inf) loss_scale 2048.0000 (3189.1144) mem 9655MB [2024-08-04 04:13:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][350/625] eta 0:01:11 lr 0.001213 wd 0.0500 time 0.2539 (0.2590) data time 0.0007 (0.0023) model time 0.2533 (0.2557) loss 6.5011 (5.9963) grad_norm 1.1640 (inf) loss_scale 2048.0000 (3156.6040) mem 9655MB [2024-08-04 04:13:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][360/625] eta 0:01:08 lr 0.001213 wd 0.0500 time 0.2580 (0.2589) data time 0.0008 (0.0023) model time 0.2571 (0.2557) loss 6.8440 (6.0008) grad_norm 1.1759 (inf) loss_scale 2048.0000 (3125.8947) mem 9655MB [2024-08-04 04:13:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][370/625] eta 0:01:06 lr 0.001213 wd 0.0500 time 0.2560 (0.2588) data time 0.0006 (0.0022) model time 0.2554 (0.2557) loss 5.4832 (5.9979) grad_norm 1.3458 (inf) loss_scale 2048.0000 (3096.8410) mem 9655MB [2024-08-04 04:13:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][380/625] eta 0:01:03 lr 0.001213 wd 0.0500 time 0.2585 (0.2588) data time 0.0008 (0.0022) model time 0.2577 (0.2556) loss 6.3526 (5.9973) grad_norm 1.2691 (inf) loss_scale 2048.0000 (3069.3123) mem 9655MB [2024-08-04 04:13:54 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [141/300][390/625] eta 0:01:00 lr 0.001213 wd 0.0500 time 0.2574 (0.2587) data time 0.0008 (0.0022) model time 0.2566 (0.2556) loss 4.6978 (5.9854) grad_norm 1.9405 (inf) loss_scale 2048.0000 (3043.1918) mem 9655MB [2024-08-04 04:13:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][400/625] eta 0:00:58 lr 0.001212 wd 0.0500 time 0.2599 (0.2587) data time 0.0008 (0.0021) model time 0.2591 (0.2556) loss 6.2677 (5.9883) grad_norm 1.8597 (inf) loss_scale 2048.0000 (3018.3741) mem 9655MB [2024-08-04 04:13:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][410/625] eta 0:00:55 lr 0.001212 wd 0.0500 time 0.2568 (0.2586) data time 0.0011 (0.0021) model time 0.2557 (0.2556) loss 6.8814 (5.9934) grad_norm 1.2484 (inf) loss_scale 2048.0000 (2994.7640) mem 9655MB [2024-08-04 04:14:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][420/625] eta 0:00:52 lr 0.001212 wd 0.0500 time 0.2582 (0.2585) data time 0.0008 (0.0021) model time 0.2573 (0.2556) loss 6.1169 (5.9999) grad_norm 1.3582 (inf) loss_scale 2048.0000 (2972.2755) mem 9655MB [2024-08-04 04:14:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][430/625] eta 0:00:50 lr 0.001212 wd 0.0500 time 0.2559 (0.2584) data time 0.0017 (0.0021) model time 0.2541 (0.2556) loss 5.5045 (5.9978) grad_norm 1.8610 (inf) loss_scale 2048.0000 (2950.8306) mem 9655MB [2024-08-04 04:14:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][440/625] eta 0:00:47 lr 0.001212 wd 0.0500 time 0.2574 (0.2584) data time 0.0008 (0.0020) model time 0.2566 (0.2556) loss 6.0273 (5.9924) grad_norm 1.9451 (inf) loss_scale 2048.0000 (2930.3583) mem 9655MB [2024-08-04 04:14:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][450/625] eta 0:00:45 lr 0.001212 wd 0.0500 time 0.2580 (0.2584) data time 0.0007 (0.0020) model time 0.2573 (0.2556) loss 
6.5811 (5.9919) grad_norm 3.8170 (inf) loss_scale 2048.0000 (2910.7938) mem 9655MB [2024-08-04 04:14:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][460/625] eta 0:00:42 lr 0.001211 wd 0.0500 time 0.2599 (0.2583) data time 0.0006 (0.0020) model time 0.2594 (0.2555) loss 7.0352 (5.9886) grad_norm 2.1230 (inf) loss_scale 2048.0000 (2892.0781) mem 9655MB [2024-08-04 04:14:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][470/625] eta 0:00:40 lr 0.001211 wd 0.0500 time 0.2570 (0.2583) data time 0.0011 (0.0020) model time 0.2559 (0.2556) loss 7.1308 (5.9931) grad_norm 1.7199 (inf) loss_scale 2048.0000 (2874.1571) mem 9655MB [2024-08-04 04:14:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][480/625] eta 0:00:37 lr 0.001211 wd 0.0500 time 0.2506 (0.2583) data time 0.0009 (0.0019) model time 0.2497 (0.2556) loss 6.1692 (5.9898) grad_norm 1.7387 (inf) loss_scale 2048.0000 (2856.9813) mem 9655MB [2024-08-04 04:14:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][490/625] eta 0:00:34 lr 0.001211 wd 0.0500 time 0.2570 (0.2582) data time 0.0012 (0.0019) model time 0.2558 (0.2556) loss 5.7218 (5.9989) grad_norm 1.7265 (inf) loss_scale 2048.0000 (2840.5051) mem 9655MB [2024-08-04 04:14:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][500/625] eta 0:00:32 lr 0.001211 wd 0.0500 time 0.2590 (0.2582) data time 0.0010 (0.0019) model time 0.2580 (0.2556) loss 5.3233 (5.9989) grad_norm 3.5949 (inf) loss_scale 2048.0000 (2824.6866) mem 9655MB [2024-08-04 04:14:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][510/625] eta 0:00:29 lr 0.001211 wd 0.0500 time 0.2552 (0.2581) data time 0.0012 (0.0019) model time 0.2540 (0.2555) loss 5.3531 (6.0026) grad_norm 2.3448 (inf) loss_scale 2048.0000 (2809.4873) mem 9655MB [2024-08-04 04:14:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[141/300][520/625] eta 0:00:27 lr 0.001210 wd 0.0500 time 0.2528 (0.2581) data time 0.0011 (0.0019) model time 0.2517 (0.2555) loss 6.5307 (6.0066) grad_norm 2.0954 (inf) loss_scale 2048.0000 (2794.8714) mem 9655MB [2024-08-04 04:14:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][530/625] eta 0:00:24 lr 0.001210 wd 0.0500 time 0.2570 (0.2580) data time 0.0009 (0.0019) model time 0.2561 (0.2555) loss 6.7013 (6.0195) grad_norm 1.5343 (inf) loss_scale 2048.0000 (2780.8060) mem 9655MB [2024-08-04 04:14:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][540/625] eta 0:00:21 lr 0.001210 wd 0.0500 time 0.2701 (0.2580) data time 0.0008 (0.0018) model time 0.2694 (0.2555) loss 6.4507 (6.0233) grad_norm 1.6980 (inf) loss_scale 2048.0000 (2767.2606) mem 9655MB [2024-08-04 04:14:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][550/625] eta 0:00:19 lr 0.001210 wd 0.0500 time 0.2550 (0.2580) data time 0.0010 (0.0018) model time 0.2540 (0.2555) loss 5.8672 (6.0247) grad_norm 1.8369 (inf) loss_scale 2048.0000 (2754.2069) mem 9655MB [2024-08-04 04:14:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][560/625] eta 0:00:16 lr 0.001210 wd 0.0500 time 0.2639 (0.2580) data time 0.0007 (0.0018) model time 0.2632 (0.2555) loss 6.7745 (6.0160) grad_norm 1.4470 (inf) loss_scale 2048.0000 (2741.6185) mem 9655MB [2024-08-04 04:14:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][570/625] eta 0:00:14 lr 0.001210 wd 0.0500 time 0.2503 (0.2579) data time 0.0006 (0.0018) model time 0.2497 (0.2555) loss 6.5053 (6.0230) grad_norm 2.9828 (inf) loss_scale 2048.0000 (2729.4711) mem 9655MB [2024-08-04 04:14:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][580/625] eta 0:00:11 lr 0.001209 wd 0.0500 time 0.2537 (0.2579) data time 0.0008 (0.0018) model time 0.2529 (0.2556) loss 6.4796 (6.0251) grad_norm 2.5322 (inf) 
loss_scale 2048.0000 (2717.7418) mem 9655MB [2024-08-04 04:14:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][590/625] eta 0:00:09 lr 0.001209 wd 0.0500 time 0.2569 (0.2579) data time 0.0008 (0.0018) model time 0.2561 (0.2556) loss 6.8706 (6.0242) grad_norm 1.3390 (inf) loss_scale 2048.0000 (2706.4095) mem 9655MB [2024-08-04 04:14:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][600/625] eta 0:00:06 lr 0.001209 wd 0.0500 time 0.2549 (0.2579) data time 0.0008 (0.0017) model time 0.2541 (0.2556) loss 6.8654 (6.0306) grad_norm 1.9311 (inf) loss_scale 2048.0000 (2695.4542) mem 9655MB [2024-08-04 04:14:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][610/625] eta 0:00:03 lr 0.001209 wd 0.0500 time 0.2515 (0.2579) data time 0.0005 (0.0017) model time 0.2510 (0.2556) loss 5.9640 (6.0331) grad_norm 1.3815 (inf) loss_scale 2048.0000 (2684.8576) mem 9655MB [2024-08-04 04:14:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][620/625] eta 0:00:01 lr 0.001209 wd 0.0500 time 0.2533 (0.2578) data time 0.0003 (0.0017) model time 0.2531 (0.2555) loss 5.3192 (6.0268) grad_norm 1.3869 (inf) loss_scale 2048.0000 (2674.6023) mem 9655MB [2024-08-04 04:14:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 141 training takes 0:02:41 [2024-08-04 04:14:54 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 04:14:54 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 04:14:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.505 (0.505) Loss 0.6621 (0.6621) Acc@1 88.281 (88.281) Acc@5 98.193 (98.193) Mem 9655MB
[2024-08-04 04:14:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 1.0449 (0.8142) Acc@1 77.539 (83.860) Acc@5 94.727 (97.057) Mem 9655MB
[2024-08-04 04:14:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.1943 (0.9641) Acc@1 72.998 (80.064) Acc@5 93.311 (95.371) Mem 9655MB
[2024-08-04 04:14:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.744 Acc@5 95.345
[2024-08-04 04:14:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.7%
[2024-08-04 04:14:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.734 (0.734) Loss 0.5874 (0.5874) Acc@1 88.818 (88.818) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 04:14:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.126) Loss 0.9370 (0.7274) Acc@1 79.590 (85.161) Acc@5 95.508 (97.421) Mem 9655MB
[2024-08-04 04:14:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0791 (0.8610) Acc@1 74.316 (81.494) Acc@5 93.848 (95.852) Mem 9655MB
[2024-08-04 04:14:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.160 Acc@5 95.847
[2024-08-04 04:14:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.2%
[2024-08-04 04:14:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.16%
[2024-08-04 04:14:58 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 04:14:59 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 04:14:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][0/625] eta 0:07:26 lr 0.001209 wd 0.0500 time 0.7144 (0.7144) data time 0.4632 (0.4632) model time 0.0000 (0.0000) loss 6.4966 (6.4966) grad_norm 2.3123 (2.3123) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:15:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][10/625] eta 0:03:04 lr 0.001208 wd 0.0500 time 0.2546 (0.2997) data time 0.0007 (0.0429) model time 0.0000 (0.0000) loss 5.7910 (6.1894) grad_norm 1.6032 (2.4054) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:15:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][20/625] eta 0:02:49 lr 0.001208 wd 0.0500 time 0.2725 (0.2801) data time 0.0007 (0.0230) model time 0.0000 (0.0000) loss 6.0591 (6.2206) grad_norm 2.6562 (2.5424) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:15:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][30/625] eta 0:02:42 lr 0.001208 wd 0.0500 time 0.2554 (0.2726) data time 0.0009 (0.0158) model time 0.0000 (0.0000) loss 5.7009 (6.1936) grad_norm 1.4210 (2.2712) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:15:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][40/625] eta 0:02:37 lr 0.001208 wd 0.0500 time 0.2513 (0.2684) data time 0.0009 (0.0122) model time 0.0000 (0.0000) loss 5.4774 (6.1823) grad_norm 1.3583 (2.1058) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:15:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][50/625] eta 0:02:32 lr 0.001208 wd 0.0500 time 0.2553 (0.2659) data time 0.0020 (0.0100) model time 0.0000 (0.0000) loss 5.2735 (6.2398) grad_norm 1.2832 (1.9878) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:15:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][60/625] eta 0:02:29 lr 0.001208 wd 0.0500 time 0.2535 (0.2643) data time 0.0007 (0.0086) model time 0.2528 (0.2549) loss 6.7963 (6.1846) grad_norm 1.3399 (1.9281) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:15:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][70/625] eta 0:02:27 lr 0.001207 wd 0.0500 time 0.2562 (0.2661) data time 0.0016 (0.0075) model time 0.2545 (0.2658) loss 5.8197 (6.0799) grad_norm 2.3361 (1.8977) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:15:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][80/625] eta 0:02:24 lr 0.001207 wd 0.0500 time 0.2547 (0.2649) data time 0.0008 (0.0067) model time 0.2539 (0.2623) loss 6.8472 (6.0911) grad_norm 1.4694 (1.8683) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:15:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][90/625] eta 0:02:21 lr 0.001207 wd 0.0500 time 0.2565 (0.2641) data time 0.0009 (0.0060) model time 0.2556 (0.2609) loss 6.9269 (6.0810) grad_norm 2.1942 (1.8925) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:15:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][100/625] eta 0:02:18 lr 0.001207 wd 0.0500 time 0.2546 (0.2633) data time 0.0007 (0.0055) model time 0.2538 (0.2598) loss 6.4202 (6.0799) grad_norm 3.9079 (1.9171) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:15:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][110/625] eta 0:02:15 lr 0.001207 wd 0.0500 time 0.2533 (0.2626) data time 0.0011 (0.0051) model time 0.2522 (0.2589) loss 6.1958 (6.0806) grad_norm 1.2555 (1.9697) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:15:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][120/625] eta 0:02:13 lr 0.001206 wd 0.0500 time 0.2562 (0.2635) data time 0.0009 (0.0048) model time 0.2552 (0.2609) loss 6.6341 (6.0697) grad_norm 1.4798 (2.0234) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:15:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][130/625] eta 0:02:10 lr 0.001206 wd 0.0500 time 0.2549 (0.2629) data time 0.0010 (0.0045) model time 0.2539 (0.2601) loss 5.5810 (6.0615) grad_norm 2.4636 (2.0297) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:15:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][140/625] eta 0:02:08 lr 0.001206 wd 0.0500 time 0.2568 (0.2654) data time 0.0010 (0.0042) model time 0.2558 (0.2643) loss 4.6636 (6.0487) grad_norm 1.4594 (2.0868) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:15:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][150/625] eta 0:02:05 lr 0.001206 wd 0.0500 time 0.2548 (0.2648) data time 0.0010 (0.0040) model time 0.2538 (0.2633) loss 5.4697 (6.0127) grad_norm 1.1365 (2.0709) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:15:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][160/625] eta 0:02:02 lr 0.001206 wd 0.0500 time 0.2575 (0.2643) data time 0.0007 (0.0038) model time 0.2568 (0.2626) loss 5.1313 (6.0116) grad_norm 2.3481 (2.0584) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:15:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][170/625] eta 0:02:00 lr 0.001206 wd 0.0500 time 0.2524 (0.2638) data time 0.0008 (0.0036) model time 0.2516 (0.2619) loss 5.1559 (6.0143) grad_norm 2.7008 (2.0372) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:15:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][180/625] eta 0:01:57 lr 0.001205 wd 0.0500 time 0.2824 (0.2636) data time 0.0009 (0.0035) model time 0.2816 (0.2618) loss 6.4519 (5.9947) grad_norm 1.5950 (2.0198) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:15:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][190/625] eta 0:01:54 lr 0.001205 wd 0.0500 time 0.2516 (0.2632) data time 0.0009 (0.0034) model time 0.2506 (0.2613) loss 5.9564 (5.9967) grad_norm 1.1679 (1.9978) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:15:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][200/625] eta 0:01:51 lr 0.001205 wd 0.0500 time 0.2541 (0.2629) data time 0.0008 (0.0032) model time 0.2533 (0.2610) loss 6.9759 (6.0071) grad_norm 1.6738 (1.9742) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:15:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][210/625] eta 0:01:48 lr 0.001205 wd 0.0500 time 0.2635 (0.2626) data time 0.0010 (0.0031) model time 0.2625 (0.2607) loss 5.3909 (6.0179) grad_norm 1.7852 (1.9687) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:15:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][220/625] eta 0:01:46 lr 0.001205 wd 0.0500 time 0.2562 (0.2623) data time 0.0010 (0.0030) model time 0.2552 (0.2603) loss 5.7762 (6.0152) grad_norm 1.9013 (1.9566) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:15:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][230/625] eta 0:01:43 lr 0.001205 wd 0.0500 time 0.2648 (0.2621) data time 0.0006 (0.0029) model time 0.2642 (0.2601) loss 7.4436 (6.0282) grad_norm 1.8669 (1.9501) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:16:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][240/625] eta 0:01:40 lr 0.001204 wd 0.0500 time 0.2584 (0.2618) data time 0.0008 (0.0029) model time 0.2576 (0.2598) loss 4.9640 (6.0225) grad_norm 2.2072 (1.9715) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:16:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][250/625] eta 0:01:38 lr 0.001204 wd 0.0500 time 0.2551 (0.2616) data time 0.0009 (0.0028) model time 0.2543 (0.2596) loss 5.3152 (5.9979) grad_norm 2.6545 (1.9665) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:16:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][260/625] eta 0:01:35 lr 0.001204 wd 0.0500 time 0.2646 (0.2614) data time 0.0007 (0.0027) model time 0.2640 (0.2594) loss 4.5750 (6.0012) grad_norm 1.6171 (1.9703) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:16:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][270/625] eta 0:01:32 lr 0.001204 wd 0.0500 time 0.2540 (0.2612) data time 0.0010 (0.0026) model time 0.2530 (0.2592) loss 7.4430 (6.0071) grad_norm 1.4642 (1.9800) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:16:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][280/625] eta 0:01:30 lr 0.001204 wd 0.0500 time 0.2578 (0.2610) data time 0.0006 (0.0026) model time 0.2572 (0.2590) loss 6.8449 (6.0170) grad_norm 1.5047 (1.9891) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:16:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][290/625] eta 0:01:27 lr 0.001204 wd 0.0500 time 0.2524 (0.2609) data time 0.0011 (0.0025) model time 0.2513 (0.2588) loss 6.4376 (6.0252) grad_norm 1.3520 (1.9933) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:16:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][300/625] eta 0:01:24 lr 0.001203 wd 0.0500 time 0.2658 (0.2607) data time 0.0009 (0.0025) model time 0.2649 (0.2587) loss 6.6781 (6.0364) grad_norm 2.8674 (2.0010) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:16:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][310/625] eta 0:01:22 lr 0.001203 wd 0.0500 time 0.2515 (0.2606) data time 0.0010 (0.0024) model time 0.2505 (0.2586) loss 5.7055 (6.0330) grad_norm 1.6753 (1.9977) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:16:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][320/625] eta 0:01:19 lr 0.001203 wd 0.0500 time 0.2553 (0.2604) data time 0.0008 (0.0024) model time 0.2545 (0.2584) loss 5.6126 (6.0336) grad_norm 1.3701 (1.9843) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:16:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][330/625] eta 0:01:16 lr 0.001203 wd 0.0500 time 0.2610 (0.2604) data time 0.0008 (0.0023) model time 0.2601 (0.2584) loss 6.3604 (6.0519) grad_norm 1.8366 (1.9818) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:16:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][340/625] eta 0:01:14 lr 0.001203 wd 0.0500 time 0.2569 (0.2603) data time 0.0007 (0.0023) model time 0.2562 (0.2584) loss 7.1026 (6.0633) grad_norm 3.3993 (1.9816) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:16:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][350/625] eta 0:01:11 lr 0.001202 wd 0.0500 time 0.2586 (0.2602) data time 0.0007 (0.0023) model time 0.2579 (0.2583) loss 5.8412 (6.0688) grad_norm 1.2139 (2.0191) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:16:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][360/625] eta 0:01:08 lr 0.001202 wd 0.0500 time 0.2521 (0.2601) data time 0.0008 (0.0022) model time 0.2513 (0.2582) loss 5.2954 (6.0721) grad_norm 1.2801 (2.0118) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:16:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][370/625] eta 0:01:06 lr 0.001202 wd 0.0500 time 0.2549 (0.2600) data time 0.0010 (0.0022) model time 0.2539 (0.2581) loss 6.6117 (6.0736) grad_norm 1.1007 (2.0147) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:16:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][380/625] eta 0:01:03 lr 0.001202 wd 0.0500 time 0.2543 (0.2599) data time 0.0006 (0.0022) model time 0.2537 (0.2580) loss 6.3680 (6.0752) grad_norm 1.6508 (2.0043) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:16:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][390/625] eta 0:01:01 lr 0.001202 wd 0.0500 time 0.2515 (0.2598) data time 0.0009 (0.0021) model time 0.2506 (0.2579) loss 5.2857 (6.0673) grad_norm 1.3399 (1.9903) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:16:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][400/625] eta 0:00:58 lr 0.001202 wd 0.0500 time 0.2593 (0.2597) data time 0.0014 (0.0021) model time 0.2579 (0.2579) loss 7.1382 (6.0702) grad_norm 1.3753 (1.9806) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:16:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][410/625] eta 0:00:55 lr 0.001201 wd 0.0500 time 0.2570 (0.2596) data time 0.0009 (0.0021) model time 0.2561 (0.2578) loss 6.5075 (6.0766) grad_norm 1.9845 (1.9849) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:16:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][420/625] eta 0:00:53 lr 0.001201 wd 0.0500 time 0.2567 (0.2596) data time 0.0010 (0.0020) model time 0.2557 (0.2578) loss 6.1566 (6.0755) grad_norm 1.4852 (1.9862) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:16:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][430/625] eta 0:00:50 lr 0.001201 wd 0.0500 time 0.2683 (0.2595) data time 0.0008 (0.0020) model time 0.2675 (0.2577) loss 4.9770 (6.0755) grad_norm 2.1037 (1.9854) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:16:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][440/625] eta 0:00:47 lr 0.001201 wd 0.0500 time 0.2542 (0.2595) data time 0.0009 (0.0020) model time 0.2533 (0.2577) loss 7.3185 (6.0736) grad_norm 1.7059 (1.9834) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:16:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][450/625] eta 0:00:45 lr 0.001201 wd 0.0500 time 0.2570 (0.2594) data time 0.0013 (0.0020) model time 0.2557 (0.2576) loss 6.3013 (6.0735) grad_norm 1.6690 (1.9781) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:16:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][460/625] eta 0:00:42 lr 0.001201 wd 0.0500 time 0.2560 (0.2593) data time 0.0006 (0.0019) model time 0.2555 (0.2576) loss 5.5520 (6.0703) grad_norm 1.9440 (1.9746) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:17:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][470/625] eta 0:00:40 lr 0.001200 wd 0.0500 time 0.2513 (0.2593) data time 0.0010 (0.0019) model time 0.2503 (0.2575) loss 6.2400 (6.0658) grad_norm 2.0945 (1.9783) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:17:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][480/625] eta 0:00:37 lr 0.001200 wd 0.0500 time 0.2739 (0.2593) data time 0.0006 (0.0019) model time 0.2734 (0.2575) loss 5.0453 (6.0620) grad_norm 4.2023 (1.9854) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:17:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][490/625] eta 0:00:34 lr 0.001200 wd 0.0500 time 0.2580 (0.2592) data time 0.0006 (0.0019) model time 0.2574 (0.2575) loss 4.8280 (6.0612) grad_norm 1.5581 (1.9907) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:17:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][500/625] eta 0:00:32 lr 0.001200 wd 0.0500 time 0.2610 (0.2592) data time 0.0006 (0.0019) model time 0.2603 (0.2575) loss 4.8729 (6.0587) grad_norm 3.3470 (1.9898) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:17:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][510/625] eta 0:00:29 lr 0.001200 wd 0.0500 time 0.2529 (0.2591) data time 0.0010 (0.0018) model time 0.2520 (0.2574) loss 6.5391 (6.0518) grad_norm 2.2466 (1.9875) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:17:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][520/625] eta 0:00:27 lr 0.001200 wd 0.0500 time 0.2579 (0.2591) data time 0.0006 (0.0018) model time 0.2573 (0.2574) loss 6.5566 (6.0491) grad_norm 1.4075 (1.9840) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:17:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][530/625] eta 0:00:24 lr 0.001199 wd 0.0500 time 0.2587 (0.2591) data time 0.0007 (0.0018) model time 0.2580 (0.2574) loss 6.8268 (6.0503) grad_norm 2.5514 (1.9807) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:17:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][540/625] eta 0:00:22 lr 0.001199 wd 0.0500 time 0.2572 (0.2590) data time 0.0007 (0.0018) model time 0.2565 (0.2574) loss 6.8604 (6.0481) grad_norm 1.5714 (1.9867) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:17:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][550/625] eta 0:00:19 lr 0.001199 wd 0.0500 time 0.2537 (0.2590) data time 0.0009 (0.0018) model time 0.2528 (0.2573) loss 5.9490 (6.0505) grad_norm 1.4811 (1.9794) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:17:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][560/625] eta 0:00:16 lr 0.001199 wd 0.0500 time 0.2592 (0.2589) data time 0.0009 (0.0017) model time 0.2583 (0.2573) loss 6.4598 (6.0477) grad_norm 3.5938 (1.9848) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:17:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][570/625] eta 0:00:14 lr 0.001199 wd 0.0500 time 0.2567 (0.2589) data time 0.0006 (0.0017) model time 0.2561 (0.2573) loss 6.0686 (6.0510) grad_norm 1.3026 (1.9820) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:17:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][580/625] eta 0:00:11 lr 0.001198 wd 0.0500 time 0.2581 (0.2588) data time 0.0009 (0.0017) model time 0.2572 (0.2572) loss 6.9807 (6.0547) grad_norm 1.7058 (1.9829) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:17:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][590/625] eta 0:00:09 lr 0.001198 wd 0.0500 time 0.2582 (0.2588) data time 0.0008 (0.0017) model time 0.2574 (0.2572) loss 4.9704 (6.0526) grad_norm 1.5195 (1.9818) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:17:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][600/625] eta 0:00:06 lr 0.001198 wd 0.0500 time 0.2533 (0.2588) data time 0.0007 (0.0017) model time 0.2526 (0.2572) loss 6.0898 (6.0512) grad_norm 1.9396 (1.9807) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:17:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][610/625] eta 0:00:03 lr 0.001198 wd 0.0500 time 0.2527 (0.2587) data time 0.0004 (0.0017) model time 0.2523 (0.2572) loss 5.1155 (6.0518) grad_norm 1.3507 (1.9758) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:17:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][620/625] eta 0:00:01 lr 0.001198 wd 0.0500 time 0.2525 (0.2586) data time 0.0003 (0.0017) model time 0.2522 (0.2571) loss 5.6987 (6.0493) grad_norm 1.2105 (1.9676) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:17:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 142 training takes 0:02:41 [2024-08-04 04:17:40 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 04:17:41 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 04:17:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.500 (0.500) Loss 0.6641 (0.6641) Acc@1 88.428 (88.428) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 04:17:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.100) Loss 1.0596 (0.8296) Acc@1 77.490 (83.936) Acc@5 94.580 (97.004) Mem 9655MB [2024-08-04 04:17:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.1924 (0.9690) Acc@1 73.633 (80.143) Acc@5 93.164 (95.431) Mem 9655MB [2024-08-04 04:17:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.888 Acc@5 95.403 [2024-08-04 04:17:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.9% [2024-08-04 04:17:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.711 (0.711) Loss 0.5879 (0.5879) Acc@1 88.916 (88.916) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 04:17:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.053 (0.130) Loss 0.9380 (0.7278) Acc@1 79.541 (85.156) Acc@5 95.508 (97.434) Mem 9655MB [2024-08-04 04:17:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.094) Loss 1.0781 (0.8612) Acc@1 74.414 (81.499) Acc@5 93.750 (95.871) Mem 9655MB [2024-08-04 04:17:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.180 Acc@5 95.857 [2024-08-04 04:17:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.2% [2024-08-04 04:17:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.18% [2024-08-04 04:17:45 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 04:17:46 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 04:17:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][0/625] eta 0:07:29 lr 0.001198 wd 0.0500 time 0.7196 (0.7196) data time 0.4805 (0.4805) model time 0.0000 (0.0000) loss 6.3027 (6.3027) grad_norm 1.5978 (1.5978) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:17:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][10/625] eta 0:03:02 lr 0.001198 wd 0.0500 time 0.2538 (0.2974) data time 0.0006 (0.0444) model time 0.0000 (0.0000) loss 4.9646 (6.1194) grad_norm 1.2751 (1.5350) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:17:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][20/625] eta 0:02:48 lr 0.001197 wd 0.0500 time 0.2562 (0.2778) data time 0.0008 (0.0237) model time 0.0000 (0.0000) loss 5.9618 (5.9031) grad_norm 1.9113 (1.6185) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:17:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][30/625] eta 0:02:41 lr 0.001197 wd 0.0500 time 0.2548 (0.2709) data time 0.0007 (0.0164) model time 0.0000 (0.0000) loss 6.6078 (6.0151) grad_norm 1.2941 (1.5616) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:17:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][40/625] eta 0:02:36 lr 0.001197 wd 0.0500 time 0.2555 (0.2673) data time 0.0007 (0.0126) model time 0.0000 (0.0000) loss 4.9965 (5.9931) grad_norm 1.3321 (1.6721) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:17:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][50/625] eta 0:02:32 lr 0.001197 wd 0.0500 time 0.2558 (0.2651) data time 0.0007 (0.0103) model time 0.0000 (0.0000) loss 4.4635 (6.0056) grad_norm 2.4377 (1.7050) loss_scale 2048.0000 (2048.0000) mem 9655MB 
[2024-08-04 04:18:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][60/625] eta 0:02:30 lr 0.001197 wd 0.0500 time 0.2550 (0.2666) data time 0.0008 (0.0088) model time 0.2542 (0.2737) loss 4.6381 (6.0008) grad_norm 1.3007 (1.7691) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:18:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][70/625] eta 0:02:27 lr 0.001196 wd 0.0500 time 0.2558 (0.2653) data time 0.0006 (0.0077) model time 0.2552 (0.2648) loss 6.0964 (6.0346) grad_norm 1.5658 (1.7751) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:18:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][80/625] eta 0:02:24 lr 0.001196 wd 0.0500 time 0.2612 (0.2642) data time 0.0009 (0.0069) model time 0.2603 (0.2618) loss 4.4179 (5.9783) grad_norm 1.9561 (1.8970) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:18:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][90/625] eta 0:02:20 lr 0.001196 wd 0.0500 time 0.2563 (0.2633) data time 0.0009 (0.0062) model time 0.2555 (0.2601) loss 6.1252 (5.9463) grad_norm 4.1662 (2.0258) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:18:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][100/625] eta 0:02:17 lr 0.001196 wd 0.0500 time 0.2628 (0.2626) data time 0.0006 (0.0057) model time 0.2623 (0.2592) loss 5.1173 (5.9229) grad_norm 2.6405 (2.0520) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:18:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][110/625] eta 0:02:14 lr 0.001196 wd 0.0500 time 0.2559 (0.2621) data time 0.0008 (0.0053) model time 0.2551 (0.2585) loss 5.2631 (5.9397) grad_norm 1.4298 (2.0415) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:18:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][120/625] eta 0:02:12 lr 0.001196 wd 0.0500 time 0.2589 
(0.2617) data time 0.0007 (0.0049) model time 0.2582 (0.2582) loss 6.4292 (5.9603) grad_norm 1.8055 (2.0108) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:18:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][130/625] eta 0:02:09 lr 0.001195 wd 0.0500 time 0.2556 (0.2612) data time 0.0011 (0.0046) model time 0.2545 (0.2578) loss 6.8823 (5.9946) grad_norm 2.3632 (2.0324) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:18:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][140/625] eta 0:02:06 lr 0.001195 wd 0.0500 time 0.2580 (0.2610) data time 0.0008 (0.0043) model time 0.2573 (0.2577) loss 7.0839 (6.0180) grad_norm 1.5938 (2.0200) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:18:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][150/625] eta 0:02:03 lr 0.001195 wd 0.0500 time 0.2576 (0.2606) data time 0.0006 (0.0041) model time 0.2570 (0.2573) loss 6.1834 (6.0078) grad_norm 1.8568 (2.0098) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:18:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][160/625] eta 0:02:02 lr 0.001195 wd 0.0500 time 0.2541 (0.2628) data time 0.0006 (0.0039) model time 0.2535 (0.2608) loss 6.3956 (5.9696) grad_norm 2.1968 (2.0094) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:18:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][170/625] eta 0:01:59 lr 0.001195 wd 0.0500 time 0.2580 (0.2624) data time 0.0009 (0.0037) model time 0.2571 (0.2604) loss 6.7325 (5.9718) grad_norm 1.4420 (2.0145) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:18:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][180/625] eta 0:01:56 lr 0.001195 wd 0.0500 time 0.2575 (0.2621) data time 0.0009 (0.0036) model time 0.2566 (0.2600) loss 4.9580 (5.9714) grad_norm 1.4410 (2.0040) loss_scale 2048.0000 (2048.0000) mem 9655MB 
[2024-08-04 04:18:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][190/625] eta 0:01:53 lr 0.001194 wd 0.0500 time 0.2534 (0.2618) data time 0.0008 (0.0034) model time 0.2525 (0.2596) loss 5.7878 (5.9901) grad_norm 2.4287 (2.0150) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:18:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][200/625] eta 0:01:51 lr 0.001194 wd 0.0500 time 0.3965 (0.2622) data time 0.0008 (0.0033) model time 0.3957 (0.2603) loss 6.8436 (6.0054) grad_norm 1.1583 (1.9949) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:18:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][210/625] eta 0:01:48 lr 0.001194 wd 0.0500 time 0.2552 (0.2626) data time 0.0006 (0.0032) model time 0.2546 (0.2609) loss 6.1100 (6.0088) grad_norm 1.3909 (1.9863) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:18:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][220/625] eta 0:01:46 lr 0.001194 wd 0.0500 time 0.2566 (0.2624) data time 0.0008 (0.0031) model time 0.2558 (0.2606) loss 6.7072 (6.0178) grad_norm 1.4535 (1.9779) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:18:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][230/625] eta 0:01:43 lr 0.001194 wd 0.0500 time 0.2621 (0.2622) data time 0.0006 (0.0030) model time 0.2615 (0.2605) loss 5.9774 (6.0156) grad_norm 1.1938 (1.9679) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:18:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][240/625] eta 0:01:40 lr 0.001193 wd 0.0500 time 0.2549 (0.2619) data time 0.0007 (0.0029) model time 0.2542 (0.2601) loss 5.3527 (6.0076) grad_norm 1.4607 (1.9628) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:18:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][250/625] eta 0:01:38 lr 0.001193 wd 0.0500 time 0.2554 
(0.2617) data time 0.0009 (0.0028) model time 0.2545 (0.2599) loss 4.9860 (5.9996) grad_norm 2.1851 (2.0088) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:18:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][260/625] eta 0:01:35 lr 0.001193 wd 0.0500 time 0.2582 (0.2615) data time 0.0008 (0.0028) model time 0.2574 (0.2597) loss 5.4042 (5.9893) grad_norm 1.6301 (2.0128) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:18:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][270/625] eta 0:01:32 lr 0.001193 wd 0.0500 time 0.2582 (0.2613) data time 0.0012 (0.0027) model time 0.2570 (0.2595) loss 6.6854 (5.9825) grad_norm 1.4856 (2.0051) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:18:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][280/625] eta 0:01:30 lr 0.001193 wd 0.0500 time 0.2675 (0.2612) data time 0.0006 (0.0026) model time 0.2669 (0.2594) loss 5.8034 (5.9742) grad_norm 1.2720 (1.9925) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:19:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][290/625] eta 0:01:27 lr 0.001193 wd 0.0500 time 0.2583 (0.2610) data time 0.0009 (0.0026) model time 0.2574 (0.2592) loss 6.1267 (5.9761) grad_norm 1.7322 (1.9965) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:19:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][300/625] eta 0:01:24 lr 0.001192 wd 0.0500 time 0.2553 (0.2608) data time 0.0008 (0.0025) model time 0.2545 (0.2590) loss 7.1159 (5.9857) grad_norm 1.0653 (1.9896) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:19:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][310/625] eta 0:01:22 lr 0.001192 wd 0.0500 time 0.2590 (0.2607) data time 0.0006 (0.0025) model time 0.2584 (0.2589) loss 4.9825 (5.9817) grad_norm 1.6606 (1.9809) loss_scale 2048.0000 (2048.0000) mem 9655MB 
[2024-08-04 04:19:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][320/625] eta 0:01:19 lr 0.001192 wd 0.0500 time 0.2589 (0.2605) data time 0.0010 (0.0024) model time 0.2580 (0.2587) loss 4.7022 (5.9770) grad_norm 2.5214 (1.9794) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:19:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][330/625] eta 0:01:16 lr 0.001192 wd 0.0500 time 0.2565 (0.2605) data time 0.0010 (0.0024) model time 0.2555 (0.2587) loss 7.0110 (5.9837) grad_norm 2.2860 (1.9794) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:19:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][340/625] eta 0:01:14 lr 0.001192 wd 0.0500 time 0.2553 (0.2603) data time 0.0007 (0.0023) model time 0.2546 (0.2586) loss 5.9514 (5.9830) grad_norm 1.7505 (1.9862) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:19:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][350/625] eta 0:01:11 lr 0.001192 wd 0.0500 time 0.2499 (0.2602) data time 0.0010 (0.0023) model time 0.2490 (0.2585) loss 5.2139 (5.9831) grad_norm 1.5882 (1.9813) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:19:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][360/625] eta 0:01:08 lr 0.001191 wd 0.0500 time 0.2557 (0.2601) data time 0.0006 (0.0023) model time 0.2551 (0.2584) loss 7.0277 (5.9814) grad_norm 1.6455 (1.9746) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:19:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][370/625] eta 0:01:06 lr 0.001191 wd 0.0500 time 0.2514 (0.2600) data time 0.0010 (0.0022) model time 0.2504 (0.2583) loss 5.3078 (5.9733) grad_norm 1.2230 (1.9706) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:19:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][380/625] eta 0:01:03 lr 0.001191 wd 0.0500 time 0.2597 
(0.2599) data time 0.0007 (0.0022) model time 0.2591 (0.2582) loss 6.2367 (5.9715) grad_norm 2.5471 (1.9717) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:19:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][390/625] eta 0:01:01 lr 0.001191 wd 0.0500 time 0.2555 (0.2598) data time 0.0008 (0.0021) model time 0.2547 (0.2581) loss 5.7430 (5.9734) grad_norm 1.4032 (1.9687) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:19:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][400/625] eta 0:00:58 lr 0.001191 wd 0.0500 time 0.2525 (0.2597) data time 0.0009 (0.0021) model time 0.2516 (0.2580) loss 6.6729 (5.9710) grad_norm 1.8515 (1.9632) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:19:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][410/625] eta 0:00:55 lr 0.001191 wd 0.0500 time 0.2572 (0.2597) data time 0.0008 (0.0021) model time 0.2564 (0.2580) loss 5.4348 (5.9634) grad_norm 1.7185 (1.9591) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:19:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][420/625] eta 0:00:53 lr 0.001190 wd 0.0500 time 0.2503 (0.2596) data time 0.0009 (0.0021) model time 0.2494 (0.2579) loss 7.0839 (5.9570) grad_norm 2.2924 (1.9603) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:19:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][430/625] eta 0:00:50 lr 0.001190 wd 0.0500 time 0.2572 (0.2595) data time 0.0008 (0.0020) model time 0.2564 (0.2578) loss 5.7121 (5.9609) grad_norm 1.7350 (1.9579) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:19:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][440/625] eta 0:00:47 lr 0.001190 wd 0.0500 time 0.2598 (0.2594) data time 0.0007 (0.0020) model time 0.2591 (0.2578) loss 6.4581 (5.9637) grad_norm 1.7214 (1.9535) loss_scale 2048.0000 (2048.0000) mem 9655MB 
[2024-08-04 04:19:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][450/625] eta 0:00:45 lr 0.001190 wd 0.0500 time 0.2551 (0.2594) data time 0.0009 (0.0020) model time 0.2542 (0.2577) loss 5.8440 (5.9583) grad_norm 1.3749 (1.9473) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:19:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][460/625] eta 0:00:42 lr 0.001190 wd 0.0500 time 0.2526 (0.2593) data time 0.0012 (0.0020) model time 0.2514 (0.2576) loss 4.9746 (5.9587) grad_norm 2.1744 (1.9532) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:19:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][470/625] eta 0:00:40 lr 0.001189 wd 0.0500 time 0.2539 (0.2592) data time 0.0010 (0.0020) model time 0.2529 (0.2575) loss 5.9788 (5.9567) grad_norm 1.8816 (1.9582) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:19:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][480/625] eta 0:00:37 lr 0.001189 wd 0.0500 time 0.2553 (0.2592) data time 0.0010 (0.0019) model time 0.2543 (0.2575) loss 5.9786 (5.9595) grad_norm 2.6229 (1.9673) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:19:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][490/625] eta 0:00:34 lr 0.001189 wd 0.0500 time 0.2661 (0.2591) data time 0.0010 (0.0019) model time 0.2651 (0.2575) loss 5.5981 (5.9509) grad_norm 1.7246 (1.9688) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:19:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][500/625] eta 0:00:32 lr 0.001189 wd 0.0500 time 0.2554 (0.2591) data time 0.0009 (0.0019) model time 0.2545 (0.2574) loss 5.4245 (5.9549) grad_norm 1.7779 (1.9658) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:19:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][510/625] eta 0:00:29 lr 0.001189 wd 0.0500 time 0.2539 
(0.2590) data time 0.0010 (0.0019) model time 0.2529 (0.2574) loss 5.9658 (5.9587) grad_norm 3.1113 (1.9653) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:20:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][520/625] eta 0:00:27 lr 0.001189 wd 0.0500 time 0.2623 (0.2590) data time 0.0009 (0.0019) model time 0.2615 (0.2574) loss 5.2764 (5.9550) grad_norm 1.4039 (1.9603) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:20:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][530/625] eta 0:00:24 lr 0.001188 wd 0.0500 time 0.2516 (0.2589) data time 0.0009 (0.0018) model time 0.2507 (0.2573) loss 5.2206 (5.9567) grad_norm 1.5392 (1.9605) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:20:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][540/625] eta 0:00:22 lr 0.001188 wd 0.0500 time 0.2615 (0.2589) data time 0.0007 (0.0018) model time 0.2607 (0.2573) loss 7.1422 (5.9587) grad_norm 1.6665 (1.9616) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:20:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][550/625] eta 0:00:19 lr 0.001188 wd 0.0500 time 0.2530 (0.2588) data time 0.0010 (0.0018) model time 0.2521 (0.2573) loss 5.3596 (5.9563) grad_norm 2.1862 (1.9590) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:20:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][560/625] eta 0:00:16 lr 0.001188 wd 0.0500 time 0.2569 (0.2588) data time 0.0009 (0.0018) model time 0.2560 (0.2572) loss 5.6980 (5.9627) grad_norm 1.2570 (1.9556) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:20:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][570/625] eta 0:00:14 lr 0.001188 wd 0.0500 time 0.2585 (0.2588) data time 0.0008 (0.0018) model time 0.2577 (0.2572) loss 6.7400 (5.9781) grad_norm 2.0543 (1.9526) loss_scale 2048.0000 (2048.0000) mem 9655MB 
[2024-08-04 04:20:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][580/625] eta 0:00:11 lr 0.001188 wd 0.0500 time 0.2580 (0.2587) data time 0.0005 (0.0018) model time 0.2575 (0.2572) loss 6.4628 (5.9871) grad_norm 2.0487 (1.9509) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:20:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][590/625] eta 0:00:09 lr 0.001187 wd 0.0500 time 0.2565 (0.2587) data time 0.0009 (0.0018) model time 0.2556 (0.2571) loss 6.2358 (5.9813) grad_norm 2.3231 (1.9538) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:20:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][600/625] eta 0:00:06 lr 0.001187 wd 0.0500 time 0.2553 (0.2587) data time 0.0007 (0.0017) model time 0.2547 (0.2571) loss 5.6257 (5.9773) grad_norm 1.8333 (1.9497) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:20:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][610/625] eta 0:00:03 lr 0.001187 wd 0.0500 time 0.2524 (0.2586) data time 0.0005 (0.0017) model time 0.2519 (0.2570) loss 6.2797 (5.9818) grad_norm 4.2335 (1.9613) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:20:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][620/625] eta 0:00:01 lr 0.001187 wd 0.0500 time 0.2532 (0.2585) data time 0.0005 (0.0017) model time 0.2527 (0.2570) loss 5.7409 (5.9771) grad_norm 1.6703 (1.9572) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:20:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 143 training takes 0:02:41 [2024-08-04 04:20:27 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 04:20:28 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 04:20:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.528 (0.528) Loss 0.6724 (0.6724) Acc@1 87.842 (87.842) Acc@5 98.047 (98.047) Mem 9655MB [2024-08-04 04:20:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.100) Loss 1.0615 (0.8233) Acc@1 78.320 (83.794) Acc@5 94.482 (96.893) Mem 9655MB [2024-08-04 04:20:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.2227 (0.9724) Acc@1 72.412 (79.987) Acc@5 92.285 (95.136) Mem 9655MB [2024-08-04 04:20:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.685 Acc@5 95.124 [2024-08-04 04:20:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.7% [2024-08-04 04:20:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.805 (0.805) Loss 0.5884 (0.5884) Acc@1 88.916 (88.916) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 04:20:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.128) Loss 0.9395 (0.7284) Acc@1 79.541 (85.130) Acc@5 95.410 (97.430) Mem 9655MB [2024-08-04 04:20:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 1.0781 (0.8615) Acc@1 74.609 (81.489) Acc@5 93.750 (95.887) Mem 9655MB [2024-08-04 04:20:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.176 Acc@5 95.867 [2024-08-04 04:20:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.2% [2024-08-04 04:20:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][0/625] eta 0:11:25 lr 0.001187 wd 0.0500 time 1.0963 (1.0963) data time 0.5937 (0.5937) model time 0.0000 (0.0000) loss 6.0730 (6.0730) grad_norm 1.2214 (1.2214) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:20:35 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [144/300][10/625] eta 0:03:25 lr 0.001187 wd 0.0500 time 0.2603 (0.3345) data time 0.0006 (0.0548) model time 0.0000 (0.0000) loss 5.9103 (6.0277) grad_norm 4.3799 (1.9003) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:20:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][20/625] eta 0:02:59 lr 0.001186 wd 0.0500 time 0.2552 (0.2971) data time 0.0008 (0.0291) model time 0.0000 (0.0000) loss 5.3989 (5.9744) grad_norm 1.6408 (1.9096) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:20:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][30/625] eta 0:02:48 lr 0.001186 wd 0.0500 time 0.2537 (0.2836) data time 0.0007 (0.0200) model time 0.0000 (0.0000) loss 5.1963 (6.0981) grad_norm 2.1755 (2.0454) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:20:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][40/625] eta 0:02:43 lr 0.001186 wd 0.0500 time 0.2569 (0.2798) data time 0.0008 (0.0154) model time 0.0000 (0.0000) loss 5.9542 (6.0803) grad_norm 2.4259 (1.9519) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:20:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][50/625] eta 0:02:38 lr 0.001186 wd 0.0500 time 0.2582 (0.2750) data time 0.0007 (0.0125) model time 0.0000 (0.0000) loss 7.1652 (6.1472) grad_norm 2.0081 (1.9461) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:20:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][60/625] eta 0:02:33 lr 0.001186 wd 0.0500 time 0.2584 (0.2725) data time 0.0009 (0.0106) model time 0.2574 (0.2585) loss 4.9807 (6.1410) grad_norm 1.9708 (1.9808) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:20:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][70/625] eta 0:02:30 lr 0.001186 wd 0.0500 time 0.2523 (0.2703) data time 0.0013 (0.0093) model time 0.2511 
(0.2575) loss 6.8890 (6.0789) grad_norm 1.3754 (1.9860) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:20:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][80/625] eta 0:02:26 lr 0.001185 wd 0.0500 time 0.2560 (0.2686) data time 0.0006 (0.0082) model time 0.2554 (0.2568) loss 6.0281 (6.0711) grad_norm 1.3562 (1.9599) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:20:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][90/625] eta 0:02:22 lr 0.001185 wd 0.0500 time 0.2541 (0.2673) data time 0.0009 (0.0074) model time 0.2531 (0.2565) loss 5.9438 (6.0797) grad_norm 2.6692 (1.9543) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:20:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][100/625] eta 0:02:19 lr 0.001185 wd 0.0500 time 0.2571 (0.2662) data time 0.0007 (0.0068) model time 0.2564 (0.2563) loss 5.5412 (6.0895) grad_norm 2.2391 (1.9326) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:21:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][110/625] eta 0:02:16 lr 0.001185 wd 0.0500 time 0.2551 (0.2654) data time 0.0006 (0.0062) model time 0.2545 (0.2563) loss 5.4122 (6.0625) grad_norm 1.2319 (1.9046) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:21:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][120/625] eta 0:02:13 lr 0.001185 wd 0.0500 time 0.2530 (0.2648) data time 0.0008 (0.0058) model time 0.2522 (0.2564) loss 6.0632 (6.0327) grad_norm 2.1361 (1.8921) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:21:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][130/625] eta 0:02:10 lr 0.001184 wd 0.0500 time 0.2566 (0.2642) data time 0.0006 (0.0054) model time 0.2560 (0.2564) loss 7.1848 (6.0775) grad_norm 2.3150 (1.8897) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:21:09 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [144/300][140/625] eta 0:02:07 lr 0.001184 wd 0.0500 time 0.2564 (0.2636) data time 0.0012 (0.0051) model time 0.2552 (0.2563) loss 6.5387 (6.0711) grad_norm 2.0335 (1.8939) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:21:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][150/625] eta 0:02:05 lr 0.001184 wd 0.0500 time 0.2537 (0.2633) data time 0.0007 (0.0048) model time 0.2530 (0.2564) loss 6.3611 (6.0703) grad_norm 2.4233 (1.9060) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:21:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][160/625] eta 0:02:02 lr 0.001184 wd 0.0500 time 0.2534 (0.2629) data time 0.0008 (0.0046) model time 0.2527 (0.2563) loss 7.2411 (6.0723) grad_norm 1.6043 (1.9241) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:21:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][170/625] eta 0:01:59 lr 0.001184 wd 0.0500 time 0.2551 (0.2624) data time 0.0008 (0.0044) model time 0.2543 (0.2562) loss 6.3208 (6.0776) grad_norm 1.2332 (1.9106) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:21:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][180/625] eta 0:01:56 lr 0.001184 wd 0.0500 time 0.2564 (0.2621) data time 0.0006 (0.0042) model time 0.2558 (0.2562) loss 5.0896 (6.0838) grad_norm 2.3030 (1.9126) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:21:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][190/625] eta 0:01:53 lr 0.001183 wd 0.0500 time 0.2549 (0.2618) data time 0.0008 (0.0040) model time 0.2541 (0.2561) loss 5.9607 (6.0680) grad_norm 2.0637 (1.9393) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:21:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][200/625] eta 0:01:51 lr 0.001183 wd 0.0500 time 0.2529 (0.2615) data time 0.0010 (0.0038) model time 0.2519 (0.2560) loss 6.7348 (6.0721) grad_norm 1.5869 (1.9344) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:21:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][210/625] eta 0:01:48 lr 0.001183 wd 0.0500 time 0.2557 (0.2613) data time 0.0010 (0.0037) model time 0.2547 (0.2560) loss 6.3517 (6.0758) grad_norm 1.5059 (1.9143) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:21:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][220/625] eta 0:01:45 lr 0.001183 wd 0.0500 time 0.2549 (0.2610) data time 0.0007 (0.0036) model time 0.2542 (0.2559) loss 5.4533 (6.0687) grad_norm 2.1054 (1.9469) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:21:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][230/625] eta 0:01:43 lr 0.001183 wd 0.0500 time 0.2568 (0.2621) data time 0.0007 (0.0035) model time 0.2561 (0.2576) loss 5.9429 (6.0561) grad_norm 2.6942 (1.9599) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:21:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][240/625] eta 0:01:40 lr 0.001183 wd 0.0500 time 0.2535 (0.2618) data time 0.0010 (0.0034) model time 0.2525 (0.2574) loss 6.3388 (6.0635) grad_norm 1.9269 (1.9622) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:21:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][250/625] eta 0:01:38 lr 0.001182 wd 0.0500 time 0.2584 (0.2616) data time 0.0009 (0.0033) model time 0.2575 (0.2573) loss 5.3878 (6.0669) grad_norm 2.2735 (1.9672) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:21:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][260/625] eta 0:01:35 lr 0.001182 wd 0.0500 time 0.2553 (0.2614) data time 0.0008 (0.0032) model time 0.2544 (0.2572) loss 4.7811 (6.0687) grad_norm 2.6295 (1.9724) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:21:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][270/625] eta 0:01:32 lr 0.001182 wd 0.0500 time 0.2607 (0.2612) data time 0.0007 (0.0031) model time 0.2600 (0.2572) loss 5.7769 (6.0709) grad_norm 1.2153 (1.9678) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:21:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][280/625] eta 0:01:30 lr 0.001182 wd 0.0500 time 0.2566 (0.2610) data time 0.0009 (0.0030) model time 0.2558 (0.2570) loss 5.3824 (6.0704) grad_norm 1.8014 (1.9610) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:21:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][290/625] eta 0:01:27 lr 0.001182 wd 0.0500 time 0.2536 (0.2609) data time 0.0007 (0.0029) model time 0.2528 (0.2570) loss 5.1345 (6.0733) grad_norm 1.3950 (1.9647) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:21:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][300/625] eta 0:01:24 lr 0.001182 wd 0.0500 time 0.2578 (0.2607) data time 0.0008 (0.0029) model time 0.2570 (0.2569) loss 5.6573 (6.0630) grad_norm 1.5016 (1.9727) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:21:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][310/625] eta 0:01:22 lr 0.001181 wd 0.0500 time 0.2517 (0.2606) data time 0.0010 (0.0028) model time 0.2507 (0.2568) loss 5.9846 (6.0505) grad_norm 1.5249 (1.9621) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:21:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][320/625] eta 0:01:19 lr 0.001181 wd 0.0500 time 0.2592 (0.2604) data time 0.0009 (0.0028) model time 0.2583 (0.2568) loss 4.5919 (6.0398) grad_norm 1.5327 (1.9528) loss_scale 4096.0000 (2086.2804) mem 9655MB
[2024-08-04 04:21:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][330/625] eta 0:01:16 lr 0.001181 wd 0.0500 time 0.2598 (0.2603) data time 0.0005 (0.0027) model time 0.2593 (0.2567) loss 5.0818 (6.0439) grad_norm 2.5049 (1.9778) loss_scale 4096.0000 (2146.9970) mem 9655MB
[2024-08-04 04:22:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][340/625] eta 0:01:14 lr 0.001181 wd 0.0500 time 0.2547 (0.2602) data time 0.0012 (0.0027) model time 0.2535 (0.2566) loss 6.0900 (6.0382) grad_norm 2.1672 (1.9794) loss_scale 4096.0000 (2204.1525) mem 9655MB
[2024-08-04 04:22:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][350/625] eta 0:01:11 lr 0.001181 wd 0.0500 time 0.2521 (0.2601) data time 0.0011 (0.0026) model time 0.2511 (0.2567) loss 6.1412 (6.0215) grad_norm 1.7181 (1.9806) loss_scale 4096.0000 (2258.0513) mem 9655MB
[2024-08-04 04:22:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][360/625] eta 0:01:08 lr 0.001180 wd 0.0500 time 0.2668 (0.2601) data time 0.0010 (0.0026) model time 0.2658 (0.2567) loss 6.5585 (6.0295) grad_norm 1.3064 (1.9837) loss_scale 4096.0000 (2308.9640) mem 9655MB
[2024-08-04 04:22:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][370/625] eta 0:01:06 lr 0.001180 wd 0.0500 time 0.2569 (0.2600) data time 0.0007 (0.0025) model time 0.2562 (0.2567) loss 7.8143 (6.0358) grad_norm 2.0540 (2.0032) loss_scale 4096.0000 (2357.1321) mem 9655MB
[2024-08-04 04:22:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][380/625] eta 0:01:03 lr 0.001180 wd 0.0500 time 0.2557 (0.2599) data time 0.0007 (0.0025) model time 0.2550 (0.2566) loss 5.2074 (6.0309) grad_norm 2.7993 (2.0085) loss_scale 4096.0000 (2402.7717) mem 9655MB
[2024-08-04 04:22:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][390/625] eta 0:01:01 lr 0.001180 wd 0.0500 time 0.2549 (0.2598) data time 0.0007 (0.0024) model time 0.2543 (0.2566) loss 6.0066 (6.0310) grad_norm 1.2345 (2.0196) loss_scale 4096.0000 (2446.0767) mem 9655MB
[2024-08-04 04:22:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][400/625] eta 0:00:58 lr 0.001180 wd 0.0500 time 0.2552 (0.2597) data time 0.0007 (0.0024) model time 0.2545 (0.2566) loss 4.6809 (6.0281) grad_norm 2.3068 (2.0385) loss_scale 4096.0000 (2487.2219) mem 9655MB
[2024-08-04 04:22:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][410/625] eta 0:00:55 lr 0.001180 wd 0.0500 time 0.2504 (0.2596) data time 0.0009 (0.0024) model time 0.2494 (0.2565) loss 5.1397 (6.0327) grad_norm 1.6532 (2.0463) loss_scale 4096.0000 (2526.3650) mem 9655MB
[2024-08-04 04:22:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][420/625] eta 0:00:53 lr 0.001179 wd 0.0500 time 0.2533 (0.2595) data time 0.0008 (0.0023) model time 0.2525 (0.2565) loss 5.4765 (6.0381) grad_norm 1.4948 (2.0403) loss_scale 4096.0000 (2563.6485) mem 9655MB
[2024-08-04 04:22:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][430/625] eta 0:00:50 lr 0.001179 wd 0.0500 time 0.2547 (0.2595) data time 0.0009 (0.0023) model time 0.2538 (0.2565) loss 6.7519 (6.0359) grad_norm 3.2571 (2.0358) loss_scale 4096.0000 (2599.2019) mem 9655MB
[2024-08-04 04:22:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][440/625] eta 0:00:47 lr 0.001179 wd 0.0500 time 0.2551 (0.2594) data time 0.0007 (0.0023) model time 0.2544 (0.2565) loss 7.4724 (6.0350) grad_norm 2.8246 (2.0408) loss_scale 4096.0000 (2633.1429) mem 9655MB
[2024-08-04 04:22:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][450/625] eta 0:00:45 lr 0.001179 wd 0.0500 time 0.2509 (0.2598) data time 0.0009 (0.0022) model time 0.2500 (0.2569) loss 7.2846 (6.0446) grad_norm 1.4159 (2.0468) loss_scale 4096.0000 (2665.5787) mem 9655MB
[2024-08-04 04:22:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][460/625] eta 0:00:42 lr 0.001179 wd 0.0500 time 0.2538 (0.2601) data time 0.0007 (0.0022) model time 0.2531 (0.2574) loss 5.0306 (6.0415) grad_norm 2.2102 (2.0468) loss_scale 4096.0000 (2696.6074) mem 9655MB
[2024-08-04 04:22:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][470/625] eta 0:00:40 lr 0.001179 wd 0.0500 time 0.2532 (0.2601) data time 0.0009 (0.0022) model time 0.2523 (0.2573) loss 6.5722 (6.0413) grad_norm 1.9995 (2.0608) loss_scale 4096.0000 (2726.3185) mem 9655MB
[2024-08-04 04:22:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][480/625] eta 0:00:37 lr 0.001178 wd 0.0500 time 0.2548 (0.2600) data time 0.0009 (0.0022) model time 0.2539 (0.2573) loss 6.0881 (6.0389) grad_norm 1.1377 (2.0558) loss_scale 4096.0000 (2754.7942) mem 9655MB
[2024-08-04 04:22:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][490/625] eta 0:00:35 lr 0.001178 wd 0.0500 time 0.2531 (0.2599) data time 0.0010 (0.0021) model time 0.2521 (0.2572) loss 6.9055 (6.0409) grad_norm 2.3596 (2.0486) loss_scale 4096.0000 (2782.1100) mem 9655MB
[2024-08-04 04:22:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][500/625] eta 0:00:32 lr 0.001178 wd 0.0500 time 0.2603 (0.2598) data time 0.0007 (0.0021) model time 0.2596 (0.2572) loss 5.4820 (6.0339) grad_norm 1.2263 (2.0440) loss_scale 4096.0000 (2808.3353) mem 9655MB
[2024-08-04 04:22:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][510/625] eta 0:00:29 lr 0.001178 wd 0.0500 time 0.2533 (0.2598) data time 0.0011 (0.0021) model time 0.2522 (0.2572) loss 6.9412 (6.0210) grad_norm 2.8302 (2.0398) loss_scale 4096.0000 (2833.5342) mem 9655MB
[2024-08-04 04:22:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][520/625] eta 0:00:27 lr 0.001178 wd 0.0500 time 0.2597 (0.2597) data time 0.0010 (0.0021) model time 0.2587 (0.2571) loss 5.9484 (6.0229) grad_norm 1.7808 (2.0479) loss_scale 4096.0000 (2857.7658) mem 9655MB
[2024-08-04 04:22:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][530/625] eta 0:00:24 lr 0.001177 wd 0.0500 time 0.2591 (0.2596) data time 0.0006 (0.0020) model time 0.2585 (0.2571) loss 7.2951 (6.0214) grad_norm 4.1561 (2.0503) loss_scale 4096.0000 (2881.0847) mem 9655MB
[2024-08-04 04:22:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][540/625] eta 0:00:22 lr 0.001177 wd 0.0500 time 0.2581 (0.2596) data time 0.0012 (0.0020) model time 0.2569 (0.2571) loss 4.9968 (6.0185) grad_norm 2.0032 (2.0592) loss_scale 4096.0000 (2903.5416) mem 9655MB
[2024-08-04 04:22:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][550/625] eta 0:00:19 lr 0.001177 wd 0.0500 time 0.2538 (0.2596) data time 0.0009 (0.0020) model time 0.2529 (0.2570) loss 6.8011 (6.0180) grad_norm 3.1962 (2.0668) loss_scale 4096.0000 (2925.1833) mem 9655MB
[2024-08-04 04:22:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][560/625] eta 0:00:16 lr 0.001177 wd 0.0500 time 0.2576 (0.2595) data time 0.0008 (0.0020) model time 0.2568 (0.2570) loss 6.9005 (6.0174) grad_norm 1.4905 (2.0630) loss_scale 4096.0000 (2946.0535) mem 9655MB
[2024-08-04 04:23:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][570/625] eta 0:00:14 lr 0.001177 wd 0.0500 time 0.2568 (0.2594) data time 0.0007 (0.0020) model time 0.2561 (0.2570) loss 5.2742 (6.0121) grad_norm 1.8841 (2.0607) loss_scale 4096.0000 (2966.1926) mem 9655MB
[2024-08-04 04:23:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][580/625] eta 0:00:11 lr 0.001177 wd 0.0500 time 0.2573 (0.2594) data time 0.0008 (0.0019) model time 0.2564 (0.2569) loss 5.4418 (6.0169) grad_norm 1.4124 (2.0588) loss_scale 4096.0000 (2985.6386) mem 9655MB
[2024-08-04 04:23:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][590/625] eta 0:00:09 lr 0.001176 wd 0.0500 time 0.2547 (0.2593) data time 0.0009 (0.0019) model time 0.2538 (0.2569) loss 5.9526 (6.0132) grad_norm 1.3097 (2.0574) loss_scale 4096.0000 (3004.4264) mem 9655MB
[2024-08-04 04:23:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][600/625] eta 0:00:06 lr 0.001176 wd 0.0500 time 0.2608 (0.2592) data time 0.0005 (0.0019) model time 0.2603 (0.2569) loss 6.4556 (6.0117) grad_norm 2.5522 (2.0550) loss_scale 4096.0000 (3022.5890) mem 9655MB
[2024-08-04 04:23:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][610/625] eta 0:00:03 lr 0.001176 wd 0.0500 time 0.2531 (0.2592) data time 0.0006 (0.0019) model time 0.2525 (0.2568) loss 7.1047 (6.0111) grad_norm 1.8397 (2.0561) loss_scale 4096.0000 (3040.1571) mem 9655MB
[2024-08-04 04:23:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][620/625] eta 0:00:01 lr 0.001176 wd 0.0500 time 0.2545 (0.2591) data time 0.0003 (0.0019) model time 0.2542 (0.2568) loss 7.7497 (6.0157) grad_norm 2.4396 (2.0561) loss_scale 4096.0000 (3057.1594) mem 9655MB
[2024-08-04 04:23:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 144 training takes 0:02:41
[2024-08-04 04:23:14 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 04:23:14 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 04:23:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.624 (0.624) Loss 0.6719 (0.6719) Acc@1 88.721 (88.721) Acc@5 98.242 (98.242) Mem 9655MB
[2024-08-04 04:23:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.108) Loss 1.0889 (0.8438) Acc@1 77.100 (83.749) Acc@5 93.750 (96.924) Mem 9655MB
[2024-08-04 04:23:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.083) Loss 1.2588 (0.9799) Acc@1 72.217 (80.032) Acc@5 92.285 (95.240) Mem 9655MB
[2024-08-04 04:23:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.716 Acc@5 95.234
[2024-08-04 04:23:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.7%
[2024-08-04 04:23:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.748 (0.748) Loss 0.5889 (0.5889) Acc@1 88.965 (88.965) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 04:23:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.126) Loss 0.9395 (0.7284) Acc@1 79.688 (85.152) Acc@5 95.410 (97.439) Mem 9655MB
[2024-08-04 04:23:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0791 (0.8617) Acc@1 74.463 (81.517) Acc@5 93.701 (95.889) Mem 9655MB
[2024-08-04 04:23:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.202 Acc@5 95.885
[2024-08-04 04:23:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.2%
[2024-08-04 04:23:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.20%
[2024-08-04 04:23:18 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 04:23:19 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 04:23:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][0/625] eta 0:07:57 lr 0.001176 wd 0.0500 time 0.7636 (0.7636) data time 0.5228 (0.5228) model time 0.0000 (0.0000) loss 7.7692 (7.7692) grad_norm 2.0228 (2.0228) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:23:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][10/625] eta 0:03:05 lr 0.001176 wd 0.0500 time 0.2574 (0.3018) data time 0.0015 (0.0485) model time 0.0000 (0.0000) loss 5.3123 (6.2278) grad_norm 1.4815 (2.1403) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:23:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][20/625] eta 0:02:49 lr 0.001175 wd 0.0500 time 0.2509 (0.2797) data time 0.0008 (0.0258) model time 0.0000 (0.0000) loss 6.0692 (6.0163) grad_norm 1.2824 (2.1519) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:23:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][30/625] eta 0:02:41 lr 0.001175 wd 0.0500 time 0.2510 (0.2720) data time 0.0010 (0.0178) model time 0.0000 (0.0000) loss 5.7402 (6.0895) grad_norm 1.4284 (2.0481) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:23:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][40/625] eta 0:02:36 lr 0.001175 wd 0.0500 time 0.2590 (0.2683) data time 0.0007 (0.0137) model time 0.0000 (0.0000) loss 6.0006 (6.0245) grad_norm 2.6619 (1.9345) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:23:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][50/625] eta 0:02:32 lr 0.001175 wd 0.0500 time 0.2571 (0.2659) data time 0.0011 (0.0112) model time 0.0000 (0.0000) loss 6.6047 (6.0695) grad_norm 2.4928 (1.9485) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:23:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][60/625] eta 0:02:29 lr 0.001175 wd 0.0500 time 0.2565 (0.2644) data time 0.0008 (0.0095) model time 0.2556 (0.2554) loss 6.8032 (6.0513) grad_norm 1.8571 (1.9070) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:23:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][70/625] eta 0:02:26 lr 0.001175 wd 0.0500 time 0.2540 (0.2632) data time 0.0010 (0.0083) model time 0.2530 (0.2551) loss 5.8217 (6.0411) grad_norm 2.5945 (1.8967) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:23:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][80/625] eta 0:02:22 lr 0.001174 wd 0.0500 time 0.2546 (0.2623) data time 0.0008 (0.0074) model time 0.2538 (0.2552) loss 5.1104 (6.0102) grad_norm 2.6539 (1.9154) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:23:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][90/625] eta 0:02:19 lr 0.001174 wd 0.0500 time 0.2539 (0.2616) data time 0.0008 (0.0067) model time 0.2532 (0.2552) loss 6.1494 (6.0330) grad_norm 1.9692 (1.9093) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:23:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][100/625] eta 0:02:17 lr 0.001174 wd 0.0500 time 0.2546 (0.2611) data time 0.0008 (0.0061) model time 0.2537 (0.2553) loss 6.6498 (6.0369) grad_norm 1.3184 (1.8955) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:23:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][110/625] eta 0:02:14 lr 0.001174 wd 0.0500 time 0.2522 (0.2607) data time 0.0016 (0.0057) model time 0.2507 (0.2553) loss 5.2453 (6.0438) grad_norm 1.6783 (1.8962) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:23:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][120/625] eta 0:02:11 lr 0.001174 wd 0.0500 time 0.2554 (0.2603) data time 0.0009 (0.0053) model time 0.2545 (0.2554) loss 6.2683 (6.0184) grad_norm 1.8836 (1.9086) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:23:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][130/625] eta 0:02:08 lr 0.001174 wd 0.0500 time 0.2539 (0.2601) data time 0.0006 (0.0049) model time 0.2533 (0.2554) loss 4.6305 (5.9774) grad_norm 2.6435 (1.9139) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:23:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][140/625] eta 0:02:05 lr 0.001173 wd 0.0500 time 0.2568 (0.2598) data time 0.0010 (0.0046) model time 0.2557 (0.2554) loss 6.6622 (5.9737) grad_norm 2.6437 (1.9253) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:23:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][150/625] eta 0:02:03 lr 0.001173 wd 0.0500 time 0.2576 (0.2595) data time 0.0006 (0.0044) model time 0.2570 (0.2554) loss 4.6028 (5.9675) grad_norm 1.2480 (1.9400) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:24:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][160/625] eta 0:02:00 lr 0.001173 wd 0.0500 time 0.2574 (0.2593) data time 0.0008 (0.0042) model time 0.2566 (0.2554) loss 6.3761 (5.9549) grad_norm 1.4575 (1.9456) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:24:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][170/625] eta 0:01:57 lr 0.001173 wd 0.0500 time 0.2553 (0.2591) data time 0.0009 (0.0040) model time 0.2544 (0.2553) loss 6.2425 (5.9429) grad_norm 2.9311 (1.9760) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:24:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][180/625] eta 0:01:55 lr 0.001173 wd 0.0500 time 0.2549 (0.2589) data time 0.0009 (0.0038) model time 0.2540 (0.2553) loss 5.3032 (5.9489) grad_norm 1.7287 (1.9947) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:24:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][190/625] eta 0:01:52 lr 0.001173 wd 0.0500 time 0.2560 (0.2588) data time 0.0011 (0.0037) model time 0.2549 (0.2553) loss 4.9889 (5.9366) grad_norm 1.9307 (1.9812) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:24:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][200/625] eta 0:01:49 lr 0.001172 wd 0.0500 time 0.2541 (0.2587) data time 0.0016 (0.0035) model time 0.2525 (0.2553) loss 5.7215 (5.9506) grad_norm 2.3880 (1.9714) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:24:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][210/625] eta 0:01:47 lr 0.001172 wd 0.0500 time 0.2539 (0.2586) data time 0.0008 (0.0034) model time 0.2532 (0.2553) loss 6.3306 (5.9402) grad_norm 2.6567 (1.9736) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:24:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][220/625] eta 0:01:44 lr 0.001172 wd 0.0500 time 0.2571 (0.2585) data time 0.0006 (0.0033) model time 0.2564 (0.2554) loss 6.0473 (5.9304) grad_norm 1.3829 (1.9621) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:24:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][230/625] eta 0:01:42 lr 0.001172 wd 0.0500 time 0.2561 (0.2584) data time 0.0007 (0.0032) model time 0.2554 (0.2554) loss 7.0010 (5.9488) grad_norm 1.5533 (1.9583) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:24:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][240/625] eta 0:01:39 lr 0.001172 wd 0.0500 time 0.2557 (0.2588) data time 0.0008 (0.0031) model time 0.2549 (0.2559) loss 5.2571 (5.9484) grad_norm 1.6006 (1.9412) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:24:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][250/625] eta 0:01:36 lr 0.001171 wd 0.0500 time 0.2581 (0.2587) data time 0.0006 (0.0030) model time 0.2575 (0.2559) loss 7.0399 (5.9654) grad_norm 1.9780 (1.9529) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:24:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][260/625] eta 0:01:34 lr 0.001171 wd 0.0500 time 0.2559 (0.2585) data time 0.0009 (0.0029) model time 0.2550 (0.2558) loss 6.4004 (5.9624) grad_norm 3.2631 (1.9650) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:24:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][270/625] eta 0:01:31 lr 0.001171 wd 0.0500 time 0.2549 (0.2585) data time 0.0007 (0.0029) model time 0.2542 (0.2558) loss 5.1312 (5.9609) grad_norm 3.4911 (1.9759) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:24:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][280/625] eta 0:01:29 lr 0.001171 wd 0.0500 time 0.2552 (0.2584) data time 0.0010 (0.0028) model time 0.2543 (0.2558) loss 6.7400 (5.9715) grad_norm 2.8236 (1.9814) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:24:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][290/625] eta 0:01:26 lr 0.001171 wd 0.0500 time 0.2568 (0.2583) data time 0.0010 (0.0027) model time 0.2558 (0.2557) loss 5.2152 (5.9816) grad_norm 1.6584 (1.9739) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:24:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][300/625] eta 0:01:23 lr 0.001171 wd 0.0500 time 0.2577 (0.2582) data time 0.0010 (0.0027) model time 0.2567 (0.2557) loss 5.4507 (5.9708) grad_norm 3.6202 (1.9772) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:24:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][310/625] eta 0:01:21 lr 0.001170 wd 0.0500 time 0.2541 (0.2581) data time 0.0011 (0.0026) model time 0.2531 (0.2557) loss 7.1397 (5.9761) grad_norm 1.4036 (1.9748) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:24:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][320/625] eta 0:01:18 lr 0.001170 wd 0.0500 time 0.2585 (0.2581) data time 0.0008 (0.0026) model time 0.2576 (0.2556) loss 5.3547 (5.9712) grad_norm 3.1085 (1.9915) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:24:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][330/625] eta 0:01:16 lr 0.001170 wd 0.0500 time 0.2580 (0.2580) data time 0.0008 (0.0025) model time 0.2572 (0.2556) loss 6.4634 (5.9745) grad_norm 1.6848 (1.9815) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:24:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][340/625] eta 0:01:13 lr 0.001170 wd 0.0500 time 0.2546 (0.2579) data time 0.0008 (0.0025) model time 0.2538 (0.2556) loss 6.7593 (5.9776) grad_norm 1.6598 (1.9798) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:24:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][350/625] eta 0:01:10 lr 0.001170 wd 0.0500 time 0.2549 (0.2579) data time 0.0010 (0.0024) model time 0.2539 (0.2556) loss 6.4733 (5.9787) grad_norm 1.2321 (1.9744) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:24:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][360/625] eta 0:01:08 lr 0.001170 wd 0.0500 time 0.2623 (0.2578) data time 0.0009 (0.0024) model time 0.2613 (0.2556) loss 6.9996 (5.9740) grad_norm 2.4823 (1.9882) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:24:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][370/625] eta 0:01:05 lr 0.001169 wd 0.0500 time 0.2560 (0.2578) data time 0.0006 (0.0024) model time 0.2554 (0.2555) loss 5.9013 (5.9741) grad_norm 1.8723 (1.9899) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:24:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][380/625] eta 0:01:03 lr 0.001169 wd 0.0500 time 0.2575 (0.2577) data time 0.0009 (0.0023) model time 0.2567 (0.2555) loss 6.3084 (5.9678) grad_norm 2.8984 (1.9886) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:24:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][390/625] eta 0:01:00 lr 0.001169 wd 0.0500 time 0.2539 (0.2577) data time 0.0011 (0.0023) model time 0.2529 (0.2555) loss 5.6280 (5.9660) grad_norm 2.7299 (1.9929) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:25:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][400/625] eta 0:00:57 lr 0.001169 wd 0.0500 time 0.2600 (0.2577) data time 0.0007 (0.0022) model time 0.2592 (0.2556) loss 6.1410 (5.9610) grad_norm 1.5467 (1.9978) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:25:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][410/625] eta 0:00:55 lr 0.001169 wd 0.0500 time 0.2522 (0.2577) data time 0.0008 (0.0022) model time 0.2514 (0.2556) loss 6.5975 (5.9560) grad_norm 1.6896 (1.9950) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:25:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][420/625] eta 0:00:52 lr 0.001168 wd 0.0500 time 0.2540 (0.2576) data time 0.0009 (0.0022) model time 0.2531 (0.2556) loss 5.2679 (5.9554) grad_norm 2.1162 (1.9925) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:25:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][430/625] eta 0:00:50 lr 0.001168 wd 0.0500 time 0.2562 (0.2576) data time 0.0008 (0.0022) model time 0.2554 (0.2555) loss 7.2343 (5.9669) grad_norm 1.8341 (1.9884) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:25:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][440/625] eta 0:00:47 lr 0.001168 wd 0.0500 time 0.2618 (0.2576) data time 0.0008 (0.0021) model time 0.2610 (0.2555) loss 6.1905 (5.9655) grad_norm 2.6977 (1.9993) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:25:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][450/625] eta 0:00:45 lr 0.001168 wd 0.0500 time 0.2552 (0.2576) data time 0.0008 (0.0021) model time 0.2544 (0.2556) loss 6.5533 (5.9716) grad_norm 1.1509 (2.0072) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:25:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][460/625] eta 0:00:42 lr 0.001168 wd 0.0500 time 0.2518 (0.2576) data time 0.0008 (0.0021) model time 0.2510 (0.2556) loss 7.1283 (5.9652) grad_norm 1.7922 (2.0145) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:25:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][470/625] eta 0:00:39 lr 0.001168 wd 0.0500 time 0.2571 (0.2580) data time 0.0010 (0.0021) model time 0.2560 (0.2561) loss 6.8291 (5.9597) grad_norm 1.6549 (2.0151) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:25:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][480/625] eta 0:00:37 lr 0.001167 wd 0.0500 time 0.2564 (0.2579) data time 0.0009 (0.0020) model time 0.2555 (0.2561) loss 6.9469 (5.9656) grad_norm 2.2646 (2.0244) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:25:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][490/625] eta 0:00:34 lr 0.001167 wd 0.0500 time 0.2555 (0.2579) data time 0.0007 (0.0020) model time 0.2547 (0.2560) loss 5.4569 (5.9686) grad_norm 3.3659 (2.0298) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:25:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][500/625] eta 0:00:32 lr 0.001167 wd 0.0500 time 0.2563 (0.2579) data time 0.0007 (0.0020) model time 0.2556 (0.2560) loss 5.6058 (5.9713) grad_norm 2.0995 (2.0212) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:25:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][510/625] eta 0:00:29 lr 0.001167 wd 0.0500 time 0.2640 (0.2579) data time 0.0009 (0.0020) model time 0.2631 (0.2560) loss 5.8094 (5.9682) grad_norm 1.8871 (2.0178) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:25:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][520/625] eta 0:00:27 lr 0.001167 wd 0.0500 time 0.2561 (0.2578) data time 0.0008 (0.0019) model time 0.2553 (0.2560) loss 6.9889 (5.9721) grad_norm 2.5649 (2.0199) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:25:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][530/625] eta 0:00:24 lr 0.001167 wd 0.0500 time 0.2549 (0.2578) data time 0.0008 (0.0019) model time 0.2541 (0.2560) loss 5.6987 (5.9710) grad_norm 1.2198 (2.0222) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:25:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][540/625] eta 0:00:21 lr 0.001166 wd 0.0500 time 0.2575 (0.2578) data time 0.0007 (0.0019) model time 0.2568 (0.2560) loss 5.7124 (5.9724) grad_norm 1.5114 (2.0262) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:25:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][550/625] eta 0:00:19 lr 0.001166 wd 0.0500 time 0.2602 (0.2577) data time 0.0008 (0.0019) model time 0.2594 (0.2559) loss 6.2078 (5.9772) grad_norm 1.2039 (2.0174) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:25:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][560/625] eta 0:00:16 lr 0.001166 wd 0.0500 time 0.2567 (0.2577) data time 0.0008 (0.0019) model time 0.2559 (0.2559) loss 5.7630 (5.9758) grad_norm 2.0230 (2.0105) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:25:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][570/625] eta 0:00:14 lr 0.001166 wd 0.0500 time 0.2585 (0.2576) data time 0.0006 (0.0019) model time 0.2579 (0.2559) loss 5.9279 (5.9816) grad_norm 1.5483 (2.0063) loss_scale 4096.0000 (4096.0000) mem 9655MB
[2024-08-04 04:25:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][580/625] eta 0:00:11 lr 0.001166 wd 0.0500 time 0.2559 (0.2576) data time 0.0009 (0.0018) model time 0.2550 (0.2559) loss 6.2084 (5.9731) grad_norm 2.2277 (2.0021) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:25:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][590/625] eta 0:00:09 lr 0.001165 wd 0.0500 time 0.2575 (0.2576) data time 0.0006 (0.0018) model time 0.2569 (0.2558) loss 6.5341 (5.9783) grad_norm 1.9550 (1.9966) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:25:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][600/625] eta 0:00:06 lr 0.001165 wd 0.0500 time 0.2593 (0.2576) data time 0.0009 (0.0018) model time 0.2585 (0.2559) loss 5.6845 (5.9763) grad_norm 2.1657 (1.9919) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:25:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][610/625] eta 0:00:03 lr 0.001165 wd 0.0500 time 0.2532 (0.2576) data time 0.0004 (0.0018) model time 0.2528 (0.2558) loss 6.6556 (5.9824) grad_norm 1.6295 (1.9838) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:25:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][620/625] eta 0:00:01 lr 0.001165 wd 0.0500 time 0.2530 (0.2575) data time 0.0006 (0.0018) model time 0.2524 (0.2558) loss 6.0046 (5.9821) grad_norm 1.5988 (1.9802) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:26:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 145 training takes 0:02:40 [2024-08-04 04:26:00 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 04:26:00 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 04:26:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.519 (0.519) Loss 0.6558 (0.6558) Acc@1 88.672 (88.672) Acc@5 98.242 (98.242) Mem 9655MB [2024-08-04 04:26:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 1.0576 (0.8172) Acc@1 76.807 (83.829) Acc@5 94.678 (97.004) Mem 9655MB [2024-08-04 04:26:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.1943 (0.9647) Acc@1 73.486 (80.180) Acc@5 93.018 (95.331) Mem 9655MB [2024-08-04 04:26:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.950 Acc@5 95.365 [2024-08-04 04:26:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.0% [2024-08-04 04:26:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.692 (0.692) Loss 0.5898 (0.5898) Acc@1 89.014 (89.014) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 04:26:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.122) Loss 0.9399 (0.7294) Acc@1 79.688 (85.121) Acc@5 95.361 (97.439) Mem 9655MB [2024-08-04 04:26:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.090) Loss 1.0781 (0.8621) Acc@1 74.658 (81.527) Acc@5 93.750 (95.891) Mem 9655MB [2024-08-04 04:26:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.218 Acc@5 95.879 [2024-08-04 04:26:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.2% [2024-08-04 04:26:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.22% [2024-08-04 04:26:04 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 04:26:04 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 04:26:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][0/625] eta 0:07:27 lr 0.001165 wd 0.0500 time 0.7162 (0.7162) data time 0.4689 (0.4689) model time 0.0000 (0.0000) loss 6.3741 (6.3741) grad_norm 1.5618 (1.5618) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:26:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][10/625] eta 0:03:03 lr 0.001165 wd 0.0500 time 0.2548 (0.2982) data time 0.0009 (0.0435) model time 0.0000 (0.0000) loss 6.8421 (6.0046) grad_norm 1.5927 (2.1672) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:26:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][20/625] eta 0:02:48 lr 0.001165 wd 0.0500 time 0.2561 (0.2783) data time 0.0008 (0.0232) model time 0.0000 (0.0000) loss 5.0935 (5.8032) grad_norm 1.3947 (1.8578) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:26:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][30/625] eta 0:02:41 lr 0.001164 wd 0.0500 time 0.2533 (0.2711) data time 0.0009 (0.0160) model time 0.0000 (0.0000) loss 6.4164 (5.9831) grad_norm 2.4793 (2.0177) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:26:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][40/625] eta 0:02:36 lr 0.001164 wd 0.0500 time 0.2554 (0.2675) data time 0.0012 (0.0123) model time 0.0000 (0.0000) loss 6.1150 (6.0396) grad_norm 2.2044 (2.0030) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:26:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][50/625] eta 0:02:32 lr 0.001164 wd 0.0500 time 0.2544 (0.2654) data time 0.0006 (0.0101) model time 0.0000 (0.0000) loss 5.9921 (6.0376) grad_norm 2.0747 (2.2059) loss_scale 4096.0000 (4096.0000) mem 9655MB 
[2024-08-04 04:26:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][60/625] eta 0:02:29 lr 0.001164 wd 0.0500 time 0.2576 (0.2639) data time 0.0007 (0.0086) model time 0.2568 (0.2554) loss 6.4967 (6.0547) grad_norm 1.9391 (2.1914) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:26:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][70/625] eta 0:02:25 lr 0.001164 wd 0.0500 time 0.2525 (0.2629) data time 0.0009 (0.0075) model time 0.2516 (0.2555) loss 6.0929 (6.0332) grad_norm 1.2924 (2.1126) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:26:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][80/625] eta 0:02:22 lr 0.001163 wd 0.0500 time 0.2552 (0.2620) data time 0.0007 (0.0067) model time 0.2545 (0.2554) loss 6.7553 (6.0603) grad_norm 1.4197 (2.0745) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:26:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][90/625] eta 0:02:19 lr 0.001163 wd 0.0500 time 0.2555 (0.2614) data time 0.0007 (0.0060) model time 0.2547 (0.2554) loss 7.0658 (6.0640) grad_norm 1.4015 (2.0094) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:26:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][100/625] eta 0:02:16 lr 0.001163 wd 0.0500 time 0.2536 (0.2609) data time 0.0017 (0.0056) model time 0.2519 (0.2553) loss 5.9316 (6.0624) grad_norm 2.6541 (2.0236) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:26:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][110/625] eta 0:02:14 lr 0.001163 wd 0.0500 time 0.2567 (0.2606) data time 0.0007 (0.0051) model time 0.2560 (0.2556) loss 5.2934 (6.0326) grad_norm 2.1397 (2.0295) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:26:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][120/625] eta 0:02:11 lr 0.001163 wd 0.0500 time 0.2589 
(0.2603) data time 0.0009 (0.0048) model time 0.2580 (0.2556) loss 5.4561 (5.9991) grad_norm 1.1983 (2.0033) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:26:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][130/625] eta 0:02:08 lr 0.001163 wd 0.0500 time 0.2569 (0.2600) data time 0.0009 (0.0045) model time 0.2560 (0.2557) loss 6.5489 (6.0230) grad_norm 1.7414 (2.0120) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:26:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][140/625] eta 0:02:06 lr 0.001162 wd 0.0500 time 0.2582 (0.2612) data time 0.0006 (0.0042) model time 0.2575 (0.2579) loss 5.2186 (6.0287) grad_norm 1.5065 (2.0058) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:26:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][150/625] eta 0:02:04 lr 0.001162 wd 0.0500 time 0.2543 (0.2622) data time 0.0010 (0.0040) model time 0.2533 (0.2596) loss 5.6207 (6.0254) grad_norm 2.3705 (2.0538) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:26:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][160/625] eta 0:02:01 lr 0.001162 wd 0.0500 time 0.2546 (0.2619) data time 0.0009 (0.0038) model time 0.2536 (0.2593) loss 6.6089 (6.0465) grad_norm 1.3895 (2.0427) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:26:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][170/625] eta 0:01:59 lr 0.001162 wd 0.0500 time 0.2609 (0.2616) data time 0.0007 (0.0036) model time 0.2603 (0.2591) loss 4.9410 (6.0417) grad_norm 1.3819 (2.0200) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:26:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][180/625] eta 0:01:56 lr 0.001162 wd 0.0500 time 0.2545 (0.2613) data time 0.0007 (0.0035) model time 0.2538 (0.2588) loss 6.6355 (6.0597) grad_norm 2.1345 (2.0339) loss_scale 4096.0000 (4096.0000) mem 9655MB 
[2024-08-04 04:26:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][190/625] eta 0:01:53 lr 0.001162 wd 0.0500 time 0.2560 (0.2610) data time 0.0009 (0.0034) model time 0.2551 (0.2585) loss 5.0851 (6.0424) grad_norm 2.3784 (2.0341) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:26:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][200/625] eta 0:01:50 lr 0.001161 wd 0.0500 time 0.2553 (0.2608) data time 0.0007 (0.0032) model time 0.2546 (0.2583) loss 7.2976 (6.0298) grad_norm 1.5153 (2.0395) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:26:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][210/625] eta 0:01:48 lr 0.001161 wd 0.0500 time 0.2589 (0.2606) data time 0.0011 (0.0031) model time 0.2578 (0.2582) loss 7.3695 (6.0261) grad_norm 2.0899 (2.0324) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:27:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][220/625] eta 0:01:45 lr 0.001161 wd 0.0500 time 0.2550 (0.2604) data time 0.0008 (0.0030) model time 0.2542 (0.2580) loss 6.7814 (6.0042) grad_norm 1.7229 (2.0133) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:27:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][230/625] eta 0:01:42 lr 0.001161 wd 0.0500 time 0.2559 (0.2602) data time 0.0009 (0.0029) model time 0.2550 (0.2579) loss 6.8693 (6.0000) grad_norm 2.2383 (2.0315) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:27:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][240/625] eta 0:01:40 lr 0.001161 wd 0.0500 time 0.2537 (0.2601) data time 0.0008 (0.0029) model time 0.2529 (0.2577) loss 5.3717 (6.0093) grad_norm 1.3127 (2.0284) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:27:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][250/625] eta 0:01:37 lr 0.001160 wd 0.0500 time 0.2536 
(0.2599) data time 0.0010 (0.0028) model time 0.2526 (0.2576) loss 6.7005 (6.0003) grad_norm 2.0919 (2.0231) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:27:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][260/625] eta 0:01:34 lr 0.001160 wd 0.0500 time 0.2628 (0.2598) data time 0.0008 (0.0027) model time 0.2620 (0.2575) loss 5.8056 (6.0027) grad_norm 1.8083 (2.0161) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:27:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][270/625] eta 0:01:32 lr 0.001160 wd 0.0500 time 0.2615 (0.2597) data time 0.0009 (0.0027) model time 0.2606 (0.2574) loss 6.4293 (6.0023) grad_norm 2.2487 (inf) loss_scale 2048.0000 (4080.8856) mem 9655MB [2024-08-04 04:27:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][280/625] eta 0:01:29 lr 0.001160 wd 0.0500 time 0.2598 (0.2596) data time 0.0007 (0.0026) model time 0.2590 (0.2573) loss 5.1000 (6.0109) grad_norm 1.5147 (inf) loss_scale 2048.0000 (4008.5409) mem 9655MB [2024-08-04 04:27:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][290/625] eta 0:01:26 lr 0.001160 wd 0.0500 time 0.2529 (0.2594) data time 0.0010 (0.0025) model time 0.2518 (0.2572) loss 6.3630 (6.0189) grad_norm 1.6420 (inf) loss_scale 2048.0000 (3941.1684) mem 9655MB [2024-08-04 04:27:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][300/625] eta 0:01:24 lr 0.001160 wd 0.0500 time 0.2592 (0.2593) data time 0.0007 (0.0025) model time 0.2584 (0.2571) loss 7.1101 (6.0221) grad_norm 2.4892 (inf) loss_scale 2048.0000 (3878.2724) mem 9655MB [2024-08-04 04:27:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][310/625] eta 0:01:21 lr 0.001159 wd 0.0500 time 0.2557 (0.2592) data time 0.0007 (0.0024) model time 0.2550 (0.2571) loss 5.3771 (6.0318) grad_norm 1.4713 (inf) loss_scale 2048.0000 (3819.4212) mem 9655MB [2024-08-04 
04:27:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][320/625] eta 0:01:19 lr 0.001159 wd 0.0500 time 0.2567 (0.2591) data time 0.0008 (0.0024) model time 0.2559 (0.2569) loss 6.3285 (6.0310) grad_norm 1.4403 (inf) loss_scale 2048.0000 (3764.2368) mem 9655MB [2024-08-04 04:27:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][330/625] eta 0:01:16 lr 0.001159 wd 0.0500 time 0.2531 (0.2594) data time 0.0008 (0.0023) model time 0.2523 (0.2573) loss 6.4269 (6.0380) grad_norm 1.4942 (inf) loss_scale 2048.0000 (3712.3867) mem 9655MB [2024-08-04 04:27:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][340/625] eta 0:01:13 lr 0.001159 wd 0.0500 time 0.2572 (0.2592) data time 0.0006 (0.0023) model time 0.2566 (0.2572) loss 5.5144 (6.0326) grad_norm 2.1266 (inf) loss_scale 2048.0000 (3663.5777) mem 9655MB [2024-08-04 04:27:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][350/625] eta 0:01:11 lr 0.001159 wd 0.0500 time 0.2549 (0.2591) data time 0.0009 (0.0023) model time 0.2540 (0.2571) loss 6.3104 (6.0316) grad_norm 2.3118 (inf) loss_scale 2048.0000 (3617.5499) mem 9655MB [2024-08-04 04:27:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][360/625] eta 0:01:08 lr 0.001159 wd 0.0500 time 0.2548 (0.2590) data time 0.0012 (0.0022) model time 0.2536 (0.2571) loss 6.0347 (6.0354) grad_norm 2.3932 (inf) loss_scale 2048.0000 (3574.0720) mem 9655MB [2024-08-04 04:27:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][370/625] eta 0:01:06 lr 0.001158 wd 0.0500 time 0.2562 (0.2589) data time 0.0007 (0.0022) model time 0.2555 (0.2570) loss 5.5909 (6.0346) grad_norm 1.9104 (inf) loss_scale 2048.0000 (3532.9380) mem 9655MB [2024-08-04 04:27:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][380/625] eta 0:01:03 lr 0.001158 wd 0.0500 time 0.2605 (0.2589) data time 0.0007 
(0.0022) model time 0.2598 (0.2569) loss 7.2540 (6.0431) grad_norm 2.3156 (inf) loss_scale 2048.0000 (3493.9633) mem 9655MB [2024-08-04 04:27:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][390/625] eta 0:01:00 lr 0.001158 wd 0.0500 time 0.2582 (0.2588) data time 0.0009 (0.0021) model time 0.2572 (0.2568) loss 5.9872 (6.0454) grad_norm 2.4508 (inf) loss_scale 2048.0000 (3456.9821) mem 9655MB [2024-08-04 04:27:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][400/625] eta 0:00:58 lr 0.001158 wd 0.0500 time 0.2621 (0.2587) data time 0.0009 (0.0021) model time 0.2612 (0.2568) loss 6.1210 (6.0479) grad_norm 1.8025 (inf) loss_scale 2048.0000 (3421.8454) mem 9655MB [2024-08-04 04:27:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][410/625] eta 0:00:55 lr 0.001158 wd 0.0500 time 0.2554 (0.2587) data time 0.0010 (0.0021) model time 0.2544 (0.2568) loss 6.4095 (6.0469) grad_norm 1.6808 (inf) loss_scale 2048.0000 (3388.4185) mem 9655MB [2024-08-04 04:27:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][420/625] eta 0:00:53 lr 0.001157 wd 0.0500 time 0.2604 (0.2586) data time 0.0008 (0.0021) model time 0.2596 (0.2567) loss 5.6691 (6.0477) grad_norm 2.2786 (inf) loss_scale 2048.0000 (3356.5796) mem 9655MB [2024-08-04 04:27:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][430/625] eta 0:00:50 lr 0.001157 wd 0.0500 time 0.2557 (0.2586) data time 0.0008 (0.0020) model time 0.2549 (0.2567) loss 5.9340 (6.0450) grad_norm 2.6249 (inf) loss_scale 2048.0000 (3326.2181) mem 9655MB [2024-08-04 04:27:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][440/625] eta 0:00:47 lr 0.001157 wd 0.0500 time 0.2620 (0.2586) data time 0.0006 (0.0020) model time 0.2614 (0.2567) loss 6.7052 (6.0389) grad_norm 2.3595 (inf) loss_scale 2048.0000 (3297.2336) mem 9655MB [2024-08-04 04:28:01 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [146/300][450/625] eta 0:00:45 lr 0.001157 wd 0.0500 time 0.2563 (0.2585) data time 0.0010 (0.0020) model time 0.2553 (0.2567) loss 6.5094 (6.0435) grad_norm 2.4651 (inf) loss_scale 2048.0000 (3269.5344) mem 9655MB [2024-08-04 04:28:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][460/625] eta 0:00:42 lr 0.001157 wd 0.0500 time 0.2518 (0.2585) data time 0.0007 (0.0020) model time 0.2511 (0.2567) loss 6.3539 (6.0394) grad_norm 1.2447 (inf) loss_scale 2048.0000 (3243.0369) mem 9655MB [2024-08-04 04:28:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][470/625] eta 0:00:40 lr 0.001157 wd 0.0500 time 0.2603 (0.2585) data time 0.0008 (0.0019) model time 0.2595 (0.2567) loss 7.5212 (6.0455) grad_norm 1.4629 (inf) loss_scale 2048.0000 (3217.6645) mem 9655MB [2024-08-04 04:28:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][480/625] eta 0:00:37 lr 0.001156 wd 0.0500 time 0.2553 (0.2584) data time 0.0012 (0.0019) model time 0.2540 (0.2566) loss 6.3382 (6.0437) grad_norm 1.9075 (inf) loss_scale 2048.0000 (3193.3472) mem 9655MB [2024-08-04 04:28:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][490/625] eta 0:00:34 lr 0.001156 wd 0.0500 time 0.2566 (0.2584) data time 0.0015 (0.0019) model time 0.2551 (0.2566) loss 5.3038 (6.0382) grad_norm 1.9561 (inf) loss_scale 2048.0000 (3170.0204) mem 9655MB [2024-08-04 04:28:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][500/625] eta 0:00:32 lr 0.001156 wd 0.0500 time 0.2548 (0.2583) data time 0.0006 (0.0019) model time 0.2542 (0.2566) loss 7.0210 (6.0481) grad_norm 1.1897 (inf) loss_scale 2048.0000 (3147.6248) mem 9655MB [2024-08-04 04:28:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][510/625] eta 0:00:29 lr 0.001156 wd 0.0500 time 0.2561 (0.2583) data time 0.0012 (0.0019) model time 0.2549 (0.2566) loss 
5.7347 (6.0459) grad_norm 1.0900 (inf) loss_scale 2048.0000 (3126.1057) mem 9655MB [2024-08-04 04:28:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][520/625] eta 0:00:27 lr 0.001156 wd 0.0500 time 0.2545 (0.2582) data time 0.0011 (0.0018) model time 0.2534 (0.2565) loss 5.7568 (6.0433) grad_norm 2.4891 (inf) loss_scale 2048.0000 (3105.4127) mem 9655MB [2024-08-04 04:28:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][530/625] eta 0:00:24 lr 0.001156 wd 0.0500 time 0.2586 (0.2582) data time 0.0009 (0.0018) model time 0.2577 (0.2565) loss 5.9330 (6.0443) grad_norm 1.7847 (inf) loss_scale 2048.0000 (3085.4991) mem 9655MB [2024-08-04 04:28:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][540/625] eta 0:00:21 lr 0.001155 wd 0.0500 time 0.2542 (0.2582) data time 0.0008 (0.0018) model time 0.2534 (0.2565) loss 5.5305 (6.0365) grad_norm 1.2498 (inf) loss_scale 2048.0000 (3066.3216) mem 9655MB [2024-08-04 04:28:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][550/625] eta 0:00:19 lr 0.001155 wd 0.0500 time 0.2570 (0.2581) data time 0.0008 (0.0018) model time 0.2562 (0.2564) loss 6.3632 (6.0378) grad_norm 1.4925 (inf) loss_scale 2048.0000 (3047.8403) mem 9655MB [2024-08-04 04:28:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][560/625] eta 0:00:16 lr 0.001155 wd 0.0500 time 0.2587 (0.2581) data time 0.0007 (0.0018) model time 0.2580 (0.2564) loss 6.4843 (6.0392) grad_norm 2.9502 (inf) loss_scale 2048.0000 (3030.0178) mem 9655MB [2024-08-04 04:28:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][570/625] eta 0:00:14 lr 0.001155 wd 0.0500 time 0.2557 (0.2581) data time 0.0009 (0.0018) model time 0.2548 (0.2564) loss 6.2401 (6.0438) grad_norm 1.3353 (inf) loss_scale 2048.0000 (3012.8196) mem 9655MB [2024-08-04 04:28:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[146/300][580/625] eta 0:00:11 lr 0.001155 wd 0.0500 time 0.2572 (0.2580) data time 0.0007 (0.0018) model time 0.2565 (0.2563) loss 4.6100 (6.0348) grad_norm 1.9742 (inf) loss_scale 2048.0000 (2996.2134) mem 9655MB [2024-08-04 04:28:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][590/625] eta 0:00:09 lr 0.001155 wd 0.0500 time 0.2590 (0.2580) data time 0.0006 (0.0018) model time 0.2584 (0.2563) loss 6.4604 (6.0322) grad_norm 2.7928 (inf) loss_scale 2048.0000 (2980.1692) mem 9655MB [2024-08-04 04:28:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][600/625] eta 0:00:06 lr 0.001154 wd 0.0500 time 0.2570 (0.2580) data time 0.0008 (0.0017) model time 0.2563 (0.2563) loss 5.1887 (6.0279) grad_norm 1.6980 (inf) loss_scale 2048.0000 (2964.6589) mem 9655MB [2024-08-04 04:28:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][610/625] eta 0:00:03 lr 0.001154 wd 0.0500 time 0.2533 (0.2580) data time 0.0004 (0.0017) model time 0.2529 (0.2563) loss 7.0972 (6.0319) grad_norm 1.3415 (inf) loss_scale 2048.0000 (2949.6563) mem 9655MB [2024-08-04 04:28:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][620/625] eta 0:00:01 lr 0.001154 wd 0.0500 time 0.2542 (0.2579) data time 0.0006 (0.0017) model time 0.2537 (0.2563) loss 6.8856 (6.0325) grad_norm 2.3051 (inf) loss_scale 2048.0000 (2935.1369) mem 9655MB [2024-08-04 04:28:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 146 training takes 0:02:41 [2024-08-04 04:28:46 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 04:28:46 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 04:28:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.511 (0.511) Loss 0.6660 (0.6660) Acc@1 87.500 (87.500) Acc@5 98.291 (98.291) Mem 9655MB [2024-08-04 04:28:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 1.0898 (0.8171) Acc@1 76.807 (83.900) Acc@5 93.945 (96.862) Mem 9655MB [2024-08-04 04:28:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.1904 (0.9637) Acc@1 74.121 (80.162) Acc@5 92.822 (95.233) Mem 9655MB [2024-08-04 04:28:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.842 Acc@5 95.252 [2024-08-04 04:28:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.8% [2024-08-04 04:28:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.742 (0.742) Loss 0.5908 (0.5908) Acc@1 88.965 (88.965) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 04:28:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.126) Loss 0.9409 (0.7298) Acc@1 79.688 (85.125) Acc@5 95.410 (97.452) Mem 9655MB [2024-08-04 04:28:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0781 (0.8621) Acc@1 74.414 (81.515) Acc@5 93.750 (95.896) Mem 9655MB [2024-08-04 04:28:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.220 Acc@5 95.891 [2024-08-04 04:28:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.2% [2024-08-04 04:28:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.22% [2024-08-04 04:28:50 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 04:28:51 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 04:28:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][0/625] eta 0:07:47 lr 0.001154 wd 0.0500 time 0.7485 (0.7485) data time 0.4970 (0.4970) model time 0.0000 (0.0000) loss 6.7346 (6.7346) grad_norm 1.6977 (1.6977) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:28:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][10/625] eta 0:03:04 lr 0.001154 wd 0.0500 time 0.2543 (0.3007) data time 0.0008 (0.0460) model time 0.0000 (0.0000) loss 5.9761 (5.9212) grad_norm 1.6617 (2.3152) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:28:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][20/625] eta 0:02:48 lr 0.001154 wd 0.0500 time 0.2528 (0.2791) data time 0.0008 (0.0246) model time 0.0000 (0.0000) loss 6.0849 (6.1389) grad_norm 1.3465 (2.1726) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:28:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][30/625] eta 0:02:42 lr 0.001153 wd 0.0500 time 0.2576 (0.2723) data time 0.0005 (0.0170) model time 0.0000 (0.0000) loss 5.6590 (5.9672) grad_norm 1.7352 (2.0509) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:29:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][40/625] eta 0:02:36 lr 0.001153 wd 0.0500 time 0.2578 (0.2681) data time 0.0009 (0.0131) model time 0.0000 (0.0000) loss 5.8155 (5.9304) grad_norm 2.1361 (2.1361) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:29:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][50/625] eta 0:02:32 lr 0.001153 wd 0.0500 time 0.2553 (0.2657) data time 0.0008 (0.0107) model time 0.0000 (0.0000) loss 6.4162 (5.9362) grad_norm 1.6677 (2.1283) loss_scale 2048.0000 (2048.0000) mem 9655MB 
[2024-08-04 04:29:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][60/625] eta 0:02:29 lr 0.001153 wd 0.0500 time 0.2517 (0.2640) data time 0.0009 (0.0091) model time 0.2508 (0.2547) loss 5.1693 (5.8839) grad_norm 2.7632 (2.0877) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:29:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][70/625] eta 0:02:25 lr 0.001153 wd 0.0500 time 0.2556 (0.2630) data time 0.0008 (0.0080) model time 0.2548 (0.2554) loss 5.5555 (5.8706) grad_norm 1.2288 (2.0681) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:29:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][80/625] eta 0:02:22 lr 0.001152 wd 0.0500 time 0.2552 (0.2621) data time 0.0011 (0.0071) model time 0.2541 (0.2549) loss 7.2241 (5.9095) grad_norm 2.1665 (inf) loss_scale 1024.0000 (1946.8642) mem 9655MB [2024-08-04 04:29:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][90/625] eta 0:02:19 lr 0.001152 wd 0.0500 time 0.2536 (0.2613) data time 0.0010 (0.0064) model time 0.2526 (0.2548) loss 4.7185 (5.8868) grad_norm 4.5076 (inf) loss_scale 1024.0000 (1845.4505) mem 9655MB [2024-08-04 04:29:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][100/625] eta 0:02:16 lr 0.001152 wd 0.0500 time 0.2576 (0.2608) data time 0.0006 (0.0059) model time 0.2570 (0.2547) loss 4.8970 (5.9001) grad_norm 1.7981 (inf) loss_scale 1024.0000 (1764.1188) mem 9655MB [2024-08-04 04:29:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][110/625] eta 0:02:14 lr 0.001152 wd 0.0500 time 0.2573 (0.2603) data time 0.0006 (0.0054) model time 0.2567 (0.2547) loss 5.6270 (5.9306) grad_norm 5.1407 (inf) loss_scale 1024.0000 (1697.4414) mem 9655MB [2024-08-04 04:29:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][120/625] eta 0:02:11 lr 0.001152 wd 0.0500 time 0.2541 (0.2600) data time 
0.0010 (0.0051) model time 0.2532 (0.2549) loss 6.9042 (5.9391) grad_norm 1.6247 (inf) loss_scale 1024.0000 (1641.7851) mem 9655MB
[2024-08-04 04:29:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][130/625] eta 0:02:08 lr 0.001152 wd 0.0500 time 0.2555 (0.2598) data time 0.0010 (0.0048) model time 0.2545 (0.2551) loss 5.5845 (5.9615) grad_norm 1.5984 (inf) loss_scale 1024.0000 (1594.6260) mem 9655MB
[2024-08-04 04:29:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][140/625] eta 0:02:05 lr 0.001151 wd 0.0500 time 0.2563 (0.2596) data time 0.0016 (0.0045) model time 0.2547 (0.2552) loss 7.0549 (5.9549) grad_norm 1.4749 (inf) loss_scale 1024.0000 (1554.1560) mem 9655MB
[2024-08-04 04:29:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][150/625] eta 0:02:03 lr 0.001151 wd 0.0500 time 0.2587 (0.2595) data time 0.0008 (0.0043) model time 0.2579 (0.2553) loss 6.8880 (5.9598) grad_norm 1.2206 (inf) loss_scale 1024.0000 (1519.0464) mem 9655MB
[2024-08-04 04:29:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][160/625] eta 0:02:01 lr 0.001151 wd 0.0500 time 0.2576 (0.2605) data time 0.0010 (0.0041) model time 0.2567 (0.2572) loss 6.3318 (5.9608) grad_norm 2.0333 (inf) loss_scale 1024.0000 (1488.2981) mem 9655MB
[2024-08-04 04:29:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][170/625] eta 0:01:58 lr 0.001151 wd 0.0500 time 0.2545 (0.2603) data time 0.0010 (0.0039) model time 0.2535 (0.2571) loss 6.5905 (5.9638) grad_norm 1.5912 (inf) loss_scale 1024.0000 (1461.1462) mem 9655MB
[2024-08-04 04:29:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][180/625] eta 0:01:55 lr 0.001151 wd 0.0500 time 0.2612 (0.2602) data time 0.0006 (0.0037) model time 0.2606 (0.2571) loss 5.2128 (5.9644) grad_norm 2.4476 (inf) loss_scale 1024.0000 (1436.9945) mem 9655MB
[2024-08-04 04:29:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][190/625] eta 0:01:53 lr 0.001151 wd 0.0500 time 0.2597 (0.2600) data time 0.0007 (0.0036) model time 0.2590 (0.2569) loss 6.3806 (5.9965) grad_norm 2.1763 (inf) loss_scale 1024.0000 (1415.3717) mem 9655MB
[2024-08-04 04:29:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][200/625] eta 0:01:50 lr 0.001150 wd 0.0500 time 0.2558 (0.2597) data time 0.0008 (0.0034) model time 0.2549 (0.2568) loss 5.1572 (5.9801) grad_norm 1.5033 (inf) loss_scale 1024.0000 (1395.9005) mem 9655MB
[2024-08-04 04:29:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][210/625] eta 0:01:47 lr 0.001150 wd 0.0500 time 0.2565 (0.2596) data time 0.0009 (0.0033) model time 0.2556 (0.2567) loss 6.7636 (5.9742) grad_norm 2.5133 (inf) loss_scale 1024.0000 (1378.2749) mem 9655MB
[2024-08-04 04:29:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][220/625] eta 0:01:45 lr 0.001150 wd 0.0500 time 0.2564 (0.2594) data time 0.0008 (0.0032) model time 0.2556 (0.2566) loss 6.0083 (5.9799) grad_norm 2.3612 (inf) loss_scale 1024.0000 (1362.2443) mem 9655MB
[2024-08-04 04:29:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][230/625] eta 0:01:42 lr 0.001150 wd 0.0500 time 0.2598 (0.2593) data time 0.0008 (0.0031) model time 0.2589 (0.2565) loss 6.6064 (5.9796) grad_norm 2.2708 (inf) loss_scale 1024.0000 (1347.6017) mem 9655MB
[2024-08-04 04:29:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][240/625] eta 0:01:39 lr 0.001150 wd 0.0500 time 0.2592 (0.2591) data time 0.0008 (0.0030) model time 0.2584 (0.2565) loss 4.7859 (5.9672) grad_norm 1.9549 (inf) loss_scale 1024.0000 (1334.1743) mem 9655MB
[2024-08-04 04:29:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][250/625] eta 0:01:37 lr 0.001149 wd 0.0500 time 0.2540 (0.2590) data time 0.0008 (0.0029) model time 0.2532 (0.2564) loss 6.3356 (5.9775) grad_norm 1.4246 (inf) loss_scale 1024.0000 (1321.8167) mem 9655MB
[2024-08-04 04:29:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][260/625] eta 0:01:34 lr 0.001149 wd 0.0500 time 0.2580 (0.2589) data time 0.0009 (0.0028) model time 0.2571 (0.2564) loss 5.7922 (5.9660) grad_norm 1.3609 (inf) loss_scale 1024.0000 (1310.4061) mem 9655MB
[2024-08-04 04:30:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][270/625] eta 0:01:31 lr 0.001149 wd 0.0500 time 0.2603 (0.2588) data time 0.0006 (0.0028) model time 0.2596 (0.2563) loss 6.9722 (5.9697) grad_norm 1.5332 (inf) loss_scale 1024.0000 (1299.8376) mem 9655MB
[2024-08-04 04:30:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][280/625] eta 0:01:29 lr 0.001149 wd 0.0500 time 0.2552 (0.2588) data time 0.0009 (0.0027) model time 0.2543 (0.2563) loss 6.3173 (5.9684) grad_norm 1.6029 (inf) loss_scale 1024.0000 (1290.0214) mem 9655MB
[2024-08-04 04:30:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][290/625] eta 0:01:26 lr 0.001149 wd 0.0500 time 0.2591 (0.2587) data time 0.0006 (0.0026) model time 0.2586 (0.2563) loss 7.0413 (5.9719) grad_norm 1.3762 (inf) loss_scale 1024.0000 (1280.8797) mem 9655MB
[2024-08-04 04:30:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][300/625] eta 0:01:24 lr 0.001149 wd 0.0500 time 0.2625 (0.2593) data time 0.0006 (0.0026) model time 0.2620 (0.2571) loss 6.9027 (5.9801) grad_norm 1.6032 (inf) loss_scale 1024.0000 (1272.3455) mem 9655MB
[2024-08-04 04:30:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][310/625] eta 0:01:21 lr 0.001148 wd 0.0500 time 0.2570 (0.2592) data time 0.0009 (0.0025) model time 0.2561 (0.2570) loss 7.0154 (5.9832) grad_norm 1.8825 (inf) loss_scale 1024.0000 (1264.3601) mem 9655MB
[2024-08-04 04:30:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][320/625] eta 0:01:19 lr 0.001148 wd 0.0500 time 0.2568 (0.2591) data time 0.0009 (0.0025) model time 0.2559 (0.2570) loss 6.3463 (5.9874) grad_norm 2.3821 (inf) loss_scale 1024.0000 (1256.8723) mem 9655MB
[2024-08-04 04:30:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][330/625] eta 0:01:16 lr 0.001148 wd 0.0500 time 0.2610 (0.2591) data time 0.0007 (0.0024) model time 0.2603 (0.2569) loss 7.5441 (5.9994) grad_norm 1.4318 (inf) loss_scale 1024.0000 (1249.8369) mem 9655MB
[2024-08-04 04:30:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][340/625] eta 0:01:13 lr 0.001148 wd 0.0500 time 0.2593 (0.2590) data time 0.0005 (0.0024) model time 0.2588 (0.2569) loss 6.3140 (6.0091) grad_norm 1.3645 (inf) loss_scale 1024.0000 (1243.2141) mem 9655MB
[2024-08-04 04:30:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][350/625] eta 0:01:11 lr 0.001148 wd 0.0500 time 0.2514 (0.2589) data time 0.0009 (0.0024) model time 0.2505 (0.2568) loss 5.8964 (6.0064) grad_norm 2.3836 (inf) loss_scale 1024.0000 (1236.9687) mem 9655MB
[2024-08-04 04:30:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][360/625] eta 0:01:08 lr 0.001148 wd 0.0500 time 0.2589 (0.2588) data time 0.0007 (0.0023) model time 0.2582 (0.2567) loss 5.3963 (6.0116) grad_norm 1.4341 (inf) loss_scale 1024.0000 (1231.0693) mem 9655MB
[2024-08-04 04:30:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][370/625] eta 0:01:05 lr 0.001147 wd 0.0500 time 0.2565 (0.2587) data time 0.0006 (0.0023) model time 0.2558 (0.2567) loss 5.0435 (6.0010) grad_norm 1.8465 (inf) loss_scale 1024.0000 (1225.4879) mem 9655MB
[2024-08-04 04:30:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][380/625] eta 0:01:03 lr 0.001147 wd 0.0500 time 0.2596 (0.2587) data time 0.0007 (0.0023) model time 0.2589 (0.2567) loss 6.2444 (6.0074) grad_norm 1.1498 (inf) loss_scale 1024.0000 (1220.1995) mem 9655MB
[2024-08-04 04:30:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][390/625] eta 0:01:00 lr 0.001147 wd 0.0500 time 0.2509 (0.2586) data time 0.0010 (0.0022) model time 0.2499 (0.2566) loss 6.1396 (6.0095) grad_norm 2.3389 (inf) loss_scale 1024.0000 (1215.1816) mem 9655MB
[2024-08-04 04:30:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][400/625] eta 0:00:58 lr 0.001147 wd 0.0500 time 0.2610 (0.2588) data time 0.0010 (0.0022) model time 0.2601 (0.2569) loss 4.3481 (6.0195) grad_norm 4.6532 (inf) loss_scale 1024.0000 (1210.4140) mem 9655MB
[2024-08-04 04:30:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][410/625] eta 0:00:55 lr 0.001147 wd 0.0500 time 0.2542 (0.2588) data time 0.0009 (0.0022) model time 0.2533 (0.2568) loss 5.9956 (6.0238) grad_norm 1.8457 (inf) loss_scale 1024.0000 (1205.8783) mem 9655MB
[2024-08-04 04:30:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][420/625] eta 0:00:53 lr 0.001147 wd 0.0500 time 0.2541 (0.2592) data time 0.0009 (0.0021) model time 0.2532 (0.2573) loss 5.3374 (6.0080) grad_norm 1.1869 (inf) loss_scale 1024.0000 (1201.5582) mem 9655MB
[2024-08-04 04:30:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][430/625] eta 0:00:50 lr 0.001146 wd 0.0500 time 0.2580 (0.2591) data time 0.0006 (0.0021) model time 0.2573 (0.2573) loss 6.2099 (6.0028) grad_norm 1.2781 (inf) loss_scale 1024.0000 (1197.4385) mem 9655MB
[2024-08-04 04:30:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][440/625] eta 0:00:47 lr 0.001146 wd 0.0500 time 0.2531 (0.2591) data time 0.0010 (0.0021) model time 0.2521 (0.2573) loss 5.9462 (6.0078) grad_norm 2.4911 (inf) loss_scale 1024.0000 (1193.5057) mem 9655MB
[2024-08-04 04:30:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][450/625] eta 0:00:45 lr 0.001146 wd 0.0500 time 0.2538 (0.2590) data time 0.0009 (0.0020) model time 0.2528 (0.2572) loss 7.0177 (6.0184) grad_norm 2.0004 (inf) loss_scale 1024.0000 (1189.7472) mem 9655MB
[2024-08-04 04:30:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][460/625] eta 0:00:42 lr 0.001146 wd 0.0500 time 0.2542 (0.2590) data time 0.0009 (0.0020) model time 0.2533 (0.2572) loss 5.9523 (6.0089) grad_norm 1.7535 (inf) loss_scale 1024.0000 (1186.1518) mem 9655MB
[2024-08-04 04:30:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][470/625] eta 0:00:40 lr 0.001146 wd 0.0500 time 0.2563 (0.2589) data time 0.0007 (0.0020) model time 0.2556 (0.2571) loss 5.0213 (6.0133) grad_norm 1.6319 (inf) loss_scale 1024.0000 (1182.7091) mem 9655MB
[2024-08-04 04:30:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][480/625] eta 0:00:37 lr 0.001145 wd 0.0500 time 0.2551 (0.2588) data time 0.0007 (0.0020) model time 0.2543 (0.2571) loss 5.1715 (6.0104) grad_norm 3.1913 (inf) loss_scale 1024.0000 (1179.4096) mem 9655MB
[2024-08-04 04:30:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][490/625] eta 0:00:34 lr 0.001145 wd 0.0500 time 0.2580 (0.2588) data time 0.0010 (0.0020) model time 0.2569 (0.2570) loss 5.4697 (6.0094) grad_norm 1.8466 (inf) loss_scale 1024.0000 (1176.2444) mem 9655MB
[2024-08-04 04:31:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][500/625] eta 0:00:32 lr 0.001145 wd 0.0500 time 0.2568 (0.2587) data time 0.0006 (0.0019) model time 0.2562 (0.2570) loss 5.9074 (6.0119) grad_norm 1.4724 (inf) loss_scale 1024.0000 (1173.2056) mem 9655MB
[2024-08-04 04:31:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][510/625] eta 0:00:29 lr 0.001145 wd 0.0500 time 0.2543 (0.2587) data time 0.0016 (0.0019) model time 0.2527 (0.2569) loss 6.9981 (6.0088) grad_norm 1.6408 (inf) loss_scale 1024.0000 (1170.2857) mem 9655MB
[2024-08-04 04:31:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][520/625] eta 0:00:27 lr 0.001145 wd 0.0500 time 0.2551 (0.2586) data time 0.0007 (0.0019) model time 0.2544 (0.2569) loss 6.7248 (6.0086) grad_norm 2.3651 (inf) loss_scale 1024.0000 (1167.4779) mem 9655MB
[2024-08-04 04:31:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][530/625] eta 0:00:24 lr 0.001145 wd 0.0500 time 0.2557 (0.2586) data time 0.0015 (0.0019) model time 0.2542 (0.2569) loss 6.2080 (6.0018) grad_norm 2.6550 (inf) loss_scale 1024.0000 (1164.7759) mem 9655MB
[2024-08-04 04:31:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][540/625] eta 0:00:21 lr 0.001144 wd 0.0500 time 0.2587 (0.2586) data time 0.0008 (0.0019) model time 0.2579 (0.2569) loss 6.4482 (6.0065) grad_norm 2.5326 (inf) loss_scale 1024.0000 (1162.1738) mem 9655MB
[2024-08-04 04:31:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][550/625] eta 0:00:19 lr 0.001144 wd 0.0500 time 0.2547 (0.2585) data time 0.0008 (0.0018) model time 0.2540 (0.2569) loss 5.0986 (5.9983) grad_norm 1.8332 (inf) loss_scale 1024.0000 (1159.6661) mem 9655MB
[2024-08-04 04:31:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][560/625] eta 0:00:16 lr 0.001144 wd 0.0500 time 0.2572 (0.2585) data time 0.0007 (0.0018) model time 0.2565 (0.2568) loss 7.0912 (6.0039) grad_norm 1.8888 (inf) loss_scale 1024.0000 (1157.2478) mem 9655MB
[2024-08-04 04:31:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][570/625] eta 0:00:14 lr 0.001144 wd 0.0500 time 0.2545 (0.2585) data time 0.0007 (0.0018) model time 0.2539 (0.2568) loss 6.3369 (6.0043) grad_norm 2.5012 (inf) loss_scale 1024.0000 (1154.9142) mem 9655MB
[2024-08-04 04:31:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][580/625] eta 0:00:11 lr 0.001144 wd 0.0500 time 0.2682 (0.2584) data time 0.0008 (0.0018) model time 0.2674 (0.2568) loss 6.5281 (6.0121) grad_norm 4.9091 (inf) loss_scale 1024.0000 (1152.6609) mem 9655MB
[2024-08-04 04:31:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][590/625] eta 0:00:09 lr 0.001144 wd 0.0500 time 0.2610 (0.2584) data time 0.0010 (0.0018) model time 0.2600 (0.2568) loss 5.2649 (6.0171) grad_norm 1.6769 (inf) loss_scale 1024.0000 (1150.4839) mem 9655MB
[2024-08-04 04:31:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][600/625] eta 0:00:06 lr 0.001143 wd 0.0500 time 0.2579 (0.2584) data time 0.0006 (0.0018) model time 0.2573 (0.2568) loss 5.1547 (6.0178) grad_norm 1.2108 (inf) loss_scale 1024.0000 (1148.3794) mem 9655MB
[2024-08-04 04:31:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][610/625] eta 0:00:03 lr 0.001143 wd 0.0500 time 0.2529 (0.2583) data time 0.0006 (0.0018) model time 0.2523 (0.2567) loss 5.6299 (6.0187) grad_norm 1.3082 (inf) loss_scale 1024.0000 (1146.3437) mem 9655MB
[2024-08-04 04:31:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][620/625] eta 0:00:01 lr 0.001143 wd 0.0500 time 0.2536 (0.2583) data time 0.0006 (0.0017) model time 0.2531 (0.2567) loss 6.8723 (6.0228) grad_norm 1.7972 (inf) loss_scale 1024.0000 (1144.3736) mem 9655MB
[2024-08-04 04:31:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 147 training takes 0:02:41
[2024-08-04 04:31:32 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 04:31:33 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 04:31:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.479 (0.479) Loss 0.6333 (0.6333) Acc@1 88.281 (88.281) Acc@5 98.291 (98.291) Mem 9655MB
[2024-08-04 04:31:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 1.0723 (0.7951) Acc@1 76.562 (83.851) Acc@5 94.824 (97.017) Mem 9655MB
[2024-08-04 04:31:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.1768 (0.9443) Acc@1 74.316 (80.194) Acc@5 92.969 (95.392) Mem 9655MB
[2024-08-04 04:31:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.910 Acc@5 95.393
[2024-08-04 04:31:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.9%
[2024-08-04 04:31:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.778 (0.778) Loss 0.5903 (0.5903) Acc@1 88.867 (88.867) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 04:31:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.127) Loss 0.9414 (0.7298) Acc@1 79.590 (85.143) Acc@5 95.361 (97.439) Mem 9655MB
[2024-08-04 04:31:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 1.0771 (0.8622) Acc@1 74.658 (81.564) Acc@5 93.652 (95.896) Mem 9655MB
[2024-08-04 04:31:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.270 Acc@5 95.889
[2024-08-04 04:31:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.3%
[2024-08-04 04:31:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.27%
[2024-08-04 04:31:37 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 04:31:37 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 04:31:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][0/625] eta 0:08:34 lr 0.001143 wd 0.0500 time 0.8239 (0.8239) data time 0.5840 (0.5840) model time 0.0000 (0.0000) loss 6.7448 (6.7448) grad_norm 1.9795 (1.9795) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:31:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][10/625] eta 0:03:09 lr 0.001143 wd 0.0500 time 0.2539 (0.3080) data time 0.0007 (0.0539) model time 0.0000 (0.0000) loss 5.5562 (5.9823) grad_norm 1.4643 (1.7716) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:31:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][20/625] eta 0:02:51 lr 0.001143 wd 0.0500 time 0.2557 (0.2830) data time 0.0014 (0.0287) model time 0.0000 (0.0000) loss 7.7955 (6.1259) grad_norm 1.4690 (1.6371) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:31:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][30/625] eta 0:02:42 lr 0.001142 wd 0.0500 time 0.2491 (0.2737) data time 0.0009 (0.0197) model time 0.0000 (0.0000) loss 5.8149 (6.0419) grad_norm 3.2717 (1.8983) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:31:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][40/625] eta 0:02:37 lr 0.001142 wd 0.0500 time 0.2564 (0.2693) data time 0.0008 (0.0151) model time 0.0000 (0.0000) loss 6.1983 (5.9736) grad_norm 2.6139 (1.8565) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:31:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][50/625] eta 0:02:33 lr 0.001142 wd 0.0500 time 0.2661 (0.2669) data time 0.0008 (0.0123) model time 0.0000 (0.0000) loss 5.1402 (5.9654) grad_norm 1.9833 (1.9074) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:31:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][60/625] eta 0:02:29 lr 0.001142 wd 0.0500 time 0.2573 (0.2650) data time 0.0007 (0.0105) model time 0.2566 (0.2544) loss 6.9351 (5.9890) grad_norm 3.7096 (1.9100) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:31:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][70/625] eta 0:02:26 lr 0.001142 wd 0.0500 time 0.2531 (0.2639) data time 0.0009 (0.0091) model time 0.2523 (0.2555) loss 6.5908 (5.9714) grad_norm 1.5791 (1.9806) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:31:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][80/625] eta 0:02:23 lr 0.001141 wd 0.0500 time 0.2547 (0.2629) data time 0.0012 (0.0081) model time 0.2536 (0.2553) loss 5.4462 (5.9941) grad_norm 2.9895 (1.9925) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:32:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][90/625] eta 0:02:20 lr 0.001141 wd 0.0500 time 0.2578 (0.2624) data time 0.0008 (0.0074) model time 0.2569 (0.2556) loss 6.6992 (6.0097) grad_norm 1.7185 (1.9710) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:32:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][100/625] eta 0:02:17 lr 0.001141 wd 0.0500 time 0.2529 (0.2617) data time 0.0010 (0.0067) model time 0.2519 (0.2554) loss 6.9329 (6.0061) grad_norm 1.7152 (1.9575) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:32:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][110/625] eta 0:02:14 lr 0.001141 wd 0.0500 time 0.2560 (0.2612) data time 0.0007 (0.0062) model time 0.2553 (0.2554) loss 6.5699 (6.0053) grad_norm 2.5077 (1.9971) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:32:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][120/625] eta 0:02:11 lr 0.001141 wd 0.0500 time 0.2574 (0.2609) data time 0.0008 (0.0058) model time 0.2566 (0.2556) loss 6.2145 (5.9989) grad_norm 1.6220 (1.9736) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:32:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][130/625] eta 0:02:08 lr 0.001141 wd 0.0500 time 0.2454 (0.2606) data time 0.0011 (0.0054) model time 0.2444 (0.2556) loss 6.0726 (5.9985) grad_norm 2.3118 (1.9929) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:32:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][140/625] eta 0:02:06 lr 0.001140 wd 0.0500 time 0.2555 (0.2602) data time 0.0011 (0.0051) model time 0.2544 (0.2555) loss 5.2776 (5.9571) grad_norm 1.5095 (1.9934) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:32:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][150/625] eta 0:02:03 lr 0.001140 wd 0.0500 time 0.2522 (0.2601) data time 0.0007 (0.0048) model time 0.2515 (0.2556) loss 5.1811 (5.9597) grad_norm 2.6701 (1.9830) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:32:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][160/625] eta 0:02:00 lr 0.001140 wd 0.0500 time 0.2564 (0.2598) data time 0.0009 (0.0046) model time 0.2555 (0.2555) loss 5.6353 (5.9580) grad_norm 1.7144 (1.9646) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:32:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][170/625] eta 0:01:58 lr 0.001140 wd 0.0500 time 0.2580 (0.2595) data time 0.0008 (0.0044) model time 0.2572 (0.2554) loss 6.4158 (5.9677) grad_norm 2.6010 (1.9617) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:32:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][180/625] eta 0:01:55 lr 0.001140 wd 0.0500 time 0.2547 (0.2593) data time 0.0009 (0.0042) model time 0.2538 (0.2554) loss 6.1008 (5.9552) grad_norm 2.3024 (1.9628) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:32:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][190/625] eta 0:01:52 lr 0.001140 wd 0.0500 time 0.2527 (0.2591) data time 0.0010 (0.0040) model time 0.2517 (0.2554) loss 5.8268 (5.9500) grad_norm 2.1231 (1.9789) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:32:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][200/625] eta 0:01:50 lr 0.001139 wd 0.0500 time 0.2563 (0.2591) data time 0.0006 (0.0038) model time 0.2557 (0.2554) loss 6.0129 (5.9534) grad_norm 1.3329 (1.9746) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:32:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][210/625] eta 0:01:47 lr 0.001139 wd 0.0500 time 0.2553 (0.2589) data time 0.0008 (0.0037) model time 0.2545 (0.2554) loss 5.3247 (5.9543) grad_norm 2.3030 (1.9780) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:32:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][220/625] eta 0:01:44 lr 0.001139 wd 0.0500 time 0.2556 (0.2588) data time 0.0011 (0.0036) model time 0.2545 (0.2554) loss 7.1169 (5.9630) grad_norm 2.0724 (1.9763) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:32:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][230/625] eta 0:01:42 lr 0.001139 wd 0.0500 time 0.2564 (0.2587) data time 0.0008 (0.0035) model time 0.2556 (0.2554) loss 7.3065 (5.9711) grad_norm 1.6679 (1.9891) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:32:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][240/625] eta 0:01:39 lr 0.001139 wd 0.0500 time 0.2557 (0.2586) data time 0.0010 (0.0034) model time 0.2547 (0.2554) loss 5.4926 (5.9527) grad_norm 2.1656 (1.9824) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:32:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][250/625] eta 0:01:36 lr 0.001138 wd 0.0500 time 0.2555 (0.2585) data time 0.0006 (0.0033) model time 0.2548 (0.2554) loss 6.6956 (5.9499) grad_norm 1.5555 (1.9777) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:32:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][260/625] eta 0:01:34 lr 0.001138 wd 0.0500 time 0.2563 (0.2584) data time 0.0008 (0.0032) model time 0.2555 (0.2554) loss 7.3082 (5.9585) grad_norm 1.2655 (2.0072) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:32:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][270/625] eta 0:01:31 lr 0.001138 wd 0.0500 time 0.2536 (0.2584) data time 0.0010 (0.0031) model time 0.2527 (0.2555) loss 5.1239 (5.9624) grad_norm 3.1182 (2.0259) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:32:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][280/625] eta 0:01:29 lr 0.001138 wd 0.0500 time 0.2549 (0.2583) data time 0.0009 (0.0030) model time 0.2540 (0.2554) loss 5.6723 (5.9677) grad_norm 2.2544 (2.0453) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:32:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][290/625] eta 0:01:26 lr 0.001138 wd 0.0500 time 0.2528 (0.2582) data time 0.0006 (0.0029) model time 0.2521 (0.2555) loss 5.1836 (5.9746) grad_norm 1.5555 (2.0277) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:32:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][300/625] eta 0:01:23 lr 0.001138 wd 0.0500 time 0.2555 (0.2582) data time 0.0009 (0.0029) model time 0.2546 (0.2554) loss 5.0224 (5.9815) grad_norm 1.4850 (2.0298) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:32:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][310/625] eta 0:01:21 lr 0.001137 wd 0.0500 time 0.2547 (0.2582) data time 0.0009 (0.0028) model time 0.2538 (0.2555) loss 4.8814 (5.9888) grad_norm 1.4596 (2.0312) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:33:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][320/625] eta 0:01:18 lr 0.001137 wd 0.0500 time 0.2545 (0.2581) data time 0.0007 (0.0028) model time 0.2538 (0.2555) loss 4.9312 (5.9950) grad_norm 3.0563 (2.0442) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:33:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][330/625] eta 0:01:16 lr 0.001137 wd 0.0500 time 0.2543 (0.2587) data time 0.0007 (0.0027) model time 0.2535 (0.2563) loss 4.6612 (6.0014) grad_norm 1.5740 (2.0523) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:33:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][340/625] eta 0:01:13 lr 0.001137 wd 0.0500 time 0.2519 (0.2586) data time 0.0009 (0.0026) model time 0.2510 (0.2563) loss 6.7754 (6.0009) grad_norm 3.8396 (2.0698) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:33:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][350/625] eta 0:01:11 lr 0.001137 wd 0.0500 time 0.2549 (0.2586) data time 0.0009 (0.0026) model time 0.2540 (0.2562) loss 6.6895 (6.0150) grad_norm 1.9235 (2.1021) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:33:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][360/625] eta 0:01:08 lr 0.001137 wd 0.0500 time 0.2573 (0.2589) data time 0.0010 (0.0026) model time 0.2563 (0.2567) loss 6.0546 (6.0172) grad_norm 1.7264 (2.1085) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:33:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][370/625] eta 0:01:06 lr 0.001136 wd 0.0500 time 0.2565 (0.2589) data time 0.0011 (0.0025) model time 0.2554 (0.2566) loss 6.3027 (6.0216) grad_norm 1.7096 (2.1035) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:33:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][380/625] eta 0:01:03 lr 0.001136 wd 0.0500 time 0.2539 (0.2588) data time 0.0007 (0.0025) model time 0.2532 (0.2566) loss 7.4094 (6.0209) grad_norm 2.8339 (2.0964) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:33:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][390/625] eta 0:01:00 lr 0.001136 wd 0.0500 time 0.2591 (0.2587) data time 0.0008 (0.0024) model time 0.2583 (0.2565) loss 6.3635 (6.0123) grad_norm 2.1800 (2.0988) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:33:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][400/625] eta 0:00:58 lr 0.001136 wd 0.0500 time 0.2555 (0.2586) data time 0.0009 (0.0024) model time 0.2546 (0.2565) loss 6.2647 (6.0191) grad_norm 3.2220 (2.1075) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:33:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][410/625] eta 0:00:55 lr 0.001136 wd 0.0500 time 0.2585 (0.2586) data time 0.0011 (0.0024) model time 0.2574 (0.2564) loss 5.2692 (6.0198) grad_norm 1.4198 (2.1040) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:33:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][420/625] eta 0:00:53 lr 0.001135 wd 0.0500 time 0.2592 (0.2588) data time 0.0009 (0.0023) model time 0.2583 (0.2567) loss 6.0340 (6.0203) grad_norm 1.9328 (2.1026) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:33:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][430/625] eta 0:00:50 lr 0.001135 wd 0.0500 time 0.2617 (0.2588) data time 0.0008 (0.0023) model time 0.2610 (0.2567) loss 4.8052 (6.0207) grad_norm 1.8139 (2.0914) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:33:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][440/625] eta 0:00:47 lr 0.001135 wd 0.0500 time 0.2557 (0.2587) data time 0.0009 (0.0023) model time 0.2548 (0.2567) loss 5.0675 (6.0179) grad_norm 1.4135 (2.0786) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:33:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][450/625] eta 0:00:45 lr 0.001135 wd 0.0500 time 0.2577 (0.2586) data time 0.0008 (0.0022) model time 0.2569 (0.2566) loss 7.0508 (6.0196) grad_norm 1.7205 (2.0766) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:33:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][460/625] eta 0:00:42 lr 0.001135 wd 0.0500 time 0.2530 (0.2586) data time 0.0008 (0.0022) model time 0.2522 (0.2566) loss 5.2170 (6.0227) grad_norm 2.1066 (2.0695) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:33:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][470/625] eta 0:00:40 lr 0.001135 wd 0.0500 time 0.2525 (0.2585) data time 0.0008 (0.0022) model time 0.2517 (0.2566) loss 4.6416 (6.0199) grad_norm 4.5368 (2.0735) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:33:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][480/625] eta 0:00:37 lr 0.001134 wd 0.0500 time 0.2562 (0.2585) data time 0.0011 (0.0022) model time 0.2551 (0.2566) loss 5.5339 (6.0199) grad_norm 2.2446 (2.0732) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:33:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][490/625] eta 0:00:34 lr 0.001134 wd 0.0500 time 0.2565 (0.2585) data time 0.0011 (0.0021) model time 0.2554 (0.2566) loss 5.9785 (6.0229) grad_norm 1.4836 (2.0684) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:33:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][500/625] eta 0:00:32 lr 0.001134 wd 0.0500 time 0.2561 (0.2585) data time 0.0010 (0.0021) model time 0.2551 (0.2566) loss 6.4192 (6.0217) grad_norm 1.9161 (2.0602) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:33:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][510/625] eta 0:00:29 lr 0.001134 wd 0.0500 time 0.2551 (0.2584) data time 0.0010 (0.0021) model time 0.2541 (0.2565) loss 5.4182 (6.0156) grad_norm 1.6816 (2.0544) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:33:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][520/625] eta 0:00:27 lr 0.001134 wd 0.0500 time 0.2572 (0.2584) data time 0.0009 (0.0021) model time 0.2563 (0.2565) loss 6.5708 (6.0148) grad_norm 1.3220 (2.0491) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:33:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][530/625] eta 0:00:24 lr 0.001134 wd 0.0500 time 0.2520 (0.2583) data time 0.0009 (0.0020) model time 0.2511 (0.2565) loss 7.1929 (6.0167) grad_norm 1.3429 (2.0447) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:33:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][540/625] eta 0:00:21 lr 0.001133 wd 0.0500 time 0.2541 (0.2583) data time 0.0009 (0.0020) model time 0.2532 (0.2564) loss 5.8477 (6.0267) grad_norm 1.6437 (2.0424) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:33:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][550/625] eta 0:00:19 lr 0.001133 wd 0.0500 time 0.2509 (0.2582) data time 0.0009 (0.0020) model time 0.2500 (0.2564) loss 5.4619 (6.0296) grad_norm 2.6771 (2.0406) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:34:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][560/625] eta 0:00:16 lr 0.001133 wd 0.0500 time 0.2583 (0.2582) data time 0.0008 (0.0020) model time 0.2574 (0.2564) loss 5.8355 (6.0208) grad_norm 2.1359 (2.0369) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:34:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][570/625] eta 0:00:14 lr 0.001133 wd 0.0500 time 0.2530 (0.2581) data time 0.0010 (0.0020) model time 0.2519 (0.2563) loss 6.2709 (6.0250) grad_norm 1.8783 (2.0398) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:34:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][580/625] eta 0:00:11 lr 0.001133 wd 0.0500 time 0.2587 (0.2581) data time 0.0006 (0.0019) model time 0.2581 (0.2563) loss 6.2739 (6.0213) grad_norm 3.7742 (2.0519) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:34:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][590/625] eta 0:00:09 lr 0.001132 wd 0.0500 time 0.2503 (0.2581) data time 0.0008 (0.0019) model time 0.2495 (0.2563) loss 7.3479 (6.0279) grad_norm 2.0693 (2.0622) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:34:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][600/625] eta 0:00:06 lr 0.001132 wd 0.0500 time 0.2566 (0.2580) data time 0.0009 (0.0019) model time 0.2557 (0.2563) loss 5.0023 (6.0198) grad_norm 1.7212 (2.0611) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:34:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][610/625] eta 0:00:03 lr 0.001132 wd 0.0500 time 0.2515 (0.2580) data time 0.0004 (0.0019) model time 0.2511 (0.2562) loss 6.4031 (6.0183) grad_norm 2.7033 (2.0640) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:34:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][620/625] eta 0:00:01 lr 0.001132 wd 0.0500 time 0.2539 (0.2579) data time 0.0005 (0.0019) model time 0.2535 (0.2562) loss 5.5639 (6.0172) grad_norm 1.3211 (2.0588) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:34:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 148 training takes 0:02:41
[2024-08-04 04:34:18 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 04:34:19 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 04:34:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.550 (0.550) Loss 0.6357 (0.6357) Acc@1 88.672 (88.672) Acc@5 98.340 (98.340) Mem 9655MB [2024-08-04 04:34:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.101) Loss 1.0342 (0.7874) Acc@1 77.197 (83.922) Acc@5 94.482 (97.066) Mem 9655MB [2024-08-04 04:34:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.1260 (0.9230) Acc@1 74.463 (80.399) Acc@5 94.189 (95.450) Mem 9655MB [2024-08-04 04:34:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.088 Acc@5 95.463 [2024-08-04 04:34:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.1% [2024-08-04 04:34:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.09% [2024-08-04 04:34:21 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 04:34:21 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 04:34:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.507 (0.507) Loss 0.5913 (0.5913) Acc@1 88.916 (88.916) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 04:34:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 0.9404 (0.7300) Acc@1 79.492 (85.178) Acc@5 95.459 (97.461) Mem 9655MB [2024-08-04 04:34:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0781 (0.8621) Acc@1 74.756 (81.629) Acc@5 93.750 (95.919) Mem 9655MB [2024-08-04 04:34:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.328 Acc@5 95.911 [2024-08-04 04:34:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.3% [2024-08-04 04:34:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.33% [2024-08-04 04:34:23 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 04:34:23 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 04:34:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][0/625] eta 0:08:56 lr 0.001132 wd 0.0500 time 0.8591 (0.8591) data time 0.6214 (0.6214) model time 0.0000 (0.0000) loss 6.8144 (6.8144) grad_norm 1.4994 (1.4994) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:34:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][10/625] eta 0:03:10 lr 0.001132 wd 0.0500 time 0.2571 (0.3103) data time 0.0011 (0.0574) model time 0.0000 (0.0000) loss 5.8322 (6.0215) grad_norm 2.7111 (1.6191) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:34:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][20/625] eta 0:02:52 lr 0.001132 wd 0.0500 time 0.2573 (0.2850) data time 0.0008 (0.0305) model time 0.0000 (0.0000) loss 6.0325 (6.1249) grad_norm 1.3892 (1.6958) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:34:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][30/625] eta 0:02:44 lr 0.001131 wd 0.0500 time 0.2541 (0.2757) data time 0.0008 (0.0210) model time 0.0000 (0.0000) loss 4.5902 (6.0670) grad_norm 2.5236 (1.8765) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:34:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][40/625] eta 0:02:38 lr 0.001131 wd 0.0500 time 0.2552 (0.2708) data time 0.0009 (0.0161) model time 0.0000 (0.0000) loss 6.9505 (6.0319) grad_norm 2.3152 (1.9767) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:34:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][50/625] eta 0:02:34 lr 0.001131 wd 0.0500 time 0.2577 (0.2681) data time 0.0008 (0.0131) model time 0.0000 (0.0000) loss 6.0493 (5.9743) grad_norm 2.1580 (2.0526) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:34:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][60/625] eta 0:02:30 lr 0.001131 wd 0.0500 time 0.2566 (0.2662) 
data time 0.0011 (0.0111) model time 0.2555 (0.2557) loss 4.6958 (5.9748) grad_norm 2.3368 (2.0041) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:34:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][70/625] eta 0:02:27 lr 0.001131 wd 0.0500 time 0.2537 (0.2649) data time 0.0008 (0.0097) model time 0.2529 (0.2558) loss 5.7465 (5.9791) grad_norm 1.1742 (1.9786) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:34:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][80/625] eta 0:02:23 lr 0.001130 wd 0.0500 time 0.2544 (0.2638) data time 0.0009 (0.0086) model time 0.2535 (0.2556) loss 5.8219 (5.9847) grad_norm 1.3693 (1.9309) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:34:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][90/625] eta 0:02:20 lr 0.001130 wd 0.0500 time 0.2563 (0.2631) data time 0.0008 (0.0078) model time 0.2556 (0.2558) loss 6.6037 (5.9510) grad_norm 2.7920 (1.9128) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:34:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][100/625] eta 0:02:17 lr 0.001130 wd 0.0500 time 0.2566 (0.2625) data time 0.0008 (0.0071) model time 0.2558 (0.2558) loss 6.0354 (5.9320) grad_norm 1.5203 (1.9002) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:34:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][110/625] eta 0:02:15 lr 0.001130 wd 0.0500 time 0.2562 (0.2640) data time 0.0009 (0.0065) model time 0.2553 (0.2597) loss 6.1813 (5.9571) grad_norm 1.7590 (1.8872) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:34:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][120/625] eta 0:02:13 lr 0.001130 wd 0.0500 time 0.2555 (0.2634) data time 0.0013 (0.0061) model time 0.2542 (0.2590) loss 5.5935 (5.9384) grad_norm 2.3007 (1.8880) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 
04:34:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][130/625] eta 0:02:10 lr 0.001130 wd 0.0500 time 0.2542 (0.2628) data time 0.0008 (0.0057) model time 0.2534 (0.2585) loss 6.4563 (5.9483) grad_norm 1.6295 (1.8934) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:35:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][140/625] eta 0:02:07 lr 0.001129 wd 0.0500 time 0.2600 (0.2624) data time 0.0012 (0.0054) model time 0.2588 (0.2581) loss 5.2472 (5.9586) grad_norm 1.4487 (1.8705) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:35:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][150/625] eta 0:02:04 lr 0.001129 wd 0.0500 time 0.2660 (0.2621) data time 0.0006 (0.0051) model time 0.2654 (0.2580) loss 6.0646 (5.9626) grad_norm 1.8840 (1.8466) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:35:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][160/625] eta 0:02:01 lr 0.001129 wd 0.0500 time 0.2532 (0.2617) data time 0.0010 (0.0048) model time 0.2522 (0.2578) loss 5.6339 (5.9660) grad_norm 2.3343 (1.8324) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:35:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][170/625] eta 0:01:58 lr 0.001129 wd 0.0500 time 0.2523 (0.2613) data time 0.0006 (0.0046) model time 0.2517 (0.2575) loss 6.1283 (5.9762) grad_norm 2.4909 (1.8321) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:35:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][180/625] eta 0:01:56 lr 0.001129 wd 0.0500 time 0.2575 (0.2610) data time 0.0009 (0.0044) model time 0.2566 (0.2573) loss 6.8036 (5.9695) grad_norm 1.6325 (1.8271) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:35:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][190/625] eta 0:01:53 lr 0.001129 wd 0.0500 time 0.2512 (0.2608) data 
time 0.0009 (0.0042) model time 0.2502 (0.2572) loss 5.2830 (5.9666) grad_norm 2.4944 (1.8634) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:35:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][200/625] eta 0:01:50 lr 0.001128 wd 0.0500 time 0.2570 (0.2606) data time 0.0007 (0.0040) model time 0.2563 (0.2571) loss 6.4641 (5.9541) grad_norm 1.0770 (1.8767) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:35:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][210/625] eta 0:01:48 lr 0.001128 wd 0.0500 time 0.2535 (0.2604) data time 0.0007 (0.0039) model time 0.2528 (0.2570) loss 6.5524 (5.9612) grad_norm 1.6434 (1.8703) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:35:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][220/625] eta 0:01:45 lr 0.001128 wd 0.0500 time 0.2555 (0.2602) data time 0.0008 (0.0037) model time 0.2547 (0.2569) loss 5.4705 (5.9520) grad_norm 1.3473 (1.8682) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:35:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][230/625] eta 0:01:42 lr 0.001128 wd 0.0500 time 0.2577 (0.2600) data time 0.0008 (0.0036) model time 0.2569 (0.2568) loss 6.5199 (5.9431) grad_norm 1.6291 (1.8577) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:35:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][240/625] eta 0:01:40 lr 0.001128 wd 0.0500 time 0.2547 (0.2598) data time 0.0009 (0.0035) model time 0.2538 (0.2567) loss 6.2316 (5.9469) grad_norm 2.2418 (1.8764) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:35:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][250/625] eta 0:01:37 lr 0.001127 wd 0.0500 time 0.2621 (0.2597) data time 0.0006 (0.0034) model time 0.2616 (0.2567) loss 4.6822 (5.9415) grad_norm 1.3320 (1.8844) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 
04:35:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][260/625] eta 0:01:34 lr 0.001127 wd 0.0500 time 0.2597 (0.2596) data time 0.0009 (0.0033) model time 0.2588 (0.2566) loss 5.2614 (5.9251) grad_norm 2.3164 (1.8944) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:35:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][270/625] eta 0:01:32 lr 0.001127 wd 0.0500 time 0.2556 (0.2595) data time 0.0010 (0.0032) model time 0.2545 (0.2565) loss 5.3039 (5.9291) grad_norm 2.1829 (1.8944) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:35:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][280/625] eta 0:01:29 lr 0.001127 wd 0.0500 time 0.2607 (0.2594) data time 0.0007 (0.0032) model time 0.2600 (0.2565) loss 5.7635 (5.9180) grad_norm 2.1231 (1.9099) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:35:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][290/625] eta 0:01:26 lr 0.001127 wd 0.0500 time 0.2573 (0.2592) data time 0.0006 (0.0031) model time 0.2567 (0.2564) loss 6.9649 (5.9170) grad_norm 3.7609 (1.9219) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:35:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][300/625] eta 0:01:24 lr 0.001127 wd 0.0500 time 0.2553 (0.2591) data time 0.0021 (0.0030) model time 0.2532 (0.2563) loss 6.2199 (5.9290) grad_norm 1.5350 (1.9262) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:35:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][310/625] eta 0:01:21 lr 0.001126 wd 0.0500 time 0.2598 (0.2590) data time 0.0010 (0.0029) model time 0.2588 (0.2563) loss 4.9793 (5.9277) grad_norm 1.8235 (1.9193) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:35:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][320/625] eta 0:01:18 lr 0.001126 wd 0.0500 time 0.2578 (0.2589) data 
time 0.0008 (0.0029) model time 0.2570 (0.2563) loss 7.2360 (5.9277) grad_norm 3.0416 (1.9181) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:35:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][330/625] eta 0:01:16 lr 0.001126 wd 0.0500 time 0.2570 (0.2589) data time 0.0008 (0.0028) model time 0.2562 (0.2562) loss 5.9845 (5.9307) grad_norm 1.4522 (1.9256) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:35:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][340/625] eta 0:01:13 lr 0.001126 wd 0.0500 time 0.2560 (0.2587) data time 0.0009 (0.0028) model time 0.2551 (0.2562) loss 7.0749 (5.9375) grad_norm 1.6757 (1.9192) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:35:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][350/625] eta 0:01:11 lr 0.001126 wd 0.0500 time 0.2550 (0.2587) data time 0.0010 (0.0027) model time 0.2540 (0.2561) loss 6.1775 (5.9470) grad_norm 1.8236 (1.9239) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:35:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][360/625] eta 0:01:08 lr 0.001126 wd 0.0500 time 0.2556 (0.2586) data time 0.0008 (0.0027) model time 0.2548 (0.2561) loss 6.3539 (5.9450) grad_norm 1.9136 (1.9194) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:35:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][370/625] eta 0:01:05 lr 0.001125 wd 0.0500 time 0.2601 (0.2586) data time 0.0008 (0.0026) model time 0.2593 (0.2561) loss 6.2098 (5.9454) grad_norm 2.7023 (1.9298) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:36:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][380/625] eta 0:01:03 lr 0.001125 wd 0.0500 time 0.3927 (0.2589) data time 0.0008 (0.0026) model time 0.3919 (0.2565) loss 5.9580 (5.9459) grad_norm 2.0755 (1.9282) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 
04:36:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][390/625] eta 0:01:00 lr 0.001125 wd 0.0500 time 0.2581 (0.2588) data time 0.0007 (0.0025) model time 0.2573 (0.2564) loss 6.6112 (5.9488) grad_norm 1.7835 (1.9289) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:36:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][400/625] eta 0:00:58 lr 0.001125 wd 0.0500 time 0.2373 (0.2590) data time 0.0007 (0.0025) model time 0.2366 (0.2567) loss 6.4232 (5.9543) grad_norm 1.4452 (1.9333) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:36:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][410/625] eta 0:00:55 lr 0.001125 wd 0.0500 time 0.2591 (0.2590) data time 0.0006 (0.0025) model time 0.2585 (0.2568) loss 6.9890 (5.9600) grad_norm 1.8235 (1.9504) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:36:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][420/625] eta 0:00:53 lr 0.001124 wd 0.0500 time 0.2555 (0.2589) data time 0.0007 (0.0024) model time 0.2548 (0.2567) loss 6.2775 (5.9571) grad_norm 3.5310 (1.9628) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:36:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][430/625] eta 0:00:50 lr 0.001124 wd 0.0500 time 0.2521 (0.2589) data time 0.0009 (0.0024) model time 0.2512 (0.2567) loss 6.4219 (5.9524) grad_norm 2.1281 (1.9743) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:36:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][440/625] eta 0:00:47 lr 0.001124 wd 0.0500 time 0.2536 (0.2588) data time 0.0007 (0.0023) model time 0.2529 (0.2567) loss 5.7505 (5.9495) grad_norm 3.1992 (1.9826) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:36:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][450/625] eta 0:00:45 lr 0.001124 wd 0.0500 time 0.2578 (0.2588) data 
time 0.0010 (0.0023) model time 0.2568 (0.2567) loss 6.4683 (5.9546) grad_norm 3.1802 (1.9854) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:36:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][460/625] eta 0:00:42 lr 0.001124 wd 0.0500 time 0.2537 (0.2588) data time 0.0010 (0.0023) model time 0.2527 (0.2567) loss 6.2507 (5.9564) grad_norm 3.6040 (2.0037) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:36:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][470/625] eta 0:00:40 lr 0.001124 wd 0.0500 time 0.2595 (0.2587) data time 0.0008 (0.0023) model time 0.2587 (0.2566) loss 6.3686 (5.9572) grad_norm 1.3438 (2.0119) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:36:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][480/625] eta 0:00:37 lr 0.001123 wd 0.0500 time 0.2554 (0.2587) data time 0.0009 (0.0022) model time 0.2546 (0.2566) loss 6.1124 (5.9611) grad_norm 2.3182 (2.0112) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:36:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][490/625] eta 0:00:34 lr 0.001123 wd 0.0500 time 0.2576 (0.2586) data time 0.0008 (0.0022) model time 0.2568 (0.2566) loss 5.9292 (5.9532) grad_norm 2.4828 (2.0084) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:36:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][500/625] eta 0:00:32 lr 0.001123 wd 0.0500 time 0.2644 (0.2586) data time 0.0010 (0.0022) model time 0.2634 (0.2566) loss 6.7639 (5.9570) grad_norm 1.4018 (2.0083) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:36:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][510/625] eta 0:00:29 lr 0.001123 wd 0.0500 time 0.2590 (0.2586) data time 0.0008 (0.0022) model time 0.2582 (0.2566) loss 5.8893 (5.9609) grad_norm 1.3913 (2.0120) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 
04:36:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][520/625] eta 0:00:27 lr 0.001123 wd 0.0500 time 0.2552 (0.2585) data time 0.0008 (0.0021) model time 0.2544 (0.2565) loss 7.2754 (5.9633) grad_norm 1.9826 (2.0065) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:36:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][530/625] eta 0:00:24 lr 0.001123 wd 0.0500 time 0.2630 (0.2585) data time 0.0008 (0.0021) model time 0.2623 (0.2565) loss 4.6630 (5.9661) grad_norm 1.3785 (2.0097) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:36:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][540/625] eta 0:00:21 lr 0.001122 wd 0.0500 time 0.2562 (0.2584) data time 0.0008 (0.0021) model time 0.2554 (0.2565) loss 6.7096 (5.9758) grad_norm 2.4335 (2.0105) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:36:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][550/625] eta 0:00:19 lr 0.001122 wd 0.0500 time 0.2577 (0.2588) data time 0.0016 (0.0021) model time 0.2562 (0.2569) loss 6.0839 (5.9800) grad_norm 3.2433 (2.0179) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:36:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][560/625] eta 0:00:16 lr 0.001122 wd 0.0500 time 0.2566 (0.2587) data time 0.0008 (0.0021) model time 0.2557 (0.2568) loss 5.7393 (5.9730) grad_norm 1.4033 (2.0147) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:36:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][570/625] eta 0:00:14 lr 0.001122 wd 0.0500 time 0.2578 (0.2586) data time 0.0007 (0.0020) model time 0.2571 (0.2568) loss 4.5322 (5.9805) grad_norm 1.9120 (2.0159) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:36:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][580/625] eta 0:00:11 lr 0.001122 wd 0.0500 time 0.2565 (0.2586) data 
time 0.0009 (0.0020) model time 0.2556 (0.2567) loss 5.4709 (5.9758) grad_norm 1.5047 (2.0114) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:36:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][590/625] eta 0:00:09 lr 0.001121 wd 0.0500 time 0.2569 (0.2586) data time 0.0008 (0.0020) model time 0.2561 (0.2567) loss 5.2837 (5.9767) grad_norm 2.1226 (2.0060) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:36:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][600/625] eta 0:00:06 lr 0.001121 wd 0.0500 time 0.2552 (0.2585) data time 0.0010 (0.0020) model time 0.2542 (0.2567) loss 6.2329 (5.9759) grad_norm 3.0685 (2.0175) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:37:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][610/625] eta 0:00:03 lr 0.001121 wd 0.0500 time 0.2519 (0.2585) data time 0.0004 (0.0020) model time 0.2515 (0.2566) loss 6.1527 (5.9797) grad_norm 1.3516 (2.0180) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:37:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][620/625] eta 0:00:01 lr 0.001121 wd 0.0500 time 0.2515 (0.2584) data time 0.0005 (0.0019) model time 0.2510 (0.2566) loss 6.9162 (5.9798) grad_norm 1.6576 (2.0146) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:37:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 149 training takes 0:02:41 [2024-08-04 04:37:05 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 04:37:05 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 04:37:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.525 (0.525) Loss 0.6235 (0.6235) Acc@1 87.842 (87.842) Acc@5 98.193 (98.193) Mem 9655MB [2024-08-04 04:37:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.099) Loss 0.9932 (0.7576) Acc@1 77.734 (83.958) Acc@5 94.482 (96.915) Mem 9655MB [2024-08-04 04:37:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.1045 (0.9026) Acc@1 73.975 (80.313) Acc@5 94.287 (95.354) Mem 9655MB [2024-08-04 04:37:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.034 Acc@5 95.339 [2024-08-04 04:37:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.0% [2024-08-04 04:37:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.741 (0.741) Loss 0.5908 (0.5908) Acc@1 88.867 (88.867) Acc@5 98.535 (98.535) Mem 9655MB [2024-08-04 04:37:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.124) Loss 0.9409 (0.7302) Acc@1 79.443 (85.192) Acc@5 95.459 (97.443) Mem 9655MB [2024-08-04 04:37:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.091) Loss 1.0781 (0.8623) Acc@1 74.756 (81.638) Acc@5 93.799 (95.940) Mem 9655MB [2024-08-04 04:37:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.330 Acc@5 95.929 [2024-08-04 04:37:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.3% [2024-08-04 04:37:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.33% [2024-08-04 04:37:09 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 04:37:10 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 04:37:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][0/625] eta 0:08:37 lr 0.001121 wd 0.0500 time 0.8276 (0.8276) data time 0.5868 (0.5868) model time 0.0000 (0.0000) loss 5.8035 (5.8035) grad_norm 1.5835 (1.5835) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:37:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][10/625] eta 0:03:08 lr 0.001121 wd 0.0500 time 0.2551 (0.3066) data time 0.0009 (0.0541) model time 0.0000 (0.0000) loss 6.3799 (5.9517) grad_norm 3.1506 (2.4579) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:37:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][20/625] eta 0:02:51 lr 0.001120 wd 0.0500 time 0.2548 (0.2827) data time 0.0008 (0.0288) model time 0.0000 (0.0000) loss 5.7966 (5.9694) grad_norm 2.2918 (2.5332) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:37:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][30/625] eta 0:02:42 lr 0.001120 wd 0.0500 time 0.2570 (0.2739) data time 0.0009 (0.0198) model time 0.0000 (0.0000) loss 6.3194 (5.9598) grad_norm 2.6060 (2.4214) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:37:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][40/625] eta 0:02:37 lr 0.001120 wd 0.0500 time 0.2572 (0.2696) data time 0.0009 (0.0152) model time 0.0000 (0.0000) loss 6.7743 (6.0456) grad_norm 2.3522 (2.4311) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:37:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][50/625] eta 0:02:33 lr 0.001120 wd 0.0500 time 0.2578 (0.2671) data time 0.0006 (0.0124) model time 0.0000 (0.0000) loss 6.2944 (6.0817) grad_norm 2.5096 (2.3887) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 04:37:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][60/625] eta 0:02:30 lr 0.001120 wd 0.0500 time 0.2600 (0.2655) data time 0.0008 (0.0105) model time 0.2592 (0.2564) loss 6.0713 (6.1205) grad_norm 2.2480 (2.3993) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:37:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][70/625] eta 0:02:26 lr 0.001120 wd 0.0500 time 0.2590 (0.2641) data time 0.0005 (0.0092) model time 0.2584 (0.2555) loss 5.7973 (6.1258) grad_norm 2.2373 (2.3822) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:37:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][80/625] eta 0:02:23 lr 0.001119 wd 0.0500 time 0.2583 (0.2631) data time 0.0005 (0.0082) model time 0.2578 (0.2554) loss 6.3215 (6.1417) grad_norm 1.6774 (2.3227) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:37:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][90/625] eta 0:02:20 lr 0.001119 wd 0.0500 time 0.2544 (0.2623) data time 0.0010 (0.0074) model time 0.2534 (0.2553) loss 6.2254 (6.1154) grad_norm 3.8523 (2.3834) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:37:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][100/625] eta 0:02:17 lr 0.001119 wd 0.0500 time 0.2547 (0.2619) data time 0.0007 (0.0067) model time 0.2540 (0.2556) loss 6.2949 (6.1074) grad_norm 1.9104 (2.3661) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:37:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][110/625] eta 0:02:14 lr 0.001119 wd 0.0500 time 0.2545 (0.2614) data time 0.0010 (0.0062) model time 0.2535 (0.2556) loss 5.2508 (6.0910) grad_norm 1.3976 (2.2972) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:37:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][120/625] eta 0:02:11 lr 0.001119 wd 0.0500 time 0.2522 
(0.2610) data time 0.0007 (0.0058) model time 0.2515 (0.2557) loss 6.0699 (6.0859) grad_norm 1.2240 (2.2524) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:37:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][130/625] eta 0:02:09 lr 0.001119 wd 0.0500 time 0.2514 (0.2606) data time 0.0008 (0.0054) model time 0.2505 (0.2556) loss 6.7849 (6.1010) grad_norm 1.3775 (2.2102) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:37:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][140/625] eta 0:02:06 lr 0.001118 wd 0.0500 time 0.2579 (0.2603) data time 0.0012 (0.0051) model time 0.2567 (0.2555) loss 4.6166 (6.0851) grad_norm 1.4204 (2.1710) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:37:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][150/625] eta 0:02:03 lr 0.001118 wd 0.0500 time 0.2581 (0.2601) data time 0.0010 (0.0048) model time 0.2571 (0.2556) loss 6.3081 (6.0745) grad_norm 2.1258 (2.1629) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:37:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][160/625] eta 0:02:00 lr 0.001118 wd 0.0500 time 0.2568 (0.2598) data time 0.0010 (0.0046) model time 0.2558 (0.2555) loss 6.0713 (6.0712) grad_norm 1.9952 (2.1712) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:37:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][170/625] eta 0:01:58 lr 0.001118 wd 0.0500 time 0.2553 (0.2596) data time 0.0009 (0.0044) model time 0.2543 (0.2554) loss 5.4676 (6.0612) grad_norm 2.4518 (2.1759) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 04:37:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][180/625] eta 0:01:55 lr 0.001118 wd 0.0500 time 0.2557 (0.2594) data time 0.0006 (0.0042) model time 0.2551 (0.2554) loss 7.0776 (6.0818) grad_norm 1.9788 (2.1658) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 04:37:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][190/625] eta 0:01:52 lr 0.001117 wd 0.0500 time 0.2537 (0.2591) data time 0.0008 (0.0040) model time 0.2529 (0.2553) loss 5.9527 (6.0733) grad_norm 1.5362 (2.1297) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 04:38:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][200/625] eta 0:01:50 lr 0.001117 wd 0.0500 time 0.2578 (0.2590) data time 0.0007 (0.0038) model time 0.2571 (0.2553) loss 6.4413 (6.0757) grad_norm 1.5837 (2.1055) loss_scale 2048.0000 (1039.2836) mem 9655MB
[2024-08-04 04:38:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][210/625] eta 0:01:47 lr 0.001117 wd 0.0500 time 0.2565 (0.2588) data time 0.0010 (0.0037) model time 0.2555 (0.2552) loss 5.9866 (6.0531) grad_norm 1.6988 (2.0988) loss_scale 2048.0000 (1087.0900) mem 9655MB
[2024-08-04 04:38:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][220/625] eta 0:01:44 lr 0.001117 wd 0.0500 time 0.2557 (0.2587) data time 0.0007 (0.0036) model time 0.2549 (0.2552) loss 4.5299 (6.0507) grad_norm 1.2217 (2.0755) loss_scale 2048.0000 (1130.5701) mem 9655MB
[2024-08-04 04:38:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][230/625] eta 0:01:42 lr 0.001117 wd 0.0500 time 0.4668 (0.2595) data time 0.0008 (0.0035) model time 0.4660 (0.2564) loss 6.7184 (6.0479) grad_norm 1.4472 (2.0651) loss_scale 2048.0000 (1170.2857) mem 9655MB
[2024-08-04 04:38:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][240/625] eta 0:01:40 lr 0.001117 wd 0.0500 time 0.2590 (0.2602) data time 0.0008 (0.0034) model time 0.2582 (0.2574) loss 5.3586 (6.0523) grad_norm 1.6551 (2.0564) loss_scale 2048.0000 (1206.7054) mem 9655MB
[2024-08-04 04:38:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][250/625] eta 0:01:37 lr 0.001116 wd 0.0500 time 0.2614 (0.2600) data time 0.0006 (0.0033) model time 0.2608 (0.2573) loss 5.4192 (6.0530) grad_norm 2.4411 (2.0605) loss_scale 2048.0000 (1240.2231) mem 9655MB
[2024-08-04 04:38:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][260/625] eta 0:01:34 lr 0.001116 wd 0.0500 time 0.2656 (0.2599) data time 0.0007 (0.0032) model time 0.2649 (0.2572) loss 6.8196 (6.0382) grad_norm 2.7733 (2.0655) loss_scale 2048.0000 (1271.1724) mem 9655MB
[2024-08-04 04:38:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][270/625] eta 0:01:32 lr 0.001116 wd 0.0500 time 0.2551 (0.2598) data time 0.0008 (0.0031) model time 0.2542 (0.2572) loss 5.4323 (6.0488) grad_norm 1.4703 (2.0662) loss_scale 2048.0000 (1299.8376) mem 9655MB
[2024-08-04 04:38:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][280/625] eta 0:01:29 lr 0.001116 wd 0.0500 time 0.2547 (0.2597) data time 0.0008 (0.0030) model time 0.2539 (0.2571) loss 5.8844 (6.0378) grad_norm 1.8812 (2.0600) loss_scale 2048.0000 (1326.4626) mem 9655MB
[2024-08-04 04:38:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][290/625] eta 0:01:26 lr 0.001116 wd 0.0500 time 0.2584 (0.2596) data time 0.0008 (0.0029) model time 0.2576 (0.2571) loss 4.8880 (6.0291) grad_norm 3.4864 (2.0680) loss_scale 2048.0000 (1351.2577) mem 9655MB
[2024-08-04 04:38:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][300/625] eta 0:01:24 lr 0.001116 wd 0.0500 time 0.2613 (0.2595) data time 0.0009 (0.0029) model time 0.2604 (0.2570) loss 6.6987 (6.0289) grad_norm 2.0090 (2.0787) loss_scale 2048.0000 (1374.4053) mem 9655MB
[2024-08-04 04:38:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][310/625] eta 0:01:21 lr 0.001115 wd 0.0500 time 0.2503 (0.2594) data time 0.0007 (0.0028) model time 0.2496 (0.2570) loss 4.7112 (6.0231) grad_norm 1.7812 (2.0930) loss_scale 2048.0000 (1396.0643) mem 9655MB
[2024-08-04 04:38:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][320/625] eta 0:01:19 lr 0.001115 wd 0.0500 time 0.2587 (0.2593) data time 0.0008 (0.0027) model time 0.2579 (0.2569) loss 6.0036 (6.0227) grad_norm 1.7237 (2.0960) loss_scale 2048.0000 (1416.3738) mem 9655MB
[2024-08-04 04:38:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][330/625] eta 0:01:16 lr 0.001115 wd 0.0500 time 0.2580 (0.2592) data time 0.0008 (0.0027) model time 0.2572 (0.2569) loss 6.0901 (6.0271) grad_norm 1.1314 (2.0783) loss_scale 2048.0000 (1435.4562) mem 9655MB
[2024-08-04 04:38:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][340/625] eta 0:01:13 lr 0.001115 wd 0.0500 time 0.2587 (0.2591) data time 0.0006 (0.0026) model time 0.2581 (0.2568) loss 6.5614 (6.0322) grad_norm 1.5972 (2.0684) loss_scale 2048.0000 (1453.4194) mem 9655MB
[2024-08-04 04:38:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][350/625] eta 0:01:11 lr 0.001115 wd 0.0500 time 0.2524 (0.2590) data time 0.0011 (0.0026) model time 0.2513 (0.2567) loss 5.9041 (6.0351) grad_norm 2.0817 (2.0624) loss_scale 2048.0000 (1470.3590) mem 9655MB
[2024-08-04 04:38:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][360/625] eta 0:01:08 lr 0.001114 wd 0.0500 time 0.2582 (0.2590) data time 0.0010 (0.0025) model time 0.2572 (0.2567) loss 6.1410 (6.0363) grad_norm 1.8287 (2.0514) loss_scale 2048.0000 (1486.3601) mem 9655MB
[2024-08-04 04:38:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][370/625] eta 0:01:06 lr 0.001114 wd 0.0500 time 0.2552 (0.2589) data time 0.0011 (0.0025) model time 0.2541 (0.2567) loss 5.3673 (6.0294) grad_norm 1.5603 (2.0449) loss_scale 2048.0000 (1501.4987) mem 9655MB
[2024-08-04 04:38:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][380/625] eta 0:01:03 lr 0.001114 wd 0.0500 time 0.2531 (0.2589) data time 0.0010 (0.0025) model time 0.2521 (0.2567) loss 6.4269 (6.0381) grad_norm 1.1476 (2.0416) loss_scale 2048.0000 (1515.8425) mem 9655MB
[2024-08-04 04:38:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][390/625] eta 0:01:00 lr 0.001114 wd 0.0500 time 0.2534 (0.2589) data time 0.0008 (0.0024) model time 0.2526 (0.2567) loss 6.3484 (6.0376) grad_norm 1.1132 (2.0397) loss_scale 2048.0000 (1529.4527) mem 9655MB
[2024-08-04 04:38:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][400/625] eta 0:00:58 lr 0.001114 wd 0.0500 time 0.2565 (0.2589) data time 0.0008 (0.0024) model time 0.2557 (0.2568) loss 5.9607 (6.0414) grad_norm 1.5420 (2.0512) loss_scale 2048.0000 (1542.3840) mem 9655MB
[2024-08-04 04:38:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][410/625] eta 0:00:55 lr 0.001114 wd 0.0500 time 0.2559 (0.2588) data time 0.0013 (0.0024) model time 0.2546 (0.2567) loss 7.3287 (6.0441) grad_norm 1.2671 (2.0535) loss_scale 2048.0000 (1554.6861) mem 9655MB
[2024-08-04 04:38:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][420/625] eta 0:00:53 lr 0.001113 wd 0.0500 time 0.2568 (0.2588) data time 0.0007 (0.0023) model time 0.2561 (0.2567) loss 5.8369 (6.0440) grad_norm 3.2408 (2.0555) loss_scale 2048.0000 (1566.4038) mem 9655MB
[2024-08-04 04:39:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][430/625] eta 0:00:50 lr 0.001113 wd 0.0500 time 0.2617 (0.2590) data time 0.0006 (0.0023) model time 0.2611 (0.2569) loss 6.1410 (6.0444) grad_norm 2.8280 (2.0591) loss_scale 2048.0000 (1577.5777) mem 9655MB
[2024-08-04 04:39:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][440/625] eta 0:00:47 lr 0.001113 wd 0.0500 time 0.2687 (0.2589) data time 0.0008 (0.0023) model time 0.2679 (0.2569) loss 4.9436 (6.0391) grad_norm 1.9237 (2.0533) loss_scale 2048.0000 (1588.2449) mem 9655MB
[2024-08-04 04:39:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][450/625] eta 0:00:45 lr 0.001113 wd 0.0500 time 0.2573 (0.2588) data time 0.0007 (0.0022) model time 0.2566 (0.2569) loss 6.8347 (6.0404) grad_norm 1.8573 (2.0468) loss_scale 2048.0000 (1598.4390) mem 9655MB
[2024-08-04 04:39:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][460/625] eta 0:00:42 lr 0.001113 wd 0.0500 time 0.2579 (0.2588) data time 0.0008 (0.0022) model time 0.2570 (0.2568) loss 6.9739 (6.0459) grad_norm 1.4487 (2.0509) loss_scale 2048.0000 (1608.1909) mem 9655MB
[2024-08-04 04:39:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][470/625] eta 0:00:40 lr 0.001113 wd 0.0500 time 0.2481 (0.2587) data time 0.0009 (0.0022) model time 0.2471 (0.2568) loss 5.3456 (6.0429) grad_norm 2.8147 (2.0438) loss_scale 2048.0000 (1617.5287) mem 9655MB
[2024-08-04 04:39:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][480/625] eta 0:00:37 lr 0.001112 wd 0.0500 time 0.2517 (0.2587) data time 0.0010 (0.0022) model time 0.2507 (0.2567) loss 6.9069 (6.0421) grad_norm 1.4731 (2.0500) loss_scale 2048.0000 (1626.4782) mem 9655MB
[2024-08-04 04:39:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][490/625] eta 0:00:34 lr 0.001112 wd 0.0500 time 0.2651 (0.2586) data time 0.0006 (0.0021) model time 0.2645 (0.2567) loss 5.2272 (6.0463) grad_norm 1.7518 (2.0496) loss_scale 2048.0000 (1635.0631) mem 9655MB
[2024-08-04 04:39:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][500/625] eta 0:00:32 lr 0.001112 wd 0.0500 time 0.2604 (0.2586) data time 0.0010 (0.0021) model time 0.2594 (0.2567) loss 6.3036 (6.0521) grad_norm 3.4155 (2.0532) loss_scale 2048.0000 (1643.3054) mem 9655MB
[2024-08-04 04:39:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][510/625] eta 0:00:29 lr 0.001112 wd 0.0500 time 0.2531 (0.2585) data time 0.0010 (0.0021) model time 0.2521 (0.2567) loss 6.8073 (6.0489) grad_norm 1.3687 (2.0523) loss_scale 2048.0000 (1651.2250) mem 9655MB
[2024-08-04 04:39:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][520/625] eta 0:00:27 lr 0.001112 wd 0.0500 time 0.2547 (0.2585) data time 0.0013 (0.0021) model time 0.2535 (0.2566) loss 6.1994 (6.0537) grad_norm 1.7079 (2.0456) loss_scale 2048.0000 (1658.8407) mem 9655MB
[2024-08-04 04:39:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][530/625] eta 0:00:24 lr 0.001111 wd 0.0500 time 0.2546 (0.2585) data time 0.0006 (0.0020) model time 0.2540 (0.2566) loss 6.7607 (6.0578) grad_norm 1.1825 (2.0491) loss_scale 2048.0000 (1666.1695) mem 9655MB
[2024-08-04 04:39:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][540/625] eta 0:00:21 lr 0.001111 wd 0.0500 time 0.2544 (0.2584) data time 0.0007 (0.0020) model time 0.2537 (0.2566) loss 6.0583 (6.0608) grad_norm 2.7054 (2.0444) loss_scale 2048.0000 (1673.2274) mem 9655MB
[2024-08-04 04:39:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][550/625] eta 0:00:19 lr 0.001111 wd 0.0500 time 0.2557 (0.2584) data time 0.0008 (0.0020) model time 0.2549 (0.2566) loss 6.0159 (6.0624) grad_norm 2.1364 (2.0498) loss_scale 2048.0000 (1680.0290) mem 9655MB
[2024-08-04 04:39:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][560/625] eta 0:00:16 lr 0.001111 wd 0.0500 time 0.4696 (0.2588) data time 0.0009 (0.0020) model time 0.4687 (0.2570) loss 6.0105 (6.0569) grad_norm 2.5123 (2.0546) loss_scale 2048.0000 (1686.5882) mem 9655MB
[2024-08-04 04:39:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][570/625] eta 0:00:14 lr 0.001111 wd 0.0500 time 0.2539 (0.2588) data time 0.0008 (0.0020) model time 0.2531 (0.2570) loss 5.4392 (6.0557) grad_norm 3.2780 (2.0623) loss_scale 2048.0000 (1692.9177) mem 9655MB
[2024-08-04 04:39:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][580/625] eta 0:00:11 lr 0.001111 wd 0.0500 time 0.2571 (0.2588) data time 0.0008 (0.0019) model time 0.2563 (0.2570) loss 6.5233 (6.0578) grad_norm 1.5603 (2.0661) loss_scale 2048.0000 (1699.0293) mem 9655MB
[2024-08-04 04:39:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][590/625] eta 0:00:09 lr 0.001110 wd 0.0500 time 0.2557 (0.2587) data time 0.0006 (0.0019) model time 0.2551 (0.2570) loss 4.9085 (6.0616) grad_norm 1.7889 (2.0695) loss_scale 2048.0000 (1704.9340) mem 9655MB
[2024-08-04 04:39:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][600/625] eta 0:00:06 lr 0.001110 wd 0.0500 time 0.2599 (0.2587) data time 0.0008 (0.0019) model time 0.2591 (0.2569) loss 6.4995 (6.0565) grad_norm 2.4528 (2.0678) loss_scale 2048.0000 (1710.6423) mem 9655MB
[2024-08-04 04:39:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][610/625] eta 0:00:03 lr 0.001110 wd 0.0500 time 0.2531 (0.2586) data time 0.0006 (0.0019) model time 0.2524 (0.2569) loss 5.4104 (6.0525) grad_norm 1.9414 (2.0639) loss_scale 2048.0000 (1716.1637) mem 9655MB
[2024-08-04 04:39:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][620/625] eta 0:00:01 lr 0.001110 wd 0.0500 time 0.2724 (0.2586) data time 0.0006 (0.0019) model time 0.2719 (0.2569) loss 5.9693 (6.0486) grad_norm 1.9968 (2.0619) loss_scale 2048.0000 (1721.5072) mem 9655MB
[2024-08-04 04:39:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 150 training takes 0:02:41
[2024-08-04 04:39:52 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 04:39:52 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 04:39:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.535 (0.535) Loss 0.6626 (0.6626) Acc@1 88.135 (88.135) Acc@5 98.145 (98.145) Mem 9655MB
[2024-08-04 04:39:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.101) Loss 1.0596 (0.7937) Acc@1 77.051 (84.268) Acc@5 94.336 (97.084) Mem 9655MB
[2024-08-04 04:39:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.1719 (0.9334) Acc@1 73.633 (80.585) Acc@5 93.164 (95.471) Mem 9655MB
[2024-08-04 04:39:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.242 Acc@5 95.435
[2024-08-04 04:39:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.2%
[2024-08-04 04:39:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.24%
[2024-08-04 04:39:54 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-08-04 04:39:54 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-08-04 04:39:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.489 (0.489) Loss 0.5908 (0.5908) Acc@1 88.867 (88.867) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 04:39:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 0.9404 (0.7304) Acc@1 79.346 (85.205) Acc@5 95.459 (97.456) Mem 9655MB
[2024-08-04 04:39:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0762 (0.8621) Acc@1 74.854 (81.682) Acc@5 93.896 (95.971) Mem 9655MB
[2024-08-04 04:39:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.372 Acc@5 95.953
[2024-08-04 04:39:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.4%
[2024-08-04 04:39:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.37%
[2024-08-04 04:39:56 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 04:39:57 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 04:39:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][0/625] eta 0:07:52 lr 0.001110 wd 0.0500 time 0.7559 (0.7559) data time 0.5160 (0.5160) model time 0.0000 (0.0000) loss 5.7264 (5.7264) grad_norm 2.3364 (2.3364) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:40:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][10/625] eta 0:03:05 lr 0.001110 wd 0.0500 time 0.2557 (0.3009) data time 0.0006 (0.0477) model time 0.0000 (0.0000) loss 4.8313 (5.9397) grad_norm 1.8494 (2.3965) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:40:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][20/625] eta 0:02:49 lr 0.001109 wd 0.0500 time 0.2559 (0.2810) data time 0.0008 (0.0255) model time 0.0000 (0.0000) loss 7.2086 (5.9758) grad_norm 2.1176 (2.1261) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:40:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][30/625] eta 0:02:42 lr 0.001109 wd 0.0500 time 0.2530 (0.2731) data time 0.0008 (0.0176) model time 0.0000 (0.0000) loss 4.8331 (5.9202) grad_norm 2.5154 (2.1920) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:40:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][40/625] eta 0:02:37 lr 0.001109 wd 0.0500 time 0.2583 (0.2690) data time 0.0008 (0.0135) model time 0.0000 (0.0000) loss 6.5775 (5.9255) grad_norm 1.3004 (2.1247) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:40:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][50/625] eta 0:02:33 lr 0.001109 wd 0.0500 time 0.2588 (0.2664) data time 0.0008 (0.0111) model time 0.0000 (0.0000) loss 6.3214 (5.9683) grad_norm 2.3795 (2.0872) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:40:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][60/625] eta 0:02:29 lr 0.001109 wd 0.0500 time 0.2585 (0.2648) data time 0.0006 (0.0094) model time 0.2578 (0.2559) loss 4.3525 (5.9336) grad_norm 2.0205 (2.0914) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:40:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][70/625] eta 0:02:26 lr 0.001109 wd 0.0500 time 0.2592 (0.2637) data time 0.0006 (0.0082) model time 0.2585 (0.2560) loss 6.6499 (6.0090) grad_norm 1.3028 (2.0341) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:40:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][80/625] eta 0:02:23 lr 0.001108 wd 0.0500 time 0.2580 (0.2627) data time 0.0006 (0.0073) model time 0.2574 (0.2556) loss 6.2148 (5.9796) grad_norm 1.3354 (2.0551) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:40:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][90/625] eta 0:02:20 lr 0.001108 wd 0.0500 time 0.2547 (0.2620) data time 0.0008 (0.0066) model time 0.2539 (0.2556) loss 6.9555 (5.9946) grad_norm 2.1473 (2.0463) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:40:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][100/625] eta 0:02:17 lr 0.001108 wd 0.0500 time 0.2574 (0.2615) data time 0.0006 (0.0060) model time 0.2568 (0.2556) loss 5.2505 (5.9950) grad_norm 2.1812 (2.0073) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:40:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][110/625] eta 0:02:14 lr 0.001108 wd 0.0500 time 0.2548 (0.2611) data time 0.0010 (0.0055) model time 0.2538 (0.2557) loss 6.8547 (6.0052) grad_norm 1.3039 (2.0073) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:40:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][120/625] eta 0:02:11 lr 0.001108 wd 0.0500 time 0.2568 (0.2607) data time 0.0011 (0.0052) model time 0.2557 (0.2557) loss 5.7226 (6.0085) grad_norm 1.1899 (1.9845) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:40:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][130/625] eta 0:02:08 lr 0.001108 wd 0.0500 time 0.2540 (0.2606) data time 0.0010 (0.0048) model time 0.2530 (0.2559) loss 5.6955 (5.9694) grad_norm 2.2375 (1.9601) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:40:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][140/625] eta 0:02:06 lr 0.001107 wd 0.0500 time 0.2531 (0.2603) data time 0.0011 (0.0046) model time 0.2520 (0.2559) loss 5.7546 (5.9471) grad_norm 1.8902 (1.9890) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:40:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][150/625] eta 0:02:03 lr 0.001107 wd 0.0500 time 0.2550 (0.2600) data time 0.0008 (0.0043) model time 0.2542 (0.2559) loss 5.3002 (5.9480) grad_norm 1.7347 (1.9951) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:40:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][160/625] eta 0:02:00 lr 0.001107 wd 0.0500 time 0.2555 (0.2598) data time 0.0011 (0.0041) model time 0.2545 (0.2558) loss 6.3811 (5.9446) grad_norm 2.0304 (1.9839) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:40:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][170/625] eta 0:01:58 lr 0.001107 wd 0.0500 time 0.2573 (0.2597) data time 0.0010 (0.0039) model time 0.2564 (0.2559) loss 5.8113 (5.9552) grad_norm 1.5756 (1.9893) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:40:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][180/625] eta 0:01:55 lr 0.001107 wd 0.0500 time 0.2645 (0.2595) data time 0.0009 (0.0038) model time 0.2637 (0.2559) loss 5.4314 (5.9573) grad_norm 2.2008 (2.0226) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:40:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][190/625] eta 0:01:52 lr 0.001106 wd 0.0500 time 0.2549 (0.2593) data time 0.0007 (0.0036) model time 0.2542 (0.2559) loss 6.1009 (5.9681) grad_norm 2.0856 (2.0461) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:40:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][200/625] eta 0:01:50 lr 0.001106 wd 0.0500 time 0.2565 (0.2592) data time 0.0009 (0.0035) model time 0.2556 (0.2558) loss 6.1589 (5.9639) grad_norm 2.3798 (2.0382) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:40:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][210/625] eta 0:01:47 lr 0.001106 wd 0.0500 time 0.2554 (0.2592) data time 0.0008 (0.0034) model time 0.2545 (0.2560) loss 5.8444 (5.9538) grad_norm 1.1810 (2.0234) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:40:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][220/625] eta 0:01:44 lr 0.001106 wd 0.0500 time 0.2607 (0.2591) data time 0.0005 (0.0033) model time 0.2602 (0.2560) loss 7.1219 (5.9732) grad_norm 1.2207 (2.0194) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:40:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][230/625] eta 0:01:42 lr 0.001106 wd 0.0500 time 0.2586 (0.2590) data time 0.0006 (0.0032) model time 0.2580 (0.2560) loss 6.1067 (5.9755) grad_norm 2.1784 (2.0033) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:40:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][240/625] eta 0:01:39 lr 0.001106 wd 0.0500 time 0.2581 (0.2589) data time 0.0013 (0.0031) model time 0.2569 (0.2560) loss 6.0286 (5.9678) grad_norm 1.2249 (1.9890) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:41:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][250/625] eta 0:01:37 lr 0.001105 wd 0.0500 time 0.2606 (0.2597) data time 0.0006 (0.0030) model time 0.2599 (0.2571) loss 6.2616 (5.9646) grad_norm 2.7799 (1.9864) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:41:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][260/625] eta 0:01:34 lr 0.001105 wd 0.0500 time 0.2600 (0.2596) data time 0.0009 (0.0029) model time 0.2591 (0.2570) loss 6.6539 (5.9736) grad_norm 1.9519 (1.9898) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:41:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][270/625] eta 0:01:32 lr 0.001105 wd 0.0500 time 0.2511 (0.2594) data time 0.0009 (0.0029) model time 0.2502 (0.2569) loss 6.1593 (5.9640) grad_norm 2.0404 (2.0017) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:41:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][280/625] eta 0:01:29 lr 0.001105 wd 0.0500 time 0.2542 (0.2593) data time 0.0008 (0.0028) model time 0.2534 (0.2568) loss 7.1627 (5.9627) grad_norm 2.8610 (1.9995) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:41:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][290/625] eta 0:01:26 lr 0.001105 wd 0.0500 time 0.2570 (0.2593) data time 0.0007 (0.0027) model time 0.2563 (0.2568) loss 6.4261 (5.9643) grad_norm 2.2926 (1.9903) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:41:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][300/625] eta 0:01:24 lr 0.001104 wd 0.0500 time 0.2582 (0.2601) data time 0.0008 (0.0027) model time 0.2574 (0.2578) loss 5.4351 (5.9492) grad_norm 1.6747 (1.9781) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:41:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][310/625] eta 0:01:21 lr 0.001104 wd 0.0500 time 0.2546 (0.2599) data time 0.0016 (0.0026) model time 0.2530 (0.2577) loss 5.4640 (5.9631) grad_norm 1.6871 (2.0109) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:41:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][320/625] eta 0:01:19 lr 0.001104 wd 0.0500 time 0.2559 (0.2598) data time 0.0008 (0.0026) model time 0.2551 (0.2576) loss 6.2648 (5.9618) grad_norm 2.7101 (2.0301) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:41:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][330/625] eta 0:01:16 lr 0.001104 wd 0.0500 time 0.2544 (0.2597) data time 0.0009 (0.0025) model time 0.2535 (0.2575) loss 5.8850 (5.9659) grad_norm 1.5384 (2.0312) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:41:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][340/625] eta 0:01:13 lr 0.001104 wd 0.0500 time 0.2567 (0.2596) data time 0.0008 (0.0025) model time 0.2559 (0.2574) loss 4.7465 (5.9623) grad_norm 2.9905 (2.0436) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:41:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][350/625] eta 0:01:11 lr 0.001104 wd 0.0500 time 0.2551 (0.2595) data time 0.0008 (0.0024) model time 0.2543 (0.2574) loss 5.6460 (5.9590) grad_norm 1.9249 (2.0378) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:41:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][360/625] eta 0:01:08 lr 0.001103 wd 0.0500 time 0.2549 (0.2594) data time 0.0009 (0.0024) model time 0.2539 (0.2573) loss 5.1840 (5.9652) grad_norm 2.0844 (2.0357) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:41:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][370/625] eta 0:01:06 lr 0.001103 wd 0.0500 time 0.2555 (0.2593) data time 0.0010 (0.0023) model time 0.2544 (0.2572) loss 6.4082 (5.9699) grad_norm 2.8918 (2.0412) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:41:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][380/625] eta 0:01:03 lr 0.001103 wd 0.0500 time 0.2542 (0.2592) data time 0.0008 (0.0023) model time 0.2534 (0.2571) loss 5.0491 (5.9634) grad_norm 1.7808 (2.0414) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:41:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][390/625] eta 0:01:00 lr 0.001103 wd 0.0500 time 0.2563 (0.2591) data time 0.0011 (0.0023) model time 0.2552 (0.2571) loss 6.5872 (5.9683) grad_norm 1.8097 (2.0279) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:41:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][400/625] eta 0:00:58 lr 0.001103 wd 0.0500 time 0.2548 (0.2590) data time 0.0006 (0.0022) model time 0.2542 (0.2570) loss 4.9897 (5.9610) grad_norm 3.1987 (2.0323) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:41:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][410/625] eta 0:00:55 lr 0.001103 wd 0.0500 time 0.2534 (0.2590) data time 0.0009 (0.0022) model time 0.2525 (0.2570) loss 6.4421 (5.9641) grad_norm 1.2605 (2.0297) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:41:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][420/625] eta 0:00:53 lr 0.001102 wd 0.0500 time 0.2599 (0.2589) data time 0.0006 (0.0022) model time 0.2593 (0.2570) loss 5.7377 (5.9729) grad_norm 2.0969 (2.0286) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:41:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][430/625] eta 0:00:50 lr 0.001102 wd 0.0500 time 0.2570 (0.2588) data time 0.0008 (0.0021) model time 0.2562 (0.2569) loss 4.8680 (5.9699) grad_norm 1.6237 (2.0367) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:41:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][440/625] eta 0:00:47 lr 0.001102 wd 0.0500 time 0.2588 (0.2588) data time 0.0009 (0.0021) model time 0.2579 (0.2568) loss 6.1059 (5.9728) grad_norm 2.6717 (2.0375) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:41:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][450/625] eta 0:00:45 lr 0.001102 wd 0.0500 time 0.2556 (0.2587) data time 0.0007 (0.0021) model time 0.2548 (0.2568) loss 6.6086 (5.9783) grad_norm 1.6302 (2.0310) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:41:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][460/625] eta 0:00:42 lr 0.001102 wd 0.0500 time 0.2578 (0.2587) data time 0.0009 (0.0021) model time 0.2569 (0.2568) loss 6.2454 (5.9807) grad_norm 2.5675 (2.0224) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:41:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][470/625] eta 0:00:40 lr 0.001101 wd 0.0500 time 0.2578 (0.2586) data time 0.0006 (0.0020) model time 0.2572 (0.2567) loss 6.9241 (5.9818) grad_norm 2.6143 (2.0285) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:42:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][480/625] eta 0:00:37 lr 0.001101 wd 0.0500 time 0.2602 (0.2586) data time 0.0006 (0.0020) model time 0.2596 (0.2567) loss 6.9041 (5.9853) grad_norm 1.4659 (2.0214) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:42:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][490/625] eta 0:00:34 lr 0.001101 wd 0.0500 time 0.2535 (0.2585) data time 0.0007 (0.0020) model time 0.2528 (0.2566) loss 6.7715 (5.9915) grad_norm 1.7637 (2.0225) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:42:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][500/625] eta 0:00:32 lr 0.001101 wd 0.0500 time 0.2530 (0.2584) data time 0.0009 (0.0020) model time 0.2521 (0.2566) loss 6.3493 (5.9935) grad_norm 2.3242 (2.0164) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:42:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][510/625] eta 0:00:29 lr 0.001101 wd 0.0500 time 0.2550 (0.2584) data time 0.0012 (0.0020) model time 0.2539 (0.2566) loss 4.4270 (5.9850) grad_norm 1.2706 (2.0084) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:42:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][520/625] eta 0:00:27 lr 0.001101 wd 0.0500 time 0.2532 (0.2584) data time 0.0010 (0.0019) model time 0.2522 (0.2565) loss 5.9586 (5.9823) grad_norm 2.2436 (2.0040) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:42:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][530/625] eta 0:00:24 lr 0.001100 wd 0.0500 time 0.2542 (0.2583) data time 0.0008 (0.0019) model time 0.2534 (0.2565) loss 5.3296 (5.9740) grad_norm 3.4232 (2.0112) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:42:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][540/625] eta 0:00:21 lr 0.001100 wd 0.0500 time 0.2566 (0.2583) data time 0.0006 (0.0019) model time 0.2560 (0.2565) loss 5.9270 (5.9750) grad_norm 1.2739 (2.0125) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:42:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][550/625] eta 0:00:19 lr 0.001100 wd 0.0500 time 0.2570 (0.2583) data time 0.0009 (0.0019) model time 0.2561 (0.2565) loss 6.5115 (5.9692) grad_norm 2.1684 (2.0095) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:42:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][560/625] eta 0:00:16 lr 0.001100 wd 0.0500 time 0.2558 (0.2583) data time 0.0012 (0.0019) model time 0.2546 (0.2565) loss 6.0120 (5.9686) grad_norm 1.9705 (2.0108) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:42:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][570/625] eta 0:00:14 lr 0.001100 wd 0.0500 time 0.2532 (0.2582) data time 0.0010 (0.0018) model time 0.2522 (0.2565) loss 5.7707 (5.9651) grad_norm 2.3595 (2.0117) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:42:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][580/625] eta 0:00:11 lr 0.001100 wd 0.0500 time 0.2531 (0.2582) data time 0.0007 (0.0018) model time 0.2524 (0.2565) loss 6.1809 (5.9717) grad_norm 1.5436 (2.0074) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:42:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][590/625] eta 0:00:09 lr 0.001099 wd 0.0500 time 0.2580 (0.2582) data time 0.0008 (0.0018) model time 0.2571 (0.2564) loss 5.0612 (5.9665) grad_norm 1.6426 (2.0054) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:42:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][600/625] eta 0:00:06 lr 0.001099 wd 0.0500 time 0.2564 (0.2581) data time 0.0006 (0.0018) model time 0.2558 (0.2564) loss 5.2002 (5.9594) grad_norm 1.4328 (2.0040) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:42:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][610/625] eta 0:00:03 lr 0.001099 wd 0.0500 time 0.2535 (0.2581) data time 0.0005 (0.0018) model time 0.2529 (0.2564) loss 6.2933 (5.9583) grad_norm 2.0176 (1.9994) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:42:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][620/625] eta 0:00:01 lr 0.001099 wd 0.0500 time 0.2555 (0.2583) data time 0.0005 (0.0018) model time 0.2550 (0.2566) loss 6.8294 (5.9556) grad_norm 1.5693 (1.9970) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:42:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 151 training takes 0:02:41
[2024-08-04 04:42:38 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 04:42:39 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 04:42:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.555 (0.555) Loss 0.6475 (0.6475) Acc@1 88.525 (88.525) Acc@5 98.193 (98.193) Mem 9655MB
[2024-08-04 04:42:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.102) Loss 1.0107 (0.7885) Acc@1 78.809 (84.322) Acc@5 95.068 (97.039) Mem 9655MB
[2024-08-04 04:42:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 1.1416 (0.9292) Acc@1 74.463 (80.622) Acc@5 92.627 (95.389) Mem 9655MB
[2024-08-04 04:42:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.334 Acc@5 95.399
[2024-08-04 04:42:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.3%
[2024-08-04 04:42:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.33%
[2024-08-04 04:42:40 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-08-04 04:42:41 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-08-04 04:42:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.526 (0.526) Loss 0.5913 (0.5913) Acc@1 88.867 (88.867) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 04:42:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 0.9409 (0.7302) Acc@1 79.346 (85.201) Acc@5 95.312 (97.434) Mem 9655MB
[2024-08-04 04:42:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0752 (0.8618) Acc@1 75.000 (81.676) Acc@5 93.848 (95.964) Mem 9655MB
[2024-08-04 04:42:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.374 Acc@5 95.951
[2024-08-04 04:42:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.4%
[2024-08-04 04:42:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.37%
[2024-08-04 04:42:43 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 04:42:43 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 04:42:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][0/625] eta 0:07:23 lr 0.001099 wd 0.0500 time 0.7100 (0.7100) data time 0.4705 (0.4705) model time 0.0000 (0.0000) loss 5.4733 (5.4733) grad_norm 1.7566 (1.7566) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:42:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][10/625] eta 0:03:02 lr 0.001099 wd 0.0500 time 0.2551 (0.2964) data time 0.0008 (0.0436) model time 0.0000 (0.0000) loss 5.7098 (5.8958) grad_norm 1.2228 (2.0611) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:42:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][20/625] eta 0:02:47 lr 0.001098 wd 0.0500 time 0.2592 (0.2769) data time 0.0007 (0.0233) model time 0.0000 (0.0000) loss 5.3453 (6.1111) grad_norm 7.4417 (2.5425) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:42:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][30/625] eta 0:02:40 lr 0.001098 wd 0.0500 time 0.2559 (0.2705) data time 0.0013 (0.0161) model time 0.0000 (0.0000) loss 5.6662 (6.0451) grad_norm 1.9091 (2.5665) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:42:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][40/625] eta 0:02:36 lr 0.001098 wd 0.0500 time 0.2588 (0.2677) data time 0.0008 (0.0125) model time 0.0000 (0.0000) loss 5.3433 (6.0564) grad_norm 2.9798 (2.4810) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:42:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][50/625] eta 0:02:32 lr 0.001098 wd 0.0500 time 0.2551 (0.2655) data time 0.0010 (0.0102) model time 0.0000 (0.0000) loss 6.1004 (6.0753) grad_norm 2.4371 (2.3658) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:42:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][60/625] eta 0:02:29 lr 0.001098 wd 0.0500 time 0.2566 (0.2639) data time 0.0007 (0.0087) model time 0.2559 (0.2549) loss 4.5843 (5.9918) grad_norm 1.7445 (2.2340) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:43:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][70/625] eta 0:02:25 lr 0.001098 wd 0.0500 time 0.2499 (0.2630) data time 0.0007 (0.0076) model time 0.2493 (0.2557) loss 6.5720 (6.0075) grad_norm 1.8358 (2.2360) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:43:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][80/625] eta 0:02:22 lr 0.001097 wd 0.0500 time 0.2595 (0.2621) data time 0.0006 (0.0068) model time 0.2589 (0.2554) loss 5.0982 (6.0139) grad_norm 3.5090 (2.2961) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:43:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][90/625] eta 0:02:19 lr 0.001097 wd 0.0500 time 0.2683 (0.2616) data time 0.0008 (0.0061) model time 0.2674 (0.2557) loss 5.3634 (6.0500) grad_norm 2.4579 (2.2898) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:43:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][100/625] eta 0:02:17 lr 0.001097 wd 0.0500 time 0.2546 (0.2611) data time 0.0009 (0.0056) model time 0.2537 (0.2557) loss 6.0481 (6.0373) grad_norm 1.7368 (2.2474) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:43:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][110/625] eta 0:02:14 lr 0.001097 wd 0.0500 time 0.2519 (0.2607) data time 0.0011 (0.0052) model time 0.2508 (0.2556) loss 6.9091 (6.0252) grad_norm 1.3653 (2.1935) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:43:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][120/625] eta 0:02:11 lr 0.001097 wd 0.0500 time 0.2562 (0.2603) data time 0.0011 (0.0048) model time 0.2550 (0.2556) loss 6.0801 (5.9959) grad_norm 1.4821 (2.1587) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:43:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][130/625] eta 0:02:08 lr 0.001096 wd 0.0500 time 0.2566 (0.2600) data time 0.0009 (0.0045) model time 0.2557 (0.2556) loss 6.6283 (6.0123) grad_norm 1.4205 (2.1118) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:43:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][140/625] eta 0:02:05 lr 0.001096 wd 0.0500 time 0.2583 (0.2598) data time 0.0010 (0.0043) model time 0.2573 (0.2555) loss 6.7029 (6.0007) grad_norm 2.5383 (2.0820) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:43:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][150/625] eta 0:02:03 lr 0.001096 wd 0.0500 time 0.2549 (0.2595) data time 0.0009 (0.0041) model time 0.2540 (0.2555) loss 5.4163 (5.9745) grad_norm 1.5468 (2.0858) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:43:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][160/625] eta 0:02:00 lr 0.001096 wd 0.0500 time 0.2591 (0.2593) data time 0.0007 (0.0039) model time 0.2584 (0.2555) loss 6.0028 (5.9640) grad_norm 1.8802 (2.0689) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:43:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][170/625] eta 0:01:57 lr 0.001096 wd 0.0500 time 0.2543 (0.2591) data time 0.0008 (0.0037) model time 0.2536 (0.2554) loss 6.4791 (5.9653) grad_norm 1.7351 (2.0545) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:43:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][180/625] eta 0:01:55 lr 0.001096 wd 0.0500 time 0.2588 (0.2589) data time 0.0011 (0.0036) model time 0.2578 (0.2554) loss 6.7221 (5.9681) grad_norm 1.3995 (2.0587) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:43:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][190/625] eta 0:01:52 lr 0.001095 wd 0.0500 time 0.2597 (0.2587) data time 0.0008 (0.0034) model time 0.2589 (0.2553) loss 6.8855 (5.9611) grad_norm 1.8570 (2.0518) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:43:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][200/625] eta 0:01:49 lr 0.001095 wd 0.0500 time 0.2601 (0.2586) data time 0.0005 (0.0033) model time 0.2595 (0.2553) loss 5.0405 (5.9675) grad_norm 1.4621 (2.0476) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:43:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][210/625] eta 0:01:47 lr 0.001095 wd 0.0500 time 0.2541 (0.2585) data time 0.0007 (0.0032) model time 0.2534 (0.2554) loss 5.9169 (5.9681) grad_norm 2.6799 (2.0798) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:43:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][220/625] eta 0:01:44 lr 0.001095 wd 0.0500 time 0.2524 (0.2584) data time 0.0010 (0.0031) model time 0.2513 (0.2553) loss 5.9275 (5.9505) grad_norm 1.2733 (2.0603) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:43:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][230/625] eta 0:01:42 lr 0.001095 wd 0.0500 time 0.2515 (0.2583) data time 0.0009 (0.0030) model time 0.2506 (0.2553) loss 5.6453 (5.9393) grad_norm 3.3354 (2.0483) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:43:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][240/625] eta 0:01:39 lr 0.001094 wd 0.0500 time 0.2674 (0.2582) data time 0.0008 (0.0029) model time 0.2666 (0.2553) loss 6.0819 (5.9379) grad_norm 2.1084 (2.0502) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:43:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][250/625] eta 0:01:36 lr 0.001094 wd 0.0500 time 0.2596 (0.2581) data time 0.0009 (0.0028) model time 0.2588 (0.2553) loss 6.2084 (5.9507) grad_norm 2.2481 (2.1131) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:43:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][260/625] eta 0:01:34 lr 0.001094 wd 0.0500 time 0.2526 (0.2588) data time 0.0006 (0.0027) model time 0.2520 (0.2563) loss 5.5062 (5.9497) grad_norm 2.1903 (2.1353) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:43:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][270/625] eta 0:01:31 lr 0.001094 wd 0.0500 time 0.2705 (0.2588) data time 0.0009 (0.0027) model time 0.2696 (0.2563) loss 7.0332 (5.9412) grad_norm 3.7634 (2.1436) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:43:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][280/625] eta 0:01:29 lr 0.001094 wd 0.0500 time 0.2570 (0.2587) data time 0.0006 (0.0026) model time 0.2565 (0.2563) loss 6.9161 (5.9485) grad_norm 2.7476 (2.1433) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:43:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][290/625] eta 0:01:26 lr 0.001094 wd 0.0500 time 0.2587 (0.2586) data time 0.0009 (0.0026) model time 0.2578 (0.2562) loss 5.9246 (5.9559) grad_norm 1.2918 (2.1426) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:44:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][300/625] eta 0:01:24 lr 0.001093 wd 0.0500 time 0.2503 (0.2585) data time 0.0009 (0.0025) model time 0.2494 (0.2561) loss 6.2616 (5.9581) grad_norm 1.9332 (2.1461) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:44:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][310/625] eta 0:01:21 lr 0.001093 wd 0.0500 time 0.2556 (0.2585) data time 0.0008 (0.0024) model time 0.2548 (0.2562) loss 6.1942 (5.9755) grad_norm 1.8184 (2.1439) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:44:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][320/625] eta 0:01:18 lr 0.001093 wd 0.0500 time 0.2555 (0.2584) data time 0.0007 (0.0024) model time 0.2548 (0.2562) loss 5.2075 (5.9716) grad_norm 1.2274 (2.1337) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:44:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][330/625] eta 0:01:16 lr 0.001093 wd 0.0500 time 0.2586 (0.2584) data time 0.0009 (0.0024) model time 0.2577 (0.2562) loss 5.7956 (5.9775) grad_norm 1.5423 (2.1266) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:44:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][340/625] eta 0:01:13 lr 0.001093 wd 0.0500 time 0.2555 (0.2584) data time 0.0008 (0.0023) model time 0.2547 (0.2562) loss 6.1369 (5.9746) grad_norm 2.7416 (2.1246) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:44:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][350/625] eta 0:01:11 lr 0.001093 wd 0.0500 time 0.2580 (0.2583) data time 0.0006 (0.0023) model time 0.2574 (0.2562) loss 6.2288 (5.9703) grad_norm 1.8950 (2.1272) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:44:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][360/625] eta 0:01:08 lr 0.001092 wd 0.0500 time 0.2589 (0.2583) data time 0.0010 (0.0022) model time 0.2579 (0.2561) loss 7.2841 (5.9728) grad_norm 1.9468 (2.1206) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:44:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][370/625] eta 0:01:05 lr 0.001092 wd 0.0500 time 0.2609 (0.2583) data time 0.0008 (0.0022) model time 0.2601 (0.2562) loss 7.6057 (5.9773) grad_norm 1.7570 (2.1183) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:44:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][380/625] eta 0:01:03 lr 0.001092 wd 0.0500 time 0.2571 (0.2582) data time 0.0008 (0.0022) model time 0.2563 (0.2562) loss 7.0106 (5.9844) grad_norm 2.3833 (2.1151) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:44:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][390/625] eta 0:01:00 lr 0.001092 wd 0.0500 time 0.2571 (0.2582) data time 0.0008 (0.0022) model time 0.2563 (0.2561) loss 6.5766 (5.9888) grad_norm 1.8543 (2.1079) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:44:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][400/625] eta 0:00:58 lr 0.001092 wd 0.0500 time 0.2588 (0.2582) data time 0.0009 (0.0021) model time 0.2579 (0.2562) loss 6.4785 (5.9830) grad_norm 1.7614 (2.1016) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:44:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][410/625] eta 0:00:55 lr 0.001091 wd 0.0500 time 0.2557 (0.2581) data time 0.0010 (0.0021) model time 0.2547 (0.2561) loss 6.8302 (5.9851) grad_norm 1.5286 (2.0914) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:44:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][420/625] eta 0:00:52 lr 0.001091 wd 0.0500 time 0.2622 (0.2581) data time 0.0008 (0.0021) model time 0.2614 (0.2561) loss 5.5250 (5.9871) grad_norm 1.5488 (2.0856) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:44:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][430/625] eta 0:00:50 lr 0.001091 wd 0.0500 time 0.2556 (0.2581) data time 0.0010 (0.0020) model time 0.2546 (0.2561) loss 5.3064 (5.9866) grad_norm 1.3840 (2.0826) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:44:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][440/625] eta 0:00:47 lr 0.001091 wd 0.0500 time 0.2511 (0.2580) data time 0.0009 (0.0020) model time 0.2502 (0.2561) loss 6.5854 (5.9913) grad_norm 1.1985 (2.0923) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:44:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][450/625] eta 0:00:45 lr 0.001091 wd 0.0500 time 0.2590 (0.2580) data time 0.0019 (0.0020) model time 0.2571 (0.2560) loss 5.6123 (5.9848) grad_norm 2.2120 (2.0833) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:44:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][460/625] eta 0:00:42 lr 0.001091 wd 0.0500 time 0.2547 (0.2579) data time 0.0008 (0.0020) model time 0.2539 (0.2560) loss 6.6551 (5.9964) grad_norm 2.2080 (2.0763) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:44:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][470/625] eta 0:00:39 lr 0.001090 wd 0.0500 time 0.2573 (0.2579) data time 0.0010 (0.0020) model time 0.2563 (0.2560) loss 6.1307 (5.9960) grad_norm 1.6045 (2.0721) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:44:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][480/625] eta 0:00:37 lr 0.001090 wd 0.0500 time 0.2657 (0.2583) data time 0.0006 (0.0019) model time 0.2651 (0.2565) loss 4.8387 (5.9993) grad_norm 2.5695 (2.0760) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:44:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][490/625] eta 0:00:34 lr 0.001090 wd 0.0500 time 0.2573 (0.2583) data time 0.0008 (0.0019) model time 0.2564 (0.2565) loss 6.2254 (6.0059) grad_norm 1.3551 (2.0714) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:44:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][500/625] eta 0:00:32 lr 0.001090 wd 0.0500 time 0.2542 (0.2582) data time 0.0008 (0.0019) model time 0.2534 (0.2564) loss 6.2703 (6.0055) grad_norm 1.3066 (2.0687) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:44:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][510/625] eta 0:00:29 lr 0.001090 wd 0.0500 time 0.2563 (0.2590) data time 0.0008 (0.0019) model time 0.2554 (0.2573) loss 5.3690 (6.0091) grad_norm 1.8019 (2.0654) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:44:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][520/625] eta 0:00:27 lr 0.001090 wd 0.0500 time 0.2560 (0.2589) data time 0.0009 (0.0019) model time 0.2551 (0.2573) loss 6.5011 (6.0090) grad_norm 3.0630 (2.0806) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:45:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][530/625] eta 0:00:24 lr 0.001089 wd 0.0500 time 0.2572 (0.2589) data time 0.0017 (0.0018) model time 0.2555 (0.2572) loss 5.2646 (6.0049) grad_norm 2.0883 (2.0966) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:45:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][540/625] eta 0:00:22 lr 0.001089 wd 0.0500 time 0.2583 (0.2588) data time 0.0009 (0.0018) model time 0.2575 (0.2572) loss 5.7279 (6.0087) grad_norm 1.5438 (2.1070) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:45:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][550/625] eta 0:00:19 lr 0.001089 wd 0.0500 time 0.2547 (0.2588) data time 0.0007 (0.0018) model time 0.2539 (0.2571) loss 6.1185 (6.0094) grad_norm 3.3051 (2.1124) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:45:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][560/625] eta 0:00:16 lr 0.001089 wd 0.0500 time 0.2553 (0.2587) data time 0.0006 (0.0018) model time 0.2546 (0.2571) loss 5.3964 (6.0110) grad_norm 2.2703 (2.1148) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:45:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][570/625] eta 0:00:14 lr 0.001089 wd 0.0500 time 0.2536 (0.2587) data time 0.0009 (0.0018) model time 0.2527 (0.2571) loss 6.8431 (6.0131) grad_norm 1.7125 (2.1199) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:45:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][580/625] eta 0:00:11 lr 0.001088 wd 0.0500 time 0.2557 (0.2586) data time 0.0006 (0.0018) model time 0.2551 (0.2570) loss 6.6473 (6.0087) grad_norm 2.0444 (2.1258) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:45:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][590/625] eta 0:00:09 lr 0.001088 wd 0.0500 time 0.2563 (0.2586) data time 0.0010 (0.0018) model time 0.2553 (0.2570) loss 6.1520 (6.0033) grad_norm 1.4263 (2.1295) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:45:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][600/625] eta 0:00:06 lr 0.001088 wd 0.0500 time 0.2513 (0.2585) data time 0.0009 (0.0017) model time 0.2504 (0.2570) loss 6.6210 (6.0088) grad_norm 1.8265 (2.1259) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:45:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][610/625] eta 0:00:03 lr 0.001088 wd 0.0500 time 0.2522 (0.2585) data time 0.0004 (0.0017) model time 0.2518 (0.2569) loss 7.1749 (6.0114) grad_norm 1.5844 (2.1234) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:45:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][620/625] eta 0:00:01 lr 0.001088 wd 0.0500 time 0.2536 (0.2584) data time 0.0006 (0.0017) model time 0.2531 (0.2568) loss 6.6126 (6.0149) grad_norm 2.6554 (2.1203) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:45:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 152 training takes 0:02:41
[2024-08-04 04:45:25 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 04:45:25 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 04:45:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.531 (0.531) Loss 0.6123 (0.6123) Acc@1 88.672 (88.672) Acc@5 98.486 (98.486) Mem 9655MB
[2024-08-04 04:45:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.101) Loss 1.0059 (0.7864) Acc@1 78.662 (83.936) Acc@5 95.068 (97.128) Mem 9655MB
[2024-08-04 04:45:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.1738 (0.9297) Acc@1 74.316 (80.464) Acc@5 92.871 (95.482) Mem 9655MB
[2024-08-04 04:45:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.272 Acc@5 95.491
[2024-08-04 04:45:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.3%
[2024-08-04 04:45:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.805 (0.805) Loss 0.5908 (0.5908) Acc@1 89.062 (89.062) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 04:45:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.126) Loss 0.9409 (0.7306) Acc@1 79.297 (85.227) Acc@5 95.410 (97.434) Mem 9655MB
[2024-08-04 04:45:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0752 (0.8619) Acc@1 75.000 (81.706) Acc@5 93.945 (95.989) Mem 9655MB
[2024-08-04 04:45:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.420 Acc@5 95.981
[2024-08-04 04:45:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.4%
[2024-08-04 04:45:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.42%
[2024-08-04 04:45:29 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 04:45:30 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 04:45:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][0/625] eta 0:08:07 lr 0.001088 wd 0.0500 time 0.7806 (0.7806) data time 0.5320 (0.5320) model time 0.0000 (0.0000) loss 5.6615 (5.6615) grad_norm 2.7415 (2.7415) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:45:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][10/625] eta 0:03:06 lr 0.001087 wd 0.0500 time 0.2567 (0.3027) data time 0.0008 (0.0494) model time 0.0000 (0.0000) loss 5.8846 (6.2812) grad_norm 1.9310 (1.9080) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:45:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][20/625] eta 0:02:49 lr 0.001087 wd 0.0500 time 0.2541 (0.2802) data time 0.0010 (0.0263) model time 0.0000 (0.0000) loss 5.2764 (6.0082) grad_norm 2.1241 (1.8278) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:45:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][30/625] eta 0:02:42 lr 0.001087 wd 0.0500 time 0.2577 (0.2723) data time 0.0010 (0.0181) model time 0.0000 (0.0000) loss 6.2291 (6.0284) grad_norm 2.7546 (1.9947) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:45:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][40/625] eta 0:02:36 lr 0.001087 wd 0.0500 time 0.2573 (0.2683) data time 0.0011 (0.0139) model time 0.0000 (0.0000) loss 6.5582 (6.0395) grad_norm 2.1836 (2.0281) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:45:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][50/625] eta 0:02:32 lr 0.001087 wd 0.0500 time 0.2574 (0.2659) data time 0.0007 (0.0114) model time 0.0000 (0.0000) loss 6.5372 (6.0197) grad_norm 1.5604 (1.9284) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:45:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][60/625] eta 0:02:29 lr 0.001087 wd 0.0500 time 0.2575 (0.2644) data time 0.0007 (0.0097) model time 0.2568 (0.2558) loss 7.4637 (6.0429) grad_norm 1.4420 (1.9557) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:45:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][70/625] eta 0:02:26 lr 0.001086 wd 0.0500 time 0.2553 (0.2631) data time 0.0007 (0.0084) model time 0.2546 (0.2551) loss 4.7181 (6.0801) grad_norm 1.9325 (1.9325) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:45:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][80/625] eta 0:02:22 lr 0.001086 wd 0.0500 time 0.2545 (0.2623) data time 0.0008 (0.0075) model time 0.2537 (0.2553) loss 6.5855 (6.1131) grad_norm 2.3150 (1.9896) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:45:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][90/625] eta 0:02:19 lr 0.001086 wd 0.0500 time 0.2548 (0.2617) data time 0.0008 (0.0068) model time 0.2539 (0.2553) loss 5.2035 (6.0615) grad_norm 2.2164 (2.0337) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:45:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][100/625] eta 0:02:17 lr 0.001086 wd 0.0500 time 0.2576 (0.2612) data time 0.0009 (0.0062) model time 0.2567 (0.2555) loss 7.2949 (6.0805) grad_norm 1.5164 (2.0077) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:45:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][110/625] eta 0:02:15 lr 0.001086 wd 0.0500 time 0.2383 (0.2626) data time 0.0009 (0.0057) model time 0.2374 (0.2589) loss 5.8975 (6.0683) grad_norm 1.9678 (2.0004) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:46:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][120/625] eta 0:02:12 lr 0.001086 wd 0.0500 time 0.2699 (0.2622) data time 0.0012 (0.0053) model time 0.2686 (0.2586) loss 7.3074 (6.0641) grad_norm 2.3214 (1.9812) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:46:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][130/625] eta 0:02:09 lr 0.001085 wd 0.0500 time 0.2556 (0.2618) data time 0.0008 (0.0050) model time 0.2548 (0.2583) loss 5.1839 (6.0571) grad_norm 1.8514 (1.9781) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:46:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][140/625] eta 0:02:06 lr 0.001085 wd 0.0500 time 0.2599 (0.2614) data time 0.0009 (0.0047) model time 0.2590 (0.2579) loss 5.2765 (6.0521) grad_norm 2.4635 (1.9692) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:46:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][150/625] eta 0:02:03 lr 0.001085 wd 0.0500 time 0.2605 (0.2610) data time 0.0008 (0.0045) model time 0.2598 (0.2576) loss 5.4031 (6.0174) grad_norm 2.5740 (1.9581) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:46:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][160/625] eta 0:02:01 lr 0.001085 wd 0.0500 time 0.2542 (0.2606) data time 0.0009 (0.0043) model time 0.2533 (0.2572) loss 4.6532 (5.9940) grad_norm 2.1526 (1.9478) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:46:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][170/625] eta 0:01:58 lr 0.001085 wd 0.0500 time 0.2607 (0.2604) data time 0.0008 (0.0041) model time 0.2599 (0.2571) loss 6.3580 (6.0184) grad_norm 1.9449 (1.9538) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:46:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][180/625] eta 0:01:55 lr 0.001084 wd 0.0500 time 0.2600 (0.2602) data time 0.0006 (0.0039) model time 0.2593 (0.2570) loss 6.5274 (6.0358) grad_norm 2.9692 (1.9609) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:46:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][190/625] eta 0:01:53 lr 0.001084 wd 0.0500 time 0.2567 (0.2599) data time 0.0008 (0.0037) model time 0.2559 (0.2568) loss 5.2640 (6.0420) grad_norm 2.0002 (1.9541) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:46:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][200/625] eta 0:01:50 lr 0.001084 wd 0.0500 time 0.2547 (0.2597) data time 0.0008 (0.0036) model time 0.2539 (0.2567) loss 5.5624 (6.0421) grad_norm 2.2042 (1.9611) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:46:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][210/625] eta 0:01:47 lr 0.001084 wd 0.0500 time 0.2485 (0.2596) data time 0.0009 (0.0035) model time 0.2476 (0.2566) loss 5.7096 (6.0534) grad_norm 1.6999 (1.9545) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:46:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][220/625] eta 0:01:45 lr 0.001084 wd 0.0500 time 0.2563 (0.2594) data time 0.0011 (0.0034) model time 0.2551 (0.2565) loss 5.3321 (6.0440) grad_norm 2.0597 (1.9444) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:46:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][230/625] eta 0:01:42 lr 0.001084 wd 0.0500 time 0.2512 (0.2592) data time 0.0009 (0.0032) model time 0.2502 (0.2564) loss 5.4508 (6.0313) grad_norm 2.2146 (1.9393) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:46:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][240/625] eta 0:01:39 lr 0.001083 wd 0.0500 time 0.2557 (0.2591) data time 0.0008 (0.0031) model time 0.2549 (0.2564) loss 5.7376 (6.0136) grad_norm 2.0582 (1.9437) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:46:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][250/625] eta 0:01:37 lr 0.001083 wd 0.0500 time 0.2580 
(0.2590) data time 0.0008 (0.0031) model time 0.2572 (0.2563) loss 4.8746 (6.0261) grad_norm 2.8047 (1.9910) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:46:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][260/625] eta 0:01:34 lr 0.001083 wd 0.0500 time 0.2591 (0.2588) data time 0.0007 (0.0030) model time 0.2584 (0.2562) loss 6.4588 (6.0326) grad_norm 1.8676 (2.0154) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:46:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][270/625] eta 0:01:31 lr 0.001083 wd 0.0500 time 0.2557 (0.2588) data time 0.0008 (0.0029) model time 0.2549 (0.2562) loss 6.9825 (6.0395) grad_norm 1.9258 (2.0142) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:46:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][280/625] eta 0:01:29 lr 0.001083 wd 0.0500 time 0.2599 (0.2587) data time 0.0008 (0.0028) model time 0.2592 (0.2562) loss 6.5142 (6.0371) grad_norm 1.5693 (2.0101) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:46:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][290/625] eta 0:01:26 lr 0.001083 wd 0.0500 time 0.2556 (0.2586) data time 0.0007 (0.0028) model time 0.2549 (0.2561) loss 4.7283 (6.0377) grad_norm 1.6025 (2.0080) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:46:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][300/625] eta 0:01:24 lr 0.001082 wd 0.0500 time 0.2547 (0.2585) data time 0.0009 (0.0027) model time 0.2538 (0.2561) loss 5.0755 (6.0425) grad_norm 1.6250 (2.0001) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:46:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][310/625] eta 0:01:21 lr 0.001082 wd 0.0500 time 0.2536 (0.2584) data time 0.0009 (0.0026) model time 0.2527 (0.2560) loss 6.7919 (6.0568) grad_norm 1.4400 (1.9892) loss_scale 2048.0000 (2048.0000) mem 9655MB 
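The lr column is consistent with a per-step cosine schedule that has a linear warmup prefix. The sketch below uses the values from the config dump at the top of this log (BASE_LR 0.002, MIN_LR 2.0e-05, WARMUP_LR 2.0e-06, WARMUP_EPOCHS 20, EPOCHS 300, LR_SCHEDULER NAME cosine, WARMUP_PREFIX true) and the 625 steps per epoch shown in the records. It assumes two things: the cosine phase spans only the post-warmup steps, and the printed lr is evaluated at global step epoch * 625 + idx. Under those assumptions it gives 0.001087 at [153/300][60/625] and 0.001077 at [154/300][0/625], matching the log. cosine_lr is a hypothetical name, not the scheduler class the run actually used.

```python
import math

# Values copied from the config dump at the top of this log.
BASE_LR, MIN_LR, WARMUP_LR = 2e-3, 2e-5, 2e-6
EPOCHS, WARMUP_EPOCHS = 300, 20
STEPS_PER_EPOCH = 625  # read off the "[idx/625]" counters


def cosine_lr(step):
    """Learning rate at a global step (assumed warmup-prefix cosine schedule)."""
    warmup_steps = WARMUP_EPOCHS * STEPS_PER_EPOCH
    if step < warmup_steps:
        # linear ramp from WARMUP_LR up to BASE_LR
        return WARMUP_LR + (BASE_LR - WARMUP_LR) * step / warmup_steps
    # warmup prefix: the cosine phase starts after warmup and covers the remaining steps
    t = step - warmup_steps
    t_total = (EPOCHS - WARMUP_EPOCHS) * STEPS_PER_EPOCH
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1 + math.cos(math.pi * t / t_total))
```

For example, round(cosine_lr(153 * 625 + 60), 6) evaluates to 0.001087.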
[2024-08-04 04:46:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][320/625] eta 0:01:18 lr 0.001082 wd 0.0500 time 0.2557 (0.2584) data time 0.0007 (0.0026) model time 0.2550 (0.2560) loss 5.2423 (6.0475) grad_norm 1.9446 (1.9953) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:46:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][330/625] eta 0:01:16 lr 0.001082 wd 0.0500 time 0.2562 (0.2583) data time 0.0006 (0.0025) model time 0.2556 (0.2560) loss 5.9894 (6.0342) grad_norm 1.3807 (1.9938) loss_scale 4096.0000 (2097.4985) mem 9655MB [2024-08-04 04:46:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][340/625] eta 0:01:13 lr 0.001082 wd 0.0500 time 0.2570 (0.2582) data time 0.0008 (0.0025) model time 0.2561 (0.2560) loss 6.2305 (6.0449) grad_norm 1.6855 (1.9854) loss_scale 4096.0000 (2156.1056) mem 9655MB [2024-08-04 04:47:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][350/625] eta 0:01:10 lr 0.001081 wd 0.0500 time 0.2553 (0.2582) data time 0.0008 (0.0024) model time 0.2545 (0.2559) loss 4.9774 (6.0404) grad_norm 2.1229 (1.9921) loss_scale 4096.0000 (2211.3732) mem 9655MB [2024-08-04 04:47:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][360/625] eta 0:01:08 lr 0.001081 wd 0.0500 time 0.2606 (0.2581) data time 0.0007 (0.0024) model time 0.2599 (0.2559) loss 6.9381 (6.0425) grad_norm 2.1864 (1.9895) loss_scale 4096.0000 (2263.5789) mem 9655MB [2024-08-04 04:47:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][370/625] eta 0:01:05 lr 0.001081 wd 0.0500 time 0.2501 (0.2580) data time 0.0006 (0.0024) model time 0.2495 (0.2559) loss 6.7939 (6.0467) grad_norm 1.3019 (1.9828) loss_scale 4096.0000 (2312.9704) mem 9655MB [2024-08-04 04:47:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][380/625] eta 0:01:03 lr 0.001081 wd 0.0500 time 0.2524 
(0.2580) data time 0.0009 (0.0023) model time 0.2515 (0.2558) loss 5.8018 (6.0561) grad_norm 1.2985 (1.9798) loss_scale 4096.0000 (2359.7690) mem 9655MB [2024-08-04 04:47:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][390/625] eta 0:01:00 lr 0.001081 wd 0.0500 time 0.2596 (0.2579) data time 0.0006 (0.0023) model time 0.2590 (0.2558) loss 6.6527 (6.0571) grad_norm 1.9726 (1.9799) loss_scale 4096.0000 (2404.1739) mem 9655MB [2024-08-04 04:47:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][400/625] eta 0:00:58 lr 0.001081 wd 0.0500 time 0.2533 (0.2579) data time 0.0009 (0.0022) model time 0.2524 (0.2558) loss 5.1045 (6.0473) grad_norm 1.8853 (1.9760) loss_scale 4096.0000 (2446.3641) mem 9655MB [2024-08-04 04:47:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][410/625] eta 0:00:55 lr 0.001080 wd 0.0500 time 0.2554 (0.2578) data time 0.0011 (0.0022) model time 0.2543 (0.2557) loss 6.1912 (6.0438) grad_norm 1.6177 (1.9854) loss_scale 4096.0000 (2486.5012) mem 9655MB [2024-08-04 04:47:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][420/625] eta 0:00:52 lr 0.001080 wd 0.0500 time 0.2541 (0.2578) data time 0.0008 (0.0022) model time 0.2533 (0.2557) loss 6.0164 (6.0437) grad_norm 2.0595 (1.9870) loss_scale 4096.0000 (2524.7316) mem 9655MB [2024-08-04 04:47:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][430/625] eta 0:00:50 lr 0.001080 wd 0.0500 time 0.2595 (0.2578) data time 0.0010 (0.0022) model time 0.2585 (0.2558) loss 4.5228 (6.0366) grad_norm 1.2902 (1.9791) loss_scale 4096.0000 (2561.1879) mem 9655MB [2024-08-04 04:47:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][440/625] eta 0:00:47 lr 0.001080 wd 0.0500 time 0.2575 (0.2578) data time 0.0007 (0.0021) model time 0.2567 (0.2558) loss 6.1022 (6.0310) grad_norm 1.9528 (1.9697) loss_scale 4096.0000 (2595.9909) mem 9655MB 
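The number in parentheses after each field behaves like a running mean that resets at the start of each epoch. Read that way, the loss_scale column shows the dynamic loss scale doubling from 2048 to 4096 at step 323 of epoch 153. With 323 steps at 2048 and 8 at 4096, the average at step 330 is 2097.4985. With 18 steps at 4096, the average at step 340 is 2156.1056. Below is a minimal AverageMeter-style sketch that reproduces both averages. It is a hypothetical helper, modelled on the meters common in Swin-style training scripts.

```python
class AverageMeter:
    """Latest value plus the running mean since the last reset (hypothetical helper)."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.val, self.sum, self.count, self.avg = 0.0, 0.0, 0, 0.0

    def update(self, val, n=1):
        self.val = val          # printed outside the parentheses
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count  # printed inside the parentheses


meter = AverageMeter()
for step in range(331):  # steps 0..330 of epoch 153
    meter.update(2048.0 if step < 323 else 4096.0)
print(f"{meter.val:.4f} ({meter.avg:.4f})")
```

The print line mirrors the log's "loss_scale 4096.0000 (2097.4985)" formatting at [153/300][330/625].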
[2024-08-04 04:47:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][450/625] eta 0:00:45 lr 0.001080 wd 0.0500 time 0.2596 (0.2578) data time 0.0011 (0.0021) model time 0.2585 (0.2558) loss 6.2493 (6.0340) grad_norm 1.8090 (1.9727) loss_scale 4096.0000 (2629.2506) mem 9655MB [2024-08-04 04:47:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][460/625] eta 0:00:42 lr 0.001080 wd 0.0500 time 0.2560 (0.2578) data time 0.0008 (0.0021) model time 0.2552 (0.2558) loss 6.1586 (6.0366) grad_norm 1.1945 (1.9831) loss_scale 4096.0000 (2661.0672) mem 9655MB [2024-08-04 04:47:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][470/625] eta 0:00:39 lr 0.001079 wd 0.0500 time 0.2561 (0.2578) data time 0.0008 (0.0021) model time 0.2553 (0.2558) loss 5.1233 (6.0272) grad_norm 3.2255 (1.9813) loss_scale 4096.0000 (2691.5329) mem 9655MB [2024-08-04 04:47:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][480/625] eta 0:00:37 lr 0.001079 wd 0.0500 time 0.2579 (0.2577) data time 0.0010 (0.0020) model time 0.2568 (0.2558) loss 6.9443 (6.0244) grad_norm 3.2159 (1.9893) loss_scale 4096.0000 (2720.7318) mem 9655MB [2024-08-04 04:47:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][490/625] eta 0:00:34 lr 0.001079 wd 0.0500 time 0.2555 (0.2577) data time 0.0008 (0.0020) model time 0.2547 (0.2558) loss 7.1962 (6.0274) grad_norm 4.6223 (2.0052) loss_scale 4096.0000 (2748.7413) mem 9655MB [2024-08-04 04:47:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][500/625] eta 0:00:32 lr 0.001079 wd 0.0500 time 0.2513 (0.2577) data time 0.0008 (0.0020) model time 0.2505 (0.2558) loss 6.6687 (6.0284) grad_norm 1.3860 (2.0103) loss_scale 4096.0000 (2775.6327) mem 9655MB [2024-08-04 04:47:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][510/625] eta 0:00:29 lr 0.001079 wd 0.0500 time 0.2538 
(0.2577) data time 0.0011 (0.0020) model time 0.2527 (0.2558) loss 5.8760 (6.0309) grad_norm 2.7534 (2.0250) loss_scale 4096.0000 (2801.4716) mem 9655MB [2024-08-04 04:47:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][520/625] eta 0:00:27 lr 0.001078 wd 0.0500 time 0.2533 (0.2580) data time 0.0007 (0.0020) model time 0.2526 (0.2562) loss 4.6253 (6.0316) grad_norm 3.2569 (2.0260) loss_scale 4096.0000 (2826.3186) mem 9655MB [2024-08-04 04:47:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][530/625] eta 0:00:24 lr 0.001078 wd 0.0500 time 0.2516 (0.2580) data time 0.0009 (0.0019) model time 0.2506 (0.2562) loss 6.5093 (6.0386) grad_norm 1.5821 (2.0417) loss_scale 4096.0000 (2850.2298) mem 9655MB [2024-08-04 04:47:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][540/625] eta 0:00:21 lr 0.001078 wd 0.0500 time 0.2543 (0.2580) data time 0.0010 (0.0019) model time 0.2532 (0.2562) loss 6.0325 (6.0451) grad_norm 1.6531 (2.0441) loss_scale 4096.0000 (2873.2569) mem 9655MB [2024-08-04 04:47:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][550/625] eta 0:00:19 lr 0.001078 wd 0.0500 time 0.2519 (0.2580) data time 0.0010 (0.0019) model time 0.2509 (0.2562) loss 5.8624 (6.0393) grad_norm 2.4256 (2.0425) loss_scale 4096.0000 (2895.4483) mem 9655MB [2024-08-04 04:47:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][560/625] eta 0:00:16 lr 0.001078 wd 0.0500 time 0.2563 (0.2579) data time 0.0007 (0.0019) model time 0.2556 (0.2562) loss 5.6684 (6.0276) grad_norm 1.1242 (2.0368) loss_scale 4096.0000 (2916.8485) mem 9655MB [2024-08-04 04:47:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][570/625] eta 0:00:14 lr 0.001078 wd 0.0500 time 0.2574 (0.2579) data time 0.0007 (0.0019) model time 0.2567 (0.2562) loss 4.8912 (6.0214) grad_norm 1.2429 (2.0333) loss_scale 4096.0000 (2937.4991) mem 9655MB 
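The eta field is consistent with the running-mean batch time multiplied by the number of steps left in the epoch, truncated to whole seconds. At [153/300][60/625], for example, 0.2644 s * 565 remaining steps is about 149 s, which prints as 0:02:29. The function below is an illustrative sketch of that arithmetic; format_eta is a hypothetical name.

```python
import datetime


def format_eta(avg_batch_time, step, steps_per_epoch):
    """Estimated time left in the epoch, formatted like the log's eta field."""
    remaining_seconds = avg_batch_time * (steps_per_epoch - step)
    # truncate to whole seconds; timedelta prints as H:MM:SS
    return str(datetime.timedelta(seconds=int(remaining_seconds)))
```

The same arithmetic gives 625 * 0.2578 s, about 161 s. That is consistent with "EPOCH 153 training takes 0:02:41", although the epoch total is presumably measured by wall clock rather than derived from the mean.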
[2024-08-04 04:48:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][580/625] eta 0:00:11 lr 0.001077 wd 0.0500 time 0.2542 (0.2579) data time 0.0008 (0.0018) model time 0.2534 (0.2562) loss 5.9948 (6.0231) grad_norm 1.5123 (2.0285) loss_scale 4096.0000 (2957.4389) mem 9655MB [2024-08-04 04:48:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][590/625] eta 0:00:09 lr 0.001077 wd 0.0500 time 0.2560 (0.2579) data time 0.0009 (0.0018) model time 0.2551 (0.2562) loss 6.5619 (6.0291) grad_norm 1.9237 (2.0332) loss_scale 4096.0000 (2976.7039) mem 9655MB [2024-08-04 04:48:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][600/625] eta 0:00:06 lr 0.001077 wd 0.0500 time 0.2596 (0.2579) data time 0.0009 (0.0018) model time 0.2587 (0.2563) loss 5.6678 (6.0357) grad_norm 1.7102 (2.0355) loss_scale 4096.0000 (2995.3278) mem 9655MB [2024-08-04 04:48:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][610/625] eta 0:00:03 lr 0.001077 wd 0.0500 time 0.2514 (0.2579) data time 0.0004 (0.0018) model time 0.2511 (0.2562) loss 5.2968 (6.0359) grad_norm 1.4230 (2.0310) loss_scale 4096.0000 (3013.3421) mem 9655MB [2024-08-04 04:48:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][620/625] eta 0:00:01 lr 0.001077 wd 0.0500 time 0.2528 (0.2578) data time 0.0004 (0.0018) model time 0.2524 (0.2561) loss 5.3288 (6.0317) grad_norm 1.1672 (2.0301) loss_scale 4096.0000 (3030.7762) mem 9655MB [2024-08-04 04:48:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 153 training takes 0:02:41 [2024-08-04 04:48:11 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 04:48:11 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 04:48:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.551 (0.551) Loss 0.6729 (0.6729) Acc@1 88.574 (88.574) Acc@5 98.389 (98.389) Mem 9655MB [2024-08-04 04:48:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.102) Loss 1.0430 (0.8191) Acc@1 77.344 (84.371) Acc@5 95.215 (97.146) Mem 9655MB [2024-08-04 04:48:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.2012 (0.9582) Acc@1 73.438 (80.692) Acc@5 93.799 (95.508) Mem 9655MB [2024-08-04 04:48:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.444 Acc@5 95.485 [2024-08-04 04:48:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.4% [2024-08-04 04:48:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.44% [2024-08-04 04:48:13 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 04:48:14 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 04:48:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.590 (0.590) Loss 0.5908 (0.5908) Acc@1 89.111 (89.111) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 04:48:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.104) Loss 0.9409 (0.7309) Acc@1 79.443 (85.276) Acc@5 95.459 (97.456) Mem 9655MB [2024-08-04 04:48:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.081) Loss 1.0742 (0.8616) Acc@1 75.000 (81.727) Acc@5 93.994 (96.010) Mem 9655MB [2024-08-04 04:48:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.436 Acc@5 95.993 [2024-08-04 04:48:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.4% [2024-08-04 04:48:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.44% [2024-08-04 04:48:16 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 04:48:16 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
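Each evaluation runs twice. The first pass uses the raw weights (Acc@1 80.444, "New max accuracy", best_ckpt.pth). The second uses the EMA weights (Acc@1 81.436, "New max accuracy ema", best_ckpt_ema.pth). The run arguments at the top of the log set model_ema true and model_ema_decay 0.9999. Below is a minimal sketch that assumes a timm-style EMA update, applied here to plain Python floats instead of tensors, plus a separate best-accuracy tracker for each model. Both helpers are hypothetical.

```python
def ema_update(ema_params, params, decay=0.9999):
    """ema <- decay * ema + (1 - decay) * param, for every named parameter."""
    for name, value in params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params


class BestTracker:
    """Highest accuracy seen so far; keep one instance per evaluated model."""

    def __init__(self):
        self.best = float("-inf")

    def update(self, acc):
        # True means "new max", the point where a best checkpoint would be written.
        if acc > self.best:
            self.best = acc
            return True
        return False
```

With decay 0.9999, the EMA weights roughly average the last 1 / (1 - 0.9999) = 10,000 updates, which is about 16 epochs at 625 steps per epoch.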
[2024-08-04 04:48:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][0/625] eta 0:07:54 lr 0.001077 wd 0.0500 time 0.7595 (0.7595) data time 0.5186 (0.5186) model time 0.0000 (0.0000) loss 6.8214 (6.8214) grad_norm 2.1483 (2.1483) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:48:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][10/625] eta 0:03:12 lr 0.001076 wd 0.0500 time 0.3921 (0.3135) data time 0.0006 (0.0479) model time 0.0000 (0.0000) loss 6.3837 (5.9485) grad_norm 1.3488 (1.6265) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:48:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][20/625] eta 0:02:53 lr 0.001076 wd 0.0500 time 0.2541 (0.2863) data time 0.0008 (0.0255) model time 0.0000 (0.0000) loss 5.1010 (5.7760) grad_norm 2.3232 (1.6237) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:48:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][30/625] eta 0:02:44 lr 0.001076 wd 0.0500 time 0.2614 (0.2769) data time 0.0006 (0.0176) model time 0.0000 (0.0000) loss 5.6845 (5.8985) grad_norm 2.0110 (1.9229) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:48:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][40/625] eta 0:02:39 lr 0.001076 wd 0.0500 time 0.2586 (0.2719) data time 0.0007 (0.0135) model time 0.0000 (0.0000) loss 5.2667 (5.8230) grad_norm 1.2347 (1.8953) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:48:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][50/625] eta 0:02:34 lr 0.001076 wd 0.0500 time 0.2510 (0.2687) data time 0.0009 (0.0110) model time 0.0000 (0.0000) loss 5.3093 (5.8418) grad_norm 1.2452 (1.8210) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:48:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][60/625] eta 0:02:30 lr 0.001076 wd 0.0500 time 0.2606 (0.2668) 
data time 0.0006 (0.0094) model time 0.2600 (0.2560) loss 4.7899 (5.7786) grad_norm 1.6203 (1.8146) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:48:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][70/625] eta 0:02:27 lr 0.001075 wd 0.0500 time 0.2551 (0.2652) data time 0.0007 (0.0082) model time 0.2544 (0.2554) loss 4.9769 (5.8232) grad_norm 1.7796 (1.8385) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:48:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][80/625] eta 0:02:23 lr 0.001075 wd 0.0500 time 0.2559 (0.2641) data time 0.0007 (0.0073) model time 0.2553 (0.2552) loss 4.5428 (5.7910) grad_norm 1.3348 (1.7961) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:48:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][90/625] eta 0:02:20 lr 0.001075 wd 0.0500 time 0.2556 (0.2632) data time 0.0009 (0.0066) model time 0.2547 (0.2552) loss 6.9210 (5.8293) grad_norm 2.6693 (1.9037) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:48:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][100/625] eta 0:02:17 lr 0.001075 wd 0.0500 time 0.2576 (0.2627) data time 0.0010 (0.0064) model time 0.2566 (0.2550) loss 5.5885 (5.8237) grad_norm 1.6823 (1.9797) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:48:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][110/625] eta 0:02:15 lr 0.001075 wd 0.0500 time 0.2568 (0.2638) data time 0.0007 (0.0059) model time 0.2560 (0.2580) loss 6.1422 (5.8297) grad_norm 2.4699 (1.9692) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:48:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][120/625] eta 0:02:12 lr 0.001074 wd 0.0500 time 0.2585 (0.2631) data time 0.0010 (0.0055) model time 0.2575 (0.2576) loss 6.1453 (5.8213) grad_norm 2.0457 (1.9645) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 
04:48:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][130/625] eta 0:02:09 lr 0.001074 wd 0.0500 time 0.2553 (0.2625) data time 0.0007 (0.0051) model time 0.2546 (0.2572) loss 6.0541 (5.8062) grad_norm 2.9505 (1.9612) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:48:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][140/625] eta 0:02:07 lr 0.001074 wd 0.0500 time 0.2566 (0.2622) data time 0.0010 (0.0048) model time 0.2556 (0.2571) loss 6.9870 (5.8285) grad_norm 2.0168 (1.9717) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:48:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][150/625] eta 0:02:04 lr 0.001074 wd 0.0500 time 0.2551 (0.2617) data time 0.0007 (0.0046) model time 0.2544 (0.2568) loss 6.8881 (5.8447) grad_norm 1.5487 (1.9863) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:48:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][160/625] eta 0:02:01 lr 0.001074 wd 0.0500 time 0.2557 (0.2613) data time 0.0014 (0.0043) model time 0.2543 (0.2567) loss 7.0827 (5.8517) grad_norm 2.2523 (1.9743) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:49:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][170/625] eta 0:01:59 lr 0.001074 wd 0.0500 time 0.2623 (0.2623) data time 0.0010 (0.0041) model time 0.2612 (0.2583) loss 5.9557 (5.8694) grad_norm 3.0494 (2.0078) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:49:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][180/625] eta 0:01:56 lr 0.001073 wd 0.0500 time 0.2600 (0.2620) data time 0.0009 (0.0040) model time 0.2591 (0.2581) loss 5.9969 (5.8538) grad_norm 2.6824 (2.0039) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:49:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][190/625] eta 0:01:53 lr 0.001073 wd 0.0500 time 0.2555 (0.2617) data 
time 0.0007 (0.0038) model time 0.2548 (0.2580) loss 5.8614 (5.8493) grad_norm 2.3561 (2.0007) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:49:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][200/625] eta 0:01:51 lr 0.001073 wd 0.0500 time 0.2605 (0.2615) data time 0.0007 (0.0037) model time 0.2597 (0.2578) loss 5.8319 (5.8571) grad_norm 4.8939 (2.0128) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:49:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][210/625] eta 0:01:48 lr 0.001073 wd 0.0500 time 0.2584 (0.2622) data time 0.0012 (0.0035) model time 0.2572 (0.2589) loss 6.7860 (5.8834) grad_norm 3.1026 (2.0162) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:49:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][220/625] eta 0:01:46 lr 0.001073 wd 0.0500 time 0.2533 (0.2618) data time 0.0008 (0.0034) model time 0.2525 (0.2586) loss 5.1831 (5.8921) grad_norm 1.6515 (2.0225) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:49:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][230/625] eta 0:01:43 lr 0.001073 wd 0.0500 time 0.2537 (0.2616) data time 0.0011 (0.0033) model time 0.2526 (0.2585) loss 6.1099 (5.8812) grad_norm 1.9279 (2.0407) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:49:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][240/625] eta 0:01:40 lr 0.001072 wd 0.0500 time 0.2548 (0.2615) data time 0.0010 (0.0032) model time 0.2539 (0.2584) loss 5.2204 (5.8746) grad_norm 2.1338 (2.0421) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:49:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][250/625] eta 0:01:37 lr 0.001072 wd 0.0500 time 0.2536 (0.2613) data time 0.0008 (0.0031) model time 0.2528 (0.2582) loss 5.6618 (5.8724) grad_norm 2.0169 (2.0603) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 
04:49:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][260/625] eta 0:01:35 lr 0.001072 wd 0.0500 time 0.2546 (0.2611) data time 0.0010 (0.0030) model time 0.2536 (0.2581) loss 6.5874 (5.8827) grad_norm 1.6807 (2.0574) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:49:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][270/625] eta 0:01:32 lr 0.001072 wd 0.0500 time 0.2551 (0.2609) data time 0.0007 (0.0030) model time 0.2544 (0.2579) loss 6.1117 (5.8820) grad_norm 2.2502 (2.0581) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:49:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][280/625] eta 0:01:29 lr 0.001072 wd 0.0500 time 0.2567 (0.2607) data time 0.0008 (0.0029) model time 0.2559 (0.2578) loss 6.7371 (5.8928) grad_norm 1.6735 (2.0568) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:49:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][290/625] eta 0:01:27 lr 0.001071 wd 0.0500 time 0.2591 (0.2605) data time 0.0006 (0.0028) model time 0.2585 (0.2577) loss 6.9670 (5.8966) grad_norm 2.7537 (2.0598) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:49:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][300/625] eta 0:01:24 lr 0.001071 wd 0.0500 time 0.2541 (0.2604) data time 0.0008 (0.0028) model time 0.2533 (0.2576) loss 5.9262 (5.9086) grad_norm 1.8390 (2.0474) loss_scale 4096.0000 (4096.0000) mem 9655MB [2024-08-04 04:49:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][310/625] eta 0:01:21 lr 0.001071 wd 0.0500 time 0.2525 (0.2602) data time 0.0006 (0.0027) model time 0.2518 (0.2575) loss 6.8420 (5.9099) grad_norm 1.8866 (inf) loss_scale 2048.0000 (4049.9035) mem 9655MB [2024-08-04 04:49:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][320/625] eta 0:01:19 lr 0.001071 wd 0.0500 time 0.2549 (0.2601) data time 
0.0008 (0.0027) model time 0.2541 (0.2574) loss 6.4093 (5.9235) grad_norm 1.3378 (inf) loss_scale 2048.0000 (3987.5389) mem 9655MB [2024-08-04 04:49:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][330/625] eta 0:01:16 lr 0.001071 wd 0.0500 time 0.2581 (0.2599) data time 0.0010 (0.0026) model time 0.2571 (0.2573) loss 6.3559 (5.9248) grad_norm 1.2899 (inf) loss_scale 2048.0000 (3928.9426) mem 9655MB [2024-08-04 04:49:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][340/625] eta 0:01:14 lr 0.001071 wd 0.0500 time 0.2573 (0.2598) data time 0.0009 (0.0026) model time 0.2564 (0.2572) loss 5.7337 (5.9273) grad_norm 2.1230 (inf) loss_scale 2048.0000 (3873.7830) mem 9655MB [2024-08-04 04:49:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][350/625] eta 0:01:11 lr 0.001070 wd 0.0500 time 0.2507 (0.2597) data time 0.0007 (0.0025) model time 0.2500 (0.2570) loss 4.8099 (5.9220) grad_norm 1.7327 (inf) loss_scale 2048.0000 (3821.7664) mem 9655MB [2024-08-04 04:49:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][360/625] eta 0:01:08 lr 0.001070 wd 0.0500 time 0.2560 (0.2596) data time 0.0010 (0.0025) model time 0.2549 (0.2570) loss 5.8560 (5.9179) grad_norm 1.7291 (inf) loss_scale 2048.0000 (3772.6316) mem 9655MB [2024-08-04 04:49:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][370/625] eta 0:01:06 lr 0.001070 wd 0.0500 time 0.2562 (0.2595) data time 0.0010 (0.0024) model time 0.2552 (0.2569) loss 6.3517 (5.9107) grad_norm 1.2130 (inf) loss_scale 2048.0000 (3726.1456) mem 9655MB [2024-08-04 04:49:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][380/625] eta 0:01:03 lr 0.001070 wd 0.0500 time 0.2531 (0.2594) data time 0.0009 (0.0024) model time 0.2521 (0.2569) loss 6.0834 (5.9154) grad_norm 1.7882 (inf) loss_scale 2048.0000 (3682.0997) mem 9655MB [2024-08-04 04:49:58 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][390/625] eta 0:01:00 lr 0.001070 wd 0.0500 time 0.2576 (0.2593) data time 0.0005 (0.0023) model time 0.2570 (0.2569) loss 6.7963 (5.9204) grad_norm 1.2832 (inf) loss_scale 2048.0000 (3640.3069) mem 9655MB [2024-08-04 04:50:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][400/625] eta 0:00:58 lr 0.001070 wd 0.0500 time 0.2553 (0.2592) data time 0.0006 (0.0023) model time 0.2547 (0.2568) loss 4.9690 (5.9199) grad_norm 1.8514 (inf) loss_scale 2048.0000 (3600.5985) mem 9655MB [2024-08-04 04:50:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][410/625] eta 0:00:55 lr 0.001069 wd 0.0500 time 0.2579 (0.2591) data time 0.0007 (0.0023) model time 0.2571 (0.2568) loss 5.6825 (5.9270) grad_norm 2.7980 (inf) loss_scale 2048.0000 (3562.8224) mem 9655MB [2024-08-04 04:50:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][420/625] eta 0:00:53 lr 0.001069 wd 0.0500 time 0.2594 (0.2591) data time 0.0005 (0.0022) model time 0.2588 (0.2567) loss 6.2250 (5.9283) grad_norm 1.3917 (inf) loss_scale 2048.0000 (3526.8409) mem 9655MB [2024-08-04 04:50:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][430/625] eta 0:00:50 lr 0.001069 wd 0.0500 time 0.2534 (0.2590) data time 0.0006 (0.0022) model time 0.2528 (0.2567) loss 5.5195 (5.9295) grad_norm 3.2585 (inf) loss_scale 2048.0000 (3492.5290) mem 9655MB [2024-08-04 04:50:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][440/625] eta 0:00:47 lr 0.001069 wd 0.0500 time 0.2539 (0.2590) data time 0.0008 (0.0022) model time 0.2531 (0.2567) loss 4.9002 (5.9295) grad_norm 1.4359 (inf) loss_scale 2048.0000 (3459.7732) mem 9655MB [2024-08-04 04:50:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][450/625] eta 0:00:45 lr 0.001069 wd 0.0500 time 0.2611 (0.2590) data time 0.0007 (0.0022) model 
time 0.2604 (0.2567) loss 4.8880 (5.9349) grad_norm 1.8191 (inf) loss_scale 2048.0000 (3428.4701) mem 9655MB [2024-08-04 04:50:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][460/625] eta 0:00:42 lr 0.001068 wd 0.0500 time 0.2615 (0.2589) data time 0.0013 (0.0021) model time 0.2602 (0.2566) loss 6.1113 (5.9292) grad_norm 1.6677 (inf) loss_scale 2048.0000 (3398.5249) mem 9655MB [2024-08-04 04:50:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][470/625] eta 0:00:40 lr 0.001068 wd 0.0500 time 0.2526 (0.2588) data time 0.0009 (0.0021) model time 0.2517 (0.2566) loss 4.8512 (5.9259) grad_norm 2.6834 (inf) loss_scale 2048.0000 (3369.8514) mem 9655MB [2024-08-04 04:50:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][480/625] eta 0:00:37 lr 0.001068 wd 0.0500 time 0.2587 (0.2588) data time 0.0007 (0.0021) model time 0.2580 (0.2566) loss 4.5558 (5.9284) grad_norm 1.6528 (inf) loss_scale 2048.0000 (3342.3701) mem 9655MB [2024-08-04 04:50:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][490/625] eta 0:00:34 lr 0.001068 wd 0.0500 time 0.2559 (0.2588) data time 0.0009 (0.0021) model time 0.2550 (0.2566) loss 5.6986 (5.9251) grad_norm 2.2249 (inf) loss_scale 2048.0000 (3316.0081) mem 9655MB [2024-08-04 04:50:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][500/625] eta 0:00:32 lr 0.001068 wd 0.0500 time 0.2583 (0.2587) data time 0.0008 (0.0020) model time 0.2576 (0.2566) loss 6.6769 (5.9244) grad_norm 2.4579 (inf) loss_scale 2048.0000 (3290.6986) mem 9655MB [2024-08-04 04:50:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][510/625] eta 0:00:29 lr 0.001068 wd 0.0500 time 0.2512 (0.2587) data time 0.0008 (0.0020) model time 0.2505 (0.2565) loss 4.4838 (5.9278) grad_norm 2.6272 (inf) loss_scale 2048.0000 (3266.3796) mem 9655MB [2024-08-04 04:50:31 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [154/300][520/625] eta 0:00:27 lr 0.001067 wd 0.0500 time 0.2611 (0.2586) data time 0.0009 (0.0020) model time 0.2602 (0.2565) loss 5.7562 (5.9356) grad_norm 1.8274 (inf) loss_scale 2048.0000 (3242.9942) mem 9655MB [2024-08-04 04:50:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][530/625] eta 0:00:24 lr 0.001067 wd 0.0500 time 0.2533 (0.2586) data time 0.0010 (0.0020) model time 0.2523 (0.2565) loss 6.9152 (5.9431) grad_norm 2.6924 (inf) loss_scale 2048.0000 (3220.4896) mem 9655MB [2024-08-04 04:50:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][540/625] eta 0:00:21 lr 0.001067 wd 0.0500 time 0.2564 (0.2585) data time 0.0010 (0.0020) model time 0.2555 (0.2564) loss 5.4885 (5.9494) grad_norm 2.0394 (inf) loss_scale 2048.0000 (3198.8170) mem 9655MB [2024-08-04 04:50:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][550/625] eta 0:00:19 lr 0.001067 wd 0.0500 time 0.2589 (0.2587) data time 0.0006 (0.0019) model time 0.2582 (0.2567) loss 6.2849 (5.9474) grad_norm 1.8216 (inf) loss_scale 2048.0000 (3177.9310) mem 9655MB [2024-08-04 04:50:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][560/625] eta 0:00:16 lr 0.001067 wd 0.0500 time 0.2572 (0.2587) data time 0.0008 (0.0019) model time 0.2564 (0.2567) loss 5.5365 (5.9522) grad_norm 1.7817 (inf) loss_scale 2048.0000 (3157.7897) mem 9655MB [2024-08-04 04:50:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][570/625] eta 0:00:14 lr 0.001066 wd 0.0500 time 0.2591 (0.2587) data time 0.0006 (0.0019) model time 0.2585 (0.2566) loss 7.0377 (5.9533) grad_norm 1.2728 (inf) loss_scale 2048.0000 (3138.3538) mem 9655MB [2024-08-04 04:50:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][580/625] eta 0:00:11 lr 0.001066 wd 0.0500 time 0.2679 (0.2586) data time 0.0007 (0.0019) model time 0.2671 (0.2567) loss 
6.7736 (5.9545) grad_norm 2.2346 (inf) loss_scale 2048.0000 (3119.5869) mem 9655MB [2024-08-04 04:50:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][590/625] eta 0:00:09 lr 0.001066 wd 0.0500 time 0.2564 (0.2586) data time 0.0014 (0.0019) model time 0.2550 (0.2566) loss 5.3403 (5.9524) grad_norm 1.2605 (inf) loss_scale 2048.0000 (3101.4552) mem 9655MB [2024-08-04 04:50:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][600/625] eta 0:00:06 lr 0.001066 wd 0.0500 time 0.2592 (0.2586) data time 0.0007 (0.0019) model time 0.2585 (0.2566) loss 7.1010 (5.9556) grad_norm 2.0231 (inf) loss_scale 2048.0000 (3083.9268) mem 9655MB [2024-08-04 04:50:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][610/625] eta 0:00:03 lr 0.001066 wd 0.0500 time 0.2524 (0.2585) data time 0.0006 (0.0018) model time 0.2518 (0.2566) loss 7.1449 (5.9605) grad_norm 2.9368 (inf) loss_scale 2048.0000 (3066.9722) mem 9655MB [2024-08-04 04:50:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][620/625] eta 0:00:01 lr 0.001066 wd 0.0500 time 0.2537 (0.2585) data time 0.0006 (0.0018) model time 0.2530 (0.2565) loss 6.9758 (5.9697) grad_norm 1.4111 (inf) loss_scale 2048.0000 (3050.5636) mem 9655MB [2024-08-04 04:50:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 154 training takes 0:02:41 [2024-08-04 04:50:58 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 04:50:58 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 04:50:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.466 (0.466) Loss 0.6787 (0.6787) Acc@1 87.500 (87.500) Acc@5 97.998 (97.998) Mem 9655MB [2024-08-04 04:50:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.093) Loss 1.0234 (0.8117) Acc@1 78.760 (84.051) Acc@5 94.971 (97.075) Mem 9655MB [2024-08-04 04:51:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.075) Loss 1.1660 (0.9531) Acc@1 74.609 (80.448) Acc@5 93.506 (95.361) Mem 9655MB [2024-08-04 04:51:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.152 Acc@5 95.363 [2024-08-04 04:51:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.2% [2024-08-04 04:51:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.766 (0.766) Loss 0.5913 (0.5913) Acc@1 89.111 (89.111) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 04:51:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.128) Loss 0.9404 (0.7310) Acc@1 79.346 (85.294) Acc@5 95.410 (97.474) Mem 9655MB [2024-08-04 04:51:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 1.0742 (0.8616) Acc@1 75.000 (81.745) Acc@5 93.994 (96.031) Mem 9655MB [2024-08-04 04:51:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.448 Acc@5 96.009 [2024-08-04 04:51:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.4% [2024-08-04 04:51:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.45% [2024-08-04 04:51:02 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 04:51:03 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 04:51:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][0/625] eta 0:07:36 lr 0.001066 wd 0.0500 time 0.7297 (0.7297) data time 0.4877 (0.4877) model time 0.0000 (0.0000) loss 5.9878 (5.9878) grad_norm 1.4670 (1.4670) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:51:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][10/625] eta 0:03:03 lr 0.001065 wd 0.0500 time 0.2522 (0.2978) data time 0.0009 (0.0452) model time 0.0000 (0.0000) loss 5.0813 (5.9375) grad_norm 1.5225 (1.7309) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:51:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][20/625] eta 0:02:48 lr 0.001065 wd 0.0500 time 0.2558 (0.2779) data time 0.0008 (0.0241) model time 0.0000 (0.0000) loss 7.3198 (5.8705) grad_norm 1.5040 (2.0494) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:51:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][30/625] eta 0:02:41 lr 0.001065 wd 0.0500 time 0.2538 (0.2708) data time 0.0007 (0.0166) model time 0.0000 (0.0000) loss 5.8294 (5.8788) grad_norm 1.4281 (2.0305) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:51:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][40/625] eta 0:02:36 lr 0.001065 wd 0.0500 time 0.2575 (0.2675) data time 0.0007 (0.0128) model time 0.0000 (0.0000) loss 4.3769 (5.9313) grad_norm 2.4248 (1.9830) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:51:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][50/625] eta 0:02:34 lr 0.001065 wd 0.0500 time 0.2534 (0.2686) data time 0.0018 (0.0105) model time 0.0000 (0.0000) loss 6.7483 (5.9792) grad_norm 2.3585 (1.9642) loss_scale 2048.0000 (2048.0000) mem 9655MB 
[2024-08-04 04:51:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][60/625] eta 0:02:30 lr 0.001064 wd 0.0500 time 0.2591 (0.2665) data time 0.0006 (0.0089) model time 0.2585 (0.2550) loss 4.2431 (6.0361) grad_norm 1.6601 (2.0119) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:51:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][70/625] eta 0:02:27 lr 0.001064 wd 0.0500 time 0.2586 (0.2652) data time 0.0006 (0.0078) model time 0.2580 (0.2555) loss 5.9778 (6.0483) grad_norm 1.3528 (2.0951) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:51:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][80/625] eta 0:02:23 lr 0.001064 wd 0.0500 time 0.2542 (0.2640) data time 0.0010 (0.0070) model time 0.2533 (0.2554) loss 5.2789 (6.0308) grad_norm 1.6380 (2.1195) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:51:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][90/625] eta 0:02:20 lr 0.001064 wd 0.0500 time 0.2557 (0.2631) data time 0.0006 (0.0063) model time 0.2552 (0.2551) loss 4.7886 (6.0221) grad_norm 2.5888 (2.1180) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:51:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][100/625] eta 0:02:17 lr 0.001064 wd 0.0500 time 0.2585 (0.2624) data time 0.0007 (0.0058) model time 0.2578 (0.2552) loss 6.6528 (6.0775) grad_norm 1.7344 (2.1006) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:51:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][110/625] eta 0:02:14 lr 0.001064 wd 0.0500 time 0.2530 (0.2618) data time 0.0009 (0.0054) model time 0.2521 (0.2550) loss 6.5789 (6.0968) grad_norm 1.7527 (2.0763) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:51:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][120/625] eta 0:02:11 lr 0.001063 wd 0.0500 time 0.2593 
(0.2613) data time 0.0010 (0.0050) model time 0.2583 (0.2551) loss 5.5996 (6.0529) grad_norm 1.5450 (2.0428) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:51:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][130/625] eta 0:02:09 lr 0.001063 wd 0.0500 time 0.2572 (0.2611) data time 0.0009 (0.0047) model time 0.2563 (0.2553) loss 6.8181 (6.0367) grad_norm 1.3932 (2.0263) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:51:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][140/625] eta 0:02:06 lr 0.001063 wd 0.0500 time 0.2613 (0.2607) data time 0.0008 (0.0044) model time 0.2605 (0.2553) loss 7.5807 (6.0218) grad_norm 1.6227 (2.0787) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:51:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][150/625] eta 0:02:03 lr 0.001063 wd 0.0500 time 0.2536 (0.2604) data time 0.0010 (0.0042) model time 0.2526 (0.2552) loss 6.2425 (6.0190) grad_norm 3.1996 (2.0926) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:51:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][160/625] eta 0:02:00 lr 0.001063 wd 0.0500 time 0.2591 (0.2601) data time 0.0009 (0.0040) model time 0.2582 (0.2552) loss 6.0452 (6.0187) grad_norm 2.2290 (2.0938) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:51:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][170/625] eta 0:01:58 lr 0.001062 wd 0.0500 time 0.2532 (0.2599) data time 0.0008 (0.0038) model time 0.2525 (0.2553) loss 6.9908 (6.0166) grad_norm 2.6103 (2.0932) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:51:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][180/625] eta 0:01:55 lr 0.001062 wd 0.0500 time 0.2503 (0.2597) data time 0.0010 (0.0037) model time 0.2493 (0.2552) loss 6.5261 (6.0095) grad_norm 1.8348 (2.0566) loss_scale 2048.0000 (2048.0000) mem 9655MB 
[2024-08-04 04:51:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][190/625] eta 0:01:52 lr 0.001062 wd 0.0500 time 0.2500 (0.2594) data time 0.0009 (0.0035) model time 0.2491 (0.2552) loss 5.0619 (5.9966) grad_norm 3.4682 (2.0727) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:51:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][200/625] eta 0:01:50 lr 0.001062 wd 0.0500 time 0.2550 (0.2592) data time 0.0008 (0.0034) model time 0.2542 (0.2551) loss 7.2500 (6.0063) grad_norm 1.8282 (2.0712) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:51:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][210/625] eta 0:01:47 lr 0.001062 wd 0.0500 time 0.2598 (0.2591) data time 0.0008 (0.0033) model time 0.2590 (0.2551) loss 6.2077 (6.0093) grad_norm 1.8666 (2.0626) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:52:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][220/625] eta 0:01:44 lr 0.001062 wd 0.0500 time 0.2571 (0.2590) data time 0.0006 (0.0032) model time 0.2565 (0.2551) loss 4.8757 (5.9941) grad_norm 1.6025 (2.0735) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:52:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][230/625] eta 0:01:42 lr 0.001061 wd 0.0500 time 0.2543 (0.2588) data time 0.0009 (0.0031) model time 0.2535 (0.2551) loss 6.4923 (6.0005) grad_norm 2.3337 (2.0599) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:52:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][240/625] eta 0:01:39 lr 0.001061 wd 0.0500 time 0.2550 (0.2587) data time 0.0010 (0.0030) model time 0.2540 (0.2550) loss 5.8139 (5.9998) grad_norm 1.5718 (2.0414) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:52:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][250/625] eta 0:01:36 lr 0.001061 wd 0.0500 time 0.2557 
(0.2586) data time 0.0009 (0.0029) model time 0.2548 (0.2551) loss 6.1502 (6.0003) grad_norm 1.4755 (2.0433) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:52:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][260/625] eta 0:01:34 lr 0.001061 wd 0.0500 time 0.2618 (0.2585) data time 0.0008 (0.0028) model time 0.2609 (0.2551) loss 6.1197 (5.9986) grad_norm 4.5499 (2.0747) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:52:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][270/625] eta 0:01:31 lr 0.001061 wd 0.0500 time 0.2542 (0.2584) data time 0.0008 (0.0027) model time 0.2534 (0.2551) loss 6.6391 (5.9909) grad_norm 2.0596 (2.0763) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:52:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][280/625] eta 0:01:29 lr 0.001061 wd 0.0500 time 0.2545 (0.2584) data time 0.0011 (0.0027) model time 0.2534 (0.2552) loss 7.3065 (6.0097) grad_norm 1.9511 (2.0656) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:52:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][290/625] eta 0:01:26 lr 0.001060 wd 0.0500 time 0.2507 (0.2583) data time 0.0009 (0.0026) model time 0.2498 (0.2551) loss 5.4595 (6.0198) grad_norm 1.3547 (2.0455) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:52:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][300/625] eta 0:01:23 lr 0.001060 wd 0.0500 time 0.2560 (0.2582) data time 0.0015 (0.0026) model time 0.2545 (0.2551) loss 5.7996 (6.0167) grad_norm 1.6791 (2.0471) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:52:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][310/625] eta 0:01:21 lr 0.001060 wd 0.0500 time 0.2581 (0.2581) data time 0.0008 (0.0025) model time 0.2573 (0.2551) loss 6.2802 (6.0256) grad_norm 2.7709 (2.0505) loss_scale 2048.0000 (2048.0000) mem 9655MB 
[2024-08-04 04:52:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][320/625] eta 0:01:18 lr 0.001060 wd 0.0500 time 0.2544 (0.2580) data time 0.0007 (0.0025) model time 0.2536 (0.2551) loss 6.4646 (6.0265) grad_norm 2.4726 (2.0451) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:52:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][330/625] eta 0:01:16 lr 0.001060 wd 0.0500 time 0.4556 (0.2586) data time 0.0010 (0.0024) model time 0.4546 (0.2558) loss 5.9379 (6.0184) grad_norm 1.3799 (2.0405) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:52:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][340/625] eta 0:01:13 lr 0.001059 wd 0.0500 time 0.2586 (0.2586) data time 0.0005 (0.0024) model time 0.2581 (0.2559) loss 7.5134 (6.0327) grad_norm 1.1617 (2.0235) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:52:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][350/625] eta 0:01:11 lr 0.001059 wd 0.0500 time 0.2561 (0.2592) data time 0.0008 (0.0023) model time 0.2553 (0.2566) loss 5.0493 (6.0327) grad_norm 2.1022 (2.0196) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:52:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][360/625] eta 0:01:08 lr 0.001059 wd 0.0500 time 0.2561 (0.2591) data time 0.0011 (0.0023) model time 0.2551 (0.2566) loss 6.9233 (6.0332) grad_norm 1.6102 (2.0202) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:52:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][370/625] eta 0:01:06 lr 0.001059 wd 0.0500 time 0.2546 (0.2590) data time 0.0012 (0.0023) model time 0.2534 (0.2565) loss 4.7831 (6.0380) grad_norm 1.9885 (2.0077) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:52:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][380/625] eta 0:01:03 lr 0.001059 wd 0.0500 time 0.2593 
(0.2589) data time 0.0006 (0.0022) model time 0.2587 (0.2565) loss 6.8098 (6.0511) grad_norm 2.1645 (2.0061) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:52:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][390/625] eta 0:01:00 lr 0.001059 wd 0.0500 time 0.2576 (0.2588) data time 0.0008 (0.0022) model time 0.2568 (0.2565) loss 5.5517 (6.0516) grad_norm 2.3715 (2.0081) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:52:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][400/625] eta 0:00:58 lr 0.001058 wd 0.0500 time 0.2539 (0.2588) data time 0.0010 (0.0022) model time 0.2529 (0.2564) loss 6.2635 (6.0660) grad_norm 2.4989 (2.0061) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:52:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][410/625] eta 0:00:55 lr 0.001058 wd 0.0500 time 0.4703 (0.2592) data time 0.0007 (0.0021) model time 0.4696 (0.2570) loss 6.2067 (6.0641) grad_norm 2.1666 (1.9989) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:52:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][420/625] eta 0:00:53 lr 0.001058 wd 0.0500 time 0.2638 (0.2591) data time 0.0009 (0.0021) model time 0.2630 (0.2569) loss 6.8716 (6.0610) grad_norm 2.3743 (2.0032) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:52:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][430/625] eta 0:00:50 lr 0.001058 wd 0.0500 time 0.2570 (0.2591) data time 0.0008 (0.0021) model time 0.2562 (0.2569) loss 7.0971 (6.0672) grad_norm 1.5387 (1.9999) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:52:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][440/625] eta 0:00:47 lr 0.001058 wd 0.0500 time 0.2612 (0.2590) data time 0.0006 (0.0020) model time 0.2606 (0.2569) loss 5.7935 (6.0657) grad_norm 1.4880 (1.9939) loss_scale 2048.0000 (2048.0000) mem 9655MB 
[2024-08-04 04:53:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][450/625] eta 0:00:45 lr 0.001058 wd 0.0500 time 0.2528 (0.2590) data time 0.0016 (0.0020) model time 0.2512 (0.2568) loss 6.4734 (6.0674) grad_norm 2.4803 (2.0090) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:53:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][460/625] eta 0:00:42 lr 0.001057 wd 0.0500 time 0.2525 (0.2589) data time 0.0007 (0.0020) model time 0.2518 (0.2568) loss 6.0252 (6.0638) grad_norm 1.6578 (2.0134) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:53:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][470/625] eta 0:00:40 lr 0.001057 wd 0.0500 time 0.2512 (0.2588) data time 0.0008 (0.0020) model time 0.2504 (0.2567) loss 6.2563 (6.0648) grad_norm 2.3434 (2.0111) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:53:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][480/625] eta 0:00:37 lr 0.001057 wd 0.0500 time 0.2553 (0.2588) data time 0.0007 (0.0020) model time 0.2546 (0.2567) loss 6.3827 (6.0624) grad_norm 1.9735 (2.0189) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:53:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][490/625] eta 0:00:34 lr 0.001057 wd 0.0500 time 0.2564 (0.2588) data time 0.0010 (0.0019) model time 0.2554 (0.2567) loss 5.3231 (6.0634) grad_norm 1.3762 (2.0203) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:53:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][500/625] eta 0:00:32 lr 0.001057 wd 0.0500 time 0.2605 (0.2588) data time 0.0007 (0.0019) model time 0.2598 (0.2567) loss 6.6546 (6.0678) grad_norm 3.2542 (2.0177) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:53:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][510/625] eta 0:00:29 lr 0.001056 wd 0.0500 time 0.2604 
(0.2587) data time 0.0006 (0.0019) model time 0.2598 (0.2567) loss 5.5756 (6.0699) grad_norm 3.1781 (2.0202) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:53:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][520/625] eta 0:00:27 lr 0.001056 wd 0.0500 time 0.2564 (0.2586) data time 0.0009 (0.0019) model time 0.2555 (0.2566) loss 7.0702 (6.0830) grad_norm 1.6039 (2.0155) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:53:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][530/625] eta 0:00:24 lr 0.001056 wd 0.0500 time 0.2518 (0.2586) data time 0.0009 (0.0019) model time 0.2508 (0.2566) loss 5.4614 (6.0878) grad_norm 1.8788 (2.0143) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:53:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][540/625] eta 0:00:21 lr 0.001056 wd 0.0500 time 0.2590 (0.2586) data time 0.0006 (0.0018) model time 0.2585 (0.2566) loss 6.1766 (6.0822) grad_norm 1.8339 (2.0075) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:53:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][550/625] eta 0:00:19 lr 0.001056 wd 0.0500 time 0.2554 (0.2585) data time 0.0008 (0.0018) model time 0.2546 (0.2565) loss 5.0285 (6.0758) grad_norm 1.7493 (2.0056) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:53:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][560/625] eta 0:00:16 lr 0.001056 wd 0.0500 time 0.2552 (0.2585) data time 0.0009 (0.0018) model time 0.2543 (0.2565) loss 6.0517 (6.0739) grad_norm 1.7819 (1.9996) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:53:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][570/625] eta 0:00:14 lr 0.001055 wd 0.0500 time 0.2554 (0.2584) data time 0.0009 (0.0018) model time 0.2545 (0.2565) loss 5.4091 (6.0740) grad_norm 1.5983 (1.9961) loss_scale 2048.0000 (2048.0000) mem 9655MB 
[2024-08-04 04:53:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][580/625] eta 0:00:11 lr 0.001055 wd 0.0500 time 0.2592 (0.2584) data time 0.0006 (0.0018) model time 0.2586 (0.2565) loss 6.0532 (6.0659) grad_norm 2.4780 (1.9971) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:53:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][590/625] eta 0:00:09 lr 0.001055 wd 0.0500 time 0.2568 (0.2583) data time 0.0012 (0.0018) model time 0.2557 (0.2564) loss 6.9203 (6.0643) grad_norm 2.0468 (1.9965) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:53:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][600/625] eta 0:00:06 lr 0.001055 wd 0.0500 time 0.2576 (0.2583) data time 0.0007 (0.0017) model time 0.2569 (0.2564) loss 5.3129 (6.0625) grad_norm 1.6640 (1.9963) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:53:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][610/625] eta 0:00:03 lr 0.001055 wd 0.0500 time 0.2523 (0.2583) data time 0.0005 (0.0017) model time 0.2518 (0.2564) loss 6.0682 (6.0508) grad_norm 3.3588 (1.9964) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:53:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][620/625] eta 0:00:01 lr 0.001055 wd 0.0500 time 0.2534 (0.2582) data time 0.0003 (0.0017) model time 0.2531 (0.2563) loss 5.5206 (6.0499) grad_norm 1.2018 (1.9959) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 04:53:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 155 training takes 0:02:41 [2024-08-04 04:53:44 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 04:53:45 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 04:53:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.512 (0.512) Loss 0.6724 (0.6724) Acc@1 87.842 (87.842) Acc@5 98.242 (98.242) Mem 9655MB [2024-08-04 04:53:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 1.0684 (0.8164) Acc@1 78.174 (84.295) Acc@5 94.482 (97.017) Mem 9655MB [2024-08-04 04:53:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.1436 (0.9517) Acc@1 75.195 (80.636) Acc@5 93.750 (95.475) Mem 9655MB [2024-08-04 04:53:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.450 Acc@5 95.479 [2024-08-04 04:53:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.5% [2024-08-04 04:53:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.45% [2024-08-04 04:53:46 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 04:53:47 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 04:53:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.509 (0.509) Loss 0.5918 (0.5918) Acc@1 89.062 (89.062) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 04:53:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 0.9399 (0.7310) Acc@1 79.395 (85.316) Acc@5 95.361 (97.479) Mem 9655MB [2024-08-04 04:53:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0732 (0.8612) Acc@1 75.049 (81.775) Acc@5 93.994 (96.033) Mem 9655MB [2024-08-04 04:53:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.484 Acc@5 96.009 [2024-08-04 04:53:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.5% [2024-08-04 04:53:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.48% [2024-08-04 04:53:49 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 04:53:50 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 04:53:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][0/625] eta 0:06:47 lr 0.001054 wd 0.0500 time 0.6519 (0.6519) data time 0.4083 (0.4083) model time 0.0000 (0.0000) loss 6.1929 (6.1929) grad_norm 2.6192 (2.6192) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:53:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][10/625] eta 0:02:59 lr 0.001054 wd 0.0500 time 0.2562 (0.2924) data time 0.0013 (0.0380) model time 0.0000 (0.0000) loss 6.1104 (5.9543) grad_norm 2.3231 (2.4475) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:53:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][20/625] eta 0:02:46 lr 0.001054 wd 0.0500 time 0.2563 (0.2749) data time 0.0007 (0.0205) model time 0.0000 (0.0000) loss 5.2417 (5.9560) grad_norm 2.1221 (2.2546) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:53:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][30/625] eta 0:02:40 lr 0.001054 wd 0.0500 time 0.2549 (0.2689) data time 0.0009 (0.0142) model time 0.0000 (0.0000) loss 6.4250 (5.9609) grad_norm 1.7484 (2.2759) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:54:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][40/625] eta 0:02:37 lr 0.001054 wd 0.0500 time 0.2532 (0.2698) data time 0.0007 (0.0109) model time 0.0000 (0.0000) loss 6.5236 (5.9698) grad_norm 2.2359 (2.1591) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:54:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][50/625] eta 0:02:33 lr 0.001054 wd 0.0500 time 0.2643 (0.2672) data time 0.0011 (0.0090) model time 0.0000 (0.0000) loss 5.6341 (5.9352) grad_norm 2.6525 (2.0984) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:54:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][60/625] eta 0:02:29 lr 0.001053 wd 0.0500 time 0.2562 (0.2652) data time 0.0007 (0.0076) model time 0.2554 (0.2538) loss 5.4434 (5.8620) grad_norm 1.3234 (2.0547) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:54:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][70/625] eta 0:02:26 lr 0.001053 wd 0.0500 time 0.2561 (0.2640) data time 0.0010 (0.0067) model time 0.2552 (0.2549) loss 6.0973 (5.8740) grad_norm 1.3646 (2.0074) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:54:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][80/625] eta 0:02:23 lr 0.001053 wd 0.0500 time 0.2529 (0.2629) data time 0.0009 (0.0060) model time 0.2521 (0.2546) loss 6.5247 (5.9393) grad_norm 1.8113 (1.9582) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:54:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][90/625] eta 0:02:20 lr 0.001053 wd 0.0500 time 0.2558 (0.2621) data time 0.0006 (0.0054) model time 0.2552 (0.2546) loss 4.7638 (5.9373) grad_norm 1.9966 (1.9398) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:54:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][100/625] eta 0:02:17 lr 0.001053 wd 0.0500 time 0.2631 (0.2615) data time 0.0006 (0.0050) model time 0.2625 (0.2548) loss 5.4699 (5.9273) grad_norm 2.2921 (1.9265) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:54:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][110/625] eta 0:02:14 lr 0.001052 wd 0.0500 time 0.2566 (0.2610) data time 0.0006 (0.0046) model time 0.2560 (0.2548) loss 6.1089 (5.8897) grad_norm 1.1471 (1.9059) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:54:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][120/625] eta 0:02:11 lr 0.001052 wd 0.0500 time 0.2557 (0.2606) data time 0.0008 (0.0043) model time 0.2549 (0.2548) loss 5.8026 (5.9155) grad_norm 2.9437 (1.9179) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:54:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][130/625] eta 0:02:08 lr 0.001052 wd 0.0500 time 0.2572 (0.2603) data time 0.0010 (0.0041) model time 0.2563 (0.2549) loss 6.9318 (5.9237) grad_norm 1.8198 (1.9453) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:54:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][140/625] eta 0:02:06 lr 0.001052 wd 0.0500 time 0.2572 (0.2600) data time 0.0006 (0.0038) model time 0.2566 (0.2549) loss 4.9006 (5.9313) grad_norm 1.3921 (1.9392) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:54:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][150/625] eta 0:02:03 lr 0.001052 wd 0.0500 time 0.2580 (0.2597) data time 0.0007 (0.0036) model time 0.2573 (0.2550) loss 5.4306 (5.9321) grad_norm 1.7910 (1.9251) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:54:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][160/625] eta 0:02:00 lr 0.001052 wd 0.0500 time 0.2563 (0.2595) data time 0.0007 (0.0035) model time 0.2556 (0.2550) loss 6.2987 (5.9391) grad_norm 1.4578 (1.9146) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:54:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][170/625] eta 0:01:58 lr 0.001051 wd 0.0500 time 0.2579 (0.2594) data time 0.0008 (0.0033) model time 0.2571 (0.2551) loss 6.5790 (5.9388) grad_norm 3.5133 (1.9165) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:54:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][180/625] eta 0:01:55 lr 0.001051 wd 0.0500 time 0.2556 (0.2592) data time 0.0008 (0.0032) model time 0.2547 (0.2551) loss 4.9037 (5.9334) grad_norm 2.2727 (1.9178) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:54:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][190/625] eta 0:01:52 lr 0.001051 wd 0.0500 time 0.2547 (0.2590) data time 0.0007 (0.0031) model time 0.2540 (0.2551) loss 6.4832 (5.9408) grad_norm 1.3277 (1.9248) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:54:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][200/625] eta 0:01:50 lr 0.001051 wd 0.0500 time 0.2574 (0.2590) data time 0.0008 (0.0030) model time 0.2566 (0.2552) loss 6.1008 (5.9599) grad_norm 1.6878 (1.9188) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:54:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][210/625] eta 0:01:47 lr 0.001051 wd 0.0500 time 0.2588 (0.2588) data time 0.0011 (0.0029) model time 0.2578 (0.2552) loss 4.9877 (5.9495) grad_norm 2.7125 (1.9130) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:54:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][220/625] eta 0:01:44 lr 0.001051 wd 0.0500 time 0.2561 (0.2587) data time 0.0009 (0.0028) model time 0.2552 (0.2552) loss 6.7667 (5.9507) grad_norm 2.2273 (1.9035) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:54:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][230/625] eta 0:01:42 lr 0.001050 wd 0.0500 time 0.2529 (0.2585) data time 0.0010 (0.0027) model time 0.2519 (0.2551) loss 6.5508 (5.9613) grad_norm 2.7457 (1.9199) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:54:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][240/625] eta 0:01:39 lr 0.001050 wd 0.0500 time 0.2597 (0.2584) data time 0.0006 (0.0026) model time 0.2591 (0.2552) loss 6.4025 (5.9670) grad_norm 1.9441 (1.9516) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:54:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][250/625] eta 0:01:36 lr 0.001050 wd 0.0500 time 0.2611 (0.2584) data time 0.0008 (0.0026) model time 0.2603 (0.2552) loss 5.5255 (5.9629) grad_norm 2.0887 (1.9557) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:54:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][260/625] eta 0:01:34 lr 0.001050 wd 0.0500 time 0.2591 (0.2583) data time 0.0008 (0.0025) model time 0.2583 (0.2553) loss 6.8671 (5.9575) grad_norm 1.8634 (1.9834) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:55:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][270/625] eta 0:01:31 lr 0.001050 wd 0.0500 time 0.2515 (0.2583) data time 0.0008 (0.0024) model time 0.2506 (0.2553) loss 6.5566 (5.9670) grad_norm 2.7384 (1.9956) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:55:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][280/625] eta 0:01:29 lr 0.001049 wd 0.0500 time 0.2566 (0.2582) data time 0.0006 (0.0024) model time 0.2560 (0.2553) loss 5.2868 (5.9702) grad_norm 1.5269 (1.9904) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:55:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][290/625] eta 0:01:26 lr 0.001049 wd 0.0500 time 0.2541 (0.2581) data time 0.0009 (0.0023) model time 0.2531 (0.2553) loss 5.5790 (5.9724) grad_norm 1.4266 (1.9913) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:55:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][300/625] eta 0:01:23 lr 0.001049 wd 0.0500 time 0.2556 (0.2580) data time 0.0009 (0.0023) model time 0.2548 (0.2552) loss 5.2466 (5.9766) grad_norm 2.0583 (1.9876) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:55:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][310/625] eta 0:01:21 lr 0.001049 wd 0.0500 time 0.2536 (0.2579) data time 0.0008 (0.0022) model time 0.2528 (0.2552) loss 6.2145 (5.9784) grad_norm 1.6674 (1.9805) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:55:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][320/625] eta 0:01:18 lr 0.001049 wd 0.0500 time 0.2573 (0.2578) data time 0.0006 (0.0022) model time 0.2567 (0.2552) loss 6.4875 (5.9763) grad_norm 2.0581 (1.9817) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:55:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][330/625] eta 0:01:16 lr 0.001049 wd 0.0500 time 0.2577 (0.2578) data time 0.0010 (0.0022) model time 0.2566 (0.2552) loss 6.5781 (5.9742) grad_norm 1.9816 (1.9789) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:55:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][340/625] eta 0:01:13 lr 0.001048 wd 0.0500 time 0.2656 (0.2578) data time 0.0007 (0.0021) model time 0.2649 (0.2552) loss 5.0993 (5.9723) grad_norm 3.3302 (1.9951) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:55:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][350/625] eta 0:01:10 lr 0.001048 wd 0.0500 time 0.2522 (0.2578) data time 0.0009 (0.0021) model time 0.2513 (0.2553) loss 5.1972 (5.9585) grad_norm 2.5030 (1.9957) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:55:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][360/625] eta 0:01:08 lr 0.001048 wd 0.0500 time 0.2566 (0.2578) data time 0.0006 (0.0020) model time 0.2560 (0.2554) loss 6.3046 (5.9699) grad_norm 1.4812 (1.9947) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:55:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][370/625] eta 0:01:05 lr 0.001048 wd 0.0500 time 0.2525 (0.2578) data time 0.0009 (0.0020) model time 0.2517 (0.2554) loss 6.8043 (5.9720) grad_norm 2.8741 (1.9960) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:55:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][380/625] eta 0:01:03 lr 0.001048 wd 0.0500 time 0.2559 (0.2577) data time 0.0006 (0.0020) model time 0.2553 (0.2554) loss 6.7320 (5.9766) grad_norm 1.8385 (2.0084) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:55:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][390/625] eta 0:01:00 lr 0.001047 wd 0.0500 time 0.2593 (0.2577) data time 0.0006 (0.0020) model time 0.2587 (0.2554) loss 5.6767 (5.9688) grad_norm 2.0090 (2.0043) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:55:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][400/625] eta 0:00:58 lr 0.001047 wd 0.0500 time 0.2536 (0.2580) data time 0.0008 (0.0019) model time 0.2528 (0.2558) loss 6.3724 (5.9826) grad_norm 4.0510 (2.0103) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:55:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][410/625] eta 0:00:55 lr 0.001047 wd 0.0500 time 0.2546 (0.2580) data time 0.0011 (0.0019) model time 0.2535 (0.2558) loss 6.5860 (5.9826) grad_norm 1.9380 (2.0065) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:55:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][420/625] eta 0:00:52 lr 0.001047 wd 0.0500 time 0.2636 (0.2580) data time 0.0007 (0.0019) model time 0.2629 (0.2558) loss 6.9108 (5.9801) grad_norm 2.7416 (2.0242) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:55:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][430/625] eta 0:00:50 lr 0.001047 wd 0.0500 time 0.2610 (0.2580) data time 0.0006 (0.0019) model time 0.2604 (0.2559) loss 6.7913 (5.9880) grad_norm 1.3468 (2.0396) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:55:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][440/625] eta 0:00:47 lr 0.001047 wd 0.0500 time 0.2528 (0.2580) data time 0.0011 (0.0018) model time 0.2517 (0.2559) loss 5.9705 (5.9966) grad_norm 1.4515 (2.0359) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:55:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][450/625] eta 0:00:45 lr 0.001046 wd 0.0500 time 0.2555 (0.2580) data time 0.0010 (0.0018) model time 0.2545 (0.2559) loss 6.6370 (5.9999) grad_norm 2.9800 (2.0532) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:55:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][460/625] eta 0:00:42 lr 0.001046 wd 0.0500 time 0.2583 (0.2582) data time 0.0010 (0.0018) model time 0.2574 (0.2562) loss 6.2833 (6.0047) grad_norm 2.9887 (2.0680) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:55:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][470/625] eta 0:00:40 lr 0.001046 wd 0.0500 time 0.2579 (0.2582) data time 0.0007 (0.0018) model time 0.2573 (0.2562) loss 7.2177 (6.0135) grad_norm 3.7577 (2.0663) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:55:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][480/625] eta 0:00:37 lr 0.001046 wd 0.0500 time 0.2558 (0.2581) data time 0.0007 (0.0018) model time 0.2551 (0.2561) loss 6.2360 (6.0161) grad_norm 1.8806 (2.0615) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:55:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][490/625] eta 0:00:34 lr 0.001046 wd 0.0500 time 0.2556 (0.2581) data time 0.0008 (0.0017) model time 0.2547 (0.2562) loss 6.9880 (6.0136) grad_norm 1.3881 (2.0619) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:55:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][500/625] eta 0:00:32 lr 0.001046 wd 0.0500 time 0.2590 (0.2581) data time 0.0009 (0.0017) model time 0.2581 (0.2561) loss 5.2398 (6.0177) grad_norm 2.0531 (2.0548) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:56:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][510/625] eta 0:00:29 lr 0.001045 wd 0.0500 time 0.2551 (0.2580) data time 0.0006 (0.0017) model time 0.2546 (0.2561) loss 5.6535 (6.0197) grad_norm 1.5904 (2.0510) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:56:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][520/625] eta 0:00:27 lr 0.001045 wd 0.0500 time 0.2562 (0.2580) data time 0.0009 (0.0017) model time 0.2553 (0.2561) loss 5.3397 (6.0193) grad_norm 1.7360 (2.0490) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:56:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][530/625] eta 0:00:24 lr 0.001045 wd 0.0500 time 0.2580 (0.2580) data time 0.0009 (0.0017) model time 0.2571 (0.2561) loss 6.5729 (6.0204) grad_norm 2.9721 (2.0524) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:56:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][540/625] eta 0:00:21 lr 0.001045 wd 0.0500 time 0.2548 (0.2579) data time 0.0008 (0.0017) model time 0.2540 (0.2561) loss 6.9302 (6.0208) grad_norm 1.5432 (2.0467) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:56:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][550/625] eta 0:00:19 lr 0.001045 wd 0.0500 time 0.2580 (0.2579) data time 0.0008 (0.0017) model time 0.2572 (0.2560) loss 6.9346 (6.0222) grad_norm 1.3168 (2.0377) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:56:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][560/625] eta 0:00:16 lr 0.001044 wd 0.0500 time 0.2552 (0.2579) data time 0.0008 (0.0016) model time 0.2544 (0.2560) loss 7.0844 (6.0271) grad_norm 1.7782 (2.0368) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:56:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][570/625] eta 0:00:14 lr 0.001044 wd 0.0500 time 0.2635 (0.2579) data time 0.0008 (0.0016) model time 0.2627 (0.2560) loss 5.6919 (6.0261) grad_norm 1.4823 (2.0326) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:56:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][580/625] eta 0:00:11 lr 0.001044 wd 0.0500 time 0.2663 (0.2579) data time 0.0008 (0.0016) model time 0.2655 (0.2560) loss 6.0450 (6.0329) grad_norm 2.4762 (2.0252) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:56:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][590/625] eta 0:00:09 lr 0.001044 wd 0.0500 time 0.2597 (0.2579) data time 0.0005 (0.0016) model time 0.2591 (0.2561) loss 4.9329 (6.0231) grad_norm 2.1217 (2.0250) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:56:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][600/625] eta 0:00:06 lr 0.001044 wd 0.0500 time 0.2557 (0.2578) data time 0.0011 (0.0016) model time 0.2547 (0.2560) loss 6.4418 (6.0204) grad_norm 1.5019 (2.0272) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:56:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][610/625] eta 0:00:03 lr 0.001044 wd 0.0500 time 0.2505 (0.2578) data time 0.0003 (0.0016) model time 0.2501 (0.2560) loss 4.6663 (6.0136) grad_norm 1.5395 (2.0236) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:56:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][620/625] eta 0:00:01 lr 0.001043 wd 0.0500 time 0.2537 (0.2577) data time 0.0004 (0.0016) model time 0.2532 (0.2559) loss 5.2585 (6.0101) grad_norm 1.5803 (2.0181) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:56:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 156 training takes 0:02:41
[2024-08-04 04:56:31 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 04:56:31 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 04:56:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.511 (0.511) Loss 0.6685 (0.6685) Acc@1 87.402 (87.402) Acc@5 98.096 (98.096) Mem 9655MB
[2024-08-04 04:56:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 1.0820 (0.8129) Acc@1 76.807 (84.326) Acc@5 94.727 (96.959) Mem 9655MB
[2024-08-04 04:56:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.1807 (0.9505) Acc@1 73.779 (80.578) Acc@5 93.506 (95.419) Mem 9655MB
[2024-08-04 04:56:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 80.268 Acc@5 95.433
[2024-08-04 04:56:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.3%
[2024-08-04 04:56:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.790 (0.790) Loss 0.5923 (0.5923) Acc@1 89.160 (89.160) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 04:56:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.129) Loss 0.9395 (0.7309) Acc@1 79.443 (85.338) Acc@5 95.410 (97.470) Mem 9655MB
[2024-08-04 04:56:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.094) Loss 1.0742 (0.8610) Acc@1 75.146 (81.801) Acc@5 93.945 (96.031) Mem 9655MB
[2024-08-04 04:56:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 81.506 Acc@5 96.013
[2024-08-04 04:56:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.5%
[2024-08-04 04:56:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.51%
[2024-08-04 04:56:35 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 04:56:36 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 04:56:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][0/625] eta 0:07:31 lr 0.001043 wd 0.0500 time 0.7230 (0.7230) data time 0.4834 (0.4834) model time 0.0000 (0.0000) loss 6.3963 (6.3963) grad_norm 1.8184 (1.8184) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:56:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][10/625] eta 0:03:15 lr 0.001043 wd 0.0500 time 0.2551 (0.3178) data time 0.0006 (0.0447) model time 0.0000 (0.0000) loss 6.0163 (5.9837) grad_norm 2.9200 (1.7390) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:56:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][20/625] eta 0:02:54 lr 0.001043 wd 0.0500 time 0.2601 (0.2882) data time 0.0008 (0.0238) model time 0.0000 (0.0000) loss 6.0227 (6.0798) grad_norm 1.4349 (1.9608) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:56:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][30/625] eta 0:02:48 lr 0.001043 wd 0.0500 time 0.2588 (0.2832) data time 0.0009 (0.0164) model time 0.0000 (0.0000) loss 5.5424 (6.0868) grad_norm 2.7238 (1.9235) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:56:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][40/625] eta 0:02:41 lr 0.001043 wd 0.0500 time 0.2541 (0.2765) data time 0.0008 (0.0126) model time 0.0000 (0.0000) loss 6.4951 (6.0876) grad_norm 2.2160 (1.9696) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:56:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][50/625] eta 0:02:36 lr 0.001042 wd 0.0500 time 0.2579 (0.2727) data time 0.0008 (0.0104) model time 0.0000 (0.0000) loss 4.4698 (5.9684) grad_norm 1.6498 (1.9554) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:56:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][60/625] eta 0:02:32 lr 0.001042 wd 0.0500 time 0.2560 (0.2701) data time 0.0006 (0.0089) model time 0.2554 (0.2559) loss 4.7946 (5.9239) grad_norm 2.4792 (1.9446) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:56:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][70/625] eta 0:02:28 lr 0.001042 wd 0.0500 time 0.2558 (0.2681) data time 0.0006 (0.0077) model time 0.2552 (0.2552) loss 5.6713 (5.9250) grad_norm 2.2120 (1.9876) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:56:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][80/625] eta 0:02:25 lr 0.001042 wd 0.0500 time 0.2513 (0.2666) data time 0.0012 (0.0069) model time 0.2501 (0.2552) loss 6.6565 (5.9513) grad_norm 2.2936 (2.0422) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:57:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][90/625] eta 0:02:22 lr 0.001042 wd 0.0500 time 0.2669 (0.2657) data time 0.0011 (0.0063) model time 0.2658 (0.2556) loss 5.2729 (5.9003) grad_norm 1.6914 (2.0218) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:57:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][100/625] eta 0:02:19 lr 0.001042 wd 0.0500 time 0.2585 (0.2648) data time 0.0008 (0.0058) model time 0.2577 (0.2556) loss 5.7885 (5.9127) grad_norm 2.6622 (2.0673) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:57:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][110/625] eta 0:02:15 lr 0.001041 wd 0.0500 time 0.2577 (0.2640) data time 0.0008 (0.0053) model time 0.2569 (0.2555) loss 7.1474 (5.9248) grad_norm 2.2183 (2.1110) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:57:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][120/625] eta 0:02:12 lr 0.001041 wd 0.0500 time 0.2589 (0.2633) data time 0.0008 (0.0050) model time 0.2581 (0.2555) loss 4.9880 (5.9232) grad_norm 1.4411 (2.0942) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:57:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][130/625] eta 0:02:10 lr 0.001041 wd 0.0500 time 0.2597 (0.2629) data time 0.0006 (0.0047) model time 0.2591 (0.2556) loss 4.8486 (5.8978) grad_norm 2.0359 (2.0815) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:57:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][140/625] eta 0:02:07 lr 0.001041 wd 0.0500 time 0.2508 (0.2623) data time 0.0011 (0.0044) model time 0.2497 (0.2554) loss 5.9148 (5.9014) grad_norm 1.9574 (2.0724) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:57:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][150/625] eta 0:02:04 lr 0.001041 wd 0.0500 time 0.2573 (0.2619) data time 0.0006 (0.0042) model time 0.2566 (0.2554) loss 5.6877 (5.9337) grad_norm 1.4712 (2.0416) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:57:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][160/625] eta 0:02:01 lr 0.001040 wd 0.0500 time 0.2587 (0.2615) data time 0.0008 (0.0040) model time 0.2579 (0.2553) loss 4.8813 (5.9202) grad_norm 2.4316 (2.0276) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:57:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][170/625] eta 0:01:58 lr 0.001040 wd 0.0500 time 0.2526 (0.2611) data time 0.0010 (0.0038) model time 0.2516 (0.2552) loss 6.4190 (5.9244) grad_norm 1.6399 (2.0210) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:57:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][180/625] eta 0:01:56 lr 0.001040 wd 0.0500 time 0.2580 (0.2609) data time 0.0008 (0.0036) model time 0.2572 (0.2553) loss 5.7103 (5.9395) grad_norm 1.7875 (2.0305) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:57:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][190/625] eta 0:01:53 lr 0.001040 wd 0.0500 time 0.2577 (0.2606) data time 0.0006 (0.0035) model time 0.2571 (0.2553) loss 4.8524 (5.9326) grad_norm 1.4497 (2.0190) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:57:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][200/625] eta 0:01:50 lr 0.001040 wd 0.0500 time 0.2541 (0.2604) data time 0.0008 (0.0034) model time 0.2533 (0.2552) loss 5.6928 (5.9318) grad_norm 3.0783 (2.0073) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:57:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][210/625] eta 0:01:47 lr 0.001040 wd 0.0500 time 0.2519 (0.2602) data time 0.0007 (0.0032) model time 0.2511 (0.2552) loss 5.4796 (5.9376) grad_norm 1.3974 (2.0179) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:57:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][220/625] eta 0:01:45 lr 0.001039 wd 0.0500 time 0.2612 (0.2600) data time 0.0008 (0.0031) model time 0.2605 (0.2552) loss 5.6691 (5.9450) grad_norm 2.5378 (2.0360) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:57:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][230/625] eta 0:01:42 lr 0.001039 wd 0.0500 time 0.2596 (0.2598) data time 0.0006 (0.0030) model time 0.2590 (0.2552) loss 6.4717 (5.9646) grad_norm 1.4592 (2.0266) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:57:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][240/625] eta 0:01:39 lr 0.001039 wd 0.0500 time 0.2508 (0.2597) data time 0.0007 (0.0030) model time 0.2501 (0.2552) loss 4.8131 (5.9565) grad_norm 3.6425 (2.0306) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:57:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][250/625] eta 0:01:37 lr 0.001039 wd 0.0500 time 0.2490 (0.2595) data time 0.0009 (0.0029) model time 0.2481 (0.2551) loss 5.2013 (5.9571) grad_norm 1.2956 (2.0260) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:57:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][260/625] eta 0:01:34 lr 0.001039 wd 0.0500 time 0.2602 (0.2594) data time 0.0009 (0.0028) model time 0.2592 (0.2552) loss 5.8537 (5.9526) grad_norm 2.9226 (2.0321) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:57:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][270/625] eta 0:01:32 lr 0.001039 wd 0.0500 time 0.2546 (0.2593) data time 0.0007 (0.0027) model time 0.2539 (0.2552) loss 5.1536 (5.9468) grad_norm 1.3525 (2.0245) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:57:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][280/625] eta 0:01:29 lr 0.001038 wd 0.0500 time 0.2574 (0.2591) data time 0.0007 (0.0027) model time 0.2567 (0.2552) loss 7.6034 (5.9576) grad_norm 2.8150 (2.0200) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:57:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][290/625] eta 0:01:26 lr 0.001038 wd 0.0500 time 0.2579 (0.2591) data time 0.0014 (0.0026) model time 0.2564 (0.2552) loss 6.0604 (5.9669) grad_norm 2.0977 (2.0297) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:57:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][300/625] eta 0:01:24 lr 0.001038 wd 0.0500 time 0.2559 (0.2590) data time 0.0006 (0.0025) model time 0.2553 (0.2552) loss 6.7903 (5.9622) grad_norm 1.9671 (2.0255) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:57:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][310/625] eta 0:01:21 lr 0.001038 wd 0.0500 time 0.2533 (0.2589) data time 0.0011 (0.0025) model time 0.2522 (0.2552) loss 5.2645 (5.9481) grad_norm 2.9643 (2.0241) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:57:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][320/625] eta 0:01:18 lr 0.001038 wd 0.0500 time 0.2545 (0.2588) data time 0.0010 (0.0024) model time 0.2534 (0.2552) loss 6.2032 (5.9557) grad_norm 4.4421 (2.0440) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:58:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][330/625] eta 0:01:16 lr 0.001037 wd 0.0500 time 0.2582 (0.2587) data time 0.0008 (0.0024) model time 0.2574 (0.2552) loss 6.2960 (5.9594) grad_norm 1.4835 (2.0685) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:58:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][340/625] eta 0:01:13 lr 0.001037 wd 0.0500 time 0.2561 (0.2586) data time 0.0010 (0.0023) model time 0.2551 (0.2552) loss 5.2818 (5.9597) grad_norm 1.2691 (2.0728) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:58:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][350/625] eta 0:01:11 lr 0.001037 wd 0.0500 time 0.2610 (0.2592) data time 0.0007 (0.0023) model time 0.2604 (0.2560) loss 5.1242 (5.9544) grad_norm 1.9782 (2.0611) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:58:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][360/625] eta 0:01:08 lr 0.001037 wd 0.0500 time 0.2610 (0.2592) data time 0.0012 (0.0023) model time 0.2598 (0.2560) loss 6.7093 (5.9575) grad_norm 1.9598 (2.0614) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:58:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][370/625] eta 0:01:06 lr 0.001037 wd 0.0500 time 0.2569 (0.2591) data time 0.0008 (0.0022) model time 0.2560 (0.2560) loss 6.2572 (5.9617) grad_norm 2.2801 (2.0638) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:58:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][380/625] eta 0:01:03 lr 0.001037 wd 0.0500 time 0.2554 (0.2590) data time 0.0007 (0.0022) model time 0.2548 (0.2560) loss 5.8480 (5.9701) grad_norm 1.7965 (2.0687) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:58:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][390/625] eta 0:01:00 lr 0.001036 wd 0.0500 time 0.2571 (0.2590) data time 0.0008 (0.0022) model time 0.2563 (0.2560) loss 5.2826 (5.9610) grad_norm 2.9768 (2.0660) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:58:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][400/625] eta 0:00:58 lr 0.001036 wd 0.0500 time 0.2693 (0.2589) data time 0.0007 (0.0021) model time 0.2686 (0.2560) loss 4.9510 (5.9632) grad_norm 1.2899 (2.0650) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:58:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][410/625] eta 0:00:55 lr 0.001036 wd 0.0500 time 0.2547 (0.2588) data time 0.0008 (0.0021) model time 0.2539 (0.2559) loss 6.0718 (5.9664) grad_norm 2.0133 (2.0541) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:58:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][420/625] eta 0:00:53 lr 0.001036 wd 0.0500 time 0.2572 (0.2587) data time 0.0008 (0.0021) model time 0.2564 (0.2559) loss 6.1881 (5.9683) grad_norm 1.6398 (2.0562) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:58:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][430/625] eta 0:00:50 lr 0.001036 wd 0.0500 time 0.2602 (0.2587) data time 0.0005 (0.0020) model time 0.2597 (0.2559) loss 6.4669 (5.9691) grad_norm 1.8544 (2.0461) loss_scale 4096.0000 (2057.5035) mem 9655MB
[2024-08-04 04:58:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][440/625] eta 0:00:47 lr 0.001036 wd 0.0500 time 0.2540 (0.2586) data time 0.0010 (0.0020) model time 0.2530 (0.2559) loss 6.1663 (5.9607) grad_norm 1.3929 (2.0435) loss_scale 4096.0000 (2103.7279) mem 9655MB
[2024-08-04 04:58:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][450/625] eta 0:00:45 lr 0.001035 wd 0.0500 time 0.2726 (0.2586) data time 0.0006 (0.0020) model time 0.2720 (0.2559) loss 5.3298 (5.9585) grad_norm 2.9315 (2.0467) loss_scale 4096.0000 (2147.9024) mem 9655MB
[2024-08-04 04:58:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][460/625] eta 0:00:42 lr 0.001035 wd 0.0500 time 0.2562 (0.2586) data time 0.0008 (0.0020) model time 0.2554 (0.2559) loss 6.5435 (5.9641) grad_norm 1.9011 (2.0478) loss_scale 4096.0000 (2190.1605) mem 9655MB
[2024-08-04 04:58:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][470/625] eta 0:00:40 lr 0.001035 wd 0.0500 time 0.2527 (0.2585) data time 0.0008 (0.0019) model time 0.2518 (0.2559) loss 5.0707 (5.9644) grad_norm 1.4322 (2.0414) loss_scale 4096.0000 (2230.6242) mem 9655MB
[2024-08-04 04:58:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][480/625] eta 0:00:37 lr 0.001035 wd 0.0500 time 0.2572 (0.2585) data time 0.0006 (0.0019) model time 0.2566 (0.2559) loss 5.6359 (5.9685) grad_norm 3.5986 (2.0454) loss_scale 4096.0000 (2269.4054) mem 9655MB
[2024-08-04 04:58:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][490/625] eta 0:00:34 lr 0.001035 wd 0.0500 time 0.2604 (0.2584) data time 0.0011 (0.0019) model time 0.2593 (0.2558) loss 6.7105 (5.9765) grad_norm 1.5088 (2.0432) loss_scale 4096.0000 (2306.6069) mem 9655MB
[2024-08-04 04:58:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][500/625] eta 0:00:32 lr 0.001034 wd 0.0500 time 0.2639 (0.2584) data time 0.0008 (0.0019) model time 0.2632 (0.2559) loss 6.3032 (5.9792) grad_norm 1.9289 (2.0507) loss_scale 4096.0000 (2342.3234) mem 9655MB
[2024-08-04 04:58:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][510/625] eta 0:00:29 lr 0.001034 wd 0.0500 time 0.2599 (0.2584) data time 0.0007 (0.0019) model time 0.2591 (0.2559) loss 4.8276 (5.9783) grad_norm 2.3572 (2.0587) loss_scale 4096.0000 (2376.6419) mem 9655MB
[2024-08-04 04:58:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][520/625] eta 0:00:27 lr 0.001034 wd 0.0500 time 0.2560 (0.2584) data time 0.0008 (0.0018) model time 0.2552 (0.2559) loss 5.8515 (5.9846) grad_norm 3.5355 (2.0609) loss_scale 4096.0000 (2409.6430) mem 9655MB
[2024-08-04 04:58:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][530/625] eta 0:00:24 lr 0.001034 wd 0.0500 time 0.2560 (0.2583) data time 0.0010 (0.0018) model time 0.2550 (0.2559) loss 5.7331 (5.9857) grad_norm 2.7572 (inf) loss_scale 2048.0000 (2422.1168) mem 9655MB
[2024-08-04 04:58:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][540/625] eta 0:00:21 lr 0.001034 wd 0.0500 time 0.2649 (0.2583) data time 0.0007 (0.0018) model time 0.2642 (0.2559) loss 6.4210 (5.9912) grad_norm 1.4361 (inf) loss_scale 2048.0000 (2415.2015) mem 9655MB
[2024-08-04 04:58:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][550/625] eta 0:00:19 lr 0.001034 wd 0.0500 time 0.2567 (0.2583) data time 0.0007 (0.0018) model time 0.2560 (0.2559) loss 6.3384 (5.9919) grad_norm 2.1551 (inf) loss_scale 2048.0000 (2408.5372) mem 9655MB
[2024-08-04 04:59:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][560/625] eta 0:00:16 lr 0.001033 wd 0.0500 time 0.2566 (0.2583) data time 0.0009 (0.0018) model time 0.2557 (0.2559) loss 5.5914 (5.9918) grad_norm 1.3110 (inf) loss_scale 2048.0000 (2402.1105) mem 9655MB
[2024-08-04 04:59:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][570/625] eta 0:00:14 lr 0.001033 wd 0.0500 time 0.2566 (0.2586) data time 0.0009 (0.0018) model time 0.2557 (0.2563) loss 5.2106 (5.9918) grad_norm 1.9754 (inf) loss_scale 2048.0000 (2395.9089) mem 9655MB
[2024-08-04 04:59:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][580/625] eta 0:00:11 lr 0.001033 wd 0.0500 time 0.2558 (0.2589) data time 0.0009 (0.0017) model time 0.2549 (0.2567) loss 4.4267 (5.9861) grad_norm 1.8269 (inf) loss_scale 2048.0000 (2389.9208) mem 9655MB
[2024-08-04 04:59:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][590/625] eta 0:00:09 lr 0.001033 wd 0.0500 time 0.2558 (0.2589) data time 0.0007 (0.0017) model time 0.2552 (0.2567) loss 4.9610 (5.9798) grad_norm 1.5640 (inf) loss_scale 2048.0000 (2384.1354) mem 9655MB
[2024-08-04 04:59:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][600/625] eta 0:00:06 lr 0.001033 wd 0.0500 time 0.2564 (0.2588) data time 0.0007 (0.0017) model time 0.2558 (0.2566) loss 6.8997 (5.9828) grad_norm 2.0065 (inf) loss_scale 2048.0000 (2378.5424) mem 9655MB
[2024-08-04 04:59:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][610/625] eta 0:00:03 lr 0.001032 wd 0.0500 time 0.2524 (0.2591) data time 0.0006 (0.0017) model time 0.2518 (0.2570) loss 4.4409 (5.9782) grad_norm 1.4876 (inf) loss_scale 2048.0000 (2373.1326) mem 9655MB
[2024-08-04 04:59:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][620/625] eta 0:00:01 lr 0.001032 wd 0.0500 time 0.2539 (0.2590) data time 0.0006 (0.0017) model time 0.2533 (0.2569) loss 5.0247 (5.9765) grad_norm 1.7737 (inf) loss_scale 2048.0000 (2367.8969) mem 9655MB
[2024-08-04 04:59:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 157 training takes 0:02:41
[2024-08-04 04:59:18 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 04:59:18 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 04:59:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.434 (0.434) Loss 0.6753 (0.6753) Acc@1 88.721 (88.721) Acc@5 98.291 (98.291) Mem 9655MB
[2024-08-04 04:59:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.091) Loss 1.0566 (0.8051) Acc@1 77.344 (84.242) Acc@5 94.482 (97.088) Mem 9655MB
[2024-08-04 04:59:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.056 (0.074) Loss 1.1445 (0.9437) Acc@1 75.732 (80.694) Acc@5 93.066 (95.505) Mem 9655MB
[2024-08-04 04:59:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.336 Acc@5 95.485
[2024-08-04 04:59:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.3%
[2024-08-04 04:59:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.784 (0.784) Loss 0.5913 (0.5913) Acc@1 89.111 (89.111) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 04:59:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.126) Loss 0.9390 (0.7303) Acc@1 79.248 (85.294) Acc@5 95.361 (97.483) Mem 9655MB
[2024-08-04 04:59:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0713 (0.8601) Acc@1 75.146 (81.813) Acc@5 93.945 (96.036) Mem 9655MB
[2024-08-04 04:59:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.518 Acc@5 96.017
[2024-08-04 04:59:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.5%
[2024-08-04 04:59:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.52%
[2024-08-04 04:59:22 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 04:59:23 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 04:59:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][0/625] eta 0:09:45 lr 0.001032 wd 0.0500 time 0.9367 (0.9367) data time 0.6992 (0.6992) model time 0.0000 (0.0000) loss 6.2554 (6.2554) grad_norm 1.6414 (1.6414) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:59:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][10/625] eta 0:03:16 lr 0.001032 wd 0.0500 time 0.2530 (0.3188) data time 0.0009 (0.0643) model time 0.0000 (0.0000) loss 5.9135 (5.7949) grad_norm 2.6758 (1.7017) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:59:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][20/625] eta 0:02:54 lr 0.001032 wd 0.0500 time 0.2553 (0.2892) data time 0.0009 (0.0341) model time 0.0000 (0.0000) loss 5.8684 (5.5971) grad_norm 1.6593 (2.2603) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:59:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][30/625] eta 0:02:45 lr 0.001032 wd 0.0500 time 0.2522 (0.2783) data time 0.0006 (0.0234) model time 0.0000 (0.0000) loss 6.2598 (5.7949) grad_norm 1.5768 (2.2703) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:59:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][40/625] eta 0:02:39 lr 0.001032 wd 0.0500 time 0.2506 (0.2729) data time 0.0007 (0.0179) model time 0.0000 (0.0000) loss 4.9700 (5.8251) grad_norm 1.9735 (2.3823) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:59:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][50/625] eta 0:02:34 lr 0.001031 wd 0.0500 time 0.2513 (0.2695) data time 0.0008 (0.0146) model time 0.0000 (0.0000) loss 7.3087 (5.8220) grad_norm 4.1023 (2.3843) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:59:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][60/625] eta 0:02:31 lr 0.001031 wd 0.0500 time 0.2481 (0.2674) data time 0.0006 (0.0123) model time 0.2475 (0.2556) loss 4.8710 (5.8510) grad_norm 2.3106 (2.2764) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:59:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][70/625] eta 0:02:30 lr 0.001031 wd 0.0500 time 0.2499 (0.2712) data time 0.0006 (0.0107) model time 0.2493 (0.2747) loss 6.4604 (5.9133) grad_norm 1.5085 (2.1793) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:59:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][80/625] eta 0:02:26 lr 0.001031 wd 0.0500 time 0.2542 (0.2694) data time 0.0009 (0.0095) model time 0.2533 (0.2683) loss 6.1443 (5.9289) grad_norm 1.6169 (2.1422) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:59:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][90/625] eta 0:02:23 lr 0.001031 wd 0.0500 time 0.2587 (0.2680) data time 0.0015 (0.0086) model time 0.2572 (0.2651) loss 5.9423 (5.9358) grad_norm 2.1978 (2.0959) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:59:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][100/625] eta 0:02:20 lr 0.001030 wd 0.0500 time 0.2581 (0.2668) data time 0.0007 (0.0078) model time 0.2574 (0.2631) loss 6.3064 (5.9407) grad_norm 1.5603 (2.0582) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:59:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][110/625] eta 0:02:16 lr 0.001030 wd 0.0500 time 0.2562 (0.2658) data time 0.0010 (0.0072) model time 0.2552 (0.2617) loss 6.3164 (5.9644) grad_norm 2.4054 (2.0839) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:59:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][120/625] eta 0:02:13 lr 0.001030 wd 0.0500 time 0.2540 (0.2650) data time 0.0010 (0.0067) model time 0.2530 (0.2608) loss 6.9055 (5.9898) grad_norm 1.9466 (2.0710) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 04:59:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][130/625] eta 0:02:10 lr 0.001030 wd 0.0500 time 0.2582 (0.2643) data time 0.0011 (0.0062) model time 0.2571 (0.2600) loss 6.4365 (5.9739) grad_norm 1.2716 (2.0490) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:00:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][140/625] eta 0:02:07 lr 0.001030 wd 0.0500 time 0.2587 (0.2637) data time 0.0006 (0.0059) model time 0.2581 (0.2594) loss 5.5297 (5.9597) grad_norm 2.5312 (2.0618) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:00:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][150/625] eta 0:02:05 lr 0.001030 wd 0.0500 time 0.2562 (0.2632) data time 0.0010 (0.0055) model time 0.2552 (0.2590) loss 4.9385 (5.9415) grad_norm 2.4344 (2.0660) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:00:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][160/625] eta 0:02:02 lr 0.001029 wd 0.0500 time 0.2584 (0.2630) data time 0.0009 (0.0052) model time 0.2575 (0.2590) loss 5.0302 (5.9478) grad_norm 1.3208 (2.0911) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:00:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][170/625] eta 0:01:59 lr 0.001029 wd 0.0500 time 0.2563 (0.2626) data time 0.0010 (0.0050) model time 0.2553 (0.2588) loss 6.5436 (5.9740) grad_norm 2.7297 (2.1121) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:00:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][180/625] eta 0:01:56 lr 0.001029 wd 0.0500 time 0.2567 (0.2623) data time 0.0006 (0.0048) model time 0.2561 (0.2585) loss 6.2505 (5.9959) grad_norm 2.7337 (2.1301) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:00:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][190/625] eta 0:01:53 lr 0.001029 wd 0.0500 time 0.2531 (0.2620) data time 0.0008 (0.0046) model time 0.2522 (0.2583) loss 6.4338 (5.9948) grad_norm 1.5578 (2.1157) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:00:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][200/625] eta 0:01:51 lr 0.001029 wd 0.0500 time 0.2600 (0.2617) data time 0.0006 (0.0044) model time 0.2594 (0.2581) loss 6.4339 (5.9984) grad_norm 1.7896 (2.0988) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:00:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][210/625] eta 0:01:48 lr 0.001028 wd 0.0500 time 0.2590 (0.2614) data time 0.0008 (0.0042) model time 0.2583 (0.2579) loss 5.6546 (5.9769) grad_norm 2.6264 (2.1149) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:00:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][220/625] eta 0:01:45 lr 0.001028 wd 0.0500 time 0.2555 (0.2611) data time 0.0007 (0.0041) model time 0.2549 (0.2577) loss 6.1980 (5.9716) grad_norm 1.6594 (2.1211) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:00:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][230/625] eta 0:01:43 lr 0.001028 wd 0.0500 time 0.2498 (0.2609) data time 0.0006 (0.0039) model time 0.2491 (0.2576) loss 6.1700 (5.9869) grad_norm 1.8673 (2.1132) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:00:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][240/625] eta 0:01:40 lr 0.001028 wd 0.0500 time 0.2578 (0.2608) data time 0.0006 (0.0038) model time 0.2572 (0.2575) loss 6.1867 (5.9908) grad_norm 1.4130 (2.1000) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:00:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][250/625] eta 0:01:37 lr 0.001028 wd 0.0500 time 0.2573 (0.2606) data time 0.0012 (0.0037) model time 0.2561 (0.2574) loss 5.7846 (5.9803) grad_norm 1.3037 (2.0756) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:00:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][260/625] eta 0:01:35 lr 0.001028 wd 0.0500 time 0.2567 (0.2604) data time 0.0008 (0.0036) model time 0.2559 (0.2573) loss 6.6317 (5.9888) grad_norm 2.3826 (2.0690) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:00:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][270/625] eta 0:01:32 lr 0.001027 wd 0.0500 time 0.2566 (0.2603) data time 0.0009 (0.0035) model time 0.2557 (0.2572) loss 4.7968 (5.9746) grad_norm 2.2386 (2.0561) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:00:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][280/625] eta 0:01:29 lr 0.001027 wd 0.0500 time 0.2572 (0.2601) data time 0.0008 (0.0034) model time 0.2564 (0.2571) loss 5.4661 (5.9720) grad_norm 2.6500 (2.0483) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:00:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][290/625] eta 0:01:27 lr 0.001027 wd 0.0500 time 0.2596 (0.2600) data time 0.0009 (0.0033) model time 0.2587 (0.2570) loss 6.3514 (5.9690) grad_norm 3.1490 (2.0719) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:00:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][300/625] eta 0:01:24 lr 0.001027 wd 0.0500 time 0.2598 (0.2599) data time 0.0007 (0.0032) model time 0.2591 (0.2570) loss 6.0060 (5.9718) grad_norm 1.3252 (2.0694) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:00:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][310/625] eta 0:01:21 lr 0.001027 wd 0.0500 time 0.2500 (0.2597) data time 0.0007 (0.0032) model time 0.2493 (0.2569) loss 4.4139 (5.9609) grad_norm 1.3916 (2.0777) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:00:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][320/625] eta 0:01:19 lr 0.001027 wd 0.0500 time 0.2572 (0.2597) data time 0.0007 (0.0031) model time 0.2565 (0.2569) loss 6.6391 (5.9715) grad_norm 2.9836 (2.0745) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:00:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][330/625] eta 0:01:16 lr 0.001026 wd 0.0500 time 0.2598 (0.2595) data time 0.0006 (0.0030) model time 0.2593 (0.2568) loss 5.6966 (5.9669) grad_norm 1.7094 (2.0816) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:00:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][340/625] eta 0:01:13 lr 0.001026 wd 0.0500 time 0.2575 (0.2594) data time 0.0010 (0.0030) model time 0.2565 (0.2567) loss 6.1727 (5.9670) grad_norm 3.1008 (2.0741) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:00:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][350/625] eta 0:01:11 lr 0.001026 wd 0.0500 time 0.2594 (0.2594) data time 0.0009 (0.0029) model time 0.2586 (0.2567) loss 6.8115 (5.9734) grad_norm 1.7868 (2.0754) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:00:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][360/625] eta 0:01:08 lr 0.001026 wd 0.0500 time 0.2563 (0.2593) data time 0.0009 (0.0029) model time 0.2553 (0.2567) loss 6.1116 (5.9718) grad_norm 2.2318 (2.0746) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:00:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][370/625] eta 0:01:06 lr 0.001026 wd 0.0500 time 0.2724 (0.2592) data time 0.0010 (0.0028) model time 0.2714 (0.2566) loss 6.2024 (5.9686) grad_norm 2.2383 (2.0731) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:01:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][380/625] eta 0:01:03 lr 0.001025 wd 0.0500 time 0.2554 (0.2591) data time 0.0007 (0.0028) model time 0.2547 (0.2566) loss 4.8328 (5.9561) grad_norm 1.4789 (2.0674) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:01:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][390/625] eta 0:01:00 lr 0.001025 wd 0.0500 time 0.2570 (0.2591) data time 0.0009 (0.0027) model time 0.2561 (0.2566) loss 6.4528 (5.9583) grad_norm 1.9165 (2.0670) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:01:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][400/625] eta 0:00:58 lr 0.001025 wd 0.0500 time 0.2544 (0.2590) data time 0.0009 (0.0027) model time 0.2536 (0.2565) loss 5.0766 (5.9587) grad_norm 2.0804 (2.0625) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:01:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][410/625] eta 0:00:55 lr 0.001025 wd 0.0500 time 0.2553 (0.2589) data time 0.0007 (0.0026) model time 0.2547 (0.2565) loss 6.7862 (5.9634) grad_norm 1.3598 (2.0705) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:01:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][420/625] eta 0:00:53 lr 0.001025 wd 0.0500 time 0.2573 (0.2588) data time 0.0006 (0.0026) model time 0.2567 (0.2564) loss 4.4606 (5.9517) grad_norm 1.3764 (2.0615) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:01:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][430/625] eta 0:00:50 lr 0.001025 wd 0.0500 time 0.2542 (0.2587) data time 0.0007 (0.0025) model time 0.2535 (0.2564) loss 7.0583 (5.9621) grad_norm 1.6362 (2.0650) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:01:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][440/625] eta 0:00:47 lr 0.001024 wd 0.0500 time 0.2633 (0.2587) data time 0.0010 (0.0025) model time 0.2622 (0.2564) loss 6.0756 (5.9612) grad_norm 3.2578 (2.0966) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:01:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][450/625] eta 0:00:45 lr 0.001024 wd 0.0500 time 0.2553 (0.2586) data time 0.0007 (0.0025) model time 0.2546 (0.2563) loss 5.6022 (5.9653) grad_norm 1.7040 (2.0923) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:01:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][460/625] eta 0:00:42 lr 0.001024 wd 0.0500 time 0.2582 (0.2586) data time 0.0008 (0.0024) model time 0.2574 (0.2563) loss 6.7542 (5.9683) grad_norm 1.5385 (2.0857) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:01:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][470/625] eta 0:00:40 lr 0.001024 wd 0.0500 time 0.2564 (0.2585) data time 0.0007 (0.0024) model time 0.2557 (0.2563) loss 5.8072 (5.9645) grad_norm 2.7160 (2.1004) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:01:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][480/625] eta 0:00:37 lr 0.001024 wd 0.0500 time 0.2554 (0.2585) data time 0.0012 (0.0024) model time 0.2542 (0.2562) loss 4.9443 (5.9752) grad_norm 1.4130 (2.0983) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:01:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][490/625] eta 0:00:34 lr 0.001024 wd 0.0500 time 0.2511 (0.2585) data time 0.0009 (0.0023) model time 0.2502 (0.2562) loss 7.1311 (5.9740) grad_norm 1.5904 (2.0983) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:01:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][500/625] eta 0:00:32 lr 0.001023 wd 0.0500 time 0.2553 (0.2584) data time 0.0010 (0.0023) model time 0.2544 (0.2562) loss 4.9896 (5.9689) grad_norm 1.3107 (2.0922) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:01:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][510/625] eta 0:00:29 lr 0.001023 wd 0.0500 time 0.2577 (0.2584) data time 0.0009 (0.0023) model time 0.2568 (0.2563) loss 5.6324 (5.9725) grad_norm 1.8561 (2.0895) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:01:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][520/625] eta 0:00:27 lr 0.001023 wd 0.0500 time 0.2596 (0.2584) data time 0.0008 (0.0023) model time 0.2588 (0.2563) loss 6.1811 (5.9749) grad_norm 1.8358 (2.0822) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:01:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][530/625] eta 0:00:24 lr 0.001023 wd 0.0500 time 0.2552 (0.2584) data time 0.0009 (0.0022) model time 0.2543 (0.2563) loss 5.3335 (5.9714) grad_norm 2.5465 (2.0862) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:01:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][540/625] eta 0:00:21 lr 0.001023 wd 0.0500 time 0.2576 (0.2583) data time 0.0006 (0.0022) model time 0.2570 (0.2563) loss 6.3863 (5.9809) grad_norm 2.3797 (2.0942) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:01:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][550/625] eta 0:00:19 lr 0.001022 wd 0.0500 time 0.2512 (0.2583) data time 0.0013 (0.0022) model time 0.2499 (0.2562) loss 6.5063 (5.9876) grad_norm 4.0317 (2.0990) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:01:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][560/625] eta 0:00:16 lr 0.001022 wd 0.0500 time 0.2521 (0.2582) data time 0.0009 (0.0022) model time 0.2512 (0.2562) loss 6.6413 (5.9854) grad_norm 1.2100 (2.1054) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:01:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][570/625] eta 0:00:14 lr 0.001022 wd 0.0500 time 0.2565 (0.2582) data time 0.0008 (0.0022) model time 0.2558 (0.2562) loss 4.9059 (5.9778) grad_norm 1.3865 (2.1164) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:01:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][580/625] eta 0:00:11 lr 0.001022 wd 0.0500 time 0.2537 (0.2582) data time 0.0008 (0.0021) model time 0.2529 (0.2561) loss 6.3765 (5.9808) grad_norm 1.9087 (2.1084) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:01:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][590/625] eta 0:00:09 lr 0.001022 wd 0.0500 time 0.2542 (0.2581) data time 0.0009 (0.0021) model time 0.2533 (0.2561) loss 4.4122 (5.9750) grad_norm 1.8935 (2.1011) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:01:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][600/625] eta 0:00:06 lr 0.001022 wd 0.0500 time 0.2543 (0.2581) data time 0.0009 (0.0021) model time 0.2534 (0.2561) loss 6.0616 (5.9739) grad_norm 2.0858 (2.1012) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:02:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][610/625] eta 0:00:03 lr 0.001021 wd 0.0500 time 0.2535 (0.2581) data time 0.0006 (0.0021) model time 0.2529 (0.2561) loss 5.7957 (5.9726) grad_norm 1.3832 (2.0983) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:02:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][620/625] eta 0:00:01 lr 0.001021 wd 0.0500 time 0.2537 (0.2580) data time 0.0004 (0.0021) model time 0.2533 (0.2560) loss 6.7805 (5.9766) grad_norm 3.0983 (2.0967) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:02:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 158 training takes 0:02:41
[2024-08-04 05:02:04 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 05:02:04 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 05:02:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.494 (0.494) Loss 0.6670 (0.6670) Acc@1 87.939 (87.939) Acc@5 98.389 (98.389) Mem 9655MB
[2024-08-04 05:02:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.065 (0.099) Loss 1.0762 (0.7967) Acc@1 76.855 (84.135) Acc@5 94.043 (97.021) Mem 9655MB
[2024-08-04 05:02:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.1553 (0.9355) Acc@1 74.609 (80.613) Acc@5 93.701 (95.519) Mem 9655MB
[2024-08-04 05:02:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.340 Acc@5 95.469
[2024-08-04 05:02:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.3%
[2024-08-04 05:02:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.927 (0.927) Loss 0.5913 (0.5913) Acc@1 89.111 (89.111) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 05:02:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.136) Loss 0.9395 (0.7300) Acc@1 79.248 (85.334) Acc@5 95.361 (97.505) Mem 9655MB
[2024-08-04 05:02:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.097) Loss 1.0713 (0.8600) Acc@1 75.244 (81.848) Acc@5 93.896 (96.061) Mem 9655MB
[2024-08-04 05:02:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.556 Acc@5 96.041
[2024-08-04 05:02:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.6%
[2024-08-04 05:02:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.56%
[2024-08-04 05:02:09 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 05:02:09 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 05:02:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][0/625] eta 0:07:41 lr 0.001021 wd 0.0500 time 0.7391 (0.7391) data time 0.4990 (0.4990) model time 0.0000 (0.0000) loss 5.1430 (5.1430) grad_norm 1.6946 (1.6946) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:02:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][10/625] eta 0:03:03 lr 0.001021 wd 0.0500 time 0.2542 (0.2988) data time 0.0007 (0.0461) model time 0.0000 (0.0000) loss 5.5916 (5.9173) grad_norm 1.5893 (1.9691) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:02:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][20/625] eta 0:02:48 lr 0.001021 wd 0.0500 time 0.2578 (0.2781) data time 0.0008 (0.0246) model time 0.0000 (0.0000) loss 5.3966 (5.9800) grad_norm 1.3356 (1.7886) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:02:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][30/625] eta 0:02:52 lr 0.001021 wd 0.0500 time 0.4398 (0.2901) data time 0.0007 (0.0169) model time 0.0000 (0.0000) loss 6.3483 (5.9933) grad_norm 1.3736 (1.8565) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:02:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][40/625] eta 0:02:44 lr 0.001020 wd 0.0500 time 0.2549 (0.2816) data time 0.0008 (0.0130) model time 0.0000 (0.0000) loss 5.9376 (6.0057) grad_norm 2.0934 (1.9018) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:02:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][50/625] eta 0:02:39 lr 0.001020 wd 0.0500 time 0.2536 (0.2771) data time 0.0010 (0.0107) model time 0.0000 (0.0000) loss 6.4716 (6.0212) grad_norm 1.5800 (1.9150) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:02:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][60/625] eta 0:02:34 lr 0.001020 wd 0.0500 time 0.2592 (0.2736) data time 0.0006 (0.0091) model time 0.2586 (0.2550) loss 6.6296 (6.0223) grad_norm 1.7542 (1.8862) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:02:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][70/625] eta 0:02:30 lr 0.001020 wd 0.0500 time 0.2574 (0.2712) data time 0.0006 (0.0079) model time 0.2568 (0.2554) loss 6.5729 (5.9775) grad_norm 1.5210 (1.8885) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:02:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][80/625] eta 0:02:26 lr 0.001020 wd 0.0500 time 0.2610 (0.2694) data time 0.0006 (0.0071) model time 0.2604 (0.2554) loss 6.1461 (5.9477) grad_norm 3.0818 (1.8999) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:02:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][90/625] eta 0:02:23 lr 0.001020 wd 0.0500 time 0.2531 (0.2678) data time 0.0008 (0.0065) model time 0.2523 (0.2548) loss 5.9499 (5.9507) grad_norm 2.7614 (1.9475) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:02:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][100/625] eta 0:02:19 lr 0.001019 wd 0.0500 time 0.2526 (0.2666) data time 0.0009 (0.0060) model time 0.2518 (0.2547) loss 5.0658 (5.9749) grad_norm 2.6036 (1.9602) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:02:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][110/625] eta 0:02:16 lr 0.001019 wd 0.0500 time 0.2583 (0.2657) data time 0.0009 (0.0055) model time 0.2573 (0.2549) loss 7.2010 (6.0175) grad_norm 2.5911 (2.0777) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:02:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][120/625] eta 0:02:13 lr 0.001019 wd 0.0500 time 0.2620 (0.2649) data time 0.0007 (0.0051) model time 0.2613 (0.2550) loss 6.4883 (6.0287) grad_norm 2.0523 (2.0760) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:02:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][130/625] eta 0:02:10 lr 0.001019 wd 0.0500 time 0.2536 (0.2642) data time 0.0010 (0.0048) model time 0.2526 (0.2549) loss 6.1730 (6.0007) grad_norm 1.9635 (2.1487) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:02:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][140/625] eta 0:02:07 lr 0.001019 wd 0.0500 time 0.2594 (0.2637) data time 0.0007 (0.0045) model time 0.2587 (0.2550) loss 5.7934 (5.9918) grad_norm 2.3232 (2.1701) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:02:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][150/625] eta 0:02:05 lr 0.001018 wd 0.0500 time 0.2569 (0.2649) data time 0.0008 (0.0043) model time 0.2561 (0.2576) loss 5.4892 (5.9752) grad_norm 1.3081 (2.1423) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:02:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][160/625] eta 0:02:02 lr 0.001018 wd 0.0500 time 0.2584 (0.2643) data time 0.0006 (0.0041) model time 0.2578 (0.2574) loss 6.1800 (5.9828) grad_norm 2.7053 (2.1156) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:02:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][170/625] eta 0:02:00 lr 0.001018 wd 0.0500 time 0.2572 (0.2640) data time 0.0006 (0.0039) model time 0.2566 (0.2574) loss 6.4896 (6.0079) grad_norm 2.2995 (2.1011) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:02:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][180/625] eta 0:01:57 lr 0.001018 wd 0.0500 time 0.2505 (0.2636) data time 0.0009 (0.0037) model time 0.2496 (0.2572) loss 5.8917 (6.0293) grad_norm 1.6602 (2.0927) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:02:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][190/625] eta 0:01:54 lr 0.001018 wd 0.0500 time 0.2515 (0.2631) data time 0.0008 (0.0036) model time 0.2507 (0.2571) loss 7.0881 (6.0428) grad_norm 2.3759 (2.0797) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:03:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][200/625] eta 0:01:51 lr 0.001018 wd 0.0500 time 0.2627 (0.2630) data time 0.0008 (0.0035) model time 0.2619 (0.2572) loss 5.0617 (6.0127) grad_norm 1.6322 (2.0595) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:03:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][210/625] eta 0:01:49 lr 0.001017 wd 0.0500 time 0.2584 (0.2627) data time 0.0008 (0.0033) model time 0.2575 (0.2571) loss 5.6595 (5.9994) grad_norm 2.6638 (2.0601) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:03:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][220/625] eta 0:01:46 lr 0.001017 wd 0.0500 time 0.2554 (0.2623) data time 0.0008 (0.0032) model time 0.2545 (0.2570) loss 6.3423 (5.9930) grad_norm 1.5210 (2.0741) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:03:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][230/625] eta 0:01:43 lr 0.001017 wd 0.0500 time 0.2543 (0.2621) data time 0.0006 (0.0031) model time 0.2536 (0.2568) loss 5.1529 (5.9811) grad_norm 2.6927 (2.0867) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:03:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][240/625] eta 0:01:40 lr 0.001017 wd 0.0500 time 0.2546 (0.2619) data time 0.0008 (0.0030) model time 0.2538 (0.2568) loss 7.2986 (5.9925) grad_norm 1.9267 (2.0823) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:03:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][250/625] eta 0:01:38 lr 0.001017 wd 0.0500 time 0.2522 (0.2616) data time 0.0009 (0.0030) model time 0.2514 (0.2567) loss 5.7190 (5.9912) grad_norm 1.4290 (2.0730) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:03:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][260/625] eta 0:01:35 lr 0.001016 wd 0.0500 time 0.2558 (0.2614) data time 0.0009 (0.0029) model time 0.2550 (0.2566) loss 5.5480 (5.9997) grad_norm 1.9148 (2.0770) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:03:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][270/625] eta 0:01:33 lr 0.001016 wd 0.0500 time 0.4603 (0.2627) data time 0.0006 (0.0028) model time 0.4597 (0.2584) loss 6.4117 (5.9930) grad_norm 1.9223 (2.0649) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:03:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][280/625] eta 0:01:30 lr 0.001016 wd 0.0500 time 0.2591 (0.2633) data time 0.0009 (0.0027) model time 0.2582 (0.2592) loss 5.7763 (5.9955) grad_norm 1.7353 (2.0708) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:03:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][290/625] eta 0:01:28 lr 0.001016 wd 0.0500 time 0.2569 (0.2630) data time 0.0011 (0.0027) model time 0.2558 (0.2590) loss 5.3230 (5.9975) grad_norm 1.4240 (2.0566) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:03:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][300/625] eta 0:01:25 lr 0.001016 wd 0.0500 time 0.2577 (0.2628) data time 0.0007 (0.0026) model time 0.2570 (0.2589) loss 5.0456 (5.9867) grad_norm 1.2751 (2.0421) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:03:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][310/625] eta 0:01:22 lr 0.001016 wd 0.0500 time 0.2594 (0.2626) data time 0.0010 (0.0026) model time 0.2584 (0.2588) loss 5.4649 (5.9891) grad_norm 1.3780 (2.0423) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:03:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][320/625] eta 0:01:20 lr 0.001015 wd 0.0500 time 0.2575 (0.2624) data time 0.0008 (0.0025) model time 0.2566 (0.2587) loss 6.6577 (5.9853) grad_norm 3.3335 (2.0423) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:03:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][330/625] eta 0:01:17 lr 0.001015 wd 0.0500 time 0.4390 (0.2628) data time 0.0008 (0.0025) model time 0.4381 (0.2592) loss 6.6276 (5.9858) grad_norm 1.8940 (2.0439) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:03:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][340/625] eta 0:01:14 lr 0.001015 wd 0.0500 time 0.2581 (0.2626) data time 0.0008 (0.0024) model time 0.2573 (0.2590) loss 5.7722 (5.9737) grad_norm 1.6832 (2.0309) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:03:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][350/625] eta 0:01:12 lr 0.001015 wd 0.0500 time 0.2545 (0.2624) data time 0.0008 (0.0024) model time 0.2536 (0.2589) loss 6.6898 (5.9678) grad_norm 2.0680 (2.0370) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:03:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][360/625] eta 0:01:09 lr 0.001015 wd 0.0500 time 0.2558 (0.2622) data time 0.0007 (0.0024) model time 0.2551 (0.2588) loss 7.1415 (5.9780) grad_norm 1.8682 (2.0408) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:03:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][370/625] eta 0:01:06 lr 0.001015 wd 0.0500 time 0.2532 (0.2621) data time 0.0007 (0.0023) model time 0.2525 (0.2587) loss 7.0551 (5.9780) grad_norm 1.6385 (2.0364) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:03:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][380/625] eta 0:01:04 lr 0.001014 wd 0.0500 time 0.2572 (0.2619) data time 0.0010 (0.0023) model time 0.2563 (0.2586) loss 7.1652 (5.9776) grad_norm 2.4457 (2.0311) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:03:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][390/625] eta 0:01:01 lr 0.001014 wd 0.0500 time 0.2591 (0.2618) data time 0.0006 (0.0022) model time 0.2585 (0.2585) loss 4.1014 (5.9710) grad_norm 1.4236 (2.0366) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:03:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][400/625] eta 0:00:58 lr 0.001014 wd 0.0500 time 0.2549 (0.2617) data time 0.0009 (0.0022) model time 0.2539 (0.2585) loss 6.9845 (5.9719) grad_norm 1.4342 (2.0309) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:03:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][410/625] eta 0:00:56 lr 0.001014 wd 0.0500 time 0.2609 (0.2616) data time 0.0007 (0.0022) model time 0.2601 (0.2584) loss 6.6378 (5.9741) grad_norm 1.9294 (2.0263) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:03:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][420/625] eta 0:00:53 lr 0.001014 wd 0.0500 time 0.2601 (0.2615) data time 0.0008 (0.0021) model time 0.2593 (0.2584) loss 6.3427 (5.9750) grad_norm 1.3156 (2.0375) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:04:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][430/625] eta 0:00:50 lr 0.001013 wd 0.0500 time 0.2555 (0.2613) data time 0.0008 (0.0021) model time 0.2547 (0.2583) loss 6.2235 (5.9718) grad_norm 1.5976 (2.0342) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:04:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][440/625] eta 0:00:48 lr 0.001013 wd 0.0500 time 0.2612 (0.2613) data time 0.0008 (0.0021) model time 0.2604 (0.2582) loss 6.3719 (5.9693) grad_norm 2.2735 (2.0386) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:04:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][450/625] eta 0:00:45 lr 0.001013 wd 0.0500 time 0.2518 (0.2611) data time 0.0008 (0.0021) model time 0.2511 (0.2582) loss 6.7488 (5.9665) grad_norm 2.1276 (2.0426) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:04:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][460/625] eta 0:00:43 lr 0.001013 wd 0.0500 time 0.2543 (0.2610) data time 0.0009 (0.0020) model time 0.2534 (0.2581) loss 5.8192 (5.9672) grad_norm 1.9352 (2.0442) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:04:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][470/625] eta 0:00:40 lr 0.001013 wd 0.0500 time 0.2522 (0.2609) data time 0.0008 (0.0020) model time 0.2514 (0.2580) loss 6.2043 (5.9729) grad_norm 1.5969 (2.0742) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:04:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][480/625] eta 0:00:37 lr 0.001013 wd 0.0500 time 0.2549 (0.2608) data time 0.0009 (0.0020) model time 0.2539 (0.2580) loss 5.9340 (5.9720) grad_norm 1.2447 (2.0722) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:04:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][490/625] eta 0:00:35 lr 0.001012 wd 0.0500 time 0.2629 (0.2608) data time 0.0008 (0.0020) model time 0.2621 (0.2580) loss 6.0805 (5.9749) grad_norm 2.7074 (2.0717) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:04:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][500/625] eta 0:00:32 lr 0.001012 wd 0.0500 time 0.2656 (0.2607) data time 0.0007 (0.0020) model time 0.2649 (0.2579) loss 4.8567 (5.9741) grad_norm 1.5655 (2.0726) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:04:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][510/625] eta 0:00:29 lr 0.001012 wd 0.0500 time 0.2556 (0.2607) data time 0.0010 (0.0019) model time 0.2546 (0.2579) loss 6.5009 (5.9811) grad_norm 2.0312 (2.0649) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:04:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][520/625] eta 0:00:27 lr 0.001012 wd 0.0500 time 0.2511 (0.2606) data time 0.0006 (0.0019) model time 0.2505 (0.2578) loss 5.5897 (5.9773) grad_norm 1.5745 (2.0610) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:04:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][530/625] eta 0:00:24 lr 0.001012 wd 0.0500 time 0.2564 (0.2605) data time 0.0007 (0.0019) model time 0.2557 (0.2578) loss 5.6656 (5.9820) grad_norm 1.8763 (2.0613) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:04:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][540/625] eta 0:00:22 lr 0.001012 wd 0.0500 time 0.2538 (0.2604) data time 0.0009 (0.0019) model time 0.2529 (0.2577) loss 6.9458 (5.9792) grad_norm 1.4933 (2.0656) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:04:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][550/625] eta 0:00:19 lr 0.001011 wd 0.0500 time 0.2567 (0.2604) data time 0.0010 (0.0019) model time 0.2557 (0.2577) loss 6.8228 (5.9849) grad_norm 1.3411 (2.0648) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:04:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][560/625] eta 0:00:16 lr 0.001011 wd 0.0500 time 0.2637 (0.2603) data time 0.0007 (0.0018) model time 0.2630 (0.2577) loss 5.3215 (5.9874) grad_norm 1.2750 (2.0654) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:04:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][570/625] eta 0:00:14 lr 0.001011 wd 0.0500 time 0.2571 (0.2602) data time 0.0006 (0.0018) model time 0.2565 (0.2576) loss 6.8624 (5.9945) grad_norm 1.5074 (2.0633) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:04:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][580/625] eta 0:00:11 lr 0.001011 wd 0.0500 time 0.2632 (0.2602) data time 0.0010 (0.0018) model time 0.2622 (0.2576) loss 6.5599 (5.9959) grad_norm 2.0407 (2.0639) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:04:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][590/625] eta 0:00:09 lr 0.001011 wd 0.0500 time 0.2538 (0.2601) data time 0.0007 (0.0018) model time 0.2531 (0.2576) loss 4.9956 (5.9918) grad_norm 1.5092 (2.0615) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:04:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][600/625] eta 0:00:06 lr 0.001010 wd 0.0500 time 0.2604 (0.2601) data time 0.0009 (0.0018) model time 0.2594 (0.2575) loss 5.5828 (5.9838) grad_norm 2.7102 (2.0677) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:04:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][610/625] eta 0:00:03 lr 0.001010 wd 0.0500 time 0.2536 (0.2600) data time 0.0006 (0.0018) model time 0.2530 (0.2575) loss 6.2084 (5.9856) grad_norm 1.7693 (2.0725) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:04:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][620/625] eta 0:00:01 lr 0.001010 wd 0.0500 time 0.2535 (0.2599) data time 0.0006 (0.0018) model time 0.2530 (0.2574) loss 6.3855 (5.9883) grad_norm 1.4009 (2.0710) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:04:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 159 training takes 0:02:42
[2024-08-04 05:04:52 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 05:04:52 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 05:04:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.495 (0.495) Loss 0.6479 (0.6479) Acc@1 88.672 (88.672) Acc@5 98.291 (98.291) Mem 9655MB
[2024-08-04 05:04:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 1.0527 (0.8088) Acc@1 77.002 (84.246) Acc@5 95.117 (96.999) Mem 9655MB
[2024-08-04 05:04:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.1445 (0.9481) Acc@1 74.805 (80.715) Acc@5 93.408 (95.503) Mem 9655MB
[2024-08-04 05:04:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.458 Acc@5 95.533
[2024-08-04 05:04:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.5%
[2024-08-04 05:04:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.46%
[2024-08-04 05:04:54 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-08-04 05:04:54 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-08-04 05:04:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.497 (0.497) Loss 0.5913 (0.5913) Acc@1 89.160 (89.160) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 05:04:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 0.9380 (0.7300) Acc@1 79.297 (85.378) Acc@5 95.264 (97.488) Mem 9655MB
[2024-08-04 05:04:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0713 (0.8597) Acc@1 75.293 (81.871) Acc@5 93.994 (96.073) Mem 9655MB
[2024-08-04 05:04:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.582 Acc@5 96.053
[2024-08-04 05:04:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.6%
[2024-08-04 05:04:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.58%
[2024-08-04 05:04:56 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 05:04:57 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 05:04:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][0/625] eta 0:07:14 lr 0.001010 wd 0.0500 time 0.6954 (0.6954) data time 0.4565 (0.4565) model time 0.0000 (0.0000) loss 5.8125 (5.8125) grad_norm 1.6924 (1.6924) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:05:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][10/625] eta 0:03:02 lr 0.001010 wd 0.0500 time 0.2560 (0.2961) data time 0.0008 (0.0423) model time 0.0000 (0.0000) loss 5.3449 (5.5715) grad_norm 1.4549 (1.9094) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:05:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][20/625] eta 0:02:47 lr 0.001010 wd 0.0500 time 0.2549 (0.2771) data time 0.0008 (0.0226) model time 0.0000 (0.0000) loss 7.2271 (5.6759) grad_norm 1.7601 (2.0069) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:05:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][30/625] eta 0:02:40 lr 0.001009 wd 0.0500 time 0.2569 (0.2704) data time 0.0011 (0.0156) model time 0.0000 (0.0000) loss 6.1557 (5.8901) grad_norm 2.1483 (2.0742) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:05:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][40/625] eta 0:02:36 lr 0.001009 wd 0.0500 time 0.2525 (0.2670) data time 0.0007 (0.0120) model time 0.0000 (0.0000) loss 6.0383 (5.8712) grad_norm 2.7695 (2.0534) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:05:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][50/625] eta 0:02:32 lr 0.001009 wd 0.0500 time 0.2607 (0.2648) data time 0.0009 (0.0098) model time 0.0000 (0.0000) loss 5.6794 (5.9138) grad_norm 1.3844 (2.0460) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:05:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][60/625] eta 0:02:28 lr 0.001009 wd 0.0500 time 0.2556 (0.2635) data time 0.0008 (0.0083) model time 0.2548 (0.2558) loss 5.6912 (5.9284) grad_norm 1.7416 (2.0740) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:05:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][70/625] eta 0:02:25 lr 0.001009 wd 0.0500 time 0.2565 (0.2624) data time 0.0008 (0.0073) model time 0.2556 (0.2555) loss 4.9248 (5.9404) grad_norm 2.2138 (2.0855) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:05:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][80/625] eta 0:02:22 lr 0.001009 wd 0.0500 time 0.2562 (0.2618) data time 0.0011 (0.0065) model time 0.2550 (0.2557) loss 6.5420 (5.9694) grad_norm 1.3500 (2.0478) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:05:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][90/625] eta 0:02:19 lr 0.001008 wd 0.0500 time 0.2526 (0.2611) data time 0.0008 (0.0059) model time 0.2517 (0.2554) loss 6.8696 (5.9791) grad_norm 1.3313 (1.9851) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:05:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][100/625] eta 0:02:18 lr 0.001008 wd 0.0500 time 0.2553 (0.2646) data time 0.0007 (0.0054) model time 0.2546 (0.2635) loss 4.7577 (5.9804) grad_norm 1.3469 (1.9504) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:05:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][110/625] eta 0:02:16 lr 0.001008 wd 0.0500 time 0.2522 (0.2656) data time 0.0009 (0.0050) model time 0.2513 (0.2654) loss 6.5932 (6.0004) grad_norm 2.5159 (1.9386) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:05:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][120/625] eta 0:02:13 lr 0.001008 wd 0.0500 time 0.2577 (0.2649) data time 0.0007 (0.0047) model time 0.2570 (0.2639) loss 6.3980 (6.0004) grad_norm 2.8718 (1.9737) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:05:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][130/625] eta 0:02:10 lr 0.001008 wd 0.0500 time 0.2518 (0.2642) data time 0.0007 (0.0044) model time 0.2511 (0.2628) loss 5.2608 (5.9637) grad_norm 1.8502 (1.9942) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:05:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][140/625] eta 0:02:07 lr 0.001008 wd 0.0500 time 0.2578 (0.2638) data time 0.0009 (0.0042) model time 0.2570 (0.2622) loss 6.9161 (5.9711) grad_norm 1.3386 (2.0206) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:05:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][150/625] eta 0:02:05 lr 0.001007 wd 0.0500 time 0.2618 (0.2633) data time 0.0007 (0.0039) model time 0.2611 (0.2615) loss 5.9186 (5.9646) grad_norm 5.0265 (2.0656) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:05:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][160/625] eta 0:02:02 lr 0.001007 wd 0.0500 time 0.2641 (0.2629) data time 0.0009 (0.0038) model time 0.2631 (0.2611) loss 6.2862 (5.9655) grad_norm 2.0957 (2.0563) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:05:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][170/625] eta 0:01:59 lr 0.001007 wd 0.0500 time 0.2540 (0.2625) data time 0.0012 (0.0036) model time 0.2528 (0.2605) loss 6.5320 (5.9481) grad_norm 1.4812 (2.0694) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:05:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][180/625] eta 0:01:56 lr 0.001007 wd 0.0500 time 0.2560 (0.2621) data time 0.0011 (0.0034) model time 0.2549 (0.2601) loss 6.1451 (5.9487) grad_norm 1.0964 (2.0596) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:05:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][190/625] eta 0:01:53 lr 0.001007 wd 0.0500 time 0.2584 (0.2617) data time 0.0010 (0.0033) model time 0.2574 (0.2597) loss 6.1072 (5.9687) grad_norm 3.2689 (2.0688) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:05:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][200/625] eta 0:01:51 lr 0.001006 wd 0.0500 time 0.2638 (0.2615) data time 0.0008 (0.0032) model time 0.2631 (0.2595) loss 6.1525 (5.9660) grad_norm 2.3576 (2.0726) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:05:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][210/625] eta 0:01:48 lr 0.001006 wd 0.0500 time 0.2547 (0.2613) data time 0.0011 (0.0031) model time 0.2537 (0.2592) loss 6.7211 (5.9791) grad_norm 2.1244 (2.0861) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:05:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][220/625] eta 0:01:45 lr 0.001006 wd 0.0500 time 0.2535 (0.2610) data time 0.0007 (0.0030) model time 0.2528 (0.2589) loss 5.5308 (5.9692) grad_norm 2.2872 (2.0713) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:05:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][230/625] eta 0:01:43 lr 0.001006 wd 0.0500 time 0.2550 (0.2608) data time 0.0009 (0.0029) model time 0.2541 (0.2587) loss 6.8372 (5.9732) grad_norm 1.5815 (2.0671) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:06:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][240/625] eta 0:01:40 lr 0.001006 wd 0.0500 time 0.2588 (0.2606) data time 0.0005 (0.0028) model time 0.2583 (0.2585) loss 4.3329 (5.9539) grad_norm 1.9983 (2.0630) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:06:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][250/625] eta 0:01:37 lr 0.001006 wd 0.0500 time 0.2569 (0.2605) data time 0.0009 (0.0028) model time 0.2560 (0.2584) loss 6.7001 (5.9585) grad_norm 2.8089 (2.0705) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:06:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][260/625] eta 0:01:34 lr 0.001005 wd 0.0500 time 0.2542 (0.2603) data time 0.0009 (0.0027) model time 0.2532 (0.2582) loss 5.3672 (5.9539) grad_norm 1.7968 (2.0555) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:06:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][270/625] eta 0:01:32 lr 0.001005 wd 0.0500 time 0.2548 (0.2601) data time 0.0006 (0.0026) model time 0.2542 (0.2580) loss 5.0394 (5.9591) grad_norm 2.3413 (2.0565) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:06:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][280/625] eta 0:01:29 lr 0.001005 wd 0.0500 time 0.2543 (0.2599) data time 0.0007 (0.0026) model time 0.2536 (0.2579) loss 5.7421 (5.9621) grad_norm 2.2377 (2.0510) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:06:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][290/625] eta 0:01:27 lr 0.001005 wd 0.0500 time 0.2567 (0.2598) data time 0.0009 (0.0025) model time 0.2559 (0.2577) loss 7.0525 (5.9538) grad_norm 3.2384 (2.0648) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:06:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][300/625] eta 0:01:24 lr 0.001005 wd 0.0500 time 0.2570 (0.2597) data time 0.0011 (0.0025) model time 0.2559 (0.2576) loss 6.2632 (5.9667) grad_norm 2.0019 (2.0637) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:06:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][310/625] eta 0:01:21 lr 0.001004 wd 0.0500 time 0.2572 (0.2595) data time 0.0010 (0.0024) model time 0.2562 (0.2575) loss 6.5330 (5.9742) grad_norm 1.4601 (2.0600) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:06:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][320/625] eta 0:01:19 lr 0.001004 wd 0.0500 time 0.2596 (0.2594) data time 0.0009 (0.0024) model time 0.2588 (0.2574) loss 6.2335 (5.9705) grad_norm 2.0133 (2.0553) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:06:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][330/625] eta 0:01:16 lr 0.001004 wd 0.0500 time 0.2569 (0.2593) data time 0.0009 (0.0023) model time 0.2560 (0.2573) loss 5.4393 (5.9709) grad_norm 2.3840 (2.0674) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:06:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][340/625] eta 0:01:13 lr 0.001004 wd 0.0500 time 0.2542 (0.2592) data time 0.0007 (0.0023) model time 0.2534 (0.2573) loss 5.1423 (5.9594) grad_norm 2.1576 (2.0571) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:06:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][350/625] eta 0:01:11 lr 0.001004 wd 0.0500 time 0.2601 (0.2591) data time 0.0009 (0.0022) model time 0.2592 (0.2572) loss 6.2931 (5.9618) grad_norm 3.1898 (2.0599) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:06:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][360/625] eta 0:01:08 lr 0.001004 wd 0.0500 time 0.2581 (0.2590) data time 0.0008 (0.0022) model time 0.2573 (0.2571) loss 5.7736 (5.9671) grad_norm 1.5726 (2.0562) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:06:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][370/625] eta 0:01:06 lr 0.001003 wd 0.0500 time 0.2561 (0.2590) data time 0.0010 (0.0022) model time 0.2550 (0.2571) loss 5.9178 (5.9642) grad_norm 1.9056 (2.0877) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:06:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][380/625] eta 0:01:03 lr 0.001003 wd 0.0500 time 0.2552 (0.2589) data time 0.0009 (0.0021) model time 0.2543 (0.2570) loss 5.2792 (5.9643) grad_norm 2.0310 (2.0890) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:06:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][390/625] eta 0:01:00 lr 0.001003 wd 0.0500 time 0.2601 (0.2589) data time 0.0008 (0.0021) model time 0.2593 (0.2570) loss 5.2679 (5.9656) grad_norm 1.6005 (2.1003) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:06:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][400/625] eta 0:00:58 lr 0.001003 wd 0.0500 time 0.2590 (0.2588) data time 0.0007 (0.0021) model time 0.2583 (0.2570) loss 5.8437 (5.9659) grad_norm 1.6707 (2.0915) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:06:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][410/625] eta 0:00:55 lr 0.001003 wd 0.0500 time 0.4684 (0.2593) data time 0.0012 (0.0021) model time 0.4672 (0.2575) loss 5.0422 (5.9655) grad_norm 1.5480 (2.0906) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:06:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][420/625] eta 0:00:53 lr 0.001003 wd 0.0500 time 0.2538 (0.2601) data time 0.0008 (0.0020) model time 0.2531 (0.2585) loss 7.3336 (5.9640) grad_norm 2.4188 (2.0864) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:06:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][430/625] eta 0:00:50 lr 0.001002 wd 0.0500 time 0.2561 (0.2600) data time 0.0007 (0.0020) model time 0.2554 (0.2584) loss 4.8197 (5.9629) grad_norm 1.7829 (2.0825) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:06:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][440/625] eta 0:00:48 lr 0.001002 wd 0.0500 time 0.2590 (0.2599) data time 0.0009 (0.0020) model time 0.2581 (0.2584) loss 6.1780 (5.9662) grad_norm 3.3201 (2.0868) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:06:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][450/625] eta 0:00:45 lr 0.001002 wd 0.0500 time 0.2545 (0.2599) data time 0.0008 (0.0020) model time 0.2537 (0.2583) loss 6.0117 (5.9637) grad_norm 1.3246 (2.0835) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:06:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][460/625] eta 0:00:42 lr 0.001002 wd 0.0500 time 0.2626 (0.2598) data time 0.0009 (0.0019) model time 0.2617 (0.2582) loss 4.6413 (5.9687) grad_norm 1.5812 (2.0754) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:06:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][470/625] eta 0:00:40 lr 0.001002 wd 0.0500 time 0.2519 (0.2597) data time 0.0007 (0.0019) model time 0.2512 (0.2582) loss 5.1041 (5.9633) grad_norm 2.3888 (2.0941) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:07:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][480/625] eta 0:00:37 lr 0.001001 wd 0.0500 time 0.2582 (0.2597) data time 0.0007 (0.0019) model time 0.2575 (0.2581) loss 5.8486 (5.9568) grad_norm 1.5635 (2.0926) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:07:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][490/625] eta 0:00:35 lr 0.001001 wd 0.0500 time 0.2643 (0.2596) data time 0.0008 (0.0019) model time 0.2636 (0.2580) loss 5.0317 (5.9635) grad_norm 3.5944 (2.1000) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 05:07:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][500/625] eta 0:00:32 lr 0.001001 wd 0.0500 time 0.2545 (0.2595) data time 0.0008 (0.0019) model time 0.2538 (0.2580) loss 5.6910 (5.9629) grad_norm 4.0096 (inf) loss_scale 1024.0000 (2039.8244) mem 9655MB
[2024-08-04 05:07:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][510/625] eta 0:00:29 lr 0.001001 wd 0.0500 time 0.2554 (0.2595) data time 0.0008 (0.0019) model time 0.2546 (0.2579) loss 4.7643 (5.9576) grad_norm 1.9343 (inf) loss_scale 1024.0000 (2019.9452) mem 9655MB
[2024-08-04 05:07:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][520/625] eta 0:00:27 lr 0.001001 wd 0.0500 time 0.2596 (0.2594) data time 0.0007 (0.0018) model time 0.2589 (0.2579) loss 6.4568 (5.9616) grad_norm 1.1119 (inf) loss_scale 1024.0000 (2000.8292) mem 9655MB
[2024-08-04 05:07:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][530/625] eta 0:00:24 lr 0.001001 wd 0.0500 time 0.2589 (0.2598) data time 0.0009 (0.0018) model time 0.2579 (0.2583) loss 6.2946 (5.9606) grad_norm 1.2222 (inf) loss_scale 1024.0000 (1982.4331) mem 9655MB
[2024-08-04 05:07:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][540/625] eta 0:00:22 lr 0.001000 wd 0.0500 time 0.2550 (0.2604) data time 0.0010 (0.0018) model time 0.2540 (0.2590) loss 6.5424 (5.9701) grad_norm 3.1307 (inf) loss_scale 1024.0000 (1964.7172) mem 9655MB
[2024-08-04 05:07:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][550/625] eta 0:00:19 lr 0.001000 wd 0.0500 time 0.2535 (0.2603) data time 0.0008 (0.0018) model time 0.2526 (0.2589) loss 5.4506 (5.9660) grad_norm 3.4089 (inf) loss_scale 1024.0000 (1947.6443) mem 9655MB
[2024-08-04 05:07:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][560/625] eta 0:00:16 lr 0.001000 wd 0.0500 time 0.2561 (0.2603) data time 0.0007 (0.0018) model time 0.2554 (0.2589) loss 5.7859 (5.9663) grad_norm 1.6976 (inf) loss_scale 1024.0000 (1931.1800) mem 9655MB
[2024-08-04 05:07:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][570/625] eta 0:00:14 lr 0.001000 wd 0.0500 time 0.2587 (0.2602) data time 0.0008 (0.0018) model time 0.2579 (0.2588) loss 6.7472 (5.9653) grad_norm 2.0242 (inf) loss_scale 1024.0000 (1915.2925) mem 9655MB
[2024-08-04 05:07:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][580/625] eta 0:00:11 lr 0.001000 wd 0.0500 time 0.2546 (0.2602) data time 0.0007 (0.0017) model time 0.2539 (0.2587) loss 5.3325 (5.9669) grad_norm 1.6689 (inf) loss_scale 1024.0000 (1899.9518) mem 9655MB
[2024-08-04 05:07:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][590/625] eta 0:00:09 lr 0.001000 wd 0.0500 time 0.2575 (0.2601) data time 0.0007 (0.0017) model time 0.2567 (0.2587) loss 5.9035 (5.9655) grad_norm 3.1507 (inf) loss_scale 1024.0000 (1885.1303) mem 9655MB
[2024-08-04 05:07:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][600/625] eta 0:00:06 lr 0.000999 wd 0.0500 time 0.2564 (0.2600) data time 0.0008 (0.0017) model time 0.2556 (0.2586) loss 4.8495 (5.9641) grad_norm 1.4274 (inf) loss_scale 1024.0000 (1870.8020) mem 9655MB
[2024-08-04 05:07:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][610/625] eta 0:00:03 lr 0.000999 wd 0.0500 time 0.2516 (0.2600) data time 0.0004 (0.0017) model time 0.2512 (0.2585) loss 5.5477 (5.9615) grad_norm 2.0242 (inf) loss_scale 1024.0000 (1856.9427) mem 9655MB
[2024-08-04 05:07:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][620/625] eta 0:00:01 lr 0.000999 wd 0.0500 time 0.2545 (0.2599) data time 0.0004 (0.0017) model time 0.2541 (0.2585) loss 6.3309 (5.9588) grad_norm 1.3731 (inf) loss_scale 1024.0000 (1843.5298) mem 9655MB
[2024-08-04 05:07:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 160 training takes 0:02:42
[2024-08-04 05:07:39 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 05:07:40 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 05:07:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.430 (0.430) Loss 0.6802 (0.6802) Acc@1 88.281 (88.281) Acc@5 98.389 (98.389) Mem 9655MB [2024-08-04 05:07:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.090) Loss 1.0176 (0.8043) Acc@1 78.223 (84.411) Acc@5 95.264 (97.128) Mem 9655MB [2024-08-04 05:07:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.073) Loss 1.1602 (0.9413) Acc@1 75.098 (80.957) Acc@5 93.359 (95.624) Mem 9655MB [2024-08-04 05:07:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.660 Acc@5 95.593 [2024-08-04 05:07:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.7% [2024-08-04 05:07:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.66% [2024-08-04 05:07:42 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 05:07:42 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 05:07:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.527 (0.527) Loss 0.5918 (0.5918) Acc@1 89.258 (89.258) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 05:07:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.102) Loss 0.9380 (0.7297) Acc@1 79.297 (85.383) Acc@5 95.410 (97.528) Mem 9655MB [2024-08-04 05:07:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.0713 (0.8593) Acc@1 75.342 (81.882) Acc@5 94.092 (96.103) Mem 9655MB [2024-08-04 05:07:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.608 Acc@5 96.085 [2024-08-04 05:07:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.6% [2024-08-04 05:07:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.61% [2024-08-04 05:07:44 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 05:07:45 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 05:07:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][0/625] eta 0:07:33 lr 0.000999 wd 0.0500 time 0.7249 (0.7249) data time 0.4676 (0.4676) model time 0.0000 (0.0000) loss 5.1431 (5.1431) grad_norm 1.7911 (1.7911) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:07:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][10/625] eta 0:03:03 lr 0.000999 wd 0.0500 time 0.2588 (0.2976) data time 0.0007 (0.0434) model time 0.0000 (0.0000) loss 6.8643 (5.7858) grad_norm 1.7519 (2.2107) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:07:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][20/625] eta 0:02:47 lr 0.000999 wd 0.0500 time 0.2538 (0.2775) data time 0.0009 (0.0232) model time 0.0000 (0.0000) loss 4.7905 (5.8188) grad_norm 1.8947 (2.2641) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:07:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][30/625] eta 0:02:40 lr 0.000998 wd 0.0500 time 0.2552 (0.2704) data time 0.0010 (0.0162) model time 0.0000 (0.0000) loss 5.9845 (5.9938) grad_norm 1.3290 (2.0904) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:07:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][40/625] eta 0:02:36 lr 0.000998 wd 0.0500 time 0.2554 (0.2671) data time 0.0009 (0.0124) model time 0.0000 (0.0000) loss 5.4159 (5.9924) grad_norm 1.5399 (2.0316) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:07:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][50/625] eta 0:02:32 lr 0.000998 wd 0.0500 time 0.2562 (0.2653) data time 0.0007 (0.0102) model time 0.0000 (0.0000) loss 4.7652 (5.9269) grad_norm 2.9939 (2.0705) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:08:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][60/625] eta 0:02:28 lr 0.000998 wd 0.0500 time 0.2550 (0.2637) 
data time 0.0011 (0.0087) model time 0.2540 (0.2548) loss 6.5195 (5.9575) grad_norm 2.3075 (2.2269) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:08:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][70/625] eta 0:02:25 lr 0.000998 wd 0.0500 time 0.2591 (0.2627) data time 0.0010 (0.0076) model time 0.2581 (0.2552) loss 6.3752 (5.9941) grad_norm 1.3455 (2.1820) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:08:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][80/625] eta 0:02:22 lr 0.000997 wd 0.0500 time 0.2553 (0.2618) data time 0.0015 (0.0068) model time 0.2538 (0.2550) loss 5.1085 (5.9909) grad_norm 2.7338 (2.1805) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:08:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][90/625] eta 0:02:19 lr 0.000997 wd 0.0500 time 0.2545 (0.2612) data time 0.0007 (0.0062) model time 0.2538 (0.2549) loss 5.4808 (6.0183) grad_norm 2.8879 (2.2285) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:08:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][100/625] eta 0:02:16 lr 0.000997 wd 0.0500 time 0.2529 (0.2606) data time 0.0010 (0.0057) model time 0.2520 (0.2549) loss 5.8901 (6.0140) grad_norm 1.1577 (2.2705) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:08:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][110/625] eta 0:02:13 lr 0.000997 wd 0.0500 time 0.2567 (0.2602) data time 0.0013 (0.0052) model time 0.2554 (0.2548) loss 6.5445 (6.0036) grad_norm 2.2341 (2.2715) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:08:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][120/625] eta 0:02:11 lr 0.000997 wd 0.0500 time 0.2570 (0.2598) data time 0.0008 (0.0049) model time 0.2562 (0.2548) loss 6.0806 (5.9770) grad_norm 2.2443 (2.2467) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 
05:08:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][130/625] eta 0:02:08 lr 0.000997 wd 0.0500 time 0.2582 (0.2594) data time 0.0009 (0.0046) model time 0.2572 (0.2547) loss 6.5328 (5.9711) grad_norm 2.1985 (2.2223) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:08:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][140/625] eta 0:02:05 lr 0.000996 wd 0.0500 time 0.2541 (0.2591) data time 0.0011 (0.0043) model time 0.2530 (0.2546) loss 4.2179 (5.9569) grad_norm 1.3600 (2.1900) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:08:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][150/625] eta 0:02:02 lr 0.000996 wd 0.0500 time 0.2541 (0.2588) data time 0.0010 (0.0041) model time 0.2531 (0.2545) loss 6.0755 (5.9410) grad_norm 2.3715 (2.1774) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:08:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][160/625] eta 0:02:00 lr 0.000996 wd 0.0500 time 0.2601 (0.2586) data time 0.0008 (0.0039) model time 0.2593 (0.2545) loss 6.6726 (5.9396) grad_norm 1.8925 (2.1938) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:08:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][170/625] eta 0:01:57 lr 0.000996 wd 0.0500 time 0.2572 (0.2593) data time 0.0007 (0.0037) model time 0.2565 (0.2557) loss 6.4758 (5.9421) grad_norm 3.3085 (2.1878) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:08:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][180/625] eta 0:01:56 lr 0.000996 wd 0.0500 time 0.2576 (0.2607) data time 0.0006 (0.0036) model time 0.2570 (0.2580) loss 6.8353 (5.9559) grad_norm 3.0063 (2.1900) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:08:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][190/625] eta 0:01:53 lr 0.000996 wd 0.0500 time 0.2550 (0.2605) data 
time 0.0009 (0.0035) model time 0.2541 (0.2577) loss 5.2500 (5.9537) grad_norm 2.7168 (2.2012) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:08:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][200/625] eta 0:01:50 lr 0.000995 wd 0.0500 time 0.2572 (0.2603) data time 0.0006 (0.0033) model time 0.2566 (0.2576) loss 6.7016 (5.9673) grad_norm 1.5612 (2.1811) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:08:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][210/625] eta 0:01:47 lr 0.000995 wd 0.0500 time 0.2563 (0.2601) data time 0.0011 (0.0032) model time 0.2551 (0.2575) loss 5.8256 (5.9550) grad_norm 2.0044 (2.1569) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:08:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][220/625] eta 0:01:45 lr 0.000995 wd 0.0500 time 0.2571 (0.2599) data time 0.0009 (0.0031) model time 0.2561 (0.2573) loss 5.8705 (5.9557) grad_norm 1.3552 (2.1512) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:08:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][230/625] eta 0:01:42 lr 0.000995 wd 0.0500 time 0.2557 (0.2597) data time 0.0011 (0.0030) model time 0.2547 (0.2572) loss 5.5183 (5.9488) grad_norm 1.5942 (2.1368) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:08:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][240/625] eta 0:01:39 lr 0.000995 wd 0.0500 time 0.2581 (0.2596) data time 0.0006 (0.0029) model time 0.2575 (0.2571) loss 5.4612 (5.9707) grad_norm 1.6947 (2.1253) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:08:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][250/625] eta 0:01:37 lr 0.000994 wd 0.0500 time 0.2551 (0.2595) data time 0.0005 (0.0028) model time 0.2545 (0.2571) loss 6.6767 (5.9704) grad_norm 3.0144 (2.1324) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 
05:08:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][260/625] eta 0:01:34 lr 0.000994 wd 0.0500 time 0.2571 (0.2594) data time 0.0007 (0.0028) model time 0.2564 (0.2570) loss 7.3007 (5.9955) grad_norm 1.3139 (2.1529) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:08:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][270/625] eta 0:01:32 lr 0.000994 wd 0.0500 time 0.2522 (0.2592) data time 0.0011 (0.0027) model time 0.2511 (0.2568) loss 5.9632 (5.9928) grad_norm 2.0708 (2.1435) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:08:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][280/625] eta 0:01:29 lr 0.000994 wd 0.0500 time 0.2558 (0.2591) data time 0.0008 (0.0026) model time 0.2549 (0.2568) loss 6.7499 (5.9985) grad_norm 1.4059 (2.1244) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:09:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][290/625] eta 0:01:26 lr 0.000994 wd 0.0500 time 0.2592 (0.2590) data time 0.0005 (0.0026) model time 0.2587 (0.2567) loss 5.0814 (6.0000) grad_norm 2.5351 (2.1293) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:09:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][300/625] eta 0:01:24 lr 0.000994 wd 0.0500 time 0.2562 (0.2589) data time 0.0008 (0.0025) model time 0.2554 (0.2566) loss 5.5494 (5.9944) grad_norm 3.4955 (2.1448) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:09:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][310/625] eta 0:01:21 lr 0.000993 wd 0.0500 time 0.2583 (0.2589) data time 0.0008 (0.0025) model time 0.2574 (0.2567) loss 5.9180 (6.0022) grad_norm 1.5614 (2.1424) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:09:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][320/625] eta 0:01:18 lr 0.000993 wd 0.0500 time 0.2586 (0.2588) data 
time 0.0008 (0.0024) model time 0.2578 (0.2566) loss 4.8993 (5.9936) grad_norm 2.5808 (2.1377) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:09:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][330/625] eta 0:01:16 lr 0.000993 wd 0.0500 time 0.2574 (0.2587) data time 0.0010 (0.0024) model time 0.2564 (0.2566) loss 6.1491 (5.9924) grad_norm 1.3225 (2.1353) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:09:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][340/625] eta 0:01:13 lr 0.000993 wd 0.0500 time 0.2697 (0.2587) data time 0.0008 (0.0023) model time 0.2689 (0.2565) loss 7.0771 (5.9949) grad_norm 2.4403 (2.1385) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:09:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][350/625] eta 0:01:11 lr 0.000993 wd 0.0500 time 0.2552 (0.2586) data time 0.0008 (0.0023) model time 0.2544 (0.2565) loss 5.7011 (5.9944) grad_norm 3.3444 (2.1279) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:09:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][360/625] eta 0:01:08 lr 0.000992 wd 0.0500 time 0.2620 (0.2585) data time 0.0009 (0.0023) model time 0.2611 (0.2564) loss 5.5078 (5.9980) grad_norm 1.3381 (2.1177) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:09:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][370/625] eta 0:01:05 lr 0.000992 wd 0.0500 time 0.2546 (0.2584) data time 0.0009 (0.0022) model time 0.2537 (0.2564) loss 5.6460 (6.0010) grad_norm 1.9466 (2.1227) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:09:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][380/625] eta 0:01:03 lr 0.000992 wd 0.0500 time 0.2557 (0.2584) data time 0.0010 (0.0022) model time 0.2547 (0.2563) loss 5.9794 (6.0047) grad_norm 2.1653 (2.1278) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 
05:09:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][390/625] eta 0:01:00 lr 0.000992 wd 0.0500 time 0.2576 (0.2588) data time 0.0010 (0.0022) model time 0.2566 (0.2569) loss 6.5557 (6.0111) grad_norm 2.3704 (2.1364) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:09:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][400/625] eta 0:00:58 lr 0.000992 wd 0.0500 time 0.2598 (0.2588) data time 0.0006 (0.0021) model time 0.2592 (0.2569) loss 6.0690 (6.0271) grad_norm 2.2998 (2.1399) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:09:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][410/625] eta 0:00:55 lr 0.000992 wd 0.0500 time 0.2519 (0.2588) data time 0.0010 (0.0021) model time 0.2509 (0.2569) loss 5.3245 (6.0265) grad_norm 1.2112 (2.1306) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:09:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][420/625] eta 0:00:53 lr 0.000991 wd 0.0500 time 0.2487 (0.2587) data time 0.0007 (0.0021) model time 0.2480 (0.2569) loss 6.3808 (6.0352) grad_norm 1.8873 (2.1236) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:09:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][430/625] eta 0:00:50 lr 0.000991 wd 0.0500 time 0.2563 (0.2587) data time 0.0008 (0.0021) model time 0.2555 (0.2568) loss 5.9268 (6.0361) grad_norm 2.2060 (2.1316) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:09:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][440/625] eta 0:00:47 lr 0.000991 wd 0.0500 time 0.2584 (0.2586) data time 0.0011 (0.0020) model time 0.2572 (0.2568) loss 5.1975 (6.0336) grad_norm 1.6531 (2.1309) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:09:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][450/625] eta 0:00:45 lr 0.000991 wd 0.0500 time 0.2509 (0.2586) data 
time 0.0008 (0.0020) model time 0.2502 (0.2568) loss 4.5145 (6.0322) grad_norm 3.8101 (2.1347) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:09:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][460/625] eta 0:00:42 lr 0.000991 wd 0.0500 time 0.2514 (0.2586) data time 0.0008 (0.0020) model time 0.2506 (0.2568) loss 5.3422 (6.0369) grad_norm 3.0372 (2.1472) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:09:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][470/625] eta 0:00:40 lr 0.000991 wd 0.0500 time 0.2526 (0.2585) data time 0.0007 (0.0020) model time 0.2519 (0.2567) loss 6.9899 (6.0342) grad_norm 1.9808 (2.1483) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:09:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][480/625] eta 0:00:37 lr 0.000990 wd 0.0500 time 0.2579 (0.2585) data time 0.0009 (0.0019) model time 0.2570 (0.2568) loss 5.8643 (6.0302) grad_norm 1.2902 (2.1405) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:09:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][490/625] eta 0:00:34 lr 0.000990 wd 0.0500 time 0.2531 (0.2585) data time 0.0007 (0.0019) model time 0.2524 (0.2567) loss 5.4725 (6.0320) grad_norm 1.4309 (2.1338) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:09:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][500/625] eta 0:00:32 lr 0.000990 wd 0.0500 time 0.2554 (0.2585) data time 0.0011 (0.0019) model time 0.2543 (0.2567) loss 4.6575 (6.0289) grad_norm 2.5110 (2.1288) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:09:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][510/625] eta 0:00:29 lr 0.000990 wd 0.0500 time 0.2575 (0.2584) data time 0.0007 (0.0019) model time 0.2568 (0.2567) loss 6.5520 (6.0187) grad_norm 2.9310 (2.1275) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 
05:09:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][520/625] eta 0:00:27 lr 0.000990 wd 0.0500 time 0.2535 (0.2584) data time 0.0010 (0.0019) model time 0.2525 (0.2566) loss 6.5818 (6.0163) grad_norm 2.0338 (2.1320) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:10:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][530/625] eta 0:00:24 lr 0.000989 wd 0.0500 time 0.2498 (0.2583) data time 0.0008 (0.0019) model time 0.2490 (0.2566) loss 5.2737 (6.0135) grad_norm 1.9665 (2.1282) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:10:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][540/625] eta 0:00:21 lr 0.000989 wd 0.0500 time 0.2522 (0.2583) data time 0.0007 (0.0019) model time 0.2515 (0.2565) loss 6.5920 (6.0177) grad_norm 1.6353 (2.1213) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:10:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][550/625] eta 0:00:19 lr 0.000989 wd 0.0500 time 0.2521 (0.2582) data time 0.0008 (0.0018) model time 0.2513 (0.2565) loss 6.7777 (6.0242) grad_norm 1.6640 (2.1161) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:10:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][560/625] eta 0:00:16 lr 0.000989 wd 0.0500 time 0.2540 (0.2582) data time 0.0008 (0.0018) model time 0.2531 (0.2565) loss 5.1298 (6.0204) grad_norm 1.5102 (2.1130) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:10:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][570/625] eta 0:00:14 lr 0.000989 wd 0.0500 time 0.2577 (0.2584) data time 0.0008 (0.0018) model time 0.2569 (0.2568) loss 6.8086 (6.0234) grad_norm 1.7494 (2.1056) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:10:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][580/625] eta 0:00:11 lr 0.000989 wd 0.0500 time 0.2575 (0.2586) data 
time 0.0006 (0.0018) model time 0.2568 (0.2570) loss 6.5826 (6.0236) grad_norm 1.7316 (2.1027) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:10:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][590/625] eta 0:00:09 lr 0.000988 wd 0.0500 time 0.2569 (0.2586) data time 0.0010 (0.0018) model time 0.2560 (0.2570) loss 6.4189 (6.0195) grad_norm 1.7161 (2.1054) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:10:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][600/625] eta 0:00:06 lr 0.000988 wd 0.0500 time 0.2574 (0.2585) data time 0.0012 (0.0018) model time 0.2562 (0.2569) loss 5.1311 (6.0149) grad_norm 1.5509 (2.1091) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:10:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][610/625] eta 0:00:03 lr 0.000988 wd 0.0500 time 0.2529 (0.2585) data time 0.0005 (0.0018) model time 0.2524 (0.2569) loss 4.7115 (6.0123) grad_norm 1.2499 (2.1043) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:10:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][620/625] eta 0:00:01 lr 0.000988 wd 0.0500 time 0.2542 (0.2584) data time 0.0003 (0.0017) model time 0.2538 (0.2568) loss 6.9039 (6.0128) grad_norm 1.8292 (2.0962) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:10:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 161 training takes 0:02:41 [2024-08-04 05:10:26 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 05:10:27 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 05:10:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.419 (0.419) Loss 0.6631 (0.6631) Acc@1 88.379 (88.379) Acc@5 98.389 (98.389) Mem 9655MB [2024-08-04 05:10:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.089) Loss 1.0176 (0.8076) Acc@1 77.686 (84.313) Acc@5 94.922 (97.150) Mem 9655MB [2024-08-04 05:10:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.073) Loss 1.1396 (0.9352) Acc@1 74.072 (80.820) Acc@5 94.141 (95.587) Mem 9655MB [2024-08-04 05:10:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.536 Acc@5 95.563 [2024-08-04 05:10:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.5% [2024-08-04 05:10:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.763 (0.763) Loss 0.5913 (0.5913) Acc@1 89.209 (89.209) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 05:10:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.127) Loss 0.9380 (0.7295) Acc@1 79.346 (85.374) Acc@5 95.312 (97.501) Mem 9655MB [2024-08-04 05:10:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 1.0693 (0.8590) Acc@1 75.195 (81.889) Acc@5 94.043 (96.091) Mem 9655MB [2024-08-04 05:10:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.608 Acc@5 96.077 [2024-08-04 05:10:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.6% [2024-08-04 05:10:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][0/625] eta 0:12:00 lr 0.000988 wd 0.0500 time 1.1524 (1.1524) data time 0.5467 (0.5467) model time 0.0000 (0.0000) loss 6.2059 (6.2059) grad_norm 1.6162 (1.6162) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:10:34 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [162/300][10/625] eta 0:03:27 lr 0.000988 wd 0.0500 time 0.2556 (0.3371) data time 0.0008 (0.0505) model time 0.0000 (0.0000) loss 5.5287 (6.0005) grad_norm 2.5246 (2.1445) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:10:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][20/625] eta 0:03:00 lr 0.000987 wd 0.0500 time 0.2538 (0.2988) data time 0.0010 (0.0270) model time 0.0000 (0.0000) loss 6.5635 (5.9722) grad_norm 1.8016 (2.1239) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:10:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][30/625] eta 0:02:49 lr 0.000987 wd 0.0500 time 0.2589 (0.2849) data time 0.0006 (0.0186) model time 0.0000 (0.0000) loss 6.0502 (6.0103) grad_norm 1.8747 (2.1348) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:10:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][40/625] eta 0:02:42 lr 0.000987 wd 0.0500 time 0.2500 (0.2778) data time 0.0007 (0.0143) model time 0.0000 (0.0000) loss 6.1504 (6.0569) grad_norm 1.5273 (2.0204) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:10:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][50/625] eta 0:02:37 lr 0.000987 wd 0.0500 time 0.2559 (0.2733) data time 0.0015 (0.0117) model time 0.0000 (0.0000) loss 5.9692 (6.0319) grad_norm 2.5254 (2.0010) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:10:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][60/625] eta 0:02:32 lr 0.000987 wd 0.0500 time 0.2573 (0.2704) data time 0.0007 (0.0099) model time 0.2565 (0.2546) loss 7.4674 (6.0674) grad_norm 1.4876 (2.0096) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:10:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][70/625] eta 0:02:28 lr 0.000987 wd 0.0500 time 0.2567 (0.2683) data time 0.0009 (0.0087) model time 0.2558 (0.2543) loss 5.5940 (6.0476) grad_norm 2.7846 (2.0138) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:10:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][80/625] eta 0:02:25 lr 0.000986 wd 0.0500 time 0.2581 (0.2667) data time 0.0006 (0.0077) model time 0.2575 (0.2544) loss 6.9757 (6.0041) grad_norm 6.2380 (2.0718) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:10:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][90/625] eta 0:02:22 lr 0.000986 wd 0.0500 time 0.2573 (0.2655) data time 0.0008 (0.0070) model time 0.2565 (0.2545) loss 5.2453 (6.0091) grad_norm 1.9587 (2.1141) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:10:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][100/625] eta 0:02:19 lr 0.000986 wd 0.0500 time 0.2556 (0.2663) data time 0.0008 (0.0064) model time 0.2548 (0.2583) loss 4.8310 (5.9771) grad_norm 1.6740 (2.1095) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:11:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][110/625] eta 0:02:17 lr 0.000986 wd 0.0500 time 0.2555 (0.2671) data time 0.0007 (0.0059) model time 0.2548 (0.2609) loss 7.0166 (6.0017) grad_norm 3.6631 (2.1367) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:11:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][120/625] eta 0:02:14 lr 0.000986 wd 0.0500 time 0.2604 (0.2663) data time 0.0008 (0.0055) model time 0.2596 (0.2603) loss 5.1899 (6.0042) grad_norm 3.7039 (2.1696) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:11:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][130/625] eta 0:02:11 lr 0.000985 wd 0.0500 time 0.2609 (0.2656) data time 0.0008 (0.0051) model time 0.2602 (0.2597) loss 6.6556 (6.0141) grad_norm 2.8246 (2.1500) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:11:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][140/625] eta 0:02:08 lr 0.000985 wd 0.0500 time 0.2536 (0.2649) data time 0.0011 (0.0048) model time 0.2525 (0.2591) loss 5.0824 (6.0100) grad_norm 1.2800 (2.1578) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:11:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][150/625] eta 0:02:05 lr 0.000985 wd 0.0500 time 0.2539 (0.2643) data time 0.0008 (0.0046) model time 0.2531 (0.2588) loss 6.2797 (6.0214) grad_norm 1.3543 (2.1143) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:11:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][160/625] eta 0:02:02 lr 0.000985 wd 0.0500 time 0.2599 (0.2638) data time 0.0007 (0.0044) model time 0.2592 (0.2584) loss 5.0414 (5.9937) grad_norm 2.2706 (2.0808) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:11:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][170/625] eta 0:01:59 lr 0.000985 wd 0.0500 time 0.2531 (0.2633) data time 0.0011 (0.0042) model time 0.2520 (0.2580) loss 6.1822 (5.9953) grad_norm 3.5477 (2.1218) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:11:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][180/625] eta 0:01:56 lr 0.000985 wd 0.0500 time 0.2524 (0.2629) data time 0.0008 (0.0040) model time 0.2517 (0.2578) loss 5.2406 (5.9953) grad_norm 3.1316 (2.1809) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:11:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][190/625] eta 0:01:54 lr 0.000984 wd 0.0500 time 0.2519 (0.2625) data time 0.0008 (0.0038) model time 0.2510 (0.2576) loss 6.6635 (6.0019) grad_norm 1.5412 (2.1762) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:11:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][200/625] eta 0:01:51 lr 0.000984 wd 0.0500 time 0.2560 (0.2622) data time 0.0008 (0.0037) model time 0.2551 (0.2574) loss 6.0604 (5.9946) grad_norm 2.0236 (2.1439) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:11:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][210/625] eta 0:01:48 lr 0.000984 wd 0.0500 time 0.2530 (0.2619) data time 0.0008 (0.0036) model time 0.2522 (0.2573) loss 6.9735 (5.9929) grad_norm 2.0133 (2.1285) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:11:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][220/625] eta 0:01:45 lr 0.000984 wd 0.0500 time 0.2564 (0.2616) data time 0.0008 (0.0035) model time 0.2557 (0.2571) loss 5.7675 (5.9796) grad_norm 1.8849 (2.1344) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:11:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][230/625] eta 0:01:43 lr 0.000984 wd 0.0500 time 0.2549 (0.2613) data time 0.0009 (0.0033) model time 0.2541 (0.2570) loss 6.1159 (5.9961) grad_norm 2.1524 (2.1508) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:11:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][240/625] eta 0:01:40 lr 0.000984 wd 0.0500 time 0.2563 (0.2611) data time 0.0008 (0.0032) model time 0.2554 (0.2569) loss 5.5160 (5.9873) grad_norm 2.4960 (2.1306) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:11:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][250/625] eta 0:01:37 lr 0.000983 wd 0.0500 time 0.2563 (0.2610) data time 0.0007 (0.0032) model time 0.2556 (0.2568) loss 4.5989 (5.9663) grad_norm 2.0513 (2.1270) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:11:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][260/625] eta 0:01:35 lr 0.000983 wd 0.0500 time 0.2563 (0.2608) data time 0.0007 (0.0031) model time 0.2556 (0.2567) loss 6.3334 (5.9657) grad_norm 1.3358 (2.1248) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:11:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][270/625] eta 0:01:32 lr 0.000983 wd 0.0500 time 0.2541 (0.2606) data time 0.0008 (0.0030) model time 0.2533 (0.2567) loss 6.7312 (5.9548) grad_norm 1.6927 (2.1251) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:11:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][280/625] eta 0:01:29 lr 0.000983 wd 0.0500 time 0.2570 (0.2605) data time 0.0009 (0.0029) model time 0.2561 (0.2566) loss 6.1196 (5.9462) grad_norm 2.3046 (2.1298) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:11:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][290/625] eta 0:01:27 lr 0.000983 wd 0.0500 time 0.2574 (0.2603) data time 0.0009 (0.0029) model time 0.2565 (0.2566) loss 5.4308 (5.9441) grad_norm 2.4168 (2.1332) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:11:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][300/625] eta 0:01:24 lr 0.000982 wd 0.0500 time 0.2543 (0.2602) data time 0.0010 (0.0028) model time 0.2534 (0.2565) loss 5.9147 (5.9331) grad_norm 1.4649 (2.1282) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:11:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][310/625] eta 0:01:21 lr 0.000982 wd 0.0500 time 0.2576 (0.2600) data time 0.0008 (0.0027) model time 0.2568 (0.2564) loss 5.7505 (5.9350) grad_norm 2.3158 (2.1228) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:11:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][320/625] eta 0:01:19 lr 0.000982 wd 0.0500 time 0.2590 (0.2599) data time 0.0010 (0.0027) model time 0.2580 (0.2564) loss 6.2119 (5.9379) grad_norm 2.4933 (2.1206) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:11:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][330/625] eta 0:01:16 lr 0.000982 wd 0.0500 time 0.2585 (0.2598) data time 0.0010 (0.0026) model time 0.2575 (0.2564) loss 6.6982 (5.9410) grad_norm 1.2806 (2.1106) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:11:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][340/625] eta 0:01:14 lr 0.000982 wd 0.0500 time 0.2572 (0.2598) data time 0.0010 (0.0026) model time 0.2562 (0.2564) loss 5.8839 (5.9452) grad_norm 1.7002 (2.1109) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:12:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][350/625] eta 0:01:11 lr 0.000982 wd 0.0500 time 0.2588 (0.2597) data time 0.0009 (0.0025) model time 0.2578 (0.2563) loss 6.6559 (5.9552) grad_norm 2.3301 (2.1071) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:12:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][360/625] eta 0:01:08 lr 0.000981 wd 0.0500 time 0.2603 (0.2596) data time 0.0008 (0.0025) model time 0.2595 (0.2564) loss 6.4858 (5.9527) grad_norm 2.3282 (2.1139) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:12:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][370/625] eta 0:01:06 lr 0.000981 wd 0.0500 time 0.2517 (0.2595) data time 0.0007 (0.0025) model time 0.2510 (0.2564) loss 5.2507 (5.9490) grad_norm 2.1129 (2.1260) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:12:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][380/625] eta 0:01:03 lr 0.000981 wd 0.0500 time 0.2505 (0.2595) data time 0.0006 (0.0024) model time 0.2499 (0.2563) loss 6.1386 (5.9405) grad_norm 1.9518 (2.1275) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:12:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][390/625] eta 0:01:00 lr 0.000981 wd 0.0500 time 0.2554 (0.2594) data time 0.0008 (0.0024) model time 0.2546 (0.2563) loss 6.8661 (5.9377) grad_norm 2.7518 (2.1149) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:12:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][400/625] eta 0:00:58 lr 0.000981 wd 0.0500 time 0.2600 (0.2593) data time 0.0010 (0.0023) model time 0.2590 (0.2563) loss 6.0331 (5.9469) grad_norm 1.4121 (2.1067) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:12:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][410/625] eta 0:00:55 lr 0.000981 wd 0.0500 time 0.2531 (0.2593) data time 0.0010 (0.0023) model time 0.2521 (0.2563) loss 6.0686 (5.9463) grad_norm 1.8251 (2.1013) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:12:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][420/625] eta 0:00:53 lr 0.000980 wd 0.0500 time 0.2576 (0.2592) data time 0.0008 (0.0023) model time 0.2568 (0.2563) loss 6.0532 (5.9540) grad_norm 1.7727 (2.0933) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:12:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][430/625] eta 0:00:50 lr 0.000980 wd 0.0500 time 0.2554 (0.2596) data time 0.0008 (0.0022) model time 0.2546 (0.2568) loss 6.3389 (5.9642) grad_norm 2.4639 (2.0937) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:12:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][440/625] eta 0:00:48 lr 0.000980 wd 0.0500 time 0.2583 (0.2601) data time 0.0010 (0.0022) model time 0.2572 (0.2574) loss 6.2858 (5.9652) grad_norm 2.0834 (2.0907) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:12:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][450/625] eta 0:00:45 lr 0.000980 wd 0.0500 time 0.2531 (0.2600) data time 0.0008 (0.0022) model time 0.2522 (0.2573) loss 5.8956 (5.9610) grad_norm 1.7193 (2.0855) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:12:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][460/625] eta 0:00:42 lr 0.000980 wd 0.0500 time 0.2582 (0.2599) data time 0.0008 (0.0022) model time 0.2574 (0.2573) loss 6.3740 (5.9608) grad_norm 1.2808 (2.0792) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:12:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][470/625] eta 0:00:40 lr 0.000979 wd 0.0500 time 0.2561 (0.2598) data time 0.0008 (0.0021) model time 0.2553 (0.2572) loss 6.6114 (5.9577) grad_norm 1.5461 (2.0956) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:12:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][480/625] eta 0:00:37 lr 0.000979 wd 0.0500 time 0.2572 (0.2598) data time 0.0016 (0.0021) model time 0.2557 (0.2572) loss 6.0836 (5.9517) grad_norm 1.5057 (2.0913) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:12:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][490/625] eta 0:00:35 lr 0.000979 wd 0.0500 time 0.2609 (0.2597) data time 0.0006 (0.0021) model time 0.2603 (0.2572) loss 5.4182 (5.9472) grad_norm 1.4628 (2.0911) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:12:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][500/625] eta 0:00:32 lr 0.000979 wd 0.0500 time 0.2597 (0.2597) data time 0.0006 (0.0021) model time 0.2591 (0.2572) loss 6.6556 (5.9510) grad_norm 2.4510 (2.0961) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:12:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][510/625] eta 0:00:29 lr 0.000979 wd 0.0500 time 0.2560 (0.2596) data time 0.0006 (0.0020) model time 0.2554 (0.2571) loss 6.6461 (5.9604) grad_norm 1.5900 (2.0919) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:12:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][520/625] eta 0:00:27 lr 0.000979 wd 0.0500 time 0.2560 (0.2596) data time 0.0008 (0.0020) model time 0.2553 (0.2571) loss 5.2063 (5.9573) grad_norm 1.6593 (2.0944) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:12:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][530/625] eta 0:00:24 lr 0.000978 wd 0.0500 time 0.2511 (0.2595) data time 0.0010 (0.0020) model time 0.2501 (0.2571) loss 6.4978 (5.9533) grad_norm 1.7510 (2.0986) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:12:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][540/625] eta 0:00:22 lr 0.000978 wd 0.0500 time 0.2588 (0.2594) data time 0.0007 (0.0020) model time 0.2582 (0.2570) loss 4.8202 (5.9554) grad_norm 1.7949 (2.1191) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:12:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][550/625] eta 0:00:19 lr 0.000978 wd 0.0500 time 0.2510 (0.2594) data time 0.0008 (0.0020) model time 0.2502 (0.2570) loss 5.8879 (5.9566) grad_norm 2.4298 (2.1207) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:12:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][560/625] eta 0:00:16 lr 0.000978 wd 0.0500 time 0.2589 (0.2593) data time 0.0008 (0.0019) model time 0.2581 (0.2570) loss 4.7537 (5.9549) grad_norm 2.0883 (2.1307) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:12:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][570/625] eta 0:00:14 lr 0.000978 wd 0.0500 time 0.2537 (0.2593) data time 0.0007 (0.0019) model time 0.2530 (0.2569) loss 5.0649 (5.9529) grad_norm 1.3648 (2.1264) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:13:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][580/625] eta 0:00:11 lr 0.000977 wd 0.0500 time 0.2551 (0.2592) data time 0.0009 (0.0019) model time 0.2543 (0.2569) loss 6.7231 (5.9521) grad_norm 4.0446 (2.1239) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:13:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][590/625] eta 0:00:09 lr 0.000977 wd 0.0500 time 0.2569 (0.2592) data time 0.0007 (0.0019) model time 0.2563 (0.2569) loss 6.2532 (5.9556) grad_norm 1.5219 (2.1188) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:13:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][600/625] eta 0:00:06 lr 0.000977 wd 0.0500 time 0.2562 (0.2594) data time 0.0009 (0.0019) model time 0.2553 (0.2572) loss 7.2673 (5.9557) grad_norm 1.6808 (2.1113) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:13:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][610/625] eta 0:00:03 lr 0.000977 wd 0.0500 time 0.2518 (0.2593) data time 0.0004 (0.0019) model time 0.2514 (0.2571) loss 6.9164 (5.9581) grad_norm 1.2020 (2.1062) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:13:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][620/625] eta 0:00:01 lr 0.000977 wd 0.0500 time 0.2588 (0.2592) data time 0.0006 (0.0018) model time 0.2583 (0.2570) loss 6.9668 (5.9644) grad_norm 2.1869 (2.1021) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:13:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 162 training takes 0:02:42
[2024-08-04 05:13:13 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 05:13:13 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 05:13:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.526 (0.526) Loss 0.6406 (0.6406) Acc@1 88.184 (88.184) Acc@5 98.486 (98.486) Mem 9655MB
[2024-08-04 05:13:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.099) Loss 1.0527 (0.7864) Acc@1 77.734 (84.548) Acc@5 94.775 (97.155) Mem 9655MB
[2024-08-04 05:13:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.1572 (0.9258) Acc@1 73.389 (80.857) Acc@5 93.604 (95.626) Mem 9655MB
[2024-08-04 05:13:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.546 Acc@5 95.593
[2024-08-04 05:13:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.5%
[2024-08-04 05:13:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.717 (0.717) Loss 0.5918 (0.5918) Acc@1 89.307 (89.307) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 05:13:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.128) Loss 0.9370 (0.7293) Acc@1 79.395 (85.414) Acc@5 95.264 (97.505) Mem 9655MB
[2024-08-04 05:13:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 1.0684 (0.8585) Acc@1 75.342 (81.936) Acc@5 94.141 (96.084) Mem 9655MB
[2024-08-04 05:13:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.642 Acc@5 96.071
[2024-08-04 05:13:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.6%
[2024-08-04 05:13:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.64%
[2024-08-04 05:13:18 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 05:13:18 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 05:13:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][0/625] eta 0:06:49 lr 0.000977 wd 0.0500 time 0.6551 (0.6551) data time 0.4171 (0.4171) model time 0.0000 (0.0000) loss 5.7275 (5.7275) grad_norm 1.6429 (1.6429) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:13:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][10/625] eta 0:02:59 lr 0.000977 wd 0.0500 time 0.2520 (0.2925) data time 0.0009 (0.0387) model time 0.0000 (0.0000) loss 6.2840 (6.3323) grad_norm 2.2483 (2.2470) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:13:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][20/625] eta 0:02:46 lr 0.000976 wd 0.0500 time 0.2555 (0.2747) data time 0.0010 (0.0207) model time 0.0000 (0.0000) loss 6.8928 (6.4281) grad_norm 1.7963 (2.0668) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:13:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][30/625] eta 0:02:39 lr 0.000976 wd 0.0500 time 0.2540 (0.2687) data time 0.0010 (0.0143) model time 0.0000 (0.0000) loss 6.6367 (6.2249) grad_norm 3.1679 (2.1092) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:13:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][40/625] eta 0:02:35 lr 0.000976 wd 0.0500 time 0.2598 (0.2656) data time 0.0011 (0.0110) model time 0.0000 (0.0000) loss 5.4805 (6.1203) grad_norm 1.6125 (2.1529) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:13:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][50/625] eta 0:02:33 lr 0.000976 wd 0.0500 time 0.2553 (0.2666) data time 0.0009 (0.0091) model time 0.0000 (0.0000) loss 4.7223 (5.9876) grad_norm 1.9520 (2.1609) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:13:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][60/625] eta 0:02:29 lr 0.000976 wd 0.0500 time 0.2591 (0.2652) data time 0.0008 (0.0077) model time 0.2583 (0.2574) loss 6.1510 (6.0217) grad_norm 2.2184 (2.1395) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:13:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][70/625] eta 0:02:28 lr 0.000975 wd 0.0500 time 0.2573 (0.2669) data time 0.0007 (0.0068) model time 0.2566 (0.2667) loss 5.5222 (5.9961) grad_norm 1.9028 (2.1486) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:13:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][80/625] eta 0:02:24 lr 0.000975 wd 0.0500 time 0.2658 (0.2658) data time 0.0007 (0.0061) model time 0.2652 (0.2634) loss 6.8393 (6.0616) grad_norm 2.2908 (2.1198) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:13:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][90/625] eta 0:02:21 lr 0.000975 wd 0.0500 time 0.2532 (0.2646) data time 0.0009 (0.0055) model time 0.2523 (0.2612) loss 6.7109 (6.0558) grad_norm 3.2631 (2.1749) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:13:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][100/625] eta 0:02:18 lr 0.000975 wd 0.0500 time 0.2563 (0.2639) data time 0.0010 (0.0051) model time 0.2553 (0.2601) loss 5.4830 (6.0439) grad_norm 2.4246 (2.2183) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:13:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][110/625] eta 0:02:15 lr 0.000975 wd 0.0500 time 0.2589 (0.2631) data time 0.0006 (0.0047) model time 0.2584 (0.2592) loss 6.1569 (6.0376) grad_norm 1.8919 (2.2464) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:13:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][120/625] eta 0:02:12 lr 0.000975 wd 0.0500 time 0.2545 (0.2626) data time 0.0007 (0.0044) model time 0.2538 (0.2587) loss 6.9004 (6.0541) grad_norm 2.5084 (2.2412) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:13:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][130/625] eta 0:02:09 lr 0.000974 wd 0.0500 time 0.2547 (0.2622) data time 0.0008 (0.0041) model time 0.2539 (0.2585) loss 5.9994 (6.0534) grad_norm 1.4565 (2.2186) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:13:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][140/625] eta 0:02:06 lr 0.000974 wd 0.0500 time 0.2572 (0.2618) data time 0.0012 (0.0039) model time 0.2560 (0.2581) loss 7.2270 (6.0788) grad_norm 2.1062 (2.2053) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:13:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][150/625] eta 0:02:04 lr 0.000974 wd 0.0500 time 0.2547 (0.2614) data time 0.0009 (0.0037) model time 0.2538 (0.2578) loss 5.5936 (6.0677) grad_norm 1.9282 (2.1749) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:14:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][160/625] eta 0:02:01 lr 0.000974 wd 0.0500 time 0.2542 (0.2610) data time 0.0007 (0.0035) model time 0.2535 (0.2575) loss 4.2980 (6.0585) grad_norm 2.1679 (2.1905) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:14:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][170/625] eta 0:01:58 lr 0.000974 wd 0.0500 time 0.2566 (0.2607) data time 0.0007 (0.0034) model time 0.2559 (0.2573) loss 6.6092 (6.0384) grad_norm 2.0577 (2.2296) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:14:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][180/625] eta 0:01:55 lr 0.000973 wd 0.0500 time 0.2601 (0.2605) data time 0.0008 (0.0032) model time 0.2594 (0.2571) loss 6.7555 (6.0179) grad_norm 2.3239 (2.2783) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:14:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][190/625] eta 0:01:53 lr 0.000973 wd 0.0500 time 0.2595 (0.2602) data time 0.0006 (0.0031) model time 0.2589 (0.2570) loss 5.7457 (6.0040) grad_norm 2.0991 (2.2895) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:14:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][200/625] eta 0:01:50 lr 0.000973 wd 0.0500 time 0.2572 (0.2600) data time 0.0006 (0.0030) model time 0.2566 (0.2568) loss 5.8497 (6.0083) grad_norm 2.3264 (2.2910) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:14:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][210/625] eta 0:01:47 lr 0.000973 wd 0.0500 time 0.2605 (0.2599) data time 0.0005 (0.0029) model time 0.2600 (0.2568) loss 5.9609 (5.9990) grad_norm 2.4328 (2.2801) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:14:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][220/625] eta 0:01:45 lr 0.000973 wd 0.0500 time 0.2546 (0.2597) data time 0.0009 (0.0028) model time 0.2536 (0.2567) loss 5.4698 (5.9947) grad_norm 1.4442 (2.2589) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:14:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][230/625] eta 0:01:42 lr 0.000973 wd 0.0500 time 0.2668 (0.2604) data time 0.0010 (0.0027) model time 0.2658 (0.2577) loss 6.3176 (5.9942) grad_norm 1.4225 (2.2508) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:14:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][240/625] eta 0:01:40 lr 0.000972 wd 0.0500 time 0.2569 (0.2611) data time 0.0009 (0.0027) model time 0.2561 (0.2587) loss 4.6575 (5.9859) grad_norm 1.4157 (2.2301) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:14:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][250/625] eta 0:01:37 lr 0.000972 wd 0.0500 time 0.2624 (0.2609) data time 0.0008 (0.0026) model time 0.2616 (0.2585) loss 5.9305 (5.9886) grad_norm 1.5482 (2.2030) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:14:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][260/625] eta 0:01:35 lr 0.000972 wd 0.0500 time 0.2529 (0.2608) data time 0.0009 (0.0025) model time 0.2521 (0.2585) loss 6.4671 (5.9888) grad_norm 1.3961 (2.2004) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:14:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][270/625] eta 0:01:32 lr 0.000972 wd 0.0500 time 0.2529 (0.2606) data time 0.0009 (0.0025) model time 0.2519 (0.2583) loss 5.9914 (5.9936) grad_norm 1.3934 (2.1824) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:14:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][280/625] eta 0:01:29 lr 0.000972 wd 0.0500 time 0.2548 (0.2605) data time 0.0010 (0.0024) model time 0.2538 (0.2582) loss 5.7415 (5.9924) grad_norm 2.5055 (2.1834) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:14:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][290/625] eta 0:01:27 lr 0.000972 wd 0.0500 time 0.2573 (0.2603) data time 0.0006 (0.0024) model time 0.2567 (0.2580) loss 5.8631 (5.9747) grad_norm 2.8198 (2.2055) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:14:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][300/625] eta 0:01:24 lr 0.000971 wd 0.0500 time 0.2523 (0.2602) data time 0.0007 (0.0023) model time 0.2516 (0.2579) loss 5.3405 (5.9672) grad_norm 2.1573 (2.2023) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:14:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][310/625] eta 0:01:21 lr 0.000971 wd 0.0500 time 0.2558 (0.2600) data time 0.0008 (0.0023) model time 0.2550 (0.2578) loss 6.2359 (5.9630) grad_norm 2.2051 (2.2010) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:14:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][320/625] eta 0:01:19 lr 0.000971 wd 0.0500 time 0.2557 (0.2599) data time 0.0007 (0.0022) model time 0.2550 (0.2577) loss 6.7105 (5.9740) grad_norm 2.0033 (2.2004) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:14:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][330/625] eta 0:01:16 lr 0.000971 wd 0.0500 time 0.2506 (0.2598) data time 0.0009 (0.0022) model time 0.2497 (0.2576) loss 5.2575 (5.9803) grad_norm 1.9614 (2.1873) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:14:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][340/625] eta 0:01:14 lr 0.000971 wd 0.0500 time 0.2591 (0.2597) data time 0.0006 (0.0022) model time 0.2585 (0.2575) loss 5.7034 (5.9919) grad_norm 1.2904 (2.1828) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:14:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][350/625] eta 0:01:11 lr 0.000970 wd 0.0500 time 0.2537 (0.2595) data time 0.0010 (0.0021) model time 0.2527 (0.2574) loss 6.9521 (5.9984) grad_norm 1.9097 (2.1711) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:14:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][360/625] eta 0:01:08 lr 0.000970 wd 0.0500 time 0.2587 (0.2594) data time 0.0010 (0.0021) model time 0.2577 (0.2573) loss 6.3210 (5.9960) grad_norm 3.1055 (2.1687) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:14:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][370/625] eta 0:01:06 lr 0.000970 wd 0.0500 time 0.2532 (0.2593) data time 0.0009 (0.0021) model time 0.2523 (0.2572) loss 6.6247 (6.0042) grad_norm 2.0559 (2.1670) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:14:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][380/625] eta 0:01:03 lr 0.000970 wd 0.0500 time 0.2582 (0.2592) data time 0.0009 (0.0020) model time 0.2572 (0.2572) loss 4.7319 (6.0018) grad_norm 1.4062 (2.1562) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:14:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][390/625] eta 0:01:00 lr 0.000970 wd 0.0500 time 0.2593 (0.2592) data time 0.0006 (0.0020) model time 0.2587 (0.2571) loss 5.9225 (6.0069) grad_norm 2.0000 (2.1423) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:15:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][400/625] eta 0:00:58 lr 0.000970 wd 0.0500 time 0.2565 (0.2591) data time 0.0009 (0.0020) model time 0.2556 (0.2570) loss 5.2889 (6.0112) grad_norm 1.3319 (2.1355) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:15:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][410/625] eta 0:00:55 lr 0.000969 wd 0.0500 time 0.2507 (0.2590) data time 0.0010 (0.0020) model time 0.2497 (0.2570) loss 5.0917 (6.0077) grad_norm 2.4066 (2.1369) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:15:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][420/625] eta 0:00:53 lr 0.000969 wd 0.0500 time 0.2557 (0.2590) data time 0.0008 (0.0019) model time 0.2550 (0.2570) loss 6.2046 (5.9994) grad_norm 3.2308 (2.1456) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:15:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][430/625] eta 0:00:50 lr 0.000969 wd 0.0500 time 0.2588 (0.2589) data time 0.0007 (0.0019) model time 0.2580 (0.2570) loss 6.4345 (6.0017) grad_norm 2.8573 (2.1386) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:15:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][440/625] eta 0:00:47 lr 0.000969 wd 0.0500 time 0.2552 (0.2589) data time 0.0009 (0.0019) model time 0.2543 (0.2569) loss 5.5718 (6.0029) grad_norm 2.6221 (2.1386) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:15:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][450/625] eta 0:00:45 lr 0.000969 wd 0.0500 time 0.2566 (0.2588) data time 0.0006 (0.0019) model time 0.2560 (0.2569) loss 4.4221 (6.0034) grad_norm 2.1301 (2.1346) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:15:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][460/625] eta 0:00:42 lr 0.000969 wd 0.0500 time 0.2545 (0.2587) data time 0.0009 (0.0018) model time 0.2536 (0.2568) loss 5.8267 (6.0035) grad_norm 2.7344 (2.1405) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:15:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][470/625] eta 0:00:40 lr 0.000968 wd 0.0500 time 0.2549 (0.2587) data time 0.0007 (0.0018) model time 0.2542 (0.2568) loss 5.6658 (6.0038) grad_norm 1.3665 (2.1406) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:15:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][480/625] eta 0:00:37 lr 0.000968 wd 0.0500 time 0.2553 (0.2587) data time 0.0006 (0.0018) model time 0.2547 (0.2568) loss 6.2546 (6.0093) grad_norm 2.1510 (2.1361) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:15:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][490/625] eta 0:00:34 lr 0.000968 wd 0.0500 time 0.2492 (0.2586) data time 0.0009 (0.0018) model time 0.2483 (0.2568) loss 6.2748 (6.0112) grad_norm 2.1165 (2.1290) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:15:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][500/625] eta 0:00:32 lr 0.000968 wd 0.0500 time 0.2565 (0.2586) data time 0.0009 (0.0018) model time 0.2557 (0.2567) loss 5.7660 (6.0068) grad_norm 2.1054 (2.1271) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:15:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][510/625] eta 0:00:29 lr 0.000968 wd 0.0500 time 0.2546 (0.2585) data time 0.0007 (0.0017) model time 0.2539 (0.2567) loss 5.4467 (6.0033) grad_norm 1.6550 (2.1224) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:15:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][520/625] eta 0:00:27 lr 0.000967 wd 0.0500 time 0.2546 (0.2585) data time 0.0009 (0.0017) model time 0.2537 (0.2567) loss 5.4822 (6.0013) grad_norm 2.6294 (2.1149) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:15:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][530/625] eta 0:00:24 lr 0.000967 wd 0.0500 time 0.2581 (0.2585) data time 0.0009 (0.0017) model time 0.2571 (0.2567) loss 4.8388 (6.0001) grad_norm 2.0076 (2.1165) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:15:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][540/625] eta 0:00:21 lr 0.000967 wd 0.0500 time 0.2545 (0.2584) data time 0.0006 (0.0017) model time 0.2539 (0.2566) loss 5.8468 (6.0036) grad_norm 2.1204 (2.1131) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:15:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][550/625] eta 0:00:19 lr 0.000967 wd 0.0500 time 0.2567 (0.2584) data time 0.0008 (0.0017) model time 0.2560 (0.2566) loss 4.8785 (6.0041) grad_norm 2.4025 (2.1145) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:15:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][560/625] eta 0:00:16 lr 0.000967 wd 0.0500 time 0.2542 (0.2583) data time 0.0010 (0.0017) model time 0.2532 (0.2566) loss 6.3756 (5.9968) grad_norm 2.7270 (2.1269) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:15:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][570/625] eta 0:00:14 lr 0.000967 wd 0.0500 time 0.2551 (0.2583) data time 0.0007 (0.0017) model time 0.2544 (0.2566) loss 6.7224 (5.9951) grad_norm 2.1253 (2.1479) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:15:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][580/625] eta 0:00:11 lr 0.000966 wd 0.0500 time 0.2599 (0.2583) data time 0.0007 (0.0016) model time 0.2592 (0.2565) loss 6.7814 (5.9924) grad_norm 1.7996 (2.1474) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:15:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][590/625] eta 0:00:09 lr 0.000966 wd 0.0500 time 0.2621 (0.2583) data time 0.0008 (0.0016) model time 0.2614 (0.2565) loss 4.9330 (5.9924) grad_norm 1.5957 (2.1409) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:15:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][600/625] eta 0:00:06 lr 0.000966 wd 0.0500 time 0.2546 (0.2582) data time 0.0008 (0.0016) model time 0.2537 (0.2565) loss 6.3724 (5.9926) grad_norm 1.7457 (2.1360) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:15:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][610/625] eta 0:00:03 lr 0.000966 wd 0.0500 time 0.2527 (0.2582) data time 0.0004 (0.0016) model time 0.2523 (0.2565) loss 5.2656 (5.9960) grad_norm 4.8267 (2.1444) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:15:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][620/625] eta 0:00:01 lr 0.000966 wd 0.0500 time 0.2529 (0.2581) data time 0.0005 (0.0016) model time 0.2523 (0.2564) loss 6.5801 (5.9959) grad_norm 2.7660 (2.1558) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:15:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 163 training takes 0:02:41 [2024-08-04 05:15:59 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 05:16:00 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 05:16:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.417 (0.417) Loss 0.6514 (0.6514) Acc@1 88.232 (88.232) Acc@5 98.145 (98.145) Mem 9655MB [2024-08-04 05:16:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.089) Loss 1.0029 (0.7871) Acc@1 79.053 (84.615) Acc@5 95.361 (97.208) Mem 9655MB [2024-08-04 05:16:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.073) Loss 1.1250 (0.9249) Acc@1 75.439 (81.115) Acc@5 93.652 (95.719) Mem 9655MB [2024-08-04 05:16:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.726 Acc@5 95.693 [2024-08-04 05:16:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.7% [2024-08-04 05:16:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.73% [2024-08-04 05:16:02 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 05:16:02 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 05:16:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.578 (0.578) Loss 0.5923 (0.5923) Acc@1 89.355 (89.355) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 05:16:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.103) Loss 0.9365 (0.7292) Acc@1 79.541 (85.431) Acc@5 95.312 (97.510) Mem 9655MB [2024-08-04 05:16:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 1.0674 (0.8581) Acc@1 75.488 (81.961) Acc@5 94.238 (96.094) Mem 9655MB [2024-08-04 05:16:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.682 Acc@5 96.081 [2024-08-04 05:16:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.7% [2024-08-04 05:16:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.68% [2024-08-04 05:16:04 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 05:16:05 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 05:16:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][0/625] eta 0:06:50 lr 0.000966 wd 0.0500 time 0.6568 (0.6568) data time 0.4076 (0.4076) model time 0.0000 (0.0000) loss 5.0115 (5.0115) grad_norm 2.0312 (2.0312) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 05:16:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][10/625] eta 0:02:59 lr 0.000965 wd 0.0500 time 0.2555 (0.2926) data time 0.0008 (0.0379) model time 0.0000 (0.0000) loss 6.3257 (6.1523) grad_norm 3.8220 (2.2994) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 05:16:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][20/625] eta 0:02:46 lr 0.000965 wd 0.0500 time 0.2572 (0.2755) data time 0.0006 (0.0203) model time 0.0000 (0.0000) loss 6.9113 (6.1257) grad_norm 1.3375 (2.1072) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 05:16:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][30/625] eta 0:02:40 lr 0.000965 wd 0.0500 time 0.2592 (0.2691) data time 0.0006 (0.0140) model time 0.0000 (0.0000) loss 6.6418 (6.1129) grad_norm 2.1050 (2.0897) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 05:16:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][40/625] eta 0:02:35 lr 0.000965 wd 0.0500 time 0.2608 (0.2658) data time 0.0009 (0.0108) model time 0.0000 (0.0000) loss 7.0298 (6.0743) grad_norm 1.7252 (2.0595) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 05:16:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][50/625] eta 0:02:31 lr 0.000965 wd 0.0500 time 0.2580 (0.2639) data time 0.0006 (0.0089) model time 0.0000 (0.0000) loss 6.7490 (6.0830) grad_norm 3.4330 (2.2350) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 05:16:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][60/625] eta 0:02:29 lr 0.000965 wd 0.0500 time 0.2587 (0.2651) 
data time 0.0007 (0.0076) model time 0.2580 (0.2704) loss 6.8379 (6.0269) grad_norm 2.2417 (2.2404) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 05:16:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][70/625] eta 0:02:26 lr 0.000964 wd 0.0500 time 0.2529 (0.2636) data time 0.0008 (0.0067) model time 0.2521 (0.2620) loss 5.3992 (5.9986) grad_norm 2.3075 (inf) loss_scale 1024.0000 (1918.1972) mem 9655MB [2024-08-04 05:16:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][80/625] eta 0:02:23 lr 0.000964 wd 0.0500 time 0.2518 (0.2626) data time 0.0009 (0.0060) model time 0.2509 (0.2594) loss 5.7174 (5.9986) grad_norm 3.6966 (inf) loss_scale 1024.0000 (1807.8025) mem 9655MB [2024-08-04 05:16:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][90/625] eta 0:02:20 lr 0.000964 wd 0.0500 time 0.2563 (0.2620) data time 0.0009 (0.0054) model time 0.2554 (0.2588) loss 4.9063 (5.9873) grad_norm 1.5947 (inf) loss_scale 1024.0000 (1721.6703) mem 9655MB [2024-08-04 05:16:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][100/625] eta 0:02:17 lr 0.000964 wd 0.0500 time 0.2630 (0.2615) data time 0.0005 (0.0050) model time 0.2624 (0.2581) loss 5.1783 (5.9662) grad_norm 1.5293 (inf) loss_scale 1024.0000 (1652.5941) mem 9655MB [2024-08-04 05:16:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][110/625] eta 0:02:14 lr 0.000964 wd 0.0500 time 0.2536 (0.2609) data time 0.0011 (0.0046) model time 0.2525 (0.2575) loss 5.0306 (5.9630) grad_norm 1.1612 (inf) loss_scale 1024.0000 (1595.9640) mem 9655MB [2024-08-04 05:16:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][120/625] eta 0:02:13 lr 0.000963 wd 0.0500 time 0.2566 (0.2640) data time 0.0009 (0.0043) model time 0.2557 (0.2632) loss 4.8207 (5.9683) grad_norm 2.3649 (inf) loss_scale 1024.0000 (1548.6942) mem 9655MB [2024-08-04 05:16:39 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][130/625] eta 0:02:10 lr 0.000963 wd 0.0500 time 0.2515 (0.2634) data time 0.0011 (0.0040) model time 0.2504 (0.2621) loss 5.2095 (5.9697) grad_norm 2.4553 (inf) loss_scale 1024.0000 (1508.6412) mem 9655MB [2024-08-04 05:16:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][140/625] eta 0:02:07 lr 0.000963 wd 0.0500 time 0.2584 (0.2629) data time 0.0009 (0.0038) model time 0.2575 (0.2614) loss 5.9760 (5.9508) grad_norm 2.8665 (inf) loss_scale 1024.0000 (1474.2695) mem 9655MB [2024-08-04 05:16:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][150/625] eta 0:02:04 lr 0.000963 wd 0.0500 time 0.2548 (0.2624) data time 0.0007 (0.0036) model time 0.2541 (0.2607) loss 6.0686 (5.9628) grad_norm 1.8044 (inf) loss_scale 1024.0000 (1444.4503) mem 9655MB [2024-08-04 05:16:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][160/625] eta 0:02:01 lr 0.000963 wd 0.0500 time 0.2564 (0.2620) data time 0.0006 (0.0035) model time 0.2557 (0.2601) loss 4.7046 (5.9615) grad_norm 3.9757 (inf) loss_scale 1024.0000 (1418.3354) mem 9655MB [2024-08-04 05:16:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][170/625] eta 0:01:59 lr 0.000963 wd 0.0500 time 0.2583 (0.2617) data time 0.0010 (0.0033) model time 0.2573 (0.2598) loss 6.3735 (5.9910) grad_norm 2.3354 (inf) loss_scale 1024.0000 (1395.2749) mem 9655MB [2024-08-04 05:16:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][180/625] eta 0:01:56 lr 0.000962 wd 0.0500 time 0.2593 (0.2626) data time 0.0008 (0.0032) model time 0.2585 (0.2611) loss 6.0387 (5.9930) grad_norm 1.7670 (inf) loss_scale 1024.0000 (1374.7624) mem 9655MB [2024-08-04 05:16:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][190/625] eta 0:01:54 lr 0.000962 wd 0.0500 time 0.2540 (0.2621) data time 0.0008 (0.0031) model 
time 0.2533 (0.2606) loss 4.9421 (5.9924) grad_norm 1.7232 (inf) loss_scale 1024.0000 (1356.3979) mem 9655MB [2024-08-04 05:16:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][200/625] eta 0:01:51 lr 0.000962 wd 0.0500 time 0.2580 (0.2619) data time 0.0007 (0.0030) model time 0.2572 (0.2603) loss 6.9553 (6.0039) grad_norm 1.7259 (inf) loss_scale 1024.0000 (1339.8607) mem 9655MB [2024-08-04 05:17:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][210/625] eta 0:01:48 lr 0.000962 wd 0.0500 time 0.2606 (0.2616) data time 0.0008 (0.0029) model time 0.2598 (0.2600) loss 7.0561 (6.0069) grad_norm 3.0973 (inf) loss_scale 1024.0000 (1324.8910) mem 9655MB [2024-08-04 05:17:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][220/625] eta 0:01:45 lr 0.000962 wd 0.0500 time 0.2622 (0.2614) data time 0.0006 (0.0028) model time 0.2616 (0.2598) loss 6.9046 (6.0067) grad_norm 1.3846 (inf) loss_scale 1024.0000 (1311.2760) mem 9655MB [2024-08-04 05:17:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][230/625] eta 0:01:43 lr 0.000962 wd 0.0500 time 0.2547 (0.2612) data time 0.0009 (0.0027) model time 0.2538 (0.2595) loss 6.7106 (5.9953) grad_norm 1.7741 (inf) loss_scale 1024.0000 (1298.8398) mem 9655MB [2024-08-04 05:17:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][240/625] eta 0:01:40 lr 0.000961 wd 0.0500 time 0.2551 (0.2610) data time 0.0008 (0.0026) model time 0.2543 (0.2593) loss 7.3896 (6.0030) grad_norm 1.3803 (inf) loss_scale 1024.0000 (1287.4357) mem 9655MB [2024-08-04 05:17:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][250/625] eta 0:01:37 lr 0.000961 wd 0.0500 time 0.2530 (0.2607) data time 0.0009 (0.0026) model time 0.2521 (0.2590) loss 7.1109 (5.9995) grad_norm 2.0240 (inf) loss_scale 1024.0000 (1276.9402) mem 9655MB [2024-08-04 05:17:13 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [164/300][260/625] eta 0:01:35 lr 0.000961 wd 0.0500 time 0.2585 (0.2606) data time 0.0005 (0.0025) model time 0.2579 (0.2588) loss 6.9831 (5.9943) grad_norm 2.3426 (inf) loss_scale 1024.0000 (1267.2490) mem 9655MB [2024-08-04 05:17:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][270/625] eta 0:01:32 lr 0.000961 wd 0.0500 time 0.2529 (0.2604) data time 0.0007 (0.0024) model time 0.2523 (0.2587) loss 5.6574 (5.9887) grad_norm 1.6036 (inf) loss_scale 1024.0000 (1258.2731) mem 9655MB [2024-08-04 05:17:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][280/625] eta 0:01:29 lr 0.000961 wd 0.0500 time 0.2625 (0.2603) data time 0.0006 (0.0024) model time 0.2619 (0.2586) loss 5.8738 (5.9818) grad_norm 2.0773 (inf) loss_scale 1024.0000 (1249.9359) mem 9655MB [2024-08-04 05:17:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][290/625] eta 0:01:27 lr 0.000960 wd 0.0500 time 0.2570 (0.2607) data time 0.0009 (0.0023) model time 0.2561 (0.2591) loss 5.4283 (5.9873) grad_norm 1.9962 (inf) loss_scale 1024.0000 (1242.1718) mem 9655MB [2024-08-04 05:17:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][300/625] eta 0:01:24 lr 0.000960 wd 0.0500 time 0.2562 (0.2605) data time 0.0006 (0.0023) model time 0.2557 (0.2589) loss 6.8806 (5.9866) grad_norm 1.9088 (inf) loss_scale 1024.0000 (1234.9236) mem 9655MB [2024-08-04 05:17:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][310/625] eta 0:01:22 lr 0.000960 wd 0.0500 time 0.2558 (0.2604) data time 0.0009 (0.0022) model time 0.2549 (0.2588) loss 5.8023 (5.9902) grad_norm 1.6515 (inf) loss_scale 1024.0000 (1228.1415) mem 9655MB [2024-08-04 05:17:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][320/625] eta 0:01:19 lr 0.000960 wd 0.0500 time 0.2533 (0.2603) data time 0.0008 (0.0022) model time 0.2525 (0.2587) loss 
5.4720 (6.0038) grad_norm 1.5823 (inf) loss_scale 1024.0000 (1221.7819) mem 9655MB [2024-08-04 05:17:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][330/625] eta 0:01:16 lr 0.000960 wd 0.0500 time 0.2453 (0.2601) data time 0.0008 (0.0022) model time 0.2446 (0.2585) loss 6.7367 (6.0108) grad_norm 1.8807 (inf) loss_scale 1024.0000 (1215.8066) mem 9655MB [2024-08-04 05:17:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][340/625] eta 0:01:14 lr 0.000960 wd 0.0500 time 0.2582 (0.2600) data time 0.0009 (0.0021) model time 0.2573 (0.2584) loss 5.7331 (6.0114) grad_norm 3.3504 (inf) loss_scale 1024.0000 (1210.1818) mem 9655MB [2024-08-04 05:17:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][350/625] eta 0:01:11 lr 0.000959 wd 0.0500 time 0.2650 (0.2599) data time 0.0009 (0.0021) model time 0.2642 (0.2583) loss 5.8227 (6.0120) grad_norm 1.4115 (inf) loss_scale 1024.0000 (1204.8775) mem 9655MB [2024-08-04 05:17:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][360/625] eta 0:01:09 lr 0.000959 wd 0.0500 time 0.2563 (0.2604) data time 0.0010 (0.0021) model time 0.2552 (0.2589) loss 5.8302 (6.0141) grad_norm 3.6015 (inf) loss_scale 1024.0000 (1199.8670) mem 9655MB [2024-08-04 05:17:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][370/625] eta 0:01:06 lr 0.000959 wd 0.0500 time 0.2542 (0.2603) data time 0.0008 (0.0020) model time 0.2534 (0.2588) loss 5.1534 (6.0195) grad_norm 2.0555 (inf) loss_scale 1024.0000 (1195.1267) mem 9655MB [2024-08-04 05:17:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][380/625] eta 0:01:03 lr 0.000959 wd 0.0500 time 0.2647 (0.2602) data time 0.0010 (0.0020) model time 0.2637 (0.2587) loss 6.5926 (6.0236) grad_norm 1.7078 (inf) loss_scale 1024.0000 (1190.6352) mem 9655MB [2024-08-04 05:17:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[164/300][390/625] eta 0:01:01 lr 0.000959 wd 0.0500 time 0.2530 (0.2601) data time 0.0009 (0.0020) model time 0.2521 (0.2586) loss 6.8821 (6.0245) grad_norm 1.6741 (inf) loss_scale 1024.0000 (1186.3734) mem 9655MB [2024-08-04 05:17:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][400/625] eta 0:00:58 lr 0.000958 wd 0.0500 time 0.2570 (0.2600) data time 0.0008 (0.0019) model time 0.2562 (0.2585) loss 5.3821 (6.0195) grad_norm 1.2674 (inf) loss_scale 1024.0000 (1182.3242) mem 9655MB [2024-08-04 05:17:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][410/625] eta 0:00:55 lr 0.000958 wd 0.0500 time 0.2525 (0.2599) data time 0.0008 (0.0019) model time 0.2517 (0.2584) loss 7.2902 (6.0171) grad_norm 2.4944 (inf) loss_scale 1024.0000 (1178.4720) mem 9655MB [2024-08-04 05:17:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][420/625] eta 0:00:53 lr 0.000958 wd 0.0500 time 0.2609 (0.2598) data time 0.0006 (0.0019) model time 0.2603 (0.2583) loss 5.5917 (6.0105) grad_norm 1.8060 (inf) loss_scale 1024.0000 (1174.8029) mem 9655MB [2024-08-04 05:17:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][430/625] eta 0:00:50 lr 0.000958 wd 0.0500 time 0.2565 (0.2597) data time 0.0008 (0.0019) model time 0.2557 (0.2582) loss 5.9612 (6.0081) grad_norm 1.5298 (inf) loss_scale 1024.0000 (1171.3039) mem 9655MB [2024-08-04 05:17:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][440/625] eta 0:00:48 lr 0.000958 wd 0.0500 time 0.2550 (0.2597) data time 0.0007 (0.0019) model time 0.2542 (0.2582) loss 6.3352 (6.0127) grad_norm 6.8588 (inf) loss_scale 1024.0000 (1167.9637) mem 9655MB [2024-08-04 05:18:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][450/625] eta 0:00:45 lr 0.000958 wd 0.0500 time 0.2573 (0.2596) data time 0.0007 (0.0019) model time 0.2566 (0.2581) loss 6.3671 (6.0100) grad_norm 4.0490 (inf) 
loss_scale 1024.0000 (1164.7716) mem 9655MB [2024-08-04 05:18:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][460/625] eta 0:00:42 lr 0.000957 wd 0.0500 time 0.2607 (0.2596) data time 0.0007 (0.0018) model time 0.2600 (0.2581) loss 6.1750 (6.0106) grad_norm 2.6664 (inf) loss_scale 1024.0000 (1161.7180) mem 9655MB [2024-08-04 05:18:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][470/625] eta 0:00:40 lr 0.000957 wd 0.0500 time 0.2575 (0.2599) data time 0.0008 (0.0018) model time 0.2567 (0.2585) loss 6.0149 (5.9982) grad_norm 2.7333 (inf) loss_scale 1024.0000 (1158.7941) mem 9655MB [2024-08-04 05:18:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][480/625] eta 0:00:37 lr 0.000957 wd 0.0500 time 0.2594 (0.2598) data time 0.0008 (0.0018) model time 0.2587 (0.2584) loss 6.5321 (5.9985) grad_norm 1.2658 (inf) loss_scale 1024.0000 (1155.9917) mem 9655MB [2024-08-04 05:18:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][490/625] eta 0:00:35 lr 0.000957 wd 0.0500 time 0.2528 (0.2598) data time 0.0008 (0.0018) model time 0.2520 (0.2583) loss 5.4725 (5.9968) grad_norm 2.1560 (inf) loss_scale 1024.0000 (1153.3035) mem 9655MB [2024-08-04 05:18:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][500/625] eta 0:00:32 lr 0.000957 wd 0.0500 time 0.2599 (0.2597) data time 0.0009 (0.0018) model time 0.2590 (0.2583) loss 4.8858 (5.9960) grad_norm 1.7119 (inf) loss_scale 1024.0000 (1150.7226) mem 9655MB [2024-08-04 05:18:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][510/625] eta 0:00:29 lr 0.000957 wd 0.0500 time 0.2482 (0.2596) data time 0.0008 (0.0017) model time 0.2474 (0.2582) loss 6.5748 (5.9997) grad_norm 2.3044 (inf) loss_scale 1024.0000 (1148.2427) mem 9655MB [2024-08-04 05:18:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][520/625] eta 0:00:27 lr 0.000956 
wd 0.0500 time 0.2572 (0.2596) data time 0.0009 (0.0017) model time 0.2563 (0.2582) loss 6.9515 (5.9980) grad_norm 2.1039 (inf) loss_scale 1024.0000 (1145.8580) mem 9655MB [2024-08-04 05:18:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][530/625] eta 0:00:24 lr 0.000956 wd 0.0500 time 0.2539 (0.2595) data time 0.0017 (0.0017) model time 0.2522 (0.2581) loss 6.0698 (6.0036) grad_norm 1.3750 (inf) loss_scale 1024.0000 (1143.5631) mem 9655MB [2024-08-04 05:18:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][540/625] eta 0:00:22 lr 0.000956 wd 0.0500 time 0.2547 (0.2594) data time 0.0010 (0.0017) model time 0.2538 (0.2580) loss 4.0678 (5.9946) grad_norm 1.5967 (inf) loss_scale 1024.0000 (1141.3530) mem 9655MB [2024-08-04 05:18:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][550/625] eta 0:00:19 lr 0.000956 wd 0.0500 time 0.2601 (0.2594) data time 0.0007 (0.0017) model time 0.2594 (0.2580) loss 4.8378 (5.9962) grad_norm 1.2229 (inf) loss_scale 1024.0000 (1139.2232) mem 9655MB [2024-08-04 05:18:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][560/625] eta 0:00:16 lr 0.000956 wd 0.0500 time 0.2580 (0.2594) data time 0.0007 (0.0017) model time 0.2573 (0.2580) loss 5.3030 (5.9932) grad_norm 2.1654 (inf) loss_scale 1024.0000 (1137.1693) mem 9655MB [2024-08-04 05:18:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][570/625] eta 0:00:14 lr 0.000955 wd 0.0500 time 0.2563 (0.2593) data time 0.0009 (0.0017) model time 0.2554 (0.2579) loss 6.2418 (5.9984) grad_norm 1.6318 (inf) loss_scale 1024.0000 (1135.1874) mem 9655MB [2024-08-04 05:18:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][580/625] eta 0:00:11 lr 0.000955 wd 0.0500 time 0.2561 (0.2592) data time 0.0009 (0.0016) model time 0.2552 (0.2579) loss 6.5271 (6.0024) grad_norm 1.8734 (inf) loss_scale 1024.0000 (1133.2737) mem 9655MB 
[2024-08-04 05:18:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][590/625] eta 0:00:09 lr 0.000955 wd 0.0500 time 0.2554 (0.2592) data time 0.0009 (0.0016) model time 0.2545 (0.2578) loss 4.8236 (5.9957) grad_norm 1.7774 (inf) loss_scale 1024.0000 (1131.4247) mem 9655MB [2024-08-04 05:18:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][600/625] eta 0:00:06 lr 0.000955 wd 0.0500 time 0.2541 (0.2592) data time 0.0007 (0.0016) model time 0.2534 (0.2578) loss 4.9270 (5.9939) grad_norm 3.6385 (inf) loss_scale 1024.0000 (1129.6373) mem 9655MB [2024-08-04 05:18:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][610/625] eta 0:00:03 lr 0.000955 wd 0.0500 time 0.2526 (0.2591) data time 0.0004 (0.0016) model time 0.2522 (0.2577) loss 4.7319 (5.9940) grad_norm 1.7916 (inf) loss_scale 1024.0000 (1127.9083) mem 9655MB [2024-08-04 05:18:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][620/625] eta 0:00:01 lr 0.000955 wd 0.0500 time 0.2529 (0.2590) data time 0.0004 (0.0016) model time 0.2525 (0.2576) loss 6.3016 (5.9942) grad_norm 3.5335 (inf) loss_scale 1024.0000 (1126.2351) mem 9655MB [2024-08-04 05:18:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 164 training takes 0:02:41 [2024-08-04 05:18:47 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 05:18:47 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 05:18:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.514 (0.514) Loss 0.6460 (0.6460) Acc@1 88.037 (88.037) Acc@5 98.242 (98.242) Mem 9655MB [2024-08-04 05:18:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.100) Loss 1.0391 (0.7897) Acc@1 77.588 (84.428) Acc@5 94.971 (97.150) Mem 9655MB [2024-08-04 05:18:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.1602 (0.9222) Acc@1 73.047 (80.820) Acc@5 93.213 (95.575) Mem 9655MB [2024-08-04 05:18:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.434 Acc@5 95.563 [2024-08-04 05:18:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.4% [2024-08-04 05:18:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.727 (0.727) Loss 0.5923 (0.5923) Acc@1 89.502 (89.502) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 05:18:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.124) Loss 0.9360 (0.7288) Acc@1 79.590 (85.454) Acc@5 95.264 (97.505) Mem 9655MB [2024-08-04 05:18:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.091) Loss 1.0664 (0.8576) Acc@1 75.391 (81.980) Acc@5 94.287 (96.098) Mem 9655MB [2024-08-04 05:18:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.696 Acc@5 96.085 [2024-08-04 05:18:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.7% [2024-08-04 05:18:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.70% [2024-08-04 05:18:51 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 05:18:52 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 05:18:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][0/625] eta 0:07:27 lr 0.000954 wd 0.0500 time 0.7164 (0.7164) data time 0.4762 (0.4762) model time 0.0000 (0.0000) loss 6.6950 (6.6950) grad_norm 2.1777 (2.1777) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:18:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][10/625] eta 0:03:02 lr 0.000954 wd 0.0500 time 0.2490 (0.2963) data time 0.0009 (0.0442) model time 0.0000 (0.0000) loss 5.0940 (5.6973) grad_norm 2.0377 (2.0821) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:18:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][20/625] eta 0:02:47 lr 0.000954 wd 0.0500 time 0.2618 (0.2769) data time 0.0005 (0.0235) model time 0.0000 (0.0000) loss 4.9459 (5.6381) grad_norm 2.0293 (2.1623) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:19:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][30/625] eta 0:02:40 lr 0.000954 wd 0.0500 time 0.2500 (0.2698) data time 0.0011 (0.0163) model time 0.0000 (0.0000) loss 6.9100 (5.8218) grad_norm 3.4553 (2.2982) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:19:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][40/625] eta 0:02:35 lr 0.000954 wd 0.0500 time 0.2682 (0.2667) data time 0.0010 (0.0125) model time 0.0000 (0.0000) loss 6.1519 (5.8430) grad_norm 3.8879 (2.3180) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:19:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][50/625] eta 0:02:32 lr 0.000954 wd 0.0500 time 0.2573 (0.2647) data time 0.0008 (0.0103) model time 0.0000 (0.0000) loss 6.3303 (5.9046) grad_norm 1.5169 (2.2921) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 05:19:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][60/625] eta 0:02:28 lr 0.000953 wd 0.0500 time 0.2581 (0.2632) data time 0.0010 (0.0087) model time 0.2571 (0.2542) loss 5.5208 (5.9282) grad_norm 3.3619 (2.2550) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:19:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][70/625] eta 0:02:25 lr 0.000953 wd 0.0500 time 0.2498 (0.2620) data time 0.0007 (0.0076) model time 0.2491 (0.2542) loss 6.1697 (5.9013) grad_norm 3.4276 (2.2096) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:19:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][80/625] eta 0:02:22 lr 0.000953 wd 0.0500 time 0.2551 (0.2613) data time 0.0008 (0.0068) model time 0.2543 (0.2546) loss 5.9372 (5.9273) grad_norm 1.8840 (2.1974) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:19:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][90/625] eta 0:02:19 lr 0.000953 wd 0.0500 time 0.2558 (0.2608) data time 0.0008 (0.0062) model time 0.2550 (0.2549) loss 5.1005 (5.9806) grad_norm 3.2980 (2.2443) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:19:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][100/625] eta 0:02:16 lr 0.000953 wd 0.0500 time 0.2524 (0.2603) data time 0.0011 (0.0056) model time 0.2513 (0.2549) loss 5.8289 (5.9744) grad_norm 1.8323 (2.3121) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:19:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][110/625] eta 0:02:13 lr 0.000953 wd 0.0500 time 0.2543 (0.2600) data time 0.0010 (0.0052) model time 0.2533 (0.2550) loss 5.9142 (5.9591) grad_norm 1.9358 (2.3207) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:19:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][120/625] eta 0:02:11 lr 0.000952 wd 0.0500 time 0.2542 
(0.2596) data time 0.0011 (0.0048) model time 0.2532 (0.2549) loss 6.9411 (5.9722) grad_norm 2.5073 (2.3454) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:19:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][130/625] eta 0:02:08 lr 0.000952 wd 0.0500 time 0.2546 (0.2592) data time 0.0010 (0.0045) model time 0.2537 (0.2548) loss 6.1684 (5.9867) grad_norm 2.7675 (2.3871) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:19:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][140/625] eta 0:02:05 lr 0.000952 wd 0.0500 time 0.2545 (0.2589) data time 0.0010 (0.0043) model time 0.2535 (0.2547) loss 6.9795 (6.0054) grad_norm 1.8381 (2.3676) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:19:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][150/625] eta 0:02:02 lr 0.000952 wd 0.0500 time 0.2543 (0.2587) data time 0.0008 (0.0041) model time 0.2535 (0.2547) loss 6.8960 (5.9992) grad_norm 1.4491 (2.3344) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:19:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][160/625] eta 0:02:00 lr 0.000952 wd 0.0500 time 0.2619 (0.2585) data time 0.0014 (0.0039) model time 0.2605 (0.2548) loss 5.0852 (5.9865) grad_norm 2.1148 (2.3006) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:19:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][170/625] eta 0:01:57 lr 0.000951 wd 0.0500 time 0.2544 (0.2583) data time 0.0007 (0.0037) model time 0.2537 (0.2547) loss 5.3706 (5.9892) grad_norm 2.4719 (2.2940) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:19:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][180/625] eta 0:01:54 lr 0.000951 wd 0.0500 time 0.2521 (0.2582) data time 0.0008 (0.0035) model time 0.2513 (0.2547) loss 6.4551 (5.9868) grad_norm 2.3481 (2.2964) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 05:19:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][190/625] eta 0:01:52 lr 0.000951 wd 0.0500 time 0.2598 (0.2581) data time 0.0010 (0.0034) model time 0.2588 (0.2548) loss 6.2921 (5.9729) grad_norm 4.5949 (2.2968) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:19:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][200/625] eta 0:01:49 lr 0.000951 wd 0.0500 time 0.2556 (0.2581) data time 0.0011 (0.0033) model time 0.2545 (0.2549) loss 6.8147 (5.9888) grad_norm 2.7224 (2.2830) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:19:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][210/625] eta 0:01:47 lr 0.000951 wd 0.0500 time 0.2562 (0.2580) data time 0.0006 (0.0032) model time 0.2556 (0.2549) loss 6.8016 (5.9847) grad_norm 2.0119 (2.2791) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:19:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][220/625] eta 0:01:44 lr 0.000951 wd 0.0500 time 0.2538 (0.2579) data time 0.0009 (0.0031) model time 0.2529 (0.2549) loss 6.1650 (5.9767) grad_norm 3.2817 (2.2816) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:19:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][230/625] eta 0:01:41 lr 0.000950 wd 0.0500 time 0.2563 (0.2578) data time 0.0008 (0.0030) model time 0.2555 (0.2550) loss 7.1240 (6.0019) grad_norm 1.7359 (2.2707) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:19:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][240/625] eta 0:01:39 lr 0.000950 wd 0.0500 time 0.2589 (0.2577) data time 0.0008 (0.0029) model time 0.2581 (0.2550) loss 6.6864 (6.0043) grad_norm 1.8830 (2.2966) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:19:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][250/625] eta 0:01:36 lr 0.000950 wd 0.0500 time 0.2594 
(0.2584) data time 0.0008 (0.0028) model time 0.2586 (0.2558) loss 5.8626 (6.0190) grad_norm 1.3092 (inf) loss_scale 512.0000 (1017.8805) mem 9655MB [2024-08-04 05:19:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][260/625] eta 0:01:34 lr 0.000950 wd 0.0500 time 0.2591 (0.2591) data time 0.0007 (0.0027) model time 0.2584 (0.2568) loss 6.2393 (6.0271) grad_norm 2.0185 (inf) loss_scale 512.0000 (998.4981) mem 9655MB [2024-08-04 05:20:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][270/625] eta 0:01:32 lr 0.000950 wd 0.0500 time 0.2535 (0.2598) data time 0.0008 (0.0027) model time 0.2527 (0.2577) loss 5.6653 (6.0268) grad_norm 1.3305 (inf) loss_scale 512.0000 (980.5461) mem 9655MB [2024-08-04 05:20:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][280/625] eta 0:01:29 lr 0.000950 wd 0.0500 time 0.2577 (0.2596) data time 0.0008 (0.0026) model time 0.2569 (0.2576) loss 4.8660 (6.0320) grad_norm 2.0799 (inf) loss_scale 512.0000 (963.8719) mem 9655MB [2024-08-04 05:20:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][290/625] eta 0:01:26 lr 0.000949 wd 0.0500 time 0.2564 (0.2595) data time 0.0008 (0.0025) model time 0.2556 (0.2575) loss 5.9495 (6.0243) grad_norm 1.3188 (inf) loss_scale 512.0000 (948.3436) mem 9655MB [2024-08-04 05:20:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][300/625] eta 0:01:24 lr 0.000949 wd 0.0500 time 0.2507 (0.2594) data time 0.0009 (0.0025) model time 0.2498 (0.2575) loss 5.6218 (6.0193) grad_norm 1.9897 (inf) loss_scale 512.0000 (933.8472) mem 9655MB [2024-08-04 05:20:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][310/625] eta 0:01:21 lr 0.000949 wd 0.0500 time 0.2574 (0.2594) data time 0.0010 (0.0024) model time 0.2564 (0.2574) loss 6.5199 (6.0302) grad_norm 1.3265 (inf) loss_scale 512.0000 (920.2830) mem 9655MB [2024-08-04 05:20:15 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][320/625] eta 0:01:19 lr 0.000949 wd 0.0500 time 0.2542 (0.2593) data time 0.0009 (0.0024) model time 0.2533 (0.2573) loss 5.4843 (6.0281) grad_norm 2.1371 (inf) loss_scale 512.0000 (907.5639) mem 9655MB [2024-08-04 05:20:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][330/625] eta 0:01:16 lr 0.000949 wd 0.0500 time 0.2562 (0.2592) data time 0.0009 (0.0023) model time 0.2553 (0.2573) loss 6.3156 (6.0349) grad_norm 1.5195 (inf) loss_scale 512.0000 (895.6133) mem 9655MB [2024-08-04 05:20:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][340/625] eta 0:01:13 lr 0.000948 wd 0.0500 time 0.2567 (0.2591) data time 0.0007 (0.0023) model time 0.2560 (0.2572) loss 5.7515 (6.0364) grad_norm 1.4290 (inf) loss_scale 512.0000 (884.3636) mem 9655MB [2024-08-04 05:20:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][350/625] eta 0:01:11 lr 0.000948 wd 0.0500 time 0.2539 (0.2590) data time 0.0009 (0.0023) model time 0.2530 (0.2571) loss 6.0778 (6.0322) grad_norm 1.9574 (inf) loss_scale 512.0000 (873.7550) mem 9655MB [2024-08-04 05:20:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][360/625] eta 0:01:08 lr 0.000948 wd 0.0500 time 0.2608 (0.2601) data time 0.0013 (0.0022) model time 0.2595 (0.2584) loss 5.9371 (6.0427) grad_norm 2.5651 (inf) loss_scale 512.0000 (863.7341) mem 9655MB [2024-08-04 05:20:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][370/625] eta 0:01:06 lr 0.000948 wd 0.0500 time 0.2567 (0.2605) data time 0.0010 (0.0022) model time 0.2557 (0.2590) loss 6.3267 (6.0399) grad_norm 2.0988 (inf) loss_scale 512.0000 (854.2534) mem 9655MB [2024-08-04 05:20:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][380/625] eta 0:01:03 lr 0.000948 wd 0.0500 time 0.2590 (0.2605) data time 0.0007 (0.0022) model time 0.2583 
(0.2589) loss 6.6606 (6.0360) grad_norm 1.5166 (inf) loss_scale 512.0000 (845.2703) mem 9655MB [2024-08-04 05:20:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][390/625] eta 0:01:01 lr 0.000948 wd 0.0500 time 0.2495 (0.2603) data time 0.0011 (0.0021) model time 0.2484 (0.2588) loss 6.0968 (6.0340) grad_norm 2.2466 (inf) loss_scale 512.0000 (836.7468) mem 9655MB [2024-08-04 05:20:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][400/625] eta 0:00:58 lr 0.000947 wd 0.0500 time 0.2529 (0.2602) data time 0.0010 (0.0021) model time 0.2518 (0.2587) loss 6.2672 (6.0230) grad_norm 2.4671 (inf) loss_scale 512.0000 (828.6484) mem 9655MB [2024-08-04 05:20:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][410/625] eta 0:00:55 lr 0.000947 wd 0.0500 time 0.2532 (0.2601) data time 0.0011 (0.0021) model time 0.2521 (0.2586) loss 6.2459 (6.0260) grad_norm 2.3012 (inf) loss_scale 512.0000 (820.9440) mem 9655MB [2024-08-04 05:20:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][420/625] eta 0:00:53 lr 0.000947 wd 0.0500 time 0.2629 (0.2601) data time 0.0006 (0.0020) model time 0.2623 (0.2585) loss 6.9910 (6.0265) grad_norm 2.8048 (inf) loss_scale 512.0000 (813.6057) mem 9655MB [2024-08-04 05:20:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][430/625] eta 0:00:50 lr 0.000947 wd 0.0500 time 0.2555 (0.2600) data time 0.0009 (0.0020) model time 0.2546 (0.2585) loss 5.5560 (6.0243) grad_norm 2.7693 (inf) loss_scale 512.0000 (806.6079) mem 9655MB [2024-08-04 05:20:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][440/625] eta 0:00:48 lr 0.000947 wd 0.0500 time 0.2562 (0.2600) data time 0.0008 (0.0020) model time 0.2553 (0.2584) loss 6.1624 (6.0269) grad_norm 4.1126 (inf) loss_scale 512.0000 (799.9274) mem 9655MB [2024-08-04 05:20:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[165/300][450/625] eta 0:00:45 lr 0.000947 wd 0.0500 time 0.2579 (0.2599) data time 0.0007 (0.0020) model time 0.2572 (0.2583) loss 4.7864 (6.0281) grad_norm 1.9292 (inf) loss_scale 512.0000 (793.5432) mem 9655MB [2024-08-04 05:20:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][460/625] eta 0:00:42 lr 0.000946 wd 0.0500 time 0.2529 (0.2598) data time 0.0007 (0.0019) model time 0.2522 (0.2582) loss 6.7747 (6.0242) grad_norm 1.5512 (inf) loss_scale 512.0000 (787.4360) mem 9655MB [2024-08-04 05:20:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][470/625] eta 0:00:40 lr 0.000946 wd 0.0500 time 0.2620 (0.2597) data time 0.0011 (0.0019) model time 0.2609 (0.2582) loss 6.4982 (6.0242) grad_norm 1.6059 (inf) loss_scale 512.0000 (781.5881) mem 9655MB [2024-08-04 05:20:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][480/625] eta 0:00:37 lr 0.000946 wd 0.0500 time 0.2570 (0.2596) data time 0.0007 (0.0019) model time 0.2563 (0.2581) loss 6.6564 (6.0271) grad_norm 1.8958 (inf) loss_scale 512.0000 (775.9834) mem 9655MB [2024-08-04 05:20:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][490/625] eta 0:00:35 lr 0.000946 wd 0.0500 time 0.2590 (0.2596) data time 0.0009 (0.0019) model time 0.2581 (0.2580) loss 6.7034 (6.0267) grad_norm 2.1683 (inf) loss_scale 512.0000 (770.6069) mem 9655MB [2024-08-04 05:21:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][500/625] eta 0:00:32 lr 0.000946 wd 0.0500 time 0.2537 (0.2600) data time 0.0009 (0.0019) model time 0.2528 (0.2585) loss 6.4433 (6.0366) grad_norm 2.3077 (inf) loss_scale 512.0000 (765.4451) mem 9655MB [2024-08-04 05:21:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][510/625] eta 0:00:29 lr 0.000945 wd 0.0500 time 0.2585 (0.2603) data time 0.0008 (0.0018) model time 0.2577 (0.2589) loss 5.3661 (6.0330) grad_norm 2.3677 (inf) loss_scale 
512.0000 (760.4853) mem 9655MB [2024-08-04 05:21:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][520/625] eta 0:00:27 lr 0.000945 wd 0.0500 time 0.2555 (0.2602) data time 0.0008 (0.0018) model time 0.2547 (0.2588) loss 5.5545 (6.0292) grad_norm 1.0794 (inf) loss_scale 512.0000 (755.7159) mem 9655MB [2024-08-04 05:21:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][530/625] eta 0:00:24 lr 0.000945 wd 0.0500 time 0.2627 (0.2601) data time 0.0008 (0.0018) model time 0.2620 (0.2587) loss 7.1636 (6.0269) grad_norm 1.6558 (inf) loss_scale 512.0000 (751.1262) mem 9655MB [2024-08-04 05:21:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][540/625] eta 0:00:22 lr 0.000945 wd 0.0500 time 0.2552 (0.2601) data time 0.0009 (0.0018) model time 0.2543 (0.2587) loss 5.4809 (6.0295) grad_norm 1.8070 (inf) loss_scale 512.0000 (746.7061) mem 9655MB [2024-08-04 05:21:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][550/625] eta 0:00:19 lr 0.000945 wd 0.0500 time 0.2506 (0.2600) data time 0.0010 (0.0018) model time 0.2496 (0.2586) loss 5.2130 (6.0243) grad_norm 1.2604 (inf) loss_scale 512.0000 (742.4465) mem 9655MB [2024-08-04 05:21:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][560/625] eta 0:00:16 lr 0.000945 wd 0.0500 time 0.2545 (0.2600) data time 0.0010 (0.0018) model time 0.2535 (0.2586) loss 6.2685 (6.0233) grad_norm 1.8415 (inf) loss_scale 512.0000 (738.3387) mem 9655MB [2024-08-04 05:21:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][570/625] eta 0:00:14 lr 0.000944 wd 0.0500 time 0.2601 (0.2599) data time 0.0008 (0.0018) model time 0.2593 (0.2585) loss 5.1491 (6.0204) grad_norm 2.2418 (inf) loss_scale 512.0000 (734.3748) mem 9655MB [2024-08-04 05:21:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][580/625] eta 0:00:11 lr 0.000944 wd 0.0500 time 0.2660 
(0.2599) data time 0.0010 (0.0017) model time 0.2650 (0.2585) loss 6.4742 (6.0226) grad_norm 1.8307 (inf) loss_scale 512.0000 (730.5473) mem 9655MB [2024-08-04 05:21:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][590/625] eta 0:00:09 lr 0.000944 wd 0.0500 time 0.2579 (0.2598) data time 0.0007 (0.0017) model time 0.2573 (0.2584) loss 7.3175 (6.0227) grad_norm 1.5292 (inf) loss_scale 512.0000 (726.8494) mem 9655MB [2024-08-04 05:21:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][600/625] eta 0:00:06 lr 0.000944 wd 0.0500 time 0.2562 (0.2598) data time 0.0008 (0.0017) model time 0.2554 (0.2584) loss 5.7865 (6.0185) grad_norm 2.0481 (inf) loss_scale 512.0000 (723.2745) mem 9655MB [2024-08-04 05:21:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][610/625] eta 0:00:03 lr 0.000944 wd 0.0500 time 0.2521 (0.2597) data time 0.0004 (0.0017) model time 0.2518 (0.2583) loss 6.8622 (6.0204) grad_norm 1.7153 (inf) loss_scale 512.0000 (719.8167) mem 9655MB [2024-08-04 05:21:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][620/625] eta 0:00:01 lr 0.000943 wd 0.0500 time 0.2560 (0.2596) data time 0.0003 (0.0017) model time 0.2557 (0.2582) loss 4.7202 (6.0210) grad_norm 2.0166 (inf) loss_scale 512.0000 (716.4702) mem 9655MB [2024-08-04 05:21:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 165 training takes 0:02:42 [2024-08-04 05:21:34 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 05:21:34 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 05:21:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.560 (0.560) Loss 0.6523 (0.6523) Acc@1 88.184 (88.184) Acc@5 98.291 (98.291) Mem 9655MB [2024-08-04 05:21:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.103) Loss 1.0156 (0.7841) Acc@1 77.295 (84.468) Acc@5 95.166 (97.093) Mem 9655MB [2024-08-04 05:21:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 1.1338 (0.9222) Acc@1 75.293 (80.869) Acc@5 93.457 (95.654) Mem 9655MB [2024-08-04 05:21:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.570 Acc@5 95.623 [2024-08-04 05:21:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.6% [2024-08-04 05:21:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.756 (0.756) Loss 0.5918 (0.5918) Acc@1 89.453 (89.453) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 05:21:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.126) Loss 0.9360 (0.7285) Acc@1 79.639 (85.507) Acc@5 95.361 (97.510) Mem 9655MB [2024-08-04 05:21:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0664 (0.8571) Acc@1 75.537 (82.031) Acc@5 94.434 (96.108) Mem 9655MB [2024-08-04 05:21:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.748 Acc@5 96.097 [2024-08-04 05:21:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.7% [2024-08-04 05:21:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.75% [2024-08-04 05:21:38 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 05:21:39 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 05:21:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][0/625] eta 0:08:25 lr 0.000943 wd 0.0500 time 0.8085 (0.8085) data time 0.5640 (0.5640) model time 0.0000 (0.0000) loss 5.3601 (5.3601) grad_norm 2.3199 (2.3199) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:21:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][10/625] eta 0:03:08 lr 0.000943 wd 0.0500 time 0.2540 (0.3064) data time 0.0010 (0.0521) model time 0.0000 (0.0000) loss 5.3761 (6.0202) grad_norm 1.9519 (1.9580) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:21:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][20/625] eta 0:02:51 lr 0.000943 wd 0.0500 time 0.2653 (0.2827) data time 0.0011 (0.0277) model time 0.0000 (0.0000) loss 4.6904 (5.9414) grad_norm 1.7917 (2.0132) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:21:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][30/625] eta 0:02:42 lr 0.000943 wd 0.0500 time 0.2558 (0.2739) data time 0.0008 (0.0191) model time 0.0000 (0.0000) loss 5.6129 (6.0098) grad_norm 1.6426 (2.3786) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:21:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][40/625] eta 0:02:43 lr 0.000943 wd 0.0500 time 0.2536 (0.2787) data time 0.0007 (0.0147) model time 0.0000 (0.0000) loss 6.1694 (6.0406) grad_norm 1.8277 (2.2775) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:21:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][50/625] eta 0:02:37 lr 0.000943 wd 0.0500 time 0.2555 (0.2743) data time 0.0006 (0.0120) model time 0.0000 (0.0000) loss 4.7472 (5.9958) grad_norm 1.7461 (2.2553) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 
05:21:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][60/625] eta 0:02:33 lr 0.000942 wd 0.0500 time 0.2604 (0.2714) data time 0.0008 (0.0102) model time 0.2596 (0.2555) loss 4.7656 (5.9280) grad_norm 1.4063 (2.2745) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:21:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][70/625] eta 0:02:29 lr 0.000942 wd 0.0500 time 0.2554 (0.2693) data time 0.0007 (0.0089) model time 0.2547 (0.2558) loss 6.5856 (5.9371) grad_norm 2.0015 (2.2905) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:22:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][80/625] eta 0:02:25 lr 0.000942 wd 0.0500 time 0.2553 (0.2678) data time 0.0008 (0.0079) model time 0.2545 (0.2558) loss 6.3711 (5.9494) grad_norm 1.6762 (2.3018) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:22:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][90/625] eta 0:02:22 lr 0.000942 wd 0.0500 time 0.2554 (0.2665) data time 0.0006 (0.0071) model time 0.2548 (0.2557) loss 4.7758 (5.9271) grad_norm 3.1842 (2.3466) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:22:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][100/625] eta 0:02:19 lr 0.000942 wd 0.0500 time 0.2550 (0.2655) data time 0.0008 (0.0065) model time 0.2542 (0.2556) loss 5.5393 (5.9405) grad_norm 2.5622 (2.3431) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:22:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][110/625] eta 0:02:16 lr 0.000941 wd 0.0500 time 0.2580 (0.2647) data time 0.0016 (0.0060) model time 0.2564 (0.2555) loss 5.4839 (5.9224) grad_norm 1.3318 (2.2878) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:22:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][120/625] eta 0:02:13 lr 0.000941 wd 0.0500 time 0.2568 (0.2639) data time 0.0008 
(0.0056) model time 0.2560 (0.2554) loss 5.8062 (5.9178) grad_norm 1.6392 (2.2945) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:22:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][130/625] eta 0:02:10 lr 0.000941 wd 0.0500 time 0.2567 (0.2633) data time 0.0007 (0.0052) model time 0.2560 (0.2553) loss 5.1731 (5.9313) grad_norm 5.4254 (2.3347) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:22:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][140/625] eta 0:02:07 lr 0.000941 wd 0.0500 time 0.2571 (0.2628) data time 0.0006 (0.0049) model time 0.2564 (0.2553) loss 4.8141 (5.9267) grad_norm 2.3431 (2.3636) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:22:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][150/625] eta 0:02:04 lr 0.000941 wd 0.0500 time 0.2595 (0.2623) data time 0.0007 (0.0047) model time 0.2588 (0.2552) loss 5.4169 (5.9206) grad_norm 1.8067 (2.3943) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:22:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][160/625] eta 0:02:01 lr 0.000941 wd 0.0500 time 0.2561 (0.2619) data time 0.0010 (0.0045) model time 0.2551 (0.2551) loss 7.3201 (5.9136) grad_norm 1.5263 (2.3890) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:22:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][170/625] eta 0:01:59 lr 0.000940 wd 0.0500 time 0.2575 (0.2615) data time 0.0008 (0.0043) model time 0.2567 (0.2551) loss 5.6262 (5.9013) grad_norm 2.1936 (2.3708) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:22:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][180/625] eta 0:01:56 lr 0.000940 wd 0.0500 time 0.2562 (0.2612) data time 0.0007 (0.0041) model time 0.2555 (0.2551) loss 5.9225 (5.9157) grad_norm 2.1804 (2.3476) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:22:29 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][190/625] eta 0:01:53 lr 0.000940 wd 0.0500 time 0.2550 (0.2609) data time 0.0007 (0.0039) model time 0.2543 (0.2550) loss 4.5617 (5.9280) grad_norm 1.3782 (2.3539) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:22:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][200/625] eta 0:01:50 lr 0.000940 wd 0.0500 time 0.2583 (0.2607) data time 0.0009 (0.0038) model time 0.2573 (0.2551) loss 5.5292 (5.9467) grad_norm 1.8281 (2.3471) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:22:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][210/625] eta 0:01:48 lr 0.000940 wd 0.0500 time 0.2527 (0.2606) data time 0.0010 (0.0036) model time 0.2517 (0.2552) loss 6.1949 (5.9566) grad_norm 1.6666 (2.3479) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:22:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][220/625] eta 0:01:45 lr 0.000940 wd 0.0500 time 0.2577 (0.2604) data time 0.0009 (0.0035) model time 0.2568 (0.2552) loss 6.8217 (5.9661) grad_norm 1.2928 (2.3291) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:22:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][230/625] eta 0:01:42 lr 0.000939 wd 0.0500 time 0.2569 (0.2602) data time 0.0006 (0.0034) model time 0.2563 (0.2552) loss 6.6040 (5.9819) grad_norm 1.5191 (2.3036) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:22:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][240/625] eta 0:01:40 lr 0.000939 wd 0.0500 time 0.2624 (0.2600) data time 0.0010 (0.0033) model time 0.2614 (0.2552) loss 5.6878 (5.9756) grad_norm 2.5722 (2.3074) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:22:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][250/625] eta 0:01:37 lr 0.000939 wd 0.0500 time 0.2547 (0.2599) data time 0.0008 (0.0032) 
model time 0.2540 (0.2552) loss 6.4679 (5.9904) grad_norm 1.8009 (2.2906) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:22:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][260/625] eta 0:01:34 lr 0.000939 wd 0.0500 time 0.2573 (0.2597) data time 0.0009 (0.0032) model time 0.2565 (0.2552) loss 6.7172 (5.9993) grad_norm 1.9605 (2.2826) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:22:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][270/625] eta 0:01:32 lr 0.000939 wd 0.0500 time 0.2520 (0.2596) data time 0.0009 (0.0031) model time 0.2511 (0.2552) loss 5.9051 (5.9970) grad_norm 1.8530 (2.2715) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:22:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][280/625] eta 0:01:29 lr 0.000938 wd 0.0500 time 0.2566 (0.2595) data time 0.0009 (0.0030) model time 0.2558 (0.2552) loss 6.2674 (6.0036) grad_norm 1.9061 (2.2573) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:22:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][290/625] eta 0:01:26 lr 0.000938 wd 0.0500 time 0.2565 (0.2594) data time 0.0007 (0.0029) model time 0.2559 (0.2552) loss 5.2431 (6.0007) grad_norm 1.9208 (2.2542) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:22:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][300/625] eta 0:01:24 lr 0.000938 wd 0.0500 time 0.2611 (0.2593) data time 0.0009 (0.0029) model time 0.2602 (0.2553) loss 5.8005 (5.9974) grad_norm 1.2405 (2.2358) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:23:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][310/625] eta 0:01:21 lr 0.000938 wd 0.0500 time 0.2531 (0.2592) data time 0.0010 (0.0028) model time 0.2521 (0.2552) loss 6.3796 (5.9980) grad_norm 2.2777 (2.2178) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:23:02 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [166/300][320/625] eta 0:01:19 lr 0.000938 wd 0.0500 time 0.2518 (0.2596) data time 0.0008 (0.0028) model time 0.2510 (0.2558) loss 4.9068 (5.9981) grad_norm 1.5221 (2.2137) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:23:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][330/625] eta 0:01:16 lr 0.000938 wd 0.0500 time 0.2517 (0.2599) data time 0.0007 (0.0027) model time 0.2510 (0.2563) loss 6.2399 (5.9868) grad_norm 1.5213 (2.1992) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:23:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][340/625] eta 0:01:14 lr 0.000937 wd 0.0500 time 0.2568 (0.2598) data time 0.0009 (0.0027) model time 0.2560 (0.2563) loss 4.4704 (5.9773) grad_norm 1.9379 (2.1919) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:23:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][350/625] eta 0:01:11 lr 0.000937 wd 0.0500 time 0.2570 (0.2597) data time 0.0007 (0.0026) model time 0.2563 (0.2562) loss 4.9432 (5.9496) grad_norm 1.5796 (2.1785) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:23:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][360/625] eta 0:01:08 lr 0.000937 wd 0.0500 time 0.2564 (0.2596) data time 0.0010 (0.0026) model time 0.2554 (0.2562) loss 6.5178 (5.9541) grad_norm 1.7643 (2.1631) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:23:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][370/625] eta 0:01:06 lr 0.000937 wd 0.0500 time 0.2566 (0.2596) data time 0.0009 (0.0025) model time 0.2557 (0.2562) loss 6.3537 (5.9563) grad_norm 1.6958 (2.1508) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:23:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][380/625] eta 0:01:03 lr 0.000937 wd 0.0500 time 0.2593 (0.2595) data time 0.0009 (0.0025) model time 0.2584 (0.2562) 
loss 5.9974 (5.9634) grad_norm 1.5853 (2.1456) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:23:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][390/625] eta 0:01:00 lr 0.000936 wd 0.0500 time 0.2517 (0.2594) data time 0.0009 (0.0024) model time 0.2508 (0.2562) loss 6.4502 (5.9659) grad_norm 1.8738 (2.1484) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:23:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][400/625] eta 0:00:58 lr 0.000936 wd 0.0500 time 0.2659 (0.2593) data time 0.0007 (0.0024) model time 0.2652 (0.2562) loss 5.8894 (5.9699) grad_norm 2.3426 (2.1591) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:23:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][410/625] eta 0:00:55 lr 0.000936 wd 0.0500 time 0.3918 (0.2599) data time 0.0007 (0.0024) model time 0.3912 (0.2568) loss 6.9878 (5.9692) grad_norm 2.3264 (2.1634) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:23:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][420/625] eta 0:00:53 lr 0.000936 wd 0.0500 time 0.2552 (0.2602) data time 0.0007 (0.0023) model time 0.2544 (0.2573) loss 6.8684 (5.9637) grad_norm 2.0692 (2.1604) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:23:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][430/625] eta 0:00:50 lr 0.000936 wd 0.0500 time 0.2575 (0.2602) data time 0.0007 (0.0023) model time 0.2568 (0.2573) loss 6.4201 (5.9686) grad_norm 2.9722 (2.1524) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:23:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][440/625] eta 0:00:48 lr 0.000936 wd 0.0500 time 0.2570 (0.2601) data time 0.0012 (0.0023) model time 0.2558 (0.2572) loss 4.9627 (5.9651) grad_norm 1.8459 (2.1575) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:23:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [166/300][450/625] eta 0:00:45 lr 0.000935 wd 0.0500 time 0.2590 (0.2600) data time 0.0008 (0.0022) model time 0.2581 (0.2572) loss 7.0337 (5.9663) grad_norm 1.2586 (2.1494) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:23:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][460/625] eta 0:00:42 lr 0.000935 wd 0.0500 time 0.2588 (0.2599) data time 0.0008 (0.0022) model time 0.2580 (0.2572) loss 4.9152 (5.9669) grad_norm 2.0472 (2.1451) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:23:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][470/625] eta 0:00:40 lr 0.000935 wd 0.0500 time 0.2630 (0.2599) data time 0.0007 (0.0022) model time 0.2622 (0.2571) loss 7.3463 (5.9650) grad_norm 3.2501 (2.1476) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:23:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][480/625] eta 0:00:37 lr 0.000935 wd 0.0500 time 0.2528 (0.2598) data time 0.0009 (0.0022) model time 0.2519 (0.2571) loss 6.7957 (5.9633) grad_norm 1.7251 (2.1555) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:23:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][490/625] eta 0:00:35 lr 0.000935 wd 0.0500 time 0.2587 (0.2597) data time 0.0006 (0.0021) model time 0.2580 (0.2570) loss 4.8435 (5.9615) grad_norm 2.5121 (2.1535) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:23:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][500/625] eta 0:00:32 lr 0.000935 wd 0.0500 time 0.2556 (0.2596) data time 0.0006 (0.0021) model time 0.2550 (0.2570) loss 5.9637 (5.9528) grad_norm 1.4299 (2.1497) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:23:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][510/625] eta 0:00:29 lr 0.000934 wd 0.0500 time 0.2595 (0.2596) data time 0.0007 (0.0021) model time 0.2587 (0.2569) loss 6.4295 (5.9555) grad_norm 
1.6387 (2.1496) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:23:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][520/625] eta 0:00:27 lr 0.000934 wd 0.0500 time 0.2621 (0.2595) data time 0.0007 (0.0021) model time 0.2614 (0.2569) loss 5.6389 (5.9557) grad_norm 1.6994 (2.1448) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:23:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][530/625] eta 0:00:24 lr 0.000934 wd 0.0500 time 0.2570 (0.2594) data time 0.0006 (0.0020) model time 0.2564 (0.2569) loss 7.0942 (5.9536) grad_norm 1.1782 (2.1468) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:23:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][540/625] eta 0:00:22 lr 0.000934 wd 0.0500 time 0.2507 (0.2594) data time 0.0007 (0.0020) model time 0.2500 (0.2568) loss 6.2351 (5.9460) grad_norm 1.4298 (2.1324) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:24:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][550/625] eta 0:00:19 lr 0.000934 wd 0.0500 time 0.2547 (0.2593) data time 0.0009 (0.0020) model time 0.2539 (0.2568) loss 6.6281 (5.9458) grad_norm 4.1536 (2.1482) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:24:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][560/625] eta 0:00:16 lr 0.000933 wd 0.0500 time 0.2547 (0.2592) data time 0.0008 (0.0020) model time 0.2538 (0.2567) loss 6.2447 (5.9478) grad_norm 2.0368 (2.1487) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:24:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][570/625] eta 0:00:14 lr 0.000933 wd 0.0500 time 0.2556 (0.2591) data time 0.0008 (0.0020) model time 0.2548 (0.2567) loss 4.0821 (5.9455) grad_norm 2.5617 (2.1526) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:24:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][580/625] eta 
0:00:11 lr 0.000933 wd 0.0500 time 0.2575 (0.2591) data time 0.0007 (0.0019) model time 0.2568 (0.2567) loss 5.6968 (5.9403) grad_norm 1.7247 (2.1466) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:24:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][590/625] eta 0:00:09 lr 0.000933 wd 0.0500 time 0.2533 (0.2591) data time 0.0009 (0.0019) model time 0.2524 (0.2566) loss 6.0823 (5.9401) grad_norm 2.0110 (2.1468) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:24:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][600/625] eta 0:00:06 lr 0.000933 wd 0.0500 time 0.2540 (0.2590) data time 0.0010 (0.0019) model time 0.2530 (0.2566) loss 5.9699 (5.9415) grad_norm 2.7413 (2.1481) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:24:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][610/625] eta 0:00:03 lr 0.000933 wd 0.0500 time 0.2530 (0.2590) data time 0.0006 (0.0019) model time 0.2524 (0.2566) loss 6.6905 (5.9429) grad_norm 2.4683 (2.1485) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:24:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][620/625] eta 0:00:01 lr 0.000932 wd 0.0500 time 0.2568 (0.2589) data time 0.0005 (0.0019) model time 0.2562 (0.2565) loss 7.3490 (5.9459) grad_norm 1.8324 (2.1590) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:24:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 166 training takes 0:02:41 [2024-08-04 05:24:21 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 05:24:21 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 05:24:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.531 (0.531) Loss 0.6455 (0.6455) Acc@1 88.232 (88.232) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 05:24:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.100) Loss 1.0635 (0.7947) Acc@1 77.148 (84.464) Acc@5 95.068 (97.226) Mem 9655MB [2024-08-04 05:24:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.1357 (0.9360) Acc@1 75.244 (80.964) Acc@5 93.848 (95.678) Mem 9655MB [2024-08-04 05:24:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.638 Acc@5 95.673 [2024-08-04 05:24:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.6% [2024-08-04 05:24:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.787 (0.787) Loss 0.5908 (0.5908) Acc@1 89.600 (89.600) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 05:24:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.130) Loss 0.9346 (0.7278) Acc@1 79.736 (85.574) Acc@5 95.508 (97.505) Mem 9655MB [2024-08-04 05:24:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.095) Loss 1.0664 (0.8567) Acc@1 75.537 (82.087) Acc@5 94.482 (96.119) Mem 9655MB [2024-08-04 05:24:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.794 Acc@5 96.105 [2024-08-04 05:24:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.8% [2024-08-04 05:24:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.79% [2024-08-04 05:24:25 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 05:24:26 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 05:24:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][0/625] eta 0:08:05 lr 0.000932 wd 0.0500 time 0.7767 (0.7767) data time 0.5381 (0.5381) model time 0.0000 (0.0000) loss 6.5193 (6.5193) grad_norm 2.6675 (2.6675) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:24:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][10/625] eta 0:03:14 lr 0.000932 wd 0.0500 time 0.2534 (0.3155) data time 0.0009 (0.0497) model time 0.0000 (0.0000) loss 5.9799 (5.5068) grad_norm 2.1601 (2.3571) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:24:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][20/625] eta 0:02:53 lr 0.000932 wd 0.0500 time 0.2565 (0.2870) data time 0.0009 (0.0265) model time 0.0000 (0.0000) loss 5.0645 (5.5860) grad_norm 2.3370 (2.2640) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:24:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][30/625] eta 0:02:44 lr 0.000932 wd 0.0500 time 0.2532 (0.2769) data time 0.0007 (0.0182) model time 0.0000 (0.0000) loss 6.9092 (5.7167) grad_norm 3.0345 (2.4087) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:24:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][40/625] eta 0:02:41 lr 0.000932 wd 0.0500 time 0.2570 (0.2769) data time 0.0006 (0.0140) model time 0.0000 (0.0000) loss 6.2290 (5.7644) grad_norm 1.6115 (2.3464) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:24:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][50/625] eta 0:02:39 lr 0.000931 wd 0.0500 time 0.2570 (0.2768) data time 0.0010 (0.0114) model time 0.0000 (0.0000) loss 5.1433 (5.8215) grad_norm 2.0221 (2.2770) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 
05:24:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][60/625] eta 0:02:34 lr 0.000931 wd 0.0500 time 0.2568 (0.2734) data time 0.0007 (0.0097) model time 0.2560 (0.2553) loss 6.6993 (5.8608) grad_norm 5.0118 (2.3357) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:24:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][70/625] eta 0:02:30 lr 0.000931 wd 0.0500 time 0.2537 (0.2710) data time 0.0007 (0.0085) model time 0.2530 (0.2553) loss 4.8658 (5.8665) grad_norm 3.7499 (2.3257) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:24:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][80/625] eta 0:02:26 lr 0.000931 wd 0.0500 time 0.2559 (0.2693) data time 0.0008 (0.0075) model time 0.2552 (0.2555) loss 5.2501 (5.8965) grad_norm 2.3131 (2.4006) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:24:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][90/625] eta 0:02:23 lr 0.000931 wd 0.0500 time 0.2556 (0.2679) data time 0.0008 (0.0068) model time 0.2548 (0.2556) loss 7.1335 (5.9343) grad_norm 2.9970 (2.4321) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:24:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][100/625] eta 0:02:20 lr 0.000931 wd 0.0500 time 0.2709 (0.2669) data time 0.0011 (0.0062) model time 0.2698 (0.2558) loss 6.3415 (5.9530) grad_norm 1.5944 (2.3885) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:24:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][110/625] eta 0:02:16 lr 0.000930 wd 0.0500 time 0.2535 (0.2657) data time 0.0009 (0.0057) model time 0.2526 (0.2553) loss 6.2645 (5.9808) grad_norm 1.7463 (2.3234) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:24:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][120/625] eta 0:02:14 lr 0.000930 wd 0.0500 time 0.2585 (0.2664) data time 0.0008 
(0.0053) model time 0.2578 (0.2579) loss 5.5671 (5.9917) grad_norm 1.6436 (2.2522) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:25:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][130/625] eta 0:02:11 lr 0.000930 wd 0.0500 time 0.2586 (0.2656) data time 0.0008 (0.0050) model time 0.2578 (0.2575) loss 5.9736 (6.0158) grad_norm 2.3190 (2.2254) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:25:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][140/625] eta 0:02:08 lr 0.000930 wd 0.0500 time 0.2532 (0.2650) data time 0.0012 (0.0047) model time 0.2520 (0.2575) loss 6.0028 (6.0018) grad_norm 1.6593 (2.2272) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:25:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][150/625] eta 0:02:05 lr 0.000930 wd 0.0500 time 0.2537 (0.2645) data time 0.0011 (0.0045) model time 0.2527 (0.2573) loss 4.7935 (6.0044) grad_norm 1.5271 (2.2136) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:25:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][160/625] eta 0:02:02 lr 0.000929 wd 0.0500 time 0.2551 (0.2640) data time 0.0010 (0.0043) model time 0.2542 (0.2572) loss 7.4146 (5.9966) grad_norm 2.9171 (2.2072) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:25:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][170/625] eta 0:01:59 lr 0.000929 wd 0.0500 time 0.2610 (0.2635) data time 0.0008 (0.0041) model time 0.2602 (0.2570) loss 6.3107 (6.0156) grad_norm 2.8034 (2.2163) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:25:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][180/625] eta 0:01:57 lr 0.000929 wd 0.0500 time 0.2530 (0.2631) data time 0.0008 (0.0039) model time 0.2522 (0.2569) loss 5.4733 (6.0257) grad_norm 2.1490 (2.2418) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:25:16 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][190/625] eta 0:01:54 lr 0.000929 wd 0.0500 time 0.2574 (0.2628) data time 0.0018 (0.0037) model time 0.2556 (0.2567) loss 5.8460 (6.0280) grad_norm 1.1857 (2.2238) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:25:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][200/625] eta 0:01:52 lr 0.000929 wd 0.0500 time 0.2580 (0.2644) data time 0.0007 (0.0036) model time 0.2573 (0.2592) loss 4.9007 (6.0246) grad_norm 2.0785 (2.2075) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:25:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][210/625] eta 0:01:49 lr 0.000929 wd 0.0500 time 0.2581 (0.2641) data time 0.0006 (0.0035) model time 0.2575 (0.2591) loss 5.9350 (6.0188) grad_norm 2.1696 (2.2001) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:25:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][220/625] eta 0:01:46 lr 0.000928 wd 0.0500 time 0.2534 (0.2637) data time 0.0007 (0.0033) model time 0.2527 (0.2588) loss 5.8550 (6.0223) grad_norm 2.0719 (2.2177) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:25:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][230/625] eta 0:01:44 lr 0.000928 wd 0.0500 time 0.2567 (0.2634) data time 0.0009 (0.0032) model time 0.2558 (0.2586) loss 6.2991 (6.0410) grad_norm 3.0433 (2.2117) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:25:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][240/625] eta 0:01:41 lr 0.000928 wd 0.0500 time 0.2562 (0.2631) data time 0.0008 (0.0031) model time 0.2554 (0.2585) loss 5.9421 (6.0451) grad_norm 1.4024 (2.2157) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:25:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][250/625] eta 0:01:38 lr 0.000928 wd 0.0500 time 0.2534 (0.2627) data time 0.0007 (0.0031) 
model time 0.2528 (0.2582) loss 7.0282 (6.0343) grad_norm 1.5681 (2.2020) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:25:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][260/625] eta 0:01:35 lr 0.000928 wd 0.0500 time 0.2553 (0.2625) data time 0.0010 (0.0030) model time 0.2542 (0.2581) loss 6.6907 (6.0236) grad_norm 3.4362 (2.2070) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:25:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][270/625] eta 0:01:33 lr 0.000928 wd 0.0500 time 0.2568 (0.2623) data time 0.0007 (0.0029) model time 0.2561 (0.2580) loss 5.3180 (6.0162) grad_norm 1.3441 (2.2238) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:25:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][280/625] eta 0:01:30 lr 0.000927 wd 0.0500 time 0.2555 (0.2620) data time 0.0008 (0.0028) model time 0.2547 (0.2578) loss 7.3458 (6.0287) grad_norm 1.5921 (2.2279) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:25:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][290/625] eta 0:01:27 lr 0.000927 wd 0.0500 time 0.2568 (0.2618) data time 0.0007 (0.0028) model time 0.2561 (0.2577) loss 5.4368 (6.0291) grad_norm 1.8519 (2.2269) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:25:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][300/625] eta 0:01:25 lr 0.000927 wd 0.0500 time 0.2533 (0.2617) data time 0.0007 (0.0027) model time 0.2526 (0.2576) loss 6.6595 (6.0265) grad_norm 1.3693 (2.2113) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:25:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][310/625] eta 0:01:22 lr 0.000927 wd 0.0500 time 0.2528 (0.2615) data time 0.0007 (0.0026) model time 0.2521 (0.2575) loss 6.2945 (6.0325) grad_norm 1.6917 (2.1933) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:25:50 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [167/300][320/625] eta 0:01:19 lr 0.000927 wd 0.0500 time 0.2586 (0.2613) data time 0.0007 (0.0026) model time 0.2579 (0.2575) loss 5.2640 (6.0166) grad_norm 1.2989 (2.1713) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:25:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][330/625] eta 0:01:17 lr 0.000926 wd 0.0500 time 0.2588 (0.2612) data time 0.0010 (0.0025) model time 0.2579 (0.2574) loss 5.5744 (6.0278) grad_norm 2.2037 (2.1780) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:25:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][340/625] eta 0:01:14 lr 0.000926 wd 0.0500 time 0.2593 (0.2611) data time 0.0010 (0.0025) model time 0.2583 (0.2574) loss 5.7102 (6.0286) grad_norm 2.8392 (2.1689) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:25:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][350/625] eta 0:01:12 lr 0.000926 wd 0.0500 time 0.2535 (0.2620) data time 0.0008 (0.0024) model time 0.2527 (0.2585) loss 7.2215 (6.0252) grad_norm 1.5039 (2.1541) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:26:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][360/625] eta 0:01:09 lr 0.000926 wd 0.0500 time 0.2576 (0.2618) data time 0.0010 (0.0024) model time 0.2566 (0.2585) loss 5.8364 (6.0234) grad_norm 1.6106 (2.1589) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:26:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][370/625] eta 0:01:06 lr 0.000926 wd 0.0500 time 0.2505 (0.2617) data time 0.0010 (0.0024) model time 0.2495 (0.2583) loss 5.6482 (6.0152) grad_norm 3.6794 (2.1644) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:26:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][380/625] eta 0:01:04 lr 0.000926 wd 0.0500 time 0.2428 (0.2625) data time 0.0010 (0.0023) model time 0.2418 (0.2593) 
loss 6.0728 (6.0053) grad_norm 2.3144 (2.1793) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:26:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][390/625] eta 0:01:01 lr 0.000925 wd 0.0500 time 0.2538 (0.2623) data time 0.0008 (0.0023) model time 0.2530 (0.2592) loss 5.2163 (6.0074) grad_norm 2.7303 (2.1845) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:26:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][400/625] eta 0:00:58 lr 0.000925 wd 0.0500 time 0.2571 (0.2621) data time 0.0009 (0.0023) model time 0.2562 (0.2591) loss 4.7736 (6.0072) grad_norm 1.3671 (2.1727) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:26:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][410/625] eta 0:00:56 lr 0.000925 wd 0.0500 time 0.2540 (0.2620) data time 0.0010 (0.0022) model time 0.2530 (0.2589) loss 5.5971 (6.0046) grad_norm 2.6036 (2.1726) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:26:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][420/625] eta 0:00:53 lr 0.000925 wd 0.0500 time 0.2580 (0.2618) data time 0.0008 (0.0022) model time 0.2572 (0.2588) loss 4.8523 (6.0120) grad_norm 1.5786 (2.1678) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:26:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][430/625] eta 0:00:51 lr 0.000925 wd 0.0500 time 0.2603 (0.2617) data time 0.0008 (0.0022) model time 0.2595 (0.2587) loss 5.1597 (6.0094) grad_norm 1.9366 (2.1590) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:26:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][440/625] eta 0:00:48 lr 0.000925 wd 0.0500 time 0.2549 (0.2616) data time 0.0007 (0.0021) model time 0.2542 (0.2586) loss 5.1743 (6.0028) grad_norm 1.2741 (2.1496) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:26:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [167/300][450/625] eta 0:00:45 lr 0.000924 wd 0.0500 time 0.2550 (0.2614) data time 0.0007 (0.0021) model time 0.2543 (0.2585) loss 7.0812 (5.9948) grad_norm 2.6156 (2.1526) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:26:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][460/625] eta 0:00:43 lr 0.000924 wd 0.0500 time 0.2586 (0.2614) data time 0.0007 (0.0021) model time 0.2579 (0.2585) loss 4.9291 (5.9930) grad_norm 2.6213 (2.1666) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:26:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][470/625] eta 0:00:40 lr 0.000924 wd 0.0500 time 0.2534 (0.2612) data time 0.0008 (0.0021) model time 0.2526 (0.2584) loss 5.2250 (5.9954) grad_norm 1.8620 (2.1694) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:26:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][480/625] eta 0:00:37 lr 0.000924 wd 0.0500 time 0.2549 (0.2611) data time 0.0009 (0.0020) model time 0.2539 (0.2583) loss 6.5365 (6.0043) grad_norm 2.0704 (2.1623) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:26:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][490/625] eta 0:00:35 lr 0.000924 wd 0.0500 time 0.2569 (0.2610) data time 0.0008 (0.0020) model time 0.2561 (0.2583) loss 6.7208 (5.9985) grad_norm 1.1135 (2.1496) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:26:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][500/625] eta 0:00:32 lr 0.000923 wd 0.0500 time 0.2610 (0.2610) data time 0.0007 (0.0020) model time 0.2603 (0.2582) loss 6.3725 (5.9936) grad_norm 2.2358 (2.1443) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:26:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][510/625] eta 0:00:30 lr 0.000923 wd 0.0500 time 0.2582 (0.2609) data time 0.0009 (0.0020) model time 0.2573 (0.2582) loss 6.7153 (5.9917) grad_norm 
1.4694 (2.1364) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:26:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][520/625] eta 0:00:27 lr 0.000923 wd 0.0500 time 0.2564 (0.2612) data time 0.0008 (0.0020) model time 0.2556 (0.2586) loss 5.6258 (5.9967) grad_norm 1.8941 (2.1344) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:26:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][530/625] eta 0:00:24 lr 0.000923 wd 0.0500 time 0.2580 (0.2611) data time 0.0007 (0.0019) model time 0.2573 (0.2585) loss 4.6978 (5.9916) grad_norm 3.5388 (2.1359) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:26:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][540/625] eta 0:00:22 lr 0.000923 wd 0.0500 time 0.2572 (0.2610) data time 0.0008 (0.0019) model time 0.2564 (0.2584) loss 5.2299 (5.9869) grad_norm 2.0646 (2.1461) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:26:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][550/625] eta 0:00:19 lr 0.000923 wd 0.0500 time 0.2562 (0.2609) data time 0.0009 (0.0019) model time 0.2553 (0.2583) loss 5.4489 (5.9792) grad_norm 4.1986 (2.1577) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:26:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][560/625] eta 0:00:16 lr 0.000922 wd 0.0500 time 0.2622 (0.2609) data time 0.0007 (0.0019) model time 0.2616 (0.2583) loss 6.2543 (5.9811) grad_norm 1.8675 (2.1526) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:26:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][570/625] eta 0:00:14 lr 0.000922 wd 0.0500 time 0.2564 (0.2608) data time 0.0009 (0.0019) model time 0.2556 (0.2582) loss 4.7128 (5.9764) grad_norm 1.6864 (2.1563) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:26:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][580/625] eta 
0:00:11 lr 0.000922 wd 0.0500 time 0.2540 (0.2614) data time 0.0008 (0.0019) model time 0.2532 (0.2589) loss 6.9038 (5.9787) grad_norm 2.3941 (2.1576) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:27:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][590/625] eta 0:00:09 lr 0.000922 wd 0.0500 time 0.2584 (0.2613) data time 0.0007 (0.0019) model time 0.2577 (0.2589) loss 7.0398 (5.9857) grad_norm 2.1178 (2.1560) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:27:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][600/625] eta 0:00:06 lr 0.000922 wd 0.0500 time 0.2578 (0.2612) data time 0.0009 (0.0018) model time 0.2569 (0.2588) loss 4.9239 (5.9876) grad_norm 1.8605 (2.1534) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:27:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][610/625] eta 0:00:03 lr 0.000922 wd 0.0500 time 0.2528 (0.2611) data time 0.0006 (0.0018) model time 0.2522 (0.2588) loss 5.2909 (5.9846) grad_norm 2.2954 (2.1504) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:27:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][620/625] eta 0:00:01 lr 0.000921 wd 0.0500 time 0.2544 (0.2610) data time 0.0003 (0.0018) model time 0.2540 (0.2587) loss 5.0661 (5.9847) grad_norm 1.2844 (2.1442) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:27:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 167 training takes 0:02:43 [2024-08-04 05:27:09 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 05:27:09 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 05:27:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.483 (0.483) Loss 0.6240 (0.6240) Acc@1 89.014 (89.014) Acc@5 98.438 (98.438) Mem 9655MB [2024-08-04 05:27:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 1.0254 (0.7864) Acc@1 78.271 (84.646) Acc@5 94.727 (97.181) Mem 9655MB [2024-08-04 05:27:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.1387 (0.9191) Acc@1 74.219 (80.985) Acc@5 93.604 (95.647) Mem 9655MB [2024-08-04 05:27:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.668 Acc@5 95.655 [2024-08-04 05:27:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.7% [2024-08-04 05:27:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.758 (0.758) Loss 0.5898 (0.5898) Acc@1 89.648 (89.648) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 05:27:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.127) Loss 0.9331 (0.7273) Acc@1 79.736 (85.574) Acc@5 95.557 (97.501) Mem 9655MB [2024-08-04 05:27:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 1.0654 (0.8561) Acc@1 75.391 (82.096) Acc@5 94.531 (96.140) Mem 9655MB [2024-08-04 05:27:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.810 Acc@5 96.125 [2024-08-04 05:27:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.8% [2024-08-04 05:27:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.81% [2024-08-04 05:27:13 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 05:27:14 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 05:27:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][0/625] eta 0:08:19 lr 0.000921 wd 0.0500 time 0.8000 (0.8000) data time 0.5622 (0.5622) model time 0.0000 (0.0000) loss 6.8621 (6.8621) grad_norm 1.8987 (1.8987) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:27:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][10/625] eta 0:03:07 lr 0.000921 wd 0.0500 time 0.2551 (0.3048) data time 0.0007 (0.0518) model time 0.0000 (0.0000) loss 6.9123 (6.2590) grad_norm 1.6384 (2.2260) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:27:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][20/625] eta 0:02:50 lr 0.000921 wd 0.0500 time 0.2560 (0.2812) data time 0.0009 (0.0276) model time 0.0000 (0.0000) loss 6.2738 (5.9551) grad_norm 2.2257 (2.1185) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:27:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][30/625] eta 0:02:42 lr 0.000921 wd 0.0500 time 0.2582 (0.2730) data time 0.0005 (0.0190) model time 0.0000 (0.0000) loss 6.2817 (5.9501) grad_norm 2.9663 (2.1434) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:27:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][40/625] eta 0:02:37 lr 0.000921 wd 0.0500 time 0.2600 (0.2687) data time 0.0010 (0.0146) model time 0.0000 (0.0000) loss 7.3907 (6.0805) grad_norm 1.7279 (2.0700) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:27:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][50/625] eta 0:02:33 lr 0.000920 wd 0.0500 time 0.2572 (0.2662) data time 0.0008 (0.0119) model time 0.0000 (0.0000) loss 5.7471 (6.1369) grad_norm 2.2531 (2.0519) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 
05:27:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][60/625] eta 0:02:29 lr 0.000920 wd 0.0500 time 0.2580 (0.2648) data time 0.0005 (0.0101) model time 0.2575 (0.2564) loss 6.5120 (6.1674) grad_norm 1.4696 (2.0513) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:27:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][70/625] eta 0:02:26 lr 0.000920 wd 0.0500 time 0.2524 (0.2637) data time 0.0007 (0.0088) model time 0.2517 (0.2562) loss 4.7795 (6.1272) grad_norm 1.4549 (2.0434) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:27:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][80/625] eta 0:02:23 lr 0.000920 wd 0.0500 time 0.2551 (0.2628) data time 0.0011 (0.0079) model time 0.2540 (0.2560) loss 5.3728 (6.0585) grad_norm 2.6030 (2.0585) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:27:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][90/625] eta 0:02:20 lr 0.000920 wd 0.0500 time 0.2574 (0.2621) data time 0.0009 (0.0071) model time 0.2566 (0.2558) loss 6.5530 (6.0608) grad_norm 2.0367 (2.0235) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:27:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][100/625] eta 0:02:17 lr 0.000919 wd 0.0500 time 0.2558 (0.2615) data time 0.0006 (0.0065) model time 0.2552 (0.2557) loss 4.7683 (6.0456) grad_norm 1.9840 (1.9887) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:27:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][110/625] eta 0:02:16 lr 0.000919 wd 0.0500 time 0.2584 (0.2647) data time 0.0007 (0.0060) model time 0.2577 (0.2625) loss 6.8890 (6.0382) grad_norm 1.2562 (1.9378) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:27:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][120/625] eta 0:02:13 lr 0.000919 wd 0.0500 time 0.2577 (0.2640) data time 0.0009 
(0.0055) model time 0.2568 (0.2615) loss 6.5986 (6.0678) grad_norm 2.3980 (1.9221) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:27:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][130/625] eta 0:02:11 lr 0.000919 wd 0.0500 time 0.2574 (0.2650) data time 0.0007 (0.0052) model time 0.2567 (0.2634) loss 6.4826 (6.0785) grad_norm 1.5691 (1.9279) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:27:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][140/625] eta 0:02:08 lr 0.000919 wd 0.0500 time 0.2566 (0.2658) data time 0.0009 (0.0049) model time 0.2557 (0.2646) loss 5.3212 (6.0558) grad_norm 2.7348 (1.9440) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:27:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][150/625] eta 0:02:05 lr 0.000919 wd 0.0500 time 0.2578 (0.2652) data time 0.0005 (0.0046) model time 0.2572 (0.2637) loss 4.9776 (6.0394) grad_norm 3.9924 (1.9770) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:27:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][160/625] eta 0:02:03 lr 0.000918 wd 0.0500 time 0.2564 (0.2647) data time 0.0010 (0.0044) model time 0.2554 (0.2631) loss 6.4540 (6.0371) grad_norm 1.5799 (1.9805) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:27:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][170/625] eta 0:02:00 lr 0.000918 wd 0.0500 time 0.2529 (0.2642) data time 0.0010 (0.0042) model time 0.2520 (0.2624) loss 6.1037 (6.0296) grad_norm 2.5523 (2.0036) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:28:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][180/625] eta 0:01:57 lr 0.000918 wd 0.0500 time 0.2585 (0.2637) data time 0.0007 (0.0040) model time 0.2578 (0.2618) loss 6.4852 (6.0419) grad_norm 2.8626 (2.0333) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:28:04 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][190/625] eta 0:01:54 lr 0.000918 wd 0.0500 time 0.4013 (0.2641) data time 0.0007 (0.0038) model time 0.4006 (0.2624) loss 6.8739 (6.0411) grad_norm 1.3879 (2.0374) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:28:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][200/625] eta 0:01:52 lr 0.000918 wd 0.0500 time 0.2539 (0.2637) data time 0.0007 (0.0037) model time 0.2532 (0.2619) loss 7.0467 (6.0303) grad_norm 2.9543 (2.0706) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:28:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][210/625] eta 0:01:49 lr 0.000918 wd 0.0500 time 0.2540 (0.2633) data time 0.0009 (0.0036) model time 0.2532 (0.2615) loss 5.0146 (6.0146) grad_norm 1.7232 (2.0740) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:28:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][220/625] eta 0:01:46 lr 0.000917 wd 0.0500 time 0.2520 (0.2630) data time 0.0010 (0.0035) model time 0.2510 (0.2610) loss 5.5376 (6.0137) grad_norm 2.1305 (2.0872) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:28:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][230/625] eta 0:01:43 lr 0.000917 wd 0.0500 time 0.2580 (0.2627) data time 0.0008 (0.0034) model time 0.2572 (0.2607) loss 6.0749 (6.0243) grad_norm 4.0443 (2.1036) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:28:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][240/625] eta 0:01:41 lr 0.000917 wd 0.0500 time 0.2587 (0.2625) data time 0.0009 (0.0033) model time 0.2578 (0.2605) loss 5.5812 (6.0360) grad_norm 1.4812 (2.0880) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:28:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][250/625] eta 0:01:38 lr 0.000917 wd 0.0500 time 0.2499 (0.2623) data time 0.0008 (0.0032) 
model time 0.2491 (0.2603) loss 5.7327 (6.0356) grad_norm 1.5861 (2.0667) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:28:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][260/625] eta 0:01:35 lr 0.000917 wd 0.0500 time 0.2569 (0.2620) data time 0.0007 (0.0031) model time 0.2563 (0.2601) loss 4.9035 (6.0314) grad_norm 2.5862 (2.0613) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:28:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][270/625] eta 0:01:32 lr 0.000916 wd 0.0500 time 0.2568 (0.2618) data time 0.0006 (0.0030) model time 0.2562 (0.2599) loss 5.3218 (6.0334) grad_norm 1.5708 (2.0549) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:28:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][280/625] eta 0:01:30 lr 0.000916 wd 0.0500 time 0.2594 (0.2616) data time 0.0006 (0.0029) model time 0.2587 (0.2597) loss 5.7278 (6.0310) grad_norm 1.7558 (2.0499) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:28:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][290/625] eta 0:01:27 lr 0.000916 wd 0.0500 time 0.2580 (0.2615) data time 0.0009 (0.0029) model time 0.2571 (0.2596) loss 5.5780 (6.0273) grad_norm 1.3425 (2.0433) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:28:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][300/625] eta 0:01:24 lr 0.000916 wd 0.0500 time 0.2560 (0.2613) data time 0.0007 (0.0028) model time 0.2553 (0.2594) loss 7.2301 (6.0446) grad_norm 1.6160 (2.0390) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:28:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][310/625] eta 0:01:22 lr 0.000916 wd 0.0500 time 0.2549 (0.2612) data time 0.0008 (0.0027) model time 0.2541 (0.2593) loss 6.1253 (6.0463) grad_norm 3.6231 (2.0456) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:28:38 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [168/300][320/625] eta 0:01:19 lr 0.000916 wd 0.0500 time 0.2564 (0.2611) data time 0.0008 (0.0027) model time 0.2556 (0.2592) loss 6.7902 (6.0460) grad_norm 1.7105 (2.0701) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:28:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][330/625] eta 0:01:17 lr 0.000915 wd 0.0500 time 0.2579 (0.2614) data time 0.0009 (0.0026) model time 0.2569 (0.2596) loss 6.4317 (6.0378) grad_norm 1.4207 (2.0745) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:28:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][340/625] eta 0:01:14 lr 0.000915 wd 0.0500 time 0.2601 (0.2613) data time 0.0007 (0.0026) model time 0.2594 (0.2595) loss 6.1770 (6.0380) grad_norm 3.5391 (2.0859) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:28:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][350/625] eta 0:01:11 lr 0.000915 wd 0.0500 time 0.2528 (0.2611) data time 0.0008 (0.0025) model time 0.2520 (0.2593) loss 4.5483 (6.0279) grad_norm 1.9661 (2.0961) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:28:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][360/625] eta 0:01:09 lr 0.000915 wd 0.0500 time 0.2558 (0.2609) data time 0.0006 (0.0025) model time 0.2553 (0.2591) loss 5.1167 (6.0372) grad_norm 2.1865 (2.1164) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:28:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][370/625] eta 0:01:06 lr 0.000915 wd 0.0500 time 0.2559 (0.2614) data time 0.0006 (0.0024) model time 0.2553 (0.2597) loss 4.7766 (6.0319) grad_norm 1.7072 (2.1267) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:28:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][380/625] eta 0:01:04 lr 0.000915 wd 0.0500 time 0.2599 (0.2613) data time 0.0007 (0.0024) model time 0.2593 (0.2596) 
loss 5.8343 (6.0213) grad_norm 1.9295 (2.1243) loss_scale 1024.0000 (522.7507) mem 9655MB [2024-08-04 05:28:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][390/625] eta 0:01:01 lr 0.000914 wd 0.0500 time 0.2580 (0.2611) data time 0.0008 (0.0024) model time 0.2572 (0.2594) loss 5.9442 (6.0208) grad_norm 2.0155 (2.1151) loss_scale 1024.0000 (535.5703) mem 9655MB [2024-08-04 05:28:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][400/625] eta 0:00:58 lr 0.000914 wd 0.0500 time 0.2529 (0.2610) data time 0.0007 (0.0023) model time 0.2523 (0.2593) loss 5.1826 (6.0178) grad_norm 4.8370 (2.1235) loss_scale 1024.0000 (547.7506) mem 9655MB [2024-08-04 05:29:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][410/625] eta 0:00:56 lr 0.000914 wd 0.0500 time 0.2554 (0.2613) data time 0.0007 (0.0023) model time 0.2547 (0.2597) loss 7.1194 (6.0302) grad_norm 1.9023 (2.1323) loss_scale 1024.0000 (559.3382) mem 9655MB [2024-08-04 05:29:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][420/625] eta 0:00:53 lr 0.000914 wd 0.0500 time 0.2551 (0.2613) data time 0.0007 (0.0023) model time 0.2544 (0.2596) loss 6.1819 (6.0325) grad_norm 1.9686 (2.1316) loss_scale 1024.0000 (570.3753) mem 9655MB [2024-08-04 05:29:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][430/625] eta 0:00:50 lr 0.000914 wd 0.0500 time 0.2587 (0.2611) data time 0.0007 (0.0022) model time 0.2580 (0.2595) loss 6.2502 (6.0339) grad_norm 1.6093 (2.1193) loss_scale 1024.0000 (580.9002) mem 9655MB [2024-08-04 05:29:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][440/625] eta 0:00:48 lr 0.000913 wd 0.0500 time 0.2563 (0.2610) data time 0.0011 (0.0022) model time 0.2551 (0.2594) loss 6.5101 (6.0379) grad_norm 2.3033 (2.1158) loss_scale 1024.0000 (590.9478) mem 9655MB [2024-08-04 05:29:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 
367): INFO Train: [168/300][450/625] eta 0:00:45 lr 0.000913 wd 0.0500 time 0.2506 (0.2613) data time 0.0008 (0.0022) model time 0.2497 (0.2598) loss 6.5236 (6.0342) grad_norm 1.7607 (2.1101) loss_scale 1024.0000 (600.5499) mem 9655MB [2024-08-04 05:29:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][460/625] eta 0:00:43 lr 0.000913 wd 0.0500 time 0.2554 (0.2612) data time 0.0010 (0.0021) model time 0.2544 (0.2596) loss 6.5129 (6.0269) grad_norm 1.5742 (2.1038) loss_scale 1024.0000 (609.7354) mem 9655MB [2024-08-04 05:29:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][470/625] eta 0:00:40 lr 0.000913 wd 0.0500 time 0.2570 (0.2611) data time 0.0008 (0.0021) model time 0.2563 (0.2595) loss 5.7038 (6.0171) grad_norm 1.3258 (2.0929) loss_scale 1024.0000 (618.5308) mem 9655MB [2024-08-04 05:29:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][480/625] eta 0:00:37 lr 0.000913 wd 0.0500 time 0.2575 (0.2610) data time 0.0007 (0.0021) model time 0.2568 (0.2594) loss 6.0714 (6.0061) grad_norm 1.8523 (2.0876) loss_scale 1024.0000 (626.9605) mem 9655MB [2024-08-04 05:29:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][490/625] eta 0:00:35 lr 0.000913 wd 0.0500 time 0.2536 (0.2609) data time 0.0008 (0.0021) model time 0.2528 (0.2593) loss 5.2338 (6.0007) grad_norm 3.1964 (2.0820) loss_scale 1024.0000 (635.0468) mem 9655MB [2024-08-04 05:29:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][500/625] eta 0:00:32 lr 0.000912 wd 0.0500 time 0.2563 (0.2608) data time 0.0010 (0.0020) model time 0.2553 (0.2593) loss 4.8880 (5.9969) grad_norm 1.6537 (2.0889) loss_scale 1024.0000 (642.8104) mem 9655MB [2024-08-04 05:29:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][510/625] eta 0:00:29 lr 0.000912 wd 0.0500 time 0.2518 (0.2607) data time 0.0010 (0.0020) model time 0.2508 (0.2592) loss 6.3302 
(5.9999) grad_norm 1.8086 (2.0906) loss_scale 1024.0000 (650.2701) mem 9655MB [2024-08-04 05:29:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][520/625] eta 0:00:27 lr 0.000912 wd 0.0500 time 0.2541 (0.2606) data time 0.0011 (0.0020) model time 0.2531 (0.2591) loss 5.9170 (5.9974) grad_norm 1.4619 (2.0834) loss_scale 1024.0000 (657.4434) mem 9655MB [2024-08-04 05:29:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][530/625] eta 0:00:24 lr 0.000912 wd 0.0500 time 0.2659 (0.2605) data time 0.0009 (0.0020) model time 0.2651 (0.2590) loss 7.1833 (6.0036) grad_norm 1.7287 (2.0783) loss_scale 1024.0000 (664.3465) mem 9655MB [2024-08-04 05:29:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][540/625] eta 0:00:22 lr 0.000912 wd 0.0500 time 0.2565 (0.2605) data time 0.0008 (0.0020) model time 0.2557 (0.2589) loss 4.7115 (6.0057) grad_norm 3.0285 (2.0777) loss_scale 1024.0000 (670.9945) mem 9655MB [2024-08-04 05:29:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][550/625] eta 0:00:19 lr 0.000912 wd 0.0500 time 0.2571 (0.2604) data time 0.0009 (0.0019) model time 0.2561 (0.2589) loss 5.9133 (6.0089) grad_norm 3.2408 (2.1015) loss_scale 1024.0000 (677.4011) mem 9655MB [2024-08-04 05:29:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][560/625] eta 0:00:16 lr 0.000911 wd 0.0500 time 0.2577 (0.2607) data time 0.0007 (0.0019) model time 0.2570 (0.2592) loss 5.5750 (6.0028) grad_norm 2.9221 (2.1209) loss_scale 1024.0000 (683.5793) mem 9655MB [2024-08-04 05:29:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][570/625] eta 0:00:14 lr 0.000911 wd 0.0500 time 0.2553 (0.2607) data time 0.0008 (0.0019) model time 0.2545 (0.2592) loss 5.9988 (6.0053) grad_norm 3.0289 (2.1307) loss_scale 1024.0000 (689.5412) mem 9655MB [2024-08-04 05:29:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO 
Train: [168/300][580/625] eta 0:00:11 lr 0.000911 wd 0.0500 time 0.2550 (0.2606) data time 0.0008 (0.0019) model time 0.2541 (0.2591) loss 6.8462 (6.0018) grad_norm 2.1249 (2.1406) loss_scale 1024.0000 (695.2978) mem 9655MB [2024-08-04 05:29:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][590/625] eta 0:00:09 lr 0.000911 wd 0.0500 time 0.2548 (0.2605) data time 0.0011 (0.0019) model time 0.2536 (0.2591) loss 5.1284 (6.0091) grad_norm 1.8395 (2.1431) loss_scale 1024.0000 (700.8596) mem 9655MB [2024-08-04 05:29:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][600/625] eta 0:00:06 lr 0.000911 wd 0.0500 time 0.2557 (0.2604) data time 0.0010 (0.0019) model time 0.2547 (0.2590) loss 5.9594 (6.0074) grad_norm 1.7418 (2.1411) loss_scale 1024.0000 (706.2363) mem 9655MB [2024-08-04 05:29:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][610/625] eta 0:00:03 lr 0.000910 wd 0.0500 time 0.2528 (0.2604) data time 0.0006 (0.0018) model time 0.2522 (0.2589) loss 5.6223 (6.0021) grad_norm 1.5779 (2.1302) loss_scale 1024.0000 (711.4370) mem 9655MB [2024-08-04 05:29:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][620/625] eta 0:00:01 lr 0.000910 wd 0.0500 time 0.2559 (0.2603) data time 0.0005 (0.0018) model time 0.2554 (0.2588) loss 5.4593 (6.0015) grad_norm 1.7204 (2.1254) loss_scale 1024.0000 (716.4702) mem 9655MB [2024-08-04 05:29:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 168 training takes 0:02:42 [2024-08-04 05:29:57 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 05:29:57 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 05:29:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.621 (0.621) Loss 0.6465 (0.6465) Acc@1 88.330 (88.330) Acc@5 98.340 (98.340) Mem 9655MB [2024-08-04 05:29:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.108) Loss 0.9854 (0.7824) Acc@1 79.883 (84.708) Acc@5 95.020 (97.159) Mem 9655MB [2024-08-04 05:29:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.083) Loss 1.1221 (0.9162) Acc@1 74.219 (81.006) Acc@5 93.164 (95.594) Mem 9655MB [2024-08-04 05:29:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.700 Acc@5 95.581 [2024-08-04 05:29:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.7% [2024-08-04 05:30:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.772 (0.772) Loss 0.5884 (0.5884) Acc@1 89.648 (89.648) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 05:30:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.126) Loss 0.9341 (0.7269) Acc@1 79.785 (85.605) Acc@5 95.557 (97.532) Mem 9655MB [2024-08-04 05:30:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0645 (0.8555) Acc@1 75.391 (82.138) Acc@5 94.531 (96.157) Mem 9655MB [2024-08-04 05:30:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.858 Acc@5 96.139 [2024-08-04 05:30:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.9% [2024-08-04 05:30:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.86% [2024-08-04 05:30:01 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 05:30:02 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 05:30:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][0/625] eta 0:07:38 lr 0.000910 wd 0.0500 time 0.7343 (0.7343) data time 0.4862 (0.4862) model time 0.0000 (0.0000) loss 4.6299 (4.6299) grad_norm 1.1955 (1.1955) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:30:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][10/625] eta 0:03:03 lr 0.000910 wd 0.0500 time 0.2538 (0.2981) data time 0.0007 (0.0451) model time 0.0000 (0.0000) loss 5.0624 (6.0355) grad_norm 1.3943 (2.1533) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:30:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][20/625] eta 0:02:48 lr 0.000910 wd 0.0500 time 0.2574 (0.2780) data time 0.0009 (0.0240) model time 0.0000 (0.0000) loss 5.2007 (6.0423) grad_norm 2.0088 (2.2793) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:30:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][30/625] eta 0:02:45 lr 0.000910 wd 0.0500 time 0.2605 (0.2778) data time 0.0005 (0.0165) model time 0.0000 (0.0000) loss 5.0519 (5.9387) grad_norm 2.8820 (2.2538) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:30:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][40/625] eta 0:02:42 lr 0.000909 wd 0.0500 time 0.2544 (0.2772) data time 0.0008 (0.0127) model time 0.0000 (0.0000) loss 6.7202 (5.8291) grad_norm 1.7363 (2.2139) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:30:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][50/625] eta 0:02:37 lr 0.000909 wd 0.0500 time 0.2562 (0.2731) data time 0.0011 (0.0104) model time 0.0000 (0.0000) loss 5.8526 (5.8363) grad_norm 1.4023 (2.1404) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 05:30:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][60/625] eta 0:02:32 lr 0.000909 wd 0.0500 time 0.2574 (0.2705) data time 0.0011 (0.0089) model time 0.2563 (0.2560) loss 5.1036 (5.8810) grad_norm 2.5296 (2.0671) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:30:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][70/625] eta 0:02:29 lr 0.000909 wd 0.0500 time 0.2642 (0.2686) data time 0.0006 (0.0078) model time 0.2637 (0.2561) loss 6.5730 (5.8884) grad_norm 1.1290 (2.0194) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:30:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][80/625] eta 0:02:25 lr 0.000909 wd 0.0500 time 0.2586 (0.2671) data time 0.0008 (0.0069) model time 0.2578 (0.2560) loss 7.1249 (5.9181) grad_norm 2.1235 (2.0937) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:30:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][90/625] eta 0:02:22 lr 0.000909 wd 0.0500 time 0.2560 (0.2661) data time 0.0010 (0.0063) model time 0.2550 (0.2562) loss 6.2492 (5.9064) grad_norm 1.4840 (2.0540) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:30:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][100/625] eta 0:02:19 lr 0.000908 wd 0.0500 time 0.2573 (0.2651) data time 0.0008 (0.0057) model time 0.2565 (0.2561) loss 4.3006 (5.9182) grad_norm 1.4301 (2.0287) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:30:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][110/625] eta 0:02:16 lr 0.000908 wd 0.0500 time 0.2682 (0.2644) data time 0.0006 (0.0053) model time 0.2676 (0.2561) loss 6.4568 (5.9456) grad_norm 2.2036 (2.0385) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:30:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][120/625] eta 0:02:13 lr 0.000908 wd 0.0500 time 0.2571 
(0.2638) data time 0.0007 (0.0050) model time 0.2564 (0.2561) loss 6.6520 (5.9612) grad_norm 1.6417 (2.0520) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:30:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][130/625] eta 0:02:10 lr 0.000908 wd 0.0500 time 0.2561 (0.2632) data time 0.0008 (0.0047) model time 0.2553 (0.2559) loss 5.4206 (5.9489) grad_norm 1.9452 (2.0654) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:30:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][140/625] eta 0:02:07 lr 0.000908 wd 0.0500 time 0.2561 (0.2627) data time 0.0006 (0.0044) model time 0.2555 (0.2559) loss 5.6331 (5.9595) grad_norm 1.5418 (2.0721) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:30:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][150/625] eta 0:02:04 lr 0.000908 wd 0.0500 time 0.2598 (0.2623) data time 0.0008 (0.0042) model time 0.2591 (0.2559) loss 6.4887 (5.9667) grad_norm 2.1199 (2.0962) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:30:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][160/625] eta 0:02:01 lr 0.000907 wd 0.0500 time 0.2548 (0.2619) data time 0.0009 (0.0040) model time 0.2539 (0.2557) loss 4.6962 (5.9551) grad_norm 1.7311 (2.1174) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:30:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][170/625] eta 0:01:59 lr 0.000907 wd 0.0500 time 0.2547 (0.2615) data time 0.0006 (0.0038) model time 0.2542 (0.2557) loss 6.6281 (5.9755) grad_norm 3.2729 (2.1247) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:30:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][180/625] eta 0:01:56 lr 0.000907 wd 0.0500 time 0.2518 (0.2620) data time 0.0007 (0.0036) model time 0.2511 (0.2567) loss 4.8815 (5.9846) grad_norm 2.6599 (2.1186) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 05:30:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][190/625] eta 0:01:54 lr 0.000907 wd 0.0500 time 0.2567 (0.2625) data time 0.0009 (0.0035) model time 0.2558 (0.2576) loss 4.9655 (5.9732) grad_norm 2.2577 (2.1419) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:30:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][200/625] eta 0:01:51 lr 0.000907 wd 0.0500 time 0.2527 (0.2629) data time 0.0007 (0.0034) model time 0.2519 (0.2585) loss 5.0458 (5.9726) grad_norm 2.4943 (2.1860) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:30:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][210/625] eta 0:01:48 lr 0.000906 wd 0.0500 time 0.2535 (0.2626) data time 0.0009 (0.0033) model time 0.2526 (0.2583) loss 5.5170 (5.9567) grad_norm 1.3929 (2.1757) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:31:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][220/625] eta 0:01:46 lr 0.000906 wd 0.0500 time 0.2562 (0.2632) data time 0.0008 (0.0032) model time 0.2553 (0.2593) loss 6.0786 (5.9387) grad_norm 1.4232 (2.1617) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:31:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][230/625] eta 0:01:43 lr 0.000906 wd 0.0500 time 0.2573 (0.2629) data time 0.0008 (0.0031) model time 0.2566 (0.2590) loss 6.8529 (5.9404) grad_norm 1.8055 (2.1414) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:31:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][240/625] eta 0:01:41 lr 0.000906 wd 0.0500 time 0.2557 (0.2626) data time 0.0007 (0.0030) model time 0.2550 (0.2588) loss 6.0099 (5.9310) grad_norm 1.6642 (2.1352) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:31:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][250/625] eta 0:01:38 lr 0.000906 wd 0.0500 time 0.2537 
(0.2623) data time 0.0009 (0.0029) model time 0.2528 (0.2586) loss 4.7710 (5.9339) grad_norm 2.4474 (2.1419) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:31:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][260/625] eta 0:01:35 lr 0.000906 wd 0.0500 time 0.2534 (0.2621) data time 0.0009 (0.0028) model time 0.2525 (0.2585) loss 5.9973 (5.9480) grad_norm 1.5562 (2.1305) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:31:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][270/625] eta 0:01:32 lr 0.000905 wd 0.0500 time 0.2540 (0.2618) data time 0.0011 (0.0027) model time 0.2530 (0.2583) loss 5.2196 (5.9460) grad_norm 1.5178 (2.1182) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:31:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][280/625] eta 0:01:30 lr 0.000905 wd 0.0500 time 0.2533 (0.2623) data time 0.0009 (0.0027) model time 0.2524 (0.2590) loss 5.9988 (5.9460) grad_norm 1.1607 (2.1046) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:31:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][290/625] eta 0:01:28 lr 0.000905 wd 0.0500 time 0.2571 (0.2635) data time 0.0011 (0.0026) model time 0.2559 (0.2605) loss 5.1540 (5.9398) grad_norm 1.5872 (2.0859) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:31:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][300/625] eta 0:01:25 lr 0.000905 wd 0.0500 time 0.2529 (0.2632) data time 0.0008 (0.0026) model time 0.2520 (0.2602) loss 6.9797 (5.9431) grad_norm 2.4494 (2.1017) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:31:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][310/625] eta 0:01:22 lr 0.000905 wd 0.0500 time 0.2617 (0.2630) data time 0.0009 (0.0025) model time 0.2608 (0.2600) loss 5.3193 (5.9429) grad_norm 2.1397 (2.1056) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 05:31:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][320/625] eta 0:01:20 lr 0.000905 wd 0.0500 time 0.2556 (0.2628) data time 0.0011 (0.0025) model time 0.2546 (0.2599) loss 5.6665 (5.9445) grad_norm 3.5427 (2.1235) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:31:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][330/625] eta 0:01:17 lr 0.000904 wd 0.0500 time 0.2532 (0.2626) data time 0.0008 (0.0024) model time 0.2524 (0.2597) loss 6.2466 (5.9382) grad_norm 1.8499 (2.1406) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:31:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][340/625] eta 0:01:14 lr 0.000904 wd 0.0500 time 0.2552 (0.2624) data time 0.0008 (0.0024) model time 0.2545 (0.2596) loss 5.7686 (5.9446) grad_norm 1.9706 (2.1386) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:31:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][350/625] eta 0:01:12 lr 0.000904 wd 0.0500 time 0.2577 (0.2622) data time 0.0008 (0.0023) model time 0.2568 (0.2594) loss 6.6681 (5.9450) grad_norm 2.9157 (2.1959) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:31:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][360/625] eta 0:01:09 lr 0.000904 wd 0.0500 time 0.2573 (0.2621) data time 0.0009 (0.0023) model time 0.2565 (0.2593) loss 5.8379 (5.9463) grad_norm 2.0015 (2.2096) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:31:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][370/625] eta 0:01:06 lr 0.000904 wd 0.0500 time 0.2551 (0.2619) data time 0.0007 (0.0023) model time 0.2544 (0.2592) loss 5.2558 (5.9531) grad_norm 3.2612 (2.2056) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:31:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][380/625] eta 0:01:04 lr 0.000903 wd 0.0500 time 0.2568 
(0.2617) data time 0.0007 (0.0022) model time 0.2561 (0.2590) loss 6.8322 (5.9653) grad_norm 2.2524 (2.2058) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:31:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][390/625] eta 0:01:01 lr 0.000903 wd 0.0500 time 0.2566 (0.2616) data time 0.0010 (0.0022) model time 0.2556 (0.2589) loss 5.6008 (5.9624) grad_norm 1.6494 (2.2043) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:31:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][400/625] eta 0:00:58 lr 0.000903 wd 0.0500 time 0.2674 (0.2615) data time 0.0008 (0.0022) model time 0.2666 (0.2589) loss 6.4417 (5.9734) grad_norm 2.0101 (2.2224) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:31:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][410/625] eta 0:00:56 lr 0.000903 wd 0.0500 time 0.2548 (0.2614) data time 0.0010 (0.0021) model time 0.2538 (0.2588) loss 6.0817 (5.9678) grad_norm 1.9281 (2.2263) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:31:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][420/625] eta 0:00:53 lr 0.000903 wd 0.0500 time 0.2678 (0.2613) data time 0.0006 (0.0021) model time 0.2671 (0.2587) loss 6.0065 (5.9652) grad_norm 1.7807 (2.2129) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:31:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][430/625] eta 0:00:50 lr 0.000903 wd 0.0500 time 0.2603 (0.2615) data time 0.0006 (0.0021) model time 0.2596 (0.2590) loss 5.7164 (5.9677) grad_norm 2.9504 (2.2164) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:31:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][440/625] eta 0:00:48 lr 0.000902 wd 0.0500 time 0.2620 (0.2614) data time 0.0008 (0.0020) model time 0.2611 (0.2589) loss 5.9739 (5.9680) grad_norm 2.6906 (2.2119) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 05:32:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][450/625] eta 0:00:45 lr 0.000902 wd 0.0500 time 0.3752 (0.2619) data time 0.0007 (0.0020) model time 0.3745 (0.2596) loss 6.4568 (5.9571) grad_norm 1.7870 (2.2009) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:32:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][460/625] eta 0:00:43 lr 0.000902 wd 0.0500 time 0.2573 (0.2619) data time 0.0006 (0.0020) model time 0.2567 (0.2595) loss 5.3785 (5.9577) grad_norm 1.5516 (2.1886) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:32:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][470/625] eta 0:00:40 lr 0.000902 wd 0.0500 time 0.2515 (0.2617) data time 0.0010 (0.0020) model time 0.2505 (0.2594) loss 6.1195 (5.9627) grad_norm 1.6794 (2.1839) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:32:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][480/625] eta 0:00:37 lr 0.000902 wd 0.0500 time 0.2551 (0.2616) data time 0.0009 (0.0020) model time 0.2542 (0.2593) loss 5.3343 (5.9639) grad_norm 1.4939 (2.1822) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:32:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][490/625] eta 0:00:35 lr 0.000902 wd 0.0500 time 0.2577 (0.2615) data time 0.0008 (0.0019) model time 0.2570 (0.2592) loss 6.4149 (5.9683) grad_norm 1.9985 (2.1801) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:32:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][500/625] eta 0:00:32 lr 0.000901 wd 0.0500 time 0.2621 (0.2614) data time 0.0008 (0.0019) model time 0.2613 (0.2591) loss 6.5381 (5.9691) grad_norm 3.0598 (2.1848) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:32:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][510/625] eta 0:00:30 lr 0.000901 wd 0.0500 time 0.2584 
(0.2613) data time 0.0006 (0.0019) model time 0.2578 (0.2590) loss 5.0699 (5.9674) grad_norm 1.7241 (2.1818) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:32:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][520/625] eta 0:00:27 lr 0.000901 wd 0.0500 time 0.2597 (0.2612) data time 0.0008 (0.0019) model time 0.2590 (0.2590) loss 6.3998 (5.9679) grad_norm 2.0753 (2.1961) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:32:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][530/625] eta 0:00:24 lr 0.000901 wd 0.0500 time 0.2500 (0.2611) data time 0.0009 (0.0019) model time 0.2491 (0.2589) loss 6.7793 (5.9721) grad_norm 3.2190 (2.2093) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:32:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][540/625] eta 0:00:22 lr 0.000901 wd 0.0500 time 0.2548 (0.2610) data time 0.0007 (0.0018) model time 0.2542 (0.2588) loss 5.4456 (5.9702) grad_norm 2.5342 (2.2073) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:32:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][550/625] eta 0:00:19 lr 0.000900 wd 0.0500 time 0.2550 (0.2609) data time 0.0008 (0.0018) model time 0.2542 (0.2587) loss 6.7738 (5.9690) grad_norm 2.0969 (2.2080) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:32:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][560/625] eta 0:00:16 lr 0.000900 wd 0.0500 time 0.2572 (0.2608) data time 0.0008 (0.0018) model time 0.2564 (0.2587) loss 6.6325 (5.9647) grad_norm 2.1287 (2.2030) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:32:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][570/625] eta 0:00:14 lr 0.000900 wd 0.0500 time 0.2565 (0.2607) data time 0.0007 (0.0018) model time 0.2558 (0.2586) loss 4.9848 (5.9627) grad_norm 1.7475 (2.1938) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 05:32:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][580/625] eta 0:00:11 lr 0.000900 wd 0.0500 time 0.2553 (0.2607) data time 0.0009 (0.0018) model time 0.2544 (0.2585) loss 6.9038 (5.9579) grad_norm 2.6788 (2.1880) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:32:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][590/625] eta 0:00:09 lr 0.000900 wd 0.0500 time 0.2565 (0.2606) data time 0.0009 (0.0018) model time 0.2556 (0.2585) loss 6.9953 (5.9579) grad_norm 1.4416 (2.1919) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:32:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][600/625] eta 0:00:06 lr 0.000900 wd 0.0500 time 0.2559 (0.2605) data time 0.0010 (0.0017) model time 0.2550 (0.2584) loss 5.1123 (5.9580) grad_norm 2.2067 (2.1890) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:32:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][610/625] eta 0:00:03 lr 0.000899 wd 0.0500 time 0.2544 (0.2605) data time 0.0003 (0.0017) model time 0.2540 (0.2583) loss 6.2347 (5.9609) grad_norm 1.3572 (2.1835) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:32:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][620/625] eta 0:00:01 lr 0.000899 wd 0.0500 time 0.2532 (0.2603) data time 0.0005 (0.0017) model time 0.2527 (0.2583) loss 6.5243 (5.9680) grad_norm 3.1908 (2.1838) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:32:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 169 training takes 0:02:42 [2024-08-04 05:32:44 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 05:32:45 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 05:32:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.521 (0.521) Loss 0.6587 (0.6587) Acc@1 88.574 (88.574) Acc@5 98.486 (98.486) Mem 9655MB [2024-08-04 05:32:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.099) Loss 1.0391 (0.8081) Acc@1 78.516 (84.490) Acc@5 95.020 (97.186) Mem 9655MB [2024-08-04 05:32:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.1387 (0.9412) Acc@1 76.074 (81.024) Acc@5 93.848 (95.631) Mem 9655MB [2024-08-04 05:32:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.744 Acc@5 95.613 [2024-08-04 05:32:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.7% [2024-08-04 05:32:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.74% [2024-08-04 05:32:47 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 05:32:47 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 05:32:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.669 (0.669) Loss 0.5889 (0.5889) Acc@1 89.648 (89.648) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 05:32:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.111) Loss 0.9331 (0.7264) Acc@1 79.736 (85.627) Acc@5 95.557 (97.523) Mem 9655MB [2024-08-04 05:32:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.085) Loss 1.0635 (0.8553) Acc@1 75.488 (82.182) Acc@5 94.482 (96.164) Mem 9655MB [2024-08-04 05:32:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.890 Acc@5 96.149 [2024-08-04 05:32:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.9% [2024-08-04 05:32:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.89% [2024-08-04 05:32:49 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 05:32:50 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 05:32:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][0/625] eta 0:07:22 lr 0.000899 wd 0.0500 time 0.7072 (0.7072) data time 0.4598 (0.4598) model time 0.0000 (0.0000) loss 5.5294 (5.5294) grad_norm 1.4146 (1.4146) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:32:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][10/625] eta 0:03:12 lr 0.000899 wd 0.0500 time 0.2557 (0.3126) data time 0.0008 (0.0426) model time 0.0000 (0.0000) loss 6.2130 (6.2422) grad_norm 1.4324 (2.4983) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:32:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][20/625] eta 0:02:52 lr 0.000899 wd 0.0500 time 0.2600 (0.2859) data time 0.0008 (0.0227) model time 0.0000 (0.0000) loss 4.9827 (6.0565) grad_norm 1.5227 (2.3576) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:32:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][30/625] eta 0:02:44 lr 0.000899 wd 0.0500 time 0.2560 (0.2763) data time 0.0007 (0.0157) model time 0.0000 (0.0000) loss 6.6237 (5.9266) grad_norm 2.8932 (2.2631) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:33:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][40/625] eta 0:02:38 lr 0.000898 wd 0.0500 time 0.2592 (0.2714) data time 0.0006 (0.0121) model time 0.0000 (0.0000) loss 6.7411 (5.9449) grad_norm 3.8356 (2.2560) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:33:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][50/625] eta 0:02:34 lr 0.000898 wd 0.0500 time 0.2539 (0.2689) data time 0.0009 (0.0099) model time 0.0000 (0.0000) loss 5.4882 (5.9642) grad_norm 1.6029 (2.1428) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:33:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][60/625] eta 0:02:30 lr 0.000898 wd 0.0500 time 0.2573 (0.2667) 
data time 0.0008 (0.0084) model time 0.2565 (0.2546) loss 4.9665 (5.9379) grad_norm 1.5788 (2.0587) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:33:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][70/625] eta 0:02:27 lr 0.000898 wd 0.0500 time 0.2517 (0.2651) data time 0.0010 (0.0074) model time 0.2507 (0.2545) loss 5.2418 (5.9685) grad_norm 2.5225 (2.0123) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:33:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][80/625] eta 0:02:23 lr 0.000898 wd 0.0500 time 0.2642 (0.2641) data time 0.0008 (0.0066) model time 0.2634 (0.2552) loss 5.2582 (5.9658) grad_norm 1.6863 (1.9801) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:33:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][90/625] eta 0:02:20 lr 0.000898 wd 0.0500 time 0.2548 (0.2632) data time 0.0008 (0.0059) model time 0.2540 (0.2550) loss 7.0821 (5.9786) grad_norm 2.5288 (2.0187) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:33:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][100/625] eta 0:02:18 lr 0.000897 wd 0.0500 time 0.4577 (0.2644) data time 0.0010 (0.0054) model time 0.4567 (0.2589) loss 6.8178 (5.9763) grad_norm 1.7172 (1.9943) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:33:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][110/625] eta 0:02:16 lr 0.000897 wd 0.0500 time 0.2564 (0.2654) data time 0.0006 (0.0050) model time 0.2558 (0.2616) loss 6.1963 (5.9809) grad_norm 1.1230 (1.9910) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:33:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][120/625] eta 0:02:13 lr 0.000897 wd 0.0500 time 0.2551 (0.2646) data time 0.0006 (0.0047) model time 0.2544 (0.2606) loss 5.0323 (5.9771) grad_norm 1.4655 (2.0094) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 
05:33:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][130/625] eta 0:02:10 lr 0.000897 wd 0.0500 time 0.2544 (0.2639) data time 0.0010 (0.0044) model time 0.2534 (0.2598) loss 5.8353 (5.9817) grad_norm 1.4882 (1.9943) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:33:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][140/625] eta 0:02:08 lr 0.000897 wd 0.0500 time 0.2568 (0.2647) data time 0.0009 (0.0042) model time 0.2559 (0.2614) loss 5.4966 (5.9854) grad_norm 2.9476 (2.0934) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:33:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][150/625] eta 0:02:05 lr 0.000897 wd 0.0500 time 0.2584 (0.2641) data time 0.0006 (0.0040) model time 0.2578 (0.2608) loss 5.5638 (6.0026) grad_norm 2.7667 (2.1198) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:33:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][160/625] eta 0:02:02 lr 0.000896 wd 0.0500 time 0.2565 (0.2636) data time 0.0008 (0.0038) model time 0.2557 (0.2603) loss 5.7983 (5.9961) grad_norm 2.1176 (2.1312) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:33:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][170/625] eta 0:01:59 lr 0.000896 wd 0.0500 time 0.2563 (0.2631) data time 0.0009 (0.0036) model time 0.2554 (0.2597) loss 5.2434 (5.9852) grad_norm 1.7098 (2.1434) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:33:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][180/625] eta 0:01:56 lr 0.000896 wd 0.0500 time 0.2579 (0.2627) data time 0.0010 (0.0035) model time 0.2569 (0.2593) loss 5.7632 (5.9868) grad_norm 1.1303 (2.1351) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:33:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][190/625] eta 0:01:54 lr 0.000896 wd 0.0500 time 0.2537 (0.2623) data 
time 0.0008 (0.0033) model time 0.2528 (0.2590) loss 4.7112 (5.9698) grad_norm 3.7372 (2.1430) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:33:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][200/625] eta 0:01:51 lr 0.000896 wd 0.0500 time 0.2607 (0.2620) data time 0.0007 (0.0032) model time 0.2600 (0.2588) loss 6.4459 (5.9701) grad_norm 1.2730 (2.1491) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:33:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][210/625] eta 0:01:48 lr 0.000895 wd 0.0500 time 0.2579 (0.2618) data time 0.0006 (0.0031) model time 0.2573 (0.2585) loss 6.3098 (5.9667) grad_norm 2.4363 (2.1371) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:33:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][220/625] eta 0:01:45 lr 0.000895 wd 0.0500 time 0.2582 (0.2615) data time 0.0005 (0.0030) model time 0.2577 (0.2583) loss 7.3801 (5.9586) grad_norm 1.4650 (2.1350) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:33:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][230/625] eta 0:01:43 lr 0.000895 wd 0.0500 time 0.2564 (0.2613) data time 0.0011 (0.0029) model time 0.2553 (0.2582) loss 5.8092 (5.9569) grad_norm 2.2701 (2.1447) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:33:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][240/625] eta 0:01:40 lr 0.000895 wd 0.0500 time 0.2564 (0.2611) data time 0.0009 (0.0028) model time 0.2554 (0.2580) loss 6.3644 (5.9560) grad_norm 2.1273 (2.1370) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:33:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][250/625] eta 0:01:37 lr 0.000895 wd 0.0500 time 0.2552 (0.2609) data time 0.0009 (0.0028) model time 0.2543 (0.2579) loss 6.7176 (5.9711) grad_norm 3.1625 (2.1369) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 
05:33:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][260/625] eta 0:01:35 lr 0.000895 wd 0.0500 time 0.2583 (0.2615) data time 0.0006 (0.0027) model time 0.2577 (0.2588) loss 6.8745 (5.9799) grad_norm 2.6095 (2.1603) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:34:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][270/625] eta 0:01:32 lr 0.000894 wd 0.0500 time 0.2583 (0.2613) data time 0.0008 (0.0026) model time 0.2574 (0.2586) loss 6.5253 (5.9657) grad_norm 1.5224 (2.1862) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:34:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][280/625] eta 0:01:30 lr 0.000894 wd 0.0500 time 0.2559 (0.2612) data time 0.0010 (0.0026) model time 0.2549 (0.2585) loss 6.1886 (5.9634) grad_norm 1.8283 (2.1994) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:34:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][290/625] eta 0:01:27 lr 0.000894 wd 0.0500 time 0.2610 (0.2610) data time 0.0008 (0.0025) model time 0.2602 (0.2583) loss 4.8982 (5.9558) grad_norm 2.3995 (2.2079) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:34:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][300/625] eta 0:01:24 lr 0.000894 wd 0.0500 time 0.2646 (0.2608) data time 0.0008 (0.0025) model time 0.2638 (0.2582) loss 4.9882 (5.9610) grad_norm 1.7854 (2.2019) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:34:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][310/625] eta 0:01:22 lr 0.000894 wd 0.0500 time 0.2557 (0.2606) data time 0.0006 (0.0024) model time 0.2551 (0.2580) loss 7.0742 (5.9603) grad_norm 1.4761 (2.1957) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:34:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][320/625] eta 0:01:19 lr 0.000894 wd 0.0500 time 0.2527 (0.2605) data 
time 0.0019 (0.0024) model time 0.2509 (0.2579) loss 6.4958 (5.9532) grad_norm 2.1445 (2.1973) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:34:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][330/625] eta 0:01:16 lr 0.000893 wd 0.0500 time 0.2556 (0.2603) data time 0.0008 (0.0023) model time 0.2548 (0.2578) loss 6.7122 (5.9496) grad_norm 2.9167 (2.1862) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:34:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][340/625] eta 0:01:14 lr 0.000893 wd 0.0500 time 0.2573 (0.2602) data time 0.0008 (0.0023) model time 0.2565 (0.2577) loss 5.8881 (5.9464) grad_norm 2.1873 (2.1797) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:34:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][350/625] eta 0:01:11 lr 0.000893 wd 0.0500 time 0.2560 (0.2605) data time 0.0008 (0.0022) model time 0.2552 (0.2581) loss 5.1718 (5.9457) grad_norm 2.2703 (2.1689) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:34:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][360/625] eta 0:01:09 lr 0.000893 wd 0.0500 time 0.2579 (0.2604) data time 0.0010 (0.0022) model time 0.2569 (0.2581) loss 4.8620 (5.9354) grad_norm 2.7409 (2.1824) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:34:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][370/625] eta 0:01:06 lr 0.000893 wd 0.0500 time 0.2566 (0.2603) data time 0.0008 (0.0022) model time 0.2558 (0.2580) loss 6.7690 (5.9455) grad_norm 2.7571 (2.1907) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:34:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][380/625] eta 0:01:03 lr 0.000892 wd 0.0500 time 0.2641 (0.2602) data time 0.0006 (0.0021) model time 0.2635 (0.2579) loss 5.0641 (5.9394) grad_norm 1.9156 (2.1768) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 
05:34:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][390/625] eta 0:01:01 lr 0.000892 wd 0.0500 time 0.2520 (0.2603) data time 0.0008 (0.0021) model time 0.2512 (0.2581) loss 7.0427 (5.9412) grad_norm 2.4050 (2.1684) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:34:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][400/625] eta 0:00:58 lr 0.000892 wd 0.0500 time 0.2580 (0.2602) data time 0.0011 (0.0021) model time 0.2569 (0.2580) loss 6.7086 (5.9445) grad_norm 1.8716 (2.1637) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:34:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][410/625] eta 0:00:55 lr 0.000892 wd 0.0500 time 0.2540 (0.2601) data time 0.0008 (0.0021) model time 0.2532 (0.2579) loss 4.9433 (5.9482) grad_norm 2.9731 (2.1582) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:34:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][420/625] eta 0:00:53 lr 0.000892 wd 0.0500 time 0.2517 (0.2600) data time 0.0011 (0.0020) model time 0.2506 (0.2578) loss 7.1636 (5.9491) grad_norm 3.1611 (2.1551) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:34:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][430/625] eta 0:00:50 lr 0.000892 wd 0.0500 time 0.4318 (0.2603) data time 0.0007 (0.0020) model time 0.4311 (0.2582) loss 4.2785 (5.9488) grad_norm 2.8992 (2.1489) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:34:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][440/625] eta 0:00:48 lr 0.000891 wd 0.0500 time 0.2575 (0.2611) data time 0.0006 (0.0020) model time 0.2569 (0.2592) loss 6.5604 (5.9507) grad_norm 1.3071 (2.1456) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:34:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][450/625] eta 0:00:45 lr 0.000891 wd 0.0500 time 0.2561 (0.2615) data 
time 0.0008 (0.0019) model time 0.2553 (0.2597) loss 5.9794 (5.9490) grad_norm 1.3830 (2.1347) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:34:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][460/625] eta 0:00:43 lr 0.000891 wd 0.0500 time 0.2489 (0.2614) data time 0.0010 (0.0019) model time 0.2479 (0.2595) loss 5.2796 (5.9521) grad_norm 1.7381 (2.1338) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:34:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][470/625] eta 0:00:40 lr 0.000891 wd 0.0500 time 0.2565 (0.2613) data time 0.0011 (0.0019) model time 0.2555 (0.2595) loss 6.3213 (5.9633) grad_norm 2.7971 (2.1285) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:34:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][480/625] eta 0:00:37 lr 0.000891 wd 0.0500 time 0.2541 (0.2612) data time 0.0008 (0.0019) model time 0.2533 (0.2594) loss 7.0781 (5.9621) grad_norm 1.5498 (2.1157) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:34:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][490/625] eta 0:00:35 lr 0.000891 wd 0.0500 time 0.2535 (0.2611) data time 0.0013 (0.0019) model time 0.2523 (0.2593) loss 5.8779 (5.9603) grad_norm 1.6063 (2.1058) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:35:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][500/625] eta 0:00:32 lr 0.000890 wd 0.0500 time 0.2574 (0.2614) data time 0.0007 (0.0018) model time 0.2567 (0.2596) loss 6.7981 (5.9652) grad_norm 4.1426 (2.1021) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:35:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][510/625] eta 0:00:30 lr 0.000890 wd 0.0500 time 0.2537 (0.2613) data time 0.0009 (0.0018) model time 0.2528 (0.2595) loss 6.5594 (5.9633) grad_norm 1.4082 (2.1063) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 
05:35:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][520/625] eta 0:00:27 lr 0.000890 wd 0.0500 time 0.2527 (0.2611) data time 0.0008 (0.0018) model time 0.2519 (0.2594) loss 6.6846 (5.9634) grad_norm 2.8989 (2.1082) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:35:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][530/625] eta 0:00:24 lr 0.000890 wd 0.0500 time 0.2554 (0.2610) data time 0.0008 (0.0018) model time 0.2545 (0.2593) loss 5.1544 (5.9647) grad_norm 2.8595 (2.1175) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:35:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][540/625] eta 0:00:22 lr 0.000890 wd 0.0500 time 0.2564 (0.2609) data time 0.0010 (0.0018) model time 0.2554 (0.2592) loss 5.9388 (5.9689) grad_norm 1.8360 (2.1162) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:35:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][550/625] eta 0:00:19 lr 0.000889 wd 0.0500 time 0.2531 (0.2609) data time 0.0006 (0.0018) model time 0.2525 (0.2591) loss 4.5666 (5.9706) grad_norm 1.3113 (2.1108) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:35:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][560/625] eta 0:00:16 lr 0.000889 wd 0.0500 time 0.2582 (0.2612) data time 0.0008 (0.0018) model time 0.2574 (0.2595) loss 6.1690 (5.9747) grad_norm 2.3478 (2.1084) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:35:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][570/625] eta 0:00:14 lr 0.000889 wd 0.0500 time 0.2580 (0.2611) data time 0.0009 (0.0017) model time 0.2571 (0.2594) loss 5.5733 (5.9665) grad_norm 2.4516 (2.1117) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:35:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][580/625] eta 0:00:11 lr 0.000889 wd 0.0500 time 0.2549 (0.2610) data 
time 0.0009 (0.0017) model time 0.2540 (0.2593) loss 5.9111 (5.9697) grad_norm 2.0761 (2.1065) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:35:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][590/625] eta 0:00:09 lr 0.000889 wd 0.0500 time 0.2550 (0.2610) data time 0.0006 (0.0017) model time 0.2544 (0.2593) loss 4.6535 (5.9710) grad_norm 4.3115 (2.1220) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:35:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][600/625] eta 0:00:06 lr 0.000889 wd 0.0500 time 0.2605 (0.2609) data time 0.0008 (0.0017) model time 0.2597 (0.2592) loss 5.3262 (5.9722) grad_norm 1.7415 (2.1280) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:35:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][610/625] eta 0:00:03 lr 0.000888 wd 0.0500 time 0.2524 (0.2611) data time 0.0004 (0.0017) model time 0.2519 (0.2594) loss 6.4872 (5.9737) grad_norm 2.3437 (2.1226) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:35:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][620/625] eta 0:00:01 lr 0.000888 wd 0.0500 time 0.2520 (0.2610) data time 0.0005 (0.0017) model time 0.2515 (0.2593) loss 5.8436 (5.9688) grad_norm 2.0393 (2.1210) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:35:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 170 training takes 0:02:43 [2024-08-04 05:35:33 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 05:35:33 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 05:35:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.620 (0.620) Loss 0.6172 (0.6172) Acc@1 88.721 (88.721) Acc@5 98.730 (98.730) Mem 9655MB [2024-08-04 05:35:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.109) Loss 0.9951 (0.7728) Acc@1 78.271 (84.490) Acc@5 95.215 (97.186) Mem 9655MB [2024-08-04 05:35:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.084) Loss 1.1143 (0.9115) Acc@1 75.439 (81.048) Acc@5 93.506 (95.729) Mem 9655MB [2024-08-04 05:35:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.788 Acc@5 95.721 [2024-08-04 05:35:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.8% [2024-08-04 05:35:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.79% [2024-08-04 05:35:35 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 05:35:36 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 05:35:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.484 (0.484) Loss 0.5889 (0.5889) Acc@1 89.600 (89.600) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 05:35:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 0.9326 (0.7264) Acc@1 79.688 (85.622) Acc@5 95.654 (97.528) Mem 9655MB [2024-08-04 05:35:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0635 (0.8549) Acc@1 75.391 (82.175) Acc@5 94.580 (96.187) Mem 9655MB [2024-08-04 05:35:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.878 Acc@5 96.167 [2024-08-04 05:35:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.9% [2024-08-04 05:35:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][0/625] eta 0:11:37 lr 0.000888 wd 0.0500 time 1.1163 (1.1163) data time 0.5316 (0.5316) model time 0.0000 (0.0000) loss 6.1372 (6.1372) grad_norm 2.9833 (2.9833) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:35:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][10/625] eta 0:03:25 lr 0.000888 wd 0.0500 time 0.2516 (0.3335) data time 0.0007 (0.0491) model time 0.0000 (0.0000) loss 6.2647 (6.2964) grad_norm 1.3832 (1.8806) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:35:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][20/625] eta 0:02:59 lr 0.000888 wd 0.0500 time 0.2571 (0.2973) data time 0.0008 (0.0261) model time 0.0000 (0.0000) loss 5.1144 (6.1179) grad_norm 1.4721 (1.8919) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:35:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][30/625] eta 0:02:48 lr 0.000888 wd 0.0500 time 0.2519 (0.2839) data time 0.0010 (0.0180) model time 0.0000 (0.0000) loss 6.8680 (6.0458) grad_norm 2.7665 (1.8779) 
loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:35:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][40/625] eta 0:02:42 lr 0.000887 wd 0.0500 time 0.2549 (0.2772) data time 0.0011 (0.0139) model time 0.0000 (0.0000) loss 6.1957 (5.9730) grad_norm 1.6117 (1.8670) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:35:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][50/625] eta 0:02:37 lr 0.000887 wd 0.0500 time 0.2629 (0.2733) data time 0.0008 (0.0113) model time 0.0000 (0.0000) loss 6.1910 (6.0395) grad_norm 1.3669 (1.8800) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:35:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][60/625] eta 0:02:32 lr 0.000887 wd 0.0500 time 0.2568 (0.2706) data time 0.0010 (0.0096) model time 0.2559 (0.2558) loss 5.4417 (6.0321) grad_norm 1.2906 (1.8835) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:35:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][70/625] eta 0:02:30 lr 0.000887 wd 0.0500 time 0.2521 (0.2710) data time 0.0008 (0.0084) model time 0.2512 (0.2642) loss 6.2004 (6.0912) grad_norm 2.7085 (1.9575) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:35:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][80/625] eta 0:02:26 lr 0.000887 wd 0.0500 time 0.2583 (0.2691) data time 0.0011 (0.0075) model time 0.2572 (0.2611) loss 4.7821 (6.0661) grad_norm 1.9729 (2.0299) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:36:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][90/625] eta 0:02:23 lr 0.000887 wd 0.0500 time 0.2602 (0.2679) data time 0.0011 (0.0068) model time 0.2591 (0.2600) loss 6.9013 (6.0593) grad_norm 1.5797 (2.1262) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:36:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][100/625] eta 0:02:19 
lr 0.000886 wd 0.0500 time 0.2575 (0.2666) data time 0.0006 (0.0062) model time 0.2569 (0.2589) loss 5.7126 (6.0675) grad_norm 2.2243 (2.1399) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:36:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][110/625] eta 0:02:16 lr 0.000886 wd 0.0500 time 0.2594 (0.2657) data time 0.0009 (0.0058) model time 0.2585 (0.2583) loss 6.1301 (6.0684) grad_norm 1.5862 (2.1115) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:36:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][120/625] eta 0:02:13 lr 0.000886 wd 0.0500 time 0.2530 (0.2648) data time 0.0007 (0.0054) model time 0.2523 (0.2577) loss 5.7137 (6.0605) grad_norm 1.5878 (2.0657) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:36:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][130/625] eta 0:02:10 lr 0.000886 wd 0.0500 time 0.2582 (0.2642) data time 0.0010 (0.0050) model time 0.2572 (0.2575) loss 5.1748 (6.0414) grad_norm 2.0350 (2.0535) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:36:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][140/625] eta 0:02:08 lr 0.000886 wd 0.0500 time 0.2551 (0.2650) data time 0.0010 (0.0047) model time 0.2541 (0.2593) loss 7.0323 (6.0368) grad_norm 3.2443 (2.0569) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:36:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][150/625] eta 0:02:05 lr 0.000885 wd 0.0500 time 0.2580 (0.2644) data time 0.0006 (0.0045) model time 0.2575 (0.2589) loss 6.7453 (6.0243) grad_norm 2.9716 (2.0546) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:36:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][160/625] eta 0:02:02 lr 0.000885 wd 0.0500 time 0.2564 (0.2639) data time 0.0010 (0.0043) model time 0.2553 (0.2586) loss 5.3552 (6.0194) grad_norm 1.6303 (2.0680) loss_scale 
1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:36:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][170/625] eta 0:01:59 lr 0.000885 wd 0.0500 time 0.2528 (0.2634) data time 0.0009 (0.0041) model time 0.2519 (0.2582) loss 4.8402 (6.0089) grad_norm 2.8816 (2.0876) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:36:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][180/625] eta 0:01:57 lr 0.000885 wd 0.0500 time 0.2547 (0.2645) data time 0.0009 (0.0039) model time 0.2538 (0.2601) loss 5.8274 (6.0191) grad_norm 2.0666 (2.1177) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:36:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][190/625] eta 0:01:54 lr 0.000885 wd 0.0500 time 0.2552 (0.2640) data time 0.0011 (0.0037) model time 0.2541 (0.2597) loss 4.9128 (6.0245) grad_norm 1.3002 (2.0974) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:36:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][200/625] eta 0:01:52 lr 0.000885 wd 0.0500 time 0.2567 (0.2636) data time 0.0006 (0.0036) model time 0.2561 (0.2594) loss 5.9380 (6.0167) grad_norm 1.8740 (2.0778) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:36:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][210/625] eta 0:01:49 lr 0.000884 wd 0.0500 time 0.2548 (0.2633) data time 0.0007 (0.0035) model time 0.2541 (0.2591) loss 6.0753 (6.0106) grad_norm 2.6228 (2.0674) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:36:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][220/625] eta 0:01:46 lr 0.000884 wd 0.0500 time 0.2548 (0.2630) data time 0.0010 (0.0034) model time 0.2538 (0.2589) loss 6.4958 (6.0263) grad_norm 1.2070 (2.0548) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:36:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][230/625] eta 0:01:43 lr 
0.000884 wd 0.0500 time 0.2543 (0.2627) data time 0.0010 (0.0033) model time 0.2533 (0.2587) loss 6.3706 (6.0226) grad_norm 2.6624 (2.0575) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:36:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][240/625] eta 0:01:41 lr 0.000884 wd 0.0500 time 0.2577 (0.2625) data time 0.0005 (0.0032) model time 0.2572 (0.2586) loss 6.8175 (6.0187) grad_norm 1.6527 (2.0675) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:36:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][250/625] eta 0:01:38 lr 0.000884 wd 0.0500 time 0.2531 (0.2622) data time 0.0006 (0.0031) model time 0.2526 (0.2584) loss 6.6001 (6.0203) grad_norm 2.5027 (2.0785) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:36:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][260/625] eta 0:01:35 lr 0.000884 wd 0.0500 time 0.2541 (0.2620) data time 0.0011 (0.0030) model time 0.2530 (0.2583) loss 6.0423 (6.0126) grad_norm 1.9099 (2.0792) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:36:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][270/625] eta 0:01:32 lr 0.000883 wd 0.0500 time 0.2532 (0.2618) data time 0.0007 (0.0029) model time 0.2525 (0.2582) loss 7.1549 (6.0008) grad_norm 1.4090 (2.0682) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:36:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][280/625] eta 0:01:30 lr 0.000883 wd 0.0500 time 0.2528 (0.2616) data time 0.0007 (0.0028) model time 0.2521 (0.2581) loss 5.4580 (5.9927) grad_norm 3.5688 (2.0761) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:36:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][290/625] eta 0:01:27 lr 0.000883 wd 0.0500 time 0.2555 (0.2614) data time 0.0006 (0.0028) model time 0.2549 (0.2579) loss 7.1450 (5.9892) grad_norm 1.4453 (2.0741) loss_scale 
1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:36:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][300/625] eta 0:01:25 lr 0.000883 wd 0.0500 time 0.2507 (0.2618) data time 0.0006 (0.0027) model time 0.2501 (0.2586) loss 5.7224 (5.9845) grad_norm 2.3822 (2.0764) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:36:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][310/625] eta 0:01:22 lr 0.000883 wd 0.0500 time 0.2547 (0.2616) data time 0.0009 (0.0026) model time 0.2538 (0.2584) loss 5.4625 (5.9804) grad_norm 1.5160 (2.0760) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:37:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][320/625] eta 0:01:19 lr 0.000882 wd 0.0500 time 0.2661 (0.2615) data time 0.0010 (0.0026) model time 0.2652 (0.2583) loss 5.8900 (5.9860) grad_norm 3.1823 (2.0809) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:37:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][330/625] eta 0:01:17 lr 0.000882 wd 0.0500 time 0.2553 (0.2613) data time 0.0008 (0.0025) model time 0.2545 (0.2582) loss 4.8437 (5.9768) grad_norm 1.7036 (2.0782) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:37:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][340/625] eta 0:01:14 lr 0.000882 wd 0.0500 time 0.2583 (0.2611) data time 0.0009 (0.0025) model time 0.2575 (0.2581) loss 6.7295 (5.9677) grad_norm 2.2764 (2.0815) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:37:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][350/625] eta 0:01:11 lr 0.000882 wd 0.0500 time 0.2514 (0.2610) data time 0.0010 (0.0024) model time 0.2504 (0.2580) loss 4.8262 (5.9582) grad_norm 2.3191 (2.0785) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:37:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][360/625] eta 0:01:09 lr 
0.000882 wd 0.0500 time 0.2598 (0.2609) data time 0.0006 (0.0024) model time 0.2591 (0.2579) loss 4.3841 (5.9378) grad_norm 1.5534 (2.0711) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:37:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][370/625] eta 0:01:06 lr 0.000882 wd 0.0500 time 0.2527 (0.2607) data time 0.0008 (0.0024) model time 0.2519 (0.2578) loss 6.9136 (5.9487) grad_norm 1.6807 (2.0667) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:37:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][380/625] eta 0:01:03 lr 0.000881 wd 0.0500 time 0.2532 (0.2606) data time 0.0007 (0.0023) model time 0.2526 (0.2577) loss 6.8988 (5.9449) grad_norm 1.3619 (2.0828) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:37:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][390/625] eta 0:01:01 lr 0.000881 wd 0.0500 time 0.2539 (0.2605) data time 0.0007 (0.0023) model time 0.2532 (0.2576) loss 6.6859 (5.9418) grad_norm 1.5711 (2.0773) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:37:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][400/625] eta 0:00:58 lr 0.000881 wd 0.0500 time 0.2545 (0.2604) data time 0.0009 (0.0023) model time 0.2537 (0.2576) loss 5.2037 (5.9419) grad_norm 1.8434 (2.0697) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:37:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][410/625] eta 0:00:55 lr 0.000881 wd 0.0500 time 0.2587 (0.2603) data time 0.0008 (0.0022) model time 0.2579 (0.2575) loss 6.0420 (5.9421) grad_norm 2.3467 (2.0746) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:37:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][420/625] eta 0:00:53 lr 0.000881 wd 0.0500 time 0.2591 (0.2602) data time 0.0008 (0.0022) model time 0.2583 (0.2575) loss 5.0742 (5.9397) grad_norm 1.3206 (2.0708) loss_scale 
1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:37:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][430/625] eta 0:00:50 lr 0.000881 wd 0.0500 time 0.2614 (0.2601) data time 0.0009 (0.0022) model time 0.2604 (0.2574) loss 5.9499 (5.9395) grad_norm 1.7755 (2.0633) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:37:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][440/625] eta 0:00:48 lr 0.000880 wd 0.0500 time 0.2562 (0.2600) data time 0.0010 (0.0021) model time 0.2553 (0.2574) loss 6.2227 (5.9326) grad_norm 1.7382 (2.0580) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:37:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][450/625] eta 0:00:45 lr 0.000880 wd 0.0500 time 0.2634 (0.2600) data time 0.0008 (0.0021) model time 0.2626 (0.2573) loss 7.1118 (5.9437) grad_norm 2.0562 (2.0554) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:37:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][460/625] eta 0:00:42 lr 0.000880 wd 0.0500 time 0.2555 (0.2599) data time 0.0009 (0.0021) model time 0.2546 (0.2573) loss 6.4749 (5.9485) grad_norm 2.6648 (2.0542) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:37:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][470/625] eta 0:00:40 lr 0.000880 wd 0.0500 time 0.2577 (0.2598) data time 0.0009 (0.0021) model time 0.2567 (0.2572) loss 6.8058 (5.9454) grad_norm 1.9372 (2.0574) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:37:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][480/625] eta 0:00:37 lr 0.000880 wd 0.0500 time 0.2532 (0.2602) data time 0.0009 (0.0020) model time 0.2523 (0.2577) loss 6.8892 (5.9436) grad_norm 2.6774 (2.0704) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:37:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][490/625] eta 0:00:35 lr 
0.000879 wd 0.0500 time 0.2554 (0.2602) data time 0.0010 (0.0020) model time 0.2544 (0.2577) loss 4.7864 (5.9407) grad_norm 1.6240 (2.0709) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:37:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][500/625] eta 0:00:32 lr 0.000879 wd 0.0500 time 0.2564 (0.2601) data time 0.0008 (0.0020) model time 0.2556 (0.2577) loss 5.5068 (5.9431) grad_norm 1.6659 (2.0637) loss_scale 2048.0000 (1030.1317) mem 9655MB [2024-08-04 05:37:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][510/625] eta 0:00:29 lr 0.000879 wd 0.0500 time 0.2566 (0.2608) data time 0.0012 (0.0020) model time 0.2554 (0.2585) loss 5.3331 (5.9443) grad_norm 2.9588 (2.0804) loss_scale 2048.0000 (1050.0509) mem 9655MB [2024-08-04 05:37:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][520/625] eta 0:00:27 lr 0.000879 wd 0.0500 time 0.2587 (0.2608) data time 0.0006 (0.0020) model time 0.2581 (0.2585) loss 6.2406 (5.9441) grad_norm 2.6264 (2.0814) loss_scale 2048.0000 (1069.2054) mem 9655MB [2024-08-04 05:37:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][530/625] eta 0:00:24 lr 0.000879 wd 0.0500 time 0.2555 (0.2607) data time 0.0008 (0.0019) model time 0.2547 (0.2584) loss 5.5747 (5.9458) grad_norm 2.3475 (2.0861) loss_scale 2048.0000 (1087.6384) mem 9655MB [2024-08-04 05:37:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][540/625] eta 0:00:22 lr 0.000879 wd 0.0500 time 0.2581 (0.2610) data time 0.0008 (0.0019) model time 0.2572 (0.2587) loss 6.3077 (5.9493) grad_norm 2.2831 (2.0845) loss_scale 2048.0000 (1105.3900) mem 9655MB [2024-08-04 05:38:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][550/625] eta 0:00:19 lr 0.000878 wd 0.0500 time 0.2553 (0.2609) data time 0.0011 (0.0019) model time 0.2542 (0.2587) loss 5.6133 (5.9435) grad_norm 1.9377 (2.0815) loss_scale 
2048.0000 (1122.4973) mem 9655MB [2024-08-04 05:38:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][560/625] eta 0:00:16 lr 0.000878 wd 0.0500 time 0.2556 (0.2608) data time 0.0007 (0.0019) model time 0.2548 (0.2586) loss 6.8559 (5.9522) grad_norm 1.5502 (2.0757) loss_scale 2048.0000 (1138.9947) mem 9655MB [2024-08-04 05:38:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][570/625] eta 0:00:14 lr 0.000878 wd 0.0500 time 0.2581 (0.2607) data time 0.0006 (0.0019) model time 0.2576 (0.2585) loss 6.2247 (5.9527) grad_norm 1.4001 (2.0719) loss_scale 2048.0000 (1154.9142) mem 9655MB [2024-08-04 05:38:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][580/625] eta 0:00:11 lr 0.000878 wd 0.0500 time 0.2585 (0.2606) data time 0.0009 (0.0019) model time 0.2576 (0.2585) loss 5.3126 (5.9557) grad_norm 1.5263 (2.0649) loss_scale 2048.0000 (1170.2857) mem 9655MB [2024-08-04 05:38:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][590/625] eta 0:00:09 lr 0.000878 wd 0.0500 time 0.2643 (0.2605) data time 0.0008 (0.0018) model time 0.2635 (0.2584) loss 6.9446 (5.9554) grad_norm 3.3119 (2.0709) loss_scale 2048.0000 (1185.1371) mem 9655MB [2024-08-04 05:38:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][600/625] eta 0:00:06 lr 0.000878 wd 0.0500 time 0.2570 (0.2608) data time 0.0011 (0.0018) model time 0.2559 (0.2587) loss 6.5648 (5.9545) grad_norm 2.4368 (2.0772) loss_scale 2048.0000 (1199.4942) mem 9655MB [2024-08-04 05:38:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][610/625] eta 0:00:03 lr 0.000877 wd 0.0500 time 0.2524 (0.2607) data time 0.0004 (0.0018) model time 0.2520 (0.2586) loss 7.2723 (5.9581) grad_norm 2.9563 (2.0793) loss_scale 2048.0000 (1213.3813) mem 9655MB [2024-08-04 05:38:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][620/625] eta 0:00:01 lr 
0.000877 wd 0.0500 time 0.2572 (0.2606) data time 0.0005 (0.0018) model time 0.2567 (0.2585) loss 6.3358 (5.9552) grad_norm 3.9496 (2.1038) loss_scale 2048.0000 (1226.8213) mem 9655MB [2024-08-04 05:38:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 171 training takes 0:02:42 [2024-08-04 05:38:20 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 05:38:21 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-08-04 05:38:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.463 (0.463) Loss 0.6357 (0.6357) Acc@1 88.525 (88.525) Acc@5 98.486 (98.486) Mem 9655MB [2024-08-04 05:38:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.093) Loss 1.0225 (0.7809) Acc@1 78.857 (84.939) Acc@5 94.873 (97.252) Mem 9655MB [2024-08-04 05:38:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.075) Loss 1.1270 (0.9133) Acc@1 74.756 (81.457) Acc@5 93.994 (95.850) Mem 9655MB [2024-08-04 05:38:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.100 Acc@5 95.821 [2024-08-04 05:38:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.1% [2024-08-04 05:38:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.10% [2024-08-04 05:38:23 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 05:38:23 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 05:38:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.409 (0.409) Loss 0.5884 (0.5884) Acc@1 89.600 (89.600) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 05:38:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.088) Loss 0.9316 (0.7259) Acc@1 79.834 (85.662) Acc@5 95.654 (97.541) Mem 9655MB [2024-08-04 05:38:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.072) Loss 1.0635 (0.8543) Acc@1 75.439 (82.185) Acc@5 94.629 (96.208) Mem 9655MB [2024-08-04 05:38:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.888 Acc@5 96.189 [2024-08-04 05:38:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.9% [2024-08-04 05:38:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][0/625] eta 0:11:41 lr 0.000877 wd 0.0500 time 1.1222 (1.1222) data time 0.6812 (0.6812) model time 0.0000 (0.0000) loss 6.4082 (6.4082) grad_norm 3.2493 (3.2493) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 05:38:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][10/625] eta 0:03:32 lr 0.000877 wd 0.0500 time 0.2513 (0.3460) data time 0.0007 (0.0628) model time 0.0000 (0.0000) loss 5.8263 (6.0215) grad_norm 3.7628 (2.4661) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 05:38:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][20/625] eta 0:03:03 lr 0.000877 wd 0.0500 time 0.2537 (0.3032) data time 0.0010 (0.0333) model time 0.0000 (0.0000) loss 7.5849 (6.1825) grad_norm 2.6445 (2.3547) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 05:38:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][30/625] eta 0:02:51 lr 0.000877 wd 0.0500 time 0.2518 (0.2877) data time 0.0011 (0.0228) model time 0.0000 (0.0000) loss 6.8694 (6.2508) grad_norm 2.3602 (2.3145) 
loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 05:38:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][40/625] eta 0:02:43 lr 0.000876 wd 0.0500 time 0.2593 (0.2798) data time 0.0008 (0.0175) model time 0.0000 (0.0000) loss 5.4515 (6.2207) grad_norm 1.5732 (2.1951) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 05:38:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][50/625] eta 0:02:38 lr 0.000876 wd 0.0500 time 0.2547 (0.2750) data time 0.0006 (0.0143) model time 0.0000 (0.0000) loss 6.0155 (6.1836) grad_norm 1.2911 (2.1567) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 05:38:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][60/625] eta 0:02:33 lr 0.000876 wd 0.0500 time 0.2542 (0.2718) data time 0.0011 (0.0121) model time 0.2531 (0.2545) loss 6.3100 (6.1642) grad_norm 1.7143 (2.1752) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 05:38:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][70/625] eta 0:02:30 lr 0.000876 wd 0.0500 time 0.2515 (0.2712) data time 0.0008 (0.0105) model time 0.2507 (0.2606) loss 5.3453 (6.0792) grad_norm 1.7941 (2.1732) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 05:38:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][80/625] eta 0:02:26 lr 0.000876 wd 0.0500 time 0.2580 (0.2693) data time 0.0006 (0.0093) model time 0.2574 (0.2585) loss 5.4467 (6.0386) grad_norm 3.5453 (2.2541) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 05:38:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][90/625] eta 0:02:23 lr 0.000876 wd 0.0500 time 0.2545 (0.2678) data time 0.0010 (0.0084) model time 0.2536 (0.2575) loss 5.2467 (6.0082) grad_norm 1.8849 (2.2687) loss_scale 2048.0000 (2048.0000) mem 9655MB [2024-08-04 05:38:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][100/625] eta 0:02:19 
lr 0.000875 wd 0.0500 time 0.2553 (0.2664) data time 0.0008 (0.0077) model time 0.2545 (0.2567) loss 5.5919 (6.0065) grad_norm 1.7131 (inf) loss_scale 1024.0000 (1997.3069) mem 9655MB [2024-08-04 05:38:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][110/625] eta 0:02:16 lr 0.000875 wd 0.0500 time 0.2507 (0.2655) data time 0.0010 (0.0071) model time 0.2497 (0.2565) loss 4.1985 (6.0079) grad_norm 2.3695 (inf) loss_scale 1024.0000 (1909.6216) mem 9655MB [2024-08-04 05:38:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][120/625] eta 0:02:13 lr 0.000875 wd 0.0500 time 0.2513 (0.2646) data time 0.0010 (0.0066) model time 0.2503 (0.2561) loss 5.7369 (5.9725) grad_norm 1.7875 (inf) loss_scale 1024.0000 (1836.4298) mem 9655MB [2024-08-04 05:39:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][130/625] eta 0:02:10 lr 0.000875 wd 0.0500 time 0.2553 (0.2640) data time 0.0010 (0.0061) model time 0.2543 (0.2560) loss 6.8989 (6.0003) grad_norm 2.1937 (inf) loss_scale 1024.0000 (1774.4122) mem 9655MB [2024-08-04 05:39:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][140/625] eta 0:02:08 lr 0.000875 wd 0.0500 time 0.2645 (0.2647) data time 0.0008 (0.0058) model time 0.2636 (0.2579) loss 6.7455 (5.9942) grad_norm 1.7004 (inf) loss_scale 1024.0000 (1721.1915) mem 9655MB [2024-08-04 05:39:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][150/625] eta 0:02:05 lr 0.000874 wd 0.0500 time 0.2562 (0.2650) data time 0.0008 (0.0055) model time 0.2554 (0.2590) loss 4.9998 (5.9989) grad_norm 2.5708 (inf) loss_scale 1024.0000 (1675.0199) mem 9655MB [2024-08-04 05:39:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][160/625] eta 0:02:02 lr 0.000874 wd 0.0500 time 0.2541 (0.2645) data time 0.0008 (0.0052) model time 0.2533 (0.2586) loss 6.5531 (5.9999) grad_norm 1.6235 (inf) loss_scale 1024.0000 (1634.5839) 
mem 9655MB [2024-08-04 05:39:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][170/625] eta 0:02:00 lr 0.000874 wd 0.0500 time 0.2563 (0.2641) data time 0.0009 (0.0049) model time 0.2554 (0.2585) loss 7.4058 (5.9979) grad_norm 2.2225 (inf) loss_scale 1024.0000 (1598.8772) mem 9655MB [2024-08-04 05:39:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][180/625] eta 0:01:57 lr 0.000874 wd 0.0500 time 0.2581 (0.2637) data time 0.0006 (0.0047) model time 0.2575 (0.2583) loss 4.5149 (5.9892) grad_norm 1.3651 (inf) loss_scale 1024.0000 (1567.1160) mem 9655MB [2024-08-04 05:39:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][190/625] eta 0:01:54 lr 0.000874 wd 0.0500 time 0.2493 (0.2633) data time 0.0008 (0.0045) model time 0.2485 (0.2581) loss 5.6766 (5.9745) grad_norm 2.4491 (inf) loss_scale 1024.0000 (1538.6806) mem 9655MB [2024-08-04 05:39:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][200/625] eta 0:01:51 lr 0.000874 wd 0.0500 time 0.2578 (0.2630) data time 0.0009 (0.0043) model time 0.2569 (0.2580) loss 5.6794 (5.9720) grad_norm 2.1985 (inf) loss_scale 1024.0000 (1513.0746) mem 9655MB [2024-08-04 05:39:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][210/625] eta 0:01:49 lr 0.000873 wd 0.0500 time 0.2561 (0.2627) data time 0.0010 (0.0042) model time 0.2551 (0.2578) loss 7.0385 (5.9829) grad_norm 2.0907 (inf) loss_scale 1024.0000 (1489.8957) mem 9655MB [2024-08-04 05:39:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][220/625] eta 0:01:46 lr 0.000873 wd 0.0500 time 0.2573 (0.2624) data time 0.0007 (0.0040) model time 0.2566 (0.2577) loss 6.0761 (5.9715) grad_norm 1.4595 (inf) loss_scale 1024.0000 (1468.8145) mem 9655MB [2024-08-04 05:39:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][230/625] eta 0:01:43 lr 0.000873 wd 0.0500 time 0.2553 (0.2621) 
data time 0.0008 (0.0039) model time 0.2545 (0.2575) loss 4.9172 (5.9866) grad_norm 5.1543 (inf) loss_scale 1024.0000 (1449.5584) mem 9655MB [2024-08-04 05:39:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][240/625] eta 0:01:40 lr 0.000873 wd 0.0500 time 0.2537 (0.2618) data time 0.0008 (0.0038) model time 0.2529 (0.2573) loss 6.7153 (5.9991) grad_norm 1.2330 (inf) loss_scale 1024.0000 (1431.9004) mem 9655MB [2024-08-04 05:39:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][250/625] eta 0:01:38 lr 0.000873 wd 0.0500 time 0.2509 (0.2621) data time 0.0016 (0.0037) model time 0.2492 (0.2578) loss 4.9301 (5.9989) grad_norm 1.9626 (inf) loss_scale 1024.0000 (1415.6494) mem 9655MB [2024-08-04 05:39:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][260/625] eta 0:01:35 lr 0.000873 wd 0.0500 time 0.2599 (0.2626) data time 0.0007 (0.0036) model time 0.2592 (0.2587) loss 6.3927 (6.0117) grad_norm 3.2356 (inf) loss_scale 1024.0000 (1400.6437) mem 9655MB [2024-08-04 05:39:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][270/625] eta 0:01:33 lr 0.000872 wd 0.0500 time 0.2545 (0.2624) data time 0.0009 (0.0035) model time 0.2536 (0.2585) loss 6.4352 (6.0095) grad_norm 2.2218 (inf) loss_scale 1024.0000 (1386.7454) mem 9655MB [2024-08-04 05:39:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][280/625] eta 0:01:30 lr 0.000872 wd 0.0500 time 0.2555 (0.2621) data time 0.0009 (0.0034) model time 0.2546 (0.2583) loss 6.2035 (6.0035) grad_norm 3.2219 (inf) loss_scale 1024.0000 (1373.8363) mem 9655MB [2024-08-04 05:39:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][290/625] eta 0:01:27 lr 0.000872 wd 0.0500 time 0.2531 (0.2620) data time 0.0010 (0.0033) model time 0.2522 (0.2582) loss 6.2245 (6.0146) grad_norm 2.7485 (inf) loss_scale 1024.0000 (1361.8144) mem 9655MB [2024-08-04 05:39:44 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][300/625] eta 0:01:25 lr 0.000872 wd 0.0500 time 0.2563 (0.2618) data time 0.0007 (0.0032) model time 0.2556 (0.2581) loss 6.7289 (6.0138) grad_norm 3.6164 (inf) loss_scale 1024.0000 (1350.5914) mem 9655MB [2024-08-04 05:39:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][310/625] eta 0:01:22 lr 0.000872 wd 0.0500 time 0.2384 (0.2616) data time 0.0009 (0.0031) model time 0.2375 (0.2580) loss 6.5690 (6.0096) grad_norm 2.3232 (inf) loss_scale 1024.0000 (1340.0900) mem 9655MB [2024-08-04 05:39:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][320/625] eta 0:01:19 lr 0.000871 wd 0.0500 time 0.2584 (0.2614) data time 0.0006 (0.0031) model time 0.2578 (0.2579) loss 5.5696 (6.0051) grad_norm 2.3429 (inf) loss_scale 1024.0000 (1330.2430) mem 9655MB [2024-08-04 05:39:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][330/625] eta 0:01:17 lr 0.000871 wd 0.0500 time 0.2516 (0.2613) data time 0.0008 (0.0030) model time 0.2508 (0.2578) loss 4.7778 (5.9964) grad_norm 2.0666 (inf) loss_scale 1024.0000 (1320.9909) mem 9655MB [2024-08-04 05:39:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][340/625] eta 0:01:14 lr 0.000871 wd 0.0500 time 0.2530 (0.2611) data time 0.0007 (0.0029) model time 0.2523 (0.2577) loss 4.3414 (5.9962) grad_norm 1.2620 (inf) loss_scale 1024.0000 (1312.2815) mem 9655MB [2024-08-04 05:39:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][350/625] eta 0:01:11 lr 0.000871 wd 0.0500 time 0.2561 (0.2613) data time 0.0008 (0.0029) model time 0.2554 (0.2580) loss 5.7842 (5.9927) grad_norm 1.9137 (inf) loss_scale 1024.0000 (1304.0684) mem 9655MB [2024-08-04 05:39:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][360/625] eta 0:01:09 lr 0.000871 wd 0.0500 time 0.2589 (0.2611) data time 0.0010 (0.0028) model 
time 0.2580 (0.2579) loss 7.0700 (5.9861) grad_norm 1.3205 (inf) loss_scale 1024.0000 (1296.3102) mem 9655MB [2024-08-04 05:40:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][370/625] eta 0:01:06 lr 0.000871 wd 0.0500 time 0.2561 (0.2610) data time 0.0007 (0.0028) model time 0.2554 (0.2578) loss 6.7343 (5.9850) grad_norm 3.4115 (inf) loss_scale 1024.0000 (1288.9704) mem 9655MB [2024-08-04 05:40:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][380/625] eta 0:01:03 lr 0.000870 wd 0.0500 time 0.2557 (0.2609) data time 0.0008 (0.0027) model time 0.2548 (0.2577) loss 6.5573 (5.9880) grad_norm 1.3557 (inf) loss_scale 1024.0000 (1282.0157) mem 9655MB [2024-08-04 05:40:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][390/625] eta 0:01:01 lr 0.000870 wd 0.0500 time 0.2550 (0.2607) data time 0.0010 (0.0027) model time 0.2540 (0.2576) loss 6.8456 (5.9884) grad_norm 3.0385 (inf) loss_scale 512.0000 (1266.2506) mem 9655MB [2024-08-04 05:40:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][400/625] eta 0:00:58 lr 0.000870 wd 0.0500 time 0.2599 (0.2606) data time 0.0012 (0.0026) model time 0.2587 (0.2575) loss 6.5459 (5.9790) grad_norm 2.9453 (inf) loss_scale 512.0000 (1247.4414) mem 9655MB [2024-08-04 05:40:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][410/625] eta 0:00:56 lr 0.000870 wd 0.0500 time 0.2543 (0.2611) data time 0.0009 (0.0026) model time 0.2535 (0.2581) loss 5.2153 (5.9636) grad_norm 2.2893 (inf) loss_scale 512.0000 (1229.5474) mem 9655MB [2024-08-04 05:40:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][420/625] eta 0:00:53 lr 0.000870 wd 0.0500 time 0.2537 (0.2609) data time 0.0010 (0.0026) model time 0.2527 (0.2580) loss 6.5524 (5.9671) grad_norm 1.4601 (inf) loss_scale 512.0000 (1212.5036) mem 9655MB [2024-08-04 05:40:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 
367): INFO Train: [172/300][430/625] eta 0:00:50 lr 0.000870 wd 0.0500 time 0.2585 (0.2608) data time 0.0008 (0.0025) model time 0.2577 (0.2580) loss 5.6351 (5.9696) grad_norm 2.8506 (inf) loss_scale 512.0000 (1196.2506) mem 9655MB [2024-08-04 05:40:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][440/625] eta 0:00:48 lr 0.000869 wd 0.0500 time 0.2571 (0.2612) data time 0.0010 (0.0025) model time 0.2561 (0.2584) loss 6.6846 (5.9676) grad_norm 2.0208 (inf) loss_scale 512.0000 (1180.7347) mem 9655MB [2024-08-04 05:40:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][450/625] eta 0:00:45 lr 0.000869 wd 0.0500 time 0.2570 (0.2611) data time 0.0008 (0.0025) model time 0.2562 (0.2584) loss 7.0512 (5.9738) grad_norm 1.5828 (inf) loss_scale 512.0000 (1165.9069) mem 9655MB [2024-08-04 05:40:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][460/625] eta 0:00:43 lr 0.000869 wd 0.0500 time 0.2565 (0.2610) data time 0.0010 (0.0024) model time 0.2555 (0.2583) loss 6.6672 (5.9762) grad_norm 1.3105 (inf) loss_scale 512.0000 (1151.7223) mem 9655MB [2024-08-04 05:40:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][470/625] eta 0:00:40 lr 0.000869 wd 0.0500 time 0.2553 (0.2609) data time 0.0008 (0.0024) model time 0.2544 (0.2582) loss 5.2720 (5.9754) grad_norm 1.5431 (inf) loss_scale 512.0000 (1138.1401) mem 9655MB [2024-08-04 05:40:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][480/625] eta 0:00:37 lr 0.000869 wd 0.0500 time 0.2532 (0.2608) data time 0.0008 (0.0024) model time 0.2525 (0.2581) loss 6.5794 (5.9740) grad_norm 2.7571 (inf) loss_scale 512.0000 (1125.1227) mem 9655MB [2024-08-04 05:40:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][490/625] eta 0:00:35 lr 0.000868 wd 0.0500 time 0.2548 (0.2607) data time 0.0008 (0.0023) model time 0.2541 (0.2580) loss 6.7210 (5.9768) grad_norm 
1.5350 (inf) loss_scale 512.0000 (1112.6354) mem 9655MB [2024-08-04 05:40:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][500/625] eta 0:00:32 lr 0.000868 wd 0.0500 time 0.2559 (0.2606) data time 0.0011 (0.0023) model time 0.2548 (0.2580) loss 6.6324 (5.9827) grad_norm 1.7788 (inf) loss_scale 512.0000 (1100.6467) mem 9655MB [2024-08-04 05:40:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][510/625] eta 0:00:29 lr 0.000868 wd 0.0500 time 0.2537 (0.2605) data time 0.0006 (0.0023) model time 0.2531 (0.2579) loss 6.4043 (5.9887) grad_norm 2.1576 (inf) loss_scale 512.0000 (1089.1272) mem 9655MB [2024-08-04 05:40:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][520/625] eta 0:00:27 lr 0.000868 wd 0.0500 time 0.2588 (0.2604) data time 0.0006 (0.0023) model time 0.2582 (0.2579) loss 6.8042 (5.9929) grad_norm 4.6840 (inf) loss_scale 512.0000 (1078.0499) mem 9655MB [2024-08-04 05:40:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][530/625] eta 0:00:24 lr 0.000868 wd 0.0500 time 0.2510 (0.2603) data time 0.0010 (0.0022) model time 0.2500 (0.2578) loss 5.4624 (5.9904) grad_norm 1.4105 (inf) loss_scale 512.0000 (1067.3898) mem 9655MB [2024-08-04 05:40:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][540/625] eta 0:00:22 lr 0.000868 wd 0.0500 time 0.2572 (0.2605) data time 0.0007 (0.0022) model time 0.2565 (0.2581) loss 5.5307 (5.9918) grad_norm 2.0401 (inf) loss_scale 512.0000 (1057.1238) mem 9655MB [2024-08-04 05:40:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][550/625] eta 0:00:19 lr 0.000867 wd 0.0500 time 0.2585 (0.2607) data time 0.0008 (0.0022) model time 0.2576 (0.2583) loss 4.4396 (5.9844) grad_norm 1.2973 (inf) loss_scale 512.0000 (1047.2305) mem 9655MB [2024-08-04 05:40:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][560/625] eta 0:00:16 lr 
0.000867 wd 0.0500 time 0.2544 (0.2606) data time 0.0007 (0.0022) model time 0.2537 (0.2582) loss 5.2252 (5.9854) grad_norm 2.0370 (inf) loss_scale 512.0000 (1037.6898) mem 9655MB [2024-08-04 05:40:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][570/625] eta 0:00:14 lr 0.000867 wd 0.0500 time 0.2580 (0.2605) data time 0.0008 (0.0021) model time 0.2572 (0.2581) loss 6.3909 (5.9847) grad_norm 1.5653 (inf) loss_scale 512.0000 (1028.4834) mem 9655MB [2024-08-04 05:40:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][580/625] eta 0:00:11 lr 0.000867 wd 0.0500 time 0.2519 (0.2604) data time 0.0009 (0.0021) model time 0.2510 (0.2581) loss 5.5409 (5.9904) grad_norm 1.5530 (inf) loss_scale 512.0000 (1019.5938) mem 9655MB [2024-08-04 05:40:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][590/625] eta 0:00:09 lr 0.000867 wd 0.0500 time 0.2539 (0.2604) data time 0.0010 (0.0021) model time 0.2529 (0.2580) loss 6.0583 (5.9861) grad_norm 1.2070 (inf) loss_scale 512.0000 (1011.0051) mem 9655MB [2024-08-04 05:41:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][600/625] eta 0:00:06 lr 0.000867 wd 0.0500 time 0.2550 (0.2603) data time 0.0011 (0.0021) model time 0.2539 (0.2580) loss 5.5629 (5.9875) grad_norm 3.5630 (inf) loss_scale 512.0000 (1002.7022) mem 9655MB [2024-08-04 05:41:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][610/625] eta 0:00:03 lr 0.000866 wd 0.0500 time 0.2514 (0.2602) data time 0.0005 (0.0021) model time 0.2509 (0.2579) loss 6.5149 (5.9865) grad_norm 1.6823 (inf) loss_scale 512.0000 (994.6710) mem 9655MB [2024-08-04 05:41:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][620/625] eta 0:00:01 lr 0.000866 wd 0.0500 time 0.2538 (0.2606) data time 0.0006 (0.0021) model time 0.2532 (0.2584) loss 6.4281 (5.9856) grad_norm 2.3775 (inf) loss_scale 512.0000 (986.8986) mem 9655MB 
[2024-08-04 05:41:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 172 training takes 0:02:42 [2024-08-04 05:41:08 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 05:41:09 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-08-04 05:41:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.463 (0.463) Loss 0.6123 (0.6123) Acc@1 89.307 (89.307) Acc@5 98.535 (98.535) Mem 9655MB [2024-08-04 05:41:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.093) Loss 0.9834 (0.7702) Acc@1 78.711 (84.917) Acc@5 96.143 (97.359) Mem 9655MB [2024-08-04 05:41:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.075) Loss 1.1221 (0.9068) Acc@1 73.779 (81.201) Acc@5 93.457 (95.796) Mem 9655MB [2024-08-04 05:41:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.886 Acc@5 95.777 [2024-08-04 05:41:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.9% [2024-08-04 05:41:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.779 (0.779) Loss 0.5874 (0.5874) Acc@1 89.648 (89.648) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 05:41:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.127) Loss 0.9307 (0.7256) Acc@1 79.834 (85.653) Acc@5 95.605 (97.528) Mem 9655MB [2024-08-04 05:41:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 1.0625 (0.8540) Acc@1 75.635 (82.208) Acc@5 94.629 (96.194) Mem 9655MB [2024-08-04 05:41:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.904 Acc@5 96.179 [2024-08-04 05:41:12 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.9% [2024-08-04 05:41:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.90% [2024-08-04 05:41:12 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 05:41:13 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 05:41:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][0/625] eta 0:07:26 lr 0.000866 wd 0.0500 time 0.7142 (0.7142) data time 0.4676 (0.4676) model time 0.0000 (0.0000) loss 6.2124 (6.2124) grad_norm 1.1750 (1.1750) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:41:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][10/625] eta 0:03:12 lr 0.000866 wd 0.0500 time 0.2571 (0.3135) data time 0.0010 (0.0434) model time 0.0000 (0.0000) loss 6.1719 (5.8352) grad_norm 1.7191 (2.2064) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:41:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][20/625] eta 0:02:52 lr 0.000866 wd 0.0500 time 0.2559 (0.2858) data time 0.0009 (0.0232) model time 0.0000 (0.0000) loss 5.4306 (5.9580) grad_norm 2.0385 (1.9950) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:41:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][30/625] eta 0:02:44 lr 0.000866 wd 0.0500 time 0.2685 (0.2766) data time 0.0007 (0.0160) model time 0.0000 (0.0000) loss 6.2435 (5.9169) grad_norm 7.0326 (2.3127) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:41:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][40/625] eta 0:02:38 lr 0.000865 wd 0.0500 time 0.2533 (0.2714) data time 0.0010 (0.0124) model time 0.0000 (0.0000) loss 6.3627 (5.8898) grad_norm 2.4714 
(2.3811) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:41:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][50/625] eta 0:02:34 lr 0.000865 wd 0.0500 time 0.2539 (0.2683) data time 0.0008 (0.0101) model time 0.0000 (0.0000) loss 5.7403 (5.9201) grad_norm 1.5882 (2.3651) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:41:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][60/625] eta 0:02:31 lr 0.000865 wd 0.0500 time 0.4033 (0.2687) data time 0.0007 (0.0086) model time 0.4025 (0.2700) loss 5.9751 (5.9290) grad_norm 2.1583 (2.3140) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:41:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][70/625] eta 0:02:28 lr 0.000865 wd 0.0500 time 0.2593 (0.2670) data time 0.0005 (0.0075) model time 0.2587 (0.2630) loss 4.9314 (5.9220) grad_norm 1.9854 (2.2557) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:41:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][80/625] eta 0:02:24 lr 0.000865 wd 0.0500 time 0.2536 (0.2657) data time 0.0007 (0.0067) model time 0.2529 (0.2604) loss 6.2929 (5.8959) grad_norm 1.4578 (2.2104) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:41:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][90/625] eta 0:02:21 lr 0.000865 wd 0.0500 time 0.2540 (0.2646) data time 0.0006 (0.0061) model time 0.2534 (0.2590) loss 7.1902 (5.9100) grad_norm 2.5044 (2.1859) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:41:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][100/625] eta 0:02:18 lr 0.000864 wd 0.0500 time 0.2532 (0.2638) data time 0.0009 (0.0056) model time 0.2523 (0.2584) loss 7.0451 (5.9633) grad_norm 2.8356 (2.1957) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:41:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][110/625] eta 0:02:15 lr 
0.000864 wd 0.0500 time 0.2551 (0.2632) data time 0.0006 (0.0051) model time 0.2546 (0.2580) loss 6.2175 (5.9992) grad_norm 3.2617 (2.2000) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:41:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][120/625] eta 0:02:12 lr 0.000864 wd 0.0500 time 0.2573 (0.2626) data time 0.0010 (0.0048) model time 0.2563 (0.2575) loss 5.6240 (5.9473) grad_norm 2.4965 (2.1756) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:41:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][130/625] eta 0:02:09 lr 0.000864 wd 0.0500 time 0.2564 (0.2621) data time 0.0007 (0.0045) model time 0.2557 (0.2573) loss 7.1096 (5.9245) grad_norm 1.4811 (2.1870) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:41:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][140/625] eta 0:02:06 lr 0.000864 wd 0.0500 time 0.2534 (0.2616) data time 0.0007 (0.0042) model time 0.2527 (0.2569) loss 6.7771 (5.9372) grad_norm 4.2599 (2.2405) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:41:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][150/625] eta 0:02:04 lr 0.000863 wd 0.0500 time 0.2485 (0.2612) data time 0.0008 (0.0040) model time 0.2477 (0.2567) loss 5.6525 (5.9487) grad_norm 1.8633 (2.2158) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:41:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][160/625] eta 0:02:01 lr 0.000863 wd 0.0500 time 0.2572 (0.2609) data time 0.0011 (0.0038) model time 0.2561 (0.2566) loss 5.6991 (5.9332) grad_norm 1.6688 (2.1923) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:41:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][170/625] eta 0:01:58 lr 0.000863 wd 0.0500 time 0.2605 (0.2608) data time 0.0010 (0.0037) model time 0.2596 (0.2566) loss 6.1469 (5.9233) grad_norm 1.5107 (2.1662) loss_scale 512.0000 
(512.0000) mem 9655MB [2024-08-04 05:42:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][180/625] eta 0:01:55 lr 0.000863 wd 0.0500 time 0.2587 (0.2605) data time 0.0007 (0.0035) model time 0.2579 (0.2565) loss 5.6690 (5.9084) grad_norm 1.6431 (2.1476) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:42:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][190/625] eta 0:01:53 lr 0.000863 wd 0.0500 time 0.2578 (0.2602) data time 0.0008 (0.0034) model time 0.2570 (0.2564) loss 5.8174 (5.8875) grad_norm 1.6929 (2.1722) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:42:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][200/625] eta 0:01:50 lr 0.000863 wd 0.0500 time 0.2537 (0.2600) data time 0.0018 (0.0032) model time 0.2519 (0.2563) loss 6.7507 (5.8808) grad_norm 1.9273 (2.1809) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:42:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][210/625] eta 0:01:47 lr 0.000862 wd 0.0500 time 0.2544 (0.2599) data time 0.0009 (0.0031) model time 0.2535 (0.2563) loss 6.4643 (5.8808) grad_norm 1.8745 (2.1633) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:42:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][220/625] eta 0:01:45 lr 0.000862 wd 0.0500 time 0.2565 (0.2597) data time 0.0005 (0.0030) model time 0.2560 (0.2562) loss 6.5862 (5.8689) grad_norm 2.8184 (2.1669) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:42:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][230/625] eta 0:01:42 lr 0.000862 wd 0.0500 time 0.2563 (0.2604) data time 0.0008 (0.0030) model time 0.2555 (0.2573) loss 5.6495 (5.8757) grad_norm 1.6607 (2.1476) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:42:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][240/625] eta 0:01:40 lr 0.000862 wd 0.0500 time 
0.2574 (0.2602) data time 0.0006 (0.0029) model time 0.2569 (0.2572) loss 7.0508 (5.8791) grad_norm 1.5578 (2.1409) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:42:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][250/625] eta 0:01:37 lr 0.000862 wd 0.0500 time 0.2578 (0.2601) data time 0.0011 (0.0028) model time 0.2568 (0.2570) loss 5.9642 (5.8896) grad_norm 2.6753 (2.1487) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:42:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][260/625] eta 0:01:34 lr 0.000862 wd 0.0500 time 0.2538 (0.2599) data time 0.0015 (0.0027) model time 0.2523 (0.2569) loss 7.0829 (5.8897) grad_norm 2.3516 (2.1469) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:42:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][270/625] eta 0:01:32 lr 0.000861 wd 0.0500 time 0.2512 (0.2598) data time 0.0013 (0.0027) model time 0.2498 (0.2569) loss 6.3050 (5.8937) grad_norm 1.7864 (2.1514) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:42:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][280/625] eta 0:01:29 lr 0.000861 wd 0.0500 time 0.2570 (0.2597) data time 0.0010 (0.0026) model time 0.2560 (0.2568) loss 5.9124 (5.9013) grad_norm 2.1168 (2.1364) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:42:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][290/625] eta 0:01:26 lr 0.000861 wd 0.0500 time 0.2540 (0.2595) data time 0.0007 (0.0025) model time 0.2532 (0.2567) loss 6.4473 (5.9096) grad_norm 3.2617 (2.1203) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:42:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][300/625] eta 0:01:24 lr 0.000861 wd 0.0500 time 0.2542 (0.2601) data time 0.0006 (0.0025) model time 0.2536 (0.2575) loss 6.1475 (5.9091) grad_norm 2.2847 (2.1192) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 
05:42:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][310/625] eta 0:01:21 lr 0.000861 wd 0.0500 time 0.2578 (0.2600) data time 0.0006 (0.0024) model time 0.2572 (0.2574) loss 6.5654 (5.9084) grad_norm 1.3217 (2.1208) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:42:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][320/625] eta 0:01:19 lr 0.000860 wd 0.0500 time 0.2562 (0.2598) data time 0.0007 (0.0024) model time 0.2554 (0.2573) loss 6.3772 (5.9059) grad_norm 2.0042 (2.1174) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:42:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][330/625] eta 0:01:16 lr 0.000860 wd 0.0500 time 0.2512 (0.2602) data time 0.0009 (0.0023) model time 0.2503 (0.2578) loss 7.0135 (5.9052) grad_norm 3.0338 (2.1389) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:42:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][340/625] eta 0:01:14 lr 0.000860 wd 0.0500 time 0.2549 (0.2601) data time 0.0012 (0.0023) model time 0.2537 (0.2577) loss 6.3966 (5.9046) grad_norm 2.2313 (2.1523) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:42:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][350/625] eta 0:01:11 lr 0.000860 wd 0.0500 time 0.2537 (0.2599) data time 0.0008 (0.0023) model time 0.2528 (0.2576) loss 6.5241 (5.9009) grad_norm 2.6714 (2.1495) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:42:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][360/625] eta 0:01:08 lr 0.000860 wd 0.0500 time 0.2566 (0.2598) data time 0.0008 (0.0022) model time 0.2557 (0.2575) loss 6.0978 (5.8916) grad_norm 2.9925 (2.1499) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:42:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][370/625] eta 0:01:06 lr 0.000860 wd 0.0500 time 0.2572 (0.2602) data time 0.0008 
(0.0022) model time 0.2564 (0.2580) loss 5.8700 (5.8970) grad_norm 1.3123 (2.1430) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:42:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][380/625] eta 0:01:03 lr 0.000859 wd 0.0500 time 0.2523 (0.2601) data time 0.0007 (0.0022) model time 0.2516 (0.2579) loss 7.3861 (5.9105) grad_norm 6.1882 (2.1577) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:42:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][390/625] eta 0:01:01 lr 0.000859 wd 0.0500 time 0.2491 (0.2600) data time 0.0012 (0.0021) model time 0.2479 (0.2578) loss 6.8287 (5.9093) grad_norm 3.1025 (2.1630) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:42:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][400/625] eta 0:00:58 lr 0.000859 wd 0.0500 time 0.2591 (0.2599) data time 0.0006 (0.0021) model time 0.2585 (0.2578) loss 6.5680 (5.9133) grad_norm 2.0022 (2.1746) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:43:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][410/625] eta 0:00:55 lr 0.000859 wd 0.0500 time 0.2578 (0.2603) data time 0.0007 (0.0021) model time 0.2571 (0.2582) loss 6.7998 (5.9109) grad_norm 1.8483 (2.1859) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:43:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][420/625] eta 0:00:53 lr 0.000859 wd 0.0500 time 0.2569 (0.2602) data time 0.0008 (0.0020) model time 0.2561 (0.2581) loss 6.7409 (5.9189) grad_norm 1.5109 (2.1779) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:43:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][430/625] eta 0:00:50 lr 0.000859 wd 0.0500 time 0.2566 (0.2601) data time 0.0008 (0.0020) model time 0.2559 (0.2580) loss 6.4934 (5.9098) grad_norm 1.8856 (2.1645) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:43:08 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][440/625] eta 0:00:48 lr 0.000858 wd 0.0500 time 0.2576 (0.2600) data time 0.0006 (0.0020) model time 0.2569 (0.2580) loss 5.4995 (5.9073) grad_norm 2.7779 (2.1582) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:43:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][450/625] eta 0:00:45 lr 0.000858 wd 0.0500 time 0.2541 (0.2599) data time 0.0008 (0.0020) model time 0.2534 (0.2579) loss 6.6204 (5.9095) grad_norm 2.4413 (2.1598) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:43:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][460/625] eta 0:00:42 lr 0.000858 wd 0.0500 time 0.2586 (0.2598) data time 0.0007 (0.0019) model time 0.2579 (0.2578) loss 5.7620 (5.9116) grad_norm 1.4233 (2.1581) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:43:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][470/625] eta 0:00:40 lr 0.000858 wd 0.0500 time 0.2528 (0.2601) data time 0.0009 (0.0019) model time 0.2519 (0.2582) loss 6.3416 (5.9137) grad_norm 1.4465 (2.1619) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:43:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][480/625] eta 0:00:37 lr 0.000858 wd 0.0500 time 0.2559 (0.2600) data time 0.0006 (0.0019) model time 0.2552 (0.2581) loss 6.1434 (5.9178) grad_norm 4.0178 (2.1796) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:43:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][490/625] eta 0:00:35 lr 0.000858 wd 0.0500 time 0.2535 (0.2599) data time 0.0008 (0.0019) model time 0.2527 (0.2580) loss 5.5888 (5.9144) grad_norm 1.9773 (2.1968) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:43:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][500/625] eta 0:00:32 lr 0.000857 wd 0.0500 time 0.4622 (0.2603) data time 0.0006 (0.0019) 
model time 0.4616 (0.2584) loss 6.4914 (5.9172) grad_norm 2.1803 (2.1924) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:43:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][510/625] eta 0:00:29 lr 0.000857 wd 0.0500 time 0.2565 (0.2602) data time 0.0010 (0.0018) model time 0.2555 (0.2584) loss 6.1489 (5.9230) grad_norm 1.1539 (2.1839) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:43:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][520/625] eta 0:00:27 lr 0.000857 wd 0.0500 time 0.2535 (0.2601) data time 0.0010 (0.0018) model time 0.2525 (0.2583) loss 6.3703 (5.9252) grad_norm 1.1855 (2.1726) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:43:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][530/625] eta 0:00:24 lr 0.000857 wd 0.0500 time 0.2548 (0.2600) data time 0.0007 (0.0018) model time 0.2541 (0.2582) loss 5.9074 (5.9284) grad_norm 1.4866 (2.1592) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:43:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][540/625] eta 0:00:22 lr 0.000857 wd 0.0500 time 0.2509 (0.2603) data time 0.0009 (0.0018) model time 0.2500 (0.2586) loss 4.8221 (5.9232) grad_norm 2.1417 (2.1520) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:43:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][550/625] eta 0:00:19 lr 0.000856 wd 0.0500 time 0.2537 (0.2602) data time 0.0005 (0.0018) model time 0.2532 (0.2585) loss 5.5809 (5.9156) grad_norm 1.8610 (2.1456) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:43:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][560/625] eta 0:00:16 lr 0.000856 wd 0.0500 time 0.2662 (0.2602) data time 0.0009 (0.0018) model time 0.2653 (0.2584) loss 4.7800 (5.9143) grad_norm 1.6099 (2.1373) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:43:42 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [173/300][570/625] eta 0:00:14 lr 0.000856 wd 0.0500 time 0.2547 (0.2601) data time 0.0009 (0.0017) model time 0.2539 (0.2584) loss 5.9811 (5.9102) grad_norm 2.6487 (2.1356) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:43:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][580/625] eta 0:00:11 lr 0.000856 wd 0.0500 time 0.2566 (0.2604) data time 0.0009 (0.0017) model time 0.2558 (0.2588) loss 6.2895 (5.9131) grad_norm 1.6932 (2.1307) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:43:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][590/625] eta 0:00:09 lr 0.000856 wd 0.0500 time 0.2588 (0.2604) data time 0.0006 (0.0017) model time 0.2583 (0.2587) loss 6.5728 (5.9170) grad_norm 1.4989 (2.1368) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:43:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][600/625] eta 0:00:06 lr 0.000856 wd 0.0500 time 0.2510 (0.2607) data time 0.0009 (0.0017) model time 0.2501 (0.2590) loss 7.0250 (5.9217) grad_norm 1.2594 (2.1333) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:43:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][610/625] eta 0:00:03 lr 0.000855 wd 0.0500 time 0.2528 (0.2606) data time 0.0006 (0.0017) model time 0.2522 (0.2590) loss 4.5290 (5.9147) grad_norm 2.1134 (2.1295) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:43:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][620/625] eta 0:00:01 lr 0.000855 wd 0.0500 time 0.2530 (0.2605) data time 0.0005 (0.0017) model time 0.2525 (0.2589) loss 6.4520 (5.9170) grad_norm 2.2664 (2.1226) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:43:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 173 training takes 0:02:42 [2024-08-04 05:43:56 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO 
./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 05:43:57 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-08-04 05:43:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.448 (0.448) Loss 0.6147 (0.6147) Acc@1 88.623 (88.623) Acc@5 98.242 (98.242) Mem 9655MB [2024-08-04 05:43:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.092) Loss 1.0127 (0.7719) Acc@1 78.320 (84.952) Acc@5 95.166 (97.119) Mem 9655MB [2024-08-04 05:43:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.074) Loss 1.1064 (0.9054) Acc@1 75.391 (81.573) Acc@5 94.141 (95.729) Mem 9655MB [2024-08-04 05:43:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.214 Acc@5 95.763 [2024-08-04 05:43:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.2% [2024-08-04 05:43:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.21% [2024-08-04 05:43:58 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 05:43:59 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 05:43:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.492 (0.492) Loss 0.5864 (0.5864) Acc@1 89.697 (89.697) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 05:44:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 0.9312 (0.7255) Acc@1 79.932 (85.684) Acc@5 95.605 (97.523) Mem 9655MB [2024-08-04 05:44:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0605 (0.8537) Acc@1 75.635 (82.247) Acc@5 94.678 (96.194) Mem 9655MB [2024-08-04 05:44:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.934 Acc@5 96.177 [2024-08-04 05:44:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.9% [2024-08-04 05:44:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.93% [2024-08-04 05:44:01 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 05:44:01 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 05:44:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][0/625] eta 0:07:38 lr 0.000855 wd 0.0500 time 0.7329 (0.7329) data time 0.4882 (0.4882) model time 0.0000 (0.0000) loss 4.7388 (4.7388) grad_norm 2.1753 (2.1753) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:44:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][10/625] eta 0:03:04 lr 0.000855 wd 0.0500 time 0.2528 (0.2994) data time 0.0008 (0.0452) model time 0.0000 (0.0000) loss 6.8358 (5.8315) grad_norm 2.8642 (2.3899) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:44:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][20/625] eta 0:02:48 lr 0.000855 wd 0.0500 time 0.2555 (0.2786) data time 0.0009 (0.0241) model time 0.0000 (0.0000) loss 5.6674 (5.7748) grad_norm 1.8019 (2.3786) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:44:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][30/625] eta 0:02:47 lr 0.000855 wd 0.0500 time 0.2528 (0.2820) data time 0.0007 (0.0166) model time 0.0000 (0.0000) loss 5.5039 (5.7625) grad_norm 1.3301 (2.1924) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:44:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][40/625] eta 0:02:43 lr 0.000854 wd 0.0500 time 0.2532 (0.2802) data time 0.0009 (0.0128) model time 0.0000 (0.0000) loss 6.4237 (5.8077) grad_norm 1.7170 (2.0629) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:44:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][50/625] eta 0:02:38 lr 0.000854 wd 0.0500 time 0.2566 (0.2754) data time 0.0010 (0.0105) model time 0.0000 (0.0000) loss 5.6788 (5.8208) grad_norm 2.3558 (2.0171) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:44:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][60/625] eta 0:02:33 lr 0.000854 wd 0.0500 time 0.2597 (0.2724) data time 
0.0006 (0.0089) model time 0.2591 (0.2563) loss 6.1531 (5.8603) grad_norm 1.4135 (2.0338) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:44:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][70/625] eta 0:02:29 lr 0.000854 wd 0.0500 time 0.2550 (0.2702) data time 0.0006 (0.0078) model time 0.2544 (0.2559) loss 6.3647 (5.8784) grad_norm 2.8187 (2.0709) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:44:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][80/625] eta 0:02:26 lr 0.000854 wd 0.0500 time 0.2583 (0.2684) data time 0.0008 (0.0069) model time 0.2575 (0.2556) loss 6.8099 (5.8873) grad_norm 1.7284 (2.0860) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:44:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][90/625] eta 0:02:22 lr 0.000854 wd 0.0500 time 0.2481 (0.2670) data time 0.0008 (0.0063) model time 0.2472 (0.2555) loss 6.0172 (5.8630) grad_norm 2.0697 (2.0394) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:44:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][100/625] eta 0:02:20 lr 0.000853 wd 0.0500 time 0.2553 (0.2681) data time 0.0008 (0.0057) model time 0.2545 (0.2597) loss 4.1661 (5.8190) grad_norm 2.8503 (2.0319) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:44:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][110/625] eta 0:02:17 lr 0.000853 wd 0.0500 time 0.2561 (0.2670) data time 0.0010 (0.0053) model time 0.2551 (0.2589) loss 4.6123 (5.8360) grad_norm 3.4018 (2.0427) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:44:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][120/625] eta 0:02:14 lr 0.000853 wd 0.0500 time 0.2589 (0.2662) data time 0.0007 (0.0049) model time 0.2582 (0.2586) loss 6.4338 (5.8389) grad_norm 2.1879 (2.0487) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 05:44:36 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][130/625] eta 0:02:11 lr 0.000853 wd 0.0500 time 0.2656 (0.2656) data time 0.0009 (0.0046) model time 0.2647 (0.2585) loss 6.2779 (5.8479) grad_norm 1.6643 (2.0261) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:44:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][140/625] eta 0:02:08 lr 0.000853 wd 0.0500 time 0.2565 (0.2649) data time 0.0009 (0.0043) model time 0.2556 (0.2581) loss 5.5550 (5.8504) grad_norm 1.9017 (2.0095) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:44:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][150/625] eta 0:02:05 lr 0.000852 wd 0.0500 time 0.2561 (0.2643) data time 0.0009 (0.0041) model time 0.2552 (0.2578) loss 5.3438 (5.8415) grad_norm 1.4880 (2.0135) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:44:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][160/625] eta 0:02:02 lr 0.000852 wd 0.0500 time 0.2569 (0.2638) data time 0.0005 (0.0039) model time 0.2563 (0.2575) loss 6.2813 (5.8506) grad_norm 1.6243 (2.0138) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:44:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][170/625] eta 0:01:59 lr 0.000852 wd 0.0500 time 0.2554 (0.2633) data time 0.0008 (0.0037) model time 0.2547 (0.2573) loss 6.3394 (5.8420) grad_norm 1.9135 (2.0109) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:44:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][180/625] eta 0:01:57 lr 0.000852 wd 0.0500 time 0.2575 (0.2630) data time 0.0008 (0.0036) model time 0.2567 (0.2573) loss 5.2220 (5.8334) grad_norm 1.7965 (2.0563) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:44:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][190/625] eta 0:01:54 lr 0.000852 wd 0.0500 time 0.2579 (0.2627) data time 0.0006 (0.0035) model time 0.2573 (0.2572) loss 4.7354 (5.8408) grad_norm 2.6261 (2.0734) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:44:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][200/625] eta 0:01:51 lr 0.000852 wd 0.0500 time 0.2588 (0.2624) data time 0.0006 (0.0033) model time 0.2582 (0.2571) loss 6.6257 (5.8468) grad_norm 4.9726 (2.1308) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:44:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][210/625] eta 0:01:48 lr 0.000851 wd 0.0500 time 0.2607 (0.2621) data time 0.0008 (0.0032) model time 0.2599 (0.2570) loss 6.3339 (5.8566) grad_norm 4.2767 (2.1873) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:44:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][220/625] eta 0:01:46 lr 0.000851 wd 0.0500 time 0.2640 (0.2619) data time 0.0008 (0.0031) model time 0.2632 (0.2570) loss 6.8541 (5.8690) grad_norm 1.9322 (2.1921) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:45:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][230/625] eta 0:01:43 lr 0.000851 wd 0.0500 time 0.2544 (0.2618) data time 0.0009 (0.0030) model time 0.2535 (0.2570) loss 6.8755 (5.8728) grad_norm 4.2645 (2.2007) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:45:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][240/625] eta 0:01:40 lr 0.000851 wd 0.0500 time 0.2605 (0.2616) data time 0.0008 (0.0029) model time 0.2597 (0.2569) loss 5.3864 (5.8711) grad_norm 2.2341 (2.2037) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:45:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][250/625] eta 0:01:38 lr 0.000851 wd 0.0500 time 0.2521 (0.2614) data time 0.0008 (0.0029) model time 0.2513 (0.2569) loss 5.9612 (5.8840) grad_norm 3.6503 (2.2157) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:45:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][260/625] eta 0:01:35 lr 0.000851 wd 0.0500 time 0.2579 (0.2612) data time 0.0010 (0.0028) model time 0.2569 (0.2568) loss 6.0235 (5.8827) grad_norm 2.1006 (2.2297) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:45:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][270/625] eta 0:01:32 lr 0.000850 wd 0.0500 time 0.2586 (0.2610) data time 0.0007 (0.0027) model time 0.2578 (0.2568) loss 6.9687 (5.8764) grad_norm 2.1073 (2.2288) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:45:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][280/625] eta 0:01:30 lr 0.000850 wd 0.0500 time 0.2584 (0.2609) data time 0.0010 (0.0026) model time 0.2574 (0.2568) loss 6.7488 (5.8822) grad_norm 1.9316 (2.2454) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:45:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][290/625] eta 0:01:27 lr 0.000850 wd 0.0500 time 0.2566 (0.2607) data time 0.0009 (0.0026) model time 0.2557 (0.2567) loss 6.3922 (5.8892) grad_norm 1.4392 (2.2394) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:45:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][300/625] eta 0:01:24 lr 0.000850 wd 0.0500 time 0.2528 (0.2606) data time 0.0010 (0.0025) model time 0.2518 (0.2566) loss 6.7398 (5.9054) grad_norm 3.4164 (2.2254) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:45:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][310/625] eta 0:01:22 lr 0.000850 wd 0.0500 time 0.2548 (0.2604) data time 0.0006 (0.0025) model time 0.2542 (0.2566) loss 7.3506 (5.9150) grad_norm 2.2607 (2.2214) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:45:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][320/625] eta 0:01:19 lr 0.000850 wd 0.0500 time 0.2568 (0.2603) data time 0.0007 (0.0024) model time 0.2561 (0.2566) loss 6.5226 (5.9243) grad_norm 1.8393 (2.2322) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:45:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][330/625] eta 0:01:16 lr 0.000849 wd 0.0500 time 0.2558 (0.2602) data time 0.0009 (0.0024) model time 0.2550 (0.2565) loss 6.5871 (5.9248) grad_norm 1.9981 (2.2141) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:45:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][340/625] eta 0:01:14 lr 0.000849 wd 0.0500 time 0.2570 (0.2601) data time 0.0006 (0.0023) model time 0.2564 (0.2565) loss 5.6393 (5.9253) grad_norm 2.5540 (2.2096) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:45:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][350/625] eta 0:01:11 lr 0.000849 wd 0.0500 time 0.2548 (0.2600) data time 0.0006 (0.0023) model time 0.2542 (0.2565) loss 6.5047 (5.9240) grad_norm 1.6319 (2.2029) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:45:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][360/625] eta 0:01:09 lr 0.000849 wd 0.0500 time 0.2583 (0.2604) data time 0.0007 (0.0023) model time 0.2576 (0.2570) loss 4.4529 (5.9229) grad_norm 2.0226 (2.1935) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:45:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][370/625] eta 0:01:06 lr 0.000849 wd 0.0500 time 0.2598 (0.2608) data time 0.0006 (0.0022) model time 0.2593 (0.2576) loss 6.4231 (5.9237) grad_norm 1.6138 (2.2007) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:45:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][380/625] eta 0:01:03 lr 0.000848 wd 0.0500 time 0.2532 (0.2606) data time 0.0009 (0.0022) model time 0.2523 (0.2575) loss 6.4597 (5.9292) grad_norm 1.4141 (2.1908) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:45:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][390/625] eta 0:01:01 lr 0.000848 wd 0.0500 time 0.2592 (0.2605) data time 0.0006 (0.0022) model time 0.2585 (0.2574) loss 5.8273 (5.9201) grad_norm 1.7057 (2.1773) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:45:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][400/625] eta 0:00:58 lr 0.000848 wd 0.0500 time 0.2575 (0.2604) data time 0.0007 (0.0021) model time 0.2568 (0.2573) loss 4.5556 (5.9255) grad_norm 1.5638 (2.1717) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:45:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][410/625] eta 0:00:55 lr 0.000848 wd 0.0500 time 0.2600 (0.2604) data time 0.0006 (0.0021) model time 0.2594 (0.2573) loss 4.8253 (5.9332) grad_norm 1.1410 (2.1677) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:45:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][420/625] eta 0:00:53 lr 0.000848 wd 0.0500 time 0.2608 (0.2603) data time 0.0008 (0.0021) model time 0.2600 (0.2573) loss 6.0765 (5.9353) grad_norm 2.2786 (2.1633) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:45:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][430/625] eta 0:00:50 lr 0.000848 wd 0.0500 time 0.2604 (0.2602) data time 0.0008 (0.0020) model time 0.2595 (0.2572) loss 6.6040 (5.9344) grad_norm 2.1373 (2.1664) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:45:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][440/625] eta 0:00:48 lr 0.000847 wd 0.0500 time 0.2592 (0.2601) data time 0.0010 (0.0020) model time 0.2581 (0.2572) loss 5.1488 (5.9276) grad_norm 2.0413 (2.1666) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:45:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][450/625] eta 0:00:45 lr 0.000847 wd 0.0500 time 0.2544 (0.2600) data time 0.0007 (0.0020) model time 0.2537 (0.2571) loss 5.7280 (5.9296) grad_norm 1.5858 (2.1666) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:46:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][460/625] eta 0:00:42 lr 0.000847 wd 0.0500 time 0.2552 (0.2599) data time 0.0007 (0.0020) model time 0.2545 (0.2571) loss 4.5867 (5.9364) grad_norm 2.6251 (2.1686) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:46:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][470/625] eta 0:00:40 lr 0.000847 wd 0.0500 time 0.2609 (0.2602) data time 0.0008 (0.0019) model time 0.2601 (0.2575) loss 6.1404 (5.9317) grad_norm 1.3698 (2.1721) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:46:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][480/625] eta 0:00:37 lr 0.000847 wd 0.0500 time 0.2581 (0.2601) data time 0.0008 (0.0019) model time 0.2574 (0.2574) loss 5.9043 (5.9353) grad_norm 1.8139 (2.1739) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:46:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][490/625] eta 0:00:35 lr 0.000847 wd 0.0500 time 0.2534 (0.2600) data time 0.0009 (0.0019) model time 0.2526 (0.2574) loss 6.1586 (5.9338) grad_norm 2.1639 (2.1709) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:46:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][500/625] eta 0:00:32 lr 0.000846 wd 0.0500 time 0.2564 (0.2600) data time 0.0008 (0.0019) model time 0.2556 (0.2573) loss 6.9674 (5.9379) grad_norm 1.2646 (2.1640) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:46:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][510/625] eta 0:00:29 lr 0.000846 wd 0.0500 time 0.2538 (0.2599) data time 0.0006 (0.0019) model time 0.2532 (0.2573) loss 4.7322 (5.9337) grad_norm 2.3050 (2.1632) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:46:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][520/625] eta 0:00:27 lr 0.000846 wd 0.0500 time 0.2604 (0.2598) data time 0.0006 (0.0018) model time 0.2598 (0.2572) loss 5.6857 (5.9279) grad_norm 2.1453 (2.1624) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:46:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][530/625] eta 0:00:24 lr 0.000846 wd 0.0500 time 0.2571 (0.2598) data time 0.0006 (0.0018) model time 0.2565 (0.2572) loss 5.4588 (5.9241) grad_norm 1.7877 (2.1541) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:46:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][540/625] eta 0:00:22 lr 0.000846 wd 0.0500 time 0.2578 (0.2597) data time 0.0009 (0.0018) model time 0.2569 (0.2572) loss 5.8820 (5.9241) grad_norm 1.8199 (2.1584) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:46:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][550/625] eta 0:00:19 lr 0.000845 wd 0.0500 time 0.2571 (0.2599) data time 0.0008 (0.0018) model time 0.2563 (0.2574) loss 6.8265 (5.9268) grad_norm 1.5296 (2.1577) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:46:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][560/625] eta 0:00:16 lr 0.000845 wd 0.0500 time 0.2579 (0.2598) data time 0.0008 (0.0018) model time 0.2571 (0.2574) loss 4.8736 (5.9253) grad_norm 1.7472 (2.1606) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:46:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][570/625] eta 0:00:14 lr 0.000845 wd 0.0500 time 0.2592 (0.2598) data time 0.0007 (0.0018) model time 0.2584 (0.2573) loss 5.4793 (5.9241) grad_norm 2.0851 (2.1536) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:46:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][580/625] eta 0:00:11 lr 0.000845 wd 0.0500 time 0.2570 (0.2597) data time 0.0009 (0.0017) model time 0.2561 (0.2573) loss 6.9393 (5.9329) grad_norm 1.9723 (2.1509) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:46:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][590/625] eta 0:00:09 lr 0.000845 wd 0.0500 time 0.2526 (0.2596) data time 0.0006 (0.0017) model time 0.2520 (0.2572) loss 4.5179 (5.9288) grad_norm 1.2551 (2.1472) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:46:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][600/625] eta 0:00:06 lr 0.000845 wd 0.0500 time 0.2537 (0.2596) data time 0.0010 (0.0017) model time 0.2527 (0.2572) loss 5.1426 (5.9293) grad_norm 2.0774 (2.1407) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:46:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][610/625] eta 0:00:03 lr 0.000844 wd 0.0500 time 0.2531 (0.2595) data time 0.0004 (0.0017) model time 0.2527 (0.2571) loss 5.0348 (5.9326) grad_norm 1.9437 (2.1395) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:46:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][620/625] eta 0:00:01 lr 0.000844 wd 0.0500 time 0.2533 (0.2594) data time 0.0003 (0.0017) model time 0.2530 (0.2571) loss 4.6059 (5.9332) grad_norm 1.8349 (2.1377) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:46:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 174 training takes 0:02:42
[2024-08-04 05:46:43 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 05:46:44 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 05:46:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.490 (0.490) Loss 0.6113 (0.6113) Acc@1 88.867 (88.867) Acc@5 98.535 (98.535) Mem 9655MB
[2024-08-04 05:46:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 0.9741 (0.7482) Acc@1 77.979 (84.579) Acc@5 95.605 (97.270) Mem 9655MB
[2024-08-04 05:46:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.1289 (0.8822) Acc@1 75.049 (81.273) Acc@5 93.506 (95.824) Mem 9655MB
[2024-08-04 05:46:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.004 Acc@5 95.801
[2024-08-04 05:46:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.0%
[2024-08-04 05:46:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.770 (0.770) Loss 0.5859 (0.5859) Acc@1 89.746 (89.746) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 05:46:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.128) Loss 0.9316 (0.7253) Acc@1 80.029 (85.733) Acc@5 95.654 (97.523) Mem 9655MB
[2024-08-04 05:46:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 1.0605 (0.8535) Acc@1 75.537 (82.264) Acc@5 94.629 (96.194) Mem 9655MB
[2024-08-04 05:46:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.960 Acc@5 96.179
[2024-08-04 05:46:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.0%
[2024-08-04 05:46:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.96%
[2024-08-04 05:46:48 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 05:46:48 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 05:46:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][0/625] eta 0:08:05 lr 0.000844 wd 0.0500 time 0.7773 (0.7773) data time 0.5358 (0.5358) model time 0.0000 (0.0000) loss 6.7225 (6.7225) grad_norm 2.1143 (2.1143) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:46:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][10/625] eta 0:03:17 lr 0.000844 wd 0.0500 time 0.2530 (0.3207) data time 0.0008 (0.0495) model time 0.0000 (0.0000) loss 6.2364 (6.1069) grad_norm 4.2617 (2.9108) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:46:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][20/625] eta 0:02:55 lr 0.000844 wd 0.0500 time 0.2532 (0.2902) data time 0.0007 (0.0264) model time 0.0000 (0.0000) loss 4.9262 (6.0167) grad_norm 1.5089 (2.8087) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:46:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][30/625] eta 0:02:45 lr 0.000844 wd 0.0500 time 0.2553 (0.2789) data time 0.0008 (0.0182) model time 0.0000 (0.0000) loss 5.7570 (6.0144) grad_norm 3.8143 (2.6534) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:46:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][40/625] eta 0:02:39 lr 0.000843 wd 0.0500 time 0.2513 (0.2731) data time 0.0012 (0.0140) model time 0.0000 (0.0000) loss 6.5579 (5.9741) grad_norm 1.5633 (2.4984) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:47:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][50/625] eta 0:02:38 lr 0.000843 wd 0.0500 time 0.4037 (0.2763) data time 0.0010 (0.0114) model time 0.0000 (0.0000) loss 4.6682 (5.8786) grad_norm 1.6451 (2.3560) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:47:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][60/625] eta 0:02:35 lr 0.000843 wd 0.0500 time 0.4127 (0.2755) data time 0.0006 (0.0097) model time 0.4121 (0.2704) loss 5.5737 (5.9010) grad_norm 1.9097 (2.2872) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:47:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][70/625] eta 0:02:31 lr 0.000843 wd 0.0500 time 0.2533 (0.2727) data time 0.0010 (0.0084) model time 0.2523 (0.2629) loss 5.7326 (5.9593) grad_norm 8.4754 (2.3021) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:47:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][80/625] eta 0:02:27 lr 0.000843 wd 0.0500 time 0.2630 (0.2710) data time 0.0009 (0.0075) model time 0.2621 (0.2612) loss 5.4716 (5.9911) grad_norm 2.4654 (2.3450) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:47:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][90/625] eta 0:02:24 lr 0.000843 wd 0.0500 time 0.2510 (0.2692) data time 0.0008 (0.0068) model time 0.2503 (0.2594) loss 6.4058 (6.0101) grad_norm 1.3191 (2.2989) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:47:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][100/625] eta 0:02:21 lr 0.000842 wd 0.0500 time 0.2603 (0.2698) data time 0.0008 (0.0063) model time 0.2595 (0.2622) loss 5.4394 (5.9860) grad_norm 2.2527 (2.2477) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:47:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][110/625] eta 0:02:18 lr 0.000842 wd 0.0500 time 0.2523 (0.2695) data time 0.0008 (0.0058) model time 0.2516 (0.2628) loss 6.8584 (6.0069) grad_norm 2.2295 (2.3063) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:47:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][120/625] eta 0:02:16 lr 0.000842 wd 0.0500 time 0.2517 (0.2694) data time 0.0011 (0.0054) model time 0.2506 (0.2634) loss 6.2866 (6.0292) grad_norm 2.8229 (2.3229) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:47:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][130/625] eta 0:02:12 lr 0.000842 wd 0.0500 time 0.2519 (0.2684) data time 0.0009 (0.0051) model time 0.2510 (0.2623) loss 6.1988 (6.0255) grad_norm 2.5111 (2.3810) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:47:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][140/625] eta 0:02:09 lr 0.000842 wd 0.0500 time 0.2545 (0.2675) data time 0.0009 (0.0048) model time 0.2535 (0.2616) loss 6.3483 (6.0208) grad_norm 1.5190 (2.3699) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:47:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][150/625] eta 0:02:06 lr 0.000842 wd 0.0500 time 0.2541 (0.2667) data time 0.0007 (0.0045) model time 0.2534 (0.2609) loss 6.0307 (6.0143) grad_norm 1.5542 (2.3271) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:47:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][160/625] eta 0:02:03 lr 0.000841 wd 0.0500 time 0.2535 (0.2661) data time 0.0009 (0.0043) model time 0.2527 (0.2603) loss 6.7671 (6.0113) grad_norm 3.2764 (2.3072) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:47:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][170/625] eta 0:02:01 lr 0.000841 wd 0.0500 time 0.2596 (0.2667) data time 0.0011 (0.0041) model time 0.2586 (0.2616) loss 5.5304 (5.9957) grad_norm 1.3807 (2.3100) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:47:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][180/625] eta 0:01:58 lr 0.000841 wd 0.0500 time 0.2709 (0.2662) data time 0.0010 (0.0039) model time 0.2699 (0.2613) loss 6.6609 (5.9937) grad_norm 2.3754 (2.2958) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:47:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][190/625] eta 0:01:55 lr 0.000841 wd 0.0500 time 0.2559 (0.2657) data time 0.0008 (0.0038) model time 0.2551 (0.2609) loss 6.6921 (6.0022) grad_norm 2.7117 (2.2826) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:47:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][200/625] eta 0:01:52 lr 0.000841 wd 0.0500 time 0.2554 (0.2653) data time 0.0008 (0.0036) model time 0.2547 (0.2606) loss 6.2723 (6.0053) grad_norm 1.7823 (2.2657) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:47:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][210/625] eta 0:01:49 lr 0.000840 wd 0.0500 time 0.2517 (0.2648) data time 0.0011 (0.0035) model time 0.2506 (0.2602) loss 6.5973 (5.9988) grad_norm 1.5671 (2.2664) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:47:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][220/625] eta 0:01:47 lr 0.000840 wd 0.0500 time 0.4686 (0.2654) data time 0.0010 (0.0034) model time 0.4676 (0.2611) loss 5.7162 (6.0020) grad_norm 2.0735 (2.2497) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:47:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][230/625] eta 0:01:44 lr 0.000840 wd 0.0500 time 0.2574 (0.2650) data time 0.0009 (0.0033) model time 0.2565 (0.2608) loss 5.9582 (5.9937) grad_norm 1.6405 (2.2774) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:47:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][240/625] eta 0:01:42 lr 0.000840 wd 0.0500 time 0.2551 (0.2655) data time 0.0010 (0.0032) model time 0.2541 (0.2616) loss 6.6842 (6.0016) grad_norm 2.4662 (2.2765) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:47:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][250/625] eta 0:01:39 lr 0.000840 wd 0.0500 time 0.2499 (0.2651) data time 0.0007 (0.0031) model time 0.2492 (0.2613) loss 5.8920 (5.9843) grad_norm 1.2756 (2.2695) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:47:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][260/625] eta 0:01:36 lr 0.000840 wd 0.0500 time 0.2556 (0.2655) data time 0.0007 (0.0030) model time 0.2548 (0.2619) loss 5.5307 (5.9792) grad_norm 2.9266 (2.2612) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:48:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][270/625] eta 0:01:34 lr 0.000839 wd 0.0500 time 0.2599 (0.2651) data time 0.0007 (0.0029) model time 0.2591 (0.2616) loss 6.0504 (5.9891) grad_norm 1.4750 (2.2529) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:48:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][280/625] eta 0:01:31 lr 0.000839 wd 0.0500 time 0.2517 (0.2648) data time 0.0009 (0.0029) model time 0.2508 (0.2613) loss 6.2875 (5.9928) grad_norm 2.1456 (2.2370) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:48:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][290/625] eta 0:01:28 lr 0.000839 wd 0.0500 time 0.2514 (0.2646) data time 0.0008 (0.0028) model time 0.2506 (0.2611) loss 7.1075 (6.0073) grad_norm 1.6930 (2.2198) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:48:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][300/625] eta 0:01:25 lr 0.000839 wd 0.0500 time 0.2548 (0.2643) data time 0.0008 (0.0027) model time 0.2539 (0.2609) loss 6.7359 (5.9999) grad_norm 2.2089 (2.2173) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:48:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][310/625] eta 0:01:23 lr 0.000839 wd 0.0500 time 0.2556 (0.2640) data time 0.0008 (0.0027) model time 0.2548 (0.2607) loss 6.6800 (6.0031) grad_norm 1.8039 (2.2160) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:48:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][320/625] eta 0:01:20 lr 0.000839 wd 0.0500 time 0.2516 (0.2638) data time 0.0009 (0.0026) model time 0.2507 (0.2605) loss 5.3578 (6.0018) grad_norm 1.6749 (2.2035) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:48:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][330/625] eta 0:01:17 lr 0.000838 wd 0.0500 time 0.2533 (0.2636) data time 0.0017 (0.0026) model time 0.2517 (0.2603) loss 7.1006 (6.0053) grad_norm 1.1956 (2.1938) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:48:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][340/625] eta 0:01:15 lr 0.000838 wd 0.0500 time 0.2562 (0.2633) data time 0.0007 (0.0025) model time 0.2555 (0.2601) loss 6.1394 (6.0011) grad_norm 1.9094 (2.1858) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:48:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][350/625] eta 0:01:12 lr 0.000838 wd 0.0500 time 0.2446 (0.2637) data time 0.0005 (0.0025) model time 0.2441 (0.2607) loss 6.4143 (6.0037) grad_norm 2.1231 (2.1846) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:48:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][360/625] eta 0:01:09 lr 0.000838 wd 0.0500 time 0.2560 (0.2635) data time 0.0010 (0.0024) model time 0.2550 (0.2605) loss 6.8662 (6.0032) grad_norm 2.2896 (2.1846) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:48:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][370/625] eta 0:01:07 lr 0.000838 wd 0.0500 time 0.2505 (0.2633) data time 0.0007 (0.0024) model time 0.2498 (0.2603) loss 4.4415 (6.0040) grad_norm 1.4145 (2.1783) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:48:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][380/625] eta 0:01:04 lr 0.000838 wd 0.0500 time 0.2592 (0.2631) data time 0.0010 (0.0023) model time 0.2582 (0.2602) loss 5.6319 (5.9962) grad_norm 3.5912 (2.1875) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:48:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][390/625] eta 0:01:01 lr 0.000837 wd 0.0500 time 0.2538 (0.2629) data time 0.0009 (0.0023) model time 0.2529 (0.2600) loss 6.3378 (5.9921) grad_norm 3.1068 (2.1872) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:48:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][400/625] eta 0:00:59 lr 0.000837 wd 0.0500 time 0.2570 (0.2628) data time 0.0009 (0.0023) model time 0.2561 (0.2599) loss 4.7409 (5.9845) grad_norm 1.9963 (2.1837) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:48:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][410/625] eta 0:00:56 lr 0.000837 wd 0.0500 time 0.2512 (0.2626) data time 0.0009 (0.0023) model time 0.2504 (0.2598) loss 4.6890 (5.9899) grad_norm 2.1037 (2.1874) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:48:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][420/625] eta 0:00:53 lr 0.000837 wd 0.0500 time 0.2541 (0.2625) data time 0.0010 (0.0022) model time 0.2530 (0.2597) loss 6.5054 (5.9824) grad_norm 1.4033 (2.1936) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:48:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][430/625] eta 0:00:51 lr 0.000837 wd 0.0500 time 0.2588 (0.2624) data time 0.0007 (0.0022) model time 0.2581 (0.2596) loss 5.2531 (5.9729) grad_norm 8.0472 (2.2139) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:48:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][440/625] eta 0:00:48 lr 0.000836 wd 0.0500 time 0.2531 (0.2623) data time 0.0010 (0.0022) model time 0.2521 (0.2595) loss 6.7729 (5.9796) grad_norm 4.1588 (2.2327) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:48:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][450/625] eta 0:00:45 lr 0.000836 wd 0.0500 time 0.2530 (0.2621) data time 0.0007 (0.0021) model time 0.2523 (0.2594) loss 6.2597 (5.9760) grad_norm 1.7862 (2.2283) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:48:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][460/625] eta 0:00:43 lr 0.000836 wd 0.0500 time 0.2703 (0.2620) data time 0.0008 (0.0021) model time 0.2695 (0.2593) loss 6.6739 (5.9828) grad_norm 1.8903 (2.2282) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:48:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][470/625] eta 0:00:40 lr 0.000836 wd 0.0500 time 0.2587 (0.2619) data time 0.0007 (0.0021) model time 0.2580 (0.2592) loss 6.4915 (5.9812) grad_norm 1.9341 (2.2228) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:48:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][480/625] eta 0:00:37 lr 0.000836 wd 0.0500 time 0.2546 (0.2618) data time 0.0011 (0.0021) model time 0.2536 (0.2591) loss 6.3003 (5.9811) grad_norm 1.5322 (2.2183) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:48:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][490/625] eta 0:00:35 lr 0.000836 wd 0.0500 time 0.2521 (0.2617) data time 0.0009 (0.0020) model time 0.2512 (0.2590) loss 6.4248 (5.9782) grad_norm 3.1986 (2.2178) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:48:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][500/625] eta 0:00:32 lr 0.000835 wd 0.0500 time 0.2538 (0.2615) data time 0.0009 (0.0020) model time 0.2529 (0.2589) loss 6.2438 (5.9851) grad_norm 2.4985 (2.2159) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 05:49:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][510/625] eta 0:00:30 lr 0.000835 wd 0.0500 time 0.2540 (0.2617) data time 0.0006 (0.0020) model time 0.2534 (0.2591) loss 5.3077 (5.9918) grad_norm 1.6350 (2.2072) loss_scale 1024.0000 (514.0039) mem 9655MB
[2024-08-04 05:49:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][520/625] eta 0:00:27 lr 0.000835 wd 0.0500 time 0.2543 (0.2616) data time 0.0009 (0.0020) model time 0.2534 (0.2591) loss 6.2218 (5.9916) grad_norm 2.4330 (2.2053) loss_scale 1024.0000 (523.7927) mem 9655MB
[2024-08-04 05:49:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][530/625] eta 0:00:24 lr 0.000835 wd 0.0500 time 0.2560 (0.2615) data time 0.0007 (0.0020) model time 0.2553 (0.2590) loss 6.8409 (5.9956) grad_norm 1.5446 (2.2068) loss_scale 1024.0000 (533.2128) mem 9655MB
[2024-08-04 05:49:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][540/625] eta 0:00:22 lr 0.000835 wd 0.0500 time 0.2551 (0.2614) data time 0.0006 (0.0019) model time 0.2544 (0.2589) loss 6.6917 (5.9945) grad_norm 2.0744 (2.2212) loss_scale 1024.0000 (542.2847) mem 9655MB
[2024-08-04 05:49:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][550/625] eta 0:00:19 lr 0.000835 wd 0.0500 time 0.2573 (0.2617) data time 0.0006 (0.0019) model time 0.2567 (0.2592) loss 6.1794 (5.9931) grad_norm 2.1423 (2.2214) loss_scale 1024.0000 (551.0272) mem 9655MB
[2024-08-04 05:49:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][560/625] eta 0:00:17 lr 0.000834 wd 0.0500 time 0.2540 (0.2616) data time 0.0009 (0.0019) model time 0.2531 (0.2591) loss 6.6437 (5.9936) grad_norm 2.6642 (2.2202) loss_scale 1024.0000 (559.4581) mem 9655MB
[2024-08-04 05:49:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][570/625] eta 0:00:14 lr 0.000834 wd 0.0500 time 0.2539 (0.2615) data time 0.0009 (0.0019) model time 0.2530 (0.2591) loss 5.6999 (5.9967) grad_norm 2.1627 (2.2173) loss_scale 1024.0000 (567.5937) mem 9655MB
[2024-08-04 05:49:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][580/625] eta 0:00:11 lr 0.000834 wd 0.0500 time 0.2558 (0.2614) data time 0.0010 (0.0019) model time 0.2548 (0.2590) loss 6.4955 (5.9941) grad_norm 1.8076 (2.2163) loss_scale 1024.0000 (575.4492) mem 9655MB
[2024-08-04 05:49:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][590/625] eta 0:00:09 lr 0.000834 wd 0.0500 time 0.2587 (0.2613) data time 0.0006 (0.0018) model time 0.2581 (0.2589) loss 5.7097 (5.9910) grad_norm 1.3549 (2.2200) loss_scale 1024.0000 (583.0389) mem 9655MB
[2024-08-04 05:49:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][600/625] eta 0:00:06 lr 0.000834 wd 0.0500 time 0.2509 (0.2612) data time 0.0010 (0.0018) model time 0.2499 (0.2588) loss 6.0620 (5.9891) grad_norm 2.8413 (2.2264) loss_scale 1024.0000 (590.3760) mem 9655MB
[2024-08-04 05:49:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][610/625] eta 0:00:03 lr 0.000833 wd 0.0500 time 0.2527 (0.2611) data time 0.0004 (0.0018) model time 0.2523 (0.2588) loss 6.5388 (5.9901) grad_norm 2.0519 (2.2273) loss_scale 1024.0000 (597.4730) mem 9655MB
[2024-08-04 05:49:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][620/625] eta 0:00:01 lr 0.000833 wd 0.0500 time 0.2557 (0.2610) data time 0.0003 (0.0018) model time 0.2554 (0.2587) loss 5.2678 (5.9848) grad_norm 2.1981 (2.2430) loss_scale 1024.0000 (604.3414) mem 9655MB
[2024-08-04 05:49:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 175 training takes 0:02:43
[2024-08-04 05:49:31 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 05:49:32 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 05:49:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.554 (0.554) Loss 0.5991 (0.5991) Acc@1 89.111 (89.111) Acc@5 98.242 (98.242) Mem 9655MB [2024-08-04 05:49:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.103) Loss 0.9883 (0.7632) Acc@1 77.783 (84.806) Acc@5 95.361 (97.314) Mem 9655MB [2024-08-04 05:49:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 1.0938 (0.9004) Acc@1 75.342 (81.359) Acc@5 93.506 (95.759) Mem 9655MB [2024-08-04 05:49:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.080 Acc@5 95.743 [2024-08-04 05:49:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.1% [2024-08-04 05:49:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.881 (0.881) Loss 0.5859 (0.5859) Acc@1 89.893 (89.893) Acc@5 98.535 (98.535) Mem 9655MB [2024-08-04 05:49:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.133) Loss 0.9307 (0.7250) Acc@1 80.029 (85.747) Acc@5 95.557 (97.514) Mem 9655MB [2024-08-04 05:49:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.096) Loss 1.0596 (0.8531) Acc@1 75.684 (82.294) Acc@5 94.629 (96.198) Mem 9655MB [2024-08-04 05:49:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.986 Acc@5 96.183 [2024-08-04 05:49:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.0% [2024-08-04 05:49:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.99% [2024-08-04 05:49:36 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 05:49:37 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 05:49:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][0/625] eta 0:07:24 lr 0.000833 wd 0.0500 time 0.7114 (0.7114) data time 0.4672 (0.4672) model time 0.0000 (0.0000) loss 6.8028 (6.8028) grad_norm 2.2785 (2.2785) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:49:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][10/625] eta 0:03:03 lr 0.000833 wd 0.0500 time 0.2524 (0.2978) data time 0.0008 (0.0433) model time 0.0000 (0.0000) loss 4.9544 (5.6474) grad_norm 1.4529 (2.2779) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:49:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][20/625] eta 0:02:48 lr 0.000833 wd 0.0500 time 0.2558 (0.2779) data time 0.0009 (0.0231) model time 0.0000 (0.0000) loss 6.1693 (5.7304) grad_norm 1.4863 (2.0976) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:49:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][30/625] eta 0:02:43 lr 0.000833 wd 0.0500 time 0.2550 (0.2752) data time 0.0009 (0.0159) model time 0.0000 (0.0000) loss 4.9854 (5.6851) grad_norm 1.3429 (1.9661) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:49:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][40/625] eta 0:02:38 lr 0.000833 wd 0.0500 time 0.2591 (0.2705) data time 0.0009 (0.0122) model time 0.0000 (0.0000) loss 6.7671 (5.8443) grad_norm 1.5468 (1.8696) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:49:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][50/625] eta 0:02:36 lr 0.000832 wd 0.0500 time 0.2571 (0.2718) data time 0.0007 (0.0100) model time 0.0000 (0.0000) loss 6.2921 (5.9129) grad_norm 2.9237 (1.8595) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 05:49:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][60/625] eta 0:02:32 lr 0.000832 wd 0.0500 time 0.2560 (0.2693) data time 0.0010 (0.0086) model time 0.2549 (0.2556) loss 6.7492 (5.9566) grad_norm 1.6434 (1.8822) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:49:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][70/625] eta 0:02:29 lr 0.000832 wd 0.0500 time 0.2566 (0.2698) data time 0.0008 (0.0075) model time 0.2559 (0.2639) loss 6.0080 (5.9507) grad_norm 2.4983 (1.9639) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:49:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][80/625] eta 0:02:26 lr 0.000832 wd 0.0500 time 0.2574 (0.2683) data time 0.0008 (0.0067) model time 0.2565 (0.2614) loss 5.8731 (5.9205) grad_norm 1.9366 (2.0803) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:50:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][90/625] eta 0:02:22 lr 0.000832 wd 0.0500 time 0.2589 (0.2671) data time 0.0008 (0.0060) model time 0.2581 (0.2602) loss 6.3154 (5.9108) grad_norm 2.1692 (2.0844) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:50:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][100/625] eta 0:02:19 lr 0.000831 wd 0.0500 time 0.2604 (0.2661) data time 0.0008 (0.0055) model time 0.2596 (0.2594) loss 6.3358 (5.9222) grad_norm 1.5889 (2.0883) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:50:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][110/625] eta 0:02:16 lr 0.000831 wd 0.0500 time 0.2587 (0.2653) data time 0.0006 (0.0051) model time 0.2581 (0.2588) loss 5.9354 (5.9306) grad_norm 1.6698 (2.0569) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:50:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][120/625] eta 0:02:13 lr 0.000831 wd 0.0500 time 0.2574 
(0.2645) data time 0.0010 (0.0048) model time 0.2564 (0.2583) loss 5.5374 (5.9188) grad_norm 1.8947 (2.0273) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:50:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][130/625] eta 0:02:11 lr 0.000831 wd 0.0500 time 0.2578 (0.2653) data time 0.0008 (0.0045) model time 0.2570 (0.2603) loss 6.8558 (5.9062) grad_norm 1.9018 (2.0786) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:50:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][140/625] eta 0:02:08 lr 0.000831 wd 0.0500 time 0.2596 (0.2648) data time 0.0009 (0.0042) model time 0.2587 (0.2599) loss 6.9629 (5.9001) grad_norm 1.2024 (2.0939) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:50:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][150/625] eta 0:02:05 lr 0.000831 wd 0.0500 time 0.2627 (0.2642) data time 0.0006 (0.0040) model time 0.2621 (0.2594) loss 5.1616 (5.9228) grad_norm 2.6051 (2.1486) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:50:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][160/625] eta 0:02:02 lr 0.000830 wd 0.0500 time 0.2571 (0.2637) data time 0.0009 (0.0038) model time 0.2562 (0.2590) loss 5.9306 (5.9155) grad_norm 1.6121 (2.1416) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:50:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][170/625] eta 0:01:59 lr 0.000830 wd 0.0500 time 0.2491 (0.2632) data time 0.0008 (0.0036) model time 0.2483 (0.2586) loss 6.1016 (5.9118) grad_norm 1.7249 (2.1238) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:50:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][180/625] eta 0:01:56 lr 0.000830 wd 0.0500 time 0.2532 (0.2628) data time 0.0008 (0.0035) model time 0.2524 (0.2584) loss 5.0395 (5.9306) grad_norm 2.9507 (2.1064) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 05:50:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][190/625] eta 0:01:54 lr 0.000830 wd 0.0500 time 0.2529 (0.2624) data time 0.0008 (0.0034) model time 0.2521 (0.2581) loss 6.2318 (5.9407) grad_norm 1.9094 (2.1105) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:50:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][200/625] eta 0:01:51 lr 0.000830 wd 0.0500 time 0.2550 (0.2621) data time 0.0009 (0.0032) model time 0.2542 (0.2579) loss 4.5400 (5.9273) grad_norm 2.6537 (2.1240) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:50:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][210/625] eta 0:01:48 lr 0.000830 wd 0.0500 time 0.2554 (0.2619) data time 0.0008 (0.0031) model time 0.2546 (0.2578) loss 5.7488 (5.9282) grad_norm 2.6571 (2.1300) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:50:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][220/625] eta 0:01:46 lr 0.000829 wd 0.0500 time 0.4627 (0.2625) data time 0.0008 (0.0030) model time 0.4618 (0.2588) loss 6.8228 (5.9271) grad_norm 1.5379 (2.1243) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:50:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][230/625] eta 0:01:43 lr 0.000829 wd 0.0500 time 0.2636 (0.2623) data time 0.0008 (0.0029) model time 0.2628 (0.2587) loss 6.0564 (5.9281) grad_norm 3.1770 (2.1163) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:50:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][240/625] eta 0:01:40 lr 0.000829 wd 0.0500 time 0.2541 (0.2620) data time 0.0017 (0.0029) model time 0.2524 (0.2584) loss 6.9390 (5.9351) grad_norm 1.9600 (2.1307) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:50:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][250/625] eta 0:01:38 lr 0.000829 wd 0.0500 time 0.2528 
(0.2618) data time 0.0007 (0.0028) model time 0.2521 (0.2583) loss 6.2220 (5.9199) grad_norm 1.4253 (2.1317) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:50:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][260/625] eta 0:01:35 lr 0.000829 wd 0.0500 time 0.2597 (0.2615) data time 0.0006 (0.0027) model time 0.2591 (0.2581) loss 5.7363 (5.9220) grad_norm 2.3751 (2.1301) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:50:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][270/625] eta 0:01:32 lr 0.000829 wd 0.0500 time 0.2553 (0.2613) data time 0.0008 (0.0026) model time 0.2545 (0.2580) loss 5.5014 (5.9177) grad_norm 1.8901 (2.1483) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:50:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][280/625] eta 0:01:30 lr 0.000828 wd 0.0500 time 0.2528 (0.2611) data time 0.0011 (0.0026) model time 0.2517 (0.2578) loss 5.8103 (5.9186) grad_norm 1.2055 (2.1650) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:50:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][290/625] eta 0:01:27 lr 0.000828 wd 0.0500 time 0.2538 (0.2613) data time 0.0009 (0.0025) model time 0.2530 (0.2582) loss 5.9153 (5.9120) grad_norm 1.5951 (2.1790) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:50:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][300/625] eta 0:01:24 lr 0.000828 wd 0.0500 time 0.2615 (0.2612) data time 0.0006 (0.0025) model time 0.2609 (0.2581) loss 6.9134 (5.9167) grad_norm 1.4370 (2.1702) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:50:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][310/625] eta 0:01:22 lr 0.000828 wd 0.0500 time 0.2619 (0.2610) data time 0.0006 (0.0024) model time 0.2612 (0.2580) loss 5.9352 (5.9231) grad_norm 1.4551 (2.1677) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 05:51:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][320/625] eta 0:01:19 lr 0.000828 wd 0.0500 time 0.2569 (0.2608) data time 0.0006 (0.0024) model time 0.2563 (0.2578) loss 6.8090 (5.9315) grad_norm 1.9755 (2.1645) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:51:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][330/625] eta 0:01:16 lr 0.000827 wd 0.0500 time 0.2497 (0.2607) data time 0.0007 (0.0023) model time 0.2490 (0.2577) loss 5.4473 (5.9358) grad_norm 2.0287 (2.1545) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:51:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][340/625] eta 0:01:14 lr 0.000827 wd 0.0500 time 0.4655 (0.2612) data time 0.0009 (0.0023) model time 0.4646 (0.2584) loss 6.8911 (5.9482) grad_norm 1.7105 (2.1499) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:51:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][350/625] eta 0:01:11 lr 0.000827 wd 0.0500 time 0.2545 (0.2610) data time 0.0012 (0.0023) model time 0.2533 (0.2583) loss 6.5517 (5.9494) grad_norm 1.6388 (2.1415) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:51:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][360/625] eta 0:01:09 lr 0.000827 wd 0.0500 time 0.2553 (0.2615) data time 0.0006 (0.0022) model time 0.2547 (0.2588) loss 4.9100 (5.9375) grad_norm 2.9814 (2.1513) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:51:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][370/625] eta 0:01:06 lr 0.000827 wd 0.0500 time 0.2597 (0.2619) data time 0.0006 (0.0022) model time 0.2591 (0.2594) loss 5.6109 (5.9408) grad_norm 2.0329 (2.1577) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:51:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][380/625] eta 0:01:04 lr 0.000827 wd 0.0500 time 0.2556 
(0.2618) data time 0.0006 (0.0021) model time 0.2550 (0.2593) loss 5.3660 (5.9376) grad_norm 1.7235 (2.1551) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:51:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][390/625] eta 0:01:01 lr 0.000826 wd 0.0500 time 0.4201 (0.2620) data time 0.0009 (0.0021) model time 0.4192 (0.2597) loss 6.2519 (5.9390) grad_norm 1.4143 (2.1404) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:51:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][400/625] eta 0:00:59 lr 0.000826 wd 0.0500 time 0.2535 (0.2624) data time 0.0009 (0.0021) model time 0.2526 (0.2601) loss 4.8943 (5.9358) grad_norm 2.6060 (2.1351) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:51:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][410/625] eta 0:00:56 lr 0.000826 wd 0.0500 time 0.2542 (0.2623) data time 0.0008 (0.0021) model time 0.2534 (0.2600) loss 5.2934 (5.9373) grad_norm 1.2775 (2.1295) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:51:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][420/625] eta 0:00:53 lr 0.000826 wd 0.0500 time 0.2563 (0.2626) data time 0.0011 (0.0020) model time 0.2552 (0.2604) loss 6.8273 (5.9427) grad_norm 1.4429 (2.1193) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:51:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][430/625] eta 0:00:51 lr 0.000826 wd 0.0500 time 0.2554 (0.2624) data time 0.0007 (0.0020) model time 0.2547 (0.2603) loss 4.7520 (5.9368) grad_norm 1.8420 (2.1220) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:51:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][440/625] eta 0:00:48 lr 0.000826 wd 0.0500 time 0.2546 (0.2623) data time 0.0011 (0.0020) model time 0.2535 (0.2601) loss 6.2072 (5.9372) grad_norm 2.1539 (2.1210) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 05:51:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][450/625] eta 0:00:45 lr 0.000825 wd 0.0500 time 0.2567 (0.2622) data time 0.0008 (0.0020) model time 0.2559 (0.2600) loss 6.6133 (5.9450) grad_norm 2.1120 (2.1226) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:51:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][460/625] eta 0:00:43 lr 0.000825 wd 0.0500 time 0.2533 (0.2620) data time 0.0010 (0.0019) model time 0.2523 (0.2599) loss 4.8990 (5.9389) grad_norm 1.6085 (2.1112) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:51:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][470/625] eta 0:00:40 lr 0.000825 wd 0.0500 time 0.2559 (0.2623) data time 0.0010 (0.0019) model time 0.2549 (0.2603) loss 6.7561 (5.9372) grad_norm 1.3083 (2.1066) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:51:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][480/625] eta 0:00:38 lr 0.000825 wd 0.0500 time 0.2534 (0.2626) data time 0.0009 (0.0019) model time 0.2526 (0.2606) loss 5.5665 (5.9379) grad_norm 1.5705 (2.1009) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:51:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][490/625] eta 0:00:35 lr 0.000825 wd 0.0500 time 0.2574 (0.2625) data time 0.0006 (0.0019) model time 0.2567 (0.2605) loss 6.5525 (5.9431) grad_norm 2.6013 (2.0949) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:51:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][500/625] eta 0:00:32 lr 0.000824 wd 0.0500 time 0.2551 (0.2624) data time 0.0008 (0.0019) model time 0.2543 (0.2604) loss 6.0705 (5.9407) grad_norm 1.5697 (2.0937) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:51:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][510/625] eta 0:00:30 lr 0.000824 wd 0.0500 time 0.2593 
(0.2622) data time 0.0009 (0.0018) model time 0.2584 (0.2603) loss 6.7568 (5.9443) grad_norm 4.0577 (2.0970) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:51:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][520/625] eta 0:00:27 lr 0.000824 wd 0.0500 time 0.2582 (0.2621) data time 0.0007 (0.0018) model time 0.2576 (0.2601) loss 5.0830 (5.9415) grad_norm 1.6541 (2.0975) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:51:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][530/625] eta 0:00:24 lr 0.000824 wd 0.0500 time 0.2539 (0.2620) data time 0.0007 (0.0018) model time 0.2532 (0.2600) loss 6.5837 (5.9403) grad_norm 1.3214 (2.0918) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:51:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][540/625] eta 0:00:22 lr 0.000824 wd 0.0500 time 0.2591 (0.2619) data time 0.0008 (0.0018) model time 0.2583 (0.2600) loss 6.5211 (5.9375) grad_norm 2.0250 (2.0923) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:52:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][550/625] eta 0:00:19 lr 0.000824 wd 0.0500 time 0.2576 (0.2618) data time 0.0008 (0.0018) model time 0.2569 (0.2599) loss 6.0045 (5.9373) grad_norm 3.2891 (2.0961) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:52:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][560/625] eta 0:00:17 lr 0.000823 wd 0.0500 time 0.2558 (0.2617) data time 0.0009 (0.0018) model time 0.2549 (0.2597) loss 6.3702 (5.9405) grad_norm 1.7201 (2.0941) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:52:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][570/625] eta 0:00:14 lr 0.000823 wd 0.0500 time 0.2566 (0.2616) data time 0.0006 (0.0017) model time 0.2560 (0.2597) loss 7.6676 (5.9391) grad_norm 2.6093 (2.0925) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 05:52:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][580/625] eta 0:00:11 lr 0.000823 wd 0.0500 time 0.2533 (0.2615) data time 0.0010 (0.0017) model time 0.2524 (0.2596) loss 6.6805 (5.9378) grad_norm 4.0401 (2.0958) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:52:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][590/625] eta 0:00:09 lr 0.000823 wd 0.0500 time 0.2540 (0.2615) data time 0.0009 (0.0017) model time 0.2531 (0.2595) loss 4.9959 (5.9389) grad_norm 2.1763 (2.0954) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:52:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][600/625] eta 0:00:06 lr 0.000823 wd 0.0500 time 0.2559 (0.2614) data time 0.0008 (0.0017) model time 0.2551 (0.2595) loss 5.2130 (5.9430) grad_norm 2.0419 (2.0974) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:52:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][610/625] eta 0:00:03 lr 0.000823 wd 0.0500 time 0.2528 (0.2613) data time 0.0005 (0.0017) model time 0.2523 (0.2594) loss 5.7579 (5.9478) grad_norm 2.3145 (2.0968) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:52:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][620/625] eta 0:00:01 lr 0.000822 wd 0.0500 time 0.2576 (0.2612) data time 0.0003 (0.0017) model time 0.2573 (0.2593) loss 6.2043 (5.9501) grad_norm 1.4885 (2.1013) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:52:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 176 training takes 0:02:43 [2024-08-04 05:52:20 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 05:52:20 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 05:52:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.601 (0.601) Loss 0.6362 (0.6362) Acc@1 88.770 (88.770) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 05:52:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.106) Loss 1.0029 (0.7778) Acc@1 79.102 (84.988) Acc@5 95.361 (97.337) Mem 9655MB [2024-08-04 05:52:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.082) Loss 1.1289 (0.9112) Acc@1 75.586 (81.562) Acc@5 93.799 (95.875) Mem 9655MB [2024-08-04 05:52:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.244 Acc@5 95.849 [2024-08-04 05:52:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.2% [2024-08-04 05:52:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.24% [2024-08-04 05:52:22 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 05:52:23 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 05:52:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.545 (0.545) Loss 0.5859 (0.5859) Acc@1 89.697 (89.697) Acc@5 98.535 (98.535) Mem 9655MB [2024-08-04 05:52:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.101) Loss 0.9307 (0.7249) Acc@1 80.322 (85.769) Acc@5 95.557 (97.514) Mem 9655MB [2024-08-04 05:52:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.0605 (0.8528) Acc@1 75.586 (82.294) Acc@5 94.531 (96.187) Mem 9655MB [2024-08-04 05:52:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.984 Acc@5 96.177 [2024-08-04 05:52:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.0% [2024-08-04 05:52:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][0/625] eta 0:11:36 lr 0.000822 wd 0.0500 time 1.1138 (1.1138) data time 0.5708 (0.5708) model time 0.0000 (0.0000) loss 5.4673 (5.4673) grad_norm 3.5373 (3.5373) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:52:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][10/625] eta 0:03:33 lr 0.000822 wd 0.0500 time 0.2560 (0.3468) data time 0.0008 (0.0528) model time 0.0000 (0.0000) loss 5.7076 (5.9634) grad_norm 1.6682 (2.0355) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:52:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][20/625] eta 0:03:03 lr 0.000822 wd 0.0500 time 0.2582 (0.3033) data time 0.0008 (0.0281) model time 0.0000 (0.0000) loss 5.0681 (6.0801) grad_norm 1.9547 (2.3057) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:52:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][30/625] eta 0:02:55 lr 0.000822 wd 0.0500 time 0.2530 (0.2956) data time 0.0010 (0.0193) model time 0.0000 (0.0000) loss 5.0863 (6.0956) grad_norm 2.4204 (2.2697) 
loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:52:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][40/625] eta 0:02:47 lr 0.000822 wd 0.0500 time 0.2531 (0.2860) data time 0.0008 (0.0148) model time 0.0000 (0.0000) loss 6.7407 (6.1286) grad_norm 1.7700 (2.2324) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:52:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][50/625] eta 0:02:41 lr 0.000821 wd 0.0500 time 0.2556 (0.2801) data time 0.0011 (0.0121) model time 0.0000 (0.0000) loss 5.7592 (6.0767) grad_norm 1.3745 (2.2762) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:52:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][60/625] eta 0:02:35 lr 0.000821 wd 0.0500 time 0.2543 (0.2760) data time 0.0010 (0.0103) model time 0.2533 (0.2542) loss 6.3041 (5.9846) grad_norm 1.6007 (2.2125) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:52:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][70/625] eta 0:02:31 lr 0.000821 wd 0.0500 time 0.2576 (0.2734) data time 0.0009 (0.0090) model time 0.2567 (0.2554) loss 5.1873 (5.9278) grad_norm 3.2798 (2.2201) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:52:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][80/625] eta 0:02:28 lr 0.000821 wd 0.0500 time 0.2550 (0.2717) data time 0.0009 (0.0079) model time 0.2541 (0.2566) loss 5.6080 (5.9629) grad_norm 2.8766 (2.2237) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:52:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][90/625] eta 0:02:24 lr 0.000821 wd 0.0500 time 0.2566 (0.2699) data time 0.0008 (0.0072) model time 0.2558 (0.2561) loss 5.1003 (5.9465) grad_norm 1.7999 (2.2038) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:52:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][100/625] eta 0:02:21 
lr 0.000821 wd 0.0500 time 0.2585 (0.2686) data time 0.0009 (0.0066) model time 0.2576 (0.2560) loss 4.8373 (5.9214) grad_norm 1.5550 (2.1787) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:52:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][110/625] eta 0:02:17 lr 0.000820 wd 0.0500 time 0.2558 (0.2676) data time 0.0009 (0.0061) model time 0.2549 (0.2560) loss 5.9508 (5.9468) grad_norm 3.1978 (2.2018) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:52:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][120/625] eta 0:02:14 lr 0.000820 wd 0.0500 time 0.2540 (0.2667) data time 0.0009 (0.0056) model time 0.2531 (0.2560) loss 6.0586 (5.9569) grad_norm 2.6874 (2.1976) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:52:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][130/625] eta 0:02:11 lr 0.000820 wd 0.0500 time 0.2582 (0.2660) data time 0.0008 (0.0053) model time 0.2575 (0.2561) loss 6.0427 (5.9713) grad_norm 2.0145 (2.1621) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:53:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][140/625] eta 0:02:10 lr 0.000820 wd 0.0500 time 0.2593 (0.2682) data time 0.0008 (0.0050) model time 0.2585 (0.2606) loss 6.3735 (5.9697) grad_norm 1.9510 (2.1463) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:53:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][150/625] eta 0:02:07 lr 0.000820 wd 0.0500 time 0.2589 (0.2674) data time 0.0010 (0.0047) model time 0.2579 (0.2601) loss 6.1590 (5.9537) grad_norm 1.8593 (2.1550) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 05:53:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][160/625] eta 0:02:04 lr 0.000820 wd 0.0500 time 0.2568 (0.2680) data time 0.0008 (0.0045) model time 0.2560 (0.2614) loss 5.5722 (5.9623) grad_norm 2.0942 (2.1663) loss_scale 
1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:53:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][170/625] eta 0:02:01 lr 0.000819 wd 0.0500 time 0.2565 (0.2674) data time 0.0010 (0.0043) model time 0.2555 (0.2611) loss 4.7788 (5.9671) grad_norm 1.9107 (2.1565) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:53:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][180/625] eta 0:01:58 lr 0.000819 wd 0.0500 time 0.2563 (0.2669) data time 0.0008 (0.0041) model time 0.2556 (0.2608) loss 4.9792 (5.9540) grad_norm 1.9945 (2.1605) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:53:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][190/625] eta 0:01:55 lr 0.000819 wd 0.0500 time 0.2574 (0.2663) data time 0.0007 (0.0039) model time 0.2568 (0.2603) loss 6.6805 (5.9420) grad_norm 3.4331 (2.1850) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:53:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][200/625] eta 0:01:52 lr 0.000819 wd 0.0500 time 0.2536 (0.2657) data time 0.0008 (0.0038) model time 0.2529 (0.2600) loss 6.5170 (5.9449) grad_norm 1.3918 (2.1677) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:53:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][210/625] eta 0:01:50 lr 0.000819 wd 0.0500 time 0.2557 (0.2658) data time 0.0008 (0.0036) model time 0.2549 (0.2603) loss 5.1999 (5.9399) grad_norm 2.1233 (2.1710) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:53:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][220/625] eta 0:01:47 lr 0.000818 wd 0.0500 time 0.2587 (0.2653) data time 0.0005 (0.0035) model time 0.2581 (0.2599) loss 4.7139 (5.9406) grad_norm 1.5731 (2.1666) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:53:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][230/625] eta 0:01:44 lr 0.000818 wd 0.0500 time 0.2548 (0.2648) data time 0.0007 (0.0034) model time 0.2541 (0.2596) loss 7.0059 (5.9303) grad_norm 2.0633 (2.1651) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:53:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][240/625] eta 0:01:41 lr 0.000818 wd 0.0500 time 0.2551 (0.2645) data time 0.0010 (0.0033) model time 0.2541 (0.2593) loss 4.5398 (5.9276) grad_norm 2.9952 (2.1593) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:53:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][250/625] eta 0:01:39 lr 0.000818 wd 0.0500 time 0.2555 (0.2649) data time 0.0008 (0.0032) model time 0.2547 (0.2601) loss 6.7484 (5.9414) grad_norm 2.5024 (2.1576) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:53:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][260/625] eta 0:01:36 lr 0.000818 wd 0.0500 time 0.2560 (0.2646) data time 0.0008 (0.0031) model time 0.2552 (0.2599) loss 6.1394 (5.9384) grad_norm 1.5094 (2.1534) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:53:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][270/625] eta 0:01:33 lr 0.000818 wd 0.0500 time 0.2518 (0.2643) data time 0.0008 (0.0030) model time 0.2509 (0.2597) loss 6.0952 (5.9418) grad_norm 3.9360 (2.1662) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:53:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][280/625] eta 0:01:31 lr 0.000817 wd 0.0500 time 0.2526 (0.2640) data time 0.0010 (0.0029) model time 0.2516 (0.2595) loss 5.3129 (5.9385) grad_norm 2.4469 (2.1819) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:53:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][290/625] eta 0:01:28 lr 0.000817 wd 0.0500 time 0.2576 (0.2641) data time 0.0007 (0.0029) model time 0.2569 (0.2598) loss 6.9433 (5.9361) grad_norm 2.1526 (2.1879) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:53:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][300/625] eta 0:01:25 lr 0.000817 wd 0.0500 time 0.2583 (0.2639) data time 0.0006 (0.0028) model time 0.2577 (0.2597) loss 6.6095 (5.9283) grad_norm 1.6501 (2.1938) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:53:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][310/625] eta 0:01:23 lr 0.000817 wd 0.0500 time 0.2563 (0.2637) data time 0.0008 (0.0027) model time 0.2555 (0.2595) loss 6.9112 (5.9534) grad_norm 2.0199 (2.1842) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:53:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][320/625] eta 0:01:20 lr 0.000817 wd 0.0500 time 0.2542 (0.2634) data time 0.0007 (0.0027) model time 0.2535 (0.2594) loss 6.2785 (5.9484) grad_norm 2.4747 (2.1850) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:53:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][330/625] eta 0:01:17 lr 0.000817 wd 0.0500 time 0.2612 (0.2632) data time 0.0006 (0.0026) model time 0.2607 (0.2592) loss 6.5353 (5.9529) grad_norm 1.6953 (2.1814) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:53:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][340/625] eta 0:01:14 lr 0.000816 wd 0.0500 time 0.2527 (0.2630) data time 0.0006 (0.0026) model time 0.2521 (0.2591) loss 5.8826 (5.9564) grad_norm 3.5546 (2.1856) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:53:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][350/625] eta 0:01:12 lr 0.000816 wd 0.0500 time 0.2564 (0.2633) data time 0.0009 (0.0025) model time 0.2555 (0.2595) loss 6.2296 (5.9541) grad_norm 2.1343 (2.1806) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:53:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][360/625] eta 0:01:09 lr 0.000816 wd 0.0500 time 0.2613 (0.2631) data time 0.0009 (0.0025) model time 0.2604 (0.2594) loss 5.7894 (5.9511) grad_norm 1.6923 (2.1911) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:54:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][370/625] eta 0:01:07 lr 0.000816 wd 0.0500 time 0.2532 (0.2629) data time 0.0018 (0.0025) model time 0.2514 (0.2593) loss 5.7927 (5.9548) grad_norm 2.2480 (2.2071) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:54:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][380/625] eta 0:01:04 lr 0.000816 wd 0.0500 time 0.2556 (0.2627) data time 0.0008 (0.0024) model time 0.2547 (0.2591) loss 6.5107 (5.9479) grad_norm 2.4460 (2.2070) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:54:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][390/625] eta 0:01:01 lr 0.000816 wd 0.0500 time 0.2667 (0.2626) data time 0.0010 (0.0024) model time 0.2657 (0.2591) loss 6.5144 (5.9552) grad_norm 2.3496 (2.2095) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:54:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][400/625] eta 0:00:59 lr 0.000815 wd 0.0500 time 0.2565 (0.2627) data time 0.0006 (0.0023) model time 0.2559 (0.2593) loss 5.8527 (5.9491) grad_norm 1.7850 (2.1997) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:54:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][410/625] eta 0:00:56 lr 0.000815 wd 0.0500 time 0.2530 (0.2626) data time 0.0009 (0.0023) model time 0.2521 (0.2591) loss 5.8770 (5.9467) grad_norm 1.8454 (2.2038) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:54:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][420/625] eta 0:00:53 lr 0.000815 wd 0.0500 time 0.2569 (0.2624) data time 0.0009 (0.0023) model time 0.2560 (0.2590) loss 6.9822 (5.9595) grad_norm 2.0461 (2.2072) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:54:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][430/625] eta 0:00:51 lr 0.000815 wd 0.0500 time 0.2585 (0.2623) data time 0.0010 (0.0022) model time 0.2575 (0.2590) loss 5.7157 (5.9590) grad_norm 2.0452 (2.2071) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:54:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][440/625] eta 0:00:48 lr 0.000815 wd 0.0500 time 0.2551 (0.2621) data time 0.0011 (0.0022) model time 0.2540 (0.2589) loss 5.9062 (5.9609) grad_norm 1.9877 (2.1984) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:54:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][450/625] eta 0:00:45 lr 0.000814 wd 0.0500 time 0.2543 (0.2620) data time 0.0007 (0.0022) model time 0.2536 (0.2587) loss 6.3786 (5.9558) grad_norm 4.1117 (2.2084) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:54:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][460/625] eta 0:00:43 lr 0.000814 wd 0.0500 time 0.2608 (0.2619) data time 0.0008 (0.0022) model time 0.2600 (0.2587) loss 5.8649 (5.9537) grad_norm 2.5630 (2.2237) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:54:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][470/625] eta 0:00:40 lr 0.000814 wd 0.0500 time 0.2570 (0.2618) data time 0.0006 (0.0021) model time 0.2564 (0.2586) loss 6.2090 (5.9516) grad_norm 1.6550 (2.2202) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:54:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][480/625] eta 0:00:37 lr 0.000814 wd 0.0500 time 0.2542 (0.2617) data time 0.0009 (0.0021) model time 0.2532 (0.2586) loss 6.5733 (5.9463) grad_norm 1.9066 (2.2268) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:54:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][490/625] eta 0:00:35 lr 0.000814 wd 0.0500 time 0.2617 (0.2616) data time 0.0007 (0.0021) model time 0.2610 (0.2585) loss 4.8922 (5.9468) grad_norm 2.4335 (2.2246) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:54:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][500/625] eta 0:00:32 lr 0.000814 wd 0.0500 time 0.2610 (0.2619) data time 0.0011 (0.0021) model time 0.2599 (0.2589) loss 5.3402 (5.9391) grad_norm 2.1600 (2.2376) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:54:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][510/625] eta 0:00:30 lr 0.000813 wd 0.0500 time 0.2539 (0.2620) data time 0.0009 (0.0020) model time 0.2529 (0.2591) loss 5.6260 (5.9303) grad_norm 2.3104 (2.2430) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:54:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][520/625] eta 0:00:27 lr 0.000813 wd 0.0500 time 0.2555 (0.2622) data time 0.0010 (0.0020) model time 0.2545 (0.2593) loss 4.8084 (5.9331) grad_norm 1.8819 (2.2520) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:54:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][530/625] eta 0:00:24 lr 0.000813 wd 0.0500 time 0.2573 (0.2621) data time 0.0006 (0.0020) model time 0.2567 (0.2593) loss 5.7046 (5.9322) grad_norm 1.2005 (2.2490) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:54:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][540/625] eta 0:00:22 lr 0.000813 wd 0.0500 time 0.2566 (0.2620) data time 0.0007 (0.0020) model time 0.2560 (0.2592) loss 5.3375 (5.9290) grad_norm 1.8175 (2.2456) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:54:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][550/625] eta 0:00:19 lr 0.000813 wd 0.0500 time 0.2578 (0.2619) data time 0.0008 (0.0020) model time 0.2570 (0.2591) loss 5.3958 (5.9268) grad_norm 1.8968 (2.2410) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:54:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][560/625] eta 0:00:17 lr 0.000813 wd 0.0500 time 0.2517 (0.2618) data time 0.0007 (0.0019) model time 0.2510 (0.2591) loss 4.8914 (5.9251) grad_norm 1.5317 (2.2452) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:54:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][570/625] eta 0:00:14 lr 0.000812 wd 0.0500 time 0.2532 (0.2617) data time 0.0006 (0.0019) model time 0.2525 (0.2590) loss 5.2550 (5.9251) grad_norm 1.4626 (2.2373) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:54:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][580/625] eta 0:00:11 lr 0.000812 wd 0.0500 time 0.2565 (0.2616) data time 0.0009 (0.0019) model time 0.2556 (0.2589) loss 5.5360 (5.9252) grad_norm 3.1475 (2.2351) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:54:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][590/625] eta 0:00:09 lr 0.000812 wd 0.0500 time 0.2566 (0.2615) data time 0.0012 (0.0019) model time 0.2554 (0.2589) loss 5.8293 (5.9226) grad_norm 2.4429 (2.2359) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:55:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][600/625] eta 0:00:06 lr 0.000812 wd 0.0500 time 0.2561 (0.2615) data time 0.0007 (0.0019) model time 0.2555 (0.2588) loss 4.9584 (5.9190) grad_norm 1.6815 (2.2389) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:55:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][610/625] eta 0:00:03 lr 0.000812 wd 0.0500 time 0.2524 (0.2614) data time 0.0007 (0.0019) model time 0.2517 (0.2587) loss 5.8689 (5.9240) grad_norm 1.3489 (2.2334) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:55:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][620/625] eta 0:00:01 lr 0.000812 wd 0.0500 time 0.2540 (0.2612) data time 0.0006 (0.0018) model time 0.2534 (0.2586) loss 5.9310 (5.9271) grad_norm 2.4450 (2.2276) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:55:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 177 training takes 0:02:43
[2024-08-04 05:55:08 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 05:55:09 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 05:55:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.482 (0.482) Loss 0.6514 (0.6514) Acc@1 88.184 (88.184) Acc@5 98.438 (98.438) Mem 9655MB
[2024-08-04 05:55:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 1.0068 (0.7805) Acc@1 79.492 (84.970) Acc@5 95.068 (97.341) Mem 9655MB
[2024-08-04 05:55:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.1436 (0.9127) Acc@1 74.365 (81.471) Acc@5 93.896 (95.852) Mem 9655MB
[2024-08-04 05:55:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.188 Acc@5 95.813
[2024-08-04 05:55:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.2%
[2024-08-04 05:55:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.760 (0.760) Loss 0.5850 (0.5850) Acc@1 89.844 (89.844) Acc@5 98.535 (98.535) Mem 9655MB
[2024-08-04 05:55:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.126) Loss 0.9307 (0.7241) Acc@1 80.127 (85.778) Acc@5 95.557 (97.479) Mem 9655MB
[2024-08-04 05:55:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0596 (0.8521) Acc@1 75.586 (82.313) Acc@5 94.434 (96.147) Mem 9655MB
[2024-08-04 05:55:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.004 Acc@5 96.143
[2024-08-04 05:55:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.0%
[2024-08-04 05:55:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.00%
[2024-08-04 05:55:12 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 05:55:13 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 05:55:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][0/625] eta 0:07:17 lr 0.000811 wd 0.0500 time 0.6997 (0.6997) data time 0.4591 (0.4591) model time 0.0000 (0.0000) loss 5.6045 (5.6045) grad_norm 1.4765 (1.4765) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:55:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][10/625] eta 0:03:01 lr 0.000811 wd 0.0500 time 0.2554 (0.2955) data time 0.0009 (0.0425) model time 0.0000 (0.0000) loss 6.2597 (6.1029) grad_norm 1.7711 (1.6547) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:55:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][20/625] eta 0:02:47 lr 0.000811 wd 0.0500 time 0.2600 (0.2764) data time 0.0005 (0.0227) model time 0.0000 (0.0000) loss 6.9148 (5.9271) grad_norm 1.1405 (1.8482) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:55:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][30/625] eta 0:02:40 lr 0.000811 wd 0.0500 time 0.2566 (0.2697) data time 0.0008 (0.0156) model time 0.0000 (0.0000) loss 5.8213 (5.9399) grad_norm 1.7518 (1.9174) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:55:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][40/625] eta 0:02:38 lr 0.000811 wd 0.0500 time 0.2582 (0.2713) data time 0.0008 (0.0120) model time 0.0000 (0.0000) loss 5.6379 (5.9206) grad_norm 3.1960 (2.0498) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:55:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][50/625] eta 0:02:36 lr 0.000811 wd 0.0500 time 0.2551 (0.2722) data time 0.0009 (0.0098) model time 0.0000 (0.0000) loss 5.3518 (5.9414) grad_norm 1.6709 (2.0688) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:55:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][60/625] eta 0:02:32 lr 0.000810 wd 0.0500 time 0.2580 (0.2695) data time 0.0006 (0.0084) model time 0.2574 (0.2551) loss 5.3827 (5.9435) grad_norm 3.8373 (2.0497) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:55:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][70/625] eta 0:02:28 lr 0.000810 wd 0.0500 time 0.2532 (0.2677) data time 0.0008 (0.0073) model time 0.2524 (0.2554) loss 5.3699 (5.9147) grad_norm 2.2516 (2.1701) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:55:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][80/625] eta 0:02:27 lr 0.000810 wd 0.0500 time 0.2568 (0.2714) data time 0.0007 (0.0065) model time 0.2561 (0.2692) loss 5.6463 (5.9309) grad_norm 1.9519 (2.1641) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:55:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][90/625] eta 0:02:25 lr 0.000810 wd 0.0500 time 0.2564 (0.2718) data time 0.0007 (0.0059) model time 0.2558 (0.2705) loss 6.2761 (5.9172) grad_norm 1.5380 (2.1424) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:55:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][100/625] eta 0:02:21 lr 0.000810 wd 0.0500 time 0.2552 (0.2703) data time 0.0006 (0.0054) model time 0.2546 (0.2674) loss 4.7159 (5.9182) grad_norm 2.3354 (2.1229) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:55:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][110/625] eta 0:02:18 lr 0.000810 wd 0.0500 time 0.2564 (0.2690) data time 0.0009 (0.0050) model time 0.2555 (0.2654) loss 6.1606 (5.9533) grad_norm 2.9795 (2.1068) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:55:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][120/625] eta 0:02:15 lr 0.000809 wd 0.0500 time 0.2630 (0.2680) data time 0.0008 (0.0047) model time 0.2622 (0.2640) loss 6.0256 (5.9513) grad_norm 2.1916 (2.1248) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:55:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][130/625] eta 0:02:12 lr 0.000809 wd 0.0500 time 0.2565 (0.2671) data time 0.0007 (0.0044) model time 0.2558 (0.2630) loss 5.4465 (5.9568) grad_norm 1.8452 (2.1711) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:55:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][140/625] eta 0:02:09 lr 0.000809 wd 0.0500 time 0.2629 (0.2664) data time 0.0006 (0.0041) model time 0.2623 (0.2622) loss 5.5665 (5.9562) grad_norm 2.8875 (2.1489) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:55:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][150/625] eta 0:02:06 lr 0.000809 wd 0.0500 time 0.2591 (0.2657) data time 0.0009 (0.0039) model time 0.2583 (0.2615) loss 6.6895 (5.9773) grad_norm 2.7222 (2.1339) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:55:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][160/625] eta 0:02:03 lr 0.000809 wd 0.0500 time 0.2559 (0.2651) data time 0.0009 (0.0037) model time 0.2550 (0.2609) loss 4.8975 (5.9727) grad_norm 1.8974 (2.1542) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:55:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][170/625] eta 0:02:00 lr 0.000808 wd 0.0500 time 0.2562 (0.2645) data time 0.0008 (0.0035) model time 0.2554 (0.2604) loss 6.2265 (5.9881) grad_norm 1.6798 (2.1770) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:56:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][180/625] eta 0:01:57 lr 0.000808 wd 0.0500 time 0.2635 (0.2641) data time 0.0006 (0.0034) model time 0.2629 (0.2600) loss 6.7975 (5.9805) grad_norm 3.1450 (2.2002) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:56:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][190/625] eta 0:01:54 lr 0.000808 wd 0.0500 time 0.2548 (0.2637) data time 0.0006 (0.0033) model time 0.2542 (0.2597) loss 5.7472 (5.9877) grad_norm 1.7355 (2.1821) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:56:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][200/625] eta 0:01:52 lr 0.000808 wd 0.0500 time 0.3849 (0.2640) data time 0.0008 (0.0031) model time 0.3841 (0.2603) loss 5.8660 (5.9952) grad_norm 1.5680 (2.1665) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:56:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][210/625] eta 0:01:49 lr 0.000808 wd 0.0500 time 0.2575 (0.2642) data time 0.0006 (0.0030) model time 0.2569 (0.2608) loss 6.5591 (5.9871) grad_norm 1.3289 (2.1492) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:56:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][220/625] eta 0:01:46 lr 0.000808 wd 0.0500 time 0.2629 (0.2638) data time 0.0008 (0.0029) model time 0.2621 (0.2604) loss 5.6235 (5.9965) grad_norm 1.7230 (2.1485) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:56:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][230/625] eta 0:01:44 lr 0.000807 wd 0.0500 time 0.2557 (0.2639) data time 0.0009 (0.0029) model time 0.2549 (0.2607) loss 6.2714 (6.0048) grad_norm 2.3719 (2.1475) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:56:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][240/625] eta 0:01:41 lr 0.000807 wd 0.0500 time 0.2567 (0.2636) data time 0.0010 (0.0028) model time 0.2557 (0.2603) loss 6.0660 (6.0077) grad_norm 3.7939 (2.1561) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:56:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][250/625] eta 0:01:38 lr 0.000807 wd 0.0500 time 0.2547 (0.2633) data time 0.0011 (0.0027) model time 0.2536 (0.2601) loss 5.6183 (5.9927) grad_norm 2.3457 (2.1580) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:56:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][260/625] eta 0:01:36 lr 0.000807 wd 0.0500 time 0.2577 (0.2630) data time 0.0006 (0.0026) model time 0.2571 (0.2599) loss 6.6891 (6.0024) grad_norm 1.4978 (2.1444) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:56:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][270/625] eta 0:01:33 lr 0.000807 wd 0.0500 time 0.2605 (0.2628) data time 0.0008 (0.0026) model time 0.2597 (0.2597) loss 4.7238 (6.0025) grad_norm 2.6226 (2.1550) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:56:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][280/625] eta 0:01:30 lr 0.000807 wd 0.0500 time 0.2587 (0.2625) data time 0.0008 (0.0025) model time 0.2579 (0.2595) loss 6.9296 (6.0045) grad_norm 1.9223 (2.1506) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:56:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][290/625] eta 0:01:27 lr 0.000806 wd 0.0500 time 0.2552 (0.2624) data time 0.0011 (0.0025) model time 0.2541 (0.2594) loss 5.8123 (6.0120) grad_norm 1.6335 (2.1490) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:56:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][300/625] eta 0:01:25 lr 0.000806 wd 0.0500 time 0.2564 (0.2622) data time 0.0007 (0.0024) model time 0.2557 (0.2592) loss 5.2147 (6.0036) grad_norm 1.4005 (2.1384) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:56:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][310/625] eta 0:01:22 lr 0.000806 wd 0.0500 time 0.2544 (0.2620) data time 0.0011 (0.0024) model time 0.2533 (0.2591) loss 6.1417 (5.9894) grad_norm 1.5077 (2.1410) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:56:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][320/625] eta 0:01:19 lr 0.000806 wd 0.0500 time 0.2617 (0.2619) data time 0.0010 (0.0023) model time 0.2607 (0.2590) loss 5.9503 (5.9801) grad_norm 2.9683 (2.1524) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:56:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][330/625] eta 0:01:17 lr 0.000806 wd 0.0500 time 0.2548 (0.2623) data time 0.0007 (0.0023) model time 0.2541 (0.2595) loss 5.8082 (5.9840) grad_norm 3.7177 (2.1591) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:56:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][340/625] eta 0:01:14 lr 0.000806 wd 0.0500 time 0.2625 (0.2621) data time 0.0007 (0.0022) model time 0.2617 (0.2594) loss 6.0140 (5.9903) grad_norm 1.6495 (2.1606) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:56:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][350/625] eta 0:01:12 lr 0.000805 wd 0.0500 time 0.2710 (0.2620) data time 0.0008 (0.0022) model time 0.2701 (0.2594) loss 6.2923 (5.9911) grad_norm 1.3745 (2.1492) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:56:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][360/625] eta 0:01:09 lr 0.000805 wd 0.0500 time 0.2660 (0.2619) data time 0.0008 (0.0022) model time 0.2651 (0.2593) loss 5.9975 (5.9863) grad_norm 1.4586 (2.1354) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:56:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][370/625] eta 0:01:06 lr 0.000805 wd 0.0500 time 0.2556 (0.2618) data time 0.0011 (0.0021) model time 0.2545 (0.2592) loss 6.2376 (5.9826) grad_norm 1.7386 (2.1197) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:56:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][380/625] eta 0:01:04 lr 0.000805 wd 0.0500 time 0.2589 (0.2616) data time 0.0007 (0.0021) model time 0.2582 (0.2591) loss 6.8368 (5.9777) grad_norm 1.3276 (2.1100) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:56:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][390/625] eta 0:01:01 lr 0.000805 wd 0.0500 time 0.2557 (0.2615) data time 0.0011 (0.0021) model time 0.2545 (0.2590) loss 4.8501 (5.9718) grad_norm 1.6307 (2.1011) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:56:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][400/625] eta 0:00:58 lr 0.000804 wd 0.0500 time 0.2571 (0.2614) data time 0.0009 (0.0021) model time 0.2562 (0.2589) loss 7.0419 (5.9806) grad_norm 2.4803 (2.1034) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:57:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][410/625] eta 0:00:56 lr 0.000804 wd 0.0500 time 0.2554 (0.2613) data time 0.0011 (0.0020) model time 0.2543 (0.2588) loss 5.8767 (5.9803) grad_norm 3.3412 (2.1120) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:57:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][420/625] eta 0:00:53 lr 0.000804 wd 0.0500 time 0.2563 (0.2612) data time 0.0009 (0.0020) model time 0.2554 (0.2587) loss 6.1950 (5.9744) grad_norm 2.9370 (2.1189) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:57:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][430/625] eta 0:00:50 lr 0.000804 wd 0.0500 time 0.2540 (0.2610) data time 0.0013 (0.0020) model time 0.2527 (0.2586) loss 5.8308 (5.9638) grad_norm 3.3272 (2.1474) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:57:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][440/625] eta 0:00:48 lr 0.000804 wd 0.0500 time 0.2529 (0.2610) data time 0.0009 (0.0020) model time 0.2520 (0.2586) loss 4.7703 (5.9584) grad_norm 3.1196 (2.1775) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:57:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][450/625] eta 0:00:45 lr 0.000804 wd 0.0500 time 0.2529 (0.2616) data time 0.0010 (0.0019) model time 0.2519 (0.2593) loss 4.7465 (5.9577) grad_norm 2.1939 (2.1821) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:57:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][460/625] eta 0:00:43 lr 0.000803 wd 0.0500 time 0.2539 (0.2615) data time 0.0007 (0.0019) model time 0.2531 (0.2592) loss 5.1325 (5.9535) grad_norm 1.2325 (2.1725) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:57:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][470/625] eta 0:00:40 lr 0.000803 wd 0.0500 time 0.2527 (0.2614) data time 0.0010 (0.0019) model time 0.2518 (0.2591) loss 5.3750 (5.9581) grad_norm 2.3904 (2.1716) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:57:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][480/625] eta 0:00:37 lr 0.000803 wd 0.0500 time 0.2511 (0.2613) data time 0.0010 (0.0019) model time 0.2501 (0.2590) loss 5.7019 (5.9543) grad_norm 1.9453 (2.1662) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:57:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][490/625] eta 0:00:35 lr 0.000803 wd 0.0500 time 0.2545 (0.2616) data time 0.0011 (0.0019) model time 0.2534 (0.2594) loss 4.4363 (5.9447) grad_norm 1.5757 (2.1564) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:57:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][500/625] eta 0:00:32 lr 0.000803 wd 0.0500 time 0.2619 (0.2619) data time 0.0009 (0.0018) model time 0.2611 (0.2598) loss 5.6234 (5.9540) grad_norm 1.3052 (2.1471) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:57:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][510/625] eta 0:00:30 lr 0.000803 wd 0.0500 time 0.2563 (0.2618) data time 0.0007 (0.0018) model time 0.2557 (0.2597) loss 5.9507 (5.9532) grad_norm 1.9821 (2.1396) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:57:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][520/625] eta 0:00:27 lr 0.000802 wd 0.0500 time 0.2545 (0.2617) data time 0.0008 (0.0018) model time 0.2537 (0.2596) loss 5.5084 (5.9622) grad_norm 2.4520 (2.1417) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:57:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][530/625] eta 0:00:24 lr 0.000802 wd 0.0500 time 0.2553 (0.2616) data time 0.0010 (0.0018) model time 0.2543 (0.2595) loss 6.3078 (5.9624) grad_norm 2.8534 (2.1580) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:57:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][540/625] eta 0:00:22 lr 0.000802 wd 0.0500 time 0.2585 (0.2615) data time 0.0005 (0.0018) model time 0.2580 (0.2594) loss 5.0707 (5.9592) grad_norm 1.6887 (2.1626) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:57:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][550/625] eta 0:00:19 lr 0.000802 wd 0.0500 time 0.2546 (0.2616) data time 0.0007 (0.0018) model time 0.2539 (0.2596) loss 5.0661 (5.9525) grad_norm 1.1493 (2.1558) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:57:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][560/625] eta 0:00:16 lr 0.000802 wd 0.0500 time 0.2545 (0.2615) data time 0.0007 (0.0018) model time 0.2538 (0.2595) loss 6.0300 (5.9513) grad_norm 1.4702 (2.1490) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:57:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][570/625] eta 0:00:14 lr 0.000802 wd 0.0500 time 0.2563 (0.2614) data time 0.0007 (0.0017) model time 0.2557 (0.2594) loss 5.6863 (5.9533) grad_norm 3.2879 (2.1482) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:57:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][580/625] eta 0:00:11 lr 0.000801 wd 0.0500 time 0.2557 (0.2613) data time 0.0007 (0.0017) model time 0.2549 (0.2593) loss 5.5351 (5.9534) grad_norm 2.1217 (2.1425) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:57:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][590/625] eta 0:00:09 lr 0.000801 wd 0.0500 time 0.2561 (0.2613) data time 0.0009 (0.0017) model time 0.2552 (0.2593) loss 6.3774 (5.9540) grad_norm 1.5758 (2.1367) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:57:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][600/625] eta 0:00:06 lr 0.000801 wd 0.0500 time 0.2567 (0.2612) data time 0.0008 (0.0017) model time 0.2559 (0.2592) loss 7.0822 (5.9558) grad_norm 3.3802 (2.1316) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:57:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][610/625] eta 0:00:03 lr 0.000801 wd 0.0500 time 0.2535 (0.2611) data time 0.0005 (0.0017) model time 0.2530 (0.2591) loss 5.7769 (5.9522) grad_norm 3.2134 (2.1428) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:57:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][620/625] eta 0:00:01 lr 0.000801 wd 0.0500 time 0.2554 (0.2611) data time 0.0003 (0.0017) model time 0.2551 (0.2592) loss 6.5883 (5.9560) grad_norm 3.2455 (2.1457) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:57:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 178 training takes 0:02:43
[2024-08-04 05:57:56 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 05:57:57 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 05:57:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.503 (0.503) Loss 0.6333 (0.6333) Acc@1 88.916 (88.916) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 05:57:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 1.0322 (0.7913) Acc@1 78.223 (84.930) Acc@5 95.117 (97.172) Mem 9655MB
[2024-08-04 05:57:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.1191 (0.9246) Acc@1 75.635 (81.508) Acc@5 94.043 (95.826) Mem 9655MB
[2024-08-04 05:57:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.130 Acc@5 95.813
[2024-08-04 05:57:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.1%
[2024-08-04 05:57:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.753 (0.753) Loss 0.5850 (0.5850) Acc@1 89.697 (89.697) Acc@5 98.535 (98.535) Mem 9655MB
[2024-08-04 05:58:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.124) Loss 0.9292 (0.7241) Acc@1 80.176 (85.778) Acc@5 95.557 (97.483) Mem 9655MB
[2024-08-04 05:58:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.091) Loss 1.0586 (0.8520) Acc@1 75.439 (82.317) Acc@5 94.482 (96.159) Mem 9655MB
[2024-08-04 05:58:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.020 Acc@5 96.149
[2024-08-04 05:58:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.0%
[2024-08-04 05:58:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.02%
[2024-08-04 05:58:01 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 05:58:01 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 05:58:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][0/625] eta 0:07:30 lr 0.000801 wd 0.0500 time 0.7215 (0.7215) data time 0.4773 (0.4773) model time 0.0000 (0.0000) loss 5.2748 (5.2748) grad_norm 2.9813 (2.9813) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 05:58:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][10/625] eta 0:03:03 lr 0.000800 wd 0.0500 time 0.2560 (0.2978) data time 0.0009 (0.0442) model time 0.0000 (0.0000) loss 5.5925 (5.9449) grad_norm 1.7679 (2.3757) loss_scale 2048.0000 (1210.1818) mem 9655MB
[2024-08-04 05:58:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][20/625] eta 0:02:52 lr 0.000800 wd 0.0500 time 0.2546 (0.2846) data time 0.0010 (0.0236) model time 0.0000 (0.0000) loss 6.0945 (5.9746) grad_norm 2.0967 (2.2140) loss_scale 2048.0000 (1609.1429) mem 9655MB
[2024-08-04 05:58:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][30/625] eta 0:02:47 lr 0.000800 wd 0.0500 time 0.2561 (0.2821) data time 0.0011 (0.0163) model time 0.0000 (0.0000) loss 5.1609 (5.8568) grad_norm 1.7109 (2.0446) loss_scale 2048.0000 (1750.7097) mem 9655MB
[2024-08-04 05:58:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][40/625] eta 0:02:41 lr 0.000800 wd 0.0500 time 0.2732 (0.2766) data time 0.0007 (0.0125) model time 0.0000 (0.0000) loss 6.4645 (5.8282) grad_norm 1.6450 (2.0758) loss_scale 2048.0000 (1823.2195) mem 9655MB
[2024-08-04 05:58:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][50/625] eta 0:02:36 lr 0.000800 wd 0.0500 time 0.2552 (0.2726) data time 0.0008 (0.0103) model time 0.0000 (0.0000) loss 6.0721 (5.8787) grad_norm 2.0821 (2.0271) loss_scale 2048.0000 (1867.2941) mem 9655MB
[2024-08-04 05:58:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][60/625] eta 0:02:32 lr 0.000800 wd 0.0500 time 0.2573 (0.2700) data time 0.0010 (0.0087) model time 0.2564 (0.2555) loss 6.5785 (5.8925) grad_norm 1.3988 (1.9396) loss_scale 2048.0000 (1896.9180) mem 9655MB
[2024-08-04 05:58:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][70/625] eta 0:02:28 lr 0.000799 wd 0.0500 time 0.2531 (0.2680) data time 0.0008 (0.0076) model time 0.2523 (0.2552) loss 6.9501 (5.9490) grad_norm 1.5271 (1.9862) loss_scale 2048.0000 (1918.1972) mem 9655MB
[2024-08-04 05:58:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][80/625] eta 0:02:25 lr 0.000799 wd 0.0500 time 0.2582 (0.2665) data time 0.0007 (0.0068) model time 0.2575 (0.2551) loss 6.4309 (5.9042) grad_norm 2.3378 (2.1454) loss_scale 2048.0000 (1934.2222) mem 9655MB
[2024-08-04 05:58:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][90/625] eta 0:02:22 lr 0.000799 wd 0.0500 time 0.2552 (0.2654) data time 0.0009 (0.0061) model time 0.2543 (0.2554) loss 5.0000 (5.8797) grad_norm 1.8391 (2.2089) loss_scale 2048.0000 (1946.7253) mem 9655MB
[2024-08-04 05:58:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][100/625] eta 0:02:18 lr 0.000799 wd 0.0500 time 0.2544 (0.2645) data time 0.0008 (0.0056) model time 0.2536 (0.2552) loss 5.2946 (5.8891) grad_norm 2.3821 (2.1742) loss_scale 2048.0000 (1956.7525)
mem 9655MB [2024-08-04 05:58:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][110/625] eta 0:02:15 lr 0.000799 wd 0.0500 time 0.2525 (0.2636) data time 0.0007 (0.0052) model time 0.2519 (0.2551) loss 5.5181 (5.8760) grad_norm 1.8645 (2.1426) loss_scale 2048.0000 (1964.9730) mem 9655MB [2024-08-04 05:58:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][120/625] eta 0:02:12 lr 0.000798 wd 0.0500 time 0.2589 (0.2630) data time 0.0007 (0.0048) model time 0.2582 (0.2550) loss 5.8352 (5.8894) grad_norm 2.8260 (2.1239) loss_scale 2048.0000 (1971.8347) mem 9655MB [2024-08-04 05:58:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][130/625] eta 0:02:09 lr 0.000798 wd 0.0500 time 0.2529 (0.2624) data time 0.0011 (0.0045) model time 0.2518 (0.2549) loss 5.2653 (5.8834) grad_norm 2.8169 (2.1263) loss_scale 2048.0000 (1977.6489) mem 9655MB [2024-08-04 05:58:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][140/625] eta 0:02:07 lr 0.000798 wd 0.0500 time 0.2568 (0.2635) data time 0.0010 (0.0043) model time 0.2558 (0.2574) loss 5.0238 (5.8764) grad_norm 3.9388 (2.1723) loss_scale 2048.0000 (1982.6383) mem 9655MB [2024-08-04 05:58:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][150/625] eta 0:02:05 lr 0.000798 wd 0.0500 time 0.2564 (0.2632) data time 0.0007 (0.0041) model time 0.2557 (0.2575) loss 5.2042 (5.8853) grad_norm 1.4740 (2.2527) loss_scale 2048.0000 (1986.9669) mem 9655MB [2024-08-04 05:58:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][160/625] eta 0:02:02 lr 0.000798 wd 0.0500 time 0.2560 (0.2629) data time 0.0008 (0.0039) model time 0.2553 (0.2575) loss 6.7359 (5.8915) grad_norm 1.6913 (2.2533) loss_scale 2048.0000 (1990.7578) mem 9655MB [2024-08-04 05:58:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][170/625] eta 0:01:59 lr 0.000798 wd 0.0500 time 
0.2536 (0.2635) data time 0.0010 (0.0037) model time 0.2526 (0.2587) loss 4.5112 (5.8900) grad_norm 1.5929 (2.2298) loss_scale 2048.0000 (1994.1053) mem 9655MB [2024-08-04 05:58:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][180/625] eta 0:01:57 lr 0.000797 wd 0.0500 time 0.2552 (0.2631) data time 0.0012 (0.0035) model time 0.2539 (0.2584) loss 7.2353 (5.8947) grad_norm 1.5262 (2.2094) loss_scale 2048.0000 (1997.0829) mem 9655MB [2024-08-04 05:58:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][190/625] eta 0:01:54 lr 0.000797 wd 0.0500 time 0.2567 (0.2638) data time 0.0007 (0.0034) model time 0.2559 (0.2597) loss 5.6056 (5.9037) grad_norm 2.0100 (2.1819) loss_scale 2048.0000 (1999.7487) mem 9655MB [2024-08-04 05:58:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][200/625] eta 0:01:52 lr 0.000797 wd 0.0500 time 0.2560 (0.2645) data time 0.0009 (0.0033) model time 0.2551 (0.2608) loss 6.3479 (5.9243) grad_norm 1.5453 (2.1693) loss_scale 2048.0000 (2002.1493) mem 9655MB [2024-08-04 05:58:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][210/625] eta 0:01:49 lr 0.000797 wd 0.0500 time 0.2590 (0.2641) data time 0.0007 (0.0032) model time 0.2583 (0.2605) loss 6.5139 (5.9276) grad_norm 1.2755 (2.1460) loss_scale 2048.0000 (2004.3223) mem 9655MB [2024-08-04 05:58:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][220/625] eta 0:01:46 lr 0.000797 wd 0.0500 time 0.2555 (0.2638) data time 0.0012 (0.0031) model time 0.2543 (0.2602) loss 5.8230 (5.9299) grad_norm 2.2171 (2.1425) loss_scale 2048.0000 (2006.2986) mem 9655MB [2024-08-04 05:59:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][230/625] eta 0:01:44 lr 0.000797 wd 0.0500 time 0.2545 (0.2634) data time 0.0011 (0.0030) model time 0.2534 (0.2599) loss 7.6856 (5.9389) grad_norm 2.3772 (2.1250) loss_scale 2048.0000 (2008.1039) mem 
9655MB [2024-08-04 05:59:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][240/625] eta 0:01:41 lr 0.000796 wd 0.0500 time 0.2555 (0.2632) data time 0.0015 (0.0029) model time 0.2540 (0.2597) loss 6.4426 (5.9348) grad_norm 2.0748 (2.1176) loss_scale 2048.0000 (2009.7593) mem 9655MB [2024-08-04 05:59:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][250/625] eta 0:01:38 lr 0.000796 wd 0.0500 time 0.2535 (0.2629) data time 0.0009 (0.0028) model time 0.2527 (0.2595) loss 5.9211 (5.9504) grad_norm 1.3092 (2.1116) loss_scale 2048.0000 (2011.2829) mem 9655MB [2024-08-04 05:59:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][260/625] eta 0:01:35 lr 0.000796 wd 0.0500 time 0.2525 (0.2626) data time 0.0008 (0.0028) model time 0.2518 (0.2592) loss 6.9076 (5.9547) grad_norm 1.6846 (2.1079) loss_scale 2048.0000 (2012.6897) mem 9655MB [2024-08-04 05:59:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][270/625] eta 0:01:33 lr 0.000796 wd 0.0500 time 0.2547 (0.2624) data time 0.0009 (0.0027) model time 0.2538 (0.2591) loss 5.3731 (5.9400) grad_norm 2.0266 (2.1005) loss_scale 2048.0000 (2013.9926) mem 9655MB [2024-08-04 05:59:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][280/625] eta 0:01:30 lr 0.000796 wd 0.0500 time 0.2543 (0.2621) data time 0.0009 (0.0026) model time 0.2534 (0.2589) loss 5.7310 (5.9382) grad_norm 1.4943 (2.1175) loss_scale 2048.0000 (2015.2028) mem 9655MB [2024-08-04 05:59:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][290/625] eta 0:01:27 lr 0.000796 wd 0.0500 time 0.2532 (0.2620) data time 0.0006 (0.0026) model time 0.2525 (0.2588) loss 6.3689 (5.9354) grad_norm 3.6162 (2.1267) loss_scale 2048.0000 (2016.3299) mem 9655MB [2024-08-04 05:59:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][300/625] eta 0:01:25 lr 0.000795 wd 0.0500 time 
0.2524 (0.2618) data time 0.0010 (0.0025) model time 0.2514 (0.2586) loss 5.7873 (5.9279) grad_norm 3.2792 (2.1441) loss_scale 2048.0000 (2017.3821) mem 9655MB [2024-08-04 05:59:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][310/625] eta 0:01:22 lr 0.000795 wd 0.0500 time 0.2495 (0.2616) data time 0.0007 (0.0024) model time 0.2488 (0.2585) loss 5.4714 (5.9327) grad_norm 1.8025 (2.1488) loss_scale 2048.0000 (2018.3666) mem 9655MB [2024-08-04 05:59:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][320/625] eta 0:01:19 lr 0.000795 wd 0.0500 time 0.2592 (0.2615) data time 0.0008 (0.0024) model time 0.2584 (0.2584) loss 6.1644 (5.9375) grad_norm 2.1922 (2.1410) loss_scale 2048.0000 (2019.2897) mem 9655MB [2024-08-04 05:59:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][330/625] eta 0:01:17 lr 0.000795 wd 0.0500 time 0.2546 (0.2613) data time 0.0006 (0.0024) model time 0.2540 (0.2583) loss 5.3440 (5.9357) grad_norm 3.5548 (2.1339) loss_scale 2048.0000 (2020.1571) mem 9655MB [2024-08-04 05:59:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][340/625] eta 0:01:14 lr 0.000795 wd 0.0500 time 0.2572 (0.2612) data time 0.0013 (0.0023) model time 0.2560 (0.2582) loss 6.6842 (5.9369) grad_norm 2.0231 (2.1374) loss_scale 2048.0000 (2020.9736) mem 9655MB [2024-08-04 05:59:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][350/625] eta 0:01:12 lr 0.000794 wd 0.0500 time 0.2513 (0.2621) data time 0.0007 (0.0023) model time 0.2507 (0.2594) loss 5.0967 (5.9397) grad_norm 2.6492 (2.1484) loss_scale 2048.0000 (2021.7436) mem 9655MB [2024-08-04 05:59:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][360/625] eta 0:01:09 lr 0.000794 wd 0.0500 time 0.2688 (0.2620) data time 0.0008 (0.0022) model time 0.2680 (0.2593) loss 4.8350 (5.9271) grad_norm 1.4879 (2.1458) loss_scale 2048.0000 (2022.4709) mem 
9655MB [2024-08-04 05:59:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][370/625] eta 0:01:06 lr 0.000794 wd 0.0500 time 0.2561 (0.2619) data time 0.0012 (0.0022) model time 0.2549 (0.2593) loss 6.4718 (5.9294) grad_norm 1.4973 (2.1356) loss_scale 2048.0000 (2023.1590) mem 9655MB [2024-08-04 05:59:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][380/625] eta 0:01:04 lr 0.000794 wd 0.0500 time 0.2543 (0.2622) data time 0.0009 (0.0022) model time 0.2533 (0.2597) loss 7.0350 (5.9329) grad_norm 1.5872 (2.1215) loss_scale 2048.0000 (2023.8110) mem 9655MB [2024-08-04 05:59:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][390/625] eta 0:01:01 lr 0.000794 wd 0.0500 time 0.2563 (0.2620) data time 0.0008 (0.0021) model time 0.2555 (0.2595) loss 5.8520 (5.9468) grad_norm 1.9910 (2.1118) loss_scale 2048.0000 (2024.4297) mem 9655MB [2024-08-04 05:59:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][400/625] eta 0:00:58 lr 0.000794 wd 0.0500 time 0.2619 (0.2619) data time 0.0012 (0.0021) model time 0.2607 (0.2595) loss 6.2792 (5.9484) grad_norm 1.5397 (2.1109) loss_scale 2048.0000 (2025.0175) mem 9655MB [2024-08-04 05:59:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][410/625] eta 0:00:56 lr 0.000793 wd 0.0500 time 0.2530 (0.2618) data time 0.0008 (0.0021) model time 0.2522 (0.2594) loss 6.5292 (5.9551) grad_norm 1.7530 (2.1059) loss_scale 2048.0000 (2025.5766) mem 9655MB [2024-08-04 05:59:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][420/625] eta 0:00:53 lr 0.000793 wd 0.0500 time 0.2601 (0.2617) data time 0.0009 (0.0020) model time 0.2592 (0.2593) loss 6.8129 (5.9564) grad_norm 1.8578 (inf) loss_scale 1024.0000 (2011.5154) mem 9655MB [2024-08-04 05:59:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][430/625] eta 0:00:51 lr 0.000793 wd 0.0500 time 0.2552 
(0.2616) data time 0.0007 (0.0020) model time 0.2545 (0.2592) loss 5.7869 (5.9514) grad_norm 2.3060 (inf) loss_scale 1024.0000 (1988.6032) mem 9655MB [2024-08-04 05:59:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][440/625] eta 0:00:48 lr 0.000793 wd 0.0500 time 0.2574 (0.2616) data time 0.0009 (0.0020) model time 0.2566 (0.2592) loss 6.4987 (5.9509) grad_norm 3.2053 (inf) loss_scale 1024.0000 (1966.7302) mem 9655MB [2024-08-04 05:59:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][450/625] eta 0:00:45 lr 0.000793 wd 0.0500 time 0.2579 (0.2619) data time 0.0008 (0.0020) model time 0.2570 (0.2597) loss 6.1348 (5.9498) grad_norm 2.2978 (inf) loss_scale 1024.0000 (1945.8271) mem 9655MB [2024-08-04 06:00:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][460/625] eta 0:00:43 lr 0.000793 wd 0.0500 time 0.2571 (0.2623) data time 0.0011 (0.0020) model time 0.2561 (0.2601) loss 7.0291 (5.9459) grad_norm 2.1071 (inf) loss_scale 1024.0000 (1925.8308) mem 9655MB [2024-08-04 06:00:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][470/625] eta 0:00:40 lr 0.000792 wd 0.0500 time 0.2647 (0.2626) data time 0.0007 (0.0019) model time 0.2640 (0.2605) loss 5.0818 (5.9456) grad_norm 5.5309 (inf) loss_scale 1024.0000 (1906.6837) mem 9655MB [2024-08-04 06:00:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][480/625] eta 0:00:38 lr 0.000792 wd 0.0500 time 0.2600 (0.2625) data time 0.0011 (0.0019) model time 0.2589 (0.2604) loss 6.1977 (5.9432) grad_norm 1.8431 (inf) loss_scale 1024.0000 (1888.3326) mem 9655MB [2024-08-04 06:00:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][490/625] eta 0:00:35 lr 0.000792 wd 0.0500 time 0.2455 (0.2623) data time 0.0009 (0.0019) model time 0.2447 (0.2602) loss 5.2553 (5.9509) grad_norm 2.0391 (inf) loss_scale 1024.0000 (1870.7291) mem 9655MB [2024-08-04 06:00:13 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][500/625] eta 0:00:32 lr 0.000792 wd 0.0500 time 0.2559 (0.2622) data time 0.0007 (0.0019) model time 0.2551 (0.2601) loss 5.9389 (5.9476) grad_norm 2.0891 (inf) loss_scale 1024.0000 (1853.8283) mem 9655MB [2024-08-04 06:00:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][510/625] eta 0:00:30 lr 0.000792 wd 0.0500 time 0.2546 (0.2621) data time 0.0009 (0.0019) model time 0.2536 (0.2600) loss 5.6312 (5.9445) grad_norm 2.7677 (inf) loss_scale 1024.0000 (1837.5890) mem 9655MB [2024-08-04 06:00:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][520/625] eta 0:00:27 lr 0.000792 wd 0.0500 time 0.2557 (0.2620) data time 0.0010 (0.0018) model time 0.2547 (0.2599) loss 6.1750 (5.9430) grad_norm 1.9422 (inf) loss_scale 1024.0000 (1821.9731) mem 9655MB [2024-08-04 06:00:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][530/625] eta 0:00:24 lr 0.000791 wd 0.0500 time 0.2566 (0.2621) data time 0.0014 (0.0018) model time 0.2552 (0.2600) loss 5.3891 (5.9337) grad_norm 1.3287 (inf) loss_scale 1024.0000 (1806.9454) mem 9655MB [2024-08-04 06:00:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][540/625] eta 0:00:22 lr 0.000791 wd 0.0500 time 0.2516 (0.2620) data time 0.0010 (0.0018) model time 0.2506 (0.2599) loss 6.1567 (5.9277) grad_norm 1.4553 (inf) loss_scale 1024.0000 (1792.4732) mem 9655MB [2024-08-04 06:00:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][550/625] eta 0:00:19 lr 0.000791 wd 0.0500 time 0.2374 (0.2621) data time 0.0009 (0.0018) model time 0.2365 (0.2601) loss 5.6003 (5.9346) grad_norm 2.4124 (inf) loss_scale 1024.0000 (1778.5263) mem 9655MB [2024-08-04 06:00:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][560/625] eta 0:00:17 lr 0.000791 wd 0.0500 time 0.2528 (0.2620) data time 0.0007 (0.0018) model 
time 0.2521 (0.2600) loss 6.3662 (5.9351) grad_norm 2.9607 (inf) loss_scale 1024.0000 (1765.0766) mem 9655MB [2024-08-04 06:00:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][570/625] eta 0:00:14 lr 0.000791 wd 0.0500 time 0.2560 (0.2619) data time 0.0009 (0.0018) model time 0.2551 (0.2599) loss 5.4125 (5.9350) grad_norm 1.3427 (inf) loss_scale 1024.0000 (1752.0981) mem 9655MB [2024-08-04 06:00:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][580/625] eta 0:00:11 lr 0.000790 wd 0.0500 time 0.2561 (0.2618) data time 0.0011 (0.0018) model time 0.2549 (0.2598) loss 6.8019 (5.9394) grad_norm 2.0590 (inf) loss_scale 1024.0000 (1739.5663) mem 9655MB [2024-08-04 06:00:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][590/625] eta 0:00:09 lr 0.000790 wd 0.0500 time 0.2576 (0.2617) data time 0.0008 (0.0017) model time 0.2568 (0.2597) loss 6.0929 (5.9436) grad_norm 1.7372 (inf) loss_scale 1024.0000 (1727.4585) mem 9655MB [2024-08-04 06:00:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][600/625] eta 0:00:06 lr 0.000790 wd 0.0500 time 0.2578 (0.2619) data time 0.0011 (0.0017) model time 0.2567 (0.2600) loss 6.5015 (5.9510) grad_norm 1.7103 (inf) loss_scale 1024.0000 (1715.7537) mem 9655MB [2024-08-04 06:00:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][610/625] eta 0:00:03 lr 0.000790 wd 0.0500 time 0.2523 (0.2618) data time 0.0004 (0.0017) model time 0.2519 (0.2599) loss 6.3722 (5.9511) grad_norm 2.6829 (inf) loss_scale 1024.0000 (1704.4321) mem 9655MB [2024-08-04 06:00:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][620/625] eta 0:00:01 lr 0.000790 wd 0.0500 time 0.2567 (0.2617) data time 0.0006 (0.0017) model time 0.2561 (0.2598) loss 7.1228 (5.9519) grad_norm 2.1954 (inf) loss_scale 1024.0000 (1693.4750) mem 9655MB [2024-08-04 06:00:45 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 394): INFO EPOCH 179 training takes 0:02:43 [2024-08-04 06:00:45 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 06:00:45 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-08-04 06:00:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.560 (0.560) Loss 0.6318 (0.6318) Acc@1 88.721 (88.721) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 06:00:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.102) Loss 1.0322 (0.7934) Acc@1 79.248 (84.739) Acc@5 94.385 (97.235) Mem 9655MB [2024-08-04 06:00:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 1.1123 (0.9251) Acc@1 74.805 (81.285) Acc@5 94.238 (95.826) Mem 9655MB [2024-08-04 06:00:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.984 Acc@5 95.825 [2024-08-04 06:00:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.0% [2024-08-04 06:00:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.804 (0.804) Loss 0.5854 (0.5854) Acc@1 89.746 (89.746) Acc@5 98.535 (98.535) Mem 9655MB [2024-08-04 06:00:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.126) Loss 0.9287 (0.7238) Acc@1 80.225 (85.809) Acc@5 95.508 (97.501) Mem 9655MB [2024-08-04 06:00:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0576 (0.8515) Acc@1 75.635 (82.368) Acc@5 94.482 (96.175) Mem 9655MB [2024-08-04 06:00:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.072 Acc@5 96.165 [2024-08-04 06:00:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network 
on the 50000 test images: 82.1% [2024-08-04 06:00:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.07% [2024-08-04 06:00:49 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 06:00:50 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 06:00:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][0/625] eta 0:06:38 lr 0.000790 wd 0.0500 time 0.6373 (0.6373) data time 0.3912 (0.3912) model time 0.0000 (0.0000) loss 6.0014 (6.0014) grad_norm 2.1142 (2.1142) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:00:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][10/625] eta 0:03:11 lr 0.000790 wd 0.0500 time 0.2546 (0.3107) data time 0.0006 (0.0363) model time 0.0000 (0.0000) loss 5.4786 (6.0766) grad_norm 1.9048 (2.4866) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:00:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][20/625] eta 0:02:52 lr 0.000789 wd 0.0500 time 0.2563 (0.2847) data time 0.0011 (0.0195) model time 0.0000 (0.0000) loss 6.5637 (5.9001) grad_norm 1.3743 (2.1557) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:00:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][30/625] eta 0:02:43 lr 0.000789 wd 0.0500 time 0.2545 (0.2754) data time 0.0009 (0.0135) model time 0.0000 (0.0000) loss 5.7556 (5.7785) grad_norm 2.2767 (2.2511) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:01:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][40/625] eta 0:02:39 lr 0.000789 wd 0.0500 time 0.2538 (0.2733) data time 0.0010 (0.0104) model time 0.0000 (0.0000) loss 5.1152 (5.8174) grad_norm 1.9250 (2.1421) loss_scale 1024.0000 (1024.0000) mem 
9655MB [2024-08-04 06:01:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][50/625] eta 0:02:35 lr 0.000789 wd 0.0500 time 0.2525 (0.2699) data time 0.0006 (0.0086) model time 0.0000 (0.0000) loss 5.3458 (5.8120) grad_norm 1.6939 (2.0894) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:01:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][60/625] eta 0:02:31 lr 0.000789 wd 0.0500 time 0.2534 (0.2679) data time 0.0009 (0.0073) model time 0.2525 (0.2567) loss 7.4778 (5.8281) grad_norm 2.1887 (2.0648) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:01:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][70/625] eta 0:02:27 lr 0.000788 wd 0.0500 time 0.2523 (0.2662) data time 0.0008 (0.0064) model time 0.2515 (0.2556) loss 6.3473 (5.8358) grad_norm 2.2765 (2.0917) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:01:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][80/625] eta 0:02:24 lr 0.000788 wd 0.0500 time 0.2532 (0.2651) data time 0.0009 (0.0058) model time 0.2523 (0.2558) loss 5.9352 (5.8595) grad_norm 2.3435 (2.0716) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:01:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][90/625] eta 0:02:21 lr 0.000788 wd 0.0500 time 0.2573 (0.2641) data time 0.0006 (0.0053) model time 0.2567 (0.2557) loss 6.7640 (5.8942) grad_norm 1.3444 (2.0255) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:01:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][100/625] eta 0:02:18 lr 0.000788 wd 0.0500 time 0.2515 (0.2632) data time 0.0010 (0.0048) model time 0.2505 (0.2554) loss 4.6593 (5.8917) grad_norm 1.8225 (2.0241) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:01:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][110/625] eta 0:02:15 lr 0.000788 wd 0.0500 time 0.2584 
(0.2627) data time 0.0009 (0.0045) model time 0.2575 (0.2556) loss 6.1906 (5.9015) grad_norm 1.7028 (2.0227) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:01:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][120/625] eta 0:02:12 lr 0.000788 wd 0.0500 time 0.2539 (0.2621) data time 0.0010 (0.0042) model time 0.2529 (0.2554) loss 6.3629 (5.9166) grad_norm 4.2770 (2.0091) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:01:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][130/625] eta 0:02:09 lr 0.000787 wd 0.0500 time 0.2538 (0.2618) data time 0.0009 (0.0039) model time 0.2529 (0.2556) loss 6.8984 (5.9231) grad_norm 1.4066 (2.0169) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:01:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][140/625] eta 0:02:07 lr 0.000787 wd 0.0500 time 0.2509 (0.2625) data time 0.0010 (0.0037) model time 0.2499 (0.2573) loss 5.7908 (5.9220) grad_norm 1.5761 (2.0377) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:01:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][150/625] eta 0:02:04 lr 0.000787 wd 0.0500 time 0.2558 (0.2621) data time 0.0010 (0.0036) model time 0.2549 (0.2571) loss 6.1848 (5.9176) grad_norm 1.5921 (2.0268) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:01:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][160/625] eta 0:02:01 lr 0.000787 wd 0.0500 time 0.2528 (0.2617) data time 0.0010 (0.0034) model time 0.2518 (0.2569) loss 5.1966 (5.9178) grad_norm 1.8595 (2.0229) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:01:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][170/625] eta 0:01:58 lr 0.000787 wd 0.0500 time 0.2547 (0.2614) data time 0.0010 (0.0033) model time 0.2536 (0.2567) loss 6.3254 (5.9079) grad_norm 2.7002 (2.0353) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 06:01:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][180/625] eta 0:01:56 lr 0.000787 wd 0.0500 time 0.2555 (0.2610) data time 0.0009 (0.0032) model time 0.2545 (0.2565) loss 5.9981 (5.8980) grad_norm 1.3275 (2.0350) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:01:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][190/625] eta 0:01:53 lr 0.000786 wd 0.0500 time 0.2587 (0.2608) data time 0.0010 (0.0031) model time 0.2577 (0.2565) loss 5.3531 (5.9010) grad_norm 1.7138 (2.0239) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:01:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][200/625] eta 0:01:51 lr 0.000786 wd 0.0500 time 0.2569 (0.2616) data time 0.0006 (0.0029) model time 0.2563 (0.2578) loss 5.6821 (5.8908) grad_norm 3.9072 (2.0641) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:01:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][210/625] eta 0:01:48 lr 0.000786 wd 0.0500 time 0.2520 (0.2620) data time 0.0008 (0.0028) model time 0.2513 (0.2585) loss 5.8417 (5.8889) grad_norm 2.2084 (2.1064) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:01:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][220/625] eta 0:01:46 lr 0.000786 wd 0.0500 time 0.2569 (0.2617) data time 0.0007 (0.0028) model time 0.2563 (0.2583) loss 5.6379 (5.8835) grad_norm 1.5956 (2.1129) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:01:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][230/625] eta 0:01:43 lr 0.000786 wd 0.0500 time 0.2575 (0.2615) data time 0.0007 (0.0027) model time 0.2568 (0.2580) loss 4.8939 (5.8912) grad_norm 2.1811 (2.1122) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:01:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][240/625] eta 0:01:40 lr 0.000786 wd 0.0500 time 0.2587 
(0.2612) data time 0.0007 (0.0026) model time 0.2580 (0.2579) loss 6.6200 (5.9135) grad_norm 2.3056 (2.1139) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:01:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][250/625] eta 0:01:37 lr 0.000785 wd 0.0500 time 0.2557 (0.2611) data time 0.0009 (0.0025) model time 0.2548 (0.2578) loss 6.3869 (5.9183) grad_norm 4.7732 (2.1178) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:01:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][260/625] eta 0:01:35 lr 0.000785 wd 0.0500 time 0.2607 (0.2614) data time 0.0008 (0.0025) model time 0.2600 (0.2583) loss 5.2709 (5.9235) grad_norm 2.7005 (2.1372) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:02:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][270/625] eta 0:01:32 lr 0.000785 wd 0.0500 time 0.2569 (0.2612) data time 0.0008 (0.0024) model time 0.2561 (0.2582) loss 5.2386 (5.9193) grad_norm 1.5403 (2.1496) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:02:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][280/625] eta 0:01:30 lr 0.000785 wd 0.0500 time 0.2574 (0.2610) data time 0.0008 (0.0024) model time 0.2567 (0.2581) loss 6.2120 (5.9315) grad_norm 3.2528 (2.1414) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:02:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][290/625] eta 0:01:27 lr 0.000785 wd 0.0500 time 0.2519 (0.2609) data time 0.0011 (0.0023) model time 0.2508 (0.2580) loss 5.5176 (5.9327) grad_norm 2.7786 (2.1531) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:02:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][300/625] eta 0:01:24 lr 0.000785 wd 0.0500 time 0.2539 (0.2608) data time 0.0009 (0.0023) model time 0.2530 (0.2580) loss 6.7053 (5.9341) grad_norm 2.1945 (2.1554) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 06:02:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][310/625] eta 0:01:22 lr 0.000784 wd 0.0500 time 0.2598 (0.2607) data time 0.0008 (0.0022) model time 0.2589 (0.2579) loss 5.4226 (5.9294) grad_norm 1.5576 (2.1437) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:02:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][320/625] eta 0:01:19 lr 0.000784 wd 0.0500 time 0.2688 (0.2606) data time 0.0010 (0.0022) model time 0.2679 (0.2578) loss 5.3828 (5.9320) grad_norm 1.5896 (2.1303) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:02:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][330/625] eta 0:01:16 lr 0.000784 wd 0.0500 time 0.2555 (0.2604) data time 0.0007 (0.0022) model time 0.2548 (0.2577) loss 7.5273 (5.9383) grad_norm 2.9296 (2.1314) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:02:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][340/625] eta 0:01:14 lr 0.000784 wd 0.0500 time 0.2578 (0.2608) data time 0.0009 (0.0021) model time 0.2569 (0.2582) loss 5.5820 (5.9367) grad_norm 1.7169 (2.1547) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:02:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][350/625] eta 0:01:11 lr 0.000784 wd 0.0500 time 0.2528 (0.2612) data time 0.0009 (0.0021) model time 0.2520 (0.2588) loss 4.8991 (5.9397) grad_norm 2.4963 (2.1688) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:02:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][360/625] eta 0:01:09 lr 0.000783 wd 0.0500 time 0.2567 (0.2611) data time 0.0008 (0.0021) model time 0.2560 (0.2586) loss 5.7926 (5.9489) grad_norm 2.0750 (2.1710) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:02:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][370/625] eta 0:01:06 lr 0.000783 wd 0.0500 time 0.2575 
(0.2610) data time 0.0006 (0.0020) model time 0.2568 (0.2585) loss 7.0852 (5.9420) grad_norm 2.2518 (2.1667) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:02:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][380/625] eta 0:01:03 lr 0.000783 wd 0.0500 time 0.2661 (0.2609) data time 0.0008 (0.0020) model time 0.2653 (0.2585) loss 6.8987 (5.9388) grad_norm 2.5993 (2.1638) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:02:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][390/625] eta 0:01:01 lr 0.000783 wd 0.0500 time 0.2634 (0.2608) data time 0.0010 (0.0020) model time 0.2625 (0.2584) loss 5.9127 (5.9451) grad_norm 1.9279 (2.1674) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:02:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][400/625] eta 0:00:58 lr 0.000783 wd 0.0500 time 0.2548 (0.2612) data time 0.0010 (0.0019) model time 0.2537 (0.2589) loss 6.0606 (5.9503) grad_norm 2.4570 (2.1666) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:02:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][410/625] eta 0:00:56 lr 0.000783 wd 0.0500 time 0.2571 (0.2611) data time 0.0008 (0.0019) model time 0.2563 (0.2588) loss 6.7806 (5.9529) grad_norm 2.7833 (2.1592) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:02:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][420/625] eta 0:00:53 lr 0.000782 wd 0.0500 time 0.2522 (0.2609) data time 0.0008 (0.0019) model time 0.2514 (0.2587) loss 4.5746 (5.9556) grad_norm 1.4169 (2.1620) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:02:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][430/625] eta 0:00:50 lr 0.000782 wd 0.0500 time 0.2532 (0.2613) data time 0.0011 (0.0019) model time 0.2521 (0.2592) loss 4.7708 (5.9559) grad_norm 1.6077 (2.1622) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 06:02:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][440/625] eta 0:00:48 lr 0.000782 wd 0.0500 time 0.2543 (0.2612) data time 0.0007 (0.0019) model time 0.2536 (0.2590) loss 4.8721 (5.9612) grad_norm 2.6393 (2.1731) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:02:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][450/625] eta 0:00:45 lr 0.000782 wd 0.0500 time 0.2559 (0.2610) data time 0.0008 (0.0018) model time 0.2550 (0.2589) loss 6.1136 (5.9621) grad_norm 2.9328 (2.1964) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:02:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][460/625] eta 0:00:43 lr 0.000782 wd 0.0500 time 0.2585 (0.2609) data time 0.0009 (0.0018) model time 0.2577 (0.2588) loss 4.9029 (5.9639) grad_norm 2.3018 (2.2070) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:02:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][470/625] eta 0:00:40 lr 0.000782 wd 0.0500 time 0.2511 (0.2608) data time 0.0012 (0.0018) model time 0.2499 (0.2587) loss 6.5895 (5.9649) grad_norm 1.7691 (2.2097) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:02:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][480/625] eta 0:00:37 lr 0.000781 wd 0.0500 time 0.2510 (0.2607) data time 0.0007 (0.0018) model time 0.2503 (0.2586) loss 6.4198 (5.9669) grad_norm 1.4597 (2.2216) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:02:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][490/625] eta 0:00:35 lr 0.000781 wd 0.0500 time 0.2582 (0.2606) data time 0.0006 (0.0018) model time 0.2576 (0.2586) loss 6.2468 (5.9702) grad_norm 1.9046 (2.2191) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:03:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][500/625] eta 0:00:32 lr 0.000781 wd 0.0500 time 0.3890 
(0.2608) data time 0.0010 (0.0017) model time 0.3880 (0.2588) loss 6.6803 (5.9711) grad_norm 2.4087 (2.2168) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:03:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][510/625] eta 0:00:29 lr 0.000781 wd 0.0500 time 0.2537 (0.2607) data time 0.0017 (0.0017) model time 0.2520 (0.2587) loss 4.5308 (5.9605) grad_norm 1.4123 (2.2103) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:03:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][520/625] eta 0:00:27 lr 0.000781 wd 0.0500 time 0.2746 (0.2606) data time 0.0006 (0.0017) model time 0.2739 (0.2586) loss 5.4338 (5.9561) grad_norm 2.1480 (2.2043) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:03:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][530/625] eta 0:00:24 lr 0.000781 wd 0.0500 time 0.2566 (0.2605) data time 0.0008 (0.0017) model time 0.2559 (0.2585) loss 6.1938 (5.9595) grad_norm 1.4830 (2.1941) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:03:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][540/625] eta 0:00:22 lr 0.000780 wd 0.0500 time 0.2554 (0.2605) data time 0.0010 (0.0017) model time 0.2544 (0.2585) loss 5.4793 (5.9642) grad_norm 2.1203 (2.1905) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:03:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][550/625] eta 0:00:19 lr 0.000780 wd 0.0500 time 0.2608 (0.2604) data time 0.0007 (0.0017) model time 0.2601 (0.2585) loss 5.1717 (5.9651) grad_norm 1.7818 (2.1946) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:03:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][560/625] eta 0:00:16 lr 0.000780 wd 0.0500 time 0.2534 (0.2607) data time 0.0007 (0.0017) model time 0.2528 (0.2588) loss 5.9714 (5.9645) grad_norm 2.5796 (2.1978) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 06:03:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][570/625] eta 0:00:14 lr 0.000780 wd 0.0500 time 0.2547 (0.2606) data time 0.0007 (0.0017) model time 0.2540 (0.2587) loss 5.0594 (5.9609) grad_norm 2.3384 (2.2031) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:03:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][580/625] eta 0:00:11 lr 0.000780 wd 0.0500 time 0.2552 (0.2605) data time 0.0007 (0.0016) model time 0.2545 (0.2586) loss 6.0636 (5.9570) grad_norm 1.9414 (2.2005) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:03:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][590/625] eta 0:00:09 lr 0.000779 wd 0.0500 time 0.2536 (0.2604) data time 0.0008 (0.0016) model time 0.2528 (0.2585) loss 6.1518 (5.9614) grad_norm 1.3493 (2.2003) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:03:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][600/625] eta 0:00:06 lr 0.000779 wd 0.0500 time 0.2668 (0.2604) data time 0.0006 (0.0016) model time 0.2661 (0.2585) loss 5.2567 (5.9613) grad_norm 1.6524 (2.1931) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:03:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][610/625] eta 0:00:03 lr 0.000779 wd 0.0500 time 0.2529 (0.2603) data time 0.0006 (0.0016) model time 0.2523 (0.2584) loss 7.0179 (5.9618) grad_norm 1.4832 (2.1857) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:03:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][620/625] eta 0:00:01 lr 0.000779 wd 0.0500 time 0.2574 (0.2602) data time 0.0005 (0.0016) model time 0.2569 (0.2583) loss 6.5924 (5.9645) grad_norm 2.4471 (2.1812) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:03:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 180 training takes 0:02:42 [2024-08-04 06:03:32 
vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 06:03:33 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-08-04 06:03:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.503 (0.503) Loss 0.6396 (0.6396) Acc@1 89.258 (89.258) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 06:03:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 1.0205 (0.7817) Acc@1 78.369 (85.005) Acc@5 95.312 (97.332) Mem 9655MB [2024-08-04 06:03:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.1445 (0.9213) Acc@1 74.512 (81.406) Acc@5 94.141 (95.859) Mem 9655MB [2024-08-04 06:03:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.126 Acc@5 95.839 [2024-08-04 06:03:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.1% [2024-08-04 06:03:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.876 (0.876) Loss 0.5850 (0.5850) Acc@1 89.697 (89.697) Acc@5 98.535 (98.535) Mem 9655MB [2024-08-04 06:03:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.132) Loss 0.9277 (0.7233) Acc@1 80.322 (85.835) Acc@5 95.508 (97.523) Mem 9655MB [2024-08-04 06:03:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.095) Loss 1.0586 (0.8508) Acc@1 75.830 (82.408) Acc@5 94.580 (96.208) Mem 9655MB [2024-08-04 06:03:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.106 Acc@5 96.197 [2024-08-04 06:03:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.1% [2024-08-04 06:03:37 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.11% [2024-08-04 06:03:37 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 06:03:38 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 06:03:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][0/625] eta 0:07:54 lr 0.000779 wd 0.0500 time 0.7600 (0.7600) data time 0.5204 (0.5204) model time 0.0000 (0.0000) loss 4.9243 (4.9243) grad_norm 1.8783 (1.8783) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:03:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][10/625] eta 0:03:05 lr 0.000779 wd 0.0500 time 0.2576 (0.3015) data time 0.0007 (0.0481) model time 0.0000 (0.0000) loss 6.7670 (5.9006) grad_norm 2.0372 (2.1433) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:03:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][20/625] eta 0:02:49 lr 0.000779 wd 0.0500 time 0.2577 (0.2804) data time 0.0006 (0.0256) model time 0.0000 (0.0000) loss 7.1231 (5.9755) grad_norm 1.7186 (2.1441) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:03:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][30/625] eta 0:02:48 lr 0.000778 wd 0.0500 time 0.2572 (0.2836) data time 0.0007 (0.0176) model time 0.0000 (0.0000) loss 6.5206 (5.9311) grad_norm 2.1048 (2.0950) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:03:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][40/625] eta 0:02:44 lr 0.000778 wd 0.0500 time 0.2603 (0.2819) data time 0.0006 (0.0136) model time 0.0000 (0.0000) loss 6.6490 (5.9034) grad_norm 3.1896 (2.1739) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:03:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [181/300][50/625] eta 0:02:39 lr 0.000778 wd 0.0500 time 0.2663 (0.2771) data time 0.0009 (0.0111) model time 0.0000 (0.0000) loss 5.5834 (5.8669) grad_norm 3.5138 (2.2081) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:03:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][60/625] eta 0:02:35 lr 0.000778 wd 0.0500 time 0.3722 (0.2756) data time 0.0008 (0.0094) model time 0.3714 (0.2668) loss 4.8936 (5.8686) grad_norm 1.4181 (2.1842) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:03:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][70/625] eta 0:02:32 lr 0.000778 wd 0.0500 time 0.2556 (0.2754) data time 0.0006 (0.0083) model time 0.2549 (0.2701) loss 6.5669 (5.8715) grad_norm 1.6895 (2.1756) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:04:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][80/625] eta 0:02:28 lr 0.000778 wd 0.0500 time 0.2574 (0.2730) data time 0.0015 (0.0074) model time 0.2560 (0.2649) loss 6.7606 (5.9073) grad_norm 2.3626 (2.2001) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:04:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][90/625] eta 0:02:24 lr 0.000777 wd 0.0500 time 0.2515 (0.2710) data time 0.0007 (0.0067) model time 0.2507 (0.2622) loss 5.8661 (5.9230) grad_norm 2.1004 (2.2017) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:04:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][100/625] eta 0:02:21 lr 0.000777 wd 0.0500 time 0.2570 (0.2695) data time 0.0011 (0.0061) model time 0.2558 (0.2608) loss 6.1743 (5.9374) grad_norm 2.2728 (2.1625) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:04:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][110/625] eta 0:02:18 lr 0.000777 wd 0.0500 time 0.2568 (0.2682) data time 0.0007 (0.0056) model time 0.2561 (0.2597) loss 6.2894 (5.9127) 
grad_norm 3.0621 (2.1703) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:04:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][120/625] eta 0:02:14 lr 0.000777 wd 0.0500 time 0.2549 (0.2672) data time 0.0011 (0.0052) model time 0.2538 (0.2590) loss 5.7053 (5.9334) grad_norm 2.6781 (2.1500) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:04:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][130/625] eta 0:02:11 lr 0.000777 wd 0.0500 time 0.2562 (0.2663) data time 0.0008 (0.0049) model time 0.2554 (0.2584) loss 6.5712 (5.9361) grad_norm 1.3136 (2.1377) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:04:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][140/625] eta 0:02:09 lr 0.000776 wd 0.0500 time 0.2538 (0.2669) data time 0.0007 (0.0046) model time 0.2531 (0.2602) loss 5.9676 (5.9288) grad_norm 1.7925 (2.1212) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:04:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][150/625] eta 0:02:07 lr 0.000776 wd 0.0500 time 0.2519 (0.2675) data time 0.0009 (0.0044) model time 0.2510 (0.2616) loss 6.9284 (5.9592) grad_norm 1.6431 (2.1456) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:04:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][160/625] eta 0:02:04 lr 0.000776 wd 0.0500 time 0.2571 (0.2679) data time 0.0008 (0.0042) model time 0.2563 (0.2626) loss 5.0710 (5.9466) grad_norm 3.8921 (2.1603) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:04:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][170/625] eta 0:02:01 lr 0.000776 wd 0.0500 time 0.2596 (0.2672) data time 0.0006 (0.0040) model time 0.2590 (0.2620) loss 5.4809 (5.9253) grad_norm 2.3907 (2.1763) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:04:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO 
Train: [181/300][180/625] eta 0:01:58 lr 0.000776 wd 0.0500 time 0.2549 (0.2666) data time 0.0006 (0.0038) model time 0.2543 (0.2615) loss 5.2700 (5.9070) grad_norm 1.6034 (2.1717) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:04:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][190/625] eta 0:01:55 lr 0.000776 wd 0.0500 time 0.2560 (0.2660) data time 0.0009 (0.0037) model time 0.2551 (0.2611) loss 5.6310 (5.8875) grad_norm 1.6612 (2.1714) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:04:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][200/625] eta 0:01:52 lr 0.000775 wd 0.0500 time 0.2568 (0.2656) data time 0.0007 (0.0035) model time 0.2561 (0.2608) loss 4.5779 (5.8816) grad_norm 3.8768 (2.1781) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:04:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][210/625] eta 0:01:50 lr 0.000775 wd 0.0500 time 0.2549 (0.2651) data time 0.0011 (0.0034) model time 0.2538 (0.2604) loss 6.3504 (5.8783) grad_norm 2.3857 (2.1660) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:04:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][220/625] eta 0:01:47 lr 0.000775 wd 0.0500 time 0.2556 (0.2647) data time 0.0006 (0.0033) model time 0.2549 (0.2600) loss 5.2570 (5.8677) grad_norm 2.3009 (2.1738) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:04:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][230/625] eta 0:01:44 lr 0.000775 wd 0.0500 time 0.2572 (0.2643) data time 0.0008 (0.0032) model time 0.2564 (0.2598) loss 4.7159 (5.8615) grad_norm 1.6292 (2.1703) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:04:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][240/625] eta 0:01:41 lr 0.000775 wd 0.0500 time 0.2536 (0.2640) data time 0.0009 (0.0031) model time 0.2527 (0.2595) loss 6.5013 (5.8608) 
grad_norm 1.5520 (2.1471) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:04:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][250/625] eta 0:01:38 lr 0.000775 wd 0.0500 time 0.2561 (0.2637) data time 0.0007 (0.0030) model time 0.2555 (0.2593) loss 7.2264 (5.8784) grad_norm 2.8426 (2.1408) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:04:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][260/625] eta 0:01:36 lr 0.000774 wd 0.0500 time 0.2542 (0.2634) data time 0.0008 (0.0029) model time 0.2533 (0.2591) loss 6.6269 (5.8843) grad_norm 2.8520 (2.1430) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:04:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][270/625] eta 0:01:33 lr 0.000774 wd 0.0500 time 0.2546 (0.2631) data time 0.0008 (0.0029) model time 0.2538 (0.2589) loss 5.1470 (5.8873) grad_norm 2.4590 (2.1338) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:04:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][280/625] eta 0:01:30 lr 0.000774 wd 0.0500 time 0.2560 (0.2628) data time 0.0009 (0.0028) model time 0.2550 (0.2587) loss 7.2868 (5.8926) grad_norm 1.5029 (2.1279) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:04:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][290/625] eta 0:01:27 lr 0.000774 wd 0.0500 time 0.2585 (0.2627) data time 0.0008 (0.0027) model time 0.2577 (0.2587) loss 6.0556 (5.9057) grad_norm 2.3863 (2.1257) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:04:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][300/625] eta 0:01:25 lr 0.000774 wd 0.0500 time 0.2576 (0.2631) data time 0.0010 (0.0027) model time 0.2565 (0.2593) loss 6.0367 (5.8987) grad_norm 2.3188 (2.1731) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:04:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO 
Train: [181/300][310/625] eta 0:01:22 lr 0.000774 wd 0.0500 time 0.2535 (0.2628) data time 0.0005 (0.0026) model time 0.2529 (0.2591) loss 6.6809 (5.9013) grad_norm 3.8938 (2.1968) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:05:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][320/625] eta 0:01:20 lr 0.000773 wd 0.0500 time 0.2565 (0.2626) data time 0.0012 (0.0025) model time 0.2553 (0.2590) loss 6.2799 (5.9148) grad_norm 3.9993 (2.2024) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:05:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][330/625] eta 0:01:17 lr 0.000773 wd 0.0500 time 0.2604 (0.2624) data time 0.0006 (0.0025) model time 0.2599 (0.2588) loss 4.9382 (5.9099) grad_norm 1.3902 (2.2035) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:05:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][340/625] eta 0:01:14 lr 0.000773 wd 0.0500 time 0.2565 (0.2623) data time 0.0006 (0.0025) model time 0.2559 (0.2587) loss 5.7373 (5.9088) grad_norm 2.1343 (2.1955) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:05:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][350/625] eta 0:01:12 lr 0.000773 wd 0.0500 time 0.2555 (0.2630) data time 0.0009 (0.0024) model time 0.2546 (0.2597) loss 6.7484 (5.8977) grad_norm 1.6388 (2.1831) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:05:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][360/625] eta 0:01:09 lr 0.000773 wd 0.0500 time 0.2556 (0.2629) data time 0.0009 (0.0024) model time 0.2547 (0.2596) loss 5.3398 (5.8957) grad_norm 2.4534 (2.1712) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:05:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][370/625] eta 0:01:06 lr 0.000773 wd 0.0500 time 0.2554 (0.2627) data time 0.0010 (0.0023) model time 0.2545 (0.2595) loss 6.9940 (5.9077) 
grad_norm 1.9136 (2.1658) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:05:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][380/625] eta 0:01:04 lr 0.000772 wd 0.0500 time 0.2550 (0.2626) data time 0.0008 (0.0023) model time 0.2542 (0.2594) loss 5.5246 (5.9031) grad_norm 1.8612 (2.1673) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:05:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][390/625] eta 0:01:01 lr 0.000772 wd 0.0500 time 0.2540 (0.2625) data time 0.0010 (0.0023) model time 0.2531 (0.2593) loss 5.0586 (5.9069) grad_norm 2.9673 (2.1639) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:05:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][400/625] eta 0:00:59 lr 0.000772 wd 0.0500 time 0.2587 (0.2628) data time 0.0009 (0.0022) model time 0.2578 (0.2598) loss 5.0180 (5.9083) grad_norm 2.7571 (2.1661) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:05:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][410/625] eta 0:00:56 lr 0.000772 wd 0.0500 time 0.2563 (0.2626) data time 0.0007 (0.0022) model time 0.2556 (0.2596) loss 6.4685 (5.9124) grad_norm 1.2394 (2.1588) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:05:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][420/625] eta 0:00:53 lr 0.000772 wd 0.0500 time 0.2594 (0.2625) data time 0.0008 (0.0022) model time 0.2586 (0.2595) loss 6.8968 (5.9137) grad_norm 1.5559 (2.1470) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:05:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][430/625] eta 0:00:51 lr 0.000771 wd 0.0500 time 0.2530 (0.2623) data time 0.0009 (0.0021) model time 0.2521 (0.2594) loss 6.1429 (5.9148) grad_norm 3.1264 (2.1547) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:05:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO 
Train: [181/300][440/625] eta 0:00:48 lr 0.000771 wd 0.0500 time 0.2537 (0.2622) data time 0.0009 (0.0021) model time 0.2527 (0.2593) loss 6.3620 (5.9158) grad_norm 2.8830 (2.1653) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:05:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][450/625] eta 0:00:45 lr 0.000771 wd 0.0500 time 0.2566 (0.2624) data time 0.0007 (0.0021) model time 0.2559 (0.2596) loss 6.6647 (5.9211) grad_norm 4.7710 (2.1752) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:05:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][460/625] eta 0:00:43 lr 0.000771 wd 0.0500 time 0.2556 (0.2627) data time 0.0007 (0.0021) model time 0.2549 (0.2599) loss 6.5998 (5.9337) grad_norm 4.4914 (2.2113) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:05:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][470/625] eta 0:00:40 lr 0.000771 wd 0.0500 time 0.2588 (0.2626) data time 0.0009 (0.0020) model time 0.2579 (0.2599) loss 6.5596 (5.9434) grad_norm 2.9094 (2.2174) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:05:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][480/625] eta 0:00:38 lr 0.000771 wd 0.0500 time 0.2562 (0.2624) data time 0.0008 (0.0020) model time 0.2554 (0.2597) loss 6.3251 (5.9563) grad_norm 2.6974 (2.2176) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:05:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][490/625] eta 0:00:35 lr 0.000770 wd 0.0500 time 0.2537 (0.2627) data time 0.0010 (0.0020) model time 0.2526 (0.2601) loss 6.4771 (5.9619) grad_norm 3.1436 (2.2360) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:05:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][500/625] eta 0:00:32 lr 0.000770 wd 0.0500 time 0.2566 (0.2628) data time 0.0009 (0.0020) model time 0.2557 (0.2603) loss 5.5966 (5.9593) 
grad_norm 3.3279 (2.2465) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:05:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][510/625] eta 0:00:30 lr 0.000770 wd 0.0500 time 0.2601 (0.2627) data time 0.0009 (0.0019) model time 0.2592 (0.2602) loss 7.0273 (5.9567) grad_norm 4.7852 (2.2569) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:05:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][520/625] eta 0:00:27 lr 0.000770 wd 0.0500 time 0.2528 (0.2626) data time 0.0010 (0.0019) model time 0.2518 (0.2601) loss 6.7403 (5.9597) grad_norm 1.9798 (2.2535) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:05:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][530/625] eta 0:00:24 lr 0.000770 wd 0.0500 time 0.2566 (0.2624) data time 0.0010 (0.0019) model time 0.2556 (0.2600) loss 6.1195 (5.9644) grad_norm 3.1950 (2.2607) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:06:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][540/625] eta 0:00:22 lr 0.000770 wd 0.0500 time 0.2549 (0.2627) data time 0.0006 (0.0019) model time 0.2543 (0.2602) loss 5.8416 (5.9658) grad_norm 2.4174 (2.2555) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:06:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][550/625] eta 0:00:19 lr 0.000769 wd 0.0500 time 0.2554 (0.2626) data time 0.0009 (0.0019) model time 0.2544 (0.2601) loss 5.6488 (5.9663) grad_norm 2.4501 (2.2532) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:06:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][560/625] eta 0:00:17 lr 0.000769 wd 0.0500 time 0.2618 (0.2624) data time 0.0008 (0.0019) model time 0.2610 (0.2600) loss 6.5503 (5.9676) grad_norm 2.3263 (2.2535) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:06:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO 
Train: [181/300][570/625] eta 0:00:14 lr 0.000769 wd 0.0500 time 0.2574 (0.2623) data time 0.0008 (0.0018) model time 0.2566 (0.2600) loss 6.8180 (5.9718) grad_norm 3.9587 (2.2626) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:06:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][580/625] eta 0:00:11 lr 0.000769 wd 0.0500 time 0.2609 (0.2622) data time 0.0007 (0.0018) model time 0.2602 (0.2599) loss 6.8502 (5.9758) grad_norm 1.6770 (2.2662) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:06:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][590/625] eta 0:00:09 lr 0.000769 wd 0.0500 time 0.2563 (0.2621) data time 0.0008 (0.0018) model time 0.2554 (0.2598) loss 6.1617 (5.9743) grad_norm 2.1064 (2.2645) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:06:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][600/625] eta 0:00:06 lr 0.000769 wd 0.0500 time 0.2578 (0.2620) data time 0.0006 (0.0018) model time 0.2572 (0.2597) loss 6.8391 (5.9729) grad_norm 2.0729 (2.2583) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:06:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][610/625] eta 0:00:03 lr 0.000768 wd 0.0500 time 0.2523 (0.2619) data time 0.0006 (0.0018) model time 0.2517 (0.2596) loss 5.7581 (5.9708) grad_norm 2.8903 (2.2549) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:06:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][620/625] eta 0:00:01 lr 0.000768 wd 0.0500 time 0.2531 (0.2618) data time 0.0003 (0.0018) model time 0.2527 (0.2595) loss 5.3173 (5.9674) grad_norm 1.4689 (2.2478) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:06:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 181 training takes 0:02:43 [2024-08-04 06:06:21 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO 
./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 06:06:22 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-08-04 06:06:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.484 (0.484) Loss 0.6309 (0.6309) Acc@1 88.574 (88.574) Acc@5 98.291 (98.291) Mem 9655MB [2024-08-04 06:06:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 0.9814 (0.7757) Acc@1 78.906 (85.116) Acc@5 95.215 (97.301) Mem 9655MB [2024-08-04 06:06:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0928 (0.9036) Acc@1 75.342 (81.731) Acc@5 94.092 (95.896) Mem 9655MB [2024-08-04 06:06:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.344 Acc@5 95.883 [2024-08-04 06:06:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.3% [2024-08-04 06:06:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.34% [2024-08-04 06:06:23 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 06:06:24 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 06:06:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.454 (0.454) Loss 0.5845 (0.5845) Acc@1 89.697 (89.697) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 06:06:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.093) Loss 0.9272 (0.7227) Acc@1 80.322 (85.840) Acc@5 95.557 (97.541) Mem 9655MB [2024-08-04 06:06:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.075) Loss 1.0576 (0.8501) Acc@1 75.879 (82.424) Acc@5 94.580 (96.222) Mem 9655MB [2024-08-04 06:06:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.112 Acc@5 96.209 [2024-08-04 06:06:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.1% [2024-08-04 06:06:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.11% [2024-08-04 06:06:26 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 06:06:26 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 06:06:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][0/625] eta 0:07:07 lr 0.000768 wd 0.0500 time 0.6840 (0.6840) data time 0.4407 (0.4407) model time 0.0000 (0.0000) loss 5.0471 (5.0471) grad_norm 1.4728 (1.4728) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:06:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][10/625] eta 0:03:02 lr 0.000768 wd 0.0500 time 0.2493 (0.2965) data time 0.0007 (0.0409) model time 0.0000 (0.0000) loss 7.5500 (6.0607) grad_norm 1.4062 (1.7726) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:06:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][20/625] eta 0:02:54 lr 0.000768 wd 0.0500 time 0.2567 (0.2877) data time 0.0007 (0.0219) model time 0.0000 (0.0000) loss 6.6540 (6.0125) grad_norm 2.1074 (1.8765) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:06:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][30/625] eta 0:02:45 lr 0.000768 wd 0.0500 time 0.2558 (0.2776) data time 0.0006 (0.0151) model time 0.0000 (0.0000) loss 5.5861 (5.8840) grad_norm 3.7178 (2.2079) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:06:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][40/625] eta 0:02:40 lr 0.000767 wd 0.0500 time 0.2553 (0.2745) data time 0.0009 (0.0117) model time 0.0000 (0.0000) loss 6.7562 (5.9254) grad_norm 3.3590 (2.2173) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:06:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][50/625] eta 0:02:35 lr 0.000767 wd 0.0500 time 0.2541 (0.2707) data time 0.0013 (0.0096) model time 0.0000 (0.0000) loss 5.7206 (5.9847) grad_norm 1.5795 (2.1690) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:06:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][60/625] eta 0:02:31 lr 0.000767 wd 0.0500 time 0.2571 (0.2687) data time 0.0008 (0.0081) model time 0.2563 (0.2574) loss 4.9727 (5.9634) grad_norm 2.2531 (2.1428) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:06:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][70/625] eta 0:02:28 lr 0.000767 wd 0.0500 time 0.2552 (0.2668) data time 0.0010 (0.0071) model time 0.2542 (0.2559) loss 5.9912 (5.9696) grad_norm 3.2415 (2.1143) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:06:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][80/625] eta 0:02:24 lr 0.000767 wd 0.0500 time 0.2572 (0.2653) data time 0.0007 (0.0064) model time 0.2566 (0.2552) loss 7.5587 (6.0562) grad_norm 1.6854 (2.1129) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:06:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][90/625] eta 0:02:21 lr 0.000767 wd 0.0500 time 0.2534 (0.2643) data time 0.0007 (0.0058) model time 0.2527 (0.2552) loss 4.7512 (6.0468) grad_norm 2.7610 (2.1123) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:06:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][100/625] eta 0:02:19 lr 0.000766 wd 0.0500 time 0.4683 (0.2655) data time 0.0010 (0.0053) model time 0.4673 (0.2592) loss 6.4116 (6.0852) grad_norm 1.5109 (2.1811) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:06:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][110/625] eta 0:02:17 lr 0.000766 wd 0.0500 time 0.2574 (0.2665) data time 0.0010 (0.0049) model time 0.2564 (0.2619) loss 6.1136 (6.0629) grad_norm 1.6845 (2.1477) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:06:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][120/625] eta 0:02:14 lr 0.000766 wd 0.0500 time 0.2550 (0.2672) data time 0.0008 (0.0046) model time 0.2542 (0.2637) loss 5.2480 (6.0410) grad_norm 1.4306 (2.1464) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:07:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][130/625] eta 0:02:11 lr 0.000766 wd 0.0500 time 0.2552 (0.2664) data time 0.0008 (0.0043) model time 0.2544 (0.2627) loss 5.6486 (6.0318) grad_norm 1.8677 (2.1503) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:07:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][140/625] eta 0:02:08 lr 0.000766 wd 0.0500 time 0.2564 (0.2657) data time 0.0009 (0.0041) model time 0.2555 (0.2619) loss 5.9953 (6.0479) grad_norm 1.7681 (2.1227) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:07:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][150/625] eta 0:02:06 lr 0.000766 wd 0.0500 time 0.2535 (0.2660) data time 0.0011 (0.0039) model time 0.2525 (0.2626) loss 6.9887 (6.0475) grad_norm 1.6883 (2.1032) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:07:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][160/625] eta 0:02:03 lr 0.000765 wd 0.0500 time 0.2560 (0.2654) data time 0.0008 (0.0037) model time 0.2551 (0.2619) loss 5.9730 (6.0473) grad_norm 2.3783 (2.0823) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:07:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][170/625] eta 0:02:00 lr 0.000765 wd 0.0500 time 0.2518 (0.2648) data time 0.0007 (0.0035) model time 0.2511 (0.2613) loss 6.1773 (6.0591) grad_norm 3.0219 (2.0971) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:07:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][180/625] eta 0:01:57 lr 0.000765 wd 0.0500 time 0.2626 (0.2643) data time 0.0006 (0.0034) model time 0.2620 (0.2609) loss 5.0408 (6.0529) grad_norm 3.5394 (2.1198) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:07:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][190/625] eta 0:01:54 lr 0.000765 wd 0.0500 time 0.2557 (0.2639) data time 0.0008 (0.0033) model time 0.2550 (0.2604) loss 7.4859 (6.0665) grad_norm 2.1571 (2.1391) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:07:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][200/625] eta 0:01:51 lr 0.000765 wd 0.0500 time 0.2559 (0.2635) data time 0.0012 (0.0031) model time 0.2547 (0.2601) loss 4.8378 (6.0467) grad_norm 1.5511 (2.1441) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:07:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][210/625] eta 0:01:49 lr 0.000764 wd 0.0500 time 0.2538 (0.2632) data time 0.0010 (0.0030) model time 0.2528 (0.2598) loss 6.5559 (6.0483) grad_norm 1.7951 (2.1398) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:07:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][220/625] eta 0:01:46 lr 0.000764 wd 0.0500 time 0.2533 (0.2628) data time 0.0008 (0.0029) model time 0.2525 (0.2595) loss 6.0066 (6.0341) grad_norm 3.5888 (2.1378) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:07:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][230/625] eta 0:01:43 lr 0.000764 wd 0.0500 time 0.2521 (0.2625) data time 0.0007 (0.0028) model time 0.2514 (0.2593) loss 5.2593 (6.0124) grad_norm 1.7226 (2.1521) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:07:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][240/625] eta 0:01:40 lr 0.000764 wd 0.0500 time 0.2557 (0.2623) data time 0.0009 (0.0028) model time 0.2548 (0.2591) loss 6.0279 (6.0110) grad_norm 1.5442 (2.1642) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:07:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][250/625] eta 0:01:38 lr 0.000764 wd 0.0500 time 0.2526 (0.2620) data time 0.0010 (0.0027) model time 0.2516 (0.2589) loss 5.1253 (6.0104) grad_norm 1.6999 (2.1577) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:07:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][260/625] eta 0:01:35 lr 0.000764 wd 0.0500 time 0.2560 (0.2617) data time 0.0007 (0.0026) model time 0.2553 (0.2586) loss 5.0498 (6.0040) grad_norm 1.7937 (2.1485) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:07:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][270/625] eta 0:01:32 lr 0.000763 wd 0.0500 time 0.2539 (0.2615) data time 0.0007 (0.0026) model time 0.2532 (0.2584) loss 7.1940 (6.0172) grad_norm 2.5168 (2.1519) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:07:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][280/625] eta 0:01:30 lr 0.000763 wd 0.0500 time 0.2528 (0.2613) data time 0.0010 (0.0025) model time 0.2519 (0.2582) loss 6.6815 (6.0187) grad_norm 1.4205 (2.1842) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:07:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][290/625] eta 0:01:27 lr 0.000763 wd 0.0500 time 0.2549 (0.2611) data time 0.0011 (0.0025) model time 0.2538 (0.2581) loss 7.1878 (6.0214) grad_norm 1.8192 (2.1869) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:07:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][300/625] eta 0:01:25 lr 0.000763 wd 0.0500 time 0.2578 (0.2617) data time 0.0006 (0.0024) model time 0.2573 (0.2589) loss 5.3890 (6.0291) grad_norm 2.3117 (2.1863) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:07:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][310/625] eta 0:01:22 lr 0.000763 wd 0.0500 time 0.2508 (0.2614) data time 0.0006 (0.0024) model time 0.2501 (0.2587) loss 7.0161 (6.0261) grad_norm 2.1364 (2.1757) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:07:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][320/625] eta 0:01:19 lr 0.000763 wd 0.0500 time 0.2529 (0.2613) data time 0.0009 (0.0023) model time 0.2521 (0.2586) loss 6.4066 (6.0312) grad_norm 1.6171 (2.1620) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:07:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][330/625] eta 0:01:17 lr 0.000762 wd 0.0500 time 0.2534 (0.2617) data time 0.0010 (0.0023) model time 0.2525 (0.2591) loss 5.8746 (6.0231) grad_norm 2.2565 (2.1779) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:07:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][340/625] eta 0:01:14 lr 0.000762 wd 0.0500 time 0.2606 (0.2620) data time 0.0008 (0.0022) model time 0.2599 (0.2596) loss 6.4897 (6.0091) grad_norm 2.8202 (2.1742) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:07:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][350/625] eta 0:01:12 lr 0.000762 wd 0.0500 time 0.2534 (0.2622) data time 0.0007 (0.0022) model time 0.2527 (0.2598) loss 5.3477 (6.0052) grad_norm 4.6396 (2.1699) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:08:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][360/625] eta 0:01:09 lr 0.000762 wd 0.0500 time 0.2545 (0.2620) data time 0.0007 (0.0022) model time 0.2538 (0.2597) loss 5.0833 (5.9922) grad_norm 2.7858 (2.1697) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:08:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][370/625] eta 0:01:06 lr 0.000762 wd 0.0500 time 0.2523 (0.2624) data time 0.0008 (0.0021) model time 0.2515 (0.2602) loss 5.9648 (6.0014) grad_norm 2.2804 (2.1769) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:08:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][380/625] eta 0:01:04 lr 0.000762 wd 0.0500 time 0.2578 (0.2622) data time 0.0008 (0.0021) model time 0.2570 (0.2600) loss 4.1344 (5.9966) grad_norm 1.7327 (2.1727) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:08:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][390/625] eta 0:01:01 lr 0.000761 wd 0.0500 time 0.2544 (0.2621) data time 0.0008 (0.0021) model time 0.2536 (0.2598) loss 4.9836 (5.9941) grad_norm 1.4473 (2.1662) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:08:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][400/625] eta 0:00:58 lr 0.000761 wd 0.0500 time 0.2542 (0.2619) data time 0.0007 (0.0020) model time 0.2535 (0.2597) loss 6.9415 (5.9946) grad_norm 2.6240 (2.1647) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:08:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][410/625] eta 0:00:56 lr 0.000761 wd 0.0500 time 0.2562 (0.2618) data time 0.0007 (0.0020) model time 0.2555 (0.2596) loss 7.4447 (5.9956) grad_norm 2.2505 (2.1656) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:08:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][420/625] eta 0:00:53 lr 0.000761 wd 0.0500 time 0.2516 (0.2623) data time 0.0007 (0.0020) model time 0.2509 (0.2602) loss 6.4360 (5.9949) grad_norm 1.9508 (2.1615) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:08:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][430/625] eta 0:00:51 lr 0.000761 wd 0.0500 time 0.2568 (0.2622) data time 0.0007 (0.0019) model time 0.2561 (0.2601) loss 5.2410 (5.9927) grad_norm 2.4665 (2.1557) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:08:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][440/625] eta 0:00:48 lr 0.000761 wd 0.0500 time 0.2568 (0.2620) data time 0.0008 (0.0019) model time 0.2561 (0.2600) loss 5.9803 (5.9891) grad_norm 3.5909 (2.1583) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:08:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][450/625] eta 0:00:45 lr 0.000760 wd 0.0500 time 0.2559 (0.2622) data time 0.0010 (0.0019) model time 0.2549 (0.2601) loss 6.7569 (5.9901) grad_norm 2.4458 (2.1608) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:08:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][460/625] eta 0:00:43 lr 0.000760 wd 0.0500 time 0.2528 (0.2620) data time 0.0010 (0.0019) model time 0.2518 (0.2600) loss 5.6628 (5.9942) grad_norm 2.4035 (2.1647) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:08:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][470/625] eta 0:00:40 lr 0.000760 wd 0.0500 time 0.2528 (0.2619) data time 0.0006 (0.0019) model time 0.2521 (0.2599) loss 6.7586 (5.9945) grad_norm 1.8619 (2.1606) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:08:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][480/625] eta 0:00:37 lr 0.000760 wd 0.0500 time 0.2565 (0.2618) data time 0.0008 (0.0018) model time 0.2557 (0.2599) loss 5.7924 (5.9932) grad_norm 1.6654 (2.1626) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:08:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][490/625] eta 0:00:35 lr 0.000760 wd 0.0500 time 0.2570 (0.2617) data time 0.0008 (0.0018) model time 0.2562 (0.2597) loss 6.2577 (5.9906) grad_norm 1.9286 (2.1695) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:08:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][500/625] eta 0:00:32 lr 0.000759 wd 0.0500 time 0.2550 (0.2616) data time 0.0009 (0.0018) model time 0.2541 (0.2596) loss 5.8123 (5.9924) grad_norm 2.9001 (2.1637) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:08:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][510/625] eta 0:00:30 lr 0.000759 wd 0.0500 time 0.2559 (0.2615) data time 0.0012 (0.0018) model time 0.2548 (0.2595) loss 6.5173 (5.9949) grad_norm 1.3097 (2.1578) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:08:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][520/625] eta 0:00:27 lr 0.000759 wd 0.0500 time 0.2616 (0.2614) data time 0.0009 (0.0018) model time 0.2607 (0.2595) loss 5.9609 (5.9932) grad_norm 1.6590 (2.1513) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:08:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][530/625] eta 0:00:24 lr 0.000759 wd 0.0500 time 0.2642 (0.2613) data time 0.0010 (0.0018) model time 0.2632 (0.2594) loss 5.4471 (5.9908) grad_norm 1.5154 (2.1531) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:08:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][540/625] eta 0:00:22 lr 0.000759 wd 0.0500 time 0.2571 (0.2612) data time 0.0008 (0.0017) model time 0.2563 (0.2593) loss 5.5412 (5.9961) grad_norm 2.7772 (2.1510) loss_scale 2048.0000 (1025.8928) mem 9655MB
[2024-08-04 06:08:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][550/625] eta 0:00:19 lr 0.000759 wd 0.0500 time 0.2499 (0.2611) data time 0.0010 (0.0017) model time 0.2490 (0.2592) loss 6.2190 (5.9948) grad_norm 1.9439 (2.1492) loss_scale 2048.0000 (1044.4428) mem 9655MB
[2024-08-04 06:08:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][560/625] eta 0:00:16 lr 0.000758 wd 0.0500 time 0.2752 (0.2611) data time 0.0008 (0.0017) model time 0.2743 (0.2592) loss 6.5258 (5.9964) grad_norm 1.8533 (2.1482) loss_scale 2048.0000 (1062.3316) mem 9655MB
[2024-08-04 06:08:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][570/625] eta 0:00:14 lr 0.000758 wd 0.0500 time 0.2543 (0.2611) data time 0.0010 (0.0017) model time 0.2534 (0.2592) loss 6.5158 (5.9954) grad_norm 1.6642 (2.1497) loss_scale 2048.0000 (1079.5937) mem 9655MB
[2024-08-04 06:08:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][580/625] eta 0:00:11 lr 0.000758 wd 0.0500 time 0.2597 (0.2610) data time 0.0006 (0.0017) model time 0.2592 (0.2591) loss 5.2081 (5.9869) grad_norm 1.4462 (2.1509) loss_scale 2048.0000 (1096.2616) mem 9655MB
[2024-08-04 06:09:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][590/625] eta 0:00:09 lr 0.000758 wd 0.0500 time 0.2491 (0.2609) data time 0.0010 (0.0017) model time 0.2481 (0.2590) loss 6.3868 (5.9825) grad_norm 2.1339 (2.1450) loss_scale 2048.0000 (1112.3655) mem 9655MB
[2024-08-04 06:09:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][600/625] eta 0:00:06 lr 0.000758 wd 0.0500 time 0.2566 (0.2612) data time 0.0009 (0.0017) model time 0.2557 (0.2593) loss 6.0138 (5.9817) grad_norm 3.2254 (2.1427) loss_scale 2048.0000 (1127.9334) mem 9655MB
[2024-08-04 06:09:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][610/625] eta 0:00:03 lr 0.000758 wd 0.0500 time 0.2520 (0.2611) data time 0.0004 (0.0017) model time 0.2516 (0.2593) loss 6.2757 (5.9869) grad_norm 1.5739 (2.1445) loss_scale 2048.0000 (1142.9918) mem 9655MB
[2024-08-04 06:09:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][620/625] eta 0:00:01 lr 0.000757 wd 0.0500 time 0.2529 (0.2610) data time 0.0004 (0.0016) model time 0.2525 (0.2591) loss 5.4724 (5.9899) grad_norm 2.0742 (2.1609) loss_scale 2048.0000 (1157.5652) mem 9655MB
[2024-08-04 06:09:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 182 training takes 0:02:43
[2024-08-04 06:09:09 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 06:09:10 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 06:09:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.524 (0.524) Loss 0.6416 (0.6416) Acc@1 89.111 (89.111) Acc@5 98.340 (98.340) Mem 9655MB
[2024-08-04 06:09:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.100) Loss 0.9849 (0.7890) Acc@1 79.150 (85.067) Acc@5 95.752 (97.332) Mem 9655MB
[2024-08-04 06:09:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.1338 (0.9231) Acc@1 75.244 (81.541) Acc@5 93.652 (95.903) Mem 9655MB
[2024-08-04 06:09:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 81.310 Acc@5 95.905
[2024-08-04 06:09:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.3%
[2024-08-04 06:09:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.702 (0.702) Loss 0.5840 (0.5840) Acc@1 89.600 (89.600) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 06:09:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.124) Loss 0.9272 (0.7223) Acc@1 80.322 (85.866) Acc@5 95.654 (97.550) Mem 9655MB
[2024-08-04 06:09:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.091) Loss 1.0576 (0.8497) Acc@1 75.732 (82.466) Acc@5 94.531 (96.229) Mem 9655MB
[2024-08-04 06:09:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 82.156 Acc@5 96.215
[2024-08-04 06:09:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.2%
[2024-08-04 06:09:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.16%
[2024-08-04 06:09:14 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 06:09:14 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 06:09:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][0/625] eta 0:07:17 lr 0.000757 wd 0.0500 time 0.6995 (0.6995) data time 0.4473 (0.4473) model time 0.0000 (0.0000) loss 6.4351 (6.4351) grad_norm 2.2163 (2.2163) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:09:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][10/625] eta 0:03:02 lr 0.000757 wd 0.0500 time 0.2558 (0.2967) data time 0.0008 (0.0415) model time 0.0000 (0.0000) loss 6.2163 (5.9532) grad_norm 2.4841 (2.8837) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:09:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][20/625] eta 0:02:47 lr 0.000757 wd 0.0500 time 0.2563 (0.2770) data time 0.0006 (0.0221) model time 0.0000 (0.0000) loss 6.8352 (5.8020) grad_norm 1.9330 (2.7754) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:09:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][30/625] eta 0:02:43 lr 0.000757 wd 0.0500 time 0.2583 (0.2756) data time 0.0008 (0.0153) model time 0.0000 (0.0000) loss 5.7005 (5.8784) grad_norm 2.5714 (2.6641) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:09:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][40/625] eta 0:02:40 lr 0.000757 wd 0.0500 time 0.2565 (0.2750) data time 0.0009 (0.0118) model time 0.0000 (0.0000) loss 6.4319 (5.8057) grad_norm 1.3850 (2.4122) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:09:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][50/625] eta 0:02:36 lr 0.000756 wd 0.0500 time 0.2579 (0.2714) data time 0.0006 (0.0096) model time 0.0000 (0.0000) loss 6.2079 (5.8263) grad_norm 1.9934 (2.2604) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:09:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][60/625] eta 0:02:31 lr 0.000756 wd 0.0500 time 0.2554 (0.2687) data time 0.0010 (0.0082) model time 0.2544 (0.2542) loss 6.0013 (5.9010) grad_norm 4.5867 (2.2812) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:09:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][70/625] eta 0:02:29 lr 0.000756 wd 0.0500 time 0.2548 (0.2697) data time 0.0008 (0.0072) model time 0.2540 (0.2645) loss 6.2845 (5.8875) grad_norm 1.5452 (2.2625) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:09:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][80/625] eta 0:02:26 lr 0.000756 wd 0.0500 time 0.2557 (0.2680) data time 0.0010 (0.0064) model time 0.2547 (0.2613) loss 5.2866 (5.9166) grad_norm 1.6073 (2.2219) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:09:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][90/625] eta 0:02:22 lr 0.000756 wd 0.0500 time 0.2600 (0.2667) data time 0.0005 (0.0058) model time 0.2595 (0.2599) loss 5.8866 (5.9638) grad_norm 2.6915 (2.2635) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:09:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][100/625] eta 0:02:19 lr 0.000756 wd 0.0500 time 0.2570 (0.2657) data time 0.0007 (0.0053) model time 0.2563 (0.2590) loss 5.1245 (5.9542) grad_norm 2.3864 (2.3653) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:09:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][110/625] eta 0:02:16 lr 0.000755 wd 0.0500 time 0.2585 (0.2648) data time 0.0006 (0.0049) model time 0.2578 (0.2582) loss 5.7340 (5.9607) grad_norm 1.4479 (2.3218) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:09:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][120/625] eta 0:02:13 lr 0.000755 wd 0.0500 time 0.2689 (0.2641) data time 0.0009 (0.0046) model time 0.2680 (0.2579) loss 5.2396 (5.9493) grad_norm 1.3239 (2.3235) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:09:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][130/625] eta 0:02:10 lr 0.000755 wd 0.0500 time 0.2608 (0.2634) data time 0.0006 (0.0043) model time 0.2602 (0.2574) loss 7.3943 (5.9620) grad_norm 2.2520 (2.2852) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:09:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][140/625] eta 0:02:07 lr 0.000755 wd 0.0500 time 0.2562 (0.2629) data time 0.0005 (0.0041) model time 0.2557 (0.2572) loss 6.3611 (5.9759) grad_norm 1.3187 (2.2612) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:09:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][150/625] eta 0:02:05 lr 0.000755 wd 0.0500 time 0.2602 (0.2637) data time 0.0006 (0.0039) model time 0.2596 (0.2589) loss 5.9037 (5.9581) grad_norm 2.4924 (2.2666) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:09:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][160/625] eta 0:02:02 lr 0.000755 wd 0.0500 time 0.2559 (0.2632) data time 0.0009 (0.0037) model time 0.2550 (0.2585) loss 6.2837 (5.9493) grad_norm 2.1592 (2.2854) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:09:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][170/625] eta 0:01:59 lr 0.000754 wd 0.0500 time 0.2569 (0.2628) data time 0.0007 (0.0035) model time 0.2562 (0.2582) loss 7.0010 (5.9556) grad_norm 1.8685 (2.2887) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:10:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][180/625] eta 0:01:57 lr 0.000754 wd 0.0500 time 0.2549 (0.2642) data time 0.0009 (0.0034) model time 0.2540 (0.2605) loss 6.4263 (5.9766) grad_norm 1.8296 (2.2937) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:10:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][190/625] eta 0:01:54 lr 0.000754 wd 0.0500 time 0.2577 (0.2638) data time 0.0006 (0.0033) model time 0.2571 (0.2601) loss 6.0946 (5.9812) grad_norm 1.9566 (2.2774) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:10:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][200/625] eta 0:01:51 lr 0.000754 wd 0.0500 time 0.2573 (0.2635) data time 0.0009 (0.0032) model time 0.2564 (0.2598) loss 6.3536 (5.9829) grad_norm 2.4952 (2.2855) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:10:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][210/625] eta 0:01:49 lr 0.000754 wd 0.0500 time 0.2572 (0.2631) data time 0.0007 (0.0030) model time 0.2564 (0.2595) loss 6.3013 (5.9850) grad_norm 3.0251 (2.3054) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:10:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][220/625] eta 0:01:46 lr 0.000754 wd 0.0500 time 0.2570 (0.2628) data time 0.0009 (0.0030) model time 0.2561 (0.2592) loss 6.6636 (5.9799) grad_norm 2.0134 (2.2940) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:10:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][230/625] eta 0:01:43 lr 0.000753 wd 0.0500 time 0.2620 (0.2625) data time 0.0006 (0.0029) model time 0.2614 (0.2590) loss 6.5331 (5.9789) grad_norm 1.3727 (2.2954) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:10:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][240/625] eta 0:01:41 lr 0.000753 wd 0.0500 time 0.2560 (0.2630) data time 0.0007 (0.0028) model time 0.2553 (0.2598) loss 6.8600 (5.9851) grad_norm 1.8797 (2.3008) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:10:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][250/625] eta 0:01:38 lr 0.000753 wd 0.0500 time 0.2655 (0.2628) data time 0.0006 (0.0027) model time 0.2649 (0.2597) loss 5.4812 (5.9659) grad_norm 1.4795 (2.3149) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:10:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][260/625] eta 0:01:35 lr 0.000753 wd 0.0500 time 0.2573 (0.2625) data time 0.0006 (0.0026) model time 0.2567 (0.2594) loss 4.8751 (5.9685) grad_norm 1.9031 (2.3098) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:10:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][270/625] eta 0:01:33 lr 0.000753 wd 0.0500 time 0.2584 (0.2623) data time 0.0006 (0.0026) model time 0.2578 (0.2592) loss 6.0524 (5.9617) grad_norm 1.7270 (2.3034) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:10:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][280/625] eta 0:01:30 lr 0.000753 wd 0.0500 time 0.2584 (0.2621) data time 0.0007 (0.0025) model time 0.2577 (0.2590) loss 6.6176 (5.9599) grad_norm 3.7814 (2.3001) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:10:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][290/625] eta 0:01:27 lr 0.000752 wd 0.0500 time 0.2589 (0.2619) data time 0.0008 (0.0025) model time 0.2581 (0.2589) loss 6.3620 (5.9576) grad_norm 2.4585 (2.2854) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:10:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][300/625] eta 0:01:25 lr 0.000752 wd 0.0500 time 0.2577 (0.2617) data time 0.0009 (0.0024) model time 0.2568 (0.2587) loss 6.4187 (5.9592) grad_norm 2.7123 (2.2790) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:10:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][310/625] eta 0:01:22 lr 0.000752 wd 0.0500 time 0.2594 (0.2615) data time 0.0007 (0.0024) model time 0.2587 (0.2586) loss 7.0612 (5.9496) grad_norm 1.6885 (2.2667) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:10:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][320/625] eta 0:01:19 lr 0.000752 wd 0.0500 time 0.2597 (0.2614) data time 0.0008 (0.0023) model time 0.2589 (0.2585) loss 6.6276 (5.9589) grad_norm 1.9355 (2.2595) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:10:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][330/625] eta 0:01:17 lr 0.000752 wd 0.0500 time 0.2557 (0.2612) data time 0.0009 (0.0023) model time 0.2549 (0.2584) loss 6.2805 (5.9702) grad_norm 3.0889 (2.2559) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:10:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][340/625] eta 0:01:14 lr 0.000752 wd 0.0500 time 0.2552 (0.2610) data time 0.0007 (0.0022) model time 0.2545 (0.2583) loss 7.0801 (5.9772) grad_norm 1.5799 (2.2498) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:10:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][350/625] eta 0:01:11 lr 0.000751 wd 0.0500 time 0.2615 (0.2615) data time 0.0006 (0.0022) model time 0.2609 (0.2589) loss 6.6292 (5.9800) grad_norm 3.8045 (2.2520) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:10:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][360/625] eta 0:01:09 lr 0.000751 wd 0.0500 time 0.2573 (0.2617) data time 0.0012 (0.0022) model time 0.2562 (0.2592) loss 5.6281 (5.9801) grad_norm 2.5145 (2.2601) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:10:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][370/625] eta 0:01:06 lr 0.000751 wd 0.0500 time 0.2582 (0.2616) data time 0.0006 (0.0021) model time 0.2576 (0.2592) loss 6.2194 (5.9800) grad_norm 3.3951 (2.2590) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:10:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][380/625] eta 0:01:04 lr 0.000751 wd 0.0500 time 0.3877 (0.2619) data time 0.0009 (0.0021) model time 0.3868 (0.2595) loss 6.1185 (5.9731) grad_norm 1.8580 (2.2436) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:10:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][390/625] eta 0:01:01 lr 0.000751 wd 0.0500 time 0.2562 (0.2618) data time 0.0006 (0.0021) model time 0.2555 (0.2594) loss 6.4959 (5.9674) grad_norm 1.8084 (2.2284) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:10:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][400/625] eta 0:00:58 lr 0.000750 wd 0.0500 time 0.2707 (0.2621) data time 0.0011 (0.0020) model time 0.2696 (0.2599) loss 6.2058 (5.9611) grad_norm 2.3527 (2.2172) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:11:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][410/625] eta 0:00:56 lr 0.000750 wd 0.0500 time 0.2563 (0.2625) data time 0.0008 (0.0020) model time 0.2555 (0.2603) loss 5.2723 (5.9548) grad_norm 2.8128 (2.2208) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:11:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][420/625] eta 0:00:53 lr 0.000750 wd 0.0500 time 0.2569 (0.2623) data time 0.0011 (0.0020) model time 0.2558 (0.2602) loss 5.7806 (5.9497) grad_norm 2.0287 (2.2182) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:11:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][430/625] eta 0:00:51 lr 0.000750 wd 0.0500 time 0.2545 (0.2622) data time 0.0007 (0.0020) model time 0.2538 (0.2600) loss 5.2197 (5.9454) grad_norm 2.4249 (2.2154) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:11:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][440/625] eta 0:00:48 lr 0.000750 wd 0.0500 time 0.2598 (0.2620) data time 0.0008 (0.0019) model time 0.2590 (0.2599) loss 6.6783 (5.9419) grad_norm 1.6889 (inf) loss_scale 1024.0000 (2043.3560) mem 9655MB
[2024-08-04 06:11:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][450/625] eta 0:00:45 lr 0.000750 wd 0.0500 time 0.2569 (0.2619) data time 0.0007 (0.0019) model time 0.2562 (0.2598) loss 6.6395 (5.9344) grad_norm 3.2093 (inf) loss_scale 1024.0000 (2020.7539) mem 9655MB [2024-08-04 06:11:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][460/625] eta 0:00:43 lr 0.000749 wd 0.0500 time 0.2554 (0.2617) data time 0.0008 (0.0019) model time 0.2546 (0.2596) loss 7.0483 (5.9306) grad_norm 2.7218 (inf) loss_scale 1024.0000 (1999.1323) mem 9655MB [2024-08-04 06:11:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][470/625] eta 0:00:40 lr 0.000749 wd 0.0500 time 0.2544 (0.2616) data time 0.0010 (0.0019) model time 0.2534 (0.2595) loss 6.7238 (5.9296) grad_norm 1.7739 (inf) loss_scale 1024.0000 (1978.4289) mem 9655MB [2024-08-04 06:11:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][480/625] eta 0:00:37 lr 0.000749 wd 0.0500 time 0.2560 (0.2615) data time 0.0011 (0.0019) model time 0.2549 (0.2594) loss 6.0980 (5.9362) grad_norm 2.3267 (inf) loss_scale 1024.0000 (1958.5863) mem 9655MB [2024-08-04 06:11:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][490/625] eta 0:00:35 lr 0.000749 wd 0.0500 time 0.2601 (0.2614) data time 0.0005 (0.0018) model time 0.2596 (0.2593) loss 4.8718 (5.9396) grad_norm 2.1350 (inf) loss_scale 1024.0000 (1939.5519) mem 9655MB [2024-08-04 06:11:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][500/625] eta 0:00:32 lr 0.000749 wd 0.0500 time 0.2579 (0.2613) data time 0.0009 (0.0018) model time 0.2570 (0.2593) loss 4.7067 (5.9356) grad_norm 2.6625 (inf) loss_scale 1024.0000 (1921.2774) mem 9655MB [2024-08-04 06:11:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][510/625] eta 0:00:30 lr 0.000749 wd 0.0500 time 0.2558 (0.2613) data time 
0.0010 (0.0018) model time 0.2548 (0.2592) loss 6.9180 (5.9393) grad_norm 4.2806 (inf) loss_scale 1024.0000 (1903.7182) mem 9655MB [2024-08-04 06:11:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][520/625] eta 0:00:27 lr 0.000748 wd 0.0500 time 0.2569 (0.2612) data time 0.0008 (0.0018) model time 0.2561 (0.2591) loss 5.8750 (5.9408) grad_norm 3.1910 (inf) loss_scale 1024.0000 (1886.8330) mem 9655MB [2024-08-04 06:11:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][530/625] eta 0:00:24 lr 0.000748 wd 0.0500 time 0.2591 (0.2611) data time 0.0007 (0.0018) model time 0.2583 (0.2591) loss 6.5291 (5.9445) grad_norm 2.1700 (inf) loss_scale 1024.0000 (1870.5838) mem 9655MB [2024-08-04 06:11:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][540/625] eta 0:00:22 lr 0.000748 wd 0.0500 time 0.2518 (0.2614) data time 0.0008 (0.0017) model time 0.2510 (0.2594) loss 5.8472 (5.9450) grad_norm 1.4352 (inf) loss_scale 1024.0000 (1854.9353) mem 9655MB [2024-08-04 06:11:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][550/625] eta 0:00:19 lr 0.000748 wd 0.0500 time 0.2542 (0.2613) data time 0.0009 (0.0017) model time 0.2533 (0.2593) loss 6.3365 (5.9452) grad_norm 1.7893 (inf) loss_scale 1024.0000 (1839.8548) mem 9655MB [2024-08-04 06:11:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][560/625] eta 0:00:16 lr 0.000748 wd 0.0500 time 0.2613 (0.2612) data time 0.0010 (0.0017) model time 0.2603 (0.2593) loss 5.6202 (5.9475) grad_norm 1.2307 (inf) loss_scale 1024.0000 (1825.3119) mem 9655MB [2024-08-04 06:11:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][570/625] eta 0:00:14 lr 0.000748 wd 0.0500 time 0.2570 (0.2611) data time 0.0012 (0.0017) model time 0.2558 (0.2592) loss 5.4176 (5.9508) grad_norm 1.4676 (inf) loss_scale 1024.0000 (1811.2785) mem 9655MB [2024-08-04 06:11:46 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][580/625] eta 0:00:11 lr 0.000747 wd 0.0500 time 0.2573 (0.2610) data time 0.0006 (0.0017) model time 0.2566 (0.2591) loss 4.8435 (5.9577) grad_norm 1.9694 (inf) loss_scale 1024.0000 (1797.7281) mem 9655MB [2024-08-04 06:11:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][590/625] eta 0:00:09 lr 0.000747 wd 0.0500 time 0.2556 (0.2609) data time 0.0008 (0.0017) model time 0.2548 (0.2590) loss 6.4548 (5.9619) grad_norm 2.0114 (inf) loss_scale 1024.0000 (1784.6362) mem 9655MB [2024-08-04 06:11:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][600/625] eta 0:00:06 lr 0.000747 wd 0.0500 time 0.2563 (0.2612) data time 0.0009 (0.0017) model time 0.2554 (0.2593) loss 6.0290 (5.9587) grad_norm 2.5624 (inf) loss_scale 1024.0000 (1771.9800) mem 9655MB [2024-08-04 06:11:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][610/625] eta 0:00:03 lr 0.000747 wd 0.0500 time 0.2531 (0.2611) data time 0.0006 (0.0017) model time 0.2525 (0.2592) loss 6.6769 (5.9578) grad_norm 1.3084 (inf) loss_scale 1024.0000 (1759.7381) mem 9655MB [2024-08-04 06:11:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][620/625] eta 0:00:01 lr 0.000747 wd 0.0500 time 0.2535 (0.2610) data time 0.0004 (0.0016) model time 0.2532 (0.2591) loss 5.8084 (5.9592) grad_norm 2.6697 (inf) loss_scale 1024.0000 (1747.8905) mem 9655MB [2024-08-04 06:11:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 183 training takes 0:02:43 [2024-08-04 06:11:57 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 06:11:58 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 06:11:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.448 (0.448) Loss 0.6416 (0.6416) Acc@1 88.770 (88.770) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 06:11:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.063 (0.092) Loss 1.0127 (0.7896) Acc@1 79.346 (85.183) Acc@5 95.703 (97.319) Mem 9655MB
[2024-08-04 06:12:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.075) Loss 1.1191 (0.9198) Acc@1 75.684 (81.731) Acc@5 94.189 (95.847) Mem 9655MB
[2024-08-04 06:12:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 81.422 Acc@5 95.815
[2024-08-04 06:12:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.4%
[2024-08-04 06:12:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.42%
[2024-08-04 06:12:00 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-08-04 06:12:00 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-08-04 06:12:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.590 (0.590) Loss 0.5840 (0.5840) Acc@1 89.648 (89.648) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 06:12:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.106) Loss 0.9272 (0.7227) Acc@1 80.322 (85.875) Acc@5 95.605 (97.541) Mem 9655MB
[2024-08-04 06:12:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.082) Loss 1.0557 (0.8495) Acc@1 75.732 (82.468) Acc@5 94.531 (96.219) Mem 9655MB
[2024-08-04 06:12:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 82.160 Acc@5 96.205
[2024-08-04 06:12:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.2%
[2024-08-04 06:12:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.16%
[2024-08-04 06:12:02 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 06:12:03 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 06:12:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][0/625] eta 0:07:59 lr 0.000747 wd 0.0500 time 0.7671 (0.7671) data time 0.5297 (0.5297) model time 0.0000 (0.0000) loss 5.8739 (5.8739) grad_norm 2.9025 (2.9025) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:12:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][10/625] eta 0:03:05 lr 0.000746 wd 0.0500 time 0.2565 (0.3016) data time 0.0007 (0.0490) model time 0.0000 (0.0000) loss 5.6994 (5.9801) grad_norm 1.5573 (1.8594) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:12:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][20/625] eta 0:02:56 lr 0.000746 wd 0.0500 time 0.4703 (0.2909) data time 0.0008 (0.0261) model time 0.0000 (0.0000) loss 6.2416 (5.8625) grad_norm 1.8166 (1.7391) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:12:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][30/625] eta 0:02:46 lr 0.000746 wd 0.0500 time 0.2541 (0.2795) data time 0.0007 (0.0179) model time 0.0000 (0.0000) loss 6.1744 (5.9154) grad_norm 1.1436 (1.8030) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:12:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][40/625] eta 0:02:40 lr 0.000746 wd 0.0500 time 0.2634 (0.2738) data time 0.0007 (0.0138) model time 0.0000 (0.0000) loss 6.3130 (5.8930) grad_norm 1.7056 (1.9293) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:12:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][50/625] eta 0:02:37 lr 0.000746 wd 0.0500 time 0.2511 (0.2743) data time 0.0007 (0.0112) model time 0.0000 (0.0000) loss 6.8055 (5.9201) grad_norm 1.7968 (2.0119) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:12:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][60/625] eta 0:02:34 lr 0.000746 wd 0.0500 time 0.4065 (0.2738) data time 0.0009 (0.0095) model time 0.4057 (0.2706) loss 4.7122 (5.8783) grad_norm 1.8236 (2.0148) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:12:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][70/625] eta 0:02:30 lr 0.000745 wd 0.0500 time 0.2557 (0.2715) data time 0.0009 (0.0083) model time 0.2548 (0.2634) loss 5.1105 (5.9120) grad_norm 3.3851 (1.9873) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:12:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][80/625] eta 0:02:27 lr 0.000745 wd 0.0500 time 0.2598 (0.2698) data time 0.0008 (0.0074) model time 0.2589 (0.2611) loss 6.4187 (5.9132) grad_norm 2.2854 (1.9850) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:12:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][90/625] eta 0:02:23 lr 0.000745 wd 0.0500 time 0.2510 (0.2683) data time 0.0009 (0.0067) model time 0.2501 (0.2598) loss 6.3392 (5.9477) grad_norm 1.8206 (2.0205) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:12:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][100/625] eta 0:02:20 lr 0.000745 wd 0.0500 time 0.2605 (0.2672) data time 0.0010 (0.0061) model time 0.2595 (0.2590) loss 6.9898 (5.9764) grad_norm 1.6401 (2.0275) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:12:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][110/625] eta 0:02:18 lr 0.000745 wd 0.0500 time 0.2565 (0.2693) data time 0.0006 (0.0057) model time 0.2559 (0.2642) loss 4.3363 (5.9745) grad_norm 1.5313 (2.0248) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:12:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][120/625] eta 0:02:15 lr 0.000745 wd 0.0500 time 0.2545 (0.2682) data time 0.0009 (0.0053) model time 0.2536 (0.2629) loss 6.6123 (5.9860) grad_norm 1.4901 (2.0110) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:12:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][130/625] eta 0:02:12 lr 0.000744 wd 0.0500 time 0.2547 (0.2673) data time 0.0008 (0.0049) model time 0.2539 (0.2619) loss 5.5319 (6.0012) grad_norm 2.0947 (2.0042) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:12:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][140/625] eta 0:02:09 lr 0.000744 wd 0.0500 time 0.2543 (0.2665) data time 0.0007 (0.0047) model time 0.2536 (0.2611) loss 4.7749 (5.9933) grad_norm 1.6719 (1.9979) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:12:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][150/625] eta 0:02:06 lr 0.000744 wd 0.0500 time 0.2572 (0.2668) data time 0.0006 (0.0044) model time 0.2565 (0.2620) loss 5.2663 (5.9804) grad_norm 2.0950 (1.9891) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:12:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][160/625] eta 0:02:03 lr 0.000744 wd 0.0500 time 0.2672 (0.2662) data time 0.0009 (0.0042) model time 0.2663 (0.2615) loss 5.1071 (5.9792) grad_norm 1.4275 (1.9644) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:12:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][170/625] eta 0:02:00 lr 0.000744 wd 0.0500 time 0.2566 (0.2656) data time 0.0008 (0.0040) model time 0.2558 (0.2609) loss 6.1551 (5.9823) grad_norm 2.2986 (1.9753) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:12:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][180/625] eta 0:01:57 lr 0.000744 wd 0.0500 time 0.2584 (0.2651) data time 0.0009 (0.0038) model time 0.2574 (0.2605) loss 5.6519 (5.9626) grad_norm 1.3614 (1.9762) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:12:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][190/625] eta 0:01:55 lr 0.000743 wd 0.0500 time 0.2599 (0.2646) data time 0.0008 (0.0037) model time 0.2591 (0.2601) loss 6.1633 (5.9709) grad_norm 3.4621 (1.9879) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:12:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][200/625] eta 0:01:52 lr 0.000743 wd 0.0500 time 0.2561 (0.2641) data time 0.0011 (0.0035) model time 0.2550 (0.2598) loss 5.5905 (5.9623) grad_norm 1.4919 (2.0216) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:12:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][210/625] eta 0:01:49 lr 0.000743 wd 0.0500 time 0.2597 (0.2638) data time 0.0007 (0.0034) model time 0.2591 (0.2595) loss 5.8820 (5.9623) grad_norm 1.3547 (2.0308) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:13:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][220/625] eta 0:01:46 lr 0.000743 wd 0.0500 time 0.2549 (0.2634) data time 0.0006 (0.0033) model time 0.2543 (0.2592) loss 5.2403 (5.9587) grad_norm 1.8960 (2.0317) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:13:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][230/625] eta 0:01:43 lr 0.000743 wd 0.0500 time 0.2524 (0.2631) data time 0.0009 (0.0032) model time 0.2516 (0.2590) loss 6.1476 (5.9691) grad_norm 1.8955 (inf) loss_scale 512.0000 (1019.5671) mem 9655MB
[2024-08-04 06:13:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][240/625] eta 0:01:41 lr 0.000743 wd 0.0500 time 0.2554 (0.2628) data time 0.0006 (0.0031) model time 0.2548 (0.2588) loss 5.3600 (5.9490) grad_norm 2.6080 (inf) loss_scale 512.0000 (998.5062) mem 9655MB
[2024-08-04 06:13:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][250/625] eta 0:01:38 lr 0.000742 wd 0.0500 time 0.2592 (0.2631) data time 0.0007 (0.0030) model time 0.2585 (0.2593) loss 5.0279 (5.9526) grad_norm 2.6305 (inf) loss_scale 512.0000 (979.1235) mem 9655MB
[2024-08-04 06:13:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][260/625] eta 0:01:35 lr 0.000742 wd 0.0500 time 0.2548 (0.2628) data time 0.0006 (0.0029) model time 0.2541 (0.2591) loss 6.5335 (5.9504) grad_norm 2.1162 (inf) loss_scale 512.0000 (961.2261) mem 9655MB
[2024-08-04 06:13:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][270/625] eta 0:01:33 lr 0.000742 wd 0.0500 time 0.2520 (0.2626) data time 0.0007 (0.0029) model time 0.2512 (0.2589) loss 6.7460 (5.9447) grad_norm 1.3045 (inf) loss_scale 512.0000 (944.6494) mem 9655MB
[2024-08-04 06:13:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][280/625] eta 0:01:30 lr 0.000742 wd 0.0500 time 0.2565 (0.2624) data time 0.0009 (0.0028) model time 0.2556 (0.2588) loss 5.3529 (5.9409) grad_norm 2.0027 (inf) loss_scale 512.0000 (929.2527) mem 9655MB
[2024-08-04 06:13:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][290/625] eta 0:01:27 lr 0.000742 wd 0.0500 time 0.2601 (0.2622) data time 0.0006 (0.0027) model time 0.2595 (0.2587) loss 5.9920 (5.9364) grad_norm 1.5858 (inf) loss_scale 512.0000 (914.9141) mem 9655MB
[2024-08-04 06:13:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][300/625] eta 0:01:25 lr 0.000741 wd 0.0500 time 0.2525 (0.2620) data time 0.0010 (0.0027) model time 0.2515 (0.2585) loss 5.6425 (5.9261) grad_norm 1.7668 (inf) loss_scale 512.0000 (901.5282) mem 9655MB
[2024-08-04 06:13:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][310/625] eta 0:01:22 lr 0.000741 wd 0.0500 time 0.2527 (0.2618) data time 0.0007 (0.0026) model time 0.2520 (0.2584) loss 6.7203 (5.9339) grad_norm 2.5986 (inf) loss_scale 512.0000 (889.0032) mem 9655MB
[2024-08-04 06:13:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][320/625] eta 0:01:19 lr 0.000741 wd 0.0500 time 0.2561 (0.2616) data time 0.0008 (0.0026) model time 0.2553 (0.2583) loss 6.0222 (5.9327) grad_norm 3.5866 (inf) loss_scale 512.0000 (877.2586) mem 9655MB
[2024-08-04 06:13:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][330/625] eta 0:01:17 lr 0.000741 wd 0.0500 time 0.2578 (0.2615) data time 0.0008 (0.0025) model time 0.2571 (0.2582) loss 6.7049 (5.9289) grad_norm 1.5421 (inf) loss_scale 512.0000 (866.2236) mem 9655MB
[2024-08-04 06:13:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][340/625] eta 0:01:14 lr 0.000741 wd 0.0500 time 0.2530 (0.2613) data time 0.0009 (0.0025) model time 0.2521 (0.2581) loss 5.6037 (5.9237) grad_norm 1.8917 (inf) loss_scale 512.0000 (855.8358) mem 9655MB
[2024-08-04 06:13:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][350/625] eta 0:01:11 lr 0.000741 wd 0.0500 time 0.2522 (0.2617) data time 0.0016 (0.0024) model time 0.2506 (0.2586) loss 5.7034 (5.9305) grad_norm 2.7506 (inf) loss_scale 512.0000 (846.0399) mem 9655MB
[2024-08-04 06:13:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][360/625] eta 0:01:09 lr 0.000740 wd 0.0500 time 0.2574 (0.2615) data time 0.0006 (0.0024) model time 0.2568 (0.2585) loss 5.8052 (5.9366) grad_norm 1.1186 (inf) loss_scale 512.0000 (836.7867) mem 9655MB
[2024-08-04 06:13:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][370/625] eta 0:01:06 lr 0.000740 wd 0.0500 time 0.2749 (0.2614) data time 0.0008 (0.0023) model time 0.2741 (0.2585) loss 6.0287 (5.9439) grad_norm 5.8926 (inf) loss_scale 512.0000 (828.0323) mem 9655MB
[2024-08-04 06:13:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][380/625] eta 0:01:04 lr 0.000740 wd 0.0500 time 0.2747 (0.2613) data time 0.0010 (0.0023) model time 0.2738 (0.2584) loss 5.2540 (5.9447) grad_norm 2.7184 (inf) loss_scale 512.0000 (819.7375) mem 9655MB
[2024-08-04 06:13:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][390/625] eta 0:01:01 lr 0.000740 wd 0.0500 time 0.2529 (0.2612) data time 0.0009 (0.0023) model time 0.2520 (0.2583) loss 5.8592 (5.9404) grad_norm 2.3922 (inf) loss_scale 512.0000 (811.8670) mem 9655MB
[2024-08-04 06:13:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][400/625] eta 0:00:58 lr 0.000740 wd 0.0500 time 0.2572 (0.2611) data time 0.0006 (0.0022) model time 0.2565 (0.2582) loss 6.6128 (5.9374) grad_norm 3.3742 (inf) loss_scale 512.0000 (804.3890) mem 9655MB
[2024-08-04 06:13:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][410/625] eta 0:00:56 lr 0.000740 wd 0.0500 time 0.2587 (0.2610) data time 0.0009 (0.0022) model time 0.2578 (0.2582) loss 5.8775 (5.9297) grad_norm 2.4813 (inf) loss_scale 512.0000 (797.2749) mem 9655MB
[2024-08-04 06:13:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][420/625] eta 0:00:53 lr 0.000739 wd 0.0500 time 0.2604 (0.2614) data time 0.0006 (0.0022) model time 0.2598 (0.2587) loss 5.9532 (5.9282) grad_norm 1.6943 (inf) loss_scale 512.0000 (790.4988) mem 9655MB
[2024-08-04 06:13:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][430/625] eta 0:00:50 lr 0.000739 wd 0.0500 time 0.2593 (0.2613) data time 0.0008 (0.0022) model time 0.2585 (0.2586) loss 6.5534 (5.9371) grad_norm 2.5726 (inf) loss_scale 512.0000 (784.0371) mem 9655MB
[2024-08-04 06:13:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][440/625] eta 0:00:48 lr 0.000739 wd 0.0500 time 0.2565 (0.2616) data time 0.0015 (0.0021) model time 0.2550 (0.2590) loss 4.8773 (5.9322) grad_norm 2.7039 (inf) loss_scale 512.0000 (777.8685) mem 9655MB
[2024-08-04 06:14:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][450/625] eta 0:00:45 lr 0.000739 wd 0.0500 time 0.2570 (0.2615) data time 0.0006 (0.0021) model time 0.2563 (0.2589) loss 5.1997 (5.9310) grad_norm 1.8300 (inf) loss_scale 512.0000 (771.9734) mem 9655MB
[2024-08-04 06:14:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][460/625] eta 0:00:43 lr 0.000739 wd 0.0500 time 0.2550 (0.2614) data time 0.0008 (0.0021) model time 0.2542 (0.2588) loss 6.2077 (5.9307) grad_norm 1.4792 (inf) loss_scale 512.0000 (766.3341) mem 9655MB
[2024-08-04 06:14:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][470/625] eta 0:00:40 lr 0.000739 wd 0.0500 time 0.2609 (0.2613) data time 0.0006 (0.0020) model time 0.2604 (0.2588) loss 5.3679 (5.9344) grad_norm 3.0313 (inf) loss_scale 512.0000 (760.9342) mem 9655MB
[2024-08-04 06:14:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][480/625] eta 0:00:37 lr 0.000738 wd 0.0500 time 0.2543 (0.2616) data time 0.0010 (0.0020) model time 0.2533 (0.2592) loss 5.8855 (5.9308) grad_norm 1.4990 (inf) loss_scale 512.0000 (755.7588) mem 9655MB
[2024-08-04 06:14:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][490/625] eta 0:00:35 lr 0.000738 wd 0.0500 time 0.2522 (0.2615) data time 0.0009 (0.0020) model time 0.2514 (0.2591) loss 5.6422 (5.9278) grad_norm 2.1178 (inf) loss_scale 512.0000 (750.7943) mem 9655MB
[2024-08-04 06:14:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][500/625] eta 0:00:32 lr 0.000738 wd 0.0500 time 0.2513 (0.2614) data time 0.0009 (0.0020) model time 0.2504 (0.2590) loss 5.1277 (5.9334) grad_norm 1.8136 (inf) loss_scale 512.0000 (746.0279) mem 9655MB
[2024-08-04 06:14:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][510/625] eta 0:00:30 lr 0.000738 wd 0.0500 time 0.2554 (0.2613) data time 0.0008 (0.0020) model time 0.2546 (0.2589) loss 6.6066 (5.9365) grad_norm 1.9706 (inf) loss_scale 512.0000 (741.4481) mem 9655MB
[2024-08-04 06:14:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][520/625] eta 0:00:27 lr 0.000738 wd 0.0500 time 0.2539 (0.2612) data time 0.0007 (0.0019) model time 0.2532 (0.2588) loss 5.6592 (5.9424) grad_norm 1.6868 (inf) loss_scale 512.0000 (737.0441) mem 9655MB
[2024-08-04 06:14:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][530/625] eta 0:00:24 lr 0.000738 wd 0.0500 time 0.2552 (0.2611) data time 0.0014 (0.0019) model time 0.2538 (0.2588) loss 6.3349 (5.9349) grad_norm 2.0505 (inf) loss_scale 512.0000 (732.8060) mem 9655MB
[2024-08-04 06:14:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][540/625] eta 0:00:22 lr 0.000737 wd 0.0500 time 0.2530 (0.2610) data time 0.0008 (0.0019) model time 0.2522 (0.2587) loss 6.3956 (5.9296) grad_norm 1.4239 (inf) loss_scale 512.0000 (728.7246) mem 9655MB
[2024-08-04 06:14:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][550/625] eta 0:00:19 lr 0.000737 wd 0.0500 time 0.2519 (0.2613) data time 0.0006 (0.0019) model time 0.2512 (0.2590) loss 5.1217 (5.9303) grad_norm 1.8334 (inf) loss_scale 512.0000 (724.7913) mem 9655MB
[2024-08-04 06:14:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][560/625] eta 0:00:17 lr 0.000737 wd 0.0500 time 0.4641 (0.2616) data time 0.0009 (0.0019) model time 0.4632 (0.2594) loss 6.5746 (5.9349) grad_norm 1.4346 (inf) loss_scale 512.0000 (720.9982) mem 9655MB
[2024-08-04 06:14:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][570/625] eta 0:00:14 lr 0.000737 wd 0.0500 time 0.2553 (0.2615) data time 0.0008 (0.0019) model time 0.2545 (0.2593) loss 6.0385 (5.9380) grad_norm 3.4598 (inf) loss_scale 512.0000 (717.3380) mem 9655MB
[2024-08-04 06:14:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][580/625] eta 0:00:11 lr 0.000737 wd 0.0500 time 0.2556 (0.2614) data time 0.0007 (0.0018) model time 0.2549 (0.2592) loss 4.9616 (5.9426) grad_norm 2.0574 (inf) loss_scale 512.0000 (713.8038) mem 9655MB
[2024-08-04 06:14:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][590/625] eta 0:00:09 lr 0.000737 wd 0.0500 time 0.2583 (0.2613) data time 0.0007 (0.0018) model time 0.2576 (0.2591) loss 5.0722 (5.9395) grad_norm 1.5567 (inf) loss_scale 512.0000 (710.3892) mem 9655MB
[2024-08-04 06:14:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][600/625] eta 0:00:06 lr 0.000736 wd 0.0500 time 0.2553 (0.2612) data time 0.0008 (0.0018) model time 0.2545 (0.2590) loss 7.2476 (5.9417) grad_norm 1.3647 (inf) loss_scale 512.0000 (707.0882) mem 9655MB
[2024-08-04 06:14:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][610/625] eta 0:00:03 lr 0.000736 wd 0.0500 time 0.2531 (0.2611) data time 0.0006 (0.0018) model time 0.2525 (0.2590) loss 5.9270 (5.9471) grad_norm 3.7083 (inf) loss_scale 512.0000 (703.8953) mem 9655MB
[2024-08-04 06:14:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][620/625] eta 0:00:01 lr 0.000736 wd 0.0500 time 0.2505 (0.2610) data time 0.0006 (0.0018) model time 0.2498 (0.2589) loss 4.8333 (5.9466) grad_norm 1.8135 (inf) loss_scale 512.0000 (700.8052) mem 9655MB
[2024-08-04 06:14:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 184 training takes 0:02:43
[2024-08-04 06:14:46 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 06:14:47 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 06:14:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.506 (0.506) Loss 0.6187 (0.6187) Acc@1 89.209 (89.209) Acc@5 98.730 (98.730) Mem 9655MB
[2024-08-04 06:14:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 0.9517 (0.7514) Acc@1 78.906 (85.263) Acc@5 95.166 (97.465) Mem 9655MB
[2024-08-04 06:14:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.1221 (0.8878) Acc@1 74.854 (81.734) Acc@5 93.750 (95.884) Mem 9655MB
[2024-08-04 06:14:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 81.432 Acc@5 95.855
[2024-08-04 06:14:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.4%
[2024-08-04 06:14:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.43%
[2024-08-04 06:14:49 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-08-04 06:14:49 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-08-04 06:14:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.494 (0.494) Loss 0.5835 (0.5835) Acc@1 89.648 (89.648) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 06:14:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 0.9263 (0.7223) Acc@1 80.371 (85.920) Acc@5 95.557 (97.545) Mem 9655MB
[2024-08-04 06:14:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0557 (0.8494) Acc@1 75.781 (82.492) Acc@5 94.531 (96.229) Mem 9655MB
[2024-08-04 06:14:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 82.184 Acc@5 96.209
[2024-08-04 06:14:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.2%
[2024-08-04 06:14:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.18%
[2024-08-04 06:14:51 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 06:14:51 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 06:14:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][0/625] eta 0:07:39 lr 0.000736 wd 0.0500 time 0.7358 (0.7358) data time 0.4930 (0.4930) model time 0.0000 (0.0000) loss 4.9701 (4.9701) grad_norm 1.5880 (1.5880) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:14:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][10/625] eta 0:03:03 lr 0.000736 wd 0.0500 time 0.2525 (0.2991) data time 0.0008 (0.0457) model time 0.0000 (0.0000) loss 7.1791 (6.4547) grad_norm 2.0461 (2.4422) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:14:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][20/625] eta 0:02:48 lr 0.000736 wd 0.0500 time 0.2550 (0.2781) data time 0.0008 (0.0244) model time 0.0000 (0.0000) loss 6.8127 (6.2826) grad_norm 2.0773 (2.2358) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:15:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][30/625] eta 0:02:45 lr 0.000735 wd 0.0500 time 0.2652 (0.2774) data time 0.0007 (0.0168) model time 0.0000 (0.0000) loss 5.9013 (6.1297) grad_norm 5.9590 (2.4936) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:15:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][40/625] eta 0:02:39 lr 0.000735 wd 0.0500 time 0.2568 (0.2718) data time 0.0006 (0.0129) model time 0.0000 (0.0000) loss 5.1348 (6.0477) grad_norm 2.8303 (2.7212) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:15:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][50/625] eta 0:02:34 lr 0.000735 wd 0.0500 time 0.2532 (0.2686) data time 0.0009 (0.0106) model time 0.0000 (0.0000) loss 7.0016 (5.9818) grad_norm 2.9350 (2.7207) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:15:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][60/625] eta 0:02:31 lr 0.000735 wd 0.0500 time 0.3877 (0.2688) data time 
0.0008 (0.0090) model time 0.3869 (0.2692) loss 4.9873 (5.9393) grad_norm 2.8553 (2.6202) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:15:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][70/625] eta 0:02:28 lr 0.000735 wd 0.0500 time 0.2581 (0.2669) data time 0.0006 (0.0079) model time 0.2575 (0.2619) loss 6.7361 (5.8965) grad_norm 1.6559 (2.7992) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:15:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][80/625] eta 0:02:24 lr 0.000735 wd 0.0500 time 0.2566 (0.2655) data time 0.0009 (0.0070) model time 0.2557 (0.2595) loss 6.3665 (5.9149) grad_norm 1.4165 (2.7262) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:15:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][90/625] eta 0:02:22 lr 0.000734 wd 0.0500 time 0.2604 (0.2670) data time 0.0008 (0.0063) model time 0.2596 (0.2640) loss 6.6498 (5.8983) grad_norm 3.1616 (2.7948) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:15:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][100/625] eta 0:02:20 lr 0.000734 wd 0.0500 time 0.2546 (0.2678) data time 0.0010 (0.0058) model time 0.2537 (0.2662) loss 6.8576 (5.9316) grad_norm 1.5119 (2.7183) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:15:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][110/625] eta 0:02:17 lr 0.000734 wd 0.0500 time 0.2528 (0.2667) data time 0.0009 (0.0054) model time 0.2519 (0.2642) loss 5.0773 (5.9233) grad_norm 2.4268 (2.6548) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:15:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][120/625] eta 0:02:14 lr 0.000734 wd 0.0500 time 0.2567 (0.2660) data time 0.0008 (0.0050) model time 0.2560 (0.2632) loss 6.8390 (5.9090) grad_norm 1.7642 (2.6248) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:15:26 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][130/625] eta 0:02:11 lr 0.000734 wd 0.0500 time 0.2566 (0.2653) data time 0.0009 (0.0047) model time 0.2558 (0.2623) loss 5.9167 (5.8992) grad_norm 1.5125 (2.5838) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:15:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][140/625] eta 0:02:09 lr 0.000734 wd 0.0500 time 0.2559 (0.2660) data time 0.0009 (0.0044) model time 0.2550 (0.2637) loss 5.8399 (5.8847) grad_norm 1.2820 (2.5174) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:15:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][150/625] eta 0:02:06 lr 0.000733 wd 0.0500 time 0.2521 (0.2653) data time 0.0009 (0.0042) model time 0.2513 (0.2628) loss 5.8776 (5.8777) grad_norm 2.1705 (2.4754) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:15:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][160/625] eta 0:02:03 lr 0.000733 wd 0.0500 time 0.2557 (0.2647) data time 0.0008 (0.0040) model time 0.2549 (0.2620) loss 6.1636 (5.8754) grad_norm 1.4364 (2.4417) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:15:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][170/625] eta 0:02:00 lr 0.000733 wd 0.0500 time 0.2556 (0.2642) data time 0.0007 (0.0038) model time 0.2549 (0.2615) loss 6.5159 (5.8824) grad_norm 2.0444 (2.4412) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:15:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][180/625] eta 0:01:57 lr 0.000733 wd 0.0500 time 0.2567 (0.2638) data time 0.0008 (0.0036) model time 0.2559 (0.2610) loss 6.6129 (5.8783) grad_norm 5.0013 (2.5195) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:15:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][190/625] eta 0:01:54 lr 0.000733 wd 0.0500 time 0.2541 (0.2634) data time 0.0007 (0.0035) 
model time 0.2534 (0.2606) loss 5.9524 (5.8964) grad_norm 3.2652 (2.5188) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:15:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][200/625] eta 0:01:51 lr 0.000733 wd 0.0500 time 0.2568 (0.2629) data time 0.0007 (0.0033) model time 0.2561 (0.2602) loss 6.6862 (5.9048) grad_norm 1.8605 (2.5029) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:15:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][210/625] eta 0:01:48 lr 0.000732 wd 0.0500 time 0.2514 (0.2626) data time 0.0008 (0.0032) model time 0.2505 (0.2599) loss 6.6122 (5.9188) grad_norm 1.3741 (2.4662) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:15:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][220/625] eta 0:01:46 lr 0.000732 wd 0.0500 time 0.2579 (0.2624) data time 0.0010 (0.0031) model time 0.2569 (0.2596) loss 6.1147 (5.9351) grad_norm 2.1452 (2.4327) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:15:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][230/625] eta 0:01:43 lr 0.000732 wd 0.0500 time 0.2574 (0.2629) data time 0.0007 (0.0030) model time 0.2567 (0.2604) loss 5.1842 (5.9234) grad_norm 3.3607 (2.4299) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:15:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][240/625] eta 0:01:41 lr 0.000732 wd 0.0500 time 0.2629 (0.2643) data time 0.0006 (0.0029) model time 0.2623 (0.2623) loss 5.9491 (5.9393) grad_norm 2.0942 (2.4488) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:15:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][250/625] eta 0:01:38 lr 0.000732 wd 0.0500 time 0.2555 (0.2640) data time 0.0008 (0.0029) model time 0.2547 (0.2620) loss 5.8352 (5.9361) grad_norm 4.7567 (2.4857) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:16:00 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [185/300][260/625] eta 0:01:36 lr 0.000731 wd 0.0500 time 0.2560 (0.2637) data time 0.0007 (0.0028) model time 0.2553 (0.2616) loss 4.4294 (5.9303) grad_norm 2.6125 (2.4742) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:16:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][270/625] eta 0:01:33 lr 0.000731 wd 0.0500 time 0.2550 (0.2634) data time 0.0007 (0.0027) model time 0.2543 (0.2613) loss 6.1859 (5.9323) grad_norm 1.7714 (2.4718) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:16:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][280/625] eta 0:01:30 lr 0.000731 wd 0.0500 time 0.2570 (0.2631) data time 0.0008 (0.0026) model time 0.2562 (0.2610) loss 5.5745 (5.9355) grad_norm 2.8895 (2.4650) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:16:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][290/625] eta 0:01:28 lr 0.000731 wd 0.0500 time 0.2529 (0.2629) data time 0.0008 (0.0026) model time 0.2521 (0.2608) loss 6.4602 (5.9386) grad_norm 2.0504 (2.4712) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:16:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][300/625] eta 0:01:25 lr 0.000731 wd 0.0500 time 0.2603 (0.2626) data time 0.0007 (0.0025) model time 0.2596 (0.2606) loss 4.3859 (5.9192) grad_norm 1.9034 (2.4560) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:16:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][310/625] eta 0:01:22 lr 0.000731 wd 0.0500 time 0.2570 (0.2624) data time 0.0007 (0.0025) model time 0.2563 (0.2604) loss 5.2567 (5.9059) grad_norm 1.6335 (2.4374) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:16:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][320/625] eta 0:01:19 lr 0.000730 wd 0.0500 time 0.2569 (0.2622) data time 0.0011 (0.0024) model time 0.2558 (0.2602) 
loss 5.9221 (5.9129) grad_norm 1.3799 (2.4251) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:16:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][330/625] eta 0:01:17 lr 0.000730 wd 0.0500 time 0.2567 (0.2620) data time 0.0008 (0.0024) model time 0.2559 (0.2600) loss 6.6639 (5.9169) grad_norm 2.4699 (2.4188) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:16:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][340/625] eta 0:01:14 lr 0.000730 wd 0.0500 time 0.2557 (0.2619) data time 0.0009 (0.0023) model time 0.2548 (0.2598) loss 6.1033 (5.9135) grad_norm 1.7217 (2.3989) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:16:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][350/625] eta 0:01:11 lr 0.000730 wd 0.0500 time 0.2560 (0.2617) data time 0.0008 (0.0023) model time 0.2552 (0.2597) loss 5.4408 (5.9158) grad_norm 3.0720 (2.3929) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:16:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][360/625] eta 0:01:09 lr 0.000730 wd 0.0500 time 0.2550 (0.2616) data time 0.0012 (0.0023) model time 0.2538 (0.2595) loss 6.9440 (5.9261) grad_norm 1.8525 (2.3757) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:16:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][370/625] eta 0:01:06 lr 0.000730 wd 0.0500 time 0.2595 (0.2619) data time 0.0008 (0.0022) model time 0.2587 (0.2600) loss 5.7055 (5.9232) grad_norm 1.5827 (2.3592) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:16:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][380/625] eta 0:01:04 lr 0.000729 wd 0.0500 time 0.2565 (0.2618) data time 0.0007 (0.0022) model time 0.2558 (0.2599) loss 5.1071 (5.9179) grad_norm 2.6833 (2.3546) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:16:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [185/300][390/625] eta 0:01:01 lr 0.000729 wd 0.0500 time 0.2536 (0.2616) data time 0.0006 (0.0022) model time 0.2529 (0.2597) loss 5.3530 (5.9110) grad_norm 1.8921 (2.3549) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:16:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][400/625] eta 0:00:58 lr 0.000729 wd 0.0500 time 0.2561 (0.2615) data time 0.0010 (0.0021) model time 0.2551 (0.2596) loss 5.6399 (5.9164) grad_norm 1.2451 (2.3393) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:16:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][410/625] eta 0:00:56 lr 0.000729 wd 0.0500 time 0.2610 (0.2614) data time 0.0008 (0.0021) model time 0.2602 (0.2595) loss 6.0890 (5.9239) grad_norm 2.2767 (2.3294) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:16:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][420/625] eta 0:00:53 lr 0.000729 wd 0.0500 time 0.2525 (0.2617) data time 0.0013 (0.0021) model time 0.2512 (0.2599) loss 5.8079 (5.9193) grad_norm 2.0076 (2.3164) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:16:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][430/625] eta 0:00:51 lr 0.000729 wd 0.0500 time 0.2526 (0.2616) data time 0.0009 (0.0020) model time 0.2518 (0.2598) loss 4.5704 (5.9175) grad_norm 2.2665 (2.3134) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:16:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][440/625] eta 0:00:48 lr 0.000728 wd 0.0500 time 0.2554 (0.2615) data time 0.0007 (0.0020) model time 0.2547 (0.2597) loss 6.4821 (5.9175) grad_norm 1.5157 (2.3040) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:16:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][450/625] eta 0:00:45 lr 0.000728 wd 0.0500 time 0.2562 (0.2614) data time 0.0008 (0.0020) model time 0.2554 (0.2596) loss 5.4455 (5.9153) grad_norm 
1.9358 (2.2935) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:16:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][460/625] eta 0:00:43 lr 0.000728 wd 0.0500 time 0.2541 (0.2612) data time 0.0010 (0.0020) model time 0.2531 (0.2594) loss 6.2244 (5.9246) grad_norm 1.7111 (2.2938) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:16:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][470/625] eta 0:00:40 lr 0.000728 wd 0.0500 time 0.2551 (0.2611) data time 0.0008 (0.0020) model time 0.2542 (0.2593) loss 5.2895 (5.9209) grad_norm 1.6389 (2.2949) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:16:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][480/625] eta 0:00:37 lr 0.000728 wd 0.0500 time 0.2558 (0.2614) data time 0.0009 (0.0019) model time 0.2548 (0.2597) loss 6.3019 (5.9298) grad_norm 2.6440 (2.3011) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:17:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][490/625] eta 0:00:35 lr 0.000728 wd 0.0500 time 0.2564 (0.2613) data time 0.0014 (0.0019) model time 0.2550 (0.2596) loss 6.8017 (5.9368) grad_norm 2.3168 (2.2921) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:17:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][500/625] eta 0:00:32 lr 0.000727 wd 0.0500 time 0.2662 (0.2612) data time 0.0007 (0.0019) model time 0.2655 (0.2595) loss 6.5087 (5.9361) grad_norm 1.6566 (2.2907) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:17:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][510/625] eta 0:00:30 lr 0.000727 wd 0.0500 time 0.2562 (0.2611) data time 0.0007 (0.0019) model time 0.2555 (0.2594) loss 6.0378 (5.9384) grad_norm 2.2423 (2.2893) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:17:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][520/625] eta 
0:00:27 lr 0.000727 wd 0.0500 time 0.2621 (0.2613) data time 0.0007 (0.0019) model time 0.2613 (0.2596) loss 4.4427 (5.9368) grad_norm 1.3902 (2.3180) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:17:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][530/625] eta 0:00:24 lr 0.000727 wd 0.0500 time 0.2541 (0.2612) data time 0.0009 (0.0018) model time 0.2532 (0.2595) loss 5.3292 (5.9340) grad_norm 1.9906 (2.3267) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:17:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][540/625] eta 0:00:22 lr 0.000727 wd 0.0500 time 0.2543 (0.2615) data time 0.0009 (0.0018) model time 0.2534 (0.2598) loss 5.9695 (5.9339) grad_norm 2.1381 (2.3288) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:17:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][550/625] eta 0:00:19 lr 0.000727 wd 0.0500 time 0.2543 (0.2614) data time 0.0007 (0.0018) model time 0.2536 (0.2598) loss 5.3351 (5.9341) grad_norm 2.2039 (2.3306) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:17:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][560/625] eta 0:00:16 lr 0.000726 wd 0.0500 time 0.2543 (0.2613) data time 0.0008 (0.0018) model time 0.2535 (0.2597) loss 5.1738 (5.9322) grad_norm 2.0391 (2.3260) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:17:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][570/625] eta 0:00:14 lr 0.000726 wd 0.0500 time 0.2623 (0.2613) data time 0.0010 (0.0018) model time 0.2613 (0.2596) loss 5.0081 (5.9257) grad_norm 1.5402 (2.3208) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:17:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][580/625] eta 0:00:11 lr 0.000726 wd 0.0500 time 0.2568 (0.2612) data time 0.0009 (0.0018) model time 0.2558 (0.2596) loss 5.2151 (5.9216) grad_norm 1.5984 (2.3265) loss_scale 
512.0000 (512.0000) mem 9655MB [2024-08-04 06:17:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][590/625] eta 0:00:09 lr 0.000726 wd 0.0500 time 0.2594 (0.2612) data time 0.0007 (0.0018) model time 0.2588 (0.2595) loss 4.8644 (5.9216) grad_norm 2.1473 (2.3319) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:17:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][600/625] eta 0:00:06 lr 0.000726 wd 0.0500 time 0.2500 (0.2614) data time 0.0011 (0.0017) model time 0.2489 (0.2599) loss 5.9930 (5.9179) grad_norm 2.2233 (2.3250) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:17:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][610/625] eta 0:00:03 lr 0.000726 wd 0.0500 time 0.2530 (0.2615) data time 0.0004 (0.0017) model time 0.2525 (0.2600) loss 6.0322 (5.9156) grad_norm 2.0199 (2.3214) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:17:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][620/625] eta 0:00:01 lr 0.000725 wd 0.0500 time 0.2520 (0.2614) data time 0.0006 (0.0017) model time 0.2513 (0.2598) loss 6.1852 (5.9134) grad_norm 1.3851 (2.3122) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:17:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 185 training takes 0:02:43 [2024-08-04 06:17:35 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 06:17:35 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 06:17:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.541 (0.541) Loss 0.6274 (0.6274) Acc@1 88.818 (88.818) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 06:17:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.101) Loss 0.9990 (0.7675) Acc@1 78.955 (85.134) Acc@5 95.312 (97.288) Mem 9655MB [2024-08-04 06:17:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.1172 (0.9017) Acc@1 75.293 (81.743) Acc@5 94.092 (95.859) Mem 9655MB [2024-08-04 06:17:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.428 Acc@5 95.861 [2024-08-04 06:17:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.4% [2024-08-04 06:17:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.722 (0.722) Loss 0.5830 (0.5830) Acc@1 89.844 (89.844) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 06:17:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.126) Loss 0.9253 (0.7220) Acc@1 80.225 (85.960) Acc@5 95.605 (97.541) Mem 9655MB [2024-08-04 06:17:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0547 (0.8490) Acc@1 75.830 (82.492) Acc@5 94.629 (96.243) Mem 9655MB [2024-08-04 06:17:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.186 Acc@5 96.217 [2024-08-04 06:17:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.2% [2024-08-04 06:17:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.19% [2024-08-04 06:17:39 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 06:17:40 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 06:17:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][0/625] eta 0:07:24 lr 0.000725 wd 0.0500 time 0.7110 (0.7110) data time 0.4684 (0.4684) model time 0.0000 (0.0000) loss 6.3358 (6.3358) grad_norm 1.4032 (1.4032) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:17:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][10/625] eta 0:03:02 lr 0.000725 wd 0.0500 time 0.2578 (0.2972) data time 0.0006 (0.0434) model time 0.0000 (0.0000) loss 5.4464 (6.1369) grad_norm 2.4596 (1.8732) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:17:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][20/625] eta 0:02:47 lr 0.000725 wd 0.0500 time 0.2566 (0.2775) data time 0.0007 (0.0231) model time 0.0000 (0.0000) loss 5.0287 (6.1312) grad_norm 1.6938 (1.8824) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:17:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][30/625] eta 0:02:41 lr 0.000725 wd 0.0500 time 0.2563 (0.2710) data time 0.0007 (0.0160) model time 0.0000 (0.0000) loss 6.5718 (6.0255) grad_norm 2.2186 (1.9133) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:17:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][40/625] eta 0:02:36 lr 0.000725 wd 0.0500 time 0.2701 (0.2677) data time 0.0015 (0.0124) model time 0.0000 (0.0000) loss 5.9932 (6.0455) grad_norm 2.5784 (2.0237) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:17:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][50/625] eta 0:02:32 lr 0.000724 wd 0.0500 time 0.2584 (0.2654) data time 0.0008 (0.0101) model time 0.0000 (0.0000) loss 6.6481 (6.0532) grad_norm 4.9453 (2.1799) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 
06:17:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][60/625] eta 0:02:29 lr 0.000724 wd 0.0500 time 0.2554 (0.2638) data time 0.0008 (0.0086) model time 0.2546 (0.2548) loss 4.5882 (6.0267) grad_norm 3.1735 (2.2684) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:17:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][70/625] eta 0:02:25 lr 0.000724 wd 0.0500 time 0.2567 (0.2627) data time 0.0006 (0.0075) model time 0.2561 (0.2550) loss 4.8206 (6.0271) grad_norm 1.3324 (2.3379) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:18:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][80/625] eta 0:02:22 lr 0.000724 wd 0.0500 time 0.2588 (0.2619) data time 0.0006 (0.0067) model time 0.2582 (0.2551) loss 6.6549 (6.0064) grad_norm 3.3934 (2.3244) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:18:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][90/625] eta 0:02:19 lr 0.000724 wd 0.0500 time 0.2540 (0.2615) data time 0.0008 (0.0061) model time 0.2531 (0.2557) loss 6.2781 (5.9592) grad_norm 1.3615 (2.2697) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:18:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][100/625] eta 0:02:18 lr 0.000724 wd 0.0500 time 0.2564 (0.2630) data time 0.0006 (0.0056) model time 0.2558 (0.2597) loss 6.5206 (5.9700) grad_norm 2.2007 (2.2696) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:18:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][110/625] eta 0:02:16 lr 0.000723 wd 0.0500 time 0.2588 (0.2641) data time 0.0006 (0.0051) model time 0.2583 (0.2621) loss 6.8372 (5.9806) grad_norm 1.5880 (2.2237) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:18:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][120/625] eta 0:02:13 lr 0.000723 wd 0.0500 time 0.2603 (0.2634) data time 0.0006 
(0.0048) model time 0.2597 (0.2611) loss 5.6627 (5.9656) grad_norm 2.7059 (2.2230) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:18:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][130/625] eta 0:02:10 lr 0.000723 wd 0.0500 time 0.2554 (0.2629) data time 0.0008 (0.0045) model time 0.2546 (0.2605) loss 6.3165 (5.9801) grad_norm 2.0408 (2.2027) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:18:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][140/625] eta 0:02:07 lr 0.000723 wd 0.0500 time 0.2623 (0.2626) data time 0.0008 (0.0042) model time 0.2615 (0.2601) loss 6.4307 (5.9815) grad_norm 2.4597 (2.2119) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:18:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][150/625] eta 0:02:04 lr 0.000723 wd 0.0500 time 0.2543 (0.2622) data time 0.0007 (0.0040) model time 0.2536 (0.2596) loss 6.9044 (6.0043) grad_norm 2.3702 (2.2076) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:18:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][160/625] eta 0:02:01 lr 0.000723 wd 0.0500 time 0.2568 (0.2618) data time 0.0009 (0.0038) model time 0.2559 (0.2592) loss 6.4876 (5.9940) grad_norm 1.7002 (2.1909) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:18:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][170/625] eta 0:01:59 lr 0.000722 wd 0.0500 time 0.2534 (0.2622) data time 0.0008 (0.0037) model time 0.2527 (0.2599) loss 5.2360 (5.9926) grad_norm 1.5499 (2.1656) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:18:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][180/625] eta 0:01:56 lr 0.000722 wd 0.0500 time 0.2533 (0.2618) data time 0.0007 (0.0035) model time 0.2526 (0.2594) loss 6.5670 (5.9622) grad_norm 2.5076 (2.1526) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:18:30 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][190/625] eta 0:01:53 lr 0.000722 wd 0.0500 time 0.2547 (0.2615) data time 0.0006 (0.0034) model time 0.2541 (0.2592) loss 7.2795 (5.9530) grad_norm 2.0149 (2.1489) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:18:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][200/625] eta 0:01:51 lr 0.000722 wd 0.0500 time 0.4676 (0.2623) data time 0.0010 (0.0033) model time 0.4666 (0.2603) loss 5.7146 (5.9511) grad_norm 2.0154 (2.1492) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:18:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][210/625] eta 0:01:48 lr 0.000722 wd 0.0500 time 0.2522 (0.2626) data time 0.0008 (0.0031) model time 0.2514 (0.2608) loss 5.4548 (5.9500) grad_norm 1.6233 (2.1474) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:18:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][220/625] eta 0:01:46 lr 0.000722 wd 0.0500 time 0.4578 (0.2632) data time 0.0010 (0.0030) model time 0.4569 (0.2616) loss 6.1219 (5.9433) grad_norm 1.5293 (2.1320) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:18:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][230/625] eta 0:01:43 lr 0.000721 wd 0.0500 time 0.2558 (0.2629) data time 0.0010 (0.0030) model time 0.2549 (0.2613) loss 5.7964 (5.9358) grad_norm 1.3476 (2.1312) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:18:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][240/625] eta 0:01:41 lr 0.000721 wd 0.0500 time 0.2586 (0.2627) data time 0.0007 (0.0029) model time 0.2578 (0.2610) loss 5.8591 (5.9444) grad_norm 2.1079 (2.1295) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:18:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][250/625] eta 0:01:38 lr 0.000721 wd 0.0500 time 0.2533 (0.2633) data time 0.0012 (0.0028) 
model time 0.2521 (0.2618) loss 5.4878 (5.9479) grad_norm 2.6252 (2.1575) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:18:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][260/625] eta 0:01:36 lr 0.000721 wd 0.0500 time 0.2555 (0.2637) data time 0.0007 (0.0027) model time 0.2548 (0.2623) loss 5.5241 (5.9337) grad_norm 4.5412 (2.1742) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:18:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][270/625] eta 0:01:33 lr 0.000721 wd 0.0500 time 0.2529 (0.2634) data time 0.0008 (0.0027) model time 0.2521 (0.2620) loss 4.2790 (5.9337) grad_norm 1.6236 (2.1755) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:18:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][280/625] eta 0:01:30 lr 0.000721 wd 0.0500 time 0.2564 (0.2631) data time 0.0007 (0.0026) model time 0.2557 (0.2617) loss 6.0966 (5.9499) grad_norm 1.6426 (2.1585) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:18:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][290/625] eta 0:01:28 lr 0.000720 wd 0.0500 time 0.2562 (0.2633) data time 0.0011 (0.0025) model time 0.2551 (0.2619) loss 5.0331 (5.9431) grad_norm 1.7711 (2.1590) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:18:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][300/625] eta 0:01:25 lr 0.000720 wd 0.0500 time 0.2586 (0.2635) data time 0.0008 (0.0025) model time 0.2578 (0.2621) loss 6.3234 (5.9374) grad_norm 2.2461 (2.1568) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:19:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][310/625] eta 0:01:22 lr 0.000720 wd 0.0500 time 0.2527 (0.2632) data time 0.0014 (0.0024) model time 0.2513 (0.2619) loss 7.4120 (5.9274) grad_norm 1.2464 (2.1477) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:19:04 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [186/300][320/625] eta 0:01:20 lr 0.000720 wd 0.0500 time 0.2562 (0.2630) data time 0.0008 (0.0024) model time 0.2554 (0.2617) loss 5.7205 (5.9151) grad_norm 1.4431 (2.1380) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:19:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][330/625] eta 0:01:17 lr 0.000720 wd 0.0500 time 0.2544 (0.2628) data time 0.0010 (0.0023) model time 0.2534 (0.2614) loss 6.1723 (5.9118) grad_norm 2.1818 (2.1344) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:19:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][340/625] eta 0:01:14 lr 0.000719 wd 0.0500 time 0.2620 (0.2626) data time 0.0008 (0.0023) model time 0.2612 (0.2612) loss 6.0522 (5.9250) grad_norm 1.5242 (2.1311) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:19:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][350/625] eta 0:01:12 lr 0.000719 wd 0.0500 time 0.2567 (0.2624) data time 0.0008 (0.0023) model time 0.2559 (0.2610) loss 6.2780 (5.9227) grad_norm 3.2543 (2.1365) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:19:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][360/625] eta 0:01:09 lr 0.000719 wd 0.0500 time 0.2568 (0.2623) data time 0.0011 (0.0022) model time 0.2557 (0.2609) loss 6.6791 (5.9266) grad_norm 3.1732 (2.1400) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:19:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][370/625] eta 0:01:06 lr 0.000719 wd 0.0500 time 0.2588 (0.2627) data time 0.0009 (0.0022) model time 0.2579 (0.2613) loss 4.7296 (5.9268) grad_norm 2.0540 (2.1486) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:19:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][380/625] eta 0:01:04 lr 0.000719 wd 0.0500 time 0.2574 (0.2625) data time 0.0006 (0.0022) model time 0.2567 (0.2612) 
loss 6.3064 (5.9240) grad_norm 1.3697 (2.1473) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:19:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][390/625] eta 0:01:01 lr 0.000719 wd 0.0500 time 0.2584 (0.2624) data time 0.0007 (0.0021) model time 0.2577 (0.2610) loss 6.6886 (5.9298) grad_norm 2.2768 (2.1396) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:19:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][400/625] eta 0:00:59 lr 0.000718 wd 0.0500 time 0.2517 (0.2622) data time 0.0007 (0.0021) model time 0.2510 (0.2608) loss 7.1597 (5.9402) grad_norm 1.8455 (2.1380) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:19:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][410/625] eta 0:00:56 lr 0.000718 wd 0.0500 time 0.2555 (0.2621) data time 0.0007 (0.0021) model time 0.2548 (0.2607) loss 6.6411 (5.9456) grad_norm 2.0888 (2.1388) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:19:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][420/625] eta 0:00:53 lr 0.000718 wd 0.0500 time 0.2568 (0.2619) data time 0.0008 (0.0020) model time 0.2560 (0.2605) loss 5.2596 (5.9402) grad_norm 1.7866 (2.1353) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:19:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][430/625] eta 0:00:51 lr 0.000718 wd 0.0500 time 0.2584 (0.2618) data time 0.0008 (0.0020) model time 0.2576 (0.2604) loss 6.4505 (5.9403) grad_norm 1.9331 (2.1261) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:19:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][440/625] eta 0:00:48 lr 0.000718 wd 0.0500 time 0.2561 (0.2617) data time 0.0013 (0.0020) model time 0.2548 (0.2603) loss 5.8230 (5.9432) grad_norm 1.7858 (2.1235) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:19:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [186/300][450/625] eta 0:00:45 lr 0.000718 wd 0.0500 time 0.2559 (0.2615) data time 0.0009 (0.0020) model time 0.2550 (0.2601) loss 5.8482 (5.9375) grad_norm 2.0617 (2.1342) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:19:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][460/625] eta 0:00:43 lr 0.000717 wd 0.0500 time 0.2572 (0.2614) data time 0.0006 (0.0019) model time 0.2566 (0.2600) loss 6.0384 (5.9316) grad_norm 4.2916 (2.1381) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:19:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][470/625] eta 0:00:40 lr 0.000717 wd 0.0500 time 0.2620 (0.2618) data time 0.0007 (0.0019) model time 0.2613 (0.2604) loss 6.3385 (5.9365) grad_norm 2.2767 (2.1520) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:19:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][480/625] eta 0:00:37 lr 0.000717 wd 0.0500 time 0.2582 (0.2617) data time 0.0007 (0.0019) model time 0.2575 (0.2603) loss 5.1271 (5.9275) grad_norm 1.9769 (2.1448) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:19:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][490/625] eta 0:00:35 lr 0.000717 wd 0.0500 time 0.2532 (0.2615) data time 0.0009 (0.0019) model time 0.2523 (0.2602) loss 5.0868 (5.9223) grad_norm 3.7633 (2.1602) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:19:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][500/625] eta 0:00:32 lr 0.000717 wd 0.0500 time 0.2546 (0.2614) data time 0.0018 (0.0019) model time 0.2529 (0.2601) loss 6.6149 (5.9257) grad_norm 2.4276 (2.1573) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:19:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][510/625] eta 0:00:30 lr 0.000717 wd 0.0500 time 0.2530 (0.2613) data time 0.0008 (0.0018) model time 0.2522 (0.2600) loss 6.8037 (5.9235) grad_norm 
2.0149 (2.1565) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:19:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][520/625] eta 0:00:27 lr 0.000716 wd 0.0500 time 0.3848 (0.2614) data time 0.0011 (0.0018) model time 0.3837 (0.2601) loss 5.3069 (5.9226) grad_norm 1.5927 (2.1549) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:19:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][530/625] eta 0:00:24 lr 0.000716 wd 0.0500 time 0.2554 (0.2613) data time 0.0010 (0.0018) model time 0.2544 (0.2600) loss 6.1114 (5.9225) grad_norm 2.6941 (2.1567) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:20:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][540/625] eta 0:00:22 lr 0.000716 wd 0.0500 time 0.2513 (0.2612) data time 0.0009 (0.0018) model time 0.2504 (0.2598) loss 5.3162 (5.9241) grad_norm 2.6753 (2.1580) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:20:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][550/625] eta 0:00:19 lr 0.000716 wd 0.0500 time 0.2567 (0.2611) data time 0.0006 (0.0018) model time 0.2561 (0.2597) loss 7.0488 (5.9245) grad_norm 2.4864 (2.1551) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:20:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][560/625] eta 0:00:16 lr 0.000716 wd 0.0500 time 0.2553 (0.2610) data time 0.0010 (0.0018) model time 0.2543 (0.2596) loss 5.4750 (5.9233) grad_norm 2.1185 (2.1541) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:20:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][570/625] eta 0:00:14 lr 0.000716 wd 0.0500 time 0.2537 (0.2609) data time 0.0007 (0.0017) model time 0.2530 (0.2596) loss 6.4881 (5.9248) grad_norm 1.8638 (2.1594) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:20:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][580/625] eta 
0:00:11 lr 0.000715 wd 0.0500 time 0.2541 (0.2609) data time 0.0010 (0.0017) model time 0.2531 (0.2595) loss 7.0076 (5.9233) grad_norm 1.8570 (2.1531) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:20:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][590/625] eta 0:00:09 lr 0.000715 wd 0.0500 time 0.2548 (0.2608) data time 0.0008 (0.0017) model time 0.2540 (0.2594) loss 6.8720 (5.9231) grad_norm 2.5636 (2.1581) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:20:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][600/625] eta 0:00:06 lr 0.000715 wd 0.0500 time 0.2539 (0.2607) data time 0.0010 (0.0017) model time 0.2529 (0.2593) loss 6.5887 (5.9269) grad_norm 3.2907 (2.1600) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:20:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][610/625] eta 0:00:03 lr 0.000715 wd 0.0500 time 0.2523 (0.2606) data time 0.0004 (0.0017) model time 0.2519 (0.2593) loss 5.3950 (5.9267) grad_norm 1.8953 (2.1595) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:20:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][620/625] eta 0:00:01 lr 0.000715 wd 0.0500 time 0.2542 (0.2608) data time 0.0003 (0.0017) model time 0.2539 (0.2595) loss 5.2044 (5.9212) grad_norm 2.1172 (2.1612) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:20:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 186 training takes 0:02:42 [2024-08-04 06:20:23 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 06:20:23 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 06:20:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.464 (0.464) Loss 0.6138 (0.6138) Acc@1 87.891 (87.891) Acc@5 98.828 (98.828) Mem 9655MB [2024-08-04 06:20:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.093) Loss 0.9507 (0.7639) Acc@1 80.176 (85.312) Acc@5 95.605 (97.403) Mem 9655MB [2024-08-04 06:20:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.075) Loss 1.1152 (0.8988) Acc@1 75.635 (81.785) Acc@5 93.896 (95.922) Mem 9655MB [2024-08-04 06:20:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.468 Acc@5 95.891 [2024-08-04 06:20:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.5% [2024-08-04 06:20:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.47% [2024-08-04 06:20:25 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 06:20:26 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 06:20:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.501 (0.501) Loss 0.5825 (0.5825) Acc@1 89.795 (89.795) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 06:20:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 0.9248 (0.7216) Acc@1 80.371 (85.995) Acc@5 95.410 (97.536) Mem 9655MB [2024-08-04 06:20:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0537 (0.8487) Acc@1 75.830 (82.529) Acc@5 94.727 (96.245) Mem 9655MB [2024-08-04 06:20:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.224 Acc@5 96.221 [2024-08-04 06:20:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.2% [2024-08-04 06:20:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.22% [2024-08-04 06:20:27 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 06:20:28 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 06:20:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][0/625] eta 0:07:20 lr 0.000715 wd 0.0500 time 0.7044 (0.7044) data time 0.4548 (0.4548) model time 0.0000 (0.0000) loss 6.6937 (6.6937) grad_norm 1.4676 (1.4676) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:20:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][10/625] eta 0:03:02 lr 0.000714 wd 0.0500 time 0.2536 (0.2963) data time 0.0008 (0.0423) model time 0.0000 (0.0000) loss 5.9913 (5.9827) grad_norm 1.5689 (1.8105) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:20:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][20/625] eta 0:02:51 lr 0.000714 wd 0.0500 time 0.2545 (0.2834) data time 0.0007 (0.0226) model time 0.0000 (0.0000) loss 5.4970 (5.7781) grad_norm 1.9021 (2.2519) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:20:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][30/625] eta 0:02:43 lr 0.000714 wd 0.0500 time 0.2527 (0.2744) data time 0.0010 (0.0156) model time 0.0000 (0.0000) loss 6.3570 (5.8030) grad_norm 1.5874 (2.2387) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:20:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][40/625] eta 0:02:38 lr 0.000714 wd 0.0500 time 0.2516 (0.2702) data time 0.0008 (0.0120) model time 0.0000 (0.0000) loss 5.4679 (5.8179) grad_norm 1.8631 (2.2133) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:20:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][50/625] eta 0:02:35 lr 0.000714 wd 0.0500 time 0.2525 (0.2710) data time 0.0009 (0.0098) model time 0.0000 (0.0000) loss 5.0737 (5.8895) grad_norm 1.6635 (2.1672) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:20:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][60/625] eta 0:02:33 lr 0.000714 wd 0.0500 time 0.3977 (0.2708) data time 
0.0008 (0.0084) model time 0.3969 (0.2688) loss 6.0942 (5.8211) grad_norm 1.7343 (2.1086) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:20:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][70/625] eta 0:02:29 lr 0.000713 wd 0.0500 time 0.2584 (0.2687) data time 0.0008 (0.0074) model time 0.2576 (0.2616) loss 6.4225 (5.8766) grad_norm 3.2126 (2.1743) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:20:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][80/625] eta 0:02:25 lr 0.000713 wd 0.0500 time 0.2606 (0.2670) data time 0.0008 (0.0066) model time 0.2598 (0.2592) loss 6.6345 (5.8613) grad_norm 1.9044 (2.1641) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:20:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][90/625] eta 0:02:22 lr 0.000713 wd 0.0500 time 0.2558 (0.2658) data time 0.0010 (0.0060) model time 0.2547 (0.2582) loss 6.3143 (5.8553) grad_norm 1.4983 (2.0972) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:20:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][100/625] eta 0:02:19 lr 0.000713 wd 0.0500 time 0.2577 (0.2648) data time 0.0009 (0.0054) model time 0.2568 (0.2576) loss 5.1075 (5.8788) grad_norm 1.7397 (2.0940) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:20:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][110/625] eta 0:02:16 lr 0.000713 wd 0.0500 time 0.2540 (0.2659) data time 0.0014 (0.0051) model time 0.2526 (0.2606) loss 5.4401 (5.8072) grad_norm 1.9907 (2.0865) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:21:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][120/625] eta 0:02:13 lr 0.000713 wd 0.0500 time 0.2543 (0.2650) data time 0.0009 (0.0047) model time 0.2535 (0.2597) loss 5.0460 (5.7963) grad_norm 2.7853 (2.0986) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:21:03 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][130/625] eta 0:02:10 lr 0.000712 wd 0.0500 time 0.2555 (0.2643) data time 0.0006 (0.0044) model time 0.2549 (0.2590) loss 6.5273 (5.8021) grad_norm 2.0558 (2.1058) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:21:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][140/625] eta 0:02:07 lr 0.000712 wd 0.0500 time 0.2572 (0.2637) data time 0.0008 (0.0042) model time 0.2564 (0.2586) loss 6.3543 (5.7807) grad_norm 2.5987 (2.1250) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:21:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][150/625] eta 0:02:05 lr 0.000712 wd 0.0500 time 0.2564 (0.2633) data time 0.0008 (0.0040) model time 0.2556 (0.2584) loss 4.5413 (5.7943) grad_norm 1.8871 (2.1054) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:21:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][160/625] eta 0:02:02 lr 0.000712 wd 0.0500 time 0.2558 (0.2627) data time 0.0007 (0.0038) model time 0.2551 (0.2580) loss 6.3470 (5.7980) grad_norm 2.0093 (2.1063) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:21:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][170/625] eta 0:01:59 lr 0.000712 wd 0.0500 time 0.2547 (0.2624) data time 0.0008 (0.0036) model time 0.2539 (0.2578) loss 6.8885 (5.8062) grad_norm 2.1938 (2.1174) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:21:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][180/625] eta 0:01:56 lr 0.000712 wd 0.0500 time 0.2568 (0.2626) data time 0.0010 (0.0035) model time 0.2558 (0.2584) loss 6.1936 (5.8193) grad_norm 1.5884 (2.1168) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:21:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][190/625] eta 0:01:54 lr 0.000711 wd 0.0500 time 0.2542 (0.2623) data time 0.0011 (0.0033) 
model time 0.2531 (0.2582) loss 6.0692 (5.8453) grad_norm 1.3863 (2.1134) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:21:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][200/625] eta 0:01:51 lr 0.000711 wd 0.0500 time 0.2513 (0.2621) data time 0.0012 (0.0032) model time 0.2501 (0.2581) loss 6.2675 (5.8539) grad_norm 2.0390 (2.1313) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:21:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][210/625] eta 0:01:48 lr 0.000711 wd 0.0500 time 0.2579 (0.2618) data time 0.0008 (0.0031) model time 0.2571 (0.2579) loss 5.1467 (5.8495) grad_norm 2.9127 (2.1704) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:21:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][220/625] eta 0:01:45 lr 0.000711 wd 0.0500 time 0.2600 (0.2616) data time 0.0007 (0.0030) model time 0.2593 (0.2578) loss 5.8222 (5.8608) grad_norm 2.1941 (2.1888) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:21:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][230/625] eta 0:01:43 lr 0.000711 wd 0.0500 time 0.4241 (0.2622) data time 0.0010 (0.0029) model time 0.4231 (0.2588) loss 6.7727 (5.8800) grad_norm 2.7459 (2.1871) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:21:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][240/625] eta 0:01:41 lr 0.000711 wd 0.0500 time 0.2522 (0.2635) data time 0.0008 (0.0028) model time 0.2514 (0.2605) loss 5.1703 (5.8851) grad_norm 1.6525 (2.1865) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:21:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][250/625] eta 0:01:38 lr 0.000710 wd 0.0500 time 0.2544 (0.2632) data time 0.0009 (0.0028) model time 0.2534 (0.2603) loss 5.1156 (5.8793) grad_norm 2.6672 (2.2003) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:21:37 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [187/300][260/625] eta 0:01:35 lr 0.000710 wd 0.0500 time 0.2524 (0.2630) data time 0.0009 (0.0027) model time 0.2515 (0.2601) loss 6.5989 (5.8901) grad_norm 1.4506 (2.2039) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:21:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][270/625] eta 0:01:33 lr 0.000710 wd 0.0500 time 0.2585 (0.2633) data time 0.0007 (0.0026) model time 0.2578 (0.2605) loss 5.8030 (5.9008) grad_norm 2.9024 (2.2099) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:21:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][280/625] eta 0:01:30 lr 0.000710 wd 0.0500 time 0.2561 (0.2630) data time 0.0009 (0.0026) model time 0.2552 (0.2603) loss 6.8018 (5.9102) grad_norm 1.3083 (2.1893) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:21:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][290/625] eta 0:01:28 lr 0.000710 wd 0.0500 time 0.2547 (0.2633) data time 0.0010 (0.0025) model time 0.2537 (0.2607) loss 6.4632 (5.9247) grad_norm 2.1650 (2.1868) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:21:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][300/625] eta 0:01:25 lr 0.000710 wd 0.0500 time 0.2570 (0.2631) data time 0.0007 (0.0025) model time 0.2562 (0.2605) loss 4.6472 (5.9262) grad_norm 2.1866 (2.1808) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:21:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][310/625] eta 0:01:22 lr 0.000709 wd 0.0500 time 0.2550 (0.2629) data time 0.0010 (0.0024) model time 0.2540 (0.2603) loss 6.2832 (5.9254) grad_norm 1.6042 (2.1822) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:21:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][320/625] eta 0:01:20 lr 0.000709 wd 0.0500 time 0.2595 (0.2627) data time 0.0009 (0.0024) model time 0.2586 (0.2601) 
loss 6.2109 (5.9315) grad_norm 2.3928 (2.1764) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:21:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][330/625] eta 0:01:17 lr 0.000709 wd 0.0500 time 0.2556 (0.2625) data time 0.0008 (0.0023) model time 0.2548 (0.2600) loss 6.1186 (5.9274) grad_norm 3.4446 (2.1739) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:21:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][340/625] eta 0:01:14 lr 0.000709 wd 0.0500 time 0.2663 (0.2624) data time 0.0006 (0.0023) model time 0.2657 (0.2599) loss 6.3437 (5.9228) grad_norm 1.7817 (2.1705) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:22:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][350/625] eta 0:01:12 lr 0.000709 wd 0.0500 time 0.2566 (0.2622) data time 0.0007 (0.0023) model time 0.2559 (0.2598) loss 5.3275 (5.9215) grad_norm 1.7628 (2.1759) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:22:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][360/625] eta 0:01:09 lr 0.000709 wd 0.0500 time 0.2552 (0.2621) data time 0.0011 (0.0022) model time 0.2540 (0.2596) loss 6.3740 (5.9219) grad_norm 1.4818 (2.1823) loss_scale 1024.0000 (521.9280) mem 9655MB [2024-08-04 06:22:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][370/625] eta 0:01:06 lr 0.000708 wd 0.0500 time 0.2524 (0.2619) data time 0.0007 (0.0022) model time 0.2517 (0.2595) loss 5.7103 (5.9153) grad_norm 2.1105 (2.1817) loss_scale 1024.0000 (535.4609) mem 9655MB [2024-08-04 06:22:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][380/625] eta 0:01:04 lr 0.000708 wd 0.0500 time 0.2505 (0.2617) data time 0.0009 (0.0022) model time 0.2496 (0.2593) loss 5.5631 (5.9110) grad_norm 2.4508 (2.1809) loss_scale 1024.0000 (548.2835) mem 9655MB [2024-08-04 06:22:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [187/300][390/625] eta 0:01:01 lr 0.000708 wd 0.0500 time 0.2571 (0.2616) data time 0.0010 (0.0021) model time 0.2561 (0.2592) loss 6.3658 (5.9136) grad_norm 4.2195 (2.2616) loss_scale 1024.0000 (560.4501) mem 9655MB [2024-08-04 06:22:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][400/625] eta 0:00:58 lr 0.000708 wd 0.0500 time 0.2560 (0.2615) data time 0.0007 (0.0021) model time 0.2552 (0.2591) loss 7.0990 (5.9190) grad_norm 3.3246 (2.2830) loss_scale 1024.0000 (572.0100) mem 9655MB [2024-08-04 06:22:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][410/625] eta 0:00:56 lr 0.000708 wd 0.0500 time 0.2563 (0.2614) data time 0.0008 (0.0021) model time 0.2554 (0.2590) loss 5.8349 (5.9254) grad_norm 1.8814 (2.2846) loss_scale 1024.0000 (583.0073) mem 9655MB [2024-08-04 06:22:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][420/625] eta 0:00:53 lr 0.000708 wd 0.0500 time 0.2585 (0.2613) data time 0.0009 (0.0020) model time 0.2576 (0.2589) loss 5.8845 (5.9210) grad_norm 2.2463 (2.2727) loss_scale 1024.0000 (593.4822) mem 9655MB [2024-08-04 06:22:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][430/625] eta 0:00:51 lr 0.000707 wd 0.0500 time 0.2565 (0.2616) data time 0.0009 (0.0020) model time 0.2555 (0.2594) loss 5.7076 (5.9243) grad_norm 1.7463 (2.2682) loss_scale 1024.0000 (603.4710) mem 9655MB [2024-08-04 06:22:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][440/625] eta 0:00:48 lr 0.000707 wd 0.0500 time 0.2684 (0.2615) data time 0.0008 (0.0020) model time 0.2676 (0.2593) loss 6.3464 (5.9179) grad_norm 1.9246 (2.2558) loss_scale 1024.0000 (613.0068) mem 9655MB [2024-08-04 06:22:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][450/625] eta 0:00:45 lr 0.000707 wd 0.0500 time 0.2656 (0.2615) data time 0.0009 (0.0020) model time 0.2647 (0.2593) loss 5.3868 (5.9201) 
grad_norm 1.5183 (2.2441) loss_scale 1024.0000 (622.1197) mem 9655MB [2024-08-04 06:22:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][460/625] eta 0:00:43 lr 0.000707 wd 0.0500 time 0.2575 (0.2617) data time 0.0007 (0.0019) model time 0.2568 (0.2596) loss 6.2064 (5.9217) grad_norm 3.8544 (2.2351) loss_scale 1024.0000 (630.8373) mem 9655MB [2024-08-04 06:22:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][470/625] eta 0:00:40 lr 0.000707 wd 0.0500 time 0.2603 (0.2616) data time 0.0007 (0.0019) model time 0.2596 (0.2596) loss 6.9987 (5.9208) grad_norm 1.7327 (2.2381) loss_scale 1024.0000 (639.1847) mem 9655MB [2024-08-04 06:22:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][480/625] eta 0:00:38 lr 0.000707 wd 0.0500 time 0.2562 (0.2622) data time 0.0010 (0.0019) model time 0.2552 (0.2602) loss 5.9043 (5.9187) grad_norm 1.5356 (2.2302) loss_scale 1024.0000 (647.1850) mem 9655MB [2024-08-04 06:22:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][490/625] eta 0:00:35 lr 0.000706 wd 0.0500 time 0.2553 (0.2621) data time 0.0009 (0.0019) model time 0.2544 (0.2601) loss 5.8696 (5.9198) grad_norm 2.9370 (2.2250) loss_scale 1024.0000 (654.8595) mem 9655MB [2024-08-04 06:22:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][500/625] eta 0:00:32 lr 0.000706 wd 0.0500 time 0.4248 (0.2623) data time 0.0009 (0.0019) model time 0.4239 (0.2603) loss 7.0652 (5.9214) grad_norm 2.8577 (2.2237) loss_scale 1024.0000 (662.2275) mem 9655MB [2024-08-04 06:22:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][510/625] eta 0:00:30 lr 0.000706 wd 0.0500 time 0.2520 (0.2631) data time 0.0006 (0.0018) model time 0.2514 (0.2613) loss 4.8936 (5.9148) grad_norm 2.6543 (2.2359) loss_scale 1024.0000 (669.3072) mem 9655MB [2024-08-04 06:22:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[187/300][520/625] eta 0:00:27 lr 0.000706 wd 0.0500 time 0.2559 (0.2634) data time 0.0010 (0.0018) model time 0.2549 (0.2616) loss 5.6036 (5.9144) grad_norm 3.5766 (2.2381) loss_scale 1024.0000 (676.1152) mem 9655MB [2024-08-04 06:22:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][530/625] eta 0:00:25 lr 0.000706 wd 0.0500 time 0.2571 (0.2632) data time 0.0008 (0.0018) model time 0.2563 (0.2614) loss 6.9460 (5.9093) grad_norm 2.7418 (2.2394) loss_scale 1024.0000 (682.6667) mem 9655MB [2024-08-04 06:22:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][540/625] eta 0:00:22 lr 0.000706 wd 0.0500 time 0.2577 (0.2631) data time 0.0010 (0.0018) model time 0.2566 (0.2613) loss 5.8991 (5.9067) grad_norm 3.4608 (2.2339) loss_scale 1024.0000 (688.9760) mem 9655MB [2024-08-04 06:22:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][550/625] eta 0:00:19 lr 0.000705 wd 0.0500 time 0.2616 (0.2630) data time 0.0008 (0.0018) model time 0.2609 (0.2612) loss 5.2246 (5.9043) grad_norm 1.8714 (2.2329) loss_scale 1024.0000 (695.0563) mem 9655MB [2024-08-04 06:22:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][560/625] eta 0:00:17 lr 0.000705 wd 0.0500 time 0.2571 (0.2628) data time 0.0007 (0.0018) model time 0.2564 (0.2610) loss 5.2327 (5.9066) grad_norm 1.9794 (2.2270) loss_scale 1024.0000 (700.9198) mem 9655MB [2024-08-04 06:22:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][570/625] eta 0:00:14 lr 0.000705 wd 0.0500 time 0.2544 (0.2627) data time 0.0009 (0.0018) model time 0.2535 (0.2609) loss 5.4791 (5.8999) grad_norm 2.1410 (2.2215) loss_scale 1024.0000 (706.5779) mem 9655MB [2024-08-04 06:23:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][580/625] eta 0:00:11 lr 0.000705 wd 0.0500 time 0.2545 (0.2626) data time 0.0009 (0.0017) model time 0.2536 (0.2608) loss 6.3402 (5.9087) grad_norm 
1.4911 (2.2151) loss_scale 1024.0000 (712.0413) mem 9655MB [2024-08-04 06:23:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][590/625] eta 0:00:09 lr 0.000705 wd 0.0500 time 0.2531 (0.2625) data time 0.0009 (0.0017) model time 0.2522 (0.2607) loss 6.1100 (5.9043) grad_norm 1.4240 (2.2093) loss_scale 1024.0000 (717.3198) mem 9655MB [2024-08-04 06:23:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][600/625] eta 0:00:06 lr 0.000704 wd 0.0500 time 0.2561 (0.2624) data time 0.0006 (0.0017) model time 0.2555 (0.2606) loss 6.8464 (5.9070) grad_norm 3.3351 (2.2172) loss_scale 1024.0000 (722.4226) mem 9655MB [2024-08-04 06:23:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][610/625] eta 0:00:03 lr 0.000704 wd 0.0500 time 0.2517 (0.2623) data time 0.0004 (0.0017) model time 0.2513 (0.2605) loss 6.1692 (5.9129) grad_norm 1.9715 (2.2398) loss_scale 1024.0000 (727.3584) mem 9655MB [2024-08-04 06:23:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][620/625] eta 0:00:01 lr 0.000704 wd 0.0500 time 0.2543 (0.2622) data time 0.0003 (0.0017) model time 0.2540 (0.2604) loss 6.1861 (5.9218) grad_norm 1.5540 (2.2430) loss_scale 1024.0000 (732.1353) mem 9655MB [2024-08-04 06:23:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 187 training takes 0:02:43 [2024-08-04 06:23:12 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 06:23:12 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 06:23:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.480 (0.480) Loss 0.6167 (0.6167) Acc@1 89.404 (89.404) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 06:23:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 0.9966 (0.7892) Acc@1 79.199 (84.899) Acc@5 95.312 (97.257) Mem 9655MB [2024-08-04 06:23:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.1504 (0.9155) Acc@1 74.268 (81.594) Acc@5 94.141 (95.850) Mem 9655MB [2024-08-04 06:23:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.370 Acc@5 95.813 [2024-08-04 06:23:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.4% [2024-08-04 06:23:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.745 (0.745) Loss 0.5825 (0.5825) Acc@1 89.795 (89.795) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 06:23:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.125) Loss 0.9238 (0.7212) Acc@1 80.273 (86.017) Acc@5 95.508 (97.554) Mem 9655MB [2024-08-04 06:23:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0527 (0.8481) Acc@1 75.928 (82.578) Acc@5 94.678 (96.243) Mem 9655MB [2024-08-04 06:23:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.278 Acc@5 96.219 [2024-08-04 06:23:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.3% [2024-08-04 06:23:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.28% [2024-08-04 06:23:16 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 06:23:17 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 06:23:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][0/625] eta 0:06:38 lr 0.000704 wd 0.0500 time 0.6383 (0.6383) data time 0.3797 (0.3797) model time 0.0000 (0.0000) loss 7.1809 (7.1809) grad_norm 2.4723 (2.4723) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:23:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][10/625] eta 0:02:58 lr 0.000704 wd 0.0500 time 0.2586 (0.2907) data time 0.0009 (0.0355) model time 0.0000 (0.0000) loss 6.3041 (5.9759) grad_norm 1.8710 (2.3430) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:23:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][20/625] eta 0:02:45 lr 0.000704 wd 0.0500 time 0.2566 (0.2742) data time 0.0007 (0.0190) model time 0.0000 (0.0000) loss 5.7177 (5.8866) grad_norm 10.6441 (2.8017) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:23:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][30/625] eta 0:02:39 lr 0.000704 wd 0.0500 time 0.2568 (0.2684) data time 0.0009 (0.0132) model time 0.0000 (0.0000) loss 5.9873 (5.9045) grad_norm 2.8585 (2.6273) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:23:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][40/625] eta 0:02:37 lr 0.000703 wd 0.0500 time 0.2535 (0.2700) data time 0.0007 (0.0102) model time 0.0000 (0.0000) loss 6.8386 (5.9566) grad_norm 2.7464 (2.6510) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:23:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][50/625] eta 0:02:35 lr 0.000703 wd 0.0500 time 0.2597 (0.2710) data time 0.0008 (0.0084) model time 0.0000 (0.0000) loss 5.6775 (5.9394) grad_norm 2.2152 (2.6381) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 06:23:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][60/625] eta 0:02:31 lr 0.000703 wd 0.0500 time 0.2565 (0.2684) data time 0.0009 (0.0072) model time 0.2555 (0.2544) loss 6.6281 (5.9806) grad_norm 1.5273 (2.5457) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:23:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][70/625] eta 0:02:28 lr 0.000703 wd 0.0500 time 0.2564 (0.2668) data time 0.0008 (0.0063) model time 0.2556 (0.2553) loss 5.8562 (5.9530) grad_norm 3.8947 (2.5695) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:23:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][80/625] eta 0:02:24 lr 0.000703 wd 0.0500 time 0.2644 (0.2657) data time 0.0009 (0.0056) model time 0.2635 (0.2558) loss 6.4287 (5.9556) grad_norm 2.1910 (2.5485) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:23:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][90/625] eta 0:02:21 lr 0.000703 wd 0.0500 time 0.2505 (0.2645) data time 0.0010 (0.0051) model time 0.2495 (0.2552) loss 5.8554 (5.9321) grad_norm 1.1724 (2.5082) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:23:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][100/625] eta 0:02:18 lr 0.000702 wd 0.0500 time 0.2578 (0.2638) data time 0.0010 (0.0048) model time 0.2568 (0.2554) loss 4.9845 (5.9214) grad_norm 1.5503 (2.4190) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:23:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][110/625] eta 0:02:15 lr 0.000702 wd 0.0500 time 0.2531 (0.2630) data time 0.0011 (0.0044) model time 0.2521 (0.2552) loss 5.1142 (5.9066) grad_norm 2.1262 (2.3701) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:23:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][120/625] eta 0:02:12 lr 0.000702 wd 0.0500 time 0.2540 
(0.2624) data time 0.0011 (0.0041) model time 0.2529 (0.2551) loss 5.9466 (5.9263) grad_norm 1.6037 (2.3964) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:23:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][130/625] eta 0:02:09 lr 0.000702 wd 0.0500 time 0.2507 (0.2618) data time 0.0007 (0.0039) model time 0.2500 (0.2550) loss 4.9502 (5.9212) grad_norm 1.9170 (2.3702) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:23:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][140/625] eta 0:02:07 lr 0.000702 wd 0.0500 time 0.2559 (0.2630) data time 0.0009 (0.0037) model time 0.2550 (0.2574) loss 5.9737 (5.9301) grad_norm 3.4264 (2.3607) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:23:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][150/625] eta 0:02:04 lr 0.000702 wd 0.0500 time 0.2575 (0.2625) data time 0.0009 (0.0035) model time 0.2566 (0.2572) loss 6.9599 (5.9359) grad_norm 2.0923 (2.3792) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:23:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][160/625] eta 0:02:01 lr 0.000701 wd 0.0500 time 0.2571 (0.2623) data time 0.0008 (0.0033) model time 0.2563 (0.2572) loss 6.4854 (5.9423) grad_norm 2.1557 (2.3477) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:24:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][170/625] eta 0:01:59 lr 0.000701 wd 0.0500 time 0.2576 (0.2619) data time 0.0007 (0.0032) model time 0.2569 (0.2570) loss 5.9348 (5.9271) grad_norm 1.2904 (2.3398) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:24:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][180/625] eta 0:01:56 lr 0.000701 wd 0.0500 time 0.2590 (0.2616) data time 0.0008 (0.0031) model time 0.2582 (0.2569) loss 5.4670 (5.9247) grad_norm 2.1972 (2.3234) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 06:24:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][190/625] eta 0:01:53 lr 0.000701 wd 0.0500 time 0.2567 (0.2613) data time 0.0007 (0.0030) model time 0.2561 (0.2568) loss 5.4834 (5.9370) grad_norm 1.3258 (2.2994) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:24:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][200/625] eta 0:01:50 lr 0.000701 wd 0.0500 time 0.2556 (0.2610) data time 0.0008 (0.0029) model time 0.2548 (0.2567) loss 5.9982 (5.9410) grad_norm 1.6715 (2.2891) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:24:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][210/625] eta 0:01:48 lr 0.000701 wd 0.0500 time 0.2503 (0.2608) data time 0.0010 (0.0028) model time 0.2492 (0.2565) loss 5.8486 (5.9438) grad_norm 2.6969 (2.3003) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:24:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][220/625] eta 0:01:45 lr 0.000700 wd 0.0500 time 0.2543 (0.2606) data time 0.0012 (0.0027) model time 0.2531 (0.2564) loss 5.3593 (5.9420) grad_norm 2.0155 (2.3124) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:24:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][230/625] eta 0:01:42 lr 0.000700 wd 0.0500 time 0.2544 (0.2604) data time 0.0007 (0.0026) model time 0.2537 (0.2564) loss 5.7867 (5.9486) grad_norm 1.7160 (2.3065) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:24:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][240/625] eta 0:01:40 lr 0.000700 wd 0.0500 time 0.2577 (0.2602) data time 0.0008 (0.0026) model time 0.2568 (0.2563) loss 6.3544 (5.9615) grad_norm 1.8688 (2.3083) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:24:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][250/625] eta 0:01:37 lr 0.000700 wd 0.0500 time 0.2527 
(0.2600) data time 0.0010 (0.0025) model time 0.2518 (0.2562) loss 5.3721 (5.9591) grad_norm 1.3910 (2.2922) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:24:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][260/625] eta 0:01:34 lr 0.000700 wd 0.0500 time 0.2570 (0.2598) data time 0.0009 (0.0024) model time 0.2561 (0.2562) loss 5.3412 (5.9624) grad_norm 1.8753 (2.3090) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:24:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][270/625] eta 0:01:32 lr 0.000700 wd 0.0500 time 0.2576 (0.2597) data time 0.0012 (0.0024) model time 0.2565 (0.2561) loss 4.9522 (5.9444) grad_norm 1.6859 (2.2999) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:24:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][280/625] eta 0:01:29 lr 0.000699 wd 0.0500 time 0.2560 (0.2595) data time 0.0011 (0.0023) model time 0.2550 (0.2560) loss 6.6913 (5.9438) grad_norm 2.2463 (2.2899) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:24:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][290/625] eta 0:01:26 lr 0.000699 wd 0.0500 time 0.2543 (0.2594) data time 0.0010 (0.0023) model time 0.2532 (0.2559) loss 4.8425 (5.9400) grad_norm 1.7561 (2.2795) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:24:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][300/625] eta 0:01:24 lr 0.000699 wd 0.0500 time 0.2555 (0.2601) data time 0.0006 (0.0022) model time 0.2549 (0.2569) loss 5.7429 (5.9322) grad_norm 1.3328 (2.2714) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:24:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][310/625] eta 0:01:21 lr 0.000699 wd 0.0500 time 0.2718 (0.2600) data time 0.0008 (0.0022) model time 0.2711 (0.2569) loss 5.9852 (5.9216) grad_norm 2.1872 (2.2718) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 06:24:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][320/625] eta 0:01:19 lr 0.000699 wd 0.0500 time 0.2565 (0.2599) data time 0.0007 (0.0021) model time 0.2557 (0.2569) loss 5.9024 (5.9222) grad_norm 1.8344 (2.2630) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:24:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][330/625] eta 0:01:16 lr 0.000699 wd 0.0500 time 0.2532 (0.2598) data time 0.0007 (0.0021) model time 0.2525 (0.2568) loss 6.0395 (5.9264) grad_norm 2.4951 (2.2518) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:24:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][340/625] eta 0:01:14 lr 0.000698 wd 0.0500 time 0.2563 (0.2603) data time 0.0008 (0.0021) model time 0.2554 (0.2575) loss 4.9171 (5.9248) grad_norm 2.8128 (2.2485) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:24:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][350/625] eta 0:01:11 lr 0.000698 wd 0.0500 time 0.2562 (0.2602) data time 0.0007 (0.0020) model time 0.2555 (0.2574) loss 7.2135 (5.9285) grad_norm 2.5008 (2.2666) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:24:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][360/625] eta 0:01:08 lr 0.000698 wd 0.0500 time 0.2567 (0.2601) data time 0.0007 (0.0020) model time 0.2560 (0.2573) loss 5.4741 (5.9289) grad_norm 1.3018 (2.2688) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:24:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][370/625] eta 0:01:06 lr 0.000698 wd 0.0500 time 0.2558 (0.2605) data time 0.0009 (0.0020) model time 0.2549 (0.2579) loss 6.3410 (5.9262) grad_norm 1.8851 (2.2626) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:24:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][380/625] eta 0:01:03 lr 0.000698 wd 0.0500 time 0.2507 
(0.2604) data time 0.0007 (0.0020) model time 0.2499 (0.2578) loss 4.5666 (5.9244) grad_norm 1.4164 (2.2538) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:24:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][390/625] eta 0:01:01 lr 0.000697 wd 0.0500 time 0.2541 (0.2603) data time 0.0011 (0.0019) model time 0.2531 (0.2578) loss 5.6986 (5.9203) grad_norm 1.5178 (2.2407) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:25:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][400/625] eta 0:00:58 lr 0.000697 wd 0.0500 time 0.2539 (0.2607) data time 0.0007 (0.0019) model time 0.2531 (0.2583) loss 5.5901 (5.9232) grad_norm 1.8236 (2.2301) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:25:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][410/625] eta 0:00:56 lr 0.000697 wd 0.0500 time 0.2623 (0.2606) data time 0.0012 (0.0019) model time 0.2612 (0.2582) loss 6.0523 (5.9322) grad_norm 3.0362 (2.2215) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:25:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][420/625] eta 0:00:53 lr 0.000697 wd 0.0500 time 0.2551 (0.2605) data time 0.0007 (0.0019) model time 0.2544 (0.2581) loss 6.2409 (5.9395) grad_norm 3.2912 (2.2170) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:25:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][430/625] eta 0:00:50 lr 0.000697 wd 0.0500 time 0.2490 (0.2604) data time 0.0008 (0.0018) model time 0.2482 (0.2581) loss 6.0111 (5.9360) grad_norm 1.9613 (2.2071) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:25:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][440/625] eta 0:00:48 lr 0.000697 wd 0.0500 time 0.2557 (0.2607) data time 0.0007 (0.0018) model time 0.2549 (0.2585) loss 6.5301 (5.9391) grad_norm 1.8806 (2.1989) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 06:25:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][450/625] eta 0:00:45 lr 0.000696 wd 0.0500 time 0.2535 (0.2606) data time 0.0010 (0.0018) model time 0.2525 (0.2583) loss 5.7444 (5.9378) grad_norm 3.7529 (2.2530) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:25:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][460/625] eta 0:00:43 lr 0.000696 wd 0.0500 time 0.3972 (0.2608) data time 0.0006 (0.0018) model time 0.3966 (0.2586) loss 4.8015 (5.9415) grad_norm 2.0552 (2.2679) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:25:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][470/625] eta 0:00:40 lr 0.000696 wd 0.0500 time 0.2602 (0.2607) data time 0.0008 (0.0018) model time 0.2594 (0.2585) loss 6.5395 (5.9469) grad_norm 1.5577 (2.2680) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:25:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][480/625] eta 0:00:37 lr 0.000696 wd 0.0500 time 0.2579 (0.2610) data time 0.0010 (0.0017) model time 0.2569 (0.2589) loss 6.1551 (5.9477) grad_norm 2.4152 (2.2670) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:25:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][490/625] eta 0:00:35 lr 0.000696 wd 0.0500 time 0.2578 (0.2613) data time 0.0008 (0.0017) model time 0.2570 (0.2592) loss 6.0431 (5.9519) grad_norm 2.4410 (2.2610) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:25:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][500/625] eta 0:00:32 lr 0.000696 wd 0.0500 time 0.2578 (0.2616) data time 0.0010 (0.0017) model time 0.2568 (0.2596) loss 5.7711 (5.9564) grad_norm 3.5068 (2.2648) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:25:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][510/625] eta 0:00:30 lr 0.000695 wd 0.0500 time 0.2547 
(0.2615) data time 0.0010 (0.0017) model time 0.2537 (0.2595) loss 6.3310 (5.9552) grad_norm 2.7316 (2.2636) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:25:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][520/625] eta 0:00:27 lr 0.000695 wd 0.0500 time 0.2556 (0.2614) data time 0.0006 (0.0017) model time 0.2550 (0.2594) loss 6.8123 (5.9580) grad_norm 2.4503 (2.2661) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:25:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][530/625] eta 0:00:24 lr 0.000695 wd 0.0500 time 0.2541 (0.2613) data time 0.0009 (0.0017) model time 0.2532 (0.2594) loss 6.3891 (5.9611) grad_norm 1.7191 (2.2626) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:25:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][540/625] eta 0:00:22 lr 0.000695 wd 0.0500 time 0.2507 (0.2612) data time 0.0010 (0.0017) model time 0.2497 (0.2593) loss 4.4853 (5.9641) grad_norm 2.3751 (2.2576) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:25:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][550/625] eta 0:00:19 lr 0.000695 wd 0.0500 time 0.2561 (0.2613) data time 0.0017 (0.0016) model time 0.2544 (0.2594) loss 4.5296 (5.9609) grad_norm 2.5563 (2.2578) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:25:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][560/625] eta 0:00:16 lr 0.000695 wd 0.0500 time 0.2556 (0.2613) data time 0.0018 (0.0016) model time 0.2538 (0.2593) loss 4.6455 (5.9601) grad_norm 1.7196 (2.2540) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:25:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][570/625] eta 0:00:14 lr 0.000694 wd 0.0500 time 0.2519 (0.2612) data time 0.0007 (0.0016) model time 0.2512 (0.2593) loss 5.0944 (5.9602) grad_norm 1.6775 (2.2460) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 06:25:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][580/625] eta 0:00:11 lr 0.000694 wd 0.0500 time 0.2529 (0.2611) data time 0.0007 (0.0016) model time 0.2523 (0.2592) loss 5.7153 (5.9563) grad_norm 2.4962 (2.2456) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:25:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][590/625] eta 0:00:09 lr 0.000694 wd 0.0500 time 0.2539 (0.2610) data time 0.0008 (0.0016) model time 0.2531 (0.2591) loss 5.2680 (5.9541) grad_norm 1.5057 (2.2411) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:25:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][600/625] eta 0:00:06 lr 0.000694 wd 0.0500 time 0.2547 (0.2609) data time 0.0009 (0.0016) model time 0.2539 (0.2590) loss 5.4582 (5.9504) grad_norm 2.3454 (2.2394) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:25:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][610/625] eta 0:00:03 lr 0.000694 wd 0.0500 time 0.2540 (0.2608) data time 0.0006 (0.0016) model time 0.2534 (0.2590) loss 6.8215 (5.9482) grad_norm 2.1887 (2.2377) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:25:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][620/625] eta 0:00:01 lr 0.000694 wd 0.0500 time 0.2529 (0.2607) data time 0.0005 (0.0016) model time 0.2524 (0.2589) loss 5.5107 (5.9508) grad_norm 2.2204 (2.2418) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:26:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 188 training takes 0:02:42 [2024-08-04 06:26:00 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 06:26:00 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 06:26:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.487 (0.487) Loss 0.6143 (0.6143) Acc@1 89.307 (89.307) Acc@5 98.389 (98.389) Mem 9655MB [2024-08-04 06:26:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 0.9990 (0.7698) Acc@1 78.223 (85.112) Acc@5 96.191 (97.306) Mem 9655MB [2024-08-04 06:26:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.1201 (0.9045) Acc@1 76.025 (81.741) Acc@5 93.652 (95.896) Mem 9655MB [2024-08-04 06:26:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.534 Acc@5 95.883 [2024-08-04 06:26:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.5% [2024-08-04 06:26:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.53% [2024-08-04 06:26:02 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 06:26:03 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 06:26:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.541 (0.541) Loss 0.5820 (0.5820) Acc@1 89.844 (89.844) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 06:26:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.101) Loss 0.9233 (0.7212) Acc@1 80.176 (86.026) Acc@5 95.557 (97.545) Mem 9655MB [2024-08-04 06:26:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.0527 (0.8481) Acc@1 75.879 (82.580) Acc@5 94.629 (96.226) Mem 9655MB [2024-08-04 06:26:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.274 Acc@5 96.213 [2024-08-04 06:26:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.3% [2024-08-04 06:26:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][0/625] eta 0:11:23 lr 0.000694 wd 0.0500 time 1.0930 (1.0930) data time 0.4295 (0.4295) model time 0.0000 (0.0000) loss 5.5269 (5.5269) grad_norm 1.2841 (1.2841) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:26:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][10/625] eta 0:03:38 lr 0.000693 wd 0.0500 time 0.3785 (0.3550) data time 0.0008 (0.0400) model time 0.0000 (0.0000) loss 4.8466 (5.6756) grad_norm 2.5645 (2.0958) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:26:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][20/625] eta 0:03:06 lr 0.000693 wd 0.0500 time 0.2568 (0.3079) data time 0.0010 (0.0215) model time 0.0000 (0.0000) loss 4.9593 (5.7522) grad_norm 3.5042 (2.2289) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:26:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][30/625] eta 0:02:52 lr 0.000693 wd 0.0500 time 0.2540 (0.2907) data time 0.0008 (0.0149) model time 0.0000 (0.0000) loss 4.7904 (5.7499) grad_norm 1.9980 (2.1396) 
loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:26:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][40/625] eta 0:02:45 lr 0.000693 wd 0.0500 time 0.2609 (0.2822) data time 0.0008 (0.0115) model time 0.0000 (0.0000) loss 5.7487 (5.7457) grad_norm 2.5072 (2.2424) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:26:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][50/625] eta 0:02:39 lr 0.000693 wd 0.0500 time 0.2557 (0.2770) data time 0.0009 (0.0094) model time 0.0000 (0.0000) loss 5.1469 (5.7409) grad_norm 3.2660 (2.3100) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:26:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][60/625] eta 0:02:34 lr 0.000693 wd 0.0500 time 0.2536 (0.2734) data time 0.0010 (0.0080) model time 0.2527 (0.2542) loss 6.1251 (5.7067) grad_norm 1.5911 (2.2940) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:26:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][70/625] eta 0:02:30 lr 0.000692 wd 0.0500 time 0.2567 (0.2709) data time 0.0009 (0.0070) model time 0.2558 (0.2545) loss 5.2628 (5.7018) grad_norm 1.3238 (2.2749) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:26:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][80/625] eta 0:02:26 lr 0.000692 wd 0.0500 time 0.2708 (0.2695) data time 0.0008 (0.0063) model time 0.2700 (0.2559) loss 5.7197 (5.7353) grad_norm 4.3180 (2.3141) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:26:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][90/625] eta 0:02:23 lr 0.000692 wd 0.0500 time 0.2613 (0.2681) data time 0.0009 (0.0057) model time 0.2604 (0.2558) loss 6.1400 (5.7475) grad_norm 5.0754 (2.4245) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:26:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][100/625] eta 0:02:20 
lr 0.000692 wd 0.0500 time 0.2626 (0.2672) data time 0.0009 (0.0052) model time 0.2618 (0.2563) loss 6.1044 (5.7715) grad_norm 1.4383 (2.4081) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:26:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][110/625] eta 0:02:17 lr 0.000692 wd 0.0500 time 0.2492 (0.2679) data time 0.0008 (0.0048) model time 0.2484 (0.2593) loss 6.2686 (5.7542) grad_norm 1.6225 (2.3829) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:26:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][120/625] eta 0:02:15 lr 0.000692 wd 0.0500 time 0.2512 (0.2680) data time 0.0009 (0.0045) model time 0.2504 (0.2605) loss 6.3604 (5.7571) grad_norm 2.0477 (2.3935) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:26:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][130/625] eta 0:02:12 lr 0.000691 wd 0.0500 time 0.2535 (0.2672) data time 0.0008 (0.0042) model time 0.2527 (0.2600) loss 6.2638 (5.7894) grad_norm 2.8311 (2.4041) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:26:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][140/625] eta 0:02:09 lr 0.000691 wd 0.0500 time 0.2572 (0.2677) data time 0.0011 (0.0040) model time 0.2561 (0.2615) loss 6.1391 (5.7904) grad_norm 1.7384 (2.3969) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:26:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][150/625] eta 0:02:07 lr 0.000691 wd 0.0500 time 0.2553 (0.2683) data time 0.0010 (0.0038) model time 0.2543 (0.2629) loss 4.7124 (5.7868) grad_norm 1.6523 (2.3864) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:26:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][160/625] eta 0:02:04 lr 0.000691 wd 0.0500 time 0.2592 (0.2675) data time 0.0005 (0.0036) model time 0.2587 (0.2622) loss 6.0908 (5.8084) grad_norm 1.7219 (2.3699) loss_scale 
1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:26:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][170/625] eta 0:02:01 lr 0.000691 wd 0.0500 time 0.2562 (0.2681) data time 0.0008 (0.0035) model time 0.2554 (0.2633) loss 6.4760 (5.8165) grad_norm 2.3525 (2.3399) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:26:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][180/625] eta 0:01:59 lr 0.000690 wd 0.0500 time 0.2582 (0.2674) data time 0.0010 (0.0033) model time 0.2571 (0.2627) loss 5.3342 (5.8167) grad_norm 2.3317 (2.3348) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:26:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][190/625] eta 0:01:56 lr 0.000690 wd 0.0500 time 0.2544 (0.2668) data time 0.0007 (0.0032) model time 0.2536 (0.2621) loss 4.4401 (5.8133) grad_norm 1.5921 (2.3063) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:26:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][200/625] eta 0:01:53 lr 0.000690 wd 0.0500 time 0.2536 (0.2671) data time 0.0010 (0.0031) model time 0.2527 (0.2628) loss 5.8011 (5.8043) grad_norm 1.9349 (2.2734) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:27:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][210/625] eta 0:01:51 lr 0.000690 wd 0.0500 time 0.2582 (0.2682) data time 0.0009 (0.0030) model time 0.2572 (0.2645) loss 5.7691 (5.8246) grad_norm 1.5585 (2.2441) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:27:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][220/625] eta 0:01:48 lr 0.000690 wd 0.0500 time 0.2577 (0.2677) data time 0.0007 (0.0029) model time 0.2569 (0.2640) loss 5.9249 (5.8267) grad_norm 1.9501 (2.2342) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:27:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][230/625] eta 0:01:45 lr 
0.000690 wd 0.0500 time 0.2607 (0.2672) data time 0.0006 (0.0028) model time 0.2601 (0.2635) loss 6.6983 (5.8400) grad_norm 2.5284 (2.2140) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:27:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][240/625] eta 0:01:42 lr 0.000689 wd 0.0500 time 0.2530 (0.2668) data time 0.0008 (0.0027) model time 0.2522 (0.2631) loss 4.6885 (5.8342) grad_norm 2.9133 (2.2137) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:27:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][250/625] eta 0:01:39 lr 0.000689 wd 0.0500 time 0.2532 (0.2664) data time 0.0015 (0.0027) model time 0.2517 (0.2627) loss 6.7343 (5.8339) grad_norm 1.8920 (2.2045) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:27:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][260/625] eta 0:01:37 lr 0.000689 wd 0.0500 time 0.2562 (0.2661) data time 0.0008 (0.0026) model time 0.2553 (0.2625) loss 6.6516 (5.8603) grad_norm 2.8722 (2.2024) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:27:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][270/625] eta 0:01:34 lr 0.000689 wd 0.0500 time 0.2557 (0.2657) data time 0.0006 (0.0026) model time 0.2551 (0.2621) loss 5.3979 (5.8723) grad_norm 1.3710 (2.1859) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:27:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][280/625] eta 0:01:31 lr 0.000689 wd 0.0500 time 0.2549 (0.2653) data time 0.0008 (0.0025) model time 0.2541 (0.2618) loss 6.7923 (5.8732) grad_norm 1.9949 (2.1706) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:27:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][290/625] eta 0:01:28 lr 0.000689 wd 0.0500 time 0.2546 (0.2650) data time 0.0009 (0.0024) model time 0.2537 (0.2615) loss 5.7457 (5.8872) grad_norm 1.9266 (2.1643) loss_scale 
1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:27:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][300/625] eta 0:01:26 lr 0.000688 wd 0.0500 time 0.2581 (0.2647) data time 0.0009 (0.0024) model time 0.2572 (0.2613) loss 4.5369 (5.8798) grad_norm 1.5747 (2.1575) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:27:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][310/625] eta 0:01:23 lr 0.000688 wd 0.0500 time 0.2562 (0.2645) data time 0.0008 (0.0023) model time 0.2554 (0.2610) loss 5.5873 (5.8647) grad_norm 1.6739 (2.1688) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:27:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][320/625] eta 0:01:20 lr 0.000688 wd 0.0500 time 0.2545 (0.2642) data time 0.0010 (0.0023) model time 0.2535 (0.2609) loss 6.2584 (5.8689) grad_norm 2.4549 (2.1910) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:27:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][330/625] eta 0:01:17 lr 0.000688 wd 0.0500 time 0.2561 (0.2640) data time 0.0012 (0.0023) model time 0.2550 (0.2607) loss 4.9781 (5.8533) grad_norm 3.1287 (2.1935) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:27:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][340/625] eta 0:01:15 lr 0.000688 wd 0.0500 time 0.2573 (0.2638) data time 0.0018 (0.0022) model time 0.2555 (0.2605) loss 5.7998 (5.8519) grad_norm 1.7295 (2.2006) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:27:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][350/625] eta 0:01:12 lr 0.000688 wd 0.0500 time 0.2579 (0.2635) data time 0.0008 (0.0022) model time 0.2571 (0.2603) loss 6.3661 (5.8522) grad_norm 1.7795 (2.1896) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:27:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][360/625] eta 0:01:09 lr 0.000687 wd 0.0500 time 0.2575 (0.2633) data time 0.0010 (0.0022) model time 0.2565 (0.2601) loss 6.6379 (5.8531) grad_norm 1.8087 (2.1809) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:27:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][370/625] eta 0:01:07 lr 0.000687 wd 0.0500 time 0.2560 (0.2631) data time 0.0006 (0.0021) model time 0.2555 (0.2599) loss 5.3689 (5.8485) grad_norm 1.9780 (2.1721) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:27:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][380/625] eta 0:01:04 lr 0.000687 wd 0.0500 time 0.2524 (0.2629) data time 0.0009 (0.0021) model time 0.2515 (0.2598) loss 5.7885 (5.8435) grad_norm 2.6931 (2.1999) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:27:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][390/625] eta 0:01:01 lr 0.000687 wd 0.0500 time 0.2572 (0.2628) data time 0.0007 (0.0021) model time 0.2565 (0.2596) loss 5.0228 (5.8573) grad_norm 2.2489 (2.2198) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:27:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][400/625] eta 0:00:59 lr 0.000687 wd 0.0500 time 0.2593 (0.2626) data time 0.0009 (0.0020) model time 0.2584 (0.2595) loss 4.9958 (5.8470) grad_norm 1.8743 (2.2182) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:27:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][410/625] eta 0:00:56 lr 0.000687 wd 0.0500 time 0.2594 (0.2625) data time 0.0007 (0.0020) model time 0.2586 (0.2594) loss 6.5742 (5.8586) grad_norm 1.6509 (2.2165) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:27:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][420/625] eta 0:00:53 lr 0.000686 wd 0.0500 time 0.2550 (0.2623) data time 0.0010 (0.0020) model time 0.2540 (0.2593) loss 5.8625 (5.8593) grad_norm 1.7386 (2.2149) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:27:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][430/625] eta 0:00:51 lr 0.000686 wd 0.0500 time 0.2553 (0.2622) data time 0.0007 (0.0020) model time 0.2545 (0.2592) loss 7.1585 (5.8524) grad_norm 2.1693 (2.2227) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:28:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][440/625] eta 0:00:48 lr 0.000686 wd 0.0500 time 0.2537 (0.2621) data time 0.0009 (0.0019) model time 0.2528 (0.2592) loss 5.1794 (5.8454) grad_norm 3.1869 (2.2345) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:28:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][450/625] eta 0:00:45 lr 0.000686 wd 0.0500 time 0.2531 (0.2619) data time 0.0008 (0.0019) model time 0.2522 (0.2591) loss 6.5498 (5.8483) grad_norm 1.4404 (2.2307) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:28:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][460/625] eta 0:00:43 lr 0.000686 wd 0.0500 time 0.2572 (0.2618) data time 0.0006 (0.0019) model time 0.2566 (0.2590) loss 6.3282 (5.8578) grad_norm 2.9663 (2.2293) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:28:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][470/625] eta 0:00:40 lr 0.000686 wd 0.0500 time 0.2598 (0.2617) data time 0.0010 (0.0019) model time 0.2589 (0.2589) loss 5.2495 (5.8511) grad_norm 1.5557 (2.2308) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:28:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][480/625] eta 0:00:37 lr 0.000685 wd 0.0500 time 0.2557 (0.2616) data time 0.0009 (0.0019) model time 0.2548 (0.2588) loss 5.4848 (5.8525) grad_norm 2.6751 (2.2450) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:28:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][490/625] eta 0:00:35 lr 0.000685 wd 0.0500 time 0.2553 (0.2615) data time 0.0008 (0.0019) model time 0.2545 (0.2588) loss 6.4810 (5.8562) grad_norm 2.6175 (2.2422) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:28:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][500/625] eta 0:00:32 lr 0.000685 wd 0.0500 time 0.2579 (0.2615) data time 0.0008 (0.0018) model time 0.2571 (0.2587) loss 5.5565 (5.8513) grad_norm 2.4362 (2.2383) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:28:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][510/625] eta 0:00:30 lr 0.000685 wd 0.0500 time 0.2611 (0.2621) data time 0.0008 (0.0018) model time 0.2603 (0.2595) loss 6.4739 (5.8569) grad_norm 1.5665 (2.2335) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:28:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][520/625] eta 0:00:27 lr 0.000685 wd 0.0500 time 0.2577 (0.2623) data time 0.0011 (0.0018) model time 0.2566 (0.2597) loss 5.7609 (5.8505) grad_norm 2.9480 (2.2380) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:28:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][530/625] eta 0:00:24 lr 0.000685 wd 0.0500 time 0.2592 (0.2621) data time 0.0007 (0.0018) model time 0.2585 (0.2596) loss 5.7035 (5.8492) grad_norm 3.0742 (2.2382) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:28:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][540/625] eta 0:00:22 lr 0.000684 wd 0.0500 time 0.2556 (0.2620) data time 0.0009 (0.0018) model time 0.2548 (0.2595) loss 4.1517 (5.8451) grad_norm 2.3483 (2.2428) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:28:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][550/625] eta 0:00:19 lr 0.000684 wd 0.0500 time 0.2580 (0.2619) data time 0.0007 (0.0017) model time 0.2573 (0.2594) loss 4.6515 (5.8424) grad_norm 3.5264 (2.2428) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:28:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][560/625] eta 0:00:17 lr 0.000684 wd 0.0500 time 0.2551 (0.2618) data time 0.0010 (0.0017) model time 0.2541 (0.2593) loss 6.1398 (5.8436) grad_norm 2.3703 (2.2438) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:28:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][570/625] eta 0:00:14 lr 0.000684 wd 0.0500 time 0.2572 (0.2617) data time 0.0007 (0.0017) model time 0.2565 (0.2592) loss 5.6617 (5.8451) grad_norm 1.8613 (2.2379) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:28:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][580/625] eta 0:00:11 lr 0.000684 wd 0.0500 time 0.2593 (0.2616) data time 0.0008 (0.0017) model time 0.2586 (0.2591) loss 6.9402 (5.8499) grad_norm 1.2299 (2.2401) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:28:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][590/625] eta 0:00:09 lr 0.000684 wd 0.0500 time 0.2557 (0.2615) data time 0.0008 (0.0017) model time 0.2549 (0.2591) loss 5.1535 (5.8453) grad_norm 1.9184 (2.2444) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:28:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][600/625] eta 0:00:06 lr 0.000683 wd 0.0500 time 0.2564 (0.2614) data time 0.0008 (0.0017) model time 0.2555 (0.2590) loss 7.1586 (5.8496) grad_norm 2.8196 (2.2428) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:28:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][610/625] eta 0:00:03 lr 0.000683 wd 0.0500 time 0.2532 (0.2613) data time 0.0004 (0.0017) model time 0.2529 (0.2589) loss 4.7532 (5.8487) grad_norm 3.2222 (2.2556) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:28:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][620/625] eta 0:00:01 lr 0.000683 wd 0.0500 time 0.2541 (0.2612) data time 0.0005 (0.0017) model time 0.2536 (0.2588) loss 6.8109 (5.8563) grad_norm 1.5787 (2.2599) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:28:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 189 training takes 0:02:43
[2024-08-04 06:28:48 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 06:28:48 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 06:28:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.520 (0.520) Loss 0.6182 (0.6182) Acc@1 89.355 (89.355) Acc@5 98.340 (98.340) Mem 9655MB
[2024-08-04 06:28:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.102) Loss 0.9941 (0.7606) Acc@1 79.004 (85.107) Acc@5 94.971 (97.394) Mem 9655MB
[2024-08-04 06:28:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.0957 (0.8894) Acc@1 75.293 (81.678) Acc@5 94.385 (95.971) Mem 9655MB
[2024-08-04 06:28:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.370 Acc@5 95.957
[2024-08-04 06:28:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.4%
[2024-08-04 06:28:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.790 (0.790) Loss 0.5820 (0.5820) Acc@1 89.844 (89.844) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 06:28:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.126) Loss 0.9224 (0.7207) Acc@1 80.371 (86.066) Acc@5 95.605 (97.559) Mem 9655MB
[2024-08-04 06:28:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0518 (0.8474) Acc@1 75.879 (82.592) Acc@5 94.678 (96.229) Mem 9655MB
[2024-08-04 06:28:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.280 Acc@5 96.217
[2024-08-04 06:28:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.3%
[2024-08-04 06:28:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.28%
[2024-08-04 06:28:52 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 06:28:53 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 06:28:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][0/625] eta 0:07:27 lr 0.000683 wd 0.0500 time 0.7158 (0.7158) data time 0.4700 (0.4700) model time 0.0000 (0.0000) loss 4.5217 (4.5217) grad_norm 3.1731 (3.1731) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:28:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][10/625] eta 0:03:02 lr 0.000683 wd 0.0500 time 0.2575 (0.2974) data time 0.0007 (0.0436) model time 0.0000 (0.0000) loss 6.2028 (5.4168) grad_norm 2.0331 (2.6134) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:28:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][20/625] eta 0:02:54 lr 0.000683 wd 0.0500 time 0.2559 (0.2886) data time 0.0006 (0.0233) model time 0.0000 (0.0000) loss 6.6068 (5.5366) grad_norm 1.7859 (2.3684) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:29:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][30/625] eta 0:02:45 lr 0.000683 wd 0.0500 time 0.2601 (0.2784) data time 0.0006 (0.0161) model time 0.0000 (0.0000) loss 5.4379 (5.5684) grad_norm 2.2453 (2.3089) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:29:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][40/625] eta 0:02:41 lr 0.000682 wd 0.0500 time 0.2580 (0.2762) data time 0.0010 (0.0124) model time 0.0000 (0.0000) loss 4.7108 (5.6247) grad_norm 1.7888 (2.1720) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:29:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][50/625] eta 0:02:40 lr 0.000682 wd 0.0500 time 0.2584 (0.2796) data time 0.0011 (0.0102) model time 0.0000 (0.0000) loss 6.5830 (5.6717) grad_norm 2.8929 (2.1028) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:29:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][60/625] eta 0:02:35 lr 0.000682 wd 0.0500 time 0.2563 (0.2759) data time 0.0007 (0.0086) model time 0.2556 (0.2564) loss 6.8943 (5.7258) grad_norm 2.7520 (2.4564) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:29:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][70/625] eta 0:02:31 lr 0.000682 wd 0.0500 time 0.2559 (0.2732) data time 0.0008 (0.0076) model time 0.2552 (0.2559) loss 6.6763 (5.8194) grad_norm 4.6339 (2.4781) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:29:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][80/625] eta 0:02:29 lr 0.000682 wd 0.0500 time 0.2546 (0.2736) data time 0.0008 (0.0067) model time 0.2538 (0.2626) loss 5.2395 (5.8138) grad_norm 2.4347 (2.4548) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:29:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][90/625] eta 0:02:25 lr 0.000682 wd 0.0500 time 0.2559 (0.2716) data time 0.0007 (0.0061) model time 0.2552 (0.2605) loss 6.6258 (5.8296) grad_norm 2.0441 (2.4310) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:29:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][100/625] eta 0:02:21 lr 0.000681 wd 0.0500 time 0.2545 (0.2701) data time 0.0009 (0.0056) model time 0.2536 (0.2594) loss 4.9574 (5.8548) grad_norm 1.5158 (2.4124) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:29:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][110/625] eta 0:02:18 lr 0.000681 wd 0.0500 time 0.2550 (0.2687) data time 0.0008 (0.0052) model time 0.2542 (0.2586) loss 4.3440 (5.8348) grad_norm 2.4727 (2.4160) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:29:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][120/625] eta 0:02:16 lr 0.000681 wd 0.0500 time 0.2553 (0.2694) data time 0.0010 (0.0048) model time 0.2543 (0.2611) loss 6.0010 (5.8263) grad_norm 2.9144 (2.3964) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:29:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][130/625] eta 0:02:12 lr 0.000681 wd 0.0500 time 0.2522 (0.2684) data time 0.0007 (0.0045) model time 0.2515 (0.2604) loss 6.9152 (5.8317) grad_norm 1.6271 (2.3878) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:29:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][140/625] eta 0:02:09 lr 0.000681 wd 0.0500 time 0.2539 (0.2675) data time 0.0008 (0.0043) model time 0.2531 (0.2598) loss 6.4834 (5.8110) grad_norm 1.6534 (2.3768) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:29:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][150/625] eta 0:02:06 lr 0.000681 wd 0.0500 time 0.2583 (0.2669) data time 0.0006 (0.0041) model time 0.2577 (0.2594) loss 4.8739 (5.8183) grad_norm 3.2967 (2.3649) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:29:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][160/625] eta 0:02:03 lr 0.000680 wd 0.0500 time 0.2541 (0.2662) data time 0.0009 (0.0039) model time 0.2532 (0.2590) loss 6.9531 (5.8290) grad_norm 1.7368 (2.3694) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:29:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][170/625] eta 0:02:00 lr 0.000680 wd 0.0500 time 0.2561 (0.2656) data time 0.0009 (0.0037) model time 0.2552 (0.2588) loss 5.9817 (5.8176) grad_norm 2.9074 (2.3940) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:29:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][180/625] eta 0:01:58 lr 0.000680 wd 0.0500 time 0.4745 (0.2663) data time 0.0008 (0.0035) model time 0.4737 (0.2602) loss 5.4124 (5.8118) grad_norm 2.5465 (2.3877) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:29:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][190/625] eta 0:01:55 lr 0.000680 wd 0.0500 time 0.2582 (0.2658) data time 0.0006 (0.0034) model time 0.2576 (0.2599) loss 7.1294 (5.8287) grad_norm 2.1437 (2.3684) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:29:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][200/625] eta 0:01:53 lr 0.000680 wd 0.0500 time 0.3845 (0.2670) data time 0.0008 (0.0033) model time 0.3837 (0.2618) loss 5.9579 (5.8546) grad_norm 2.5105 (2.3590) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:29:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][210/625] eta 0:01:50 lr 0.000680 wd 0.0500 time 0.2514 (0.2666) data time 0.0007 (0.0032) model time 0.2507 (0.2615) loss 6.7741 (5.8557) grad_norm 1.6426 (2.3356) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:29:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][220/625] eta 0:01:47 lr 0.000679 wd 0.0500 time 0.2574 (0.2661) data time 0.0010 (0.0031) model time 0.2564 (0.2611) loss 5.9035 (5.8539) grad_norm 1.9162 (2.3377) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:29:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][230/625] eta 0:01:44 lr 0.000679 wd 0.0500 time 0.2534 (0.2657) data time 0.0010 (0.0030) model time 0.2524 (0.2608) loss 6.7545 (5.8668) grad_norm 1.5128 (2.3503) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:29:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][240/625] eta 0:01:42 lr 0.000679 wd 0.0500 time 0.4321 (0.2661) data time 0.0010 (0.0029) model time 0.4311 (0.2615) loss 6.8517 (5.8616) grad_norm 1.8441 (2.3452) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:30:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][250/625] eta 0:01:39 lr 0.000679 wd 0.0500 time 0.2562 (0.2665) data time 0.0008 (0.0028) model time 0.2554 (0.2622) loss 6.3837 (5.8669) grad_norm 2.3394 (2.3504) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:30:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][260/625] eta 0:01:37 lr 0.000679 wd 0.0500 time 0.2586 (0.2662) data time 0.0010 (0.0027) model time 0.2576 (0.2620) loss 5.5705 (5.8629) grad_norm 2.2634 (2.3294) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:30:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][270/625] eta 0:01:34 lr 0.000678 wd 0.0500 time 0.2579 (0.2658) data time 0.0008 (0.0027) model time 0.2571 (0.2617) loss 6.7580 (5.8614) grad_norm 2.5854 (2.3757) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:30:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][280/625] eta 0:01:31 lr 0.000678 wd 0.0500 time 0.2589 (0.2656) data time 0.0008 (0.0026) model time 0.2580 (0.2615) loss 5.9852 (5.8571) grad_norm 2.1358 (2.3818) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:30:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][290/625] eta 0:01:28 lr 0.000678 wd 0.0500 time 0.2534 (0.2652) data time 0.0010 (0.0026) model time 0.2524 (0.2612) loss 6.5409 (5.8622) grad_norm 2.0541 (2.3737) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:30:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][300/625] eta 0:01:26 lr 0.000678 wd 0.0500 time 0.2568 (0.2649) data time 0.0011 (0.0025) model time 0.2557 (0.2610) loss 6.0009 (5.8782) grad_norm 2.0495 (2.3870) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:30:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][310/625] eta 0:01:23 lr 0.000678 wd 0.0500 time 0.2525 (0.2646) data time 0.0007 (0.0025) model time 0.2518 (0.2607) loss 6.6348 (5.8988) grad_norm 2.1701 (2.3899) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:30:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][320/625] eta 0:01:20 lr 0.000678 wd 0.0500 time 0.2561 (0.2643) data time 0.0006 (0.0024) model time 0.2555 (0.2605) loss 6.7799 (5.9038) grad_norm 2.7530 (2.3861) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:30:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][330/625] eta 0:01:18 lr 0.000677 wd 0.0500 time 0.4589 (0.2647) data time 0.0008 (0.0024) model time 0.4581 (0.2610) loss 6.3990 (5.8916) grad_norm 1.5293 (2.3765) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:30:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][340/625] eta 0:01:15 lr 0.000677 wd 0.0500 time 0.2512 (0.2650) data time 0.0009 (0.0023) model time 0.2503 (0.2615) loss 5.0000 (5.9044) grad_norm 4.1205 (2.3933) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:30:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][350/625] eta 0:01:12 lr 0.000677 wd 0.0500 time 0.2553 (0.2653) data time 0.0010 (0.0023) model time 0.2544 (0.2620) loss 4.1665 (5.8957) grad_norm 2.2096 (2.3910) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:30:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][360/625] eta 0:01:10 lr 0.000677 wd 0.0500 time 0.2591 (0.2651) data time 0.0008 (0.0022) model time 0.2583 (0.2617) loss 6.3678 (5.8989) grad_norm 1.8280 (2.3815) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:30:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][370/625] eta 0:01:07 lr 0.000677 wd 0.0500 time 0.2608 (0.2648) data time 0.0010 (0.0022) model time 0.2598 (0.2615) loss 6.5106 (5.8956) grad_norm 1.9192 (2.3903) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:30:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][380/625] eta 0:01:04 lr 0.000677 wd 0.0500 time 0.2559 (0.2646) data time 0.0007 (0.0022) model time 0.2553 (0.2614) loss 7.1918 (5.8903) grad_norm 1.7689 (2.3768) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:30:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][390/625] eta 0:01:02 lr 0.000676 wd 0.0500 time 0.2559 (0.2644) data time 0.0009 (0.0021) model time 0.2550 (0.2612) loss 5.4086 (5.8909) grad_norm 1.8358 (2.3686) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:30:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][400/625] eta 0:00:59 lr 0.000676 wd 0.0500 time 0.2556 (0.2652) data time 0.0006 (0.0021) model time 0.2549 (0.2622) loss 7.2728 (5.8996) grad_norm 2.1556 (2.3641) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:30:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][410/625] eta 0:00:57 lr 0.000676 wd 0.0500 time 0.4456 (0.2655) data time 0.0011 (0.0021) model time 0.4445 (0.2625) loss 6.4831 (5.8985) grad_norm 2.0215 (2.3488) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:30:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][420/625] eta 0:00:54 lr 0.000676 wd 0.0500 time 0.2554 (0.2658) data time 0.0006 (0.0021) model time 0.2548 (0.2629) loss 4.3468 (5.8996) grad_norm 1.5795 (2.3494) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:30:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][430/625] eta 0:00:51 lr 0.000676 wd 0.0500 time 0.2546 (0.2656) data time 0.0006 (0.0020) model time 0.2540 (0.2627) loss 6.0144 (5.8949) grad_norm 1.6806 (2.3448) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:30:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][440/625] eta 0:00:49 lr 0.000676 wd 0.0500 time 0.2529 (0.2653) data time 0.0008 (0.0020) model time 0.2521 (0.2625) loss 6.1337 (5.8968) grad_norm 1.4230 (2.3433) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:30:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][450/625] eta 0:00:46 lr 0.000675 wd 0.0500 time 0.2537 (0.2651) data time 0.0009 (0.0020) model time 0.2528 (0.2623) loss 6.4802 (5.8980) grad_norm 1.9147 (2.3427) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:30:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][460/625] eta 0:00:43 lr 0.000675 wd 0.0500 time 0.2517 (0.2654) data time 0.0007 (0.0020) model time 0.2509 (0.2627) loss 6.6575 (5.9001) grad_norm 1.9009 (2.3475) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:30:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][470/625] eta 0:00:41 lr 0.000675 wd 0.0500 time 0.2546 (0.2652) data time 0.0010 (0.0019) model time 0.2535 (0.2625) loss 6.3989 (5.9042) grad_norm 2.3059 (2.3520) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:31:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][480/625] eta 0:00:38 lr 0.000675 wd 0.0500 time 0.2589 (0.2650) data time 0.0009 (0.0019) model time 0.2581 (0.2624) loss 5.6416 (5.9077) grad_norm 1.7668 (2.3429) loss_scale 2048.0000 (1028.2578) mem 9655MB
[2024-08-04 06:31:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][490/625] eta 0:00:35 lr 0.000675 wd 0.0500 time 0.2572 (0.2649) data time 0.0008 (0.0019) model time 0.2564 (0.2623) loss 6.0372 (5.9104) grad_norm 1.7527 (2.3368) loss_scale 2048.0000 (1049.0265) mem 9655MB
[2024-08-04 06:31:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][500/625] eta 0:00:33 lr 0.000675 wd 0.0500 time 0.2650 (0.2648) data time 0.0009 (0.0019) model time 0.2641 (0.2622) loss 5.0387 (5.9095) grad_norm 1.3814 (2.3303) loss_scale 2048.0000 (1068.9661) mem 9655MB
[2024-08-04 06:31:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][510/625] eta 0:00:30 lr 0.000674 wd 0.0500 time 0.2521 (0.2646) data time 0.0010 (0.0019) model time 0.2511 (0.2620) loss 5.0718 (5.9029) grad_norm 2.4129 (2.3279) loss_scale 2048.0000 (1088.1252) mem 9655MB
[2024-08-04 06:31:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][520/625] eta 0:00:27 lr 0.000674 wd 0.0500 time 0.2541 (0.2644) data time 0.0010 (0.0018) model time 0.2531 (0.2618) loss 6.1807 (5.8998) grad_norm 2.4650 (2.3261) loss_scale 2048.0000 (1106.5489) mem 9655MB
[2024-08-04 06:31:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][530/625] eta 0:00:25 lr 0.000674 wd 0.0500 time 0.2575 (0.2643) data time 0.0006 (0.0018) model time 0.2569 (0.2617) loss 7.2362 (5.9053) grad_norm 2.3744 (2.3266) loss_scale 2048.0000 (1124.2787) mem 9655MB
[2024-08-04 06:31:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][540/625] eta 0:00:22 lr 0.000674 wd 0.0500 time 0.2659 (0.2641) data time 0.0008 (0.0018) model time 0.2651 (0.2616) loss 6.4775 (5.9081) grad_norm 2.7734 (2.3224) loss_scale 2048.0000 (1141.3530) mem 9655MB
[2024-08-04 06:31:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][550/625] eta 0:00:19 lr 0.000674 wd 0.0500 time 0.2564 (0.2640) data time 0.0006 (0.0018) model time 0.2558 (0.2615) loss 6.2284 (5.9079) grad_norm 1.6084 (2.3142) loss_scale 2048.0000 (1157.8076) mem 9655MB
[2024-08-04 06:31:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][560/625] eta 0:00:17 lr 0.000674 wd 0.0500 time 0.2551 (0.2639) data time 0.0010 (0.0018) model time 0.2541 (0.2614) loss 6.2291 (5.9104) grad_norm 1.6374 (2.3071) loss_scale 2048.0000 (1173.6756) mem 9655MB
[2024-08-04 06:31:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][570/625] eta 0:00:14 lr 0.000673 wd 0.0500 time 0.2570 (0.2637) data time 0.0009 (0.0018) model time 0.2561 (0.2613) loss 6.9989 (5.9079) grad_norm 3.0343 (2.3040) loss_scale 2048.0000 (1188.9877) mem 9655MB
[2024-08-04 06:31:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][580/625] eta 0:00:11 lr 0.000673 wd 0.0500 time 0.2628 (0.2636) data time 0.0007 (0.0017) model time 0.2621 (0.2612) loss 7.6116 (5.9098) grad_norm 2.8201 (2.3074) loss_scale 2048.0000 (1203.7728) mem 9655MB
[2024-08-04 06:31:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][590/625] eta 0:00:09 lr 0.000673 wd 0.0500 time 0.2571 (0.2635) data time 0.0010 (0.0017) model time 0.2561 (0.2611) loss 6.0757 (5.9054) grad_norm 2.0185 (2.3030) loss_scale 2048.0000 (1218.0575) mem 9655MB
[2024-08-04 06:31:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][600/625] eta 0:00:06 lr 0.000673 wd 0.0500 time 0.2564 (0.2634) data time 0.0008 (0.0017) model time 0.2556 (0.2610) loss 6.0764 (5.8967) grad_norm 2.7226 (2.3084) loss_scale 2048.0000 (1231.8669) mem 9655MB
[2024-08-04 06:31:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][610/625] eta 0:00:03 lr 0.000673 wd 0.0500 time 0.2514 (0.2633) data time 0.0004 (0.0017) model time 0.2509 (0.2609) loss 4.5915 (5.9017) grad_norm 3.6222 (2.3140) loss_scale 2048.0000 (1245.2242) mem 9655MB
[2024-08-04 06:31:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][620/625] eta 0:00:01 lr 0.000673 wd 0.0500 time 0.2540 (0.2631) data time 0.0006 (0.0017) model time 0.2534 (0.2607) loss 6.4699 (5.9061) grad_norm 1.2620 (2.3117) loss_scale 2048.0000 (1258.1514) mem 9655MB
[2024-08-04 06:31:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 190 training takes 0:02:44
[2024-08-04 06:31:37 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 06:31:38 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 06:31:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.449 (0.449) Loss 0.6157 (0.6157) Acc@1 88.965 (88.965) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 06:31:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.092) Loss 0.9775 (0.7611) Acc@1 79.541 (85.276) Acc@5 95.850 (97.483) Mem 9655MB
[2024-08-04 06:31:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.074) Loss 1.0791 (0.8968) Acc@1 77.246 (81.699) Acc@5 93.506 (95.964) Mem 9655MB
[2024-08-04 06:31:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.434 Acc@5 95.941
[2024-08-04 06:31:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.4%
[2024-08-04 06:31:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.803 (0.803) Loss 0.5820 (0.5820) Acc@1 89.746 (89.746) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 06:31:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.129) Loss 0.9229 (0.7207) Acc@1 80.371 (86.062) Acc@5 95.605 (97.536) Mem 9655MB
[2024-08-04 06:31:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.094) Loss 1.0518 (0.8472) Acc@1 76.123 (82.613) Acc@5 94.678 (96.219) Mem 9655MB
[2024-08-04 06:31:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.308 Acc@5 96.207
[2024-08-04 06:31:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.3%
[2024-08-04 06:31:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.31%
[2024-08-04 06:31:42 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 06:31:42 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 06:31:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][0/625] eta 0:07:24 lr 0.000673 wd 0.0500 time 0.7113 (0.7113) data time 0.4716 (0.4716) model time 0.0000 (0.0000) loss 4.9171 (4.9171) grad_norm 1.9691 (1.9691) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:31:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][10/625] eta 0:03:09 lr 0.000672 wd 0.0500 time 0.3769 (0.3086) data time 0.0007 (0.0437) model time 0.0000 (0.0000) loss 5.9226 (5.9727) grad_norm 1.6674 (1.9576) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:31:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][20/625] eta 0:02:55 lr 0.000672 wd 0.0500 time 0.2569 (0.2895) data time 0.0007 (0.0234) model time 0.0000 (0.0000) loss 5.9540 (5.9483) grad_norm 1.4107 (2.0706) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:31:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][30/625] eta 0:02:45 lr 0.000672 wd 0.0500 time 0.2532 (0.2787) data time 0.0007 (0.0162) model time 0.0000 (0.0000) loss 5.3813 (5.8697) grad_norm 2.1446 (2.0321) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:31:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][40/625] eta 0:02:39 lr 0.000672 wd 0.0500 time 0.2587 (0.2735) data time 0.0008 (0.0124) model time 0.0000 (0.0000) loss 5.2068 (5.8110) grad_norm 1.8196 (2.0360) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:31:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][50/625] eta 0:02:35 lr 0.000672 wd 0.0500 time 0.2514 (0.2699) data time 0.0011 (0.0102) model time 0.0000 (0.0000) loss 5.2178 (5.8821) grad_norm 2.5647 (1.9907) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:31:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][60/625] eta 0:02:31 lr 0.000672 wd 0.0500 time 0.2624 (0.2677) data time 0.0013 (0.0087) model time 0.2611 (0.2557) loss 5.0902 (5.8823) grad_norm 2.1266 (2.0221) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:32:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][70/625] eta 0:02:27 lr 0.000671 wd 0.0500 time 0.2560 (0.2661) data time 0.0008 (0.0076) model time 0.2552 (0.2554) loss 6.0440 (5.9300) grad_norm 2.5908 (2.0185) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:32:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][80/625] eta 0:02:25 lr 0.000671 wd 0.0500 time 0.2579 (0.2672) data time 0.0009 (0.0068) model time 0.2569 (0.2616) loss 5.7731 (5.9219) grad_norm 1.5649 (1.9604) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:32:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][90/625] eta 0:02:22 lr 0.000671 wd 0.0500 time 0.2534 (0.2659) data time 0.0009 (0.0061) model time 0.2526 (0.2598) loss 6.3533 (5.9173) grad_norm 1.4517 (1.9451) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:32:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][100/625] eta 0:02:19 lr 0.000671 wd 0.0500 time 0.2509 (0.2649) data time 0.0010 (0.0056) model time 0.2499 (0.2588) loss 6.1453 (5.9080) grad_norm 1.8404 (1.9311) loss_scale 2048.0000 (2048.0000) mem 9655MB
[2024-08-04 06:32:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][110/625] eta 0:02:15 lr 0.000671 wd 0.0500 time 0.2541 (0.2640) data time 0.0008 (0.0052) model time 0.2533 (0.2579) loss 5.9890 (5.8970) grad_norm 1.6982 (inf) loss_scale 1024.0000 (2011.0991) mem 9655MB
[2024-08-04 06:32:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][120/625] eta 0:02:12 lr 0.000671 wd 0.0500 time 0.2555 (0.2634) data time 0.0007 (0.0049) model time 0.2549 (0.2576) loss 5.8157 (5.9028) grad_norm 1.2233 (inf) loss_scale 1024.0000 (1929.5207) mem 9655MB
[2024-08-04 06:32:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][130/625] eta 0:02:10 lr 0.000670 wd 0.0500 time 0.2621 (0.2628) data time 0.0008 (0.0046) model time 0.2613 (0.2574) loss 6.9001 (5.8947) grad_norm 5.5930 (inf) loss_scale 1024.0000 (1860.3969) mem 9655MB
[2024-08-04 06:32:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][140/625] eta 0:02:07 lr 0.000670 wd 0.0500 time 0.2593 (0.2639) data time 0.0008 (0.0043) model time 0.2585 (0.2595) loss 4.4873 (5.8762) grad_norm 1.7921 (inf) loss_scale 1024.0000 (1801.0780) mem 9655MB
[2024-08-04 06:32:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][150/625] eta 0:02:05 lr 0.000670 wd 0.0500 time 0.2543 (0.2633) data time 0.0008 (0.0041) model time 0.2535 (0.2590) loss 4.8846 (5.8723) grad_norm 1.3966 (inf) loss_scale 1024.0000 (1749.6159) mem 9655MB
[2024-08-04 06:32:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][160/625] eta 0:02:02 lr 0.000670 wd 0.0500 time 0.2595 (0.2630) data time 0.0006 (0.0039) model time 0.2589 (0.2588) loss 6.2185 (5.8878) grad_norm 2.0609 (inf) loss_scale 1024.0000 (1704.5466) mem 9655MB
[2024-08-04 06:32:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][170/625] eta 0:01:59 lr 0.000670 wd 0.0500 time 0.2577 (0.2626) data time 0.0008 (0.0037) model time 0.2569 (0.2586) loss 5.8860 (5.9150) grad_norm 1.8562 (inf) loss_scale 1024.0000 (1664.7485) mem 9655MB
[2024-08-04 06:32:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][180/625] eta 0:01:56 lr 0.000670 wd 0.0500 time 0.2603 (0.2623) data time 0.0006 (0.0035) model time 0.2597 (0.2584) loss 5.3005 (5.9178) grad_norm 1.2504 (inf) loss_scale 1024.0000 (1629.3481) mem 9655MB
[2024-08-04 06:32:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][190/625] eta 0:01:54 lr 0.000669 wd 0.0500 time 0.2569 (0.2630) data time 0.0011 (0.0034) model time 0.2558 (0.2596) loss 6.6010 (5.9155) grad_norm 1.5004 (inf) loss_scale 1024.0000 (1597.6545) mem 9655MB
[2024-08-04 06:32:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][200/625] eta 0:01:51 lr 0.000669 wd 0.0500 time 0.2570 (0.2629) data time 0.0008 (0.0033) model time 0.2562 (0.2596) loss 6.8251 (5.9121) grad_norm 2.0398 (inf) loss_scale 1024.0000 (1569.1144) mem 9655MB
[2024-08-04 06:32:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][210/625] eta 0:01:49 lr 0.000669 wd 0.0500 time 0.2597 (0.2636) data time 0.0006 (0.0032) model time 0.2591 (0.2606) loss 5.7676 (5.9044) grad_norm 1.6277 (inf) loss_scale 1024.0000 (1543.2796) mem 9655MB
[2024-08-04 06:32:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][220/625] eta 0:01:46 lr 0.000669 wd 0.0500 time 0.2591 (0.2633) data time 0.0008 (0.0031) model time 0.2583 (0.2603) loss 6.1622 (5.8992) grad_norm 1.6436 (inf) loss_scale 1024.0000 (1519.7828) mem 9655MB
[2024-08-04 06:32:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][230/625] eta 0:01:43 lr 0.000669 wd 0.0500 time 0.2582 (0.2629) data time 0.0008 (0.0030) model time 0.2574 (0.2600) loss 6.6364 (5.8874) grad_norm 2.4594 (inf) loss_scale 1024.0000 (1498.3203) mem 9655MB
[2024-08-04 06:32:45
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][240/625] eta 0:01:41 lr 0.000669 wd 0.0500 time 0.2575 (0.2626) data time 0.0008 (0.0029) model time 0.2566 (0.2598) loss 6.3372 (5.9028) grad_norm 1.6828 (inf) loss_scale 1024.0000 (1478.6390) mem 9655MB [2024-08-04 06:32:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][250/625] eta 0:01:38 lr 0.000668 wd 0.0500 time 0.2529 (0.2623) data time 0.0008 (0.0028) model time 0.2521 (0.2595) loss 6.2217 (5.9026) grad_norm 2.0368 (inf) loss_scale 1024.0000 (1460.5259) mem 9655MB [2024-08-04 06:32:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][260/625] eta 0:01:35 lr 0.000668 wd 0.0500 time 0.2569 (0.2621) data time 0.0009 (0.0027) model time 0.2560 (0.2593) loss 6.3871 (5.9009) grad_norm 2.0616 (inf) loss_scale 1024.0000 (1443.8008) mem 9655MB [2024-08-04 06:32:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][270/625] eta 0:01:33 lr 0.000668 wd 0.0500 time 0.2552 (0.2627) data time 0.0007 (0.0027) model time 0.2546 (0.2601) loss 6.3731 (5.8977) grad_norm 1.7452 (inf) loss_scale 1024.0000 (1428.3100) mem 9655MB [2024-08-04 06:32:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][280/625] eta 0:01:30 lr 0.000668 wd 0.0500 time 0.2548 (0.2624) data time 0.0009 (0.0026) model time 0.2539 (0.2598) loss 6.1779 (5.9024) grad_norm 1.6366 (inf) loss_scale 1024.0000 (1413.9217) mem 9655MB [2024-08-04 06:32:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][290/625] eta 0:01:27 lr 0.000668 wd 0.0500 time 0.2583 (0.2622) data time 0.0008 (0.0025) model time 0.2575 (0.2596) loss 5.2442 (5.8989) grad_norm 3.1432 (inf) loss_scale 1024.0000 (1400.5223) mem 9655MB [2024-08-04 06:33:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][300/625] eta 0:01:25 lr 0.000668 wd 0.0500 time 0.2607 (0.2619) data time 0.0007 (0.0025) model 
time 0.2600 (0.2594) loss 4.9777 (5.9086) grad_norm 2.4947 (inf) loss_scale 1024.0000 (1388.0133) mem 9655MB [2024-08-04 06:33:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][310/625] eta 0:01:22 lr 0.000667 wd 0.0500 time 0.2559 (0.2627) data time 0.0006 (0.0024) model time 0.2553 (0.2603) loss 5.8631 (5.9043) grad_norm 1.8455 (inf) loss_scale 1024.0000 (1376.3087) mem 9655MB [2024-08-04 06:33:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][320/625] eta 0:01:20 lr 0.000667 wd 0.0500 time 0.2536 (0.2624) data time 0.0011 (0.0024) model time 0.2525 (0.2601) loss 6.1215 (5.9069) grad_norm 1.8555 (inf) loss_scale 1024.0000 (1365.3333) mem 9655MB [2024-08-04 06:33:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][330/625] eta 0:01:17 lr 0.000667 wd 0.0500 time 0.2522 (0.2622) data time 0.0007 (0.0024) model time 0.2515 (0.2599) loss 4.7726 (5.9070) grad_norm 3.0815 (inf) loss_scale 1024.0000 (1355.0211) mem 9655MB [2024-08-04 06:33:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][340/625] eta 0:01:14 lr 0.000667 wd 0.0500 time 0.2571 (0.2627) data time 0.0007 (0.0023) model time 0.2564 (0.2605) loss 6.2644 (5.9119) grad_norm 2.9867 (inf) loss_scale 1024.0000 (1345.3138) mem 9655MB [2024-08-04 06:33:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][350/625] eta 0:01:12 lr 0.000667 wd 0.0500 time 0.2592 (0.2630) data time 0.0005 (0.0023) model time 0.2586 (0.2609) loss 5.7077 (5.9116) grad_norm 2.8223 (inf) loss_scale 1024.0000 (1336.1595) mem 9655MB [2024-08-04 06:33:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][360/625] eta 0:01:09 lr 0.000667 wd 0.0500 time 0.2547 (0.2628) data time 0.0012 (0.0022) model time 0.2535 (0.2608) loss 5.9168 (5.9066) grad_norm 1.7952 (inf) loss_scale 1024.0000 (1327.5125) mem 9655MB [2024-08-04 06:33:20 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [191/300][370/625] eta 0:01:07 lr 0.000666 wd 0.0500 time 0.2561 (0.2632) data time 0.0011 (0.0022) model time 0.2550 (0.2612) loss 5.8234 (5.9147) grad_norm 1.9959 (inf) loss_scale 1024.0000 (1319.3315) mem 9655MB [2024-08-04 06:33:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][380/625] eta 0:01:04 lr 0.000666 wd 0.0500 time 0.2532 (0.2634) data time 0.0007 (0.0022) model time 0.2524 (0.2615) loss 6.6626 (5.9191) grad_norm 3.8980 (inf) loss_scale 1024.0000 (1311.5801) mem 9655MB [2024-08-04 06:33:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][390/625] eta 0:01:01 lr 0.000666 wd 0.0500 time 0.2563 (0.2632) data time 0.0009 (0.0021) model time 0.2554 (0.2613) loss 7.0384 (5.9245) grad_norm 1.7830 (inf) loss_scale 1024.0000 (1304.2251) mem 9655MB [2024-08-04 06:33:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][400/625] eta 0:00:59 lr 0.000666 wd 0.0500 time 0.2580 (0.2639) data time 0.0007 (0.0021) model time 0.2573 (0.2621) loss 6.2593 (5.9284) grad_norm 1.6302 (inf) loss_scale 1024.0000 (1297.2369) mem 9655MB [2024-08-04 06:33:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][410/625] eta 0:00:56 lr 0.000666 wd 0.0500 time 0.2582 (0.2637) data time 0.0008 (0.0021) model time 0.2573 (0.2619) loss 6.4346 (5.9343) grad_norm 2.3133 (inf) loss_scale 1024.0000 (1290.5888) mem 9655MB [2024-08-04 06:33:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][420/625] eta 0:00:54 lr 0.000666 wd 0.0500 time 0.2541 (0.2635) data time 0.0008 (0.0020) model time 0.2532 (0.2617) loss 6.7645 (5.9346) grad_norm 1.7740 (inf) loss_scale 1024.0000 (1284.2565) mem 9655MB [2024-08-04 06:33:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][430/625] eta 0:00:51 lr 0.000665 wd 0.0500 time 0.2566 (0.2633) data time 0.0011 (0.0020) model time 0.2555 (0.2615) loss 
6.4200 (5.9305) grad_norm 1.8523 (inf) loss_scale 1024.0000 (1278.2181) mem 9655MB [2024-08-04 06:33:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][440/625] eta 0:00:48 lr 0.000665 wd 0.0500 time 0.2569 (0.2632) data time 0.0009 (0.0020) model time 0.2561 (0.2613) loss 6.0829 (5.9254) grad_norm 1.2858 (inf) loss_scale 1024.0000 (1272.4535) mem 9655MB [2024-08-04 06:33:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][450/625] eta 0:00:46 lr 0.000665 wd 0.0500 time 0.2580 (0.2634) data time 0.0008 (0.0020) model time 0.2572 (0.2616) loss 6.4839 (5.9260) grad_norm 2.1214 (inf) loss_scale 1024.0000 (1266.9446) mem 9655MB [2024-08-04 06:33:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][460/625] eta 0:00:43 lr 0.000665 wd 0.0500 time 0.2595 (0.2635) data time 0.0005 (0.0019) model time 0.2590 (0.2618) loss 4.7129 (5.9169) grad_norm 2.9323 (inf) loss_scale 1024.0000 (1261.6746) mem 9655MB [2024-08-04 06:33:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][470/625] eta 0:00:40 lr 0.000665 wd 0.0500 time 0.2554 (0.2634) data time 0.0010 (0.0019) model time 0.2543 (0.2616) loss 5.6543 (5.9237) grad_norm 6.1634 (inf) loss_scale 1024.0000 (1256.6285) mem 9655MB [2024-08-04 06:33:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][480/625] eta 0:00:38 lr 0.000665 wd 0.0500 time 0.2581 (0.2632) data time 0.0006 (0.0019) model time 0.2575 (0.2615) loss 6.1345 (5.9238) grad_norm 1.6545 (inf) loss_scale 1024.0000 (1251.7921) mem 9655MB [2024-08-04 06:33:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][490/625] eta 0:00:35 lr 0.000664 wd 0.0500 time 0.2550 (0.2631) data time 0.0006 (0.0019) model time 0.2544 (0.2614) loss 6.5912 (5.9301) grad_norm 1.4168 (inf) loss_scale 1024.0000 (1247.1527) mem 9655MB [2024-08-04 06:33:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[191/300][500/625] eta 0:00:32 lr 0.000664 wd 0.0500 time 0.2562 (0.2629) data time 0.0006 (0.0019) model time 0.2556 (0.2612) loss 5.0893 (5.9272) grad_norm 2.1046 (inf) loss_scale 1024.0000 (1242.6986) mem 9655MB [2024-08-04 06:33:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][510/625] eta 0:00:30 lr 0.000664 wd 0.0500 time 0.2511 (0.2628) data time 0.0010 (0.0018) model time 0.2501 (0.2611) loss 7.5515 (5.9269) grad_norm 2.6224 (inf) loss_scale 1024.0000 (1238.4188) mem 9655MB [2024-08-04 06:33:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][520/625] eta 0:00:27 lr 0.000664 wd 0.0500 time 0.2569 (0.2627) data time 0.0007 (0.0018) model time 0.2562 (0.2610) loss 5.9076 (5.9226) grad_norm 1.8293 (inf) loss_scale 1024.0000 (1234.3033) mem 9655MB [2024-08-04 06:34:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][530/625] eta 0:00:24 lr 0.000664 wd 0.0500 time 0.2547 (0.2625) data time 0.0009 (0.0018) model time 0.2539 (0.2608) loss 4.7046 (5.9207) grad_norm 2.6112 (inf) loss_scale 1024.0000 (1230.3427) mem 9655MB [2024-08-04 06:34:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][540/625] eta 0:00:22 lr 0.000664 wd 0.0500 time 0.2656 (0.2625) data time 0.0008 (0.0018) model time 0.2648 (0.2608) loss 5.9931 (5.9177) grad_norm 4.0949 (inf) loss_scale 1024.0000 (1226.5287) mem 9655MB [2024-08-04 06:34:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][550/625] eta 0:00:19 lr 0.000663 wd 0.0500 time 0.2569 (0.2624) data time 0.0009 (0.0018) model time 0.2560 (0.2607) loss 6.5518 (5.9218) grad_norm 1.8676 (inf) loss_scale 1024.0000 (1222.8530) mem 9655MB [2024-08-04 06:34:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][560/625] eta 0:00:17 lr 0.000663 wd 0.0500 time 0.2606 (0.2623) data time 0.0008 (0.0018) model time 0.2598 (0.2606) loss 6.8865 (5.9213) grad_norm 3.3945 (inf) 
loss_scale 1024.0000 (1219.3084) mem 9655MB [2024-08-04 06:34:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][570/625] eta 0:00:14 lr 0.000663 wd 0.0500 time 0.2579 (0.2621) data time 0.0006 (0.0017) model time 0.2573 (0.2605) loss 5.0703 (5.9182) grad_norm 1.9464 (inf) loss_scale 1024.0000 (1215.8879) mem 9655MB [2024-08-04 06:34:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][580/625] eta 0:00:11 lr 0.000663 wd 0.0500 time 0.2544 (0.2620) data time 0.0007 (0.0017) model time 0.2538 (0.2604) loss 6.2430 (5.9215) grad_norm 1.3603 (inf) loss_scale 1024.0000 (1212.5852) mem 9655MB [2024-08-04 06:34:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][590/625] eta 0:00:09 lr 0.000663 wd 0.0500 time 0.2530 (0.2619) data time 0.0008 (0.0017) model time 0.2522 (0.2603) loss 5.7140 (5.9251) grad_norm 2.0298 (inf) loss_scale 1024.0000 (1209.3942) mem 9655MB [2024-08-04 06:34:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][600/625] eta 0:00:06 lr 0.000663 wd 0.0500 time 0.2531 (0.2619) data time 0.0009 (0.0017) model time 0.2522 (0.2602) loss 5.8227 (5.9213) grad_norm 2.2199 (inf) loss_scale 1024.0000 (1206.3095) mem 9655MB [2024-08-04 06:34:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][610/625] eta 0:00:03 lr 0.000662 wd 0.0500 time 0.2529 (0.2618) data time 0.0003 (0.0017) model time 0.2526 (0.2601) loss 4.8509 (5.9162) grad_norm 2.0196 (inf) loss_scale 1024.0000 (1203.3257) mem 9655MB [2024-08-04 06:34:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][620/625] eta 0:00:01 lr 0.000662 wd 0.0500 time 0.2542 (0.2616) data time 0.0005 (0.0017) model time 0.2537 (0.2600) loss 5.7420 (5.9131) grad_norm 2.0967 (inf) loss_scale 1024.0000 (1200.4380) mem 9655MB [2024-08-04 06:34:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 191 training takes 0:02:43 [2024-08-04 
06:34:26 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 06:34:26 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-08-04 06:34:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.502 (0.502) Loss 0.6157 (0.6157) Acc@1 89.600 (89.600) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 06:34:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 1.0078 (0.7654) Acc@1 79.199 (85.467) Acc@5 95.215 (97.381) Mem 9655MB [2024-08-04 06:34:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0908 (0.8995) Acc@1 76.123 (81.955) Acc@5 94.336 (95.982) Mem 9655MB [2024-08-04 06:34:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.732 Acc@5 95.993 [2024-08-04 06:34:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.7% [2024-08-04 06:34:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.73% [2024-08-04 06:34:28 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 06:34:28 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 06:34:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.509 (0.509) Loss 0.5820 (0.5820) Acc@1 89.844 (89.844) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 06:34:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 0.9238 (0.7204) Acc@1 80.371 (86.111) Acc@5 95.654 (97.545) Mem 9655MB [2024-08-04 06:34:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0518 (0.8469) Acc@1 75.977 (82.640) Acc@5 94.727 (96.222) Mem 9655MB [2024-08-04 06:34:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.342 Acc@5 96.209 [2024-08-04 06:34:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.3% [2024-08-04 06:34:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.34% [2024-08-04 06:34:30 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 06:34:31 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 06:34:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][0/625] eta 0:07:14 lr 0.000662 wd 0.0500 time 0.6949 (0.6949) data time 0.4549 (0.4549) model time 0.0000 (0.0000) loss 6.3858 (6.3858) grad_norm 1.9641 (1.9641) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:34:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][10/625] eta 0:03:13 lr 0.000662 wd 0.0500 time 0.2564 (0.3142) data time 0.0008 (0.0423) model time 0.0000 (0.0000) loss 5.1549 (6.1505) grad_norm 1.9767 (2.0069) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:34:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][20/625] eta 0:02:58 lr 0.000662 wd 0.0500 time 0.2563 (0.2944) data time 0.0008 (0.0228) model time 0.0000 (0.0000) loss 6.6092 (6.1649) grad_norm 1.4302 (2.0801) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:34:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][30/625] eta 0:02:51 lr 0.000662 wd 0.0500 time 0.2574 (0.2882) data time 0.0013 (0.0157) model time 0.0000 (0.0000) loss 6.0527 (6.0983) grad_norm 1.6213 (2.1053) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:34:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][40/625] eta 0:02:44 lr 0.000661 wd 0.0500 time 0.2589 (0.2804) data time 0.0011 (0.0121) model time 0.0000 (0.0000) loss 6.2856 (6.0168) grad_norm 2.4184 (2.1222) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:34:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][50/625] eta 0:02:38 lr 0.000661 wd 0.0500 time 0.2574 (0.2757) data time 0.0009 (0.0099) model time 0.0000 (0.0000) loss 4.8206 (6.0153) grad_norm 1.9329 (2.0878) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:34:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][60/625] eta 0:02:33 lr 0.000661 wd 0.0500 time 0.2620 (0.2725) 
data time 0.0008 (0.0084) model time 0.2612 (0.2553) loss 4.9130 (5.9821) grad_norm 2.8195 (2.0612) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:34:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][70/625] eta 0:02:30 lr 0.000661 wd 0.0500 time 0.2564 (0.2704) data time 0.0010 (0.0074) model time 0.2554 (0.2559) loss 6.0802 (6.0185) grad_norm 1.1453 (2.0297) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:34:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][80/625] eta 0:02:27 lr 0.000661 wd 0.0500 time 0.2566 (0.2713) data time 0.0008 (0.0066) model time 0.2558 (0.2626) loss 5.5438 (6.0183) grad_norm 1.2565 (2.0753) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:34:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][90/625] eta 0:02:24 lr 0.000661 wd 0.0500 time 0.2585 (0.2698) data time 0.0008 (0.0060) model time 0.2577 (0.2612) loss 5.4934 (6.0003) grad_norm 3.7351 (2.0688) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:34:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][100/625] eta 0:02:20 lr 0.000660 wd 0.0500 time 0.2581 (0.2684) data time 0.0007 (0.0055) model time 0.2574 (0.2600) loss 5.9581 (6.0162) grad_norm 1.4480 (2.0811) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:35:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][110/625] eta 0:02:18 lr 0.000660 wd 0.0500 time 0.2560 (0.2691) data time 0.0009 (0.0051) model time 0.2551 (0.2625) loss 5.4693 (6.0253) grad_norm 2.1955 (2.1099) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:35:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][120/625] eta 0:02:15 lr 0.000660 wd 0.0500 time 0.2536 (0.2679) data time 0.0009 (0.0047) model time 0.2527 (0.2612) loss 5.3628 (6.0156) grad_norm 3.7030 (2.2501) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 
06:35:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][130/625] eta 0:02:12 lr 0.000660 wd 0.0500 time 0.2525 (0.2669) data time 0.0008 (0.0044) model time 0.2517 (0.2603) loss 6.9285 (6.0005) grad_norm 1.9935 (2.2525) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:35:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][140/625] eta 0:02:09 lr 0.000660 wd 0.0500 time 0.2554 (0.2660) data time 0.0009 (0.0042) model time 0.2545 (0.2596) loss 6.0043 (6.0004) grad_norm 1.4202 (2.2496) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:35:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][150/625] eta 0:02:06 lr 0.000660 wd 0.0500 time 0.2560 (0.2665) data time 0.0013 (0.0040) model time 0.2547 (0.2609) loss 5.4683 (6.0029) grad_norm 4.3975 (2.2989) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:35:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][160/625] eta 0:02:03 lr 0.000659 wd 0.0500 time 0.2581 (0.2658) data time 0.0010 (0.0038) model time 0.2571 (0.2603) loss 6.7156 (5.9811) grad_norm 2.3016 (2.3568) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:35:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][170/625] eta 0:02:00 lr 0.000659 wd 0.0500 time 0.2548 (0.2652) data time 0.0007 (0.0036) model time 0.2542 (0.2598) loss 5.2830 (5.9623) grad_norm 1.8139 (2.3574) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:35:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][180/625] eta 0:01:57 lr 0.000659 wd 0.0500 time 0.2579 (0.2647) data time 0.0007 (0.0035) model time 0.2572 (0.2594) loss 6.4047 (5.9769) grad_norm 2.2146 (2.3332) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:35:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][190/625] eta 0:01:54 lr 0.000659 wd 0.0500 time 0.2589 (0.2642) data 
time 0.0006 (0.0034) model time 0.2583 (0.2591) loss 4.8016 (5.9577) grad_norm 1.9195 (2.3012) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:35:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][200/625] eta 0:01:52 lr 0.000659 wd 0.0500 time 0.2586 (0.2638) data time 0.0010 (0.0032) model time 0.2576 (0.2588) loss 6.1496 (5.9515) grad_norm 1.8990 (2.2818) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:35:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][210/625] eta 0:01:49 lr 0.000659 wd 0.0500 time 0.2533 (0.2640) data time 0.0009 (0.0031) model time 0.2524 (0.2594) loss 6.5074 (5.9567) grad_norm 2.4154 (2.2761) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:35:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][220/625] eta 0:01:46 lr 0.000658 wd 0.0500 time 0.2595 (0.2637) data time 0.0008 (0.0030) model time 0.2587 (0.2591) loss 5.2893 (5.9408) grad_norm 2.4222 (2.2786) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:35:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][230/625] eta 0:01:44 lr 0.000658 wd 0.0500 time 0.2612 (0.2634) data time 0.0009 (0.0029) model time 0.2603 (0.2589) loss 4.7977 (5.9353) grad_norm 2.4936 (2.2927) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:35:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][240/625] eta 0:01:41 lr 0.000658 wd 0.0500 time 0.2567 (0.2631) data time 0.0008 (0.0029) model time 0.2559 (0.2587) loss 5.0448 (5.9331) grad_norm 2.6505 (2.2745) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:35:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][250/625] eta 0:01:38 lr 0.000658 wd 0.0500 time 0.2549 (0.2634) data time 0.0011 (0.0028) model time 0.2538 (0.2592) loss 5.8642 (5.9313) grad_norm 1.2297 (2.2477) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 
06:35:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][260/625] eta 0:01:36 lr 0.000658 wd 0.0500 time 0.2531 (0.2631) data time 0.0008 (0.0027) model time 0.2522 (0.2591) loss 6.0582 (5.9272) grad_norm 1.4808 (2.2537) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:35:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][270/625] eta 0:01:33 lr 0.000658 wd 0.0500 time 0.2546 (0.2628) data time 0.0007 (0.0026) model time 0.2540 (0.2589) loss 5.8870 (5.9348) grad_norm 2.5877 (2.2591) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:35:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][280/625] eta 0:01:30 lr 0.000657 wd 0.0500 time 0.2531 (0.2626) data time 0.0011 (0.0026) model time 0.2520 (0.2588) loss 6.6008 (5.9187) grad_norm 2.2577 (2.2596) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:35:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][290/625] eta 0:01:27 lr 0.000657 wd 0.0500 time 0.2584 (0.2624) data time 0.0007 (0.0025) model time 0.2577 (0.2586) loss 5.8051 (5.9080) grad_norm 2.5674 (2.2543) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:35:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][300/625] eta 0:01:25 lr 0.000657 wd 0.0500 time 0.2550 (0.2622) data time 0.0006 (0.0025) model time 0.2544 (0.2584) loss 6.2940 (5.9086) grad_norm 2.6451 (2.2447) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:35:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][310/625] eta 0:01:22 lr 0.000657 wd 0.0500 time 0.2575 (0.2619) data time 0.0008 (0.0024) model time 0.2567 (0.2583) loss 6.2592 (5.9042) grad_norm 1.3139 (2.2370) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:35:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][320/625] eta 0:01:19 lr 0.000657 wd 0.0500 time 0.2565 (0.2618) data 
time 0.0009 (0.0024) model time 0.2556 (0.2582) loss 6.8299 (5.9110) grad_norm 1.5880 (2.2352) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:35:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][330/625] eta 0:01:17 lr 0.000657 wd 0.0500 time 0.4201 (0.2621) data time 0.0006 (0.0023) model time 0.4195 (0.2587) loss 4.8511 (5.9101) grad_norm 2.2211 (2.2327) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:36:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][340/625] eta 0:01:14 lr 0.000656 wd 0.0500 time 0.2526 (0.2625) data time 0.0010 (0.0023) model time 0.2517 (0.2592) loss 7.1223 (5.9145) grad_norm 1.5231 (2.2252) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:36:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][350/625] eta 0:01:12 lr 0.000656 wd 0.0500 time 0.2578 (0.2623) data time 0.0006 (0.0023) model time 0.2572 (0.2591) loss 5.5569 (5.9122) grad_norm 1.7961 (2.2192) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:36:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][360/625] eta 0:01:09 lr 0.000656 wd 0.0500 time 0.2545 (0.2622) data time 0.0008 (0.0022) model time 0.2537 (0.2590) loss 4.9530 (5.9105) grad_norm 2.7537 (2.2542) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:36:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][370/625] eta 0:01:06 lr 0.000656 wd 0.0500 time 0.2593 (0.2620) data time 0.0006 (0.0022) model time 0.2588 (0.2589) loss 5.0903 (5.9103) grad_norm 3.7541 (2.2639) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:36:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][380/625] eta 0:01:04 lr 0.000656 wd 0.0500 time 0.2570 (0.2619) data time 0.0011 (0.0022) model time 0.2559 (0.2588) loss 5.8085 (5.9061) grad_norm 1.9102 (2.2675) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 
06:36:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][390/625] eta 0:01:01 lr 0.000656 wd 0.0500 time 0.2516 (0.2617) data time 0.0008 (0.0021) model time 0.2508 (0.2587) loss 5.7630 (5.9036) grad_norm 1.3482 (2.2562) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:36:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][400/625] eta 0:00:58 lr 0.000655 wd 0.0500 time 0.2565 (0.2616) data time 0.0008 (0.0021) model time 0.2557 (0.2586) loss 6.1493 (5.8996) grad_norm 2.7167 (2.2547) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:36:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][410/625] eta 0:00:56 lr 0.000655 wd 0.0500 time 0.4676 (0.2620) data time 0.0007 (0.0021) model time 0.4669 (0.2591) loss 6.4287 (5.9002) grad_norm 1.5726 (2.2477) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:36:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][420/625] eta 0:00:53 lr 0.000655 wd 0.0500 time 0.2552 (0.2619) data time 0.0010 (0.0020) model time 0.2542 (0.2590) loss 6.7405 (5.9035) grad_norm 1.5604 (2.2555) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:36:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][430/625] eta 0:00:51 lr 0.000655 wd 0.0500 time 0.2523 (0.2623) data time 0.0008 (0.0020) model time 0.2515 (0.2596) loss 6.7862 (5.9079) grad_norm 9.2992 (2.2594) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:36:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][440/625] eta 0:00:48 lr 0.000655 wd 0.0500 time 0.2602 (0.2622) data time 0.0007 (0.0020) model time 0.2595 (0.2595) loss 5.5084 (5.9158) grad_norm 2.9157 (2.2566) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:36:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][450/625] eta 0:00:45 lr 0.000655 wd 0.0500 time 0.2516 (0.2624) data 
time 0.0008 (0.0020) model time 0.2508 (0.2597) loss 5.9017 (5.9159) grad_norm 1.5035 (2.2619) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:36:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][460/625] eta 0:00:43 lr 0.000654 wd 0.0500 time 0.2553 (0.2622) data time 0.0010 (0.0019) model time 0.2543 (0.2596) loss 5.1558 (5.9094) grad_norm 2.6378 (2.2587) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:36:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][470/625] eta 0:00:40 lr 0.000654 wd 0.0500 time 0.2543 (0.2621) data time 0.0007 (0.0019) model time 0.2536 (0.2595) loss 6.8606 (5.9069) grad_norm 3.3500 (2.2667) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:36:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][480/625] eta 0:00:37 lr 0.000654 wd 0.0500 time 0.2639 (0.2620) data time 0.0010 (0.0019) model time 0.2629 (0.2594) loss 5.9184 (5.9058) grad_norm 3.0749 (2.2691) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:36:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][490/625] eta 0:00:35 lr 0.000654 wd 0.0500 time 0.2552 (0.2619) data time 0.0013 (0.0019) model time 0.2539 (0.2593) loss 5.6951 (5.9071) grad_norm 1.5719 (2.2625) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:36:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][500/625] eta 0:00:32 lr 0.000654 wd 0.0500 time 0.2546 (0.2617) data time 0.0009 (0.0019) model time 0.2537 (0.2592) loss 6.9320 (5.9068) grad_norm 2.0192 (2.2527) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:36:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][510/625] eta 0:00:30 lr 0.000654 wd 0.0500 time 0.2572 (0.2621) data time 0.0008 (0.0018) model time 0.2565 (0.2596) loss 5.8932 (5.9042) grad_norm 3.2409 (2.2499) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 
06:36:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][520/625] eta 0:00:27 lr 0.000654 wd 0.0500 time 0.2587 (0.2623) data time 0.0007 (0.0018) model time 0.2580 (0.2599) loss 4.9742 (5.9084) grad_norm 1.5546 (2.2579) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:36:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][530/625] eta 0:00:24 lr 0.000653 wd 0.0500 time 0.2547 (0.2622) data time 0.0010 (0.0018) model time 0.2538 (0.2598) loss 4.9980 (5.8989) grad_norm 2.3503 (2.2675) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:36:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][540/625] eta 0:00:22 lr 0.000653 wd 0.0500 time 0.2515 (0.2620) data time 0.0009 (0.0018) model time 0.2506 (0.2597) loss 7.0299 (5.8956) grad_norm 2.5866 (2.2680) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:36:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][550/625] eta 0:00:19 lr 0.000653 wd 0.0500 time 0.2531 (0.2619) data time 0.0007 (0.0018) model time 0.2524 (0.2596) loss 6.2448 (5.8981) grad_norm 1.5323 (2.2653) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:36:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][560/625] eta 0:00:17 lr 0.000653 wd 0.0500 time 0.2564 (0.2618) data time 0.0011 (0.0018) model time 0.2553 (0.2595) loss 5.0938 (5.8992) grad_norm 1.3399 (2.2618) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:37:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][570/625] eta 0:00:14 lr 0.000653 wd 0.0500 time 0.2544 (0.2617) data time 0.0011 (0.0017) model time 0.2533 (0.2594) loss 5.7825 (5.8989) grad_norm 3.4830 (2.2680) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:37:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][580/625] eta 0:00:11 lr 0.000653 wd 0.0500 time 0.2564 (0.2620) data 
time 0.0010 (0.0017) model time 0.2554 (0.2597) loss 6.7859 (5.9033) grad_norm 1.8583 (2.2637) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:37:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][590/625] eta 0:00:09 lr 0.000652 wd 0.0500 time 0.2555 (0.2619) data time 0.0007 (0.0017) model time 0.2547 (0.2596) loss 5.4745 (5.8996) grad_norm 3.1045 (2.2597) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:37:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][600/625] eta 0:00:06 lr 0.000652 wd 0.0500 time 0.4476 (0.2621) data time 0.0007 (0.0017) model time 0.4469 (0.2599) loss 5.7352 (5.9004) grad_norm 1.4984 (2.2487) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:37:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][610/625] eta 0:00:03 lr 0.000652 wd 0.0500 time 0.2541 (0.2623) data time 0.0006 (0.0017) model time 0.2535 (0.2602) loss 5.2962 (5.8988) grad_norm 1.6627 (2.2508) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:37:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][620/625] eta 0:00:01 lr 0.000652 wd 0.0500 time 0.2527 (0.2622) data time 0.0004 (0.0017) model time 0.2523 (0.2600) loss 5.4363 (5.8932) grad_norm 2.0618 (2.2628) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:37:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 192 training takes 0:02:43 [2024-08-04 06:37:15 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 06:37:15 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 06:37:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.585 (0.585) Loss 0.6523 (0.6523) Acc@1 89.453 (89.453) Acc@5 98.486 (98.486) Mem 9655MB [2024-08-04 06:37:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.103) Loss 1.0059 (0.7921) Acc@1 80.273 (85.569) Acc@5 95.654 (97.448) Mem 9655MB [2024-08-04 06:37:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 1.1172 (0.9266) Acc@1 75.879 (81.964) Acc@5 94.434 (96.005) Mem 9655MB [2024-08-04 06:37:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.748 Acc@5 96.009 [2024-08-04 06:37:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.7% [2024-08-04 06:37:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.75% [2024-08-04 06:37:17 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 06:37:18 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 06:37:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.489 (0.489) Loss 0.5820 (0.5820) Acc@1 89.844 (89.844) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 06:37:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 0.9233 (0.7199) Acc@1 80.420 (86.115) Acc@5 95.801 (97.572) Mem 9655MB [2024-08-04 06:37:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0498 (0.8462) Acc@1 76.025 (82.650) Acc@5 94.629 (96.250) Mem 9655MB [2024-08-04 06:37:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.350 Acc@5 96.233 [2024-08-04 06:37:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.3% [2024-08-04 06:37:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.35% [2024-08-04 06:37:19 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 06:37:20 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 06:37:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][0/625] eta 0:06:48 lr 0.000652 wd 0.0500 time 0.6532 (0.6532) data time 0.4119 (0.4119) model time 0.0000 (0.0000) loss 4.9281 (4.9281) grad_norm 1.6446 (1.6446) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:37:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][10/625] eta 0:03:00 lr 0.000652 wd 0.0500 time 0.2527 (0.2936) data time 0.0008 (0.0382) model time 0.0000 (0.0000) loss 5.5584 (5.7041) grad_norm 2.4450 (2.8000) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:37:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][20/625] eta 0:02:52 lr 0.000651 wd 0.0500 time 0.4530 (0.2853) data time 0.0008 (0.0205) model time 0.0000 (0.0000) loss 5.3751 (6.0162) grad_norm 1.7007 (2.7209) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:37:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][30/625] eta 0:02:47 lr 0.000651 wd 0.0500 time 0.2528 (0.2814) data time 0.0009 (0.0142) model time 0.0000 (0.0000) loss 5.9130 (5.8904) grad_norm 2.0365 (2.5675) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:37:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][40/625] eta 0:02:41 lr 0.000651 wd 0.0500 time 0.2577 (0.2753) data time 0.0010 (0.0109) model time 0.0000 (0.0000) loss 5.5164 (5.9374) grad_norm 1.9578 (2.5668) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:37:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][50/625] eta 0:02:36 lr 0.000651 wd 0.0500 time 0.2549 (0.2716) data time 0.0010 (0.0090) model time 0.0000 (0.0000) loss 6.1645 (5.9741) grad_norm 2.1631 (2.5482) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:37:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][60/625] eta 0:02:31 lr 0.000651 wd 0.0500 time 0.2593 (0.2689) 
data time 0.0010 (0.0076) model time 0.2583 (0.2546) loss 5.3081 (5.9222) grad_norm 1.4658 (2.4729) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:37:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][70/625] eta 0:02:29 lr 0.000651 wd 0.0500 time 0.2498 (0.2696) data time 0.0006 (0.0067) model time 0.2492 (0.2638) loss 5.0187 (5.9298) grad_norm 1.7475 (2.3928) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:37:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][80/625] eta 0:02:27 lr 0.000650 wd 0.0500 time 0.2553 (0.2700) data time 0.0009 (0.0060) model time 0.2544 (0.2665) loss 6.7985 (5.9294) grad_norm 3.9202 (2.3615) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:37:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][90/625] eta 0:02:23 lr 0.000650 wd 0.0500 time 0.2550 (0.2686) data time 0.0007 (0.0054) model time 0.2542 (0.2639) loss 5.9170 (5.9207) grad_norm 2.1278 (2.3214) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:37:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][100/625] eta 0:02:20 lr 0.000650 wd 0.0500 time 0.2543 (0.2672) data time 0.0009 (0.0050) model time 0.2534 (0.2619) loss 4.7034 (5.8956) grad_norm 2.2599 (2.3142) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:37:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][110/625] eta 0:02:17 lr 0.000650 wd 0.0500 time 0.2522 (0.2662) data time 0.0009 (0.0046) model time 0.2514 (0.2608) loss 6.7376 (5.8893) grad_norm 1.5042 (2.2981) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:37:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][120/625] eta 0:02:14 lr 0.000650 wd 0.0500 time 0.2686 (0.2655) data time 0.0009 (0.0043) model time 0.2677 (0.2601) loss 5.6215 (5.8759) grad_norm 1.9906 (2.2676) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 
06:37:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][130/625] eta 0:02:11 lr 0.000650 wd 0.0500 time 0.2596 (0.2648) data time 0.0006 (0.0041) model time 0.2590 (0.2596) loss 5.0981 (5.8625) grad_norm 2.6366 (2.2459) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:37:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][140/625] eta 0:02:08 lr 0.000649 wd 0.0500 time 0.2544 (0.2657) data time 0.0009 (0.0038) model time 0.2535 (0.2614) loss 5.1253 (5.8597) grad_norm 1.2798 (2.2294) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:38:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][150/625] eta 0:02:05 lr 0.000649 wd 0.0500 time 0.2565 (0.2650) data time 0.0010 (0.0036) model time 0.2555 (0.2607) loss 5.4061 (5.8432) grad_norm 3.3747 (2.2689) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:38:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][160/625] eta 0:02:03 lr 0.000649 wd 0.0500 time 0.2572 (0.2645) data time 0.0006 (0.0035) model time 0.2566 (0.2603) loss 4.7945 (5.8614) grad_norm 3.5721 (2.3312) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:38:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][170/625] eta 0:02:00 lr 0.000649 wd 0.0500 time 0.2581 (0.2641) data time 0.0007 (0.0033) model time 0.2574 (0.2600) loss 6.0450 (5.8551) grad_norm 2.1945 (2.3214) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:38:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][180/625] eta 0:01:57 lr 0.000649 wd 0.0500 time 0.2605 (0.2636) data time 0.0006 (0.0032) model time 0.2600 (0.2596) loss 6.9625 (5.8548) grad_norm 1.4608 (2.3171) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:38:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][190/625] eta 0:01:54 lr 0.000649 wd 0.0500 time 0.2548 (0.2632) data 
time 0.0008 (0.0031) model time 0.2540 (0.2593) loss 4.2841 (5.8422) grad_norm 2.3004 (2.2912) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:38:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][200/625] eta 0:01:52 lr 0.000648 wd 0.0500 time 0.2585 (0.2639) data time 0.0006 (0.0030) model time 0.2578 (0.2603) loss 4.4402 (5.8351) grad_norm 2.0618 (2.2762) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:38:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][210/625] eta 0:01:49 lr 0.000648 wd 0.0500 time 0.2575 (0.2635) data time 0.0010 (0.0029) model time 0.2566 (0.2600) loss 5.1836 (5.8491) grad_norm 2.2337 (2.2595) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:38:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][220/625] eta 0:01:46 lr 0.000648 wd 0.0500 time 0.2544 (0.2632) data time 0.0010 (0.0028) model time 0.2534 (0.2598) loss 5.6482 (5.8498) grad_norm 2.2619 (2.2518) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:38:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][230/625] eta 0:01:43 lr 0.000648 wd 0.0500 time 0.2561 (0.2629) data time 0.0009 (0.0027) model time 0.2552 (0.2595) loss 6.1515 (5.8419) grad_norm 3.7868 (2.2613) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:38:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][240/625] eta 0:01:41 lr 0.000648 wd 0.0500 time 0.2547 (0.2634) data time 0.0015 (0.0026) model time 0.2532 (0.2603) loss 4.4884 (5.8463) grad_norm 1.6704 (2.2747) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:38:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][250/625] eta 0:01:38 lr 0.000648 wd 0.0500 time 0.2542 (0.2632) data time 0.0007 (0.0026) model time 0.2535 (0.2601) loss 5.8683 (5.8418) grad_norm 1.9833 (2.2833) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 
06:38:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][260/625] eta 0:01:35 lr 0.000647 wd 0.0500 time 0.2560 (0.2629) data time 0.0009 (0.0025) model time 0.2550 (0.2599) loss 5.8865 (5.8480) grad_norm 3.0331 (2.2755) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:38:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][270/625] eta 0:01:33 lr 0.000647 wd 0.0500 time 0.2570 (0.2626) data time 0.0008 (0.0024) model time 0.2562 (0.2596) loss 6.7473 (5.8686) grad_norm 3.3185 (2.2830) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:38:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][280/625] eta 0:01:30 lr 0.000647 wd 0.0500 time 0.2624 (0.2624) data time 0.0007 (0.0024) model time 0.2617 (0.2595) loss 6.5159 (5.8815) grad_norm 1.8849 (2.2739) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:38:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][290/625] eta 0:01:27 lr 0.000647 wd 0.0500 time 0.2556 (0.2622) data time 0.0009 (0.0023) model time 0.2547 (0.2593) loss 5.0480 (5.8900) grad_norm 1.8766 (2.2788) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:38:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][300/625] eta 0:01:25 lr 0.000647 wd 0.0500 time 0.2549 (0.2620) data time 0.0010 (0.0023) model time 0.2539 (0.2591) loss 6.2610 (5.8985) grad_norm 3.3951 (2.2809) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:38:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][310/625] eta 0:01:22 lr 0.000647 wd 0.0500 time 0.2548 (0.2618) data time 0.0007 (0.0022) model time 0.2541 (0.2590) loss 6.9345 (5.9059) grad_norm 2.6440 (2.2833) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:38:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][320/625] eta 0:01:19 lr 0.000646 wd 0.0500 time 0.2608 (0.2618) data 
time 0.0008 (0.0022) model time 0.2599 (0.2590) loss 4.8771 (5.9015) grad_norm 2.0980 (2.2794) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:38:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][330/625] eta 0:01:17 lr 0.000646 wd 0.0500 time 0.2492 (0.2616) data time 0.0009 (0.0022) model time 0.2484 (0.2589) loss 5.5818 (5.9001) grad_norm 2.1935 (2.2890) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:38:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][340/625] eta 0:01:14 lr 0.000646 wd 0.0500 time 0.2548 (0.2615) data time 0.0010 (0.0021) model time 0.2538 (0.2588) loss 4.6855 (5.8914) grad_norm 2.9756 (2.2919) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:38:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][350/625] eta 0:01:11 lr 0.000646 wd 0.0500 time 0.2570 (0.2614) data time 0.0010 (0.0021) model time 0.2560 (0.2588) loss 6.2020 (5.8962) grad_norm 2.6274 (2.2907) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:38:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][360/625] eta 0:01:09 lr 0.000646 wd 0.0500 time 0.2550 (0.2613) data time 0.0010 (0.0021) model time 0.2540 (0.2586) loss 5.5262 (5.8916) grad_norm 2.1570 (2.2917) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:38:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][370/625] eta 0:01:06 lr 0.000646 wd 0.0500 time 0.2560 (0.2611) data time 0.0009 (0.0020) model time 0.2550 (0.2585) loss 6.3356 (5.8992) grad_norm 1.8830 (2.2943) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:39:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][380/625] eta 0:01:03 lr 0.000645 wd 0.0500 time 0.2560 (0.2610) data time 0.0005 (0.0020) model time 0.2555 (0.2584) loss 6.8004 (5.8998) grad_norm 1.7709 (2.3196) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 
06:39:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][390/625] eta 0:01:01 lr 0.000645 wd 0.0500 time 0.2641 (0.2609) data time 0.0006 (0.0020) model time 0.2635 (0.2584) loss 5.4931 (5.9028) grad_norm 1.7521 (2.3179) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:39:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][400/625] eta 0:00:58 lr 0.000645 wd 0.0500 time 0.2549 (0.2607) data time 0.0008 (0.0019) model time 0.2541 (0.2582) loss 5.4656 (5.9065) grad_norm 1.8676 (2.3111) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:39:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][410/625] eta 0:00:56 lr 0.000645 wd 0.0500 time 0.2565 (0.2606) data time 0.0011 (0.0019) model time 0.2554 (0.2582) loss 6.0158 (5.9022) grad_norm 2.2957 (2.3109) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:39:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][420/625] eta 0:00:53 lr 0.000645 wd 0.0500 time 0.2589 (0.2610) data time 0.0005 (0.0019) model time 0.2583 (0.2586) loss 5.4629 (5.9081) grad_norm 2.8677 (2.3159) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:39:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][430/625] eta 0:00:50 lr 0.000645 wd 0.0500 time 0.2552 (0.2608) data time 0.0008 (0.0019) model time 0.2544 (0.2585) loss 6.7087 (5.9096) grad_norm 2.2182 (2.3142) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:39:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][440/625] eta 0:00:48 lr 0.000644 wd 0.0500 time 0.2583 (0.2607) data time 0.0007 (0.0019) model time 0.2576 (0.2584) loss 5.0118 (5.9103) grad_norm 2.4766 (2.3151) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:39:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][450/625] eta 0:00:45 lr 0.000644 wd 0.0500 time 0.2579 (0.2606) data 
time 0.0006 (0.0018) model time 0.2573 (0.2583) loss 5.0443 (5.9085) grad_norm 1.2875 (2.3071) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:39:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][460/625] eta 0:00:43 lr 0.000644 wd 0.0500 time 0.2600 (0.2613) data time 0.0006 (0.0018) model time 0.2594 (0.2591) loss 5.9015 (5.9027) grad_norm 3.4830 (2.3157) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:39:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][470/625] eta 0:00:40 lr 0.000644 wd 0.0500 time 0.2538 (0.2612) data time 0.0008 (0.0018) model time 0.2530 (0.2590) loss 5.4268 (5.8887) grad_norm 3.6163 (2.3472) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:39:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][480/625] eta 0:00:37 lr 0.000644 wd 0.0500 time 0.2524 (0.2615) data time 0.0008 (0.0018) model time 0.2516 (0.2594) loss 5.4783 (5.8896) grad_norm 2.7489 (2.3490) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:39:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][490/625] eta 0:00:35 lr 0.000644 wd 0.0500 time 0.2600 (0.2614) data time 0.0005 (0.0018) model time 0.2595 (0.2593) loss 5.9281 (5.8858) grad_norm 2.5249 (2.3531) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:39:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][500/625] eta 0:00:32 lr 0.000643 wd 0.0500 time 0.2541 (0.2613) data time 0.0007 (0.0017) model time 0.2534 (0.2592) loss 4.8037 (5.8820) grad_norm 2.1710 (2.3489) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:39:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][510/625] eta 0:00:30 lr 0.000643 wd 0.0500 time 0.2547 (0.2612) data time 0.0006 (0.0017) model time 0.2542 (0.2591) loss 6.0409 (5.8809) grad_norm 1.7762 (2.3387) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 
06:39:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][520/625] eta 0:00:27 lr 0.000643 wd 0.0500 time 0.2557 (0.2611) data time 0.0007 (0.0017) model time 0.2550 (0.2591) loss 5.6829 (5.8793) grad_norm 2.1465 (2.3313) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:39:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][530/625] eta 0:00:24 lr 0.000643 wd 0.0500 time 0.2610 (0.2610) data time 0.0008 (0.0017) model time 0.2601 (0.2590) loss 6.3680 (5.8829) grad_norm 3.8050 (2.3342) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:39:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][540/625] eta 0:00:22 lr 0.000643 wd 0.0500 time 0.2549 (0.2609) data time 0.0011 (0.0017) model time 0.2538 (0.2589) loss 4.8028 (5.8799) grad_norm 2.5916 (2.3469) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:39:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][550/625] eta 0:00:19 lr 0.000643 wd 0.0500 time 0.2557 (0.2608) data time 0.0006 (0.0017) model time 0.2551 (0.2588) loss 6.6164 (5.8865) grad_norm 3.3232 (2.3479) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:39:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][560/625] eta 0:00:16 lr 0.000643 wd 0.0500 time 0.2598 (0.2610) data time 0.0006 (0.0017) model time 0.2592 (0.2590) loss 5.0446 (5.8820) grad_norm 1.5441 (2.3537) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:39:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][570/625] eta 0:00:14 lr 0.000642 wd 0.0500 time 0.2539 (0.2609) data time 0.0010 (0.0016) model time 0.2529 (0.2589) loss 5.9446 (5.8807) grad_norm 1.8238 (2.3440) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:39:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][580/625] eta 0:00:11 lr 0.000642 wd 0.0500 time 0.2557 (0.2608) data 
time 0.0008 (0.0016) model time 0.2549 (0.2589) loss 6.6216 (5.8762) grad_norm 1.9916 (2.3340) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:39:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][590/625] eta 0:00:09 lr 0.000642 wd 0.0500 time 0.2550 (0.2614) data time 0.0006 (0.0016) model time 0.2544 (0.2595) loss 5.3600 (5.8826) grad_norm 1.8365 (2.3281) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:39:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][600/625] eta 0:00:06 lr 0.000642 wd 0.0500 time 0.2574 (0.2617) data time 0.0008 (0.0016) model time 0.2566 (0.2598) loss 6.5340 (5.8848) grad_norm 2.2627 (2.3247) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:40:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][610/625] eta 0:00:03 lr 0.000642 wd 0.0500 time 0.2518 (0.2616) data time 0.0006 (0.0016) model time 0.2512 (0.2597) loss 5.2936 (5.8812) grad_norm 1.5022 (2.3206) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:40:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][620/625] eta 0:00:01 lr 0.000642 wd 0.0500 time 0.2527 (0.2614) data time 0.0005 (0.0016) model time 0.2522 (0.2596) loss 5.0081 (5.8783) grad_norm 2.0721 (2.3146) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:40:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 193 training takes 0:02:43 [2024-08-04 06:40:04 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 06:40:04 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 06:40:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.576 (0.576) Loss 0.6123 (0.6123) Acc@1 89.062 (89.062) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 06:40:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.103) Loss 0.9521 (0.7543) Acc@1 79.932 (85.502) Acc@5 95.801 (97.337) Mem 9655MB [2024-08-04 06:40:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 1.0996 (0.8893) Acc@1 75.146 (82.043) Acc@5 94.580 (95.938) Mem 9655MB [2024-08-04 06:40:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.764 Acc@5 95.933 [2024-08-04 06:40:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.8% [2024-08-04 06:40:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.76% [2024-08-04 06:40:06 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 06:40:06 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 06:40:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.611 (0.611) Loss 0.5825 (0.5825) Acc@1 89.844 (89.844) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 06:40:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.106) Loss 0.9224 (0.7198) Acc@1 80.420 (86.155) Acc@5 95.801 (97.554) Mem 9655MB [2024-08-04 06:40:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.082) Loss 1.0479 (0.8459) Acc@1 76.025 (82.668) Acc@5 94.727 (96.238) Mem 9655MB [2024-08-04 06:40:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.378 Acc@5 96.221 [2024-08-04 06:40:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-08-04 06:40:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.38% [2024-08-04 06:40:08 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 06:40:09 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 06:40:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][0/625] eta 0:07:35 lr 0.000641 wd 0.0500 time 0.7294 (0.7294) data time 0.4784 (0.4784) model time 0.0000 (0.0000) loss 6.4244 (6.4244) grad_norm 3.0426 (3.0426) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:40:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][10/625] eta 0:03:13 lr 0.000641 wd 0.0500 time 0.2564 (0.3143) data time 0.0009 (0.0443) model time 0.0000 (0.0000) loss 5.9007 (5.9949) grad_norm 1.8395 (1.9703) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:40:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][20/625] eta 0:02:58 lr 0.000641 wd 0.0500 time 0.2576 (0.2952) data time 0.0007 (0.0237) model time 0.0000 (0.0000) loss 4.5473 (5.8345) grad_norm 1.7989 (1.9131) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:40:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][30/625] eta 0:02:51 lr 0.000641 wd 0.0500 time 0.2547 (0.2889) data time 0.0011 (0.0164) model time 0.0000 (0.0000) loss 5.0502 (5.8568) grad_norm 2.6058 (2.1154) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:40:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][40/625] eta 0:02:44 lr 0.000641 wd 0.0500 time 0.2549 (0.2808) data time 0.0010 (0.0126) model time 0.0000 (0.0000) loss 6.7948 (5.9314) grad_norm 2.9752 (2.0663) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:40:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][50/625] eta 0:02:38 lr 0.000641 wd 0.0500 time 0.2607 (0.2762) data time 0.0007 (0.0103) model time 0.0000 (0.0000) loss 5.6134 (5.9535) grad_norm 2.7604 (2.2445) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:40:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][60/625] eta 0:02:34 lr 0.000640 wd 0.0500 time 0.2590 (0.2730) data time 0.0009 (0.0088) model time 0.2581 (0.2558) loss 5.7001 (5.9530) grad_norm 1.3740 (2.3769) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:40:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][70/625] eta 0:02:30 lr 0.000640 wd 0.0500 time 0.2555 (0.2709) data time 0.0010 (0.0077) model time 0.2545 (0.2567) loss 5.4198 (5.9415) grad_norm 2.2542 (2.3889) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:40:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][80/625] eta 0:02:26 lr 0.000640 wd 0.0500 time 0.2615 (0.2692) data time 0.0009 (0.0068) model time 0.2606 (0.2564) loss 4.3450 (5.8934) grad_norm 1.8488 (2.3304) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:40:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][90/625] eta 0:02:24 lr 0.000640 wd 0.0500 time 0.2564 (0.2700) data time 0.0006 (0.0062) model time 0.2557 (0.2613) loss 6.1415 (5.8677) grad_norm 1.7302 (2.2778) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:40:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][100/625] eta 0:02:21 lr 0.000640 wd 0.0500 time 0.2567 (0.2702) data time 0.0008 (0.0057) model time 0.2560 (0.2633) loss 6.2798 (5.8572) grad_norm 1.6644 (2.2450) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:40:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][110/625] eta 0:02:18 lr 0.000640 wd 0.0500 time 0.2543 (0.2690) data time 0.0006 (0.0052) model time 0.2537 (0.2620) loss 5.6316 (5.8483) grad_norm 2.0025 (2.2274) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:40:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][120/625] eta 0:02:15 lr 0.000639 wd 0.0500 time 0.2535 (0.2679) data time 0.0007 (0.0049) model time 0.2529 (0.2610) loss 4.5796 (5.8424) grad_norm 1.2875 (2.2103) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:40:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][130/625] eta 0:02:12 lr 0.000639 wd 0.0500 time 0.2595 (0.2671) data time 0.0007 (0.0046) model time 0.2588 (0.2604) loss 5.6027 (5.8367) grad_norm 1.5526 (2.1861) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:40:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][140/625] eta 0:02:09 lr 0.000639 wd 0.0500 time 0.2579 (0.2665) data time 0.0009 (0.0043) model time 0.2570 (0.2601) loss 5.7703 (5.8470) grad_norm 2.1121 (2.1773) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:40:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][150/625] eta 0:02:06 lr 0.000639 wd 0.0500 time 0.2512 (0.2672) data time 0.0009 (0.0041) model time 0.2503 (0.2617) loss 6.0535 (5.8568) grad_norm 1.5542 (2.1451) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:40:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][160/625] eta 0:02:03 lr 0.000639 wd 0.0500 time 0.2590 (0.2665) data time 0.0007 (0.0039) model time 0.2583 (0.2611) loss 5.5916 (5.8725) grad_norm 1.4685 (2.1250) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:40:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][170/625] eta 0:02:00 lr 0.000639 wd 0.0500 time 0.2544 (0.2659) data time 0.0008 (0.0037) model time 0.2535 (0.2606) loss 6.0229 (5.8570) grad_norm 11.3167 (2.2219) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:40:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][180/625] eta 0:01:58 lr 0.000638 wd 0.0500 time 0.2579 (0.2653) data time 0.0011 (0.0036) model time 0.2568 (0.2602) loss 5.2563 (5.8399) grad_norm 2.0120 (2.2331) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:40:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][190/625] eta 0:01:55 lr 0.000638 wd 0.0500 time 0.2530 (0.2648) data time 0.0007 (0.0034) model time 0.2523 (0.2598) loss 6.8163 (5.8439) grad_norm 3.2482 (2.2916) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:41:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][200/625] eta 0:01:52 lr 0.000638 wd 0.0500 time 0.2567 (0.2644) data time 0.0007 (0.0033) model time 0.2559 (0.2595) loss 4.9879 (5.8419) grad_norm 2.2288 (2.3282) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:41:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][210/625] eta 0:01:50 lr 0.000638 wd 0.0500 time 0.2573 (0.2660) data time 0.0008 (0.0032) model time 0.2565 (0.2619) loss 6.5051 (5.8542) grad_norm 2.3929 (2.3296) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:41:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][220/625] eta 0:01:47 lr 0.000638 wd 0.0500 time 0.2562 (0.2656) data time 0.0007 (0.0031) model time 0.2555 (0.2615) loss 5.3524 (5.8392) grad_norm 2.0869 (2.3486) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:41:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][230/625] eta 0:01:44 lr 0.000638 wd 0.0500 time 0.2585 (0.2652) data time 0.0006 (0.0030) model time 0.2579 (0.2611) loss 6.8905 (5.8320) grad_norm 2.9560 (2.3512) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:41:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][240/625] eta 0:01:41 lr 0.000637 wd 0.0500 time 0.2555 (0.2648) data time 0.0007 (0.0029) model time 0.2548 (0.2608) loss 5.4054 (5.8521) grad_norm 2.0637 (2.3292) loss_scale 2048.0000 (1062.2407) mem 9655MB
[2024-08-04 06:41:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][250/625] eta 0:01:39 lr 0.000637 wd 0.0500 time 0.2552 (0.2645) data time 0.0012 (0.0028) model time 0.2540 (0.2605) loss 6.4644 (5.8649) grad_norm 3.2323 (2.3154) loss_scale 2048.0000 (1101.5139) mem 9655MB
[2024-08-04 06:41:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][260/625] eta 0:01:36 lr 0.000637 wd 0.0500 time 0.2543 (0.2646) data time 0.0011 (0.0028) model time 0.2532 (0.2609) loss 4.8298 (5.8532) grad_norm 2.0463 (2.3115) loss_scale 2048.0000 (1137.7778) mem 9655MB
[2024-08-04 06:41:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][270/625] eta 0:01:34 lr 0.000637 wd 0.0500 time 0.4437 (0.2650) data time 0.0007 (0.0027) model time 0.4430 (0.2615) loss 6.4554 (5.8677) grad_norm 2.3111 (inf) loss_scale 1024.0000 (1148.6937) mem 9655MB
[2024-08-04 06:41:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][280/625] eta 0:01:31 lr 0.000637 wd 0.0500 time 0.2557 (0.2654) data time 0.0009 (0.0026) model time 0.2547 (0.2620) loss 4.8856 (5.8589) grad_norm 1.9198 (inf) loss_scale 1024.0000 (1144.2562) mem 9655MB
[2024-08-04 06:41:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][290/625] eta 0:01:28 lr 0.000637 wd 0.0500 time 0.2573 (0.2650) data time 0.0009 (0.0026) model time 0.2564 (0.2617) loss 6.5498 (5.8710) grad_norm 1.8790 (inf) loss_scale 1024.0000 (1140.1237) mem 9655MB
[2024-08-04 06:41:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][300/625] eta 0:01:26 lr 0.000636 wd 0.0500 time 0.2565 (0.2658) data time 0.0008 (0.0025) model time 0.2557 (0.2628) loss 6.0728 (5.8577) grad_norm 1.6146 (inf) loss_scale 1024.0000 (1136.2658) mem 9655MB
[2024-08-04 06:41:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][310/625] eta 0:01:23 lr 0.000636 wd 0.0500 time 0.2523 (0.2660) data time 0.0010 (0.0025) model time 0.2514 (0.2631) loss 5.7994 (5.8518) grad_norm 1.9833 (inf) loss_scale 1024.0000 (1132.6559) mem 9655MB
[2024-08-04 06:41:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][320/625] eta 0:01:21 lr 0.000636 wd 0.0500 time 0.2610 (0.2658) data time 0.0011 (0.0024) model time 0.2599 (0.2629) loss 5.0728 (5.8529) grad_norm 3.0907 (inf) loss_scale 1024.0000 (1129.2710) mem 9655MB
[2024-08-04 06:41:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][330/625] eta 0:01:18 lr 0.000636 wd 0.0500 time 0.2581 (0.2655) data time 0.0008 (0.0024) model time 0.2573 (0.2626) loss 6.3818 (5.8581) grad_norm 1.5345 (inf) loss_scale 1024.0000 (1126.0906) mem 9655MB
[2024-08-04 06:41:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][340/625] eta 0:01:15 lr 0.000636 wd 0.0500 time 0.2538 (0.2652) data time 0.0009 (0.0023) model time 0.2529 (0.2624) loss 5.0304 (5.8581) grad_norm 3.8778 (inf) loss_scale 1024.0000 (1123.0968) mem 9655MB
[2024-08-04 06:41:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][350/625] eta 0:01:12 lr 0.000636 wd 0.0500 time 0.2495 (0.2650) data time 0.0008 (0.0023) model time 0.2487 (0.2621) loss 5.9383 (5.8468) grad_norm 2.6614 (inf) loss_scale 1024.0000 (1120.2735) mem 9655MB
[2024-08-04 06:41:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][360/625] eta 0:01:10 lr 0.000635 wd 0.0500 time 0.2527 (0.2647) data time 0.0007 (0.0023) model time 0.2521 (0.2619) loss 4.8581 (5.8399) grad_norm 1.6688 (inf) loss_scale 1024.0000 (1117.6066) mem 9655MB
[2024-08-04 06:41:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][370/625] eta 0:01:07 lr 0.000635 wd 0.0500 time 0.2539 (0.2645) data time 0.0007 (0.0022) model time 0.2531 (0.2617) loss 4.7072 (5.8296) grad_norm 4.1578 (inf) loss_scale 1024.0000 (1115.0836) mem 9655MB
[2024-08-04 06:41:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][380/625] eta 0:01:04 lr 0.000635 wd 0.0500 time 0.4400 (0.2648) data time 0.0009 (0.0022) model time 0.4391 (0.2621) loss 5.4949 (5.8267) grad_norm 1.5756 (inf) loss_scale 1024.0000 (1112.6929) mem 9655MB
[2024-08-04 06:41:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][390/625] eta 0:01:02 lr 0.000635 wd 0.0500 time 0.2594 (0.2645) data time 0.0008 (0.0022) model time 0.2587 (0.2618) loss 6.1884 (5.8253) grad_norm 2.6628 (inf) loss_scale 1024.0000 (1110.4246) mem 9655MB
[2024-08-04 06:41:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][400/625] eta 0:00:59 lr 0.000635 wd 0.0500 time 0.2585 (0.2644) data time 0.0005 (0.0021) model time 0.2579 (0.2617) loss 6.3985 (5.8337) grad_norm 1.6066 (inf) loss_scale 1024.0000 (1108.2693) mem 9655MB
[2024-08-04 06:41:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][410/625] eta 0:00:56 lr 0.000635 wd 0.0500 time 0.2572 (0.2642) data time 0.0008 (0.0021) model time 0.2565 (0.2615) loss 5.7572 (5.8372) grad_norm 1.6584 (inf) loss_scale 1024.0000 (1106.2190) mem 9655MB
[2024-08-04 06:42:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][420/625] eta 0:00:54 lr 0.000635 wd 0.0500 time 0.2562 (0.2640) data time 0.0007 (0.0021) model time 0.2555 (0.2614) loss 5.9123 (5.8389) grad_norm 1.7113 (inf) loss_scale 1024.0000 (1104.2660) mem 9655MB
[2024-08-04 06:42:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][430/625] eta 0:00:51 lr 0.000634 wd 0.0500 time 0.2604 (0.2639) data time 0.0006 (0.0020) model time 0.2598 (0.2613) loss 6.2718 (5.8462) grad_norm 2.4162 (inf) loss_scale 1024.0000 (1102.4037) mem 9655MB
[2024-08-04 06:42:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][440/625] eta 0:00:48 lr 0.000634 wd 0.0500 time 0.2552 (0.2637) data time 0.0009 (0.0020) model time 0.2542 (0.2611) loss 6.4639 (5.8532) grad_norm 1.7497 (inf) loss_scale 1024.0000 (1100.6259) mem 9655MB
[2024-08-04 06:42:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][450/625] eta 0:00:46 lr 0.000634 wd 0.0500 time 0.2574 (0.2635) data time 0.0009 (0.0020) model time 0.2566 (0.2609) loss 5.4334 (5.8553) grad_norm 2.0812 (inf) loss_scale 1024.0000 (1098.9268) mem 9655MB
[2024-08-04 06:42:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][460/625] eta 0:00:43 lr 0.000634 wd 0.0500 time 0.2620 (0.2633) data time 0.0009 (0.0020) model time 0.2611 (0.2608) loss 6.5497 (5.8532) grad_norm 2.2875 (inf) loss_scale 1024.0000 (1097.3015) mem 9655MB
[2024-08-04 06:42:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][470/625] eta 0:00:40 lr 0.000634 wd 0.0500 time 0.2597 (0.2632) data time 0.0006 (0.0020) model time 0.2591 (0.2607) loss 5.2070 (5.8551) grad_norm 1.8212 (inf) loss_scale 1024.0000 (1095.7452) mem 9655MB
[2024-08-04 06:42:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][480/625] eta 0:00:38 lr 0.000634 wd 0.0500 time 0.2608 (0.2631) data time 0.0010 (0.0019) model time 0.2597 (0.2606) loss 7.0781 (5.8523) grad_norm 2.7341 (inf) loss_scale 1024.0000 (1094.2536) mem 9655MB
[2024-08-04 06:42:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][490/625] eta 0:00:35 lr 0.000633 wd 0.0500 time 0.2581 (0.2630) data time 0.0006 (0.0019) model time 0.2575 (0.2605) loss 5.8583 (5.8543) grad_norm 2.4662 (inf) loss_scale 1024.0000 (1092.8228) mem 9655MB
[2024-08-04 06:42:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][500/625] eta 0:00:32 lr 0.000633 wd 0.0500 time 0.2537 (0.2628) data time 0.0007 (0.0019) model time 0.2530 (0.2604) loss 5.0340 (5.8528) grad_norm 2.0395 (inf) loss_scale 1024.0000 (1091.4491) mem 9655MB
[2024-08-04 06:42:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][510/625] eta 0:00:30 lr 0.000633 wd 0.0500 time 0.2545 (0.2628) data time 0.0008 (0.0019) model time 0.2537 (0.2604) loss 6.3375 (5.8488) grad_norm 2.2489 (inf) loss_scale 1024.0000 (1090.1292) mem 9655MB
[2024-08-04 06:42:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][520/625] eta 0:00:27 lr 0.000633 wd 0.0500 time 0.2569 (0.2627) data time 0.0011 (0.0019) model time 0.2559 (0.2603) loss 6.5558 (5.8540) grad_norm 5.2115 (inf) loss_scale 1024.0000 (1088.8599) mem 9655MB
[2024-08-04 06:42:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][530/625] eta 0:00:24 lr 0.000633 wd 0.0500 time 0.2518 (0.2625) data time 0.0007 (0.0018) model time 0.2511 (0.2602) loss 4.9789 (5.8529) grad_norm 2.8403 (inf) loss_scale 1024.0000 (1087.6384) mem 9655MB
[2024-08-04 06:42:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][540/625] eta 0:00:22 lr 0.000633 wd 0.0500 time 0.2538 (0.2624) data time 0.0007 (0.0018) model time 0.2531 (0.2601) loss 5.6850 (5.8521) grad_norm 1.9893 (inf) loss_scale 1024.0000 (1086.4621) mem 9655MB
[2024-08-04 06:42:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][550/625] eta 0:00:19 lr 0.000632 wd 0.0500 time 0.2572 (0.2623) data time 0.0006 (0.0018) model time 0.2566 (0.2600) loss 4.8763 (5.8457) grad_norm 2.0870 (inf) loss_scale 1024.0000 (1085.3285) mem 9655MB
[2024-08-04 06:42:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][560/625] eta 0:00:17 lr 0.000632 wd 0.0500 time 0.2552 (0.2622) data time 0.0008 (0.0018) model time 0.2545 (0.2599) loss 5.3102 (5.8467) grad_norm 1.8878 (inf) loss_scale 1024.0000 (1084.2353) mem 9655MB
[2024-08-04 06:42:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][570/625] eta 0:00:14 lr 0.000632 wd 0.0500 time 0.2607 (0.2621) data time 0.0009 (0.0018) model time 0.2598 (0.2598) loss 4.9251 (5.8492) grad_norm 2.2458 (inf) loss_scale 1024.0000 (1083.1804) mem 9655MB
[2024-08-04 06:42:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][580/625] eta 0:00:11 lr 0.000632 wd 0.0500 time 0.2558 (0.2620) data time 0.0009 (0.0018) model time 0.2550 (0.2597) loss 6.6389 (5.8545) grad_norm 1.6183 (inf) loss_scale 1024.0000 (1082.1618) mem 9655MB
[2024-08-04 06:42:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][590/625] eta 0:00:09 lr 0.000632 wd 0.0500 time 0.2592 (0.2619) data time 0.0007 (0.0017) model time 0.2585 (0.2597) loss 5.1278 (5.8498) grad_norm 2.4470 (inf) loss_scale 1024.0000 (1081.1777) mem 9655MB
[2024-08-04 06:42:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][600/625] eta 0:00:06 lr 0.000632 wd 0.0500 time 0.2532 (0.2618) data time 0.0008 (0.0017) model time 0.2524 (0.2596) loss 5.7224 (5.8533) grad_norm 2.1974 (inf) loss_scale 1024.0000 (1080.2263) mem 9655MB
[2024-08-04 06:42:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][610/625] eta 0:00:03 lr 0.000631 wd 0.0500 time 0.2532 (0.2620) data time 0.0004 (0.0017) model time 0.2527 (0.2598) loss 6.2489 (5.8531) grad_norm 2.7081 (inf) loss_scale 1024.0000 (1079.3061) mem 9655MB
[2024-08-04 06:42:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][620/625] eta 0:00:01 lr 0.000631 wd 0.0500 time 0.2543 (0.2619) data time 0.0006 (0.0017) model time 0.2537 (0.2597) loss 5.9891 (5.8556) grad_norm 2.7786 (inf) loss_scale 1024.0000 (1078.4155) mem 9655MB
[2024-08-04 06:42:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 194 training takes 0:02:43
[2024-08-04 06:42:53 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 06:42:53 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 06:42:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.481 (0.481) Loss 0.6206 (0.6206) Acc@1 88.867 (88.867) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 06:42:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 0.9927 (0.7655) Acc@1 79.150 (85.360) Acc@5 95.801 (97.390) Mem 9655MB
[2024-08-04 06:42:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0889 (0.9036) Acc@1 74.902 (81.873) Acc@5 94.678 (95.936) Mem 9655MB
[2024-08-04 06:42:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.572 Acc@5 95.957
[2024-08-04 06:42:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.6%
[2024-08-04 06:42:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.745 (0.745) Loss 0.5820 (0.5820) Acc@1 89.893 (89.893) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 06:42:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.124) Loss 0.9233 (0.7192) Acc@1 80.273 (86.137) Acc@5 95.752 (97.541) Mem 9655MB
[2024-08-04 06:42:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.091) Loss 1.0488 (0.8455) Acc@1 76.074 (82.657) Acc@5 94.727 (96.233) Mem 9655MB
[2024-08-04 06:42:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.366 Acc@5 96.219
[2024-08-04 06:42:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.4%
[2024-08-04 06:42:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][0/625] eta 0:11:31 lr 0.000631 wd 0.0500 time 1.1066 (1.1066) data time 0.6811 (0.6811) model time 0.0000 (0.0000) loss 6.4804 (6.4804) grad_norm 2.7626 (2.7626) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:43:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][10/625] eta 0:03:25 lr 0.000631 wd 0.0500 time 0.2602 (0.3338) data time 0.0008 (0.0627) model time 0.0000 (0.0000) loss 4.9555 (5.8620) grad_norm 2.6712 (2.5528) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:43:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][20/625] eta 0:02:59 lr 0.000631 wd 0.0500 time 0.2579 (0.2968) data time 0.0008 (0.0333) model time 0.0000 (0.0000) loss 6.3182 (5.9172) grad_norm 2.1693 (2.5825) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:43:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][30/625] eta 0:02:49 lr 0.000631 wd 0.0500 time 0.2584 (0.2842) data time 0.0008 (0.0228) model time 0.0000 (0.0000) loss 5.0218 (5.9391) grad_norm 2.3654 (2.7233) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:43:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][40/625] eta 0:02:43 lr 0.000630 wd 0.0500 time 0.2577 (0.2803) data time 0.0009 (0.0175) model time 0.0000 (0.0000) loss 5.0698 (5.9448) grad_norm 2.4342 (2.6570) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:43:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][50/625] eta 0:02:38 lr 0.000630 wd 0.0500 time 0.2572 (0.2756) data time 0.0007 (0.0142) model time 0.0000 (0.0000) loss 6.6793 (5.8689) grad_norm 1.7681 (2.5599) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:43:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][60/625] eta 0:02:35 lr 0.000630 wd 0.0500 time 0.3993 (0.2746) data time 0.0008 (0.0121) model time 0.3986 (0.2683) loss 5.7245 (5.8560) grad_norm 1.8121 (2.5330) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:43:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][70/625] eta 0:02:32 lr 0.000630 wd 0.0500 time 0.2596 (0.2745) data time 0.0008 (0.0105) model time 0.2588 (0.2707) loss 6.4388 (5.8660) grad_norm 3.6211 (2.5014) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:43:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][80/625] eta 0:02:28 lr 0.000630 wd 0.0500 time 0.2544 (0.2721) data time 0.0009 (0.0093) model time 0.2535 (0.2652) loss 5.7375 (5.8524) grad_norm 1.8793 (2.5782) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:43:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][90/625] eta 0:02:24 lr 0.000630 wd 0.0500 time 0.2532 (0.2703) data time 0.0007 (0.0084) model time 0.2525 (0.2625) loss 6.6329 (5.8832) grad_norm 2.2264 (2.5348) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:43:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][100/625] eta 0:02:21 lr 0.000630 wd 0.0500 time 0.2597 (0.2702) data time 0.0008 (0.0077) model time 0.2589 (0.2637) loss 5.3664 (5.8771) grad_norm 1.7497 (2.4492) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:43:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][110/625] eta 0:02:19 lr 0.000629 wd 0.0500 time 0.2583 (0.2707) data time 0.0006 (0.0071) model time 0.2577 (0.2655) loss 5.0965 (5.8806) grad_norm 1.5684 (2.4041) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:43:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][120/625] eta 0:02:16 lr 0.000629 wd 0.0500 time 0.2579 (0.2695) data time 0.0006 (0.0066) model time 0.2573 (0.2640) loss 6.4574 (5.8973) grad_norm 2.1009 (2.3800) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:43:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][130/625] eta 0:02:12 lr 0.000629 wd 0.0500 time 0.2564 (0.2685) data time 0.0007 (0.0061) model time 0.2557 (0.2630) loss 6.3130 (5.8995) grad_norm 1.7869 (2.3763) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:43:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][140/625] eta 0:02:09 lr 0.000629 wd 0.0500 time 0.2571 (0.2676) data time 0.0009 (0.0057) model time 0.2562 (0.2621) loss 6.2214 (5.9085) grad_norm 6.3602 (2.3872) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:43:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][150/625] eta 0:02:07 lr 0.000629 wd 0.0500 time 0.2539 (0.2682) data time 0.0009 (0.0054) model time 0.2529 (0.2635) loss 5.9325 (5.9004) grad_norm 1.5676 (2.3978) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:43:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][160/625] eta 0:02:04 lr 0.000629 wd 0.0500 time 0.2526 (0.2674) data time 0.0009 (0.0052) model time 0.2517 (0.2627) loss 6.1275 (5.8986) grad_norm 2.5244 (2.4717) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:43:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][170/625] eta 0:02:02 lr 0.000628 wd 0.0500 time 0.2557 (0.2687) data time 0.0008 (0.0049) model time 0.2549 (0.2648) loss 6.5240 (5.9130) grad_norm 2.8303 (2.4849) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:43:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][180/625] eta 0:01:59 lr 0.000628 wd 0.0500 time 0.2532 (0.2687) data time 0.0009 (0.0047) model time 0.2523 (0.2650) loss 5.3985 (5.9037) grad_norm 1.3572 (2.4554) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:43:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][190/625] eta 0:01:56 lr 0.000628 wd 0.0500 time 0.2556 (0.2680) data time 0.0009 (0.0045) model time 0.2547 (0.2643) loss 5.0450 (5.8939) grad_norm 1.6751 (2.4223) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:43:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][200/625] eta 0:01:53 lr 0.000628 wd 0.0500 time 0.2538 (0.2674) data time 0.0007 (0.0043) model time 0.2531 (0.2637) loss 4.6042 (5.8908) grad_norm 1.5504 (2.4061) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:43:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][210/625] eta 0:01:50 lr 0.000628 wd 0.0500 time 0.2537 (0.2669) data time 0.0008 (0.0041) model time 0.2528 (0.2632) loss 6.0330 (5.8793) grad_norm 2.2364 (2.3881) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:43:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][220/625] eta 0:01:47 lr 0.000628 wd 0.0500 time 0.2538 (0.2665) data time 0.0009 (0.0040) model time 0.2528 (0.2628) loss 5.9997 (5.8930) grad_norm 2.5129 (2.3887) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:43:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][230/625] eta 0:01:45 lr 0.000627 wd 0.0500 time 0.2527 (0.2660) data time 0.0007 (0.0039) model time 0.2519 (0.2624) loss 6.2228 (5.8932) grad_norm 2.6548 (2.4046) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:44:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][240/625] eta 0:01:42 lr 0.000627 wd 0.0500 time 0.2570 (0.2656) data time 0.0007 (0.0037) model time 0.2562 (0.2620) loss 4.3163 (5.8808) grad_norm 1.9079 (2.3955) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:44:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][250/625] eta 0:01:39 lr 0.000627 wd 0.0500 time 0.2549 (0.2652) data time 0.0010 (0.0036) model time 0.2538 (0.2617) loss 5.3229 (5.8798) grad_norm 1.4612 (2.3758) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:44:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][260/625] eta 0:01:36 lr 0.000627 wd 0.0500 time 0.2555 (0.2649) data time 0.0013 (0.0035) model time 0.2542 (0.2613) loss 5.0478 (5.8791) grad_norm 1.5838 (2.3523) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:44:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][270/625] eta 0:01:33 lr 0.000627 wd 0.0500 time 0.2549 (0.2646) data time 0.0009 (0.0034) model time 0.2540 (0.2611) loss 5.1671 (5.8769) grad_norm 2.4176 (2.3316) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:44:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][280/625] eta 0:01:31 lr 0.000627 wd 0.0500 time 0.2604 (0.2643) data time 0.0008 (0.0033) model time 0.2596 (0.2608) loss 6.7067 (5.8796) grad_norm 2.5927 (2.3256) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:44:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][290/625] eta 0:01:28 lr 0.000626 wd 0.0500 time 0.2519 (0.2653) data time 0.0008 (0.0033) model time 0.2511 (0.2622) loss 5.9553 (5.8691) grad_norm 2.3194 (2.3351) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:44:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][300/625] eta 0:01:26 lr 0.000626 wd 0.0500 time 0.2570 (0.2650) data time 0.0009 (0.0032) model time 0.2561 (0.2618) loss 6.1703 (5.8680) grad_norm 1.5391 (2.3191) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:44:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][310/625] eta 0:01:23 lr 0.000626 wd 0.0500 time 0.2521 (0.2647) data time 0.0010 (0.0031) model time 0.2511 (0.2616) loss 6.4013 (5.8795) grad_norm 1.6352 (2.3138) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:44:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][320/625] eta 0:01:20 lr 0.000626 wd 0.0500 time 0.2560 (0.2644) data time 0.0007 (0.0030) model time 0.2553 (0.2613) loss 5.2799 (5.8843) grad_norm 2.3638 (2.3138) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:44:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][330/625] eta 0:01:17 lr 0.000626 wd 0.0500 time 0.2572 (0.2641) data time 0.0008 (0.0030) model time 0.2564 (0.2611) loss 5.7630 (5.8862) grad_norm 2.6048 (2.3124) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:44:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][340/625] eta 0:01:15 lr 0.000626 wd 0.0500 time 0.2575 (0.2643) data time 0.0009 (0.0029) model time 0.2566 (0.2614) loss 6.1571 (5.8921) grad_norm 2.1147 (2.3252) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:44:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][350/625] eta 0:01:12 lr 0.000625 wd 0.0500 time 0.2582 (0.2641) data time 0.0006 (0.0029) model time 0.2576 (0.2612) loss 5.0198 (5.8805) grad_norm 2.3972 (2.3279) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:44:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][360/625] eta 0:01:10 lr 0.000625 wd 0.0500 time 0.2515 (0.2642) data time 0.0010 (0.0028) model time 0.2505 (0.2614) loss 5.0444 (5.8907) grad_norm 1.4765 (2.3243) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:44:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][370/625] eta 0:01:07 lr 0.000625 wd 0.0500 time 0.2562 (0.2640) data time 0.0007 (0.0028) model time 0.2556 (0.2612) loss 6.0896 (5.8873) grad_norm 1.6629 (2.3108) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:44:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][380/625] eta 0:01:04 lr 0.000625 wd 0.0500 time 0.2545 (0.2638) data time 0.0009 (0.0027) model time 0.2535 (0.2610) loss 7.1227 (5.8950) grad_norm 4.6037 (2.3062) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:44:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][390/625] eta 0:01:01 lr 0.000625 wd 0.0500 time 0.2561 (0.2636) data time 0.0010 (0.0027) model time 0.2551 (0.2609) loss 6.5528 (5.8977) grad_norm 2.0324 (2.3011) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:44:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][400/625] eta 0:00:59 lr 0.000625 wd 0.0500 time 0.2665 (0.2634) data time 0.0007 (0.0026) model time 0.2657 (0.2607) loss 4.7814 (5.9002) grad_norm 1.5927 (2.2921) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:44:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][410/625] eta 0:00:56 lr 0.000624 wd 0.0500 time 0.4354 (0.2637) data time 0.0009 (0.0026) model time 0.4345 (0.2611) loss 6.5102 (5.9013) grad_norm 2.1409 (2.3097) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:44:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][420/625] eta 0:00:54 lr 0.000624 wd 0.0500 time 0.4553 (0.2649) data time 0.0007 (0.0025) model time 0.4546 (0.2625) loss 5.8667 (5.9068) grad_norm 1.3794 (2.3034) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:44:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][430/625] eta 0:00:51 lr 0.000624 wd 0.0500 time 0.2530 (0.2651) data time 0.0008 (0.0025) model time 0.2522 (0.2627) loss 5.2380 (5.9131) grad_norm 2.7293 (2.2964) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:44:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][440/625] eta 0:00:49 lr 0.000624 wd 0.0500 time 0.2664 (0.2649) data time 0.0009 (0.0025) model time 0.2655 (0.2625) loss 6.5705 (5.9163) grad_norm 2.3307 (2.2967) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:44:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][450/625] eta 0:00:46 lr 0.000624 wd 0.0500 time 0.2605 (0.2647) data time 0.0006 (0.0024) model time 0.2599 (0.2623) loss 4.9179 (5.9139) grad_norm 2.4274 (2.2880) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:44:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][460/625] eta 0:00:43 lr 0.000624 wd 0.0500 time 0.2776 (0.2645) data time 0.0008 (0.0024) model time 0.2768 (0.2622) loss 5.6348 (5.9073) grad_norm 1.7075 (2.2726) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:45:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][470/625] eta 0:00:40 lr 0.000623 wd 0.0500 time 0.2637 (0.2644) data time 0.0006 (0.0024) model time 0.2631 (0.2621) loss 4.8803 (5.9034) grad_norm 2.9078 (2.2708) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:45:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][480/625] eta 0:00:38 lr 0.000623 wd 0.0500 time 0.2574 (0.2642) data time 0.0006 (0.0023) model time 0.2568 (0.2619) loss 6.4479 (5.9019) grad_norm 1.9856 (2.2738) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:45:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][490/625] eta 0:00:35 lr 0.000623 wd 0.0500 time 0.2536 (0.2645) data time 0.0010 (0.0023) model time 0.2527 (0.2622) loss 6.1834 (5.9066) grad_norm 4.4437 (2.2775) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:45:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][500/625] eta 0:00:33 lr 0.000623 wd 0.0500 time 0.2565 (0.2643) data time 0.0009 (0.0023) model time 0.2556 (0.2621) loss 6.6942 (5.9134) grad_norm 2.7228 (2.2827) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:45:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][510/625] eta 0:00:30 lr 0.000623 wd 0.0500 time 0.2600 (0.2646) data time 0.0008 (0.0023) model time 0.2592 (0.2624) loss 6.3307 (5.9121) grad_norm 2.5753 (2.2975) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:45:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][520/625] eta 0:00:27 lr 0.000623 wd 0.0500 time 0.2561 (0.2647) data time 0.0008 (0.0022) model time 0.2553 (0.2626) loss 5.0136 (5.9088) grad_norm 2.5375 (2.3074) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:45:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][530/625] eta 0:00:25 lr 0.000622 wd 0.0500 time 0.2554 (0.2645) data time 0.0015 (0.0022) model time 0.2539 (0.2624) loss 6.6924 (5.9127) grad_norm 3.4974 (2.3150) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:45:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][540/625] eta 0:00:22 lr 0.000622 wd 0.0500 time 0.2567 (0.2647) data time 0.0008 (0.0022) model time 0.2559 (0.2627) loss 6.5603 (5.9200) grad_norm 2.3511 (2.3197) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:45:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][550/625] eta 0:00:19 lr 0.000622 wd 0.0500 time 0.2563 (0.2649) data time 0.0007 (0.0022) model time 0.2556 (0.2629) loss 4.8888 (5.9196) grad_norm 4.6789 (2.3250) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:45:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][560/625] eta 0:00:17 lr 0.000622 wd 0.0500 time 0.2550 (0.2648) data time 0.0007 (0.0022) model time 0.2543 (0.2627) loss 6.5851 (5.9202) grad_norm 2.3304 (2.3338) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:45:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][570/625] eta 0:00:14 lr 0.000622 wd 0.0500 time 0.2585 (0.2646) data time 0.0007 (0.0021) model time 0.2578 (0.2626) loss 6.1554 (5.9235) grad_norm 2.4097 (2.3417) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:45:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][580/625] eta 0:00:11 lr 0.000622 wd 0.0500 time 0.2549 (0.2645) data time 0.0013 (0.0021) model time 0.2536 (0.2625) loss 5.6113 (5.9236) grad_norm 2.6402 (2.3404) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:45:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][590/625] eta 0:00:09 lr 0.000621 wd 0.0500 time 0.2576 (0.2644) data time 0.0006 (0.0021) model time 0.2569 
(0.2623) loss 5.9216 (5.9243) grad_norm 2.4970 (2.3360) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:45:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][600/625] eta 0:00:06 lr 0.000621 wd 0.0500 time 0.2554 (0.2642) data time 0.0011 (0.0021) model time 0.2543 (0.2622) loss 6.4610 (5.9278) grad_norm 1.7495 (2.3316) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:45:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][610/625] eta 0:00:03 lr 0.000621 wd 0.0500 time 0.2530 (0.2641) data time 0.0006 (0.0021) model time 0.2524 (0.2621) loss 6.7528 (5.9320) grad_norm 2.3861 (2.3271) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:45:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][620/625] eta 0:00:01 lr 0.000621 wd 0.0500 time 0.2594 (0.2639) data time 0.0005 (0.0020) model time 0.2589 (0.2619) loss 5.6728 (5.9246) grad_norm 1.8075 (2.3200) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 06:45:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 195 training takes 0:02:44 [2024-08-04 06:45:42 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 06:45:43 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 06:45:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.478 (0.478) Loss 0.6509 (0.6509) Acc@1 89.209 (89.209) Acc@5 98.438 (98.438) Mem 9655MB
[2024-08-04 06:45:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 0.9927 (0.7876) Acc@1 79.492 (85.387) Acc@5 95.752 (97.368) Mem 9655MB
[2024-08-04 06:45:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.1289 (0.9164) Acc@1 75.244 (82.031) Acc@5 94.092 (96.057) Mem 9655MB
[2024-08-04 06:45:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.768 Acc@5 96.083
[2024-08-04 06:45:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.8%
[2024-08-04 06:45:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.77%
[2024-08-04 06:45:44 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-08-04 06:45:45 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-08-04 06:45:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.487 (0.487) Loss 0.5815 (0.5815) Acc@1 89.844 (89.844) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 06:45:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 0.9229 (0.7191) Acc@1 80.176 (86.128) Acc@5 95.801 (97.545) Mem 9655MB
[2024-08-04 06:45:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0479 (0.8451) Acc@1 75.879 (82.657) Acc@5 94.727 (96.245) Mem 9655MB
[2024-08-04 06:45:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.372 Acc@5 96.229
[2024-08-04 06:45:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.4%
[2024-08-04 06:45:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][0/625] eta 0:11:37 lr 0.000621 wd 0.0500 time 1.1163 (1.1163) data time 0.6283 (0.6283) model time 0.0000 (0.0000) loss 6.0145 (6.0145) grad_norm 1.9860 (1.9860) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:45:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][10/625] eta 0:03:25 lr 0.000621 wd 0.0500 time 0.2554 (0.3338) data time 0.0006 (0.0579) model time 0.0000 (0.0000) loss 6.1897 (6.3481) grad_norm 1.5227 (1.8861) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:45:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][20/625] eta 0:02:59 lr 0.000621 wd 0.0500 time 0.2569 (0.2971) data time 0.0008 (0.0308) model time 0.0000 (0.0000) loss 5.7323 (6.1359) grad_norm 3.5984 (2.0296) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:45:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][30/625] eta 0:02:52 lr 0.000620 wd 0.0500 time 0.2480 (0.2898) data time 0.0009 (0.0212) model time 0.0000 (0.0000) loss 6.4758 (6.0370) grad_norm 2.3942 (2.0249) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:45:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][40/625] eta 0:02:44 lr 0.000620 wd 0.0500 time 0.2649 (0.2817) data time 0.0006 (0.0162) model time 0.0000 (0.0000) loss 6.4847 (5.9969) grad_norm 2.2166 (2.0453) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:46:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][50/625] eta 0:02:39 lr 0.000620 wd 0.0500 time 0.2525 (0.2766) data time 0.0007 (0.0132) model time 0.0000 (0.0000) loss 6.5168 (5.9664) grad_norm 1.5348 (2.0435) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:46:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][60/625] eta 0:02:34 lr 0.000620 wd 0.0500 time 0.2528 (0.2732) data time 0.0015 (0.0112) model time 0.2514 (0.2546) loss 5.1666 (5.9051) grad_norm 1.5446 (2.0215) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:46:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][70/625] eta 0:02:30 lr 0.000620 wd 0.0500 time 0.2489 (0.2708) data time 0.0011 (0.0098) model time 0.2477 (0.2550) loss 6.2217 (5.9344) grad_norm 1.6213 (1.9738) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:46:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][80/625] eta 0:02:26 lr 0.000620 wd 0.0500 time 0.2575 (0.2691) data time 0.0012 (0.0087) model time 0.2563 (0.2553) loss 6.7835 (5.9332) grad_norm 1.6859 (1.9486) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:46:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][90/625] eta 0:02:23 lr 0.000619 wd 0.0500 time 0.2569 (0.2676) data time 0.0007 (0.0078) model time 0.2562 (0.2551) loss 5.4852 (5.9068) grad_norm 1.8858 (1.9497) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:46:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][100/625] eta 0:02:19 lr 0.000619 wd 0.0500 time 0.2531 (0.2664) data time 0.0009 (0.0072) model time 0.2521 (0.2551) loss 5.4309 (5.9038) grad_norm 2.0516 (1.9562) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:46:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][110/625] eta 0:02:16 lr 0.000619 wd 0.0500 time 0.2588 (0.2656) data time 0.0013 (0.0066) model time 0.2574 (0.2553) loss 5.5925 (5.8773) grad_norm 1.7786 (1.9493) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:46:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][120/625] eta 0:02:13 lr 0.000619 wd 0.0500 time 0.2551 (0.2648) data time 0.0010 (0.0061) model time 0.2541 (0.2552) loss 5.3988 (5.8704) grad_norm 1.8813 (1.9705) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:46:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][130/625] eta 0:02:10 lr 0.000619 wd 0.0500 time 0.2543 (0.2641) data time 0.0009 (0.0057) model time 0.2534 (0.2552) loss 5.7580 (5.8524) grad_norm 1.9277 (1.9562) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:46:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][140/625] eta 0:02:07 lr 0.000619 wd 0.0500 time 0.2599 (0.2636) data time 0.0007 (0.0054) model time 0.2593 (0.2553) loss 5.9707 (5.8718) grad_norm 1.8347 (1.9422) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:46:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][150/625] eta 0:02:05 lr 0.000618 wd 0.0500 time 0.2532 (0.2638) data time 0.0007 (0.0051) model time 0.2524 (0.2562) loss 6.9201 (5.8688) grad_norm 2.4726 (1.9538) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:46:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][160/625] eta 0:02:02 lr 0.000618 wd 0.0500 time 0.2563 (0.2633) data time 0.0010 (0.0048) model time 0.2553 (0.2561) loss 5.1941 (5.8929) grad_norm 1.6734 (1.9509) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:46:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][170/625] eta 0:01:59 lr 0.000618 wd 0.0500 time 0.2554 (0.2628) data time 0.0006 (0.0046) model time 0.2547 (0.2560) loss 6.0763 (5.8844) grad_norm 1.2762 (1.9390) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:46:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][180/625] eta 0:01:56 lr 0.000618 wd 0.0500 time 0.2576 (0.2624) data time 0.0011 (0.0044) model time 0.2565 (0.2559) loss 5.3170 (5.8713) grad_norm 3.7774 (1.9524) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:46:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][190/625] eta 0:01:53 lr 0.000618 wd 0.0500 time 0.2545 (0.2621) data time 0.0009 (0.0042) model time 0.2536 (0.2558) loss 5.4468 (5.8675) grad_norm 2.0068 (1.9713) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:46:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][200/625] eta 0:01:51 lr 0.000618 wd 0.0500 time 0.2554 (0.2618) data time 0.0008 (0.0041) model time 0.2546 (0.2558) loss 4.9484 (5.8675) grad_norm 2.2366 (1.9995) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:46:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][210/625] eta 0:01:48 lr 0.000617 wd 0.0500 time 0.2531 (0.2615) data time 0.0013 (0.0039) model time 0.2518 (0.2557) loss 5.0370 (5.8644) grad_norm 1.6346 (2.0422) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:46:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][220/625] eta 0:01:45 lr 0.000617 wd 0.0500 time 0.2582 (0.2612) data time 0.0006 (0.0038) model time 0.2576 (0.2557) loss 4.8467 (5.8559) grad_norm 2.0140 (2.0763) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:46:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][230/625] eta 0:01:43 lr 0.000617 wd 0.0500 time 0.2557 (0.2610) data time 0.0008 (0.0037) model time 0.2549 (0.2556) loss 5.1735 (5.8711) grad_norm 1.5802 (2.0744) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:46:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][240/625] eta 0:01:40 lr 0.000617 wd 0.0500 time 0.2543 (0.2617) data time 0.0007 (0.0035) model time 0.2536 (0.2567) loss 6.7164 (5.8860) grad_norm 1.3779 (2.0687) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:46:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][250/625] eta 0:01:38 lr 0.000617 wd 0.0500 time 0.2562 (0.2614) data time 0.0008 (0.0034) model time 0.2553 (0.2566) loss 6.6902 (5.8752) grad_norm 3.2721 (2.0816) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:46:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][260/625] eta 0:01:35 lr 0.000617 wd 0.0500 time 0.2657 (0.2613) data time 0.0009 (0.0033) model time 0.2648 (0.2566) loss 6.3075 (5.8723) grad_norm 3.9116 (2.0919) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:46:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][270/625] eta 0:01:32 lr 0.000616 wd 0.0500 time 0.2523 (0.2611) data time 0.0009 (0.0033) model time 0.2515 (0.2566) loss 5.5494 (5.8810) grad_norm 1.5466 (2.0981) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:47:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][280/625] eta 0:01:30 lr 0.000616 wd 0.0500 time 0.2559 (0.2609) data time 0.0009 (0.0032) model time 0.2550 (0.2565) loss 7.0806 (5.9010) grad_norm 3.2732 (2.1145) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:47:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][290/625] eta 0:01:27 lr 0.000616 wd 0.0500 time 0.2558 (0.2608) data time 0.0009 (0.0031) model time 0.2549 (0.2565) loss 5.1849 (5.9022) grad_norm 3.1321 (2.1196) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:47:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][300/625] eta 0:01:24 lr 0.000616 wd 0.0500 time 0.2599 (0.2606) data time 0.0008 (0.0030) model time 0.2591 (0.2564) loss 5.4807 (5.9010) grad_norm 3.5870 (2.1834) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:47:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][310/625] eta 0:01:22 lr 0.000616 wd 0.0500 time 0.2605 (0.2605) data time 0.0007 (0.0030) model time 0.2598 (0.2563) loss 5.5485 (5.8886) grad_norm 2.2713 (2.2202) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:47:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][320/625] eta 0:01:19 lr 0.000616 wd 0.0500 time 0.2536 (0.2603) data time 0.0007 (0.0029) model time 0.2529 (0.2563) loss 6.5421 (5.8832) grad_norm 1.8421 (2.2407) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:47:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][330/625] eta 0:01:16 lr 0.000616 wd 0.0500 time 0.2554 (0.2602) data time 0.0010 (0.0028) model time 0.2544 (0.2563) loss 5.7828 (5.8761) grad_norm 2.6386 (2.2527) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:47:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][340/625] eta 0:01:14 lr 0.000615 wd 0.0500 time 0.2528 (0.2601) data time 0.0009 (0.0028) model time 0.2519 (0.2562) loss 6.4282 (5.8763) grad_norm 1.7006 (2.2609) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:47:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][350/625] eta 0:01:11 lr 0.000615 wd 0.0500 time 0.2522 (0.2611) data time 0.0007 (0.0027) model time 0.2515 (0.2576) loss 6.2983 (5.8743) grad_norm 2.5220 (2.2793) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:47:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][360/625] eta 0:01:09 lr 0.000615 wd 0.0500 time 0.2546 (0.2610) data time 0.0009 (0.0027) model time 0.2537 (0.2575) loss 6.1290 (5.8769) grad_norm 3.6136 (2.2900) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:47:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][370/625] eta 0:01:06 lr 0.000615 wd 0.0500 time 0.2560 (0.2615) data time 0.0011 (0.0026) model time 0.2549 (0.2581) loss 6.2346 (5.8740) grad_norm 2.9428 (2.2903) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:47:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][380/625] eta 0:01:04 lr 0.000615 wd 0.0500 time 0.2536 (0.2613) data time 0.0011 (0.0026) model time 0.2524 (0.2580) loss 6.3666 (5.8780) grad_norm 3.6250 (2.2873) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:47:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][390/625] eta 0:01:01 lr 0.000615 wd 0.0500 time 0.2554 (0.2617) data time 0.0009 (0.0025) model time 0.2545 (0.2585) loss 5.9217 (5.8694) grad_norm 3.1388 (2.2900) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:47:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][400/625] eta 0:00:58 lr 0.000614 wd 0.0500 time 0.2517 (0.2615) data time 0.0008 (0.0025) model time 0.2509 (0.2584) loss 6.8451 (5.8732) grad_norm 1.8172 (2.2830) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:47:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][410/625] eta 0:00:56 lr 0.000614 wd 0.0500 time 0.2520 (0.2614) data time 0.0008 (0.0025) model time 0.2512 (0.2583) loss 6.7173 (5.8745) grad_norm 1.6087 (2.2738) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:47:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][420/625] eta 0:00:53 lr 0.000614 wd 0.0500 time 0.2629 (0.2622) data time 0.0007 (0.0024) model time 0.2623 (0.2593) loss 5.7491 (5.8731) grad_norm 2.3171 (2.2807) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:47:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][430/625] eta 0:00:51 lr 0.000614 wd 0.0500 time 0.2540 (0.2626) data time 0.0008 (0.0024) model time 0.2531 (0.2597) loss 6.7067 (5.8828) grad_norm 2.6992 (2.2758) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:47:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][440/625] eta 0:00:48 lr 0.000614 wd 0.0500 time 0.2534 (0.2629) data time 0.0007 (0.0024) model time 0.2528 (0.2601) loss 5.2886 (5.8817) grad_norm 1.7537 (2.2691) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:47:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][450/625] eta 0:00:45 lr 0.000614 wd 0.0500 time 0.2491 (0.2627) data time 0.0009 (0.0023) model time 0.2482 (0.2600) loss 5.8619 (5.8797) grad_norm 3.6451 (2.2920) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:47:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][460/625] eta 0:00:43 lr 0.000613 wd 0.0500 time 0.2524 (0.2626) data time 0.0008 (0.0023) model time 0.2516 (0.2599) loss 5.0461 (5.8802) grad_norm 2.2853 (2.3109) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:47:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][470/625] eta 0:00:40 lr 0.000613 wd 0.0500 time 0.2531 (0.2630) data time 0.0007 (0.0023) model time 0.2524 (0.2604) loss 5.0784 (5.8747) grad_norm 1.9550 (2.3305) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:47:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][480/625] eta 0:00:38 lr 0.000613 wd 0.0500 time 0.2565 (0.2629) data time 0.0007 (0.0022) model time 0.2558 (0.2603) loss 5.9846 (5.8678) grad_norm 2.3123 (2.3394) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:47:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][490/625] eta 0:00:35 lr 0.000613 wd 0.0500 time 0.2570 (0.2627) data time 0.0009 (0.0022) model time 0.2561 (0.2602) loss 5.3154 (5.8729) grad_norm 1.5382 (2.3413) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:47:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][500/625] eta 0:00:32 lr 0.000613 wd 0.0500 time 0.2563 (0.2626) data time 0.0012 (0.0022) model time 0.2551 (0.2601) loss 5.8710 (5.8672) grad_norm 1.9064 (2.3509) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:48:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][510/625] eta 0:00:30 lr 0.000613 wd 0.0500 time 0.2563 (0.2625) data time 0.0007 (0.0022) model time 0.2556 (0.2600) loss 5.4928 (5.8669) grad_norm 1.4182 (2.3523) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:48:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][520/625] eta 0:00:27 lr 0.000612 wd 0.0500 time 0.2572 (0.2623) data time 0.0008 (0.0021) model time 0.2564 (0.2599) loss 5.8852 (5.8644) grad_norm 3.5733 (2.3631) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:48:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][530/625] eta 0:00:24 lr 0.000612 wd 0.0500 time 0.2546 (0.2625) data time 0.0007 (0.0021) model time 0.2539 (0.2601) loss 5.7616 (5.8579) grad_norm 3.0527 (2.3667) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:48:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][540/625] eta 0:00:22 lr 0.000612 wd 0.0500 time 0.2524 (0.2624) data time 0.0009 (0.0021) model time 0.2515 (0.2600) loss 5.5480 (5.8616) grad_norm 2.0177 (2.3773) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:48:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][550/625] eta 0:00:19 lr 0.000612 wd 0.0500 time 0.2604 (0.2627) data time 0.0006 (0.0021) model time 0.2598 (0.2603) loss 6.4933 (5.8638) grad_norm 2.4380 (2.3753) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:48:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][560/625] eta 0:00:17 lr 0.000612 wd 0.0500 time 0.2588 (0.2629) data time 0.0008 (0.0021) model time 0.2580 (0.2606) loss 6.1288 (5.8710) grad_norm 2.3962 (2.3769) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:48:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][570/625] eta 0:00:14 lr 0.000612 wd 0.0500 time 0.2539 (0.2631) data time 0.0009 (0.0020) model time 0.2531 (0.2608) loss 6.2255 (5.8806) grad_norm 2.7108 (2.3811) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:48:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][580/625] eta 0:00:11 lr 0.000611 wd 0.0500 time 0.2538 (0.2635) data time 0.0009 (0.0020) model time 0.2530 (0.2612) loss 5.7790 (5.8865) grad_norm 1.6891 (2.3752) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:48:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][590/625] eta 0:00:09 lr 0.000611 wd 0.0500 time 0.2544 (0.2633) data time 0.0007 (0.0020) model time 0.2538 (0.2611) loss 6.2251 (5.8927) grad_norm 2.1016 (2.3713) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:48:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][600/625] eta 0:00:06 lr 0.000611 wd 0.0500 time 0.2550 (0.2632) data time 0.0011 (0.0020) model time 0.2539 (0.2610) loss 5.7531 (5.8935) grad_norm 1.7011 (2.3617) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:48:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][610/625] eta 0:00:03 lr 0.000611 wd 0.0500 time 0.2515 (0.2631) data time 0.0004 (0.0020) model time 0.2511 (0.2609) loss 6.4524 (5.8922) grad_norm 1.4784 (2.3591) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:48:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][620/625] eta 0:00:01 lr 0.000611 wd 0.0500 time 0.2536 (0.2629) data time 0.0004 (0.0020) model time 0.2532 (0.2607) loss 5.3666 (5.8880) grad_norm 2.2785 (2.3544) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:48:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 196 training takes 0:02:44
[2024-08-04 06:48:31 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 06:48:31 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 06:48:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.556 (0.556) Loss 0.6147 (0.6147) Acc@1 89.648 (89.648) Acc@5 98.389 (98.389) Mem 9655MB
[2024-08-04 06:48:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.102) Loss 0.9771 (0.7555) Acc@1 79.248 (85.529) Acc@5 95.361 (97.408) Mem 9655MB
[2024-08-04 06:48:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.0967 (0.8870) Acc@1 75.977 (82.099) Acc@5 94.141 (95.957) Mem 9655MB
[2024-08-04 06:48:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.814 Acc@5 95.965
[2024-08-04 06:48:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.8%
[2024-08-04 06:48:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.81%
[2024-08-04 06:48:33 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-08-04 06:48:34 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-08-04 06:48:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.604 (0.604) Loss 0.5811 (0.5811) Acc@1 89.844 (89.844) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 06:48:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.107) Loss 0.9229 (0.7186) Acc@1 80.322 (86.164) Acc@5 95.850 (97.541) Mem 9655MB
[2024-08-04 06:48:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.082) Loss 1.0459 (0.8444) Acc@1 75.684 (82.685) Acc@5 94.727 (96.257) Mem 9655MB
[2024-08-04 06:48:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.394 Acc@5 96.239
[2024-08-04 06:48:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.4%
[2024-08-04 06:48:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.39%
[2024-08-04 06:48:36 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 06:48:36 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 06:48:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][0/625] eta 0:07:16 lr 0.000611 wd 0.0500 time 0.6981 (0.6981) data time 0.4542 (0.4542) model time 0.0000 (0.0000) loss 5.3954 (5.3954) grad_norm 1.5460 (1.5460) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:48:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][10/625] eta 0:03:01 lr 0.000611 wd 0.0500 time 0.2574 (0.2959) data time 0.0006 (0.0421) model time 0.0000 (0.0000) loss 7.1747 (5.7505) grad_norm 2.5644 (2.4813) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:48:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][20/625] eta 0:02:47 lr 0.000610 wd 0.0500 time 0.2550 (0.2767) data time 0.0008 (0.0225) model time 0.0000 (0.0000) loss 5.9909 (5.8257) grad_norm 2.5690 (2.3002) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:48:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][30/625] eta 0:02:40 lr 0.000610 wd 0.0500 time 0.2521 (0.2700) data time 0.0008 (0.0155) model time 0.0000 (0.0000) loss 6.2586 (5.7857) grad_norm 1.5894 (2.1582) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:48:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][40/625] eta 0:02:35 lr 0.000610 wd 0.0500 time 0.2574 (0.2663) data time 0.0008 (0.0120) model time 0.0000 (0.0000) loss 6.1957 (5.8253) grad_norm 4.3057 (2.2648) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:48:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][50/625] eta 0:02:31 lr 0.000610 wd 0.0500 time 0.2530 (0.2642) data time 0.0007 (0.0098) model time 0.0000 (0.0000) loss 4.8953 (5.8898) grad_norm 1.9131 (2.2090) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:48:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][60/625] eta 0:02:28 lr 0.000610 wd 0.0500 time 0.2585 (0.2628) data time 0.0009 (0.0084) model time 0.2575 (0.2549) loss 4.9997 (5.8682) grad_norm 2.5148 (2.1541) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:48:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][70/625] eta 0:02:25 lr 0.000610 wd 0.0500 time 0.2569 (0.2618) data time 0.0009 (0.0073) model time 0.2559 (0.2549) loss 6.6748 (5.8472) grad_norm 4.3445 (2.1988) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:48:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][80/625] eta 0:02:22 lr 0.000609 wd 0.0500 time 0.2648 (0.2611) data time 0.0005 (0.0065) model time 0.2643 (0.2549) loss 6.4442 (5.8616) grad_norm 1.4888 (2.2174) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:49:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][90/625] eta 0:02:19 lr 0.000609 wd 0.0500 time 0.2675 (0.2608) data time 0.0006 (0.0059) model time 0.2669 (0.2556) loss 6.4692 (5.8745) grad_norm 1.9478 (2.2319) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:49:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][100/625] eta 0:02:17 lr 0.000609 wd 0.0500 time 0.2567 (0.2620) data time 0.0017 (0.0054) model time 0.2550 (0.2588) loss 5.3612 (5.8622) grad_norm 1.4689 (2.2611) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 06:49:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][110/625] eta 0:02:16 lr 0.000609 wd 0.0500 time 0.2587 (0.2641) data time 0.0009 (0.0050) model time 0.2578 (0.2631) loss 6.4746 (5.8800) grad_norm 6.3722 (inf) loss_scale 512.0000 (982.4865) mem 9655MB
[2024-08-04 06:49:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][120/625] eta 0:02:13 lr 0.000609 wd 0.0500 time 0.2694 (0.2636) data time 0.0009 (0.0047) model time 0.2685 (0.2622) loss 6.0461 (5.9047) grad_norm 1.9877 (inf) loss_scale 512.0000 (943.6033) mem 9655MB
[2024-08-04 06:49:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][130/625] eta 0:02:10 lr 0.000609 wd 0.0500 time 0.2568 (0.2631) data time 0.0010 (0.0044) model time 0.2558 (0.2614) loss 7.0199 (5.9204) grad_norm 2.0216 (inf) loss_scale 512.0000 (910.6565) mem 9655MB
[2024-08-04 06:49:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][140/625] eta 0:02:07 lr 0.000608 wd 0.0500 time 0.2561 (0.2625) data time 0.0010 (0.0042) model time 0.2551 (0.2607) loss 6.4721 (5.9291) grad_norm 1.7241 (inf) loss_scale 512.0000 (882.3830) mem 9655MB
[2024-08-04 06:49:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][150/625] eta 0:02:04 lr 0.000608 wd 0.0500 time 0.2529 (0.2621) data time 0.0007 (0.0039) model time 0.2523 (0.2600) loss 5.6439 (5.9304) grad_norm 1.4278 (inf) loss_scale 512.0000 (857.8543) mem 9655MB
[2024-08-04 06:49:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][160/625] eta 0:02:01 lr 0.000608 wd 0.0500 time 0.2575 (0.2617) data time 0.0008 (0.0038) model time 0.2568 (0.2596) loss 6.5395 (5.9328) grad_norm 2.5225 (inf) loss_scale 512.0000 (836.3727) mem 9655MB
[2024-08-04 06:49:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][170/625] eta 0:01:58 lr 0.000608 wd 0.0500 time 0.2545 (0.2614) data time 0.0009 (0.0036) model time 0.2536 (0.2593) loss 6.1712 (5.9341) grad_norm 2.2826 (inf) loss_scale 512.0000 (817.4035) mem 9655MB
[2024-08-04 06:49:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][180/625] eta 0:01:56 lr 0.000608 wd 0.0500 time 0.2636 (0.2622) data time 0.0006 (0.0034) model time 0.2630 (0.2604) loss 6.3558 (5.9492) grad_norm 1.5565 (inf) loss_scale 512.0000 (800.5304) mem 9655MB
[2024-08-04 06:49:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][190/625] eta 0:01:53 lr 0.000608 wd 0.0500 time 0.2553 (0.2618) data time 0.0009 (0.0033) model time 0.2544 (0.2600) loss 5.0901 (5.9493) grad_norm 3.0262 (inf) loss_scale 512.0000 (785.4241) mem 9655MB
[2024-08-04 06:49:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][200/625] eta 0:01:51 lr 0.000607 wd 0.0500 time 0.2542 (0.2615) data time 0.0008 (0.0032) model time 0.2534 (0.2596) loss 6.3109 (5.9327) grad_norm 1.9284 (inf) loss_scale 512.0000 (771.8209) mem 9655MB
[2024-08-04 06:49:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][210/625] eta 0:01:49 lr 0.000607 wd 0.0500 time 0.2604 (0.2630) data time 0.0005 (0.0031) model time 0.2599 (0.2617) loss 5.4977 (5.9082) grad_norm 1.5512 (inf) loss_scale 512.0000 (759.5071) mem 9655MB
[2024-08-04 06:49:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][220/625] eta 0:01:46 lr 0.000607 wd 0.0500 time 0.2573 (0.2627) data time 0.0007 (0.0030) model time 0.2566 (0.2614) loss 5.5794 (5.8950) grad_norm 1.3217 (inf) loss_scale 512.0000 (748.3077) mem 9655MB
[2024-08-04 06:49:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][230/625] eta 0:01:43 lr 0.000607 wd 0.0500 time 0.2587 (0.2624) data time 0.0006 (0.0029) model time 0.2581 (0.2610) loss 6.5645 (5.9126) grad_norm 2.3237 (inf) loss_scale 512.0000 (738.0779) mem 9655MB
[2024-08-04 06:49:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][240/625] eta 0:01:40 lr 0.000607 wd 0.0500 time 0.2521 (0.2621) data time 0.0010 (0.0028) model time 0.2511 (0.2607) loss 6.3613 (5.9189) grad_norm 1.9317 (inf) loss_scale 512.0000 (728.6971) mem 9655MB
[2024-08-04 06:49:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][250/625] eta 0:01:38 lr 0.000607 wd 0.0500 time 0.2571 (0.2619) data time 0.0011 (0.0027) model time 0.2560 (0.2604) loss 6.3428 (5.9172) grad_norm 1.6797 (inf) loss_scale 512.0000 (720.0637) mem 9655MB
[2024-08-04 06:49:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[197/300][260/625] eta 0:01:35 lr 0.000606 wd 0.0500 time 0.2556 (0.2617) data time 0.0008 (0.0027) model time 0.2548 (0.2601) loss 6.0372 (5.9119) grad_norm 2.1749 (inf) loss_scale 512.0000 (712.0920) mem 9655MB [2024-08-04 06:49:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][270/625] eta 0:01:32 lr 0.000606 wd 0.0500 time 0.2539 (0.2615) data time 0.0012 (0.0026) model time 0.2526 (0.2599) loss 6.0013 (5.9023) grad_norm 1.4297 (inf) loss_scale 512.0000 (704.7085) mem 9655MB [2024-08-04 06:49:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][280/625] eta 0:01:30 lr 0.000606 wd 0.0500 time 0.2554 (0.2613) data time 0.0011 (0.0026) model time 0.2542 (0.2597) loss 5.9888 (5.9006) grad_norm 2.7075 (inf) loss_scale 512.0000 (697.8505) mem 9655MB [2024-08-04 06:49:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][290/625] eta 0:01:27 lr 0.000606 wd 0.0500 time 0.2568 (0.2611) data time 0.0006 (0.0025) model time 0.2561 (0.2595) loss 5.1276 (5.8899) grad_norm 2.5338 (inf) loss_scale 512.0000 (691.4639) mem 9655MB [2024-08-04 06:49:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][300/625] eta 0:01:24 lr 0.000606 wd 0.0500 time 0.2578 (0.2615) data time 0.0006 (0.0024) model time 0.2571 (0.2600) loss 5.7782 (5.8837) grad_norm 2.1346 (inf) loss_scale 512.0000 (685.5017) mem 9655MB [2024-08-04 06:49:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][310/625] eta 0:01:22 lr 0.000606 wd 0.0500 time 0.2559 (0.2613) data time 0.0007 (0.0024) model time 0.2552 (0.2598) loss 6.3765 (5.8870) grad_norm 2.5153 (inf) loss_scale 512.0000 (679.9228) mem 9655MB [2024-08-04 06:50:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][320/625] eta 0:01:19 lr 0.000606 wd 0.0500 time 0.2599 (0.2612) data time 0.0008 (0.0024) model time 0.2592 (0.2597) loss 4.9761 (5.8832) grad_norm 1.9959 (inf) loss_scale 
512.0000 (674.6916) mem 9655MB [2024-08-04 06:50:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][330/625] eta 0:01:17 lr 0.000605 wd 0.0500 time 0.2521 (0.2610) data time 0.0010 (0.0023) model time 0.2511 (0.2595) loss 6.7304 (5.8958) grad_norm 2.1038 (inf) loss_scale 512.0000 (669.7764) mem 9655MB [2024-08-04 06:50:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][340/625] eta 0:01:14 lr 0.000605 wd 0.0500 time 0.2534 (0.2609) data time 0.0007 (0.0023) model time 0.2526 (0.2594) loss 5.5755 (5.8950) grad_norm 1.6700 (inf) loss_scale 512.0000 (665.1496) mem 9655MB [2024-08-04 06:50:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][350/625] eta 0:01:11 lr 0.000605 wd 0.0500 time 0.2385 (0.2612) data time 0.0007 (0.0022) model time 0.2378 (0.2598) loss 6.1268 (5.8853) grad_norm 4.3671 (inf) loss_scale 512.0000 (660.7863) mem 9655MB [2024-08-04 06:50:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][360/625] eta 0:01:09 lr 0.000605 wd 0.0500 time 0.2563 (0.2611) data time 0.0008 (0.0022) model time 0.2555 (0.2596) loss 4.7709 (5.8823) grad_norm 1.3604 (inf) loss_scale 512.0000 (656.6648) mem 9655MB [2024-08-04 06:50:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][370/625] eta 0:01:06 lr 0.000605 wd 0.0500 time 0.2523 (0.2610) data time 0.0008 (0.0022) model time 0.2515 (0.2595) loss 6.2461 (5.8808) grad_norm 3.1885 (inf) loss_scale 512.0000 (652.7655) mem 9655MB [2024-08-04 06:50:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][380/625] eta 0:01:03 lr 0.000605 wd 0.0500 time 0.2533 (0.2609) data time 0.0009 (0.0021) model time 0.2524 (0.2594) loss 5.4769 (5.8865) grad_norm 3.1061 (inf) loss_scale 512.0000 (649.0709) mem 9655MB [2024-08-04 06:50:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][390/625] eta 0:01:01 lr 0.000604 wd 0.0500 time 0.2538 
(0.2608) data time 0.0012 (0.0021) model time 0.2526 (0.2593) loss 5.7581 (5.8867) grad_norm 3.6884 (inf) loss_scale 512.0000 (645.5652) mem 9655MB [2024-08-04 06:50:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][400/625] eta 0:00:58 lr 0.000604 wd 0.0500 time 0.2537 (0.2610) data time 0.0007 (0.0021) model time 0.2530 (0.2595) loss 4.4170 (5.8803) grad_norm 2.8048 (inf) loss_scale 512.0000 (642.2344) mem 9655MB [2024-08-04 06:50:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][410/625] eta 0:00:56 lr 0.000604 wd 0.0500 time 0.4366 (0.2613) data time 0.0013 (0.0020) model time 0.4353 (0.2599) loss 6.4876 (5.8875) grad_norm 1.2127 (inf) loss_scale 512.0000 (639.0657) mem 9655MB [2024-08-04 06:50:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][420/625] eta 0:00:53 lr 0.000604 wd 0.0500 time 0.2502 (0.2612) data time 0.0010 (0.0020) model time 0.2492 (0.2598) loss 5.7146 (5.8853) grad_norm 3.2726 (inf) loss_scale 512.0000 (636.0475) mem 9655MB [2024-08-04 06:50:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][430/625] eta 0:00:50 lr 0.000604 wd 0.0500 time 0.2539 (0.2610) data time 0.0008 (0.0020) model time 0.2532 (0.2597) loss 5.3729 (5.8848) grad_norm 1.7921 (inf) loss_scale 512.0000 (633.1694) mem 9655MB [2024-08-04 06:50:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][440/625] eta 0:00:48 lr 0.000604 wd 0.0500 time 0.2533 (0.2609) data time 0.0007 (0.0020) model time 0.2526 (0.2596) loss 5.4365 (5.8829) grad_norm 1.8727 (inf) loss_scale 512.0000 (630.4218) mem 9655MB [2024-08-04 06:50:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][450/625] eta 0:00:45 lr 0.000603 wd 0.0500 time 0.2597 (0.2608) data time 0.0008 (0.0019) model time 0.2589 (0.2595) loss 5.6175 (5.8789) grad_norm 2.1151 (inf) loss_scale 512.0000 (627.7960) mem 9655MB [2024-08-04 06:50:37 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][460/625] eta 0:00:43 lr 0.000603 wd 0.0500 time 0.2560 (0.2608) data time 0.0011 (0.0019) model time 0.2550 (0.2594) loss 6.4844 (5.8751) grad_norm 2.2443 (inf) loss_scale 512.0000 (625.2842) mem 9655MB [2024-08-04 06:50:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][470/625] eta 0:00:40 lr 0.000603 wd 0.0500 time 0.2553 (0.2607) data time 0.0007 (0.0019) model time 0.2545 (0.2593) loss 6.3212 (5.8754) grad_norm 2.7822 (inf) loss_scale 512.0000 (622.8790) mem 9655MB [2024-08-04 06:50:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][480/625] eta 0:00:37 lr 0.000603 wd 0.0500 time 0.2529 (0.2606) data time 0.0009 (0.0019) model time 0.2520 (0.2592) loss 5.8870 (5.8780) grad_norm 2.0168 (inf) loss_scale 512.0000 (620.5738) mem 9655MB [2024-08-04 06:50:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][490/625] eta 0:00:35 lr 0.000603 wd 0.0500 time 0.2587 (0.2604) data time 0.0008 (0.0019) model time 0.2579 (0.2591) loss 6.5532 (5.8750) grad_norm 2.1994 (inf) loss_scale 512.0000 (618.3625) mem 9655MB [2024-08-04 06:50:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][500/625] eta 0:00:32 lr 0.000603 wd 0.0500 time 0.2551 (0.2604) data time 0.0008 (0.0018) model time 0.2542 (0.2590) loss 5.6659 (5.8664) grad_norm 1.4432 (inf) loss_scale 512.0000 (616.2395) mem 9655MB [2024-08-04 06:50:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][510/625] eta 0:00:29 lr 0.000602 wd 0.0500 time 0.2552 (0.2603) data time 0.0008 (0.0018) model time 0.2544 (0.2590) loss 5.9182 (5.8630) grad_norm 2.0604 (inf) loss_scale 512.0000 (614.1996) mem 9655MB [2024-08-04 06:50:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][520/625] eta 0:00:27 lr 0.000602 wd 0.0500 time 0.2530 (0.2603) data time 0.0010 (0.0018) model time 0.2519 
(0.2589) loss 5.8381 (5.8701) grad_norm 3.0094 (inf) loss_scale 512.0000 (612.2380) mem 9655MB [2024-08-04 06:50:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][530/625] eta 0:00:24 lr 0.000602 wd 0.0500 time 0.2626 (0.2602) data time 0.0008 (0.0018) model time 0.2618 (0.2589) loss 6.1706 (5.8743) grad_norm 3.5642 (inf) loss_scale 512.0000 (610.3503) mem 9655MB [2024-08-04 06:50:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][540/625] eta 0:00:22 lr 0.000602 wd 0.0500 time 0.2552 (0.2601) data time 0.0006 (0.0018) model time 0.2546 (0.2588) loss 5.3402 (5.8694) grad_norm 3.4735 (inf) loss_scale 512.0000 (608.5323) mem 9655MB [2024-08-04 06:51:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][550/625] eta 0:00:19 lr 0.000602 wd 0.0500 time 0.2540 (0.2601) data time 0.0011 (0.0018) model time 0.2529 (0.2587) loss 6.5066 (5.8662) grad_norm 1.5907 (inf) loss_scale 512.0000 (606.7804) mem 9655MB [2024-08-04 06:51:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][560/625] eta 0:00:16 lr 0.000602 wd 0.0500 time 0.2557 (0.2600) data time 0.0008 (0.0017) model time 0.2548 (0.2586) loss 6.0606 (5.8700) grad_norm 2.1008 (inf) loss_scale 512.0000 (605.0909) mem 9655MB [2024-08-04 06:51:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][570/625] eta 0:00:14 lr 0.000601 wd 0.0500 time 0.2511 (0.2599) data time 0.0007 (0.0017) model time 0.2504 (0.2586) loss 4.3685 (5.8641) grad_norm 2.0228 (inf) loss_scale 512.0000 (603.4606) mem 9655MB [2024-08-04 06:51:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][580/625] eta 0:00:11 lr 0.000601 wd 0.0500 time 0.2553 (0.2598) data time 0.0007 (0.0017) model time 0.2546 (0.2585) loss 5.4483 (5.8659) grad_norm 1.3329 (inf) loss_scale 512.0000 (601.8864) mem 9655MB [2024-08-04 06:51:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[197/300][590/625] eta 0:00:09 lr 0.000601 wd 0.0500 time 0.2556 (0.2600) data time 0.0007 (0.0017) model time 0.2549 (0.2587) loss 5.1609 (5.8640) grad_norm 2.2263 (inf) loss_scale 512.0000 (600.3655) mem 9655MB [2024-08-04 06:51:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][600/625] eta 0:00:06 lr 0.000601 wd 0.0500 time 0.2545 (0.2602) data time 0.0009 (0.0017) model time 0.2535 (0.2589) loss 7.0212 (5.8570) grad_norm 2.6526 (inf) loss_scale 512.0000 (598.8952) mem 9655MB [2024-08-04 06:51:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][610/625] eta 0:00:03 lr 0.000601 wd 0.0500 time 0.2516 (0.2607) data time 0.0008 (0.0017) model time 0.2508 (0.2595) loss 6.5544 (5.8517) grad_norm 2.5583 (inf) loss_scale 512.0000 (597.4730) mem 9655MB [2024-08-04 06:51:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][620/625] eta 0:00:01 lr 0.000601 wd 0.0500 time 0.2530 (0.2606) data time 0.0005 (0.0017) model time 0.2525 (0.2594) loss 6.3531 (5.8511) grad_norm 2.0854 (inf) loss_scale 512.0000 (596.0966) mem 9655MB [2024-08-04 06:51:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 197 training takes 0:02:42 [2024-08-04 06:51:19 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 06:51:20 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 06:51:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.536 (0.536) Loss 0.6050 (0.6050) Acc@1 88.818 (88.818) Acc@5 98.389 (98.389) Mem 9655MB
[2024-08-04 06:51:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.101) Loss 0.9551 (0.7401) Acc@1 79.395 (85.574) Acc@5 95.850 (97.417) Mem 9655MB
[2024-08-04 06:51:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.0947 (0.8729) Acc@1 75.342 (82.215) Acc@5 94.385 (96.094) Mem 9655MB
[2024-08-04 06:51:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.910 Acc@5 96.111
[2024-08-04 06:51:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.9%
[2024-08-04 06:51:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.91%
[2024-08-04 06:51:22 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-08-04 06:51:22 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-08-04 06:51:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.550 (0.550) Loss 0.5806 (0.5806) Acc@1 89.893 (89.893) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 06:51:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.100) Loss 0.9219 (0.7180) Acc@1 80.420 (86.191) Acc@5 95.898 (97.563) Mem 9655MB
[2024-08-04 06:51:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.0469 (0.8440) Acc@1 75.684 (82.706) Acc@5 94.727 (96.277) Mem 9655MB
[2024-08-04 06:51:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.414 Acc@5 96.259
[2024-08-04 06:51:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.4%
[2024-08-04 06:51:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.41%
[2024-08-04 06:51:24 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 06:51:24 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 06:51:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][0/625] eta 0:06:55 lr 0.000601 wd 0.0500 time 0.6652 (0.6652) data time 0.4232 (0.4232) model time 0.0000 (0.0000) loss 5.1722 (5.1722) grad_norm 1.4943 (1.4943) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:51:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][10/625] eta 0:03:00 lr 0.000600 wd 0.0500 time 0.2569 (0.2940) data time 0.0009 (0.0393) model time 0.0000 (0.0000) loss 6.0078 (5.7701) grad_norm 1.6270 (2.2956) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:51:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][20/625] eta 0:02:47 lr 0.000600 wd 0.0500 time 0.2549 (0.2762) data time 0.0010 (0.0210) model time 0.0000 (0.0000) loss 5.5677 (5.8877) grad_norm 3.4880 (2.6534) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:51:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][30/625] eta 0:02:43 lr 0.000600 wd 0.0500 time 0.2518 (0.2750) data time 0.0009 (0.0145) model time 0.0000 (0.0000) loss 6.3652 (5.9357) grad_norm 1.4499 (2.4357) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:51:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][40/625] eta 0:02:38 lr 0.000600 wd 0.0500 time 0.2561 (0.2704) data time 0.0009 (0.0113) model time 0.0000 (0.0000) loss 7.3050 (5.9430) grad_norm 1.6465 (2.2673) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:51:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][50/625] eta 0:02:37 lr 0.000600 wd 0.0500 time 0.2554 (0.2743) data time 0.0006 (0.0092) model time 0.0000 (0.0000) loss 6.0487 (5.9437) grad_norm 1.4066 (2.1251) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:51:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][60/625] eta 0:02:33 lr 0.000600 wd 0.0500 time 0.2607 (0.2715) data time 0.0008 (0.0079) model time 0.2599 (0.2561) loss 4.8107 (5.9290) grad_norm 2.4098 (2.0985) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:51:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][70/625] eta 0:02:30 lr 0.000599 wd 0.0500 time 0.2577 (0.2721) data time 0.0006 (0.0069) model time 0.2570 (0.2653) loss 6.2834 (5.9428) grad_norm 2.3876 (2.0920) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:51:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][80/625] eta 0:02:27 lr 0.000599 wd 0.0500 time 0.2560 (0.2703) data time 0.0011 (0.0062) model time 0.2549 (0.2624) loss 5.7064 (5.9473) grad_norm 2.5867 (2.1584) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:51:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][90/625] eta 0:02:23 lr 0.000599 wd 0.0500 time 0.2577 (0.2688) data time 0.0006 (0.0056) model time 0.2570 (0.2608) loss 6.5187 (5.9450) grad_norm 2.2396 (2.2295) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:51:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][100/625] eta 0:02:20 lr 0.000599 wd 0.0500 time 0.2576 (0.2676) data time 0.0005 (0.0051) model time 0.2571 (0.2597) loss 5.8844 (5.9476) grad_norm 1.9030 (2.2495) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:51:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][110/625] eta 0:02:19 lr 0.000599 wd 0.0500 time 0.2611 (0.2700) data time 0.0006 (0.0047) model time 0.2605 (0.2653) loss 4.8828 (5.9615) grad_norm 2.8531 (2.2504) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:51:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][120/625] eta 0:02:16 lr 0.000599 wd 0.0500 time 0.2593 (0.2702) data time 0.0006 (0.0044) model time 0.2587 (0.2663) loss 4.8510 (5.9846) grad_norm 3.8065 (2.2934) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:52:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][130/625] eta 0:02:13 lr 0.000598 wd 0.0500 time 0.2529 (0.2693) data time 0.0009 (0.0042) model time 0.2520 (0.2651) loss 6.6838 (5.9886) grad_norm 10.5409 (2.3733) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:52:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][140/625] eta 0:02:10 lr 0.000598 wd 0.0500 time 0.2578 (0.2684) data time 0.0007 (0.0039) model time 0.2571 (0.2641) loss 4.2554 (5.9830) grad_norm 2.6206 (2.3944) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:52:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][150/625] eta 0:02:07 lr 0.000598 wd 0.0500 time 0.2585 (0.2676) data time 0.0009 (0.0037) model time 0.2576 (0.2632) loss 6.2065 (5.9940) grad_norm 2.2215 (2.3581) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:52:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][160/625] eta 0:02:04 lr 0.000598 wd 0.0500 time 0.2540 (0.2669) data time 0.0009 (0.0036) model time 0.2531 (0.2624) loss 6.0515 (5.9851) grad_norm 3.2131 (2.4001) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:52:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][170/625] eta 0:02:01 lr 0.000598 wd 0.0500 time 0.2562 (0.2662) data time 0.0009 (0.0034) model time 0.2552 (0.2618) loss 6.4720 (5.9906) grad_norm 2.6173 (2.3896) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:52:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][180/625] eta 0:01:58 lr 0.000598 wd 0.0500 time 0.2589 (0.2656) data time 0.0007 (0.0033) model time 0.2582 (0.2612) loss 6.5518 (5.9996) grad_norm 1.8358 (2.4020) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:52:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][190/625] eta 0:01:55 lr 0.000598 wd 0.0500 time 0.2586 (0.2651) data time 0.0008 (0.0032) model time 0.2578 (0.2607) loss 6.5639 (5.9942) grad_norm 1.7889 (2.3898) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:52:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][200/625] eta 0:01:52 lr 0.000597 wd 0.0500 time 0.2571 (0.2647) data time 0.0008 (0.0031) model time 0.2563 (0.2605) loss 4.4784 (5.9935) grad_norm 1.9533 (2.3846) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:52:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][210/625] eta 0:01:49 lr 0.000597 wd 0.0500 time 0.2553 (0.2643) data time 0.0011 (0.0030) model time 0.2541 (0.2602) loss 5.9917 (5.9942) grad_norm 2.6583 (2.3875) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:52:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][220/625] eta 0:01:46 lr 0.000597 wd 0.0500 time 0.2547 (0.2639) data time 0.0010 (0.0029) model time 0.2536 (0.2598) loss 5.9056 (5.9786) grad_norm 2.1710 (2.3755) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:52:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][230/625] eta 0:01:44 lr 0.000597 wd 0.0500 time 0.2564 (0.2652) data time 0.0007 (0.0028) model time 0.2556 (0.2616) loss 6.8177 (5.9826) grad_norm 3.3273 (2.3681) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:52:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][240/625] eta 0:01:41 lr 0.000597 wd 0.0500 time 0.2533 (0.2647) data time 0.0008 (0.0027) model time 0.2525 (0.2612) loss 6.2162 (5.9796) grad_norm 1.8620 (2.3632) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:52:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][250/625] eta 0:01:39 lr 0.000597 wd 0.0500 time 0.2539 (0.2652) data time 0.0017 (0.0026) model time 0.2522 (0.2619) loss 4.8520 (5.9690) grad_norm 2.9844 (2.3671) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:52:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][260/625] eta 0:01:36 lr 0.000596 wd 0.0500 time 0.2634 (0.2649) data time 0.0005 (0.0026) model time 0.2629 (0.2617) loss 6.4425 (5.9579) grad_norm 2.3413 (2.3560) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:52:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][270/625] eta 0:01:33 lr 0.000596 wd 0.0500 time 0.2378 (0.2646) data time 0.0009 (0.0025) model time 0.2369 (0.2614) loss 5.3437 (5.9487) grad_norm 3.7069 (2.3622) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:52:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][280/625] eta 0:01:31 lr 0.000596 wd 0.0500 time 0.2549 (0.2643) data time 0.0009 (0.0025) model time 0.2540 (0.2611) loss 4.8082 (5.9366) grad_norm 4.7978 (2.3727) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:52:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][290/625] eta 0:01:28 lr 0.000596 wd 0.0500 time 0.4700 (0.2647) data time 0.0017 (0.0024) model time 0.4683 (0.2617) loss 6.4345 (5.9309) grad_norm 2.1752 (2.3627) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:52:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][300/625] eta 0:01:25 lr 0.000596 wd 0.0500 time 0.2575 (0.2644) data time 0.0009 (0.0024) model time 0.2566 (0.2614) loss 6.0349 (5.9301) grad_norm 2.0238 (2.3586) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:52:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][310/625] eta 0:01:23 lr 0.000596 wd 0.0500 time 0.2537 (0.2641) data time 0.0009 (0.0023) model time 0.2528 (0.2612) loss 6.6391 (5.9366) grad_norm 2.6098 (2.3634) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:52:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][320/625] eta 0:01:20 lr 0.000595 wd 0.0500 time 0.2591 (0.2639) data time 0.0009 (0.0023) model time 0.2583 (0.2610) loss 4.7818 (5.9397) grad_norm 2.0128 (2.3589) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:52:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][330/625] eta 0:01:17 lr 0.000595 wd 0.0500 time 0.2540 (0.2637) data time 0.0008 (0.0022) model time 0.2532 (0.2608) loss 5.6038 (5.9401) grad_norm 2.6871 (2.3682) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:52:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][340/625] eta 0:01:15 lr 0.000595 wd 0.0500 time 0.2578 (0.2635) data time 0.0005 (0.0022) model time 0.2573 (0.2607) loss 4.6016 (5.9317) grad_norm 2.6907 (2.3815) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:52:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][350/625] eta 0:01:12 lr 0.000595 wd 0.0500 time 0.2575 (0.2633) data time 0.0008 (0.0022) model time 0.2567 (0.2605) loss 6.2293 (5.9328) grad_norm 3.0076 (2.3688) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:52:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][360/625] eta 0:01:09 lr 0.000595 wd 0.0500 time 0.2573 (0.2631) data time 0.0008 (0.0021) model time 0.2565 (0.2603) loss 7.4386 (5.9230) grad_norm 1.2978 (2.3597) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:53:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][370/625] eta 0:01:07 lr 0.000595 wd 0.0500 time 0.2522 (0.2634) data time 0.0009 (0.0021) model time 0.2514 (0.2607) loss 5.5659 (5.9164) grad_norm 2.8731 (2.3822) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:53:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][380/625] eta 0:01:04 lr 0.000594 wd 0.0500 time 0.2561 (0.2632) data time 0.0007 (0.0021) model time 0.2554 (0.2605) loss 5.0384 (5.9062) grad_norm 1.3632 (2.3834) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:53:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][390/625] eta 0:01:01 lr 0.000594 wd 0.0500 time 0.2562 (0.2635) data time 0.0008 (0.0020) model time 0.2555 (0.2610) loss 6.3914 (5.9029) grad_norm 1.6972 (2.3703) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:53:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][400/625] eta 0:00:59 lr 0.000594 wd 0.0500 time 0.2534 (0.2638) data time 0.0008 (0.0020) model time 0.2526 (0.2613) loss 6.1257 (5.9079) grad_norm 1.6883 (2.3511) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:53:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][410/625] eta 0:00:56 lr 0.000594 wd 0.0500 time 0.4416 (0.2641) data time 0.0006 (0.0020) model time 0.4410 (0.2617) loss 6.0829 (5.9103) grad_norm 1.7259 (2.3349) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:53:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][420/625] eta 0:00:54 lr 0.000594 wd 0.0500 time 0.2498 (0.2643) data time 0.0008 (0.0019) model time 0.2490 (0.2620) loss 5.8189 (5.9046) grad_norm 1.5643 (2.3227) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:53:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][430/625] eta 0:00:51 lr 0.000594 wd 0.0500 time 0.2542 (0.2641) data time 0.0008 (0.0019) model time 0.2534 (0.2618) loss 7.2474 (5.9102) grad_norm 1.4206 (2.3211) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:53:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][440/625] eta 0:00:48 lr 0.000593 wd 0.0500 time 0.2569 (0.2639) data time 0.0010 (0.0019) model time 0.2559 (0.2616) loss 5.6487 (5.9087) grad_norm 1.5920 (2.3079) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:53:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][450/625] eta 0:00:46 lr 0.000593 wd 0.0500 time 0.2585 (0.2638) data time 0.0007 (0.0019) model time 0.2578 (0.2615) loss 5.2236 (5.9061) grad_norm 2.2917 (2.3092) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:53:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][460/625] eta 0:00:43 lr 0.000593 wd 0.0500 time 0.2630 (0.2636) data time 0.0007 (0.0019) model time 0.2623 (0.2613) loss 6.5522 (5.9046) grad_norm 1.2218 (2.3017) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:53:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][470/625] eta 0:00:40 lr 0.000593 wd 0.0500 time 0.2563 (0.2635) data time 0.0008 (0.0018) model time 0.2555 (0.2612) loss 5.1647 (5.9017) grad_norm 2.7668 (2.3067) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:53:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][480/625] eta 0:00:38 lr 0.000593 wd 0.0500 time 0.2566 (0.2633) data time 0.0006 (0.0018) model time 0.2559 (0.2611) loss 6.3197 (5.9075) grad_norm 1.4395 (2.3096) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:53:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][490/625] eta 0:00:35 lr 0.000593 wd 0.0500 time 0.2541 (0.2632) data time 0.0007 (0.0018) model time 0.2534 (0.2610) loss 5.4586 (5.9007) grad_norm 2.5683 (2.3098) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:53:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][500/625] eta 0:00:32 lr 0.000593 wd 0.0500 time 0.2515 (0.2630) data time 0.0008 (0.0018) model time 0.2507 (0.2608) loss 6.2511 (5.8940) grad_norm 4.4595 (2.3129) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:53:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][510/625] eta 0:00:30 lr 0.000592 wd 0.0500 time 0.2533 (0.2633) data time 0.0007 (0.0018) model time 0.2526 (0.2611) loss 6.6857 (5.8922) grad_norm 2.0620 (2.3064) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:53:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][520/625] eta
0:00:27 lr 0.000592 wd 0.0500 time 0.2540 (0.2631) data time 0.0006 (0.0017) model time 0.2534 (0.2610) loss 6.3799 (5.8893) grad_norm 2.0257 (2.3489) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:53:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][530/625] eta 0:00:24 lr 0.000592 wd 0.0500 time 0.2548 (0.2630) data time 0.0007 (0.0017) model time 0.2541 (0.2609) loss 6.1119 (5.8912) grad_norm 2.6456 (2.3481) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:53:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][540/625] eta 0:00:22 lr 0.000592 wd 0.0500 time 0.2551 (0.2629) data time 0.0008 (0.0017) model time 0.2542 (0.2608) loss 6.1423 (5.8845) grad_norm 3.9202 (2.3493) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:53:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][550/625] eta 0:00:19 lr 0.000592 wd 0.0500 time 0.2556 (0.2628) data time 0.0007 (0.0017) model time 0.2549 (0.2607) loss 6.0819 (5.8851) grad_norm 2.5774 (2.3873) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:53:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][560/625] eta 0:00:17 lr 0.000592 wd 0.0500 time 0.2538 (0.2627) data time 0.0008 (0.0017) model time 0.2530 (0.2606) loss 4.8901 (5.8802) grad_norm 3.6653 (2.3918) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:53:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][570/625] eta 0:00:14 lr 0.000591 wd 0.0500 time 0.2541 (0.2629) data time 0.0010 (0.0017) model time 0.2531 (0.2609) loss 5.8996 (5.8766) grad_norm 1.5854 (2.4009) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:53:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][580/625] eta 0:00:11 lr 0.000591 wd 0.0500 time 0.2515 (0.2628) data time 0.0008 (0.0017) model time 0.2507 (0.2608) loss 6.7862 (5.8777) grad_norm 1.7498 (2.4115) loss_scale 
512.0000 (512.0000) mem 9655MB [2024-08-04 06:54:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][590/625] eta 0:00:09 lr 0.000591 wd 0.0500 time 0.2549 (0.2627) data time 0.0010 (0.0016) model time 0.2539 (0.2607) loss 4.4550 (5.8706) grad_norm 1.7305 (2.4080) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:54:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][600/625] eta 0:00:06 lr 0.000591 wd 0.0500 time 0.2581 (0.2626) data time 0.0008 (0.0016) model time 0.2573 (0.2606) loss 7.0126 (5.8788) grad_norm 2.6552 (2.4142) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:54:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][610/625] eta 0:00:03 lr 0.000591 wd 0.0500 time 0.2518 (0.2625) data time 0.0004 (0.0016) model time 0.2514 (0.2605) loss 5.7648 (5.8787) grad_norm 1.9149 (2.4072) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:54:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][620/625] eta 0:00:01 lr 0.000591 wd 0.0500 time 0.2523 (0.2623) data time 0.0005 (0.0016) model time 0.2518 (0.2603) loss 5.9882 (5.8810) grad_norm 2.0282 (2.3989) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:54:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 198 training takes 0:02:43 [2024-08-04 06:54:08 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 06:54:09 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 06:54:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.531 (0.531) Loss 0.6201 (0.6201) Acc@1 89.014 (89.014) Acc@5 98.486 (98.486) Mem 9655MB [2024-08-04 06:54:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.100) Loss 1.0000 (0.7540) Acc@1 79.248 (85.596) Acc@5 95.068 (97.430) Mem 9655MB [2024-08-04 06:54:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0820 (0.8830) Acc@1 74.707 (82.264) Acc@5 95.264 (96.122) Mem 9655MB [2024-08-04 06:54:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.940 Acc@5 96.093 [2024-08-04 06:54:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.9% [2024-08-04 06:54:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.94% [2024-08-04 06:54:11 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 06:54:11 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 06:54:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.493 (0.493) Loss 0.5806 (0.5806) Acc@1 89.697 (89.697) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 06:54:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 0.9214 (0.7178) Acc@1 80.469 (86.159) Acc@5 95.898 (97.607) Mem 9655MB [2024-08-04 06:54:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0469 (0.8437) Acc@1 75.830 (82.724) Acc@5 94.678 (96.310) Mem 9655MB [2024-08-04 06:54:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.438 Acc@5 96.289 [2024-08-04 06:54:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-08-04 06:54:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.44% [2024-08-04 06:54:13 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 06:54:14 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 06:54:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][0/625] eta 0:07:44 lr 0.000590 wd 0.0500 time 0.7424 (0.7424) data time 0.5022 (0.5022) model time 0.0000 (0.0000) loss 4.9160 (4.9160) grad_norm 1.5520 (1.5520) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:54:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][10/625] eta 0:03:04 lr 0.000590 wd 0.0500 time 0.2569 (0.2995) data time 0.0012 (0.0466) model time 0.0000 (0.0000) loss 5.4742 (5.7828) grad_norm 1.7818 (2.0015) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:54:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][20/625] eta 0:02:48 lr 0.000590 wd 0.0500 time 0.2621 (0.2792) data time 0.0007 (0.0249) model time 0.0000 (0.0000) loss 6.1554 (5.7312) grad_norm 1.4783 (2.4257) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:54:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][30/625] eta 0:02:41 lr 0.000590 wd 0.0500 time 0.2515 (0.2711) data time 0.0010 (0.0172) model time 0.0000 (0.0000) loss 6.3255 (5.7179) grad_norm 2.1641 (2.3269) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:54:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][40/625] eta 0:02:36 lr 0.000590 wd 0.0500 time 0.2584 (0.2676) data time 0.0007 (0.0132) model time 0.0000 (0.0000) loss 6.2498 (5.7970) grad_norm 2.7686 (2.2592) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:54:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][50/625] eta 0:02:32 lr 0.000590 wd 0.0500 time 0.2509 (0.2655) data time 0.0009 (0.0108) model time 0.0000 (0.0000) loss 6.4911 (5.7257) grad_norm 1.2919 (2.2565) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:54:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][60/625] eta 0:02:31 lr 0.000590 wd 0.0500 time 0.4090 (0.2685) data time 
0.0011 (0.0092) model time 0.4079 (0.2831) loss 6.0463 (5.7250) grad_norm 1.4938 (2.1684) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:54:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][70/625] eta 0:02:29 lr 0.000589 wd 0.0500 time 0.2581 (0.2685) data time 0.0009 (0.0080) model time 0.2572 (0.2753) loss 6.4277 (5.7436) grad_norm 4.3738 (2.1509) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:54:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][80/625] eta 0:02:25 lr 0.000589 wd 0.0500 time 0.2528 (0.2669) data time 0.0007 (0.0072) model time 0.2521 (0.2683) loss 5.4749 (5.7320) grad_norm 2.5633 (2.1522) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:54:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][90/625] eta 0:02:22 lr 0.000589 wd 0.0500 time 0.2541 (0.2658) data time 0.0010 (0.0065) model time 0.2532 (0.2654) loss 5.7610 (5.7584) grad_norm 1.4245 (2.1437) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:54:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][100/625] eta 0:02:20 lr 0.000589 wd 0.0500 time 0.2567 (0.2684) data time 0.0006 (0.0059) model time 0.2561 (0.2705) loss 5.6118 (5.7499) grad_norm 2.2744 (2.1390) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:54:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][110/625] eta 0:02:17 lr 0.000589 wd 0.0500 time 0.2547 (0.2671) data time 0.0016 (0.0055) model time 0.2531 (0.2676) loss 5.0272 (5.7491) grad_norm 2.3654 (2.1293) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:54:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][120/625] eta 0:02:14 lr 0.000589 wd 0.0500 time 0.2538 (0.2664) data time 0.0005 (0.0051) model time 0.2533 (0.2661) loss 6.1026 (5.7497) grad_norm 2.3592 (2.1520) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:54:48 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][130/625] eta 0:02:11 lr 0.000588 wd 0.0500 time 0.2547 (0.2655) data time 0.0007 (0.0048) model time 0.2540 (0.2647) loss 5.1799 (5.7519) grad_norm 2.5946 (2.1375) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:54:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][140/625] eta 0:02:09 lr 0.000588 wd 0.0500 time 0.2549 (0.2664) data time 0.0009 (0.0045) model time 0.2540 (0.2659) loss 6.7054 (5.7624) grad_norm 1.7486 (2.1345) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:54:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][150/625] eta 0:02:06 lr 0.000588 wd 0.0500 time 0.2551 (0.2657) data time 0.0006 (0.0043) model time 0.2545 (0.2649) loss 4.5747 (5.7452) grad_norm 2.5909 (2.1191) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:54:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][160/625] eta 0:02:03 lr 0.000588 wd 0.0500 time 0.2590 (0.2651) data time 0.0008 (0.0041) model time 0.2582 (0.2641) loss 4.8859 (5.7353) grad_norm 2.3977 (2.1311) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:54:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][170/625] eta 0:02:00 lr 0.000588 wd 0.0500 time 0.2543 (0.2647) data time 0.0009 (0.0039) model time 0.2535 (0.2634) loss 6.5209 (5.7435) grad_norm 2.5846 (2.1143) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:55:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][180/625] eta 0:01:58 lr 0.000588 wd 0.0500 time 0.2567 (0.2652) data time 0.0008 (0.0037) model time 0.2559 (0.2642) loss 4.6117 (5.7347) grad_norm 3.0324 (2.1330) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:55:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][190/625] eta 0:01:55 lr 0.000587 wd 0.0500 time 0.2595 (0.2647) data time 0.0008 (0.0036) 
model time 0.2587 (0.2635) loss 5.4915 (5.7338) grad_norm 1.4750 (2.1306) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:55:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][200/625] eta 0:01:52 lr 0.000587 wd 0.0500 time 0.2571 (0.2652) data time 0.0016 (0.0034) model time 0.2555 (0.2642) loss 6.2553 (5.7381) grad_norm 3.1331 (2.1470) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:55:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][210/625] eta 0:01:49 lr 0.000587 wd 0.0500 time 0.2553 (0.2649) data time 0.0006 (0.0033) model time 0.2546 (0.2638) loss 6.7865 (5.7458) grad_norm 2.1221 (2.1585) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:55:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][220/625] eta 0:01:47 lr 0.000587 wd 0.0500 time 0.2550 (0.2646) data time 0.0009 (0.0032) model time 0.2541 (0.2634) loss 5.8878 (5.7384) grad_norm 1.3816 (2.1469) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:55:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][230/625] eta 0:01:44 lr 0.000587 wd 0.0500 time 0.2571 (0.2643) data time 0.0007 (0.0031) model time 0.2564 (0.2630) loss 5.3451 (5.7334) grad_norm 2.3849 (2.1412) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:55:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][240/625] eta 0:01:42 lr 0.000587 wd 0.0500 time 0.4401 (0.2655) data time 0.0008 (0.0030) model time 0.4393 (0.2646) loss 6.8619 (5.7355) grad_norm 1.3724 (2.1351) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:55:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][250/625] eta 0:01:39 lr 0.000586 wd 0.0500 time 0.2529 (0.2651) data time 0.0008 (0.0029) model time 0.2521 (0.2641) loss 6.6344 (5.7279) grad_norm 3.6601 (2.1442) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:55:23 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [199/300][260/625] eta 0:01:36 lr 0.000586 wd 0.0500 time 0.2527 (0.2648) data time 0.0008 (0.0029) model time 0.2519 (0.2638) loss 5.9983 (5.7396) grad_norm 1.6013 (2.1410) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:55:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][270/625] eta 0:01:34 lr 0.000586 wd 0.0500 time 0.2580 (0.2652) data time 0.0009 (0.0028) model time 0.2571 (0.2642) loss 6.8372 (5.7442) grad_norm 4.3617 (2.1860) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:55:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][280/625] eta 0:01:31 lr 0.000586 wd 0.0500 time 0.2563 (0.2648) data time 0.0006 (0.0027) model time 0.2557 (0.2638) loss 6.6122 (5.7416) grad_norm 1.4865 (2.1896) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:55:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][290/625] eta 0:01:28 lr 0.000586 wd 0.0500 time 0.2535 (0.2650) data time 0.0008 (0.0027) model time 0.2526 (0.2639) loss 5.8297 (5.7404) grad_norm 1.8971 (2.2245) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:55:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][300/625] eta 0:01:26 lr 0.000586 wd 0.0500 time 0.2602 (0.2647) data time 0.0007 (0.0026) model time 0.2595 (0.2636) loss 6.5567 (5.7431) grad_norm 6.4185 (2.2345) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:55:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][310/625] eta 0:01:23 lr 0.000586 wd 0.0500 time 0.2579 (0.2644) data time 0.0006 (0.0026) model time 0.2573 (0.2633) loss 6.3271 (5.7436) grad_norm 4.1203 (2.2470) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:55:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][320/625] eta 0:01:20 lr 0.000585 wd 0.0500 time 0.2568 (0.2641) data time 0.0007 (0.0025) model time 0.2561 (0.2630) 
loss 5.8391 (5.7621) grad_norm 1.4301 (2.2461) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:55:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][330/625] eta 0:01:17 lr 0.000585 wd 0.0500 time 0.2557 (0.2639) data time 0.0007 (0.0025) model time 0.2550 (0.2627) loss 4.8871 (5.7489) grad_norm 2.1577 (2.2540) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:55:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][340/625] eta 0:01:15 lr 0.000585 wd 0.0500 time 0.2534 (0.2637) data time 0.0007 (0.0024) model time 0.2527 (0.2624) loss 4.8372 (5.7518) grad_norm 1.9326 (2.2751) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:55:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][350/625] eta 0:01:12 lr 0.000585 wd 0.0500 time 0.2552 (0.2634) data time 0.0009 (0.0024) model time 0.2542 (0.2622) loss 6.1481 (5.7543) grad_norm 3.5549 (2.2740) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:55:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][360/625] eta 0:01:09 lr 0.000585 wd 0.0500 time 0.2552 (0.2638) data time 0.0010 (0.0023) model time 0.2542 (0.2626) loss 6.8871 (5.7649) grad_norm 1.7652 (2.2596) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:55:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][370/625] eta 0:01:07 lr 0.000585 wd 0.0500 time 0.2566 (0.2636) data time 0.0009 (0.0023) model time 0.2558 (0.2623) loss 7.5299 (5.7801) grad_norm 3.5741 (2.2476) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:55:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][380/625] eta 0:01:04 lr 0.000584 wd 0.0500 time 0.2517 (0.2634) data time 0.0011 (0.0023) model time 0.2507 (0.2621) loss 5.8732 (5.7772) grad_norm 1.8288 (2.2473) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:55:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [199/300][390/625] eta 0:01:01 lr 0.000584 wd 0.0500 time 0.2576 (0.2632) data time 0.0008 (0.0022) model time 0.2568 (0.2619) loss 5.8837 (5.7821) grad_norm 1.8436 (2.2410) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:55:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][400/625] eta 0:00:59 lr 0.000584 wd 0.0500 time 0.2553 (0.2630) data time 0.0006 (0.0022) model time 0.2547 (0.2617) loss 6.2683 (5.7915) grad_norm 2.5467 (2.2389) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:56:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][410/625] eta 0:00:56 lr 0.000584 wd 0.0500 time 0.2538 (0.2629) data time 0.0010 (0.0022) model time 0.2528 (0.2616) loss 6.2344 (5.7901) grad_norm 2.1358 (2.2399) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:56:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][420/625] eta 0:00:53 lr 0.000584 wd 0.0500 time 0.2529 (0.2627) data time 0.0011 (0.0021) model time 0.2518 (0.2614) loss 6.6289 (5.7937) grad_norm 1.9110 (2.2386) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:56:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][430/625] eta 0:00:51 lr 0.000584 wd 0.0500 time 0.2559 (0.2626) data time 0.0007 (0.0021) model time 0.2552 (0.2613) loss 5.8228 (5.7943) grad_norm 1.3053 (2.2387) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:56:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][440/625] eta 0:00:48 lr 0.000583 wd 0.0500 time 0.2575 (0.2624) data time 0.0010 (0.0021) model time 0.2565 (0.2611) loss 5.5440 (5.7999) grad_norm 2.3019 (2.2345) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:56:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][450/625] eta 0:00:45 lr 0.000583 wd 0.0500 time 0.2581 (0.2623) data time 0.0006 (0.0021) model time 0.2575 (0.2610) loss 4.8442 (5.8068) grad_norm 
1.9480 (2.2511) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:56:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][460/625] eta 0:00:43 lr 0.000583 wd 0.0500 time 0.2579 (0.2629) data time 0.0008 (0.0020) model time 0.2571 (0.2616) loss 6.1885 (5.8114) grad_norm 3.2920 (2.2646) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:56:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][470/625] eta 0:00:40 lr 0.000583 wd 0.0500 time 0.2557 (0.2630) data time 0.0006 (0.0020) model time 0.2551 (0.2617) loss 6.7323 (5.8126) grad_norm 3.1243 (2.2734) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:56:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][480/625] eta 0:00:38 lr 0.000583 wd 0.0500 time 0.2567 (0.2632) data time 0.0008 (0.0020) model time 0.2558 (0.2620) loss 5.2114 (5.8130) grad_norm 1.5704 (2.2792) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:56:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][490/625] eta 0:00:35 lr 0.000583 wd 0.0500 time 0.2562 (0.2630) data time 0.0008 (0.0020) model time 0.2554 (0.2618) loss 5.7982 (5.8148) grad_norm 1.8644 (2.2764) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:56:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][500/625] eta 0:00:32 lr 0.000582 wd 0.0500 time 0.2569 (0.2629) data time 0.0007 (0.0019) model time 0.2562 (0.2617) loss 5.1480 (5.8161) grad_norm 1.6417 (2.2667) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:56:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][510/625] eta 0:00:30 lr 0.000582 wd 0.0500 time 0.2545 (0.2628) data time 0.0008 (0.0019) model time 0.2537 (0.2615) loss 5.2848 (5.8117) grad_norm 2.0460 (2.2599) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:56:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][520/625] eta 
0:00:27 lr 0.000582 wd 0.0500 time 0.2613 (0.2627) data time 0.0009 (0.0019) model time 0.2604 (0.2614) loss 4.8951 (5.8111) grad_norm 2.1939 (2.2546) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:56:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][530/625] eta 0:00:24 lr 0.000582 wd 0.0500 time 0.2578 (0.2625) data time 0.0006 (0.0019) model time 0.2572 (0.2613) loss 6.1763 (5.8128) grad_norm 1.9466 (2.2553) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:56:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][540/625] eta 0:00:22 lr 0.000582 wd 0.0500 time 0.2634 (0.2624) data time 0.0009 (0.0019) model time 0.2625 (0.2612) loss 4.9957 (5.8112) grad_norm 2.3973 (2.2574) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:56:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][550/625] eta 0:00:19 lr 0.000582 wd 0.0500 time 0.2558 (0.2623) data time 0.0006 (0.0018) model time 0.2552 (0.2610) loss 5.2018 (5.8172) grad_norm 3.0881 (2.2654) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:56:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][560/625] eta 0:00:17 lr 0.000581 wd 0.0500 time 0.2559 (0.2622) data time 0.0010 (0.0018) model time 0.2549 (0.2609) loss 5.9027 (5.8196) grad_norm 1.6508 (2.2587) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:56:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][570/625] eta 0:00:14 lr 0.000581 wd 0.0500 time 0.2554 (0.2621) data time 0.0010 (0.0018) model time 0.2543 (0.2608) loss 5.2316 (5.8170) grad_norm 1.8202 (2.2670) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:56:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][580/625] eta 0:00:11 lr 0.000581 wd 0.0500 time 0.2547 (0.2620) data time 0.0008 (0.0018) model time 0.2539 (0.2607) loss 6.2059 (5.8161) grad_norm 2.1061 (2.2657) loss_scale 
512.0000 (512.0000) mem 9655MB [2024-08-04 06:56:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][590/625] eta 0:00:09 lr 0.000581 wd 0.0500 time 0.2600 (0.2619) data time 0.0009 (0.0018) model time 0.2591 (0.2606) loss 6.0626 (5.8220) grad_norm 1.5545 (2.2595) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:56:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][600/625] eta 0:00:06 lr 0.000581 wd 0.0500 time 0.2561 (0.2618) data time 0.0013 (0.0018) model time 0.2549 (0.2605) loss 6.5833 (5.8247) grad_norm 2.7877 (2.2610) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:56:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][610/625] eta 0:00:03 lr 0.000581 wd 0.0500 time 0.2534 (0.2617) data time 0.0006 (0.0018) model time 0.2528 (0.2604) loss 4.5389 (5.8285) grad_norm 1.4087 (2.2532) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:56:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][620/625] eta 0:00:01 lr 0.000581 wd 0.0500 time 0.2522 (0.2615) data time 0.0004 (0.0017) model time 0.2518 (0.2603) loss 4.5075 (5.8248) grad_norm 1.9235 (2.2531) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 06:56:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 199 training takes 0:02:43 [2024-08-04 06:56:57 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 06:56:58 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 06:56:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.478 (0.478) Loss 0.6499 (0.6499) Acc@1 89.746 (89.746) Acc@5 98.535 (98.535) Mem 9655MB [2024-08-04 06:56:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 1.0254 (0.7781) Acc@1 78.076 (85.476) Acc@5 95.361 (97.523) Mem 9655MB [2024-08-04 06:56:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.1094 (0.9077) Acc@1 75.879 (82.110) Acc@5 94.678 (96.098) Mem 9655MB [2024-08-04 06:56:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.776 Acc@5 96.079 [2024-08-04 06:56:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.8% [2024-08-04 06:57:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.707 (0.707) Loss 0.5806 (0.5806) Acc@1 89.697 (89.697) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 06:57:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.126) Loss 0.9224 (0.7177) Acc@1 80.518 (86.168) Acc@5 95.850 (97.607) Mem 9655MB [2024-08-04 06:57:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0439 (0.8432) Acc@1 75.781 (82.738) Acc@5 94.824 (96.322) Mem 9655MB [2024-08-04 06:57:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.460 Acc@5 96.297 [2024-08-04 06:57:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.5% [2024-08-04 06:57:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.46% [2024-08-04 06:57:01 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 06:57:02 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 06:57:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][0/625] eta 0:07:04 lr 0.000580 wd 0.0500 time 0.6793 (0.6793) data time 0.4240 (0.4240) model time 0.0000 (0.0000) loss 5.6940 (5.6940) grad_norm 2.2568 (2.2568) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:57:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][10/625] eta 0:03:01 lr 0.000580 wd 0.0500 time 0.2555 (0.2945) data time 0.0008 (0.0394) model time 0.0000 (0.0000) loss 5.5961 (5.6300) grad_norm 2.5340 (2.2155) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:57:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][20/625] eta 0:02:47 lr 0.000580 wd 0.0500 time 0.2737 (0.2769) data time 0.0008 (0.0210) model time 0.0000 (0.0000) loss 6.3650 (5.8343) grad_norm 2.1242 (2.0384) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:57:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][30/625] eta 0:02:40 lr 0.000580 wd 0.0500 time 0.2583 (0.2704) data time 0.0005 (0.0146) model time 0.0000 (0.0000) loss 5.6361 (5.7809) grad_norm 1.6668 (2.0306) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:57:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][40/625] eta 0:02:36 lr 0.000580 wd 0.0500 time 0.2518 (0.2670) data time 0.0007 (0.0112) model time 0.0000 (0.0000) loss 6.0497 (5.7871) grad_norm 2.0713 (2.0572) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:57:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][50/625] eta 0:02:34 lr 0.000580 wd 0.0500 time 0.2532 (0.2690) data time 0.0008 (0.0092) model time 0.0000 (0.0000) loss 5.4049 (5.8136) grad_norm 1.7940 (2.2297) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:57:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][60/625] eta 0:02:30 lr 0.000579 wd 0.0500 time 0.2567 (0.2670) data time 0.0009 (0.0078) model time 0.2559 (0.2555) loss 6.6063 (5.7678) grad_norm 1.8550 (2.2114) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:57:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][70/625] eta 0:02:28 lr 0.000579 wd 0.0500 time 0.2547 (0.2680) data time 0.0009 (0.0069) model time 0.2538 (0.2645) loss 5.0590 (5.7681) grad_norm 1.7607 (2.1743) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:57:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][80/625] eta 0:02:26 lr 0.000579 wd 0.0500 time 0.2620 (0.2690) data time 0.0008 (0.0061) model time 0.2612 (0.2681) loss 4.6069 (5.7365) grad_norm 2.1900 (2.2027) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:57:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][90/625] eta 0:02:23 lr 0.000579 wd 0.0500 time 0.2758 (0.2679) data time 0.0008 (0.0056) model time 0.2750 (0.2655) loss 6.2560 (5.7598) grad_norm 1.8606 (2.1719) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:57:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][100/625] eta 0:02:20 lr 0.000579 wd 0.0500 time 0.2576 (0.2667) data time 0.0006 (0.0051) model time 0.2570 (0.2634) loss 5.5265 (5.7699) grad_norm 1.9613 (2.1582) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:57:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][110/625] eta 0:02:16 lr 0.000579 wd 0.0500 time 0.2596 (0.2657) data time 0.0007 (0.0047) model time 0.2589 (0.2620) loss 5.8405 (5.8149) grad_norm 1.6864 (2.1649) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:57:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][120/625] eta 0:02:13 lr 0.000579 wd 0.0500 time 0.2559 (0.2650) data time 0.0009 (0.0044) model time 0.2550 (0.2611) loss 5.2831 (5.8111) grad_norm 1.6698 (2.1698) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:57:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][130/625] eta 0:02:10 lr 0.000578 wd 0.0500 time 0.2546 (0.2643) data time 0.0008 (0.0041) model time 0.2539 (0.2604) loss 6.7414 (5.8368) grad_norm 1.8410 (2.1743) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:57:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][140/625] eta 0:02:07 lr 0.000578 wd 0.0500 time 0.2548 (0.2636) data time 0.0012 (0.0039) model time 0.2536 (0.2597) loss 6.3239 (5.8262) grad_norm 1.8104 (2.1765) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:57:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][150/625] eta 0:02:04 lr 0.000578 wd 0.0500 time 0.2530 (0.2631) data time 0.0011 (0.0037) model time 0.2518 (0.2591) loss 5.4814 (5.8162) grad_norm 1.6942 (2.1651) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:57:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][160/625] eta 0:02:02 lr 0.000578 wd 0.0500 time 0.2543 (0.2639) data time 0.0007 (0.0035) model time 0.2535 (0.2606) loss 6.2785 (5.8145) grad_norm 1.7126 (2.1427) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:57:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][170/625] eta 0:01:59 lr 0.000578 wd 0.0500 time 0.2586 (0.2634) data time 0.0008 (0.0034) model time 0.2579 (0.2601) loss 6.4144 (5.7977) grad_norm 1.5933 (2.1156) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:57:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][180/625] eta 0:01:57 lr 0.000578 wd 0.0500 time 0.2531 (0.2635) data time 0.0010 (0.0033) model time 0.2521 (0.2605) loss 6.7142 (5.8116) grad_norm 2.4652 (2.1335) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:57:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][190/625] eta 0:01:54 lr 0.000577 wd 0.0500 time 0.2502 (0.2631) data time 0.0007 (0.0031) model time 0.2495 (0.2600) loss 5.5732 (5.8188) grad_norm 1.6506 (2.1292) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:57:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][200/625] eta 0:01:51 lr 0.000577 wd 0.0500 time 0.2581 (0.2627) data time 0.0006 (0.0030) model time 0.2575 (0.2596) loss 5.0123 (5.8142) grad_norm 2.3010 (2.1278) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:57:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][210/625] eta 0:01:49 lr 0.000577 wd 0.0500 time 0.2556 (0.2639) data time 0.0008 (0.0029) model time 0.2548 (0.2614) loss 6.1883 (5.8224) grad_norm 1.7532 (2.1223) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:58:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][220/625] eta 0:01:46 lr 0.000577 wd 0.0500 time 0.2536 (0.2636) data time 0.0006 (0.0028) model time 0.2529 (0.2610) loss 6.9983 (5.8307) grad_norm 1.9497 (2.1346) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:58:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][230/625] eta 0:01:44 lr 0.000577 wd 0.0500 time 0.2519 (0.2640) data time 0.0011 (0.0027) model time 0.2508 (0.2616) loss 5.8635 (5.8280) grad_norm 2.9652 (2.1614) loss_scale 1024.0000 (520.8658) mem 9655MB
[2024-08-04 06:58:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][240/625] eta 0:01:41 lr 0.000577 wd 0.0500 time 0.2578 (0.2636) data time 0.0008 (0.0027) model time 0.2571 (0.2613) loss 4.9963 (5.8185) grad_norm 2.3518 (2.1670) loss_scale 1024.0000 (541.7427) mem 9655MB
[2024-08-04 06:58:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][250/625] eta 0:01:38 lr 0.000576 wd 0.0500 time 0.2575 (0.2633) data time 0.0006 (0.0026) model time 0.2569 (0.2610) loss 5.4754 (5.8303) grad_norm 5.3749 (2.1604) loss_scale 1024.0000 (560.9562) mem 9655MB
[2024-08-04 06:58:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][260/625] eta 0:01:36 lr 0.000576 wd 0.0500 time 0.2690 (0.2631) data time 0.0018 (0.0025) model time 0.2672 (0.2607) loss 4.8278 (5.8230) grad_norm 2.5805 (2.1620) loss_scale 1024.0000 (578.6973) mem 9655MB
[2024-08-04 06:58:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][270/625] eta 0:01:33 lr 0.000576 wd 0.0500 time 0.2556 (0.2635) data time 0.0008 (0.0025) model time 0.2548 (0.2613) loss 6.9725 (5.8370) grad_norm 1.9343 (2.1509) loss_scale 1024.0000 (595.1292) mem 9655MB
[2024-08-04 06:58:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][280/625] eta 0:01:30 lr 0.000576 wd 0.0500 time 0.2563 (0.2633) data time 0.0008 (0.0024) model time 0.2555 (0.2611) loss 6.6018 (5.8283) grad_norm 1.6868 (2.1481) loss_scale 1024.0000 (610.3915) mem 9655MB
[2024-08-04 06:58:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][290/625] eta 0:01:28 lr 0.000576 wd 0.0500 time 0.2550 (0.2630) data time 0.0008 (0.0024) model time 0.2542 (0.2608) loss 5.7439 (5.8396) grad_norm 1.6451 (2.1408) loss_scale 1024.0000 (624.6048) mem 9655MB
[2024-08-04 06:58:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][300/625] eta 0:01:25 lr 0.000576 wd 0.0500 time 0.2558 (0.2633) data time 0.0011 (0.0023) model time 0.2547 (0.2612) loss 4.5061 (5.8275) grad_norm 1.4063 (2.1457) loss_scale 1024.0000 (637.8738) mem 9655MB
[2024-08-04 06:58:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][310/625] eta 0:01:22 lr 0.000575 wd 0.0500 time 0.2584 (0.2634) data time 0.0006 (0.0023) model time 0.2579 (0.2614) loss 7.2643 (5.8328) grad_norm 3.5538 (2.1610) loss_scale 1024.0000 (650.2894) mem 9655MB
[2024-08-04 06:58:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][320/625] eta 0:01:20 lr 0.000575 wd 0.0500 time 0.2686 (0.2633) data time 0.0009 (0.0022) model time 0.2676 (0.2613) loss 6.1471 (5.8293) grad_norm 2.0764 (2.1679) loss_scale 1024.0000 (661.9315) mem 9655MB
[2024-08-04 06:58:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][330/625] eta 0:01:17 lr 0.000575 wd 0.0500 time 0.2538 (0.2634) data time 0.0008 (0.0022) model time 0.2530 (0.2614) loss 6.0347 (5.8323) grad_norm 2.2543 (2.1676) loss_scale 1024.0000 (672.8701) mem 9655MB
[2024-08-04 06:58:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][340/625] eta 0:01:15 lr 0.000575 wd 0.0500 time 0.2579 (0.2632) data time 0.0006 (0.0022) model time 0.2573 (0.2612) loss 4.3916 (5.8252) grad_norm 2.1206 (2.1654) loss_scale 1024.0000 (683.1672) mem 9655MB
[2024-08-04 06:58:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][350/625] eta 0:01:12 lr 0.000575 wd 0.0500 time 0.2586 (0.2629) data time 0.0006 (0.0021) model time 0.2580 (0.2610) loss 5.4412 (5.8079) grad_norm 2.7711 (2.1740) loss_scale 1024.0000 (692.8775) mem 9655MB
[2024-08-04 06:58:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][360/625] eta 0:01:09 lr 0.000575 wd 0.0500 time 0.4516 (0.2639) data time 0.0008 (0.0021) model time 0.4508 (0.2621) loss 5.8643 (5.8191) grad_norm 2.0813 (2.2100) loss_scale 1024.0000 (702.0499) mem 9655MB
[2024-08-04 06:58:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][370/625] eta 0:01:07 lr 0.000575 wd 0.0500 time 0.2592 (0.2642) data time 0.0008 (0.0021) model time 0.2585 (0.2625) loss 6.1562 (5.8208) grad_norm 1.7070 (2.2088) loss_scale 1024.0000 (710.7278) mem 9655MB
[2024-08-04 06:58:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][380/625] eta 0:01:04 lr 0.000574 wd 0.0500 time 0.2536 (0.2640) data time 0.0009 (0.0020) model time 0.2527 (0.2623) loss 4.9772 (5.8165) grad_norm 1.6262 (2.1954) loss_scale 1024.0000 (718.9501) mem 9655MB
[2024-08-04 06:58:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][390/625] eta 0:01:01 lr 0.000574 wd 0.0500 time 0.2515 (0.2638) data time 0.0010 (0.0020) model time 0.2506 (0.2621) loss 6.7957 (5.8120) grad_norm 1.4066 (2.1852) loss_scale 1024.0000 (726.7519) mem 9655MB
[2024-08-04 06:58:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][400/625] eta 0:00:59 lr 0.000574 wd 0.0500 time 0.2582 (0.2636) data time 0.0009 (0.0020) model time 0.2574 (0.2619) loss 5.3606 (5.8029) grad_norm 1.7454 (2.1895) loss_scale 1024.0000 (734.1646) mem 9655MB
[2024-08-04 06:58:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][410/625] eta 0:00:56 lr 0.000574 wd 0.0500 time 0.2579 (0.2634) data time 0.0007 (0.0019) model time 0.2572 (0.2617) loss 6.6923 (5.8016) grad_norm 1.6912 (2.1811) loss_scale 1024.0000 (741.2165) mem 9655MB
[2024-08-04 06:58:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][420/625] eta 0:00:53 lr 0.000574 wd 0.0500 time 0.2568 (0.2632) data time 0.0006 (0.0019) model time 0.2562 (0.2615) loss 6.9691 (5.8106) grad_norm 2.0881 (2.2084) loss_scale 1024.0000 (747.9335) mem 9655MB
[2024-08-04 06:58:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][430/625] eta 0:00:51 lr 0.000574 wd 0.0500 time 0.2634 (0.2631) data time 0.0008 (0.0019) model time 0.2626 (0.2614) loss 6.1308 (5.8104) grad_norm 1.6911 (2.2342) loss_scale 1024.0000 (754.3387) mem 9655MB
[2024-08-04 06:58:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][440/625] eta 0:00:48 lr 0.000573 wd 0.0500 time 0.2579 (0.2630) data time 0.0009 (0.0019) model time 0.2569 (0.2612) loss 5.6270 (5.8126) grad_norm 2.0233 (2.2537) loss_scale 1024.0000 (760.4535) mem 9655MB
[2024-08-04 06:59:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][450/625] eta 0:00:45 lr 0.000573 wd 0.0500 time 0.2520 (0.2628) data time 0.0012 (0.0019) model time 0.2508 (0.2611) loss 5.8200 (5.8082) grad_norm 3.3449 (2.2558) loss_scale 1024.0000 (766.2971) mem 9655MB
[2024-08-04 06:59:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][460/625] eta 0:00:43 lr 0.000573 wd 0.0500 time 0.2551 (0.2626) data time 0.0008 (0.0018) model time 0.2542 (0.2609) loss 5.5971 (5.8090) grad_norm 5.0018 (2.2609) loss_scale 1024.0000 (771.8872) mem 9655MB
[2024-08-04 06:59:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][470/625] eta 0:00:40 lr 0.000573 wd 0.0500 time 0.2559 (0.2625) data time 0.0006 (0.0018) model time 0.2553 (0.2608) loss 5.3369 (5.8114) grad_norm 2.0581 (2.2616) loss_scale 1024.0000 (777.2399) mem 9655MB
[2024-08-04 06:59:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][480/625] eta 0:00:38 lr 0.000573 wd 0.0500 time 0.2547 (0.2623) data time 0.0009 (0.0018) model time 0.2538 (0.2606) loss 5.2566 (5.8081) grad_norm 2.4719 (2.2769) loss_scale 1024.0000 (782.3701) mem 9655MB
[2024-08-04 06:59:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][490/625] eta 0:00:35 lr 0.000573 wd 0.0500 time 0.2558 (0.2622) data time 0.0009 (0.0018) model time 0.2548 (0.2605) loss 5.4654 (5.8109) grad_norm 2.3326 (2.2884) loss_scale 1024.0000 (787.2912) mem 9655MB
[2024-08-04 06:59:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][500/625] eta 0:00:32 lr 0.000572 wd 0.0500 time 0.2568 (0.2625) data time 0.0008 (0.0018) model time 0.2560 (0.2609) loss 5.0226 (5.8068) grad_norm 1.4767 (2.2984) loss_scale 1024.0000 (792.0160) mem 9655MB
[2024-08-04 06:59:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][510/625] eta 0:00:30 lr 0.000572 wd 0.0500 time 0.2571 (0.2628) data time 0.0009 (0.0018) model time 0.2563 (0.2612) loss 5.2430 (5.8049) grad_norm 1.6288 (inf) loss_scale 512.0000 (793.5499) mem 9655MB
[2024-08-04 06:59:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][520/625] eta 0:00:27 lr 0.000572 wd 0.0500 time 0.2537 (0.2627) data time 0.0008 (0.0017) model time 0.2529 (0.2611) loss 5.1456 (5.8128) grad_norm 1.9940 (inf) loss_scale 512.0000 (788.1459) mem 9655MB
[2024-08-04 06:59:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][530/625] eta 0:00:24 lr 0.000572 wd 0.0500 time 0.2484 (0.2626) data time 0.0010 (0.0017) model time 0.2475 (0.2610) loss 5.5922 (5.8171) grad_norm 4.1974 (inf) loss_scale 512.0000 (782.9454) mem 9655MB
[2024-08-04 06:59:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][540/625] eta 0:00:22 lr 0.000572 wd 0.0500 time 0.2531 (0.2625) data time 0.0008 (0.0017) model time 0.2523 (0.2609) loss 5.9888 (5.8138) grad_norm 2.5996 (inf) loss_scale 512.0000 (777.9372) mem 9655MB
[2024-08-04 06:59:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][550/625] eta 0:00:19 lr 0.000572 wd 0.0500 time 0.2520 (0.2624) data time 0.0009 (0.0017) model time 0.2510 (0.2607) loss 6.5297 (5.8207) grad_norm 1.6150 (inf) loss_scale 512.0000 (773.1107) mem 9655MB
[2024-08-04 06:59:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][560/625] eta 0:00:17 lr 0.000572 wd 0.0500 time 0.2524 (0.2622) data time 0.0009 (0.0017) model time 0.2515 (0.2606) loss 5.3548 (5.8192) grad_norm 1.4673 (inf) loss_scale 512.0000 (768.4563) mem 9655MB
[2024-08-04 06:59:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][570/625] eta 0:00:14 lr 0.000571 wd 0.0500 time 0.2568 (0.2622) data time 0.0008 (0.0017) model time 0.2560 (0.2606) loss 4.9039 (5.8126) grad_norm 3.1371 (inf) loss_scale 512.0000 (763.9650) mem 9655MB
[2024-08-04 06:59:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][580/625] eta 0:00:11 lr 0.000571 wd 0.0500 time 0.2583 (0.2621) data time 0.0007 (0.0017) model time 0.2576 (0.2605) loss 4.5372 (5.8106) grad_norm 2.2069 (inf) loss_scale 512.0000 (759.6282) mem 9655MB
[2024-08-04 06:59:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][590/625] eta 0:00:09 lr 0.000571 wd 0.0500 time 0.2531 (0.2619) data time 0.0007 (0.0016) model time 0.2524 (0.2604) loss 5.5276 (5.8154) grad_norm 2.8146 (inf) loss_scale 512.0000 (755.4382) mem 9655MB
[2024-08-04 06:59:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][600/625] eta 0:00:06 lr 0.000571 wd 0.0500 time 0.2561 (0.2619) data time 0.0007 (0.0016) model time 0.2554 (0.2603) loss 6.5979 (5.8148) grad_norm 2.4794 (inf) loss_scale 512.0000 (751.3877) mem 9655MB
[2024-08-04 06:59:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][610/625] eta 0:00:03 lr 0.000571 wd 0.0500 time 0.2530 (0.2621) data time 0.0006 (0.0016) model time 0.2524 (0.2605) loss 6.0343 (5.8184) grad_norm 2.5392 (inf) loss_scale 512.0000 (747.4697) mem 9655MB
[2024-08-04 06:59:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][620/625] eta 0:00:01 lr 0.000571 wd 0.0500 time 0.2523 (0.2619) data time 0.0004 (0.0016) model time 0.2519 (0.2604) loss 6.4821 (5.8196) grad_norm 2.0184 (inf) loss_scale 512.0000 (743.6779) mem 9655MB
[2024-08-04 06:59:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 200 training takes 0:02:43
[2024-08-04 06:59:46 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 06:59:46 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 06:59:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.530 (0.530) Loss 0.6284 (0.6284) Acc@1 89.014 (89.014) Acc@5 98.828 (98.828) Mem 9655MB
[2024-08-04 06:59:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.100) Loss 0.9771 (0.7634) Acc@1 78.906 (85.409) Acc@5 95.605 (97.452) Mem 9655MB
[2024-08-04 06:59:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.0928 (0.8961) Acc@1 76.025 (81.975) Acc@5 94.678 (96.068) Mem 9655MB
[2024-08-04 06:59:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.736 Acc@5 96.061
[2024-08-04 06:59:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.7%
[2024-08-04 06:59:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.729 (0.729) Loss 0.5806 (0.5806) Acc@1 89.697 (89.697) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 06:59:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.125) Loss 0.9204 (0.7171) Acc@1 80.615 (86.164) Acc@5 95.996 (97.603) Mem 9655MB
[2024-08-04 06:59:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0439 (0.8427) Acc@1 75.928 (82.759) Acc@5 94.873 (96.312) Mem 9655MB
[2024-08-04 06:59:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.478 Acc@5 96.295
[2024-08-04 06:59:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.5%
[2024-08-04 06:59:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.48%
[2024-08-04 06:59:50 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 06:59:51 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 06:59:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][0/625] eta 0:07:19 lr 0.000570 wd 0.0500 time 0.7027 (0.7027) data time 0.4645 (0.4645) model time 0.0000 (0.0000) loss 5.1868 (5.1868) grad_norm 1.7443 (1.7443) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:59:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][10/625] eta 0:03:02 lr 0.000570 wd 0.0500 time 0.2566 (0.2970) data time 0.0008 (0.0431) model time 0.0000 (0.0000) loss 5.1747 (5.7669) grad_norm 3.1706 (2.2256) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:59:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][20/625] eta 0:02:48 lr 0.000570 wd 0.0500 time 0.2651 (0.2779) data time 0.0007 (0.0231) model time 0.0000 (0.0000) loss 6.6890 (5.8880) grad_norm 3.4794 (2.3556) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 06:59:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][30/625] eta 0:02:40 lr 0.000570 wd 0.0500 time 0.2486 (0.2701) data time 0.0009 (0.0159) model time 0.0000 (0.0000) loss 6.7667 (5.8547) grad_norm 2.2236 (2.5096) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:00:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][40/625] eta 0:02:41 lr 0.000570 wd 0.0500 time 0.2526 (0.2759) data time 0.0009 (0.0123) model time 0.0000 (0.0000) loss 5.1109 (5.8711) grad_norm 1.5246 (2.2951) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:00:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][50/625] eta 0:02:36 lr 0.000570 wd 0.0500 time 0.2585 (0.2721) data time 0.0008 (0.0101) model time 0.0000 (0.0000) loss 4.7788 (5.8457) grad_norm 2.9115 (2.3117) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:00:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][60/625] eta 0:02:32 lr 0.000570 wd 0.0500 time 0.2582 (0.2694) data time 0.0007 (0.0086) model time 0.2575 (0.2550) loss 5.1825 (5.8214) grad_norm 2.0516 (2.3716) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:00:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][70/625] eta 0:02:28 lr 0.000569 wd 0.0500 time 0.2516 (0.2676) data time 0.0008 (0.0075) model time 0.2508 (0.2554) loss 5.5927 (5.8338) grad_norm 2.3987 (2.3492) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:00:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][80/625] eta 0:02:25 lr 0.000569 wd 0.0500 time 0.2515 (0.2663) data time 0.0011 (0.0067) model time 0.2504 (0.2555) loss 5.2167 (5.8234) grad_norm 2.5511 (2.3276) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:00:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][90/625] eta 0:02:21 lr 0.000569 wd 0.0500 time 0.2556 (0.2653) data time 0.0010 (0.0060) model time 0.2546 (0.2558) loss 5.6931 (5.8306) grad_norm 2.7672 (2.2812) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:00:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][100/625] eta 0:02:18 lr 0.000569 wd 0.0500 time 0.2570 (0.2645) data time 0.0010 (0.0056) model time 0.2560 (0.2557) loss 5.1838 (5.7903) grad_norm 3.0169 (2.2737) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:00:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][110/625] eta 0:02:16 lr 0.000569 wd 0.0500 time 0.2526 (0.2654) data time 0.0008 (0.0051) model time 0.2518 (0.2587) loss 6.1444 (5.7729) grad_norm 3.6212 (2.2934) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:00:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][120/625] eta 0:02:13 lr 0.000569 wd 0.0500 time 0.2498 (0.2647) data time 0.0009 (0.0048) model time 0.2489 (0.2583) loss 5.3033 (5.7517) grad_norm 1.7341 (2.3276) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:00:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][130/625] eta 0:02:11 lr 0.000568 wd 0.0500 time 0.2561 (0.2657) data time 0.0008 (0.0045) model time 0.2553 (0.2607) loss 6.1036 (5.7488) grad_norm 2.5923 (2.3341) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:00:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][140/625] eta 0:02:08 lr 0.000568 wd 0.0500 time 0.2524 (0.2651) data time 0.0011 (0.0042) model time 0.2513 (0.2601) loss 4.6529 (5.7468) grad_norm 1.8146 (2.3573) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:00:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][150/625] eta 0:02:05 lr 0.000568 wd 0.0500 time 0.2562 (0.2644) data time 0.0007 (0.0040) model time 0.2556 (0.2596) loss 7.0989 (5.7656) grad_norm 1.4714 (2.3480) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:00:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][160/625] eta 0:02:02 lr 0.000568 wd 0.0500 time 0.2560 (0.2638) data time 0.0007 (0.0038) model time 0.2553 (0.2591) loss 6.5810 (5.7571) grad_norm 3.5927 (2.3342) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:00:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][170/625] eta 0:02:00 lr 0.000568 wd 0.0500 time 0.2545 (0.2645) data time 0.0010 (0.0037) model time 0.2536 (0.2604) loss 6.6740 (5.7620) grad_norm 1.6200 (2.3110) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:00:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][180/625] eta 0:01:58 lr 0.000568 wd 0.0500 time 0.2567 (0.2652) data time 0.0009 (0.0035) model time 0.2558 (0.2615) loss 5.5656 (5.7671) grad_norm 1.9849 (2.2795) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:00:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][190/625] eta 0:01:55 lr 0.000567 wd 0.0500 time 0.2565 (0.2657) data time 0.0010 (0.0034) model time 0.2555 (0.2624) loss 6.4401 (5.7774) grad_norm 1.7874 (2.2564) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:00:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][200/625] eta 0:01:52 lr 0.000567 wd 0.0500 time 0.2535 (0.2652) data time 0.0010 (0.0033) model time 0.2524 (0.2619) loss 6.6106 (5.7854) grad_norm 2.5782 (2.2617) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:00:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][210/625] eta 0:01:49 lr 0.000567 wd 0.0500 time 0.2542 (0.2647) data time 0.0010 (0.0031) model time 0.2532 (0.2615) loss 6.2565 (5.7951) grad_norm 1.6574 (2.2579) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:00:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][220/625] eta 0:01:47 lr 0.000567 wd 0.0500 time 0.2572 (0.2643) data time 0.0011 (0.0030) model time 0.2561 (0.2610) loss 4.6628 (5.7820) grad_norm 1.6300 (2.2729) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:00:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][230/625] eta 0:01:44 lr 0.000567 wd 0.0500 time 0.2567 (0.2639) data time 0.0009 (0.0030) model time 0.2558 (0.2607) loss 6.2930 (5.7988) grad_norm 1.6151 (2.2777) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:00:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][240/625] eta 0:01:41 lr 0.000567 wd 0.0500 time 0.2614 (0.2636) data time 0.0009 (0.0029) model time 0.2605 (0.2604) loss 6.4985 (5.8129) grad_norm 1.7104 (2.2674) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:00:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][250/625] eta 0:01:38 lr 0.000566 wd 0.0500 time 0.2537 (0.2633) data time 0.0007 (0.0028) model time 0.2530 (0.2602) loss 5.5083 (5.8110) grad_norm 2.8634 (2.2835) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:00:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][260/625] eta 0:01:36 lr 0.000566 wd 0.0500 time 0.2537 (0.2630) data time 0.0008 (0.0027) model time 0.2529 (0.2599) loss 4.8964 (5.8122) grad_norm 1.3799 (2.2855) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:01:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][270/625] eta 0:01:33 lr 0.000566 wd 0.0500 time 0.2593 (0.2628) data time 0.0009 (0.0027) model time 0.2584 (0.2597) loss 6.5205 (5.8224) grad_norm 1.9160 (2.2778) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:01:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][280/625] eta 0:01:30 lr 0.000566 wd 0.0500 time 0.2547 (0.2626) data time 0.0010 (0.0026) model time 0.2537 (0.2595) loss 5.4580 (5.8220) grad_norm 1.8962 (2.2633) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:01:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][290/625] eta 0:01:28 lr 0.000566 wd 0.0500 time 0.2574 (0.2627) data time 0.0009 (0.0025) model time 0.2565 (0.2598) loss 5.2356 (5.8082) grad_norm 1.6387 (2.2591) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:01:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][300/625] eta 0:01:25 lr 0.000566 wd 0.0500 time 0.2554 (0.2625) data time 0.0008 (0.0025) model time 0.2546 (0.2596) loss 6.4819 (5.8091) grad_norm 1.9406 (2.2416) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:01:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][310/625] eta 0:01:22 lr 0.000566 wd 0.0500 time 0.2543 (0.2623) data time 0.0008 (0.0024) model time 0.2534 (0.2594) loss 6.2427 (5.8058) grad_norm 1.5968 (2.2339) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:01:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][320/625] eta 0:01:19 lr 0.000565 wd 0.0500 time 0.2638 (0.2621) data time 0.0009 (0.0024) model time 0.2629 (0.2593) loss 4.9854 (5.8030) grad_norm 3.0507 (2.2556) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:01:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][330/625] eta 0:01:17 lr 0.000565 wd 0.0500 time 0.2532 (0.2619) data time 0.0008 (0.0024) model time 0.2524 (0.2591) loss 5.7084 (5.8051) grad_norm 2.0531 (2.2630) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:01:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][340/625] eta 0:01:14 lr 0.000565 wd 0.0500 time 0.2565 (0.2617) data time 0.0008 (0.0023) model time 0.2557 (0.2590) loss 6.4509 (5.8183) grad_norm 1.3512 (2.2567) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:01:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][350/625] eta 0:01:11 lr 0.000565 wd 0.0500 time 0.2537 (0.2616) data time 0.0009 (0.0023) model time 0.2528 (0.2589) loss 5.9591 (5.8147) grad_norm 2.2246 (2.2562) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:01:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][360/625] eta 0:01:09 lr 0.000565 wd 0.0500 time 0.2520 (0.2615) data time 0.0007 (0.0022) model time 0.2512 (0.2588) loss 6.1283 (5.8241) grad_norm 2.8450 (2.2720) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:01:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][370/625] eta 0:01:06 lr 0.000565 wd 0.0500 time 0.2532 (0.2614) data time 0.0007 (0.0022) model time 0.2525 (0.2587) loss 6.1363 (5.8258) grad_norm 1.9165 (2.2625) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:01:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][380/625] eta 0:01:04 lr 0.000564 wd 0.0500 time 0.2582 (0.2616) data time 0.0008 (0.0022) model time 0.2574 (0.2590) loss 6.4259 (5.8267) grad_norm 2.2536 (2.2638) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:01:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][390/625] eta 0:01:01 lr 0.000564 wd 0.0500 time 0.2554 (0.2614) data time 0.0006 (0.0021) model time 0.2548 (0.2589) loss 6.4127 (5.8307) grad_norm 2.2266 (2.2730) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:01:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][400/625] eta 0:00:58 lr 0.000564 wd 0.0500 time 0.2528 (0.2613) data time 0.0008 (0.0021) model time 0.2521 (0.2588) loss 5.8300 (5.8320) grad_norm 1.6570 (2.2741) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:01:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][410/625] eta 0:00:56 lr 0.000564 wd 0.0500 time 0.4630 (0.2617) data time 0.0007 (0.0021) model time 0.4623 (0.2593) loss 6.4326 (5.8382) grad_norm 2.0569 (2.2806) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:01:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][420/625] eta 0:00:53 lr 0.000564 wd 0.0500 time 0.2573 (0.2616) data time 0.0008 (0.0021) model time 0.2565 (0.2592) loss 6.1714 (5.8463) grad_norm 1.6581 (2.2649) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:01:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][430/625] eta 0:00:50 lr 0.000564 wd 0.0500 time 0.2543 (0.2615) data time 0.0007 (0.0020) model time 0.2535 (0.2591) loss 5.7414 (5.8491) grad_norm 3.0868 (2.2688) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:01:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][440/625] eta 0:00:48 lr 0.000563 wd 0.0500 time 0.2548 (0.2617) data time 0.0009 (0.0020) model time 0.2539 (0.2594) loss 6.0568 (5.8535) grad_norm 1.6569 (2.2579) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:01:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][450/625] eta 0:00:45 lr 0.000563 wd 0.0500 time 0.2532 (0.2616) data time 0.0008 (0.0020) model time 0.2524 (0.2593) loss 6.1316 (5.8588) grad_norm 2.3870 (2.2631) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:01:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][460/625] eta 0:00:43 lr 0.000563 wd 0.0500 time 0.2611 (0.2619) data time 0.0006 (0.0020) model time 0.2605 (0.2597) loss 5.5112 (5.8608) grad_norm 1.9225 (2.2621) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:01:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][470/625] eta 0:00:40 lr 0.000563 wd 0.0500 time 0.2559 (0.2618) data time 0.0010 (0.0019) model time 0.2549 (0.2596) loss 5.0995 (5.8575) grad_norm 1.4483 (2.2607) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:01:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][480/625] eta 0:00:37 lr 0.000563 wd 0.0500 time 0.2564 (0.2617) data time 0.0008 (0.0019) model time 0.2556 (0.2595) loss 5.7453 (5.8566) grad_norm 1.9877 (2.2535) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:01:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][490/625] eta 0:00:35 lr 0.000563 wd 0.0500 time 0.2609 (0.2616) data time 0.0008 (0.0019) model time 0.2601 (0.2594) loss 5.1826 (5.8561) grad_norm 2.4394 (2.2466) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:02:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][500/625] eta 0:00:32 lr 0.000563 wd 0.0500 time 0.2533 (0.2615) data time 0.0009 (0.0019) model time 0.2524 (0.2593) loss 5.7381 (5.8572) grad_norm 2.0590 (2.2441) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:02:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][510/625] eta 0:00:30 lr 0.000562 wd 0.0500 time 0.2562 (0.2621) data time 0.0007 (0.0019) model time 0.2555 (0.2601) loss 6.4581 (5.8598) grad_norm 2.7841 (2.2484) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:02:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][520/625] eta 0:00:27 lr 0.000562 wd 0.0500 time 0.2527 (0.2620) data time 0.0009 (0.0018) model time 0.2517 (0.2600) loss 6.4898 (5.8589) grad_norm 2.2189 (2.2436) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:02:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][530/625] eta 0:00:24 lr 0.000562 wd 0.0500 time 0.2547 (0.2623) data time 0.0008 (0.0018) model time 0.2540 (0.2603) loss 5.8029 (5.8579) grad_norm 1.7230 (2.2411) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:02:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][540/625] eta 0:00:22 lr 0.000562 wd 0.0500 time 0.2540 (0.2624) data time 0.0008 (0.0018) model time 0.2532 (0.2605) loss 7.0458 (5.8627) grad_norm 2.3484 (2.2397) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:02:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][550/625] eta 0:00:19 lr 0.000562 wd 0.0500 time 0.2598 (0.2623) data time 0.0006 (0.0018) model time 0.2592 (0.2604) loss 4.6959 (5.8658) grad_norm 2.8986 (2.2391) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:02:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][560/625] eta 0:00:17 lr 0.000562 wd 0.0500 time 0.2520 (0.2622) data time 0.0009 (0.0018) model time 0.2511 (0.2603) loss 5.9783 (5.8617) grad_norm 1.7459 (2.2345) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:02:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][570/625] eta 0:00:14 lr 0.000561 wd 0.0500 time 0.2594 (0.2621) data time 0.0007 (0.0018) model time 0.2586 (0.2601) loss 6.7366 (5.8674) grad_norm 1.3357 (2.2301) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:02:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][580/625] eta 
0:00:11 lr 0.000561 wd 0.0500 time 0.2553 (0.2620) data time 0.0009 (0.0017) model time 0.2544 (0.2601) loss 5.7831 (5.8704) grad_norm 2.4847 (2.2472) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:02:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][590/625] eta 0:00:09 lr 0.000561 wd 0.0500 time 0.2561 (0.2619) data time 0.0007 (0.0017) model time 0.2554 (0.2600) loss 5.5970 (5.8760) grad_norm 3.7118 (2.2589) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:02:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][600/625] eta 0:00:06 lr 0.000561 wd 0.0500 time 0.2530 (0.2618) data time 0.0007 (0.0017) model time 0.2523 (0.2599) loss 5.6572 (5.8771) grad_norm 5.1830 (2.2748) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:02:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][610/625] eta 0:00:03 lr 0.000561 wd 0.0500 time 0.2541 (0.2617) data time 0.0004 (0.0017) model time 0.2537 (0.2598) loss 6.5110 (5.8742) grad_norm 2.3305 (2.2906) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:02:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][620/625] eta 0:00:01 lr 0.000561 wd 0.0500 time 0.2514 (0.2616) data time 0.0004 (0.0017) model time 0.2510 (0.2597) loss 5.5255 (5.8767) grad_norm 3.0607 (2.3004) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:02:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 201 training takes 0:02:43 [2024-08-04 07:02:34 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 07:02:35 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 07:02:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.654 (0.654) Loss 0.6343 (0.6343) Acc@1 89.111 (89.111) Acc@5 98.730 (98.730) Mem 9655MB [2024-08-04 07:02:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.111) Loss 0.9976 (0.7685) Acc@1 78.613 (85.525) Acc@5 95.410 (97.306) Mem 9655MB [2024-08-04 07:02:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.084) Loss 1.1133 (0.8996) Acc@1 76.172 (82.157) Acc@5 93.652 (96.026) Mem 9655MB [2024-08-04 07:02:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.872 Acc@5 96.017 [2024-08-04 07:02:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.9% [2024-08-04 07:02:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.763 (0.763) Loss 0.5801 (0.5801) Acc@1 89.746 (89.746) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 07:02:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.134) Loss 0.9194 (0.7168) Acc@1 80.615 (86.213) Acc@5 96.045 (97.590) Mem 9655MB [2024-08-04 07:02:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.097) Loss 1.0420 (0.8423) Acc@1 76.025 (82.796) Acc@5 94.922 (96.291) Mem 9655MB [2024-08-04 07:02:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.504 Acc@5 96.283 [2024-08-04 07:02:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.5% [2024-08-04 07:02:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.50% [2024-08-04 07:02:39 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 07:02:39 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 07:02:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][0/625] eta 0:08:33 lr 0.000561 wd 0.0500 time 0.8212 (0.8212) data time 0.5514 (0.5514) model time 0.0000 (0.0000) loss 5.6351 (5.6351) grad_norm 3.5665 (3.5665) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:02:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][10/625] eta 0:03:14 lr 0.000560 wd 0.0500 time 0.2565 (0.3170) data time 0.0006 (0.0510) model time 0.0000 (0.0000) loss 6.8743 (5.8523) grad_norm 2.1722 (2.8797) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:02:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][20/625] eta 0:02:54 lr 0.000560 wd 0.0500 time 0.2563 (0.2880) data time 0.0007 (0.0271) model time 0.0000 (0.0000) loss 5.2142 (5.8891) grad_norm 1.6109 (2.6850) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:02:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][30/625] eta 0:02:49 lr 0.000560 wd 0.0500 time 0.2598 (0.2844) data time 0.0006 (0.0187) model time 0.0000 (0.0000) loss 6.0502 (5.7921) grad_norm 1.7170 (2.5126) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:02:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][40/625] eta 0:02:45 lr 0.000560 wd 0.0500 time 0.2602 (0.2828) data time 0.0006 (0.0143) model time 0.0000 (0.0000) loss 6.1316 (5.7670) grad_norm 2.1385 (2.3345) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:02:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][50/625] eta 0:02:44 lr 0.000560 wd 0.0500 time 0.2571 (0.2856) data time 0.0009 (0.0117) model time 0.0000 (0.0000) loss 5.6305 (5.7790) grad_norm 1.5898 (2.2270) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 
07:02:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][60/625] eta 0:02:38 lr 0.000560 wd 0.0500 time 0.2548 (0.2806) data time 0.0007 (0.0099) model time 0.2541 (0.2543) loss 4.7250 (5.7866) grad_norm 1.5272 (2.1421) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:02:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][70/625] eta 0:02:33 lr 0.000559 wd 0.0500 time 0.2540 (0.2771) data time 0.0006 (0.0086) model time 0.2533 (0.2546) loss 5.4351 (5.7715) grad_norm 2.3218 (2.1433) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:03:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][80/625] eta 0:02:30 lr 0.000559 wd 0.0500 time 0.2552 (0.2765) data time 0.0012 (0.0077) model time 0.2540 (0.2602) loss 6.4355 (5.7920) grad_norm 2.0956 (2.1248) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:03:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][90/625] eta 0:02:26 lr 0.000559 wd 0.0500 time 0.2577 (0.2745) data time 0.0006 (0.0070) model time 0.2571 (0.2594) loss 7.1888 (5.8298) grad_norm 2.3726 (2.0980) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:03:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][100/625] eta 0:02:23 lr 0.000559 wd 0.0500 time 0.2556 (0.2727) data time 0.0007 (0.0064) model time 0.2548 (0.2587) loss 4.9258 (5.8161) grad_norm 1.8176 (2.1332) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:03:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][110/625] eta 0:02:19 lr 0.000559 wd 0.0500 time 0.2540 (0.2713) data time 0.0007 (0.0059) model time 0.2533 (0.2584) loss 5.3658 (5.8088) grad_norm 2.1406 (2.1290) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:03:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][120/625] eta 0:02:17 lr 0.000559 wd 0.0500 time 0.2610 (0.2715) data time 0.0009 
(0.0055) model time 0.2601 (0.2604) loss 5.9331 (5.8067) grad_norm 2.0220 (2.1401) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:03:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][130/625] eta 0:02:13 lr 0.000558 wd 0.0500 time 0.2561 (0.2703) data time 0.0009 (0.0051) model time 0.2553 (0.2597) loss 6.4306 (5.8143) grad_norm 2.2182 (2.1444) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:03:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][140/625] eta 0:02:10 lr 0.000558 wd 0.0500 time 0.2523 (0.2693) data time 0.0010 (0.0048) model time 0.2513 (0.2591) loss 5.5398 (5.8225) grad_norm 2.1162 (2.1395) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:03:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][150/625] eta 0:02:07 lr 0.000558 wd 0.0500 time 0.2563 (0.2684) data time 0.0008 (0.0046) model time 0.2555 (0.2587) loss 5.8759 (5.8263) grad_norm 1.6822 (2.1343) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:03:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][160/625] eta 0:02:04 lr 0.000558 wd 0.0500 time 0.2544 (0.2676) data time 0.0006 (0.0043) model time 0.2538 (0.2583) loss 5.0294 (5.7923) grad_norm 1.6338 (2.1000) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:03:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][170/625] eta 0:02:01 lr 0.000558 wd 0.0500 time 0.2556 (0.2669) data time 0.0009 (0.0041) model time 0.2547 (0.2580) loss 6.1737 (5.8263) grad_norm 1.2053 (2.0869) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:03:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][180/625] eta 0:01:58 lr 0.000558 wd 0.0500 time 0.2579 (0.2663) data time 0.0006 (0.0040) model time 0.2574 (0.2578) loss 5.6898 (5.8093) grad_norm 1.6272 (2.0777) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:03:30 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][190/625] eta 0:01:55 lr 0.000558 wd 0.0500 time 0.2543 (0.2657) data time 0.0009 (0.0038) model time 0.2534 (0.2576) loss 6.1956 (5.8161) grad_norm 2.0509 (2.1193) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:03:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][200/625] eta 0:01:53 lr 0.000557 wd 0.0500 time 0.2540 (0.2668) data time 0.0009 (0.0037) model time 0.2530 (0.2595) loss 6.5920 (5.8214) grad_norm 4.2209 (2.2184) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:03:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][210/625] eta 0:01:50 lr 0.000557 wd 0.0500 time 0.2695 (0.2663) data time 0.0008 (0.0035) model time 0.2688 (0.2592) loss 5.3653 (5.8169) grad_norm 2.7127 (2.2828) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:03:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][220/625] eta 0:01:47 lr 0.000557 wd 0.0500 time 0.2579 (0.2658) data time 0.0006 (0.0034) model time 0.2573 (0.2589) loss 6.7411 (5.8339) grad_norm 2.3588 (2.2909) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:03:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][230/625] eta 0:01:44 lr 0.000557 wd 0.0500 time 0.2543 (0.2654) data time 0.0010 (0.0033) model time 0.2533 (0.2587) loss 6.0296 (5.8467) grad_norm 1.8365 (2.2723) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:03:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][240/625] eta 0:01:42 lr 0.000557 wd 0.0500 time 0.2574 (0.2650) data time 0.0009 (0.0032) model time 0.2565 (0.2585) loss 4.4032 (5.8332) grad_norm 2.5868 (2.2698) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:03:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][250/625] eta 0:01:39 lr 0.000557 wd 0.0500 time 0.2560 (0.2650) data time 0.0008 (0.0031) 
model time 0.2552 (0.2588) loss 5.5794 (5.8429) grad_norm 12.5745 (2.3107) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:03:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][260/625] eta 0:01:36 lr 0.000556 wd 0.0500 time 0.2597 (0.2646) data time 0.0007 (0.0030) model time 0.2590 (0.2586) loss 4.9227 (5.8483) grad_norm 3.0143 (2.3437) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:03:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][270/625] eta 0:01:33 lr 0.000556 wd 0.0500 time 0.2560 (0.2643) data time 0.0009 (0.0030) model time 0.2551 (0.2584) loss 5.7707 (5.8410) grad_norm 1.7566 (2.3326) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:03:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][280/625] eta 0:01:31 lr 0.000556 wd 0.0500 time 0.2555 (0.2640) data time 0.0007 (0.0029) model time 0.2548 (0.2583) loss 5.6263 (5.8482) grad_norm 1.6999 (2.3385) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:03:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][290/625] eta 0:01:28 lr 0.000556 wd 0.0500 time 0.2522 (0.2637) data time 0.0007 (0.0028) model time 0.2514 (0.2581) loss 5.1760 (5.8529) grad_norm 2.2195 (2.3704) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:03:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][300/625] eta 0:01:25 lr 0.000556 wd 0.0500 time 0.2571 (0.2634) data time 0.0008 (0.0028) model time 0.2563 (0.2580) loss 5.1711 (5.8477) grad_norm 2.1170 (2.3688) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:04:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][310/625] eta 0:01:22 lr 0.000556 wd 0.0500 time 0.2515 (0.2632) data time 0.0009 (0.0027) model time 0.2507 (0.2578) loss 6.2822 (5.8408) grad_norm 1.2996 (2.3599) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:04:04 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [202/300][320/625] eta 0:01:20 lr 0.000555 wd 0.0500 time 0.2546 (0.2630) data time 0.0010 (0.0027) model time 0.2536 (0.2578) loss 6.3231 (5.8415) grad_norm 2.3406 (2.3564) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:04:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][330/625] eta 0:01:17 lr 0.000555 wd 0.0500 time 0.2532 (0.2627) data time 0.0007 (0.0026) model time 0.2525 (0.2576) loss 5.1992 (5.8358) grad_norm 1.9300 (2.3446) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:04:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][340/625] eta 0:01:14 lr 0.000555 wd 0.0500 time 0.2580 (0.2626) data time 0.0007 (0.0026) model time 0.2572 (0.2576) loss 5.7096 (5.8267) grad_norm 4.7049 (2.3536) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:04:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][350/625] eta 0:01:12 lr 0.000555 wd 0.0500 time 0.2513 (0.2635) data time 0.0009 (0.0025) model time 0.2503 (0.2588) loss 5.2980 (5.8208) grad_norm 1.3874 (2.3510) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:04:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][360/625] eta 0:01:09 lr 0.000555 wd 0.0500 time 0.2585 (0.2639) data time 0.0008 (0.0025) model time 0.2577 (0.2594) loss 6.2726 (5.8291) grad_norm 1.8671 (2.3326) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:04:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][370/625] eta 0:01:07 lr 0.000555 wd 0.0500 time 0.2676 (0.2637) data time 0.0007 (0.0024) model time 0.2669 (0.2593) loss 4.8109 (5.8224) grad_norm 3.1198 (2.3286) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:04:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][380/625] eta 0:01:04 lr 0.000555 wd 0.0500 time 0.2550 (0.2635) data time 0.0007 (0.0024) model time 0.2543 (0.2592) 
loss 5.5764 (5.8208) grad_norm 2.9560 (2.3341) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:04:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][390/625] eta 0:01:01 lr 0.000554 wd 0.0500 time 0.2585 (0.2633) data time 0.0018 (0.0023) model time 0.2567 (0.2590) loss 5.8403 (5.8219) grad_norm 2.9930 (2.3339) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:04:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][400/625] eta 0:00:59 lr 0.000554 wd 0.0500 time 0.2525 (0.2631) data time 0.0008 (0.0023) model time 0.2517 (0.2589) loss 6.4905 (5.8225) grad_norm 2.5941 (2.3477) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:04:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][410/625] eta 0:00:56 lr 0.000554 wd 0.0500 time 0.2550 (0.2630) data time 0.0008 (0.0023) model time 0.2543 (0.2588) loss 5.3936 (5.8191) grad_norm 2.0468 (2.3439) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:04:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][420/625] eta 0:00:53 lr 0.000554 wd 0.0500 time 0.2601 (0.2628) data time 0.0008 (0.0022) model time 0.2593 (0.2587) loss 5.0867 (5.8163) grad_norm 4.6557 (2.3642) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:04:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][430/625] eta 0:00:51 lr 0.000554 wd 0.0500 time 0.2516 (0.2626) data time 0.0008 (0.0022) model time 0.2509 (0.2586) loss 5.3009 (5.8096) grad_norm 2.4581 (2.3652) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:04:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][440/625] eta 0:00:48 lr 0.000554 wd 0.0500 time 0.2553 (0.2625) data time 0.0007 (0.0022) model time 0.2546 (0.2585) loss 7.7710 (5.8194) grad_norm 3.4643 (2.3546) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:04:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [202/300][450/625] eta 0:00:45 lr 0.000553 wd 0.0500 time 0.2555 (0.2623) data time 0.0009 (0.0022) model time 0.2546 (0.2584) loss 5.1553 (5.8206) grad_norm 2.1939 (2.3493) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:04:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][460/625] eta 0:00:43 lr 0.000553 wd 0.0500 time 0.2545 (0.2625) data time 0.0009 (0.0021) model time 0.2536 (0.2587) loss 5.0959 (5.8250) grad_norm 1.8877 (2.3612) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:04:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][470/625] eta 0:00:40 lr 0.000553 wd 0.0500 time 0.2532 (0.2623) data time 0.0011 (0.0021) model time 0.2521 (0.2586) loss 6.4190 (5.8285) grad_norm 2.7575 (2.3720) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:04:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][480/625] eta 0:00:38 lr 0.000553 wd 0.0500 time 0.2583 (0.2622) data time 0.0008 (0.0021) model time 0.2575 (0.2585) loss 5.1152 (5.8221) grad_norm 4.0580 (2.3755) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:04:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][490/625] eta 0:00:35 lr 0.000553 wd 0.0500 time 0.2516 (0.2620) data time 0.0008 (0.0021) model time 0.2507 (0.2584) loss 5.8439 (5.8229) grad_norm 2.0574 (2.3807) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:04:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][500/625] eta 0:00:32 lr 0.000553 wd 0.0500 time 0.2566 (0.2619) data time 0.0006 (0.0020) model time 0.2560 (0.2583) loss 5.9075 (5.8227) grad_norm 1.5890 (2.3899) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:04:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][510/625] eta 0:00:30 lr 0.000552 wd 0.0500 time 0.2587 (0.2622) data time 0.0008 (0.0020) model time 0.2579 (0.2587) loss 5.2431 (5.8258) grad_norm 
1.6506 (2.3885) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:04:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][520/625] eta 0:00:27 lr 0.000552 wd 0.0500 time 0.2549 (0.2621) data time 0.0008 (0.0020) model time 0.2541 (0.2586) loss 5.0934 (5.8244) grad_norm 3.8155 (2.4055) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:04:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][530/625] eta 0:00:24 lr 0.000552 wd 0.0500 time 0.2547 (0.2620) data time 0.0007 (0.0020) model time 0.2540 (0.2585) loss 5.8891 (5.8294) grad_norm 1.9532 (2.4008) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:05:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][540/625] eta 0:00:22 lr 0.000552 wd 0.0500 time 0.2554 (0.2618) data time 0.0012 (0.0020) model time 0.2542 (0.2584) loss 6.5330 (5.8340) grad_norm 1.5806 (2.3910) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:05:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][550/625] eta 0:00:19 lr 0.000552 wd 0.0500 time 0.2513 (0.2617) data time 0.0008 (0.0019) model time 0.2505 (0.2583) loss 5.6078 (5.8314) grad_norm 1.9194 (2.3835) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:05:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][560/625] eta 0:00:17 lr 0.000552 wd 0.0500 time 0.2577 (0.2619) data time 0.0011 (0.0019) model time 0.2567 (0.2586) loss 6.4542 (5.8271) grad_norm 2.2846 (2.3765) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:05:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][570/625] eta 0:00:14 lr 0.000552 wd 0.0500 time 0.2543 (0.2621) data time 0.0007 (0.0019) model time 0.2536 (0.2589) loss 6.5240 (5.8318) grad_norm 2.4767 (2.3793) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:05:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][580/625] eta 
0:00:11 lr 0.000551 wd 0.0500 time 0.2554 (0.2620) data time 0.0009 (0.0019) model time 0.2546 (0.2588) loss 5.3387 (5.8367) grad_norm 1.8917 (2.3756) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:05:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][590/625] eta 0:00:09 lr 0.000551 wd 0.0500 time 0.2531 (0.2619) data time 0.0009 (0.0019) model time 0.2522 (0.2587) loss 6.2309 (5.8410) grad_norm 2.0350 (2.3700) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:05:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][600/625] eta 0:00:06 lr 0.000551 wd 0.0500 time 0.2536 (0.2618) data time 0.0009 (0.0018) model time 0.2527 (0.2587) loss 5.8538 (5.8382) grad_norm 2.4825 (2.3682) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:05:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][610/625] eta 0:00:03 lr 0.000551 wd 0.0500 time 0.2552 (0.2617) data time 0.0006 (0.0018) model time 0.2546 (0.2586) loss 5.9461 (5.8403) grad_norm 1.2522 (2.3678) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:05:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][620/625] eta 0:00:01 lr 0.000551 wd 0.0500 time 0.2534 (0.2616) data time 0.0005 (0.0018) model time 0.2529 (0.2585) loss 6.5770 (5.8440) grad_norm 2.5781 (2.3782) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:05:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 202 training takes 0:02:43 [2024-08-04 07:05:23 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 07:05:23 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 07:05:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.638 (0.638) Loss 0.6074 (0.6074) Acc@1 89.502 (89.502) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 07:05:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.109) Loss 0.9365 (0.7403) Acc@1 80.371 (85.804) Acc@5 95.801 (97.505) Mem 9655MB [2024-08-04 07:05:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.083) Loss 1.0703 (0.8709) Acc@1 76.172 (82.436) Acc@5 94.482 (96.136) Mem 9655MB [2024-08-04 07:05:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.142 Acc@5 96.137 [2024-08-04 07:05:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.1% [2024-08-04 07:05:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.14% [2024-08-04 07:05:25 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 07:05:26 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 07:05:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.549 (0.549) Loss 0.5811 (0.5811) Acc@1 89.844 (89.844) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 07:05:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.101) Loss 0.9194 (0.7164) Acc@1 80.615 (86.257) Acc@5 96.094 (97.612) Mem 9655MB [2024-08-04 07:05:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.0430 (0.8419) Acc@1 76.025 (82.824) Acc@5 94.873 (96.308) Mem 9655MB [2024-08-04 07:05:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.536 Acc@5 96.299 [2024-08-04 07:05:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.5% [2024-08-04 07:05:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.54% [2024-08-04 07:05:28 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 07:05:28 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 07:05:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][0/625] eta 0:08:17 lr 0.000551 wd 0.0500 time 0.7966 (0.7966) data time 0.5453 (0.5453) model time 0.0000 (0.0000) loss 4.8358 (4.8358) grad_norm 3.3236 (3.3236) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:05:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][10/625] eta 0:03:06 lr 0.000551 wd 0.0500 time 0.2521 (0.3036) data time 0.0010 (0.0513) model time 0.0000 (0.0000) loss 6.2715 (5.6682) grad_norm 2.6760 (3.4638) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:05:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][20/625] eta 0:02:49 lr 0.000550 wd 0.0500 time 0.2570 (0.2807) data time 0.0006 (0.0273) model time 0.0000 (0.0000) loss 6.0892 (5.9205) grad_norm 2.4299 (3.0989) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:05:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][30/625] eta 0:02:44 lr 0.000550 wd 0.0500 time 0.3740 (0.2762) data time 0.0006 (0.0188) model time 0.0000 (0.0000) loss 7.0178 (6.0227) grad_norm 1.8960 (2.7190) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:05:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][40/625] eta 0:02:38 lr 0.000550 wd 0.0500 time 0.2570 (0.2709) data time 0.0005 (0.0144) model time 0.0000 (0.0000) loss 5.2436 (5.9181) grad_norm 2.7603 (2.5656) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:05:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][50/625] eta 0:02:34 lr 0.000550 wd 0.0500 time 0.2512 (0.2679) data time 0.0009 (0.0118) model time 0.0000 (0.0000) loss 5.6082 (5.8516) grad_norm 1.8765 (2.4142) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:05:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][60/625] eta 0:02:30 lr 0.000550 wd 0.0500 time 0.2580 (0.2658) data time 
0.0008 (0.0100) model time 0.2572 (0.2541) loss 5.9266 (5.8689) grad_norm 2.2068 (2.3863) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:05:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][70/625] eta 0:02:26 lr 0.000550 wd 0.0500 time 0.2513 (0.2642) data time 0.0011 (0.0087) model time 0.2503 (0.2540) loss 6.5641 (5.9008) grad_norm 2.3745 (2.4171) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:05:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][80/625] eta 0:02:23 lr 0.000549 wd 0.0500 time 0.2494 (0.2631) data time 0.0008 (0.0077) model time 0.2486 (0.2541) loss 5.6163 (5.8782) grad_norm 3.6895 (2.4442) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:05:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][90/625] eta 0:02:20 lr 0.000549 wd 0.0500 time 0.2520 (0.2624) data time 0.0007 (0.0070) model time 0.2513 (0.2544) loss 6.4800 (5.9279) grad_norm 2.8509 (2.3819) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:05:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][100/625] eta 0:02:17 lr 0.000549 wd 0.0500 time 0.2573 (0.2617) data time 0.0007 (0.0064) model time 0.2566 (0.2545) loss 5.7262 (5.9636) grad_norm 1.4315 (2.3368) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:05:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][110/625] eta 0:02:16 lr 0.000549 wd 0.0500 time 0.2507 (0.2660) data time 0.0010 (0.0059) model time 0.2497 (0.2635) loss 6.0006 (5.9749) grad_norm 1.4324 (2.3353) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:06:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][120/625] eta 0:02:14 lr 0.000549 wd 0.0500 time 0.2575 (0.2667) data time 0.0008 (0.0055) model time 0.2567 (0.2650) loss 5.9644 (5.9244) grad_norm 2.2007 (2.3085) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:06:03 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][130/625] eta 0:02:11 lr 0.000549 wd 0.0500 time 0.2561 (0.2659) data time 0.0007 (0.0051) model time 0.2554 (0.2637) loss 5.4504 (5.9258) grad_norm 3.3063 (2.3020) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:06:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][140/625] eta 0:02:09 lr 0.000548 wd 0.0500 time 0.2567 (0.2664) data time 0.0010 (0.0048) model time 0.2557 (0.2647) loss 5.3619 (5.9310) grad_norm 1.7031 (2.3341) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:06:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][150/625] eta 0:02:06 lr 0.000548 wd 0.0500 time 0.2541 (0.2670) data time 0.0009 (0.0046) model time 0.2532 (0.2657) loss 5.2778 (5.9345) grad_norm 1.3590 (2.3168) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:06:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][160/625] eta 0:02:03 lr 0.000548 wd 0.0500 time 0.2553 (0.2663) data time 0.0013 (0.0043) model time 0.2540 (0.2646) loss 5.8118 (5.9446) grad_norm 1.8530 (2.2913) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:06:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][170/625] eta 0:02:00 lr 0.000548 wd 0.0500 time 0.2561 (0.2657) data time 0.0009 (0.0041) model time 0.2552 (0.2639) loss 5.6418 (5.9363) grad_norm 2.3850 (2.2945) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:06:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][180/625] eta 0:01:58 lr 0.000548 wd 0.0500 time 0.2575 (0.2652) data time 0.0006 (0.0040) model time 0.2569 (0.2632) loss 5.7482 (5.9396) grad_norm 2.2171 (2.3354) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:06:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][190/625] eta 0:01:55 lr 0.000548 wd 0.0500 time 0.3884 (0.2654) data time 0.0008 (0.0038) 
model time 0.3875 (0.2636) loss 5.1347 (5.9294) grad_norm 2.2631 (2.3354) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:06:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][200/625] eta 0:01:52 lr 0.000548 wd 0.0500 time 0.2547 (0.2649) data time 0.0010 (0.0037) model time 0.2536 (0.2630) loss 6.8439 (5.9354) grad_norm 1.6035 (2.3079) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:06:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][210/625] eta 0:01:50 lr 0.000547 wd 0.0500 time 0.2547 (0.2664) data time 0.0007 (0.0035) model time 0.2540 (0.2650) loss 6.8191 (5.9455) grad_norm 2.0387 (2.2894) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:06:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][220/625] eta 0:01:47 lr 0.000547 wd 0.0500 time 0.2581 (0.2660) data time 0.0007 (0.0034) model time 0.2574 (0.2645) loss 5.1905 (5.9366) grad_norm 2.5240 (2.2860) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:06:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][230/625] eta 0:01:45 lr 0.000547 wd 0.0500 time 0.2590 (0.2661) data time 0.0009 (0.0033) model time 0.2581 (0.2647) loss 4.7477 (5.9339) grad_norm 1.6635 (2.2837) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:06:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][240/625] eta 0:01:42 lr 0.000547 wd 0.0500 time 0.2575 (0.2657) data time 0.0010 (0.0032) model time 0.2565 (0.2642) loss 6.1552 (5.9276) grad_norm 2.2001 (2.2872) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:06:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][250/625] eta 0:01:39 lr 0.000547 wd 0.0500 time 0.2532 (0.2653) data time 0.0007 (0.0031) model time 0.2524 (0.2637) loss 5.9296 (5.9157) grad_norm 2.2212 (2.2839) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:06:37 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [203/300][260/625] eta 0:01:36 lr 0.000547 wd 0.0500 time 0.2516 (0.2650) data time 0.0009 (0.0030) model time 0.2507 (0.2633) loss 5.9788 (5.9198) grad_norm 2.9910 (2.2750) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:06:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][270/625] eta 0:01:33 lr 0.000546 wd 0.0500 time 0.2546 (0.2646) data time 0.0008 (0.0030) model time 0.2538 (0.2630) loss 5.8615 (5.9275) grad_norm 2.0508 (2.2847) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:06:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][280/625] eta 0:01:31 lr 0.000546 wd 0.0500 time 0.2538 (0.2643) data time 0.0009 (0.0029) model time 0.2529 (0.2626) loss 5.7879 (5.9257) grad_norm 2.3141 (2.2992) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:06:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][290/625] eta 0:01:28 lr 0.000546 wd 0.0500 time 0.2575 (0.2640) data time 0.0009 (0.0028) model time 0.2566 (0.2623) loss 6.5715 (5.9310) grad_norm 1.2740 (2.2914) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:06:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][300/625] eta 0:01:25 lr 0.000546 wd 0.0500 time 0.2563 (0.2637) data time 0.0007 (0.0028) model time 0.2556 (0.2619) loss 4.9085 (5.9246) grad_norm 2.1233 (2.3442) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:06:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][310/625] eta 0:01:22 lr 0.000546 wd 0.0500 time 0.2538 (0.2635) data time 0.0009 (0.0027) model time 0.2529 (0.2617) loss 5.0369 (5.9176) grad_norm 3.6222 (2.3536) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:06:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][320/625] eta 0:01:20 lr 0.000546 wd 0.0500 time 0.2512 (0.2632) data time 0.0011 (0.0026) model time 0.2501 (0.2614) 
loss 6.0724 (5.9220) grad_norm 2.2913 (2.3498) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:06:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][330/625] eta 0:01:17 lr 0.000545 wd 0.0500 time 0.2550 (0.2636) data time 0.0007 (0.0026) model time 0.2542 (0.2618) loss 4.6526 (5.9101) grad_norm 3.2197 (2.3588) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:06:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][340/625] eta 0:01:15 lr 0.000545 wd 0.0500 time 0.2582 (0.2633) data time 0.0008 (0.0025) model time 0.2574 (0.2616) loss 6.4096 (5.9191) grad_norm 2.3527 (2.3856) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:07:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][350/625] eta 0:01:12 lr 0.000545 wd 0.0500 time 0.2573 (0.2631) data time 0.0008 (0.0025) model time 0.2565 (0.2614) loss 5.8522 (5.9101) grad_norm 1.6249 (2.3687) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:07:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][360/625] eta 0:01:09 lr 0.000545 wd 0.0500 time 0.2589 (0.2635) data time 0.0006 (0.0025) model time 0.2582 (0.2619) loss 5.2511 (5.8987) grad_norm 1.5211 (2.3615) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:07:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][370/625] eta 0:01:07 lr 0.000545 wd 0.0500 time 0.2532 (0.2633) data time 0.0007 (0.0024) model time 0.2525 (0.2616) loss 5.2114 (5.8983) grad_norm 2.4144 (2.3645) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:07:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][380/625] eta 0:01:04 lr 0.000545 wd 0.0500 time 0.2564 (0.2631) data time 0.0008 (0.0024) model time 0.2556 (0.2614) loss 5.3105 (5.9044) grad_norm 2.0207 (2.3633) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:07:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [203/300][390/625] eta 0:01:01 lr 0.000545 wd 0.0500 time 0.2565 (0.2629) data time 0.0007 (0.0023) model time 0.2558 (0.2612) loss 5.9281 (5.9001) grad_norm 2.3627 (2.3553) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:07:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][400/625] eta 0:00:59 lr 0.000544 wd 0.0500 time 0.2529 (0.2631) data time 0.0010 (0.0023) model time 0.2519 (0.2614) loss 6.2212 (5.8920) grad_norm 1.8367 (2.3462) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:07:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][410/625] eta 0:00:56 lr 0.000544 wd 0.0500 time 0.2554 (0.2629) data time 0.0009 (0.0023) model time 0.2545 (0.2613) loss 6.7902 (5.8923) grad_norm 2.3119 (2.3353) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:07:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][420/625] eta 0:00:53 lr 0.000544 wd 0.0500 time 0.2557 (0.2627) data time 0.0008 (0.0022) model time 0.2548 (0.2611) loss 5.4611 (5.8829) grad_norm 2.3173 (2.3476) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:07:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][430/625] eta 0:00:51 lr 0.000544 wd 0.0500 time 0.2545 (0.2626) data time 0.0007 (0.0022) model time 0.2538 (0.2609) loss 5.2721 (5.8851) grad_norm 1.5653 (2.3505) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:07:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][440/625] eta 0:00:48 lr 0.000544 wd 0.0500 time 0.2596 (0.2624) data time 0.0009 (0.0022) model time 0.2587 (0.2608) loss 6.4198 (5.8775) grad_norm 1.7215 (2.3441) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:07:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][450/625] eta 0:00:46 lr 0.000544 wd 0.0500 time 0.2570 (0.2631) data time 0.0009 (0.0021) model time 0.2562 (0.2616) loss 6.9085 (5.8836) grad_norm 
2.9677 (2.3450) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:07:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][460/625] eta 0:00:43 lr 0.000543 wd 0.0500 time 0.2588 (0.2630) data time 0.0007 (0.0021) model time 0.2581 (0.2615) loss 5.5998 (5.8825) grad_norm 2.0350 (2.3462) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:07:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][470/625] eta 0:00:40 lr 0.000543 wd 0.0500 time 0.2584 (0.2628) data time 0.0006 (0.0021) model time 0.2578 (0.2613) loss 5.6717 (5.8766) grad_norm 2.2140 (2.3405) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:07:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][480/625] eta 0:00:38 lr 0.000543 wd 0.0500 time 0.2592 (0.2630) data time 0.0006 (0.0021) model time 0.2586 (0.2615) loss 4.8465 (5.8749) grad_norm 2.0321 (2.3315) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:07:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][490/625] eta 0:00:35 lr 0.000543 wd 0.0500 time 0.2553 (0.2629) data time 0.0006 (0.0020) model time 0.2546 (0.2614) loss 5.7314 (5.8699) grad_norm 1.9524 (2.3265) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:07:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][500/625] eta 0:00:32 lr 0.000543 wd 0.0500 time 0.2582 (0.2627) data time 0.0007 (0.0020) model time 0.2575 (0.2612) loss 6.7076 (5.8665) grad_norm 1.4363 (2.3642) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:07:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][510/625] eta 0:00:30 lr 0.000543 wd 0.0500 time 0.2572 (0.2626) data time 0.0006 (0.0020) model time 0.2566 (0.2611) loss 6.4270 (5.8697) grad_norm 1.7414 (2.3600) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:07:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][520/625] eta 
0:00:27 lr 0.000543 wd 0.0500 time 0.2571 (0.2625) data time 0.0008 (0.0020) model time 0.2563 (0.2610) loss 6.4228 (5.8720) grad_norm 2.6536 (2.3570) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:07:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][530/625] eta 0:00:24 lr 0.000542 wd 0.0500 time 0.2560 (0.2624) data time 0.0007 (0.0020) model time 0.2554 (0.2609) loss 6.3152 (5.8697) grad_norm 3.2431 (2.3537) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:07:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][540/625] eta 0:00:22 lr 0.000542 wd 0.0500 time 0.2536 (0.2622) data time 0.0011 (0.0019) model time 0.2525 (0.2607) loss 4.8340 (5.8716) grad_norm 4.2059 (2.4069) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:07:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][550/625] eta 0:00:19 lr 0.000542 wd 0.0500 time 0.2531 (0.2621) data time 0.0007 (0.0019) model time 0.2523 (0.2606) loss 5.3550 (5.8729) grad_norm 2.1251 (2.4123) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:07:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][560/625] eta 0:00:17 lr 0.000542 wd 0.0500 time 0.2535 (0.2620) data time 0.0007 (0.0019) model time 0.2528 (0.2605) loss 6.5850 (5.8825) grad_norm 1.4917 (2.4023) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:07:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][570/625] eta 0:00:14 lr 0.000542 wd 0.0500 time 0.2554 (0.2619) data time 0.0010 (0.0019) model time 0.2544 (0.2604) loss 5.9901 (5.8791) grad_norm 2.1758 (2.3974) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:08:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][580/625] eta 0:00:11 lr 0.000542 wd 0.0500 time 0.2513 (0.2618) data time 0.0009 (0.0019) model time 0.2504 (0.2603) loss 6.5629 (5.8788) grad_norm 2.3364 (2.3920) loss_scale 
512.0000 (512.0000) mem 9655MB [2024-08-04 07:08:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][590/625] eta 0:00:09 lr 0.000541 wd 0.0500 time 0.2537 (0.2617) data time 0.0008 (0.0018) model time 0.2529 (0.2602) loss 5.3717 (5.8718) grad_norm 2.1726 (2.3825) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:08:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][600/625] eta 0:00:06 lr 0.000541 wd 0.0500 time 0.2566 (0.2618) data time 0.0009 (0.0018) model time 0.2557 (0.2604) loss 5.7721 (5.8744) grad_norm 1.5476 (2.3757) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:08:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][610/625] eta 0:00:03 lr 0.000541 wd 0.0500 time 0.2533 (0.2621) data time 0.0004 (0.0018) model time 0.2529 (0.2606) loss 6.2758 (5.8690) grad_norm 2.7527 (2.3785) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:08:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][620/625] eta 0:00:01 lr 0.000541 wd 0.0500 time 0.2528 (0.2619) data time 0.0004 (0.0018) model time 0.2524 (0.2605) loss 5.7205 (5.8701) grad_norm 1.9305 (2.3716) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:08:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 203 training takes 0:02:43 [2024-08-04 07:08:12 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 07:08:12 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 07:08:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.532 (0.532) Loss 0.6240 (0.6240) Acc@1 89.600 (89.600) Acc@5 98.438 (98.438) Mem 9655MB [2024-08-04 07:08:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.101) Loss 0.9585 (0.7561) Acc@1 79.590 (85.662) Acc@5 95.898 (97.505) Mem 9655MB [2024-08-04 07:08:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.0732 (0.8817) Acc@1 76.562 (82.334) Acc@5 94.629 (96.215) Mem 9655MB [2024-08-04 07:08:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.996 Acc@5 96.195 [2024-08-04 07:08:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.0% [2024-08-04 07:08:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.928 (0.928) Loss 0.5811 (0.5811) Acc@1 89.844 (89.844) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 07:08:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.137) Loss 0.9189 (0.7164) Acc@1 80.615 (86.248) Acc@5 96.191 (97.638) Mem 9655MB [2024-08-04 07:08:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.098) Loss 1.0420 (0.8417) Acc@1 75.977 (82.833) Acc@5 94.873 (96.329) Mem 9655MB [2024-08-04 07:08:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.546 Acc@5 96.319 [2024-08-04 07:08:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.5% [2024-08-04 07:08:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.55% [2024-08-04 07:08:16 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 07:08:17 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 07:08:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][0/625] eta 0:08:12 lr 0.000541 wd 0.0500 time 0.7873 (0.7873) data time 0.5397 (0.5397) model time 0.0000 (0.0000) loss 5.7327 (5.7327) grad_norm 1.6594 (1.6594) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:08:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][10/625] eta 0:03:07 lr 0.000541 wd 0.0500 time 0.2560 (0.3046) data time 0.0007 (0.0499) model time 0.0000 (0.0000) loss 5.1333 (6.0982) grad_norm 1.2948 (1.9489) loss_scale 1024.0000 (651.6364) mem 9655MB [2024-08-04 07:08:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][20/625] eta 0:02:50 lr 0.000541 wd 0.0500 time 0.2591 (0.2818) data time 0.0008 (0.0265) model time 0.0000 (0.0000) loss 6.1654 (6.0279) grad_norm 3.9532 (2.4392) loss_scale 1024.0000 (828.9524) mem 9655MB [2024-08-04 07:08:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][30/625] eta 0:02:42 lr 0.000540 wd 0.0500 time 0.2692 (0.2739) data time 0.0010 (0.0183) model time 0.0000 (0.0000) loss 6.4841 (5.9273) grad_norm 1.7631 (2.4524) loss_scale 1024.0000 (891.8710) mem 9655MB [2024-08-04 07:08:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][40/625] eta 0:02:40 lr 0.000540 wd 0.0500 time 0.2531 (0.2743) data time 0.0008 (0.0140) model time 0.0000 (0.0000) loss 6.3365 (5.8857) grad_norm 2.8279 (2.4234) loss_scale 1024.0000 (924.0976) mem 9655MB [2024-08-04 07:08:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][50/625] eta 0:02:35 lr 0.000540 wd 0.0500 time 0.2565 (0.2706) data time 0.0011 (0.0114) model time 0.0000 (0.0000) loss 5.2819 (5.8143) grad_norm 1.7688 (2.3255) loss_scale 1024.0000 (943.6863) mem 9655MB 
[2024-08-04 07:08:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][60/625] eta 0:02:31 lr 0.000540 wd 0.0500 time 0.2574 (0.2683) data time 0.0005 (0.0097) model time 0.2569 (0.2554) loss 6.5925 (5.8277) grad_norm 1.7382 (2.2207) loss_scale 1024.0000 (956.8525) mem 9655MB [2024-08-04 07:08:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][70/625] eta 0:02:28 lr 0.000540 wd 0.0500 time 0.2547 (0.2668) data time 0.0009 (0.0085) model time 0.2538 (0.2560) loss 6.0873 (5.8508) grad_norm 1.5296 (2.1652) loss_scale 1024.0000 (966.3099) mem 9655MB [2024-08-04 07:08:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][80/625] eta 0:02:24 lr 0.000540 wd 0.0500 time 0.2566 (0.2655) data time 0.0009 (0.0076) model time 0.2557 (0.2559) loss 5.4886 (5.8356) grad_norm 2.1635 (2.1315) loss_scale 1024.0000 (973.4321) mem 9655MB [2024-08-04 07:08:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][90/625] eta 0:02:21 lr 0.000539 wd 0.0500 time 0.2606 (0.2645) data time 0.0008 (0.0068) model time 0.2597 (0.2558) loss 6.5985 (5.8558) grad_norm 1.9540 (2.1109) loss_scale 1024.0000 (978.9890) mem 9655MB [2024-08-04 07:08:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][100/625] eta 0:02:18 lr 0.000539 wd 0.0500 time 0.2556 (0.2637) data time 0.0009 (0.0063) model time 0.2547 (0.2556) loss 4.8775 (5.7979) grad_norm 2.9747 (2.1532) loss_scale 1024.0000 (983.4455) mem 9655MB [2024-08-04 07:08:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][110/625] eta 0:02:15 lr 0.000539 wd 0.0500 time 0.2536 (0.2629) data time 0.0007 (0.0058) model time 0.2528 (0.2554) loss 5.5195 (5.7863) grad_norm 2.4646 (2.1905) loss_scale 1024.0000 (987.0991) mem 9655MB [2024-08-04 07:08:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][120/625] eta 0:02:12 lr 0.000539 wd 0.0500 time 0.2544 (0.2623) data 
time 0.0007 (0.0054) model time 0.2537 (0.2553) loss 5.8602 (5.8204) grad_norm 1.7144 (2.1982) loss_scale 1024.0000 (990.1488) mem 9655MB [2024-08-04 07:08:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][130/625] eta 0:02:09 lr 0.000539 wd 0.0500 time 0.2537 (0.2618) data time 0.0010 (0.0050) model time 0.2527 (0.2552) loss 6.1340 (5.8022) grad_norm 2.1105 (2.2525) loss_scale 1024.0000 (992.7328) mem 9655MB [2024-08-04 07:08:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][140/625] eta 0:02:06 lr 0.000539 wd 0.0500 time 0.2539 (0.2613) data time 0.0007 (0.0047) model time 0.2531 (0.2551) loss 5.3417 (5.7971) grad_norm 1.7957 (2.2301) loss_scale 1024.0000 (994.9504) mem 9655MB [2024-08-04 07:08:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][150/625] eta 0:02:03 lr 0.000539 wd 0.0500 time 0.2548 (0.2609) data time 0.0009 (0.0045) model time 0.2539 (0.2550) loss 4.4646 (5.7807) grad_norm 2.9687 (2.2947) loss_scale 1024.0000 (996.8742) mem 9655MB [2024-08-04 07:08:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][160/625] eta 0:02:01 lr 0.000538 wd 0.0500 time 0.2554 (0.2606) data time 0.0010 (0.0043) model time 0.2545 (0.2550) loss 6.2254 (5.7746) grad_norm 4.0269 (2.3334) loss_scale 1024.0000 (998.5590) mem 9655MB [2024-08-04 07:09:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][170/625] eta 0:01:58 lr 0.000538 wd 0.0500 time 0.2638 (0.2603) data time 0.0007 (0.0041) model time 0.2631 (0.2550) loss 5.8606 (5.7809) grad_norm 3.3755 (2.3364) loss_scale 1024.0000 (1000.0468) mem 9655MB [2024-08-04 07:09:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][180/625] eta 0:01:56 lr 0.000538 wd 0.0500 time 0.2550 (0.2608) data time 0.0008 (0.0039) model time 0.2542 (0.2560) loss 5.3515 (5.7916) grad_norm 4.8682 (2.4858) loss_scale 1024.0000 (1001.3702) mem 9655MB [2024-08-04 07:09:07 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][190/625] eta 0:01:53 lr 0.000538 wd 0.0500 time 0.2560 (0.2613) data time 0.0009 (0.0038) model time 0.2551 (0.2569) loss 4.4338 (5.7762) grad_norm 2.2827 (2.5366) loss_scale 1024.0000 (1002.5550) mem 9655MB [2024-08-04 07:09:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][200/625] eta 0:01:51 lr 0.000538 wd 0.0500 time 0.2555 (0.2620) data time 0.0008 (0.0036) model time 0.2547 (0.2581) loss 5.0389 (5.7764) grad_norm 2.5395 (2.5416) loss_scale 1024.0000 (1003.6219) mem 9655MB [2024-08-04 07:09:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][210/625] eta 0:01:48 lr 0.000538 wd 0.0500 time 0.2561 (0.2623) data time 0.0008 (0.0035) model time 0.2553 (0.2588) loss 4.6190 (5.7630) grad_norm 2.1951 (2.5322) loss_scale 1024.0000 (1004.5877) mem 9655MB [2024-08-04 07:09:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][220/625] eta 0:01:46 lr 0.000537 wd 0.0500 time 0.2531 (0.2620) data time 0.0009 (0.0034) model time 0.2523 (0.2585) loss 6.3438 (5.7637) grad_norm 2.1180 (2.5193) loss_scale 1024.0000 (1005.4661) mem 9655MB [2024-08-04 07:09:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][230/625] eta 0:01:43 lr 0.000537 wd 0.0500 time 0.2571 (0.2618) data time 0.0008 (0.0033) model time 0.2563 (0.2584) loss 6.5143 (5.7693) grad_norm 1.8937 (2.4941) loss_scale 1024.0000 (1006.2684) mem 9655MB [2024-08-04 07:09:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][240/625] eta 0:01:40 lr 0.000537 wd 0.0500 time 0.2543 (0.2615) data time 0.0006 (0.0032) model time 0.2537 (0.2581) loss 6.9634 (5.7783) grad_norm 2.0039 (2.5229) loss_scale 1024.0000 (1007.0041) mem 9655MB [2024-08-04 07:09:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][250/625] eta 0:01:38 lr 0.000537 wd 0.0500 time 0.2525 (0.2613) data time 
0.0010 (0.0031) model time 0.2515 (0.2580) loss 5.5230 (5.7772) grad_norm 1.5536 (2.4981) loss_scale 1024.0000 (1007.6813) mem 9655MB [2024-08-04 07:09:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][260/625] eta 0:01:35 lr 0.000537 wd 0.0500 time 0.2550 (0.2619) data time 0.0010 (0.0030) model time 0.2540 (0.2588) loss 6.2538 (5.7914) grad_norm 2.3828 (2.4866) loss_scale 1024.0000 (1008.3065) mem 9655MB [2024-08-04 07:09:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][270/625] eta 0:01:33 lr 0.000537 wd 0.0500 time 0.2606 (0.2623) data time 0.0007 (0.0029) model time 0.2600 (0.2594) loss 5.2240 (5.7842) grad_norm 2.4071 (2.4913) loss_scale 1024.0000 (1008.8856) mem 9655MB [2024-08-04 07:09:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][280/625] eta 0:01:30 lr 0.000536 wd 0.0500 time 0.2671 (0.2621) data time 0.0008 (0.0029) model time 0.2663 (0.2593) loss 6.2092 (5.7945) grad_norm 2.3817 (2.4768) loss_scale 1024.0000 (1009.4235) mem 9655MB [2024-08-04 07:09:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][290/625] eta 0:01:27 lr 0.000536 wd 0.0500 time 0.2557 (0.2619) data time 0.0009 (0.0028) model time 0.2548 (0.2591) loss 5.6856 (5.8060) grad_norm 1.3519 (2.4617) loss_scale 1024.0000 (1009.9244) mem 9655MB [2024-08-04 07:09:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][300/625] eta 0:01:25 lr 0.000536 wd 0.0500 time 0.2557 (0.2617) data time 0.0010 (0.0027) model time 0.2547 (0.2590) loss 4.7888 (5.8053) grad_norm 1.7761 (2.4493) loss_scale 1024.0000 (1010.3920) mem 9655MB [2024-08-04 07:09:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][310/625] eta 0:01:22 lr 0.000536 wd 0.0500 time 0.2576 (0.2616) data time 0.0010 (0.0027) model time 0.2566 (0.2589) loss 4.6495 (5.8083) grad_norm 3.2516 (2.4336) loss_scale 1024.0000 (1010.8296) mem 9655MB [2024-08-04 07:09:41 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][320/625] eta 0:01:19 lr 0.000536 wd 0.0500 time 0.3849 (0.2618) data time 0.0011 (0.0026) model time 0.3838 (0.2592) loss 5.5696 (5.8148) grad_norm 1.5311 (2.4376) loss_scale 1024.0000 (1011.2399) mem 9655MB [2024-08-04 07:09:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][330/625] eta 0:01:17 lr 0.000536 wd 0.0500 time 0.2524 (0.2616) data time 0.0009 (0.0026) model time 0.2515 (0.2590) loss 6.5370 (5.8228) grad_norm 1.7364 (2.4375) loss_scale 1024.0000 (1011.6254) mem 9655MB [2024-08-04 07:09:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][340/625] eta 0:01:14 lr 0.000536 wd 0.0500 time 0.2550 (0.2614) data time 0.0012 (0.0025) model time 0.2538 (0.2589) loss 6.4394 (5.8214) grad_norm 1.8789 (2.4418) loss_scale 1024.0000 (1011.9883) mem 9655MB [2024-08-04 07:09:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][350/625] eta 0:01:12 lr 0.000535 wd 0.0500 time 0.2580 (0.2618) data time 0.0006 (0.0025) model time 0.2574 (0.2594) loss 7.1707 (5.8252) grad_norm 3.3005 (2.4443) loss_scale 1024.0000 (1012.3305) mem 9655MB [2024-08-04 07:09:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][360/625] eta 0:01:09 lr 0.000535 wd 0.0500 time 0.2518 (0.2616) data time 0.0008 (0.0024) model time 0.2509 (0.2592) loss 6.4420 (5.8310) grad_norm 3.0859 (2.5291) loss_scale 1024.0000 (1012.6537) mem 9655MB [2024-08-04 07:09:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][370/625] eta 0:01:06 lr 0.000535 wd 0.0500 time 0.2532 (0.2615) data time 0.0009 (0.0024) model time 0.2524 (0.2591) loss 6.5536 (5.8380) grad_norm 2.8583 (2.5488) loss_scale 1024.0000 (1012.9596) mem 9655MB [2024-08-04 07:09:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][380/625] eta 0:01:04 lr 0.000535 wd 0.0500 time 0.2555 (0.2613) data time 
0.0011 (0.0024) model time 0.2544 (0.2590) loss 6.1051 (5.8421) grad_norm 1.8515 (2.5535) loss_scale 1024.0000 (1013.2493) mem 9655MB
[2024-08-04 07:09:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][390/625] eta 0:01:01 lr 0.000535 wd 0.0500 time 0.2553 (0.2617) data time 0.0009 (0.0023) model time 0.2544 (0.2594) loss 4.7540 (5.8399) grad_norm 1.3739 (2.5602) loss_scale 1024.0000 (1013.5243) mem 9655MB
[2024-08-04 07:10:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][400/625] eta 0:00:58 lr 0.000535 wd 0.0500 time 0.2595 (0.2616) data time 0.0009 (0.0023) model time 0.2586 (0.2593) loss 5.6779 (5.8446) grad_norm 1.8427 (2.5395) loss_scale 1024.0000 (1013.7855) mem 9655MB
[2024-08-04 07:10:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][410/625] eta 0:00:56 lr 0.000534 wd 0.0500 time 0.4661 (0.2620) data time 0.0010 (0.0023) model time 0.4651 (0.2598) loss 5.5254 (5.8398) grad_norm 1.3428 (2.5246) loss_scale 1024.0000 (1014.0341) mem 9655MB
[2024-08-04 07:10:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][420/625] eta 0:00:53 lr 0.000534 wd 0.0500 time 0.4778 (0.2629) data time 0.0008 (0.0022) model time 0.4769 (0.2609) loss 6.7690 (5.8460) grad_norm 3.5795 (2.5224) loss_scale 1024.0000 (1014.2708) mem 9655MB
[2024-08-04 07:10:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][430/625] eta 0:00:51 lr 0.000534 wd 0.0500 time 0.2501 (0.2628) data time 0.0010 (0.0022) model time 0.2491 (0.2608) loss 5.4174 (5.8444) grad_norm 1.8281 (2.5304) loss_scale 1024.0000 (1014.4965) mem 9655MB
[2024-08-04 07:10:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][440/625] eta 0:00:48 lr 0.000534 wd 0.0500 time 0.2546 (0.2626) data time 0.0008 (0.0022) model time 0.2538 (0.2606) loss 5.2065 (5.8446) grad_norm 1.4645 (2.5206) loss_scale 1024.0000 (1014.7120) mem 9655MB
[2024-08-04 07:10:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][450/625] eta 0:00:46 lr 0.000534 wd 0.0500 time 0.2583 (0.2629) data time 0.0006 (0.0021) model time 0.2578 (0.2609) loss 6.1594 (5.8437) grad_norm 1.3476 (2.5081) loss_scale 1024.0000 (1014.9180) mem 9655MB
[2024-08-04 07:10:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][460/625] eta 0:00:43 lr 0.000534 wd 0.0500 time 0.2579 (0.2627) data time 0.0009 (0.0021) model time 0.2570 (0.2608) loss 6.3996 (5.8441) grad_norm 1.4778 (2.5000) loss_scale 1024.0000 (1015.1150) mem 9655MB
[2024-08-04 07:10:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][470/625] eta 0:00:40 lr 0.000534 wd 0.0500 time 0.2538 (0.2626) data time 0.0007 (0.0021) model time 0.2531 (0.2606) loss 5.9252 (5.8482) grad_norm 2.4610 (2.4905) loss_scale 1024.0000 (1015.3036) mem 9655MB
[2024-08-04 07:10:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][480/625] eta 0:00:38 lr 0.000533 wd 0.0500 time 0.2537 (0.2624) data time 0.0007 (0.0021) model time 0.2530 (0.2605) loss 5.1184 (5.8441) grad_norm 1.6428 (2.4747) loss_scale 1024.0000 (1015.4844) mem 9655MB
[2024-08-04 07:10:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][490/625] eta 0:00:35 lr 0.000533 wd 0.0500 time 0.2546 (0.2623) data time 0.0009 (0.0020) model time 0.2538 (0.2604) loss 5.2732 (5.8427) grad_norm 4.4647 (2.4762) loss_scale 1024.0000 (1015.6578) mem 9655MB
[2024-08-04 07:10:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][500/625] eta 0:00:32 lr 0.000533 wd 0.0500 time 0.2566 (0.2622) data time 0.0009 (0.0020) model time 0.2557 (0.2602) loss 6.8575 (5.8471) grad_norm 3.8954 (2.4807) loss_scale 1024.0000 (1015.8244) mem 9655MB
[2024-08-04 07:10:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][510/625] eta 0:00:30 lr 0.000533 wd 0.0500 time 0.2592 (0.2620) data time 0.0006 (0.0020) model time 0.2586 (0.2601) loss 4.1015 (5.8446) grad_norm 2.2586 (2.4759) loss_scale 1024.0000 (1015.9843) mem 9655MB
[2024-08-04 07:10:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][520/625] eta 0:00:27 lr 0.000533 wd 0.0500 time 0.2641 (0.2619) data time 0.0008 (0.0020) model time 0.2632 (0.2600) loss 4.9101 (5.8488) grad_norm 2.8319 (2.4786) loss_scale 1024.0000 (1016.1382) mem 9655MB
[2024-08-04 07:10:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][530/625] eta 0:00:24 lr 0.000533 wd 0.0500 time 0.2597 (0.2618) data time 0.0011 (0.0020) model time 0.2585 (0.2599) loss 6.4994 (5.8486) grad_norm 2.0603 (2.4806) loss_scale 1024.0000 (1016.2863) mem 9655MB
[2024-08-04 07:10:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][540/625] eta 0:00:22 lr 0.000532 wd 0.0500 time 0.2555 (0.2617) data time 0.0008 (0.0019) model time 0.2547 (0.2599) loss 5.6795 (5.8497) grad_norm 1.8203 (2.4735) loss_scale 1024.0000 (1016.4288) mem 9655MB
[2024-08-04 07:10:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][550/625] eta 0:00:19 lr 0.000532 wd 0.0500 time 0.2556 (0.2617) data time 0.0015 (0.0019) model time 0.2541 (0.2598) loss 6.7382 (5.8519) grad_norm 1.9102 (2.4683) loss_scale 1024.0000 (1016.5662) mem 9655MB
[2024-08-04 07:10:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][560/625] eta 0:00:17 lr 0.000532 wd 0.0500 time 0.2562 (0.2616) data time 0.0010 (0.0019) model time 0.2552 (0.2597) loss 6.6312 (5.8503) grad_norm 1.7374 (2.4716) loss_scale 1024.0000 (1016.6988) mem 9655MB
[2024-08-04 07:10:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][570/625] eta 0:00:14 lr 0.000532 wd 0.0500 time 0.2576 (0.2615) data time 0.0009 (0.0019) model time 0.2567 (0.2596) loss 6.3828 (5.8493) grad_norm 3.1284 (2.4991) loss_scale 1024.0000 (1016.8266) mem 9655MB
[2024-08-04 07:10:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][580/625] eta 0:00:11 lr 0.000532 wd 0.0500 time 0.2556 (0.2617) data time 0.0009 (0.0019) model time 0.2547 (0.2599) loss 5.5839 (5.8513) grad_norm 3.0183 (2.4985) loss_scale 1024.0000 (1016.9501) mem 9655MB
[2024-08-04 07:10:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][590/625] eta 0:00:09 lr 0.000532 wd 0.0500 time 0.2529 (0.2616) data time 0.0011 (0.0019) model time 0.2518 (0.2598) loss 6.1609 (5.8609) grad_norm 1.6989 (2.4877) loss_scale 1024.0000 (1017.0694) mem 9655MB
[2024-08-04 07:10:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][600/625] eta 0:00:06 lr 0.000532 wd 0.0500 time 0.2521 (0.2615) data time 0.0006 (0.0019) model time 0.2515 (0.2597) loss 4.6220 (5.8632) grad_norm 1.7353 (2.4747) loss_scale 1024.0000 (1017.1847) mem 9655MB
[2024-08-04 07:10:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][610/625] eta 0:00:03 lr 0.000531 wd 0.0500 time 0.2535 (0.2614) data time 0.0006 (0.0018) model time 0.2529 (0.2596) loss 5.8154 (5.8611) grad_norm 3.1212 (2.4758) loss_scale 1024.0000 (1017.2962) mem 9655MB
[2024-08-04 07:10:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][620/625] eta 0:00:01 lr 0.000531 wd 0.0500 time 0.2534 (0.2613) data time 0.0003 (0.0018) model time 0.2531 (0.2595) loss 5.6776 (5.8601) grad_norm 3.5157 (2.4758) loss_scale 1024.0000 (1017.4042) mem 9655MB
[2024-08-04 07:11:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 204 training takes 0:02:43
[2024-08-04 07:11:00 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 07:11:01 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 07:11:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.552 (0.552) Loss 0.6226 (0.6226) Acc@1 89.160 (89.160) Acc@5 98.291 (98.291) Mem 9655MB
[2024-08-04 07:11:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.101) Loss 0.9600 (0.7507) Acc@1 79.688 (85.671) Acc@5 95.459 (97.421) Mem 9655MB
[2024-08-04 07:11:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.0938 (0.8807) Acc@1 75.537 (82.278) Acc@5 94.922 (96.101) Mem 9655MB
[2024-08-04 07:11:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.972 Acc@5 96.113
[2024-08-04 07:11:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.0%
[2024-08-04 07:11:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.845 (0.845) Loss 0.5815 (0.5815) Acc@1 89.844 (89.844) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 07:11:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.130) Loss 0.9175 (0.7158) Acc@1 80.664 (86.222) Acc@5 96.191 (97.647) Mem 9655MB
[2024-08-04 07:11:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.094) Loss 1.0430 (0.8413) Acc@1 76.074 (82.819) Acc@5 94.873 (96.338) Mem 9655MB
[2024-08-04 07:11:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.536 Acc@5 96.327
[2024-08-04 07:11:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.5%
[2024-08-04 07:11:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][0/625] eta 0:11:33 lr 0.000531 wd 0.0500 time 1.1098 (1.1098) data time 0.6260 (0.6260) model time 0.0000 (0.0000) loss 7.0227 (7.0227) grad_norm 2.2561 (2.2561) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:11:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][10/625] eta 0:03:34 lr 0.000531 wd 0.0500 time 0.2537 (0.3481) data time 0.0008 (0.0577) model time 0.0000 (0.0000) loss 4.5263 (5.8019) grad_norm 1.9133 (1.8693) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:11:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][20/625] eta 0:03:04 lr 0.000531 wd 0.0500 time 0.2596 (0.3044) data time 0.0008 (0.0307) model time 0.0000 (0.0000) loss 6.0025 (5.7322) grad_norm 3.8361 (2.2078) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:11:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][30/625] eta 0:02:55 lr 0.000531 wd 0.0500 time 0.2529 (0.2948) data time 0.0009 (0.0211) model time 0.0000 (0.0000) loss 6.6708 (5.7486) grad_norm 2.4535 (2.1570) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:11:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][40/625] eta 0:02:49 lr 0.000530 wd 0.0500 time 0.2545 (0.2900) data time 0.0009 (0.0162) model time 0.0000 (0.0000) loss 5.3286 (5.6677) grad_norm 2.0836 (2.2107) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:11:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][50/625] eta 0:02:42 lr 0.000530 wd 0.0500 time 0.2520 (0.2834) data time 0.0008 (0.0131) model time 0.0000 (0.0000) loss 5.4799 (5.7824) grad_norm 2.0940 (2.2552) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:11:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][60/625] eta 0:02:39 lr 0.000530 wd 0.0500 time 0.2640 (0.2815) data time 0.0005 (0.0111) model time 0.2635 (0.2705) loss 5.8548 (5.8231) grad_norm 2.2889 (2.2964) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:11:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][70/625] eta 0:02:34 lr 0.000530 wd 0.0500 time 0.2514 (0.2777) data time 0.0008 (0.0098) model time 0.2506 (0.2619) loss 5.5931 (5.7910) grad_norm 1.9403 (2.2556) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:11:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][80/625] eta 0:02:29 lr 0.000530 wd 0.0500 time 0.2536 (0.2750) data time 0.0009 (0.0087) model time 0.2526 (0.2595) loss 5.6710 (5.8162) grad_norm 1.4137 (inf) loss_scale 512.0000 (1011.3580) mem 9655MB
[2024-08-04 07:11:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][90/625] eta 0:02:26 lr 0.000530 wd 0.0500 time 0.2538 (0.2729) data time 0.0015 (0.0078) model time 0.2523 (0.2584) loss 5.6349 (5.8128) grad_norm 1.3927 (inf) loss_scale 512.0000 (956.4835) mem 9655MB
[2024-08-04 07:11:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][100/625] eta 0:02:22 lr 0.000530 wd 0.0500 time 0.2577 (0.2713) data time 0.0005 (0.0071) model time 0.2572 (0.2579) loss 5.4643 (5.8048) grad_norm 1.6100 (inf) loss_scale 512.0000 (912.4752) mem 9655MB
[2024-08-04 07:11:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][110/625] eta 0:02:20 lr 0.000529 wd 0.0500 time 0.2539 (0.2735) data time 0.0007 (0.0066) model time 0.2532 (0.2640) loss 5.5810 (5.7907) grad_norm 1.6904 (inf) loss_scale 512.0000 (876.3964) mem 9655MB
[2024-08-04 07:11:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][120/625] eta 0:02:17 lr 0.000529 wd 0.0500 time 0.2565 (0.2721) data time 0.0006 (0.0061) model time 0.2559 (0.2628) loss 5.5989 (5.8018) grad_norm 3.7929 (inf) loss_scale 512.0000 (846.2810) mem 9655MB
[2024-08-04 07:11:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][130/625] eta 0:02:14 lr 0.000529 wd 0.0500 time 0.2542 (0.2708) data time 0.0010 (0.0057) model time 0.2532 (0.2618) loss 5.0745 (5.7821) grad_norm 4.1565 (inf) loss_scale 512.0000 (820.7634) mem 9655MB
[2024-08-04 07:11:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][140/625] eta 0:02:12 lr 0.000529 wd 0.0500 time 0.2578 (0.2726) data time 0.0007 (0.0054) model time 0.2571 (0.2655) loss 5.9461 (5.7952) grad_norm 2.9957 (inf) loss_scale 512.0000 (798.8652) mem 9655MB
[2024-08-04 07:11:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][150/625] eta 0:02:08 lr 0.000529 wd 0.0500 time 0.2541 (0.2714) data time 0.0009 (0.0051) model time 0.2532 (0.2643) loss 4.5617 (5.8105) grad_norm 3.2450 (inf) loss_scale 512.0000 (779.8675) mem 9655MB
[2024-08-04 07:11:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][160/625] eta 0:02:05 lr 0.000529 wd 0.0500 time 0.2614 (0.2705) data time 0.0007 (0.0048) model time 0.2607 (0.2636) loss 6.4948 (5.8058) grad_norm 2.3011 (inf) loss_scale 512.0000 (763.2298) mem 9655MB
[2024-08-04 07:11:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][170/625] eta 0:02:03 lr 0.000528 wd 0.0500 time 0.2617 (0.2708) data time 0.0007 (0.0046) model time 0.2610 (0.2645) loss 5.4245 (5.8085) grad_norm 3.5607 (inf) loss_scale 512.0000 (748.5380) mem 9655MB
[2024-08-04 07:11:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][180/625] eta 0:02:00 lr 0.000528 wd 0.0500 time 0.2892 (0.2702) data time 0.0006 (0.0044) model time 0.2886 (0.2640) loss 5.2393 (5.8031) grad_norm 1.9980 (inf) loss_scale 512.0000 (735.4696) mem 9655MB
[2024-08-04 07:11:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][190/625] eta 0:01:57 lr 0.000528 wd 0.0500 time 0.2535 (0.2694) data time 0.0009 (0.0042) model time 0.2526 (0.2634) loss 6.3356 (5.7995) grad_norm 1.9326 (inf) loss_scale 512.0000 (723.7696) mem 9655MB
[2024-08-04 07:11:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][200/625] eta 0:01:54 lr 0.000528 wd 0.0500 time 0.2578 (0.2688) data time 0.0009 (0.0040) model time 0.2568 (0.2628) loss 5.1735 (5.7834) grad_norm 7.8052 (inf) loss_scale 512.0000 (713.2338) mem 9655MB
[2024-08-04 07:12:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][210/625] eta 0:01:51 lr 0.000528 wd 0.0500 time 0.2600 (0.2682) data time 0.0006 (0.0039) model time 0.2594 (0.2624) loss 5.5340 (5.7913) grad_norm 1.5931 (inf) loss_scale 512.0000 (703.6967) mem 9655MB
[2024-08-04 07:12:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][220/625] eta 0:01:48 lr 0.000528 wd 0.0500 time 0.2573 (0.2677) data time 0.0008 (0.0038) model time 0.2565 (0.2621) loss 6.2003 (5.7980) grad_norm 1.5360 (inf) loss_scale 512.0000 (695.0226) mem 9655MB
[2024-08-04 07:12:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][230/625] eta 0:01:45 lr 0.000528 wd 0.0500 time 0.2535 (0.2672) data time 0.0007 (0.0036) model time 0.2529 (0.2617) loss 5.7082 (5.7948) grad_norm 5.5838 (inf) loss_scale 512.0000 (687.0996) mem 9655MB
[2024-08-04 07:12:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][240/625] eta 0:01:42 lr 0.000527 wd 0.0500 time 0.2589 (0.2668) data time 0.0009 (0.0035) model time 0.2579 (0.2614) loss 6.6071 (5.7910) grad_norm 3.3049 (inf) loss_scale 512.0000 (679.8340) mem 9655MB
[2024-08-04 07:12:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][250/625] eta 0:01:39 lr 0.000527 wd 0.0500 time 0.2568 (0.2664) data time 0.0008 (0.0034) model time 0.2560 (0.2611) loss 6.5009 (5.7892) grad_norm 3.4697 (inf) loss_scale 512.0000 (673.1474) mem 9655MB
[2024-08-04 07:12:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][260/625] eta 0:01:37 lr 0.000527 wd 0.0500 time 0.2545 (0.2660) data time 0.0007 (0.0033) model time 0.2537 (0.2608) loss 5.2531 (5.7911) grad_norm 2.2892 (inf) loss_scale 512.0000 (666.9732) mem 9655MB
[2024-08-04 07:12:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][270/625] eta 0:01:34 lr 0.000527 wd 0.0500 time 0.2539 (0.2656) data time 0.0008 (0.0032) model time 0.2531 (0.2605) loss 6.2919 (5.7945) grad_norm 2.0858 (inf) loss_scale 512.0000 (661.2546) mem 9655MB
[2024-08-04 07:12:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][280/625] eta 0:01:31 lr 0.000527 wd 0.0500 time 0.2608 (0.2652) data time 0.0008 (0.0032) model time 0.2600 (0.2602) loss 7.2805 (5.8010) grad_norm 2.1371 (inf) loss_scale 512.0000 (655.9431) mem 9655MB
[2024-08-04 07:12:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][290/625] eta 0:01:28 lr 0.000527 wd 0.0500 time 0.2550 (0.2649) data time 0.0010 (0.0031) model time 0.2540 (0.2600) loss 6.3337 (5.7946) grad_norm 1.7614 (inf) loss_scale 512.0000 (650.9966) mem 9655MB
[2024-08-04 07:12:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][300/625] eta 0:01:25 lr 0.000526 wd 0.0500 time 0.2582 (0.2646) data time 0.0010 (0.0030) model time 0.2571 (0.2598) loss 6.2824 (5.7916) grad_norm 2.1809 (inf) loss_scale 512.0000 (646.3787) mem 9655MB
[2024-08-04 07:12:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][310/625] eta 0:01:23 lr 0.000526 wd 0.0500 time 0.2565 (0.2643) data time 0.0007 (0.0029) model time 0.2558 (0.2596) loss 5.2638 (5.7794) grad_norm 2.3029 (inf) loss_scale 512.0000 (642.0579) mem 9655MB
[2024-08-04 07:12:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][320/625] eta 0:01:20 lr 0.000526 wd 0.0500 time 0.2555 (0.2640) data time 0.0010 (0.0029) model time 0.2544 (0.2594) loss 6.4823 (5.7861) grad_norm 2.4002 (inf) loss_scale 512.0000 (638.0062) mem 9655MB
[2024-08-04 07:12:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][330/625] eta 0:01:17 lr 0.000526 wd 0.0500 time 0.2544 (0.2641) data time 0.0008 (0.0028) model time 0.2537 (0.2596) loss 6.0858 (5.7910) grad_norm 2.1550 (inf) loss_scale 512.0000 (634.1994) mem 9655MB
[2024-08-04 07:12:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][340/625] eta 0:01:15 lr 0.000526 wd 0.0500 time 0.2531 (0.2638) data time 0.0011 (0.0028) model time 0.2520 (0.2595) loss 5.7878 (5.7954) grad_norm 1.7252 (inf) loss_scale 512.0000 (630.6158) mem 9655MB
[2024-08-04 07:12:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][350/625] eta 0:01:12 lr 0.000526 wd 0.0500 time 0.2546 (0.2636) data time 0.0009 (0.0027) model time 0.2537 (0.2593) loss 5.7833 (5.7999) grad_norm 4.2189 (inf) loss_scale 512.0000 (627.2365) mem 9655MB
[2024-08-04 07:12:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][360/625] eta 0:01:09 lr 0.000526 wd 0.0500 time 0.2538 (0.2634) data time 0.0010 (0.0027) model time 0.2529 (0.2592) loss 5.2378 (5.8110) grad_norm 2.8684 (inf) loss_scale 512.0000 (624.0443) mem 9655MB
[2024-08-04 07:12:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][370/625] eta 0:01:07 lr 0.000525 wd 0.0500 time 0.2501 (0.2631) data time 0.0010 (0.0026) model time 0.2492 (0.2590) loss 4.6802 (5.8055) grad_norm 2.2466 (inf) loss_scale 512.0000 (621.0243) mem 9655MB
[2024-08-04 07:12:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][380/625] eta 0:01:04 lr 0.000525 wd 0.0500 time 0.2505 (0.2630) data time 0.0007 (0.0026) model time 0.2498 (0.2588) loss 6.1130 (5.8077) grad_norm 2.2239 (inf) loss_scale 512.0000 (618.1627) mem 9655MB
[2024-08-04 07:12:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][390/625] eta 0:01:01 lr 0.000525 wd 0.0500 time 0.2534 (0.2628) data time 0.0010 (0.0025) model time 0.2524 (0.2587) loss 6.3194 (5.8110) grad_norm 1.2581 (inf) loss_scale 512.0000 (615.4476) mem 9655MB
[2024-08-04 07:12:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][400/625] eta 0:00:59 lr 0.000525 wd 0.0500 time 0.2593 (0.2626) data time 0.0007 (0.0025) model time 0.2587 (0.2586) loss 6.1161 (5.8114) grad_norm 1.6855 (inf) loss_scale 512.0000 (612.8678) mem 9655MB
[2024-08-04 07:12:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][410/625] eta 0:00:56 lr 0.000525 wd 0.0500 time 0.2546 (0.2628) data time 0.0009 (0.0025) model time 0.2538 (0.2590) loss 5.2685 (5.8050) grad_norm 2.5693 (inf) loss_scale 512.0000 (610.4136) mem 9655MB
[2024-08-04 07:12:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][420/625] eta 0:00:53 lr 0.000525 wd 0.0500 time 0.2548 (0.2627) data time 0.0008 (0.0024) model time 0.2540 (0.2589) loss 5.7165 (5.8025) grad_norm 3.4676 (inf) loss_scale 512.0000 (608.0760) mem 9655MB
[2024-08-04 07:12:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][430/625] eta 0:00:51 lr 0.000524 wd 0.0500 time 0.2543 (0.2625) data time 0.0012 (0.0024) model time 0.2531 (0.2588) loss 5.8399 (5.8031) grad_norm 2.1747 (inf) loss_scale 512.0000 (605.8469) mem 9655MB
[2024-08-04 07:13:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][440/625] eta 0:00:48 lr 0.000524 wd 0.0500 time 0.2570 (0.2633) data time 0.0008 (0.0023) model time 0.2563 (0.2597) loss 7.0122 (5.8108) grad_norm 2.2058 (inf) loss_scale 512.0000 (603.7188) mem 9655MB
[2024-08-04 07:13:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][450/625] eta 0:00:46 lr 0.000524 wd 0.0500 time 0.2543 (0.2631) data time 0.0008 (0.0023) model time 0.2534 (0.2596) loss 5.0120 (5.8090) grad_norm 3.5196 (inf) loss_scale 512.0000 (601.6851) mem 9655MB
[2024-08-04 07:13:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][460/625] eta 0:00:43 lr 0.000524 wd 0.0500 time 0.2565 (0.2637) data time 0.0007 (0.0023) model time 0.2558 (0.2603) loss 6.4322 (5.7997) grad_norm 2.2200 (inf) loss_scale 512.0000 (599.7397) mem 9655MB
[2024-08-04 07:13:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][470/625] eta 0:00:40 lr 0.000524 wd 0.0500 time 0.2537 (0.2638) data time 0.0007 (0.0023) model time 0.2530 (0.2605) loss 6.0500 (5.7962) grad_norm 3.1854 (inf) loss_scale 512.0000 (597.8769) mem 9655MB
[2024-08-04 07:13:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][480/625] eta 0:00:38 lr 0.000524 wd 0.0500 time 0.2579 (0.2636) data time 0.0008 (0.0022) model time 0.2571 (0.2604) loss 5.8184 (5.7914) grad_norm 2.4350 (inf) loss_scale 512.0000 (596.0915) mem 9655MB
[2024-08-04 07:13:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][490/625] eta 0:00:35 lr 0.000524 wd 0.0500 time 0.2517 (0.2635) data time 0.0008 (0.0022) model time 0.2510 (0.2602) loss 5.6359 (5.8001) grad_norm 4.1123 (inf) loss_scale 512.0000 (594.3788) mem 9655MB
[2024-08-04 07:13:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][500/625] eta 0:00:32 lr 0.000523 wd 0.0500 time 0.2588 (0.2633) data time 0.0009 (0.0022) model time 0.2579 (0.2601) loss 4.8336 (5.8008) grad_norm 2.1570 (inf) loss_scale 512.0000 (592.7345) mem 9655MB
[2024-08-04 07:13:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][510/625] eta 0:00:30 lr 0.000523 wd 0.0500 time 0.2516 (0.2636) data time 0.0011 (0.0022) model time 0.2505 (0.2605) loss 5.5946 (5.8061) grad_norm 2.5560 (inf) loss_scale 512.0000 (591.1546) mem 9655MB
[2024-08-04 07:13:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][520/625] eta 0:00:27 lr 0.000523 wd 0.0500 time 0.2591 (0.2634) data time 0.0009 (0.0021) model time 0.2582 (0.2603) loss 5.7307 (5.8029) grad_norm 3.7321 (inf) loss_scale 512.0000 (589.6353) mem 9655MB
[2024-08-04 07:13:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][530/625] eta 0:00:25 lr 0.000523 wd 0.0500 time 0.2589 (0.2633) data time 0.0008 (0.0021) model time 0.2582 (0.2602) loss 6.5631 (5.7957) grad_norm 1.9360 (inf) loss_scale 512.0000 (588.1733) mem 9655MB
[2024-08-04 07:13:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][540/625] eta 0:00:22 lr 0.000523 wd 0.0500 time 0.2561 (0.2631) data time 0.0007 (0.0021) model time 0.2554 (0.2601) loss 7.0112 (5.7963) grad_norm 2.2094 (inf) loss_scale 512.0000 (586.7652) mem 9655MB
[2024-08-04 07:13:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][550/625] eta 0:00:19 lr 0.000523 wd 0.0500 time 0.2564 (0.2630) data time 0.0007 (0.0021) model time 0.2557 (0.2600) loss 6.3343 (5.7963) grad_norm 2.4757 (inf) loss_scale 512.0000 (585.4083) mem 9655MB
[2024-08-04 07:13:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][560/625] eta 0:00:17 lr 0.000522 wd 0.0500 time 0.2569 (0.2629) data time 0.0007 (0.0020) model time 0.2562 (0.2599) loss 5.6628 (5.7974) grad_norm 1.6201 (inf) loss_scale 512.0000 (584.0998) mem 9655MB
[2024-08-04 07:13:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][570/625] eta 0:00:14 lr 0.000522 wd 0.0500 time 0.2536 (0.2627) data time 0.0007 (0.0020) model time 0.2530 (0.2598) loss 4.9994 (5.7960) grad_norm 2.4275 (inf) loss_scale 512.0000 (582.8371) mem 9655MB
[2024-08-04 07:13:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][580/625] eta 0:00:11 lr 0.000522 wd 0.0500 time 0.2569 (0.2626) data time 0.0008 (0.0020) model time 0.2562 (0.2597) loss 6.8744 (5.8000) grad_norm 1.4318 (inf) loss_scale 512.0000 (581.6179) mem 9655MB
[2024-08-04 07:13:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][590/625] eta 0:00:09 lr 0.000522 wd 0.0500 time 0.2529 (0.2625) data time 0.0009 (0.0020) model time 0.2520 (0.2596) loss 6.3432 (5.7967) grad_norm 1.7494 (inf) loss_scale 512.0000 (580.4399) mem 9655MB
[2024-08-04 07:13:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][600/625] eta 0:00:06 lr 0.000522 wd 0.0500 time 0.2573 (0.2624) data time 0.0009 (0.0020) model time 0.2565 (0.2595) loss 5.8716 (5.8013) grad_norm 1.4221 (inf) loss_scale 512.0000 (579.3012) mem 9655MB
[2024-08-04 07:13:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][610/625] eta 0:00:03 lr 0.000522 wd 0.0500 time 0.2527 (0.2623) data time 0.0005 (0.0020) model time 0.2522 (0.2594) loss 5.9698 (5.8073) grad_norm 2.1158 (inf) loss_scale 512.0000 (578.1997) mem 9655MB
[2024-08-04 07:13:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][620/625] eta 0:00:01 lr 0.000522 wd 0.0500 time 0.2543 (0.2621) data time 0.0003 (0.0019) model time 0.2540 (0.2593) loss 6.2052 (5.8088) grad_norm 1.7344 (inf) loss_scale 512.0000 (577.1337) mem 9655MB
[2024-08-04 07:13:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 205 training takes 0:02:43
[2024-08-04 07:13:48 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 07:13:49 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 07:13:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.496 (0.496) Loss 0.6187 (0.6187) Acc@1 89.404 (89.404) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 07:13:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 0.9463 (0.7528) Acc@1 80.127 (85.844) Acc@5 95.459 (97.519) Mem 9655MB
[2024-08-04 07:13:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0898 (0.8900) Acc@1 75.732 (82.250) Acc@5 94.580 (96.145) Mem 9655MB
[2024-08-04 07:13:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.014 Acc@5 96.167
[2024-08-04 07:13:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.0%
[2024-08-04 07:13:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.973 (0.973) Loss 0.5815 (0.5815) Acc@1 89.844 (89.844) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 07:13:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.142) Loss 0.9185 (0.7159) Acc@1 80.811 (86.270) Acc@5 96.143 (97.643) Mem 9655MB
[2024-08-04 07:13:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.101) Loss 1.0410 (0.8412) Acc@1 76.123 (82.859) Acc@5 94.922 (96.340) Mem 9655MB
[2024-08-04 07:13:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.568 Acc@5 96.329
[2024-08-04 07:13:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.6%
[2024-08-04 07:13:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.57%
[2024-08-04 07:13:53 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 07:13:54 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 07:13:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][0/625] eta 0:07:11 lr 0.000521 wd 0.0500 time 0.6906 (0.6906) data time 0.4480 (0.4480) model time 0.0000 (0.0000) loss 6.2162 (6.2162) grad_norm 13.2711 (13.2711) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:13:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][10/625] eta 0:03:01 lr 0.000521 wd 0.0500 time 0.2561 (0.2944) data time 0.0007 (0.0416) model time 0.0000 (0.0000) loss 6.0921 (5.9681) grad_norm 1.9446 (3.3315) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:14:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][20/625] eta 0:02:47 lr 0.000521 wd 0.0500 time 0.2581 (0.2762) data time 0.0006 (0.0222) model time 0.0000 (0.0000) loss 6.5459 (5.9193) grad_norm 1.6840 (3.1596) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:14:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][30/625] eta 0:02:43 lr 0.000521 wd 0.0500 time 0.2556 (0.2755) data time 0.0011 (0.0153) model time 0.0000 (0.0000) loss 6.0217 (5.9280) grad_norm 1.8569 (2.8858) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:14:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][40/625] eta 0:02:38 lr 0.000521 wd 0.0500 time 0.2580 (0.2710) data time 0.0009 (0.0118) model time 0.0000 (0.0000) loss 6.2579 (5.9054) grad_norm 4.0177 (2.7503) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:14:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][50/625] eta 0:02:34 lr 0.000521 wd 0.0500 time 0.2540 (0.2682) data time 0.0007 (0.0097) model time 0.0000 (0.0000) loss 5.8249 (5.8766) grad_norm 2.2239 (2.6346) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:14:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][60/625] eta 0:02:30 lr 0.000520 wd 0.0500 time 0.2582 (0.2660) data time 0.0006 (0.0083) model time 0.2576 (0.2543) loss 5.4754 (5.9038) grad_norm 1.5397 (2.5184) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:14:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][70/625] eta 0:02:29 lr 0.000520 wd 0.0500 time 0.2540 (0.2697) data time 0.0006 (0.0072) model time 0.2534 (0.2729) loss 4.7177 (5.9013) grad_norm 2.5014 (2.5355) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:14:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][80/625] eta 0:02:26 lr 0.000520 wd 0.0500 time 0.2567 (0.2679) data time 0.0012 (0.0064) model time 0.2555 (0.2666) loss 6.2787 (5.8729) grad_norm 2.9055 (2.5360) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:14:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][90/625] eta 0:02:22 lr 0.000520 wd 0.0500 time 0.2563 (0.2666) data time 0.0006 (0.0058) model time 0.2556 (0.2637) loss 5.5390 (5.8678) grad_norm 1.5387 (2.4613) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:14:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][100/625] eta 0:02:19 lr 0.000520 wd 0.0500 time 0.2574 (0.2655) data time 0.0008 (0.0054) model time 0.2566 (0.2618) loss 5.3516 (5.8601) grad_norm 1.7501 (2.4353) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:14:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][110/625] eta 0:02:18 lr 0.000520 wd 0.0500 time 0.4696 (0.2683) data time 0.0007 (0.0050) model time 0.4689 (0.2674) loss 5.7090 (5.8591) grad_norm 1.9022 (2.4691) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:14:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][120/625] eta 0:02:14 lr 0.000520 wd 0.0500 time 0.2543 (0.2673) data time 0.0011 (0.0046) model time 0.2531 (0.2657) loss 5.6528 (5.8446) grad_norm 2.8806 (2.4931) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:14:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][130/625] eta 0:02:12 lr 0.000519 wd 0.0500 time 0.2550 (0.2679) data time 0.0006 (0.0044) model time 0.2543 (0.2667) loss 6.6619 (5.8395) grad_norm 1.5959 (2.4730) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:14:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][140/625] eta 0:02:09 lr 0.000519 wd 0.0500 time 0.2603 (0.2671) data time 0.0006 (0.0041) model time 0.2596 (0.2656) loss 6.1154 (5.8394) grad_norm 3.6103 (2.4826) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:14:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][150/625] eta 0:02:06 lr 0.000519 wd 0.0500 time 0.2555 (0.2663) data time 0.0008 (0.0039) model time 0.2547 (0.2644) loss 4.8832 (5.8436) grad_norm 4.8303 (2.5347) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:14:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][160/625] eta 0:02:03 lr 0.000519 wd 0.0500 time 0.2579 (0.2657) data time 0.0008 (0.0037) model time 0.2571 (0.2637) loss 6.2493 (5.8454) grad_norm 3.0091 (2.5175) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:14:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][170/625] eta 0:02:00 lr 0.000519 wd 0.0500 time 0.2517 (0.2652) data time 0.0012 (0.0036) model time 0.2505 (0.2629) loss 6.9150 (5.8381) grad_norm 2.1385 (2.5309) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:14:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][180/625] eta 0:01:58 lr 0.000519 wd 0.0500 time 0.2581 (0.2654) data time 0.0006 (0.0034) model time 0.2575 (0.2635) loss 5.4619 (5.8485) grad_norm 1.3510 (2.5094) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:14:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][190/625] eta 0:01:55 lr 0.000518 wd 0.0500 time 0.2576 (0.2658) data time 0.0009 (0.0033) model time 0.2567 (0.2640) loss 4.6729 (5.8500) grad_norm 1.9281 (2.4730) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:14:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][200/625] eta 0:01:52 lr 0.000518 wd 0.0500 time 0.2547 (0.2653) data time 0.0009 (0.0032) model time 0.2538 (0.2634) loss 6.1880 (5.8501) grad_norm 2.5394 (2.4828) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:14:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][210/625] eta 0:01:49 lr 0.000518 wd 0.0500 time 0.2548 (0.2648) data time 0.0008 (0.0030) model time 0.2540 (0.2628) loss 6.2145 (5.8487) grad_norm 2.1842 (2.4555) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:14:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][220/625] eta 0:01:47 lr 0.000518 wd 0.0500 time 0.2539 (0.2644) data time 0.0019 (0.0030) model time 0.2520 (0.2623) loss 5.9818 (5.8397) grad_norm 2.1388 (2.4498) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:14:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][230/625] eta 0:01:44 lr 0.000518 wd 0.0500 time 0.2586 (0.2640) data time 0.0006 (0.0029) model time 0.2579 (0.2619) loss 6.5796 (5.8670) grad_norm 4.0025 (2.5180) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:14:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][240/625] eta 0:01:41 lr 0.000518 wd 0.0500 time 0.2573 (0.2637) data time 0.0009 (0.0028) model time 0.2564 (0.2616) loss 5.3509 (5.8566) grad_norm 3.2254 (2.5448) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:15:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][250/625] eta 0:01:38 lr 0.000518 wd 0.0500 time 0.2542 (0.2634) data time 0.0007 (0.0027) model time 0.2535 (0.2613) loss 6.6764 (5.8574) grad_norm 1.9280 (2.5313) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:15:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][260/625] eta 0:01:36 lr 0.000517 wd 0.0500 time 0.2536 (0.2631) data time 0.0009 (0.0026) model time 0.2527 (0.2609) loss 5.2639 (5.8463) grad_norm 2.9734 (2.5527) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:15:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][270/625] eta 0:01:33 lr 0.000517 wd 0.0500 time 0.2647 (0.2629) data time 0.0008 (0.0026) model time 0.2639 (0.2607) loss 5.0237 (5.8402) grad_norm 3.4485 (2.5596) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:15:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][280/625] eta 0:01:30 lr 0.000517 wd 0.0500 time 0.2577 (0.2632) data time 0.0010 (0.0025) model time 0.2566 (0.2612) loss 5.8302 (5.8373) grad_norm 1.8773 (2.5577) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:15:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][290/625] eta 0:01:28 lr 0.000517 wd 0.0500 time 0.2539 (0.2630) data time 0.0007 (0.0025) model time 0.2532 (0.2610) loss 4.9614 (5.8372) grad_norm 1.6062 (2.5472) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:15:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][300/625] eta 0:01:25 lr 0.000517 wd 0.0500 time 0.2573 (0.2641) data time 0.0006 (0.0024) model time 0.2568 (0.2623) loss 5.7077 (5.8432) grad_norm 2.0365 (2.5417) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:15:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][310/625] eta 0:01:23 lr 0.000517 wd 0.0500 time 0.2545 (0.2638) data time 0.0010 (0.0024) model time 0.2535 (0.2620) loss 6.5665 (5.8394) grad_norm 2.0039 (2.5260) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:15:19 vssd_mesa_retrain_tiny_e300]
(main_hfai_mnodes.py 367): INFO Train: [206/300][320/625] eta 0:01:20 lr 0.000516 wd 0.0500 time 0.2577 (0.2635) data time 0.0008 (0.0023) model time 0.2568 (0.2617) loss 6.0241 (5.8437) grad_norm 2.9324 (2.5087) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:15:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][330/625] eta 0:01:17 lr 0.000516 wd 0.0500 time 0.2564 (0.2633) data time 0.0007 (0.0023) model time 0.2557 (0.2615) loss 6.7763 (5.8528) grad_norm 1.8997 (2.5240) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:15:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][340/625] eta 0:01:14 lr 0.000516 wd 0.0500 time 0.2531 (0.2630) data time 0.0007 (0.0022) model time 0.2524 (0.2612) loss 5.3100 (5.8408) grad_norm 1.7805 (2.5397) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:15:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][350/625] eta 0:01:12 lr 0.000516 wd 0.0500 time 0.2502 (0.2633) data time 0.0007 (0.0022) model time 0.2496 (0.2616) loss 5.8759 (5.8446) grad_norm 1.9077 (2.5380) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:15:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][360/625] eta 0:01:09 lr 0.000516 wd 0.0500 time 0.2607 (0.2631) data time 0.0007 (0.0022) model time 0.2600 (0.2614) loss 6.2278 (5.8507) grad_norm 1.5949 (2.5138) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:15:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][370/625] eta 0:01:07 lr 0.000516 wd 0.0500 time 0.2591 (0.2635) data time 0.0007 (0.0021) model time 0.2584 (0.2618) loss 4.8028 (5.8489) grad_norm 1.5630 (2.4970) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:15:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][380/625] eta 0:01:04 lr 0.000516 wd 0.0500 time 0.2662 (0.2636) data time 0.0007 (0.0021) model time 0.2655 (0.2620) 
loss 5.3914 (5.8503) grad_norm 2.8345 (2.5010) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:15:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][390/625] eta 0:01:01 lr 0.000515 wd 0.0500 time 0.2563 (0.2634) data time 0.0006 (0.0021) model time 0.2557 (0.2618) loss 5.9774 (5.8519) grad_norm 4.6252 (2.5230) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:15:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][400/625] eta 0:00:59 lr 0.000515 wd 0.0500 time 0.2555 (0.2633) data time 0.0009 (0.0020) model time 0.2546 (0.2616) loss 6.3303 (5.8586) grad_norm 1.7134 (2.5256) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:15:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][410/625] eta 0:00:56 lr 0.000515 wd 0.0500 time 0.2575 (0.2631) data time 0.0008 (0.0020) model time 0.2567 (0.2614) loss 6.1974 (5.8650) grad_norm 2.2278 (2.5175) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:15:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][420/625] eta 0:00:53 lr 0.000515 wd 0.0500 time 0.2570 (0.2629) data time 0.0009 (0.0020) model time 0.2562 (0.2612) loss 6.1918 (5.8662) grad_norm 1.3348 (2.5064) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:15:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][430/625] eta 0:00:51 lr 0.000515 wd 0.0500 time 0.2572 (0.2627) data time 0.0008 (0.0020) model time 0.2564 (0.2611) loss 5.0715 (5.8627) grad_norm 3.3650 (2.5100) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:15:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][440/625] eta 0:00:48 lr 0.000515 wd 0.0500 time 0.2553 (0.2625) data time 0.0009 (0.0019) model time 0.2545 (0.2609) loss 6.6532 (5.8594) grad_norm 3.0849 (2.5327) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:15:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [206/300][450/625] eta 0:00:45 lr 0.000514 wd 0.0500 time 0.3490 (0.2626) data time 0.0011 (0.0019) model time 0.3479 (0.2610) loss 5.7823 (5.8627) grad_norm 1.4885 (2.5258) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:15:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][460/625] eta 0:00:43 lr 0.000514 wd 0.0500 time 0.2610 (0.2625) data time 0.0008 (0.0019) model time 0.2602 (0.2608) loss 5.8898 (5.8602) grad_norm 1.9837 (2.5203) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:15:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][470/625] eta 0:00:40 lr 0.000514 wd 0.0500 time 0.2536 (0.2623) data time 0.0010 (0.0019) model time 0.2526 (0.2607) loss 4.6927 (5.8567) grad_norm 1.8188 (2.5386) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:16:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][480/625] eta 0:00:38 lr 0.000514 wd 0.0500 time 0.2582 (0.2622) data time 0.0006 (0.0019) model time 0.2575 (0.2605) loss 6.2317 (5.8548) grad_norm 1.4088 (2.5364) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:16:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][490/625] eta 0:00:35 lr 0.000514 wd 0.0500 time 0.2539 (0.2620) data time 0.0011 (0.0018) model time 0.2529 (0.2604) loss 6.5562 (5.8598) grad_norm 2.7024 (2.5287) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:16:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][500/625] eta 0:00:32 lr 0.000514 wd 0.0500 time 0.2585 (0.2619) data time 0.0006 (0.0018) model time 0.2579 (0.2603) loss 6.3688 (5.8588) grad_norm 1.5296 (2.5240) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:16:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][510/625] eta 0:00:30 lr 0.000514 wd 0.0500 time 0.2564 (0.2618) data time 0.0007 (0.0018) model time 0.2557 (0.2602) loss 4.7210 (5.8579) grad_norm 
2.2802 (2.5257) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:16:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][520/625] eta 0:00:27 lr 0.000513 wd 0.0500 time 0.2595 (0.2617) data time 0.0009 (0.0018) model time 0.2586 (0.2601) loss 6.0617 (5.8532) grad_norm 3.2514 (2.5241) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:16:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][530/625] eta 0:00:24 lr 0.000513 wd 0.0500 time 0.2532 (0.2616) data time 0.0008 (0.0018) model time 0.2524 (0.2600) loss 6.2194 (5.8456) grad_norm 2.6988 (2.5188) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:16:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][540/625] eta 0:00:22 lr 0.000513 wd 0.0500 time 0.2523 (0.2615) data time 0.0008 (0.0018) model time 0.2514 (0.2599) loss 5.1476 (5.8454) grad_norm 2.3578 (2.5119) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:16:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][550/625] eta 0:00:19 lr 0.000513 wd 0.0500 time 0.2550 (0.2616) data time 0.0010 (0.0017) model time 0.2540 (0.2600) loss 6.3636 (5.8433) grad_norm 3.5533 (2.5071) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:16:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][560/625] eta 0:00:16 lr 0.000513 wd 0.0500 time 0.2566 (0.2615) data time 0.0009 (0.0017) model time 0.2557 (0.2599) loss 6.8473 (5.8417) grad_norm 2.0309 (2.5032) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:16:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][570/625] eta 0:00:14 lr 0.000513 wd 0.0500 time 0.2498 (0.2614) data time 0.0008 (0.0017) model time 0.2491 (0.2598) loss 5.4118 (5.8443) grad_norm 6.7005 (2.5053) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:16:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][580/625] eta 
0:00:11 lr 0.000512 wd 0.0500 time 0.2510 (0.2613) data time 0.0008 (0.0017) model time 0.2502 (0.2597) loss 6.0020 (5.8444) grad_norm 1.8571 (2.5066) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:16:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][590/625] eta 0:00:09 lr 0.000512 wd 0.0500 time 0.2542 (0.2612) data time 0.0007 (0.0017) model time 0.2534 (0.2596) loss 6.4336 (5.8462) grad_norm 1.8493 (2.5063) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:16:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][600/625] eta 0:00:06 lr 0.000512 wd 0.0500 time 0.2576 (0.2614) data time 0.0008 (0.0017) model time 0.2568 (0.2599) loss 6.3351 (5.8451) grad_norm 3.2798 (2.5066) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:16:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][610/625] eta 0:00:03 lr 0.000512 wd 0.0500 time 0.2552 (0.2616) data time 0.0004 (0.0017) model time 0.2549 (0.2601) loss 6.5451 (5.8499) grad_norm 1.7583 (2.4970) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:16:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][620/625] eta 0:00:01 lr 0.000512 wd 0.0500 time 0.2541 (0.2618) data time 0.0005 (0.0016) model time 0.2536 (0.2603) loss 5.0029 (5.8493) grad_norm 1.7690 (2.4979) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:16:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 206 training takes 0:02:43 [2024-08-04 07:16:38 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 07:16:38 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 07:16:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.557 (0.557) Loss 0.6040 (0.6040) Acc@1 90.039 (90.039) Acc@5 98.730 (98.730) Mem 9655MB [2024-08-04 07:16:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.102) Loss 0.9702 (0.7436) Acc@1 79.688 (86.040) Acc@5 95.654 (97.496) Mem 9655MB [2024-08-04 07:16:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 1.0645 (0.8791) Acc@1 76.270 (82.524) Acc@5 95.068 (96.126) Mem 9655MB [2024-08-04 07:16:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.208 Acc@5 96.143 [2024-08-04 07:16:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.2% [2024-08-04 07:16:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.21% [2024-08-04 07:16:40 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 07:16:40 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 07:16:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.538 (0.538) Loss 0.5815 (0.5815) Acc@1 89.795 (89.795) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 07:16:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.103) Loss 0.9180 (0.7155) Acc@1 80.859 (86.257) Acc@5 96.094 (97.638) Mem 9655MB [2024-08-04 07:16:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 1.0391 (0.8406) Acc@1 76.172 (82.864) Acc@5 94.922 (96.317) Mem 9655MB [2024-08-04 07:16:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.574 Acc@5 96.307 [2024-08-04 07:16:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.6% [2024-08-04 07:16:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.57% [2024-08-04 07:16:42 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 07:16:43 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 07:16:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][0/625] eta 0:07:22 lr 0.000512 wd 0.0500 time 0.7072 (0.7072) data time 0.4677 (0.4677) model time 0.0000 (0.0000) loss 5.6281 (5.6281) grad_norm 1.6855 (1.6855) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:16:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][10/625] eta 0:03:03 lr 0.000512 wd 0.0500 time 0.2548 (0.2976) data time 0.0010 (0.0435) model time 0.0000 (0.0000) loss 5.3343 (5.7332) grad_norm 2.8094 (1.9981) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:16:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][20/625] eta 0:02:48 lr 0.000511 wd 0.0500 time 0.2536 (0.2782) data time 0.0010 (0.0232) model time 0.0000 (0.0000) loss 5.1963 (5.8150) grad_norm 2.1846 (2.4242) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:16:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][30/625] eta 0:02:41 lr 0.000511 wd 0.0500 time 0.2528 (0.2707) data time 0.0010 (0.0161) model time 0.0000 (0.0000) loss 6.6107 (5.7827) grad_norm 2.3281 (2.7260) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:16:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][40/625] eta 0:02:36 lr 0.000511 wd 0.0500 time 0.2592 (0.2669) data time 0.0008 (0.0124) model time 0.0000 (0.0000) loss 4.7803 (5.7428) grad_norm 2.9784 (2.6120) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:16:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][50/625] eta 0:02:32 lr 0.000511 wd 0.0500 time 0.2609 (0.2648) data time 0.0009 (0.0101) model time 0.0000 (0.0000) loss 5.0965 (5.7949) grad_norm 3.0194 (2.4917) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:16:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][60/625] eta 0:02:28 lr 0.000511 wd 0.0500 time 0.2555 (0.2635) data time 
0.0009 (0.0086) model time 0.2546 (0.2559) loss 5.8298 (5.8028) grad_norm 2.0781 (2.4334) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:17:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][70/625] eta 0:02:25 lr 0.000511 wd 0.0500 time 0.2529 (0.2624) data time 0.0008 (0.0076) model time 0.2521 (0.2555) loss 6.2180 (5.8161) grad_norm 1.3972 (2.3663) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:17:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][80/625] eta 0:02:22 lr 0.000511 wd 0.0500 time 0.2557 (0.2617) data time 0.0007 (0.0068) model time 0.2550 (0.2553) loss 5.0867 (5.7975) grad_norm 2.4701 (2.3247) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:17:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][90/625] eta 0:02:20 lr 0.000510 wd 0.0500 time 0.2516 (0.2629) data time 0.0007 (0.0061) model time 0.2509 (0.2596) loss 5.9945 (5.8246) grad_norm 3.1508 (2.3146) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:17:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][100/625] eta 0:02:17 lr 0.000510 wd 0.0500 time 0.2559 (0.2622) data time 0.0006 (0.0056) model time 0.2553 (0.2586) loss 5.0681 (5.7979) grad_norm 2.9533 (2.3158) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:17:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][110/625] eta 0:02:14 lr 0.000510 wd 0.0500 time 0.2536 (0.2616) data time 0.0007 (0.0052) model time 0.2530 (0.2579) loss 4.8515 (5.7971) grad_norm 2.4939 (2.3475) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:17:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][120/625] eta 0:02:11 lr 0.000510 wd 0.0500 time 0.2562 (0.2610) data time 0.0011 (0.0048) model time 0.2551 (0.2573) loss 5.9854 (5.7883) grad_norm 2.6290 (2.3587) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:17:17 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][130/625] eta 0:02:09 lr 0.000510 wd 0.0500 time 0.2550 (0.2622) data time 0.0009 (0.0045) model time 0.2542 (0.2596) loss 6.5791 (5.8012) grad_norm 4.0240 (2.4614) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:17:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][140/625] eta 0:02:06 lr 0.000510 wd 0.0500 time 0.2594 (0.2618) data time 0.0006 (0.0043) model time 0.2589 (0.2592) loss 5.8944 (5.7939) grad_norm 1.6411 (2.4397) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:17:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][150/625] eta 0:02:04 lr 0.000509 wd 0.0500 time 0.2563 (0.2627) data time 0.0006 (0.0040) model time 0.2557 (0.2608) loss 4.5918 (5.7750) grad_norm 4.0863 (2.4184) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:17:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][160/625] eta 0:02:02 lr 0.000509 wd 0.0500 time 0.2548 (0.2636) data time 0.0007 (0.0038) model time 0.2541 (0.2621) loss 4.9729 (5.7533) grad_norm 1.4381 (2.3999) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:17:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][170/625] eta 0:01:59 lr 0.000509 wd 0.0500 time 0.2550 (0.2630) data time 0.0008 (0.0037) model time 0.2541 (0.2614) loss 6.8640 (5.7741) grad_norm 2.1216 (2.4337) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:17:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][180/625] eta 0:01:57 lr 0.000509 wd 0.0500 time 0.2541 (0.2637) data time 0.0009 (0.0035) model time 0.2532 (0.2624) loss 6.3320 (5.7788) grad_norm 2.5182 (2.4207) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:17:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][190/625] eta 0:01:54 lr 0.000509 wd 0.0500 time 0.2537 (0.2641) data time 0.0007 (0.0034) 
model time 0.2530 (0.2629) loss 5.9737 (5.7885) grad_norm 2.6012 (2.4290) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:17:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][200/625] eta 0:01:52 lr 0.000509 wd 0.0500 time 0.2541 (0.2636) data time 0.0007 (0.0032) model time 0.2535 (0.2623) loss 6.1378 (5.8073) grad_norm 6.8807 (2.6122) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:17:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][210/625] eta 0:01:49 lr 0.000509 wd 0.0500 time 0.2525 (0.2632) data time 0.0009 (0.0031) model time 0.2516 (0.2618) loss 5.0787 (5.8115) grad_norm 1.7892 (2.5965) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:17:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][220/625] eta 0:01:46 lr 0.000508 wd 0.0500 time 0.2607 (0.2629) data time 0.0006 (0.0030) model time 0.2601 (0.2614) loss 6.3548 (5.8046) grad_norm 1.9145 (2.5747) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:17:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][230/625] eta 0:01:43 lr 0.000508 wd 0.0500 time 0.2574 (0.2626) data time 0.0007 (0.0029) model time 0.2568 (0.2610) loss 5.9707 (5.8132) grad_norm 1.8257 (2.5530) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:17:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][240/625] eta 0:01:41 lr 0.000508 wd 0.0500 time 0.2544 (0.2632) data time 0.0009 (0.0029) model time 0.2535 (0.2618) loss 5.2779 (5.8286) grad_norm 3.1833 (2.5303) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:17:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][250/625] eta 0:01:38 lr 0.000508 wd 0.0500 time 0.2538 (0.2630) data time 0.0013 (0.0028) model time 0.2525 (0.2616) loss 6.3078 (5.8315) grad_norm 1.4612 (2.4988) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:17:52 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [207/300][260/625] eta 0:01:35 lr 0.000508 wd 0.0500 time 0.2582 (0.2627) data time 0.0007 (0.0027) model time 0.2575 (0.2613) loss 4.8503 (5.8281) grad_norm 1.5282 (2.4787) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:17:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][270/625] eta 0:01:33 lr 0.000508 wd 0.0500 time 0.2569 (0.2625) data time 0.0007 (0.0027) model time 0.2562 (0.2610) loss 5.4522 (5.8337) grad_norm 4.6298 (2.4910) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:17:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][280/625] eta 0:01:30 lr 0.000508 wd 0.0500 time 0.2553 (0.2623) data time 0.0010 (0.0026) model time 0.2543 (0.2608) loss 5.6856 (5.8246) grad_norm 1.8347 (2.5013) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:17:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][290/625] eta 0:01:28 lr 0.000507 wd 0.0500 time 0.2522 (0.2627) data time 0.0010 (0.0025) model time 0.2511 (0.2614) loss 6.3713 (5.8201) grad_norm 3.4709 (2.5068) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:18:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][300/625] eta 0:01:25 lr 0.000507 wd 0.0500 time 0.2539 (0.2625) data time 0.0008 (0.0025) model time 0.2531 (0.2611) loss 5.6188 (5.8288) grad_norm 5.3422 (2.5206) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:18:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][310/625] eta 0:01:22 lr 0.000507 wd 0.0500 time 0.2582 (0.2623) data time 0.0006 (0.0024) model time 0.2577 (0.2609) loss 6.8868 (5.8221) grad_norm 3.6433 (2.5545) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:18:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][320/625] eta 0:01:19 lr 0.000507 wd 0.0500 time 0.2537 (0.2621) data time 0.0007 (0.0024) model time 0.2530 (0.2606) 
loss 4.7858 (5.8188) grad_norm 2.6985 (2.5554) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:18:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][330/625] eta 0:01:17 lr 0.000507 wd 0.0500 time 0.2530 (0.2618) data time 0.0008 (0.0023) model time 0.2521 (0.2604) loss 6.5370 (5.8172) grad_norm 2.7613 (2.5468) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:18:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][340/625] eta 0:01:14 lr 0.000507 wd 0.0500 time 0.2553 (0.2617) data time 0.0009 (0.0023) model time 0.2545 (0.2602) loss 4.9418 (5.8162) grad_norm 2.0817 (2.5286) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:18:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][350/625] eta 0:01:12 lr 0.000506 wd 0.0500 time 0.2554 (0.2618) data time 0.0008 (0.0023) model time 0.2546 (0.2604) loss 6.7128 (5.8276) grad_norm 2.4808 (2.5280) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:18:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][360/625] eta 0:01:09 lr 0.000506 wd 0.0500 time 0.2546 (0.2620) data time 0.0007 (0.0022) model time 0.2538 (0.2607) loss 5.0907 (5.8306) grad_norm 1.8082 (2.5122) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:18:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][370/625] eta 0:01:06 lr 0.000506 wd 0.0500 time 0.2570 (0.2619) data time 0.0006 (0.0022) model time 0.2564 (0.2605) loss 6.4434 (5.8398) grad_norm 1.7928 (2.5064) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:18:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][380/625] eta 0:01:04 lr 0.000506 wd 0.0500 time 0.2598 (0.2617) data time 0.0008 (0.0022) model time 0.2590 (0.2603) loss 5.3672 (5.8370) grad_norm 2.2449 (2.5096) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:18:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [207/300][390/625] eta 0:01:01 lr 0.000506 wd 0.0500 time 0.2566 (0.2620) data time 0.0007 (0.0021) model time 0.2559 (0.2607) loss 4.9286 (5.8281) grad_norm 2.3227 (2.5055) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:18:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][400/625] eta 0:00:58 lr 0.000506 wd 0.0500 time 0.2522 (0.2621) data time 0.0008 (0.0021) model time 0.2514 (0.2608) loss 7.0030 (5.8268) grad_norm 1.2691 (2.4994) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:18:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][410/625] eta 0:00:56 lr 0.000506 wd 0.0500 time 0.2624 (0.2620) data time 0.0008 (0.0021) model time 0.2616 (0.2607) loss 6.1000 (5.8288) grad_norm 1.5943 (2.4966) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:18:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][420/625] eta 0:00:53 lr 0.000505 wd 0.0500 time 0.2561 (0.2619) data time 0.0007 (0.0020) model time 0.2554 (0.2605) loss 5.9650 (5.8373) grad_norm 1.4670 (2.4823) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:18:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][430/625] eta 0:00:51 lr 0.000505 wd 0.0500 time 0.2555 (0.2617) data time 0.0009 (0.0020) model time 0.2546 (0.2604) loss 6.3263 (5.8411) grad_norm 1.9558 (2.4695) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:18:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][440/625] eta 0:00:48 lr 0.000505 wd 0.0500 time 0.2588 (0.2625) data time 0.0007 (0.0020) model time 0.2581 (0.2613) loss 5.0159 (5.8360) grad_norm 7.4729 (2.4730) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:18:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][450/625] eta 0:00:45 lr 0.000505 wd 0.0500 time 0.2595 (0.2623) data time 0.0006 (0.0020) model time 0.2589 (0.2611) loss 6.8382 (5.8447) grad_norm 
2.0447 (2.4731) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:18:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][460/625] eta 0:00:43 lr 0.000505 wd 0.0500 time 0.2577 (0.2622) data time 0.0008 (0.0019) model time 0.2570 (0.2610) loss 5.6332 (5.8431) grad_norm 1.6630 (2.4876) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:18:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][470/625] eta 0:00:40 lr 0.000505 wd 0.0500 time 0.2560 (0.2620) data time 0.0008 (0.0019) model time 0.2552 (0.2608) loss 4.9885 (5.8316) grad_norm 2.3465 (2.4838) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:18:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][480/625] eta 0:00:37 lr 0.000504 wd 0.0500 time 0.2534 (0.2619) data time 0.0010 (0.0019) model time 0.2524 (0.2606) loss 6.4123 (5.8298) grad_norm 1.9759 (2.4793) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:18:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][490/625] eta 0:00:35 lr 0.000504 wd 0.0500 time 0.2534 (0.2618) data time 0.0009 (0.0019) model time 0.2524 (0.2605) loss 5.9531 (5.8321) grad_norm 1.3844 (2.4784) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:18:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][500/625] eta 0:00:32 lr 0.000504 wd 0.0500 time 0.2510 (0.2617) data time 0.0008 (0.0019) model time 0.2502 (0.2604) loss 5.4698 (5.8293) grad_norm 3.7480 (2.4816) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:18:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][510/625] eta 0:00:30 lr 0.000504 wd 0.0500 time 0.2576 (0.2616) data time 0.0007 (0.0018) model time 0.2569 (0.2603) loss 6.1025 (5.8283) grad_norm 2.2869 (2.4865) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:18:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][520/625] eta 
0:00:27 lr 0.000504 wd 0.0500 time 0.2575 (0.2614) data time 0.0007 (0.0018) model time 0.2569 (0.2602) loss 5.9022 (5.8284) grad_norm 1.5993 (2.4804) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:19:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][530/625] eta 0:00:24 lr 0.000504 wd 0.0500 time 0.2556 (0.2613) data time 0.0009 (0.0018) model time 0.2548 (0.2600) loss 5.1317 (5.8267) grad_norm 1.8807 (2.4736) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:19:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][540/625] eta 0:00:22 lr 0.000504 wd 0.0500 time 0.2543 (0.2612) data time 0.0011 (0.0018) model time 0.2532 (0.2599) loss 5.3195 (5.8260) grad_norm 2.3093 (2.4651) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:19:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][550/625] eta 0:00:19 lr 0.000503 wd 0.0500 time 0.2544 (0.2615) data time 0.0008 (0.0018) model time 0.2537 (0.2602) loss 5.8442 (5.8290) grad_norm 3.9284 (2.4708) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:19:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][560/625] eta 0:00:17 lr 0.000503 wd 0.0500 time 0.2545 (0.2617) data time 0.0008 (0.0018) model time 0.2537 (0.2605) loss 5.4389 (5.8247) grad_norm 2.1113 (2.4949) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:19:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][570/625] eta 0:00:14 lr 0.000503 wd 0.0500 time 0.2583 (0.2616) data time 0.0008 (0.0018) model time 0.2575 (0.2604) loss 6.9388 (5.8313) grad_norm 2.6517 (2.4900) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:19:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][580/625] eta 0:00:11 lr 0.000503 wd 0.0500 time 0.2547 (0.2618) data time 0.0008 (0.0017) model time 0.2539 (0.2605) loss 5.9154 (5.8307) grad_norm 3.4399 (2.4991) loss_scale 
512.0000 (512.0000) mem 9655MB [2024-08-04 07:19:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][590/625] eta 0:00:09 lr 0.000503 wd 0.0500 time 0.2557 (0.2616) data time 0.0010 (0.0017) model time 0.2547 (0.2604) loss 6.3635 (5.8355) grad_norm 2.1299 (2.4965) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:19:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][600/625] eta 0:00:06 lr 0.000503 wd 0.0500 time 0.2552 (0.2616) data time 0.0016 (0.0017) model time 0.2536 (0.2604) loss 4.6144 (5.8294) grad_norm 1.6936 (2.4874) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:19:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][610/625] eta 0:00:03 lr 0.000502 wd 0.0500 time 0.2523 (0.2618) data time 0.0006 (0.0017) model time 0.2517 (0.2606) loss 4.9811 (5.8254) grad_norm 1.4461 (2.5004) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:19:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][620/625] eta 0:00:01 lr 0.000502 wd 0.0500 time 0.2544 (0.2617) data time 0.0005 (0.0017) model time 0.2539 (0.2604) loss 5.2283 (5.8254) grad_norm 1.3272 (2.4962) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:19:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 207 training takes 0:02:43 [2024-08-04 07:19:27 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 07:19:27 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 07:19:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.551 (0.551) Loss 0.6377 (0.6377) Acc@1 88.916 (88.916) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 07:19:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.102) Loss 0.9590 (0.7662) Acc@1 80.420 (85.924) Acc@5 95.801 (97.528) Mem 9655MB [2024-08-04 07:19:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.1084 (0.8947) Acc@1 75.781 (82.447) Acc@5 94.336 (96.166) Mem 9655MB [2024-08-04 07:19:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.132 Acc@5 96.151 [2024-08-04 07:19:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.1% [2024-08-04 07:19:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.929 (0.929) Loss 0.5815 (0.5815) Acc@1 89.697 (89.697) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 07:19:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.139) Loss 0.9185 (0.7151) Acc@1 80.811 (86.257) Acc@5 96.045 (97.638) Mem 9655MB [2024-08-04 07:19:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.100) Loss 1.0391 (0.8404) Acc@1 76.123 (82.880) Acc@5 94.873 (96.324) Mem 9655MB [2024-08-04 07:19:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.594 Acc@5 96.317 [2024-08-04 07:19:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.6% [2024-08-04 07:19:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.59% [2024-08-04 07:19:31 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 07:19:32 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 07:19:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][0/625] eta 0:09:00 lr 0.000502 wd 0.0500 time 0.8643 (0.8643) data time 0.6238 (0.6238) model time 0.0000 (0.0000) loss 4.9968 (4.9968) grad_norm 2.7339 (2.7339) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:19:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][10/625] eta 0:03:11 lr 0.000502 wd 0.0500 time 0.2524 (0.3111) data time 0.0009 (0.0576) model time 0.0000 (0.0000) loss 5.0572 (6.0447) grad_norm 1.7334 (2.5312) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:19:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][20/625] eta 0:02:52 lr 0.000502 wd 0.0500 time 0.2545 (0.2846) data time 0.0007 (0.0306) model time 0.0000 (0.0000) loss 6.9522 (5.8699) grad_norm 2.1974 (2.3972) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:19:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][30/625] eta 0:02:51 lr 0.000502 wd 0.0500 time 0.2538 (0.2881) data time 0.0007 (0.0211) model time 0.0000 (0.0000) loss 4.5819 (5.8080) grad_norm 2.3083 (2.3116) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:19:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][40/625] eta 0:02:43 lr 0.000502 wd 0.0500 time 0.2590 (0.2802) data time 0.0008 (0.0162) model time 0.0000 (0.0000) loss 6.3463 (5.8422) grad_norm 2.1539 (2.1930) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:19:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][50/625] eta 0:02:40 lr 0.000501 wd 0.0500 time 0.2552 (0.2790) data time 0.0006 (0.0132) model time 0.0000 (0.0000) loss 5.4910 (5.8327) grad_norm 3.0534 (2.2652) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 
07:19:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][60/625] eta 0:02:36 lr 0.000501 wd 0.0500 time 0.3900 (0.2775) data time 0.0008 (0.0112) model time 0.3891 (0.2687) loss 6.8053 (5.8793) grad_norm 2.2368 (2.4496) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:19:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][70/625] eta 0:02:32 lr 0.000501 wd 0.0500 time 0.2560 (0.2744) data time 0.0007 (0.0097) model time 0.2553 (0.2616) loss 6.7798 (5.8507) grad_norm 2.4466 (2.4366) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:19:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][80/625] eta 0:02:28 lr 0.000501 wd 0.0500 time 0.2537 (0.2720) data time 0.0010 (0.0087) model time 0.2528 (0.2592) loss 4.9324 (5.8400) grad_norm 1.6521 (2.3910) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:19:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][90/625] eta 0:02:24 lr 0.000501 wd 0.0500 time 0.2571 (0.2704) data time 0.0007 (0.0078) model time 0.2564 (0.2583) loss 5.9000 (5.8244) grad_norm 1.5030 (2.3128) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:19:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][100/625] eta 0:02:21 lr 0.000501 wd 0.0500 time 0.2527 (0.2689) data time 0.0010 (0.0072) model time 0.2517 (0.2576) loss 6.0289 (5.8218) grad_norm 1.9622 (2.3220) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:20:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][110/625] eta 0:02:17 lr 0.000501 wd 0.0500 time 0.2536 (0.2677) data time 0.0008 (0.0066) model time 0.2528 (0.2571) loss 6.4959 (5.8436) grad_norm 1.6228 (2.2939) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:20:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][120/625] eta 0:02:14 lr 0.000500 wd 0.0500 time 0.2536 (0.2669) data time 0.0006 
(0.0061) model time 0.2530 (0.2570) loss 5.9029 (5.8313) grad_norm 2.0842 (2.2627) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:20:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][130/625] eta 0:02:11 lr 0.000500 wd 0.0500 time 0.2528 (0.2659) data time 0.0011 (0.0057) model time 0.2517 (0.2566) loss 6.0293 (5.8531) grad_norm 2.1710 (2.2415) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:20:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][140/625] eta 0:02:08 lr 0.000500 wd 0.0500 time 0.2540 (0.2652) data time 0.0008 (0.0054) model time 0.2532 (0.2564) loss 5.8279 (5.8631) grad_norm 2.1307 (2.2232) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:20:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][150/625] eta 0:02:05 lr 0.000500 wd 0.0500 time 0.2514 (0.2645) data time 0.0010 (0.0051) model time 0.2504 (0.2562) loss 4.8473 (5.8706) grad_norm 2.4908 (2.2293) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:20:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][160/625] eta 0:02:02 lr 0.000500 wd 0.0500 time 0.2588 (0.2640) data time 0.0007 (0.0048) model time 0.2582 (0.2561) loss 6.4513 (5.8651) grad_norm 3.0565 (2.2280) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:20:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][170/625] eta 0:02:00 lr 0.000500 wd 0.0500 time 0.2541 (0.2643) data time 0.0010 (0.0046) model time 0.2531 (0.2571) loss 6.2561 (5.8685) grad_norm 2.2858 (2.2649) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:20:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][180/625] eta 0:01:57 lr 0.000499 wd 0.0500 time 0.2541 (0.2638) data time 0.0007 (0.0044) model time 0.2533 (0.2569) loss 6.1483 (5.8714) grad_norm 2.4604 (2.2599) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:20:22 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][190/625] eta 0:01:54 lr 0.000499 wd 0.0500 time 0.2547 (0.2634) data time 0.0010 (0.0042) model time 0.2536 (0.2567) loss 6.3130 (5.8733) grad_norm 5.0098 (2.2988) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:20:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][200/625] eta 0:01:51 lr 0.000499 wd 0.0500 time 0.2550 (0.2629) data time 0.0009 (0.0041) model time 0.2542 (0.2565) loss 5.5917 (5.8752) grad_norm 3.3808 (2.3340) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:20:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][210/625] eta 0:01:48 lr 0.000499 wd 0.0500 time 0.2561 (0.2626) data time 0.0011 (0.0039) model time 0.2550 (0.2564) loss 6.5653 (5.8814) grad_norm 1.7893 (2.3391) loss_scale 1024.0000 (528.9858) mem 9655MB [2024-08-04 07:20:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][220/625] eta 0:01:46 lr 0.000499 wd 0.0500 time 0.2557 (0.2623) data time 0.0009 (0.0038) model time 0.2548 (0.2563) loss 5.1076 (5.8801) grad_norm 1.5421 (2.3362) loss_scale 1024.0000 (551.3846) mem 9655MB [2024-08-04 07:20:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][230/625] eta 0:01:43 lr 0.000499 wd 0.0500 time 0.2564 (0.2625) data time 0.0006 (0.0036) model time 0.2558 (0.2569) loss 5.9795 (5.8710) grad_norm 2.0491 (2.3103) loss_scale 1024.0000 (571.8442) mem 9655MB [2024-08-04 07:20:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][240/625] eta 0:01:40 lr 0.000499 wd 0.0500 time 0.2544 (0.2622) data time 0.0010 (0.0035) model time 0.2534 (0.2568) loss 6.4535 (5.8617) grad_norm 3.5531 (2.3836) loss_scale 1024.0000 (590.6058) mem 9655MB [2024-08-04 07:20:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][250/625] eta 0:01:38 lr 0.000498 wd 0.0500 time 0.2550 (0.2626) data time 0.0007 
(0.0034) model time 0.2543 (0.2575) loss 6.6835 (5.8636) grad_norm 2.5865 (2.3808) loss_scale 1024.0000 (607.8725) mem 9655MB [2024-08-04 07:20:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][260/625] eta 0:01:35 lr 0.000498 wd 0.0500 time 0.2525 (0.2623) data time 0.0008 (0.0033) model time 0.2517 (0.2573) loss 6.4131 (5.8755) grad_norm 4.7078 (2.3769) loss_scale 1024.0000 (623.8161) mem 9655MB [2024-08-04 07:20:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][270/625] eta 0:01:33 lr 0.000498 wd 0.0500 time 0.2565 (0.2621) data time 0.0006 (0.0032) model time 0.2558 (0.2572) loss 5.6128 (5.8677) grad_norm 1.2755 (2.3614) loss_scale 1024.0000 (638.5830) mem 9655MB [2024-08-04 07:20:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][280/625] eta 0:01:30 lr 0.000498 wd 0.0500 time 0.2550 (0.2625) data time 0.0008 (0.0032) model time 0.2543 (0.2580) loss 5.7963 (5.8721) grad_norm 1.5040 (2.3491) loss_scale 1024.0000 (652.2989) mem 9655MB [2024-08-04 07:20:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][290/625] eta 0:01:27 lr 0.000498 wd 0.0500 time 0.2581 (0.2623) data time 0.0008 (0.0031) model time 0.2573 (0.2578) loss 6.1839 (5.8635) grad_norm 2.3325 (2.3538) loss_scale 1024.0000 (665.0722) mem 9655MB [2024-08-04 07:20:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][300/625] eta 0:01:25 lr 0.000498 wd 0.0500 time 0.2583 (0.2626) data time 0.0009 (0.0030) model time 0.2575 (0.2583) loss 5.3410 (5.8530) grad_norm 2.4697 (2.3469) loss_scale 1024.0000 (676.9967) mem 9655MB [2024-08-04 07:20:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][310/625] eta 0:01:22 lr 0.000498 wd 0.0500 time 0.2502 (0.2624) data time 0.0011 (0.0029) model time 0.2491 (0.2582) loss 6.5983 (5.8596) grad_norm 2.0770 (2.3493) loss_scale 1024.0000 (688.1543) mem 9655MB [2024-08-04 07:20:56 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][320/625] eta 0:01:19 lr 0.000497 wd 0.0500 time 0.2607 (0.2622) data time 0.0007 (0.0029) model time 0.2600 (0.2581) loss 7.1705 (5.8689) grad_norm 2.3619 (2.3487) loss_scale 1024.0000 (698.6168) mem 9655MB [2024-08-04 07:20:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][330/625] eta 0:01:17 lr 0.000497 wd 0.0500 time 0.2537 (0.2626) data time 0.0010 (0.0028) model time 0.2526 (0.2586) loss 6.1989 (5.8682) grad_norm 1.6623 (2.3338) loss_scale 1024.0000 (708.4471) mem 9655MB [2024-08-04 07:21:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][340/625] eta 0:01:14 lr 0.000497 wd 0.0500 time 0.2557 (0.2628) data time 0.0020 (0.0028) model time 0.2537 (0.2590) loss 5.8014 (5.8629) grad_norm 1.8476 (2.3248) loss_scale 1024.0000 (717.7009) mem 9655MB [2024-08-04 07:21:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][350/625] eta 0:01:12 lr 0.000497 wd 0.0500 time 0.2540 (0.2626) data time 0.0010 (0.0027) model time 0.2531 (0.2589) loss 6.1763 (5.8646) grad_norm 3.4382 (2.3241) loss_scale 1024.0000 (726.4274) mem 9655MB [2024-08-04 07:21:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][360/625] eta 0:01:09 lr 0.000497 wd 0.0500 time 0.2544 (0.2630) data time 0.0007 (0.0027) model time 0.2538 (0.2594) loss 6.1934 (5.8632) grad_norm 1.4568 (2.3155) loss_scale 1024.0000 (734.6704) mem 9655MB [2024-08-04 07:21:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][370/625] eta 0:01:07 lr 0.000497 wd 0.0500 time 0.2585 (0.2628) data time 0.0009 (0.0026) model time 0.2576 (0.2593) loss 6.5857 (5.8661) grad_norm 1.7551 (2.3040) loss_scale 1024.0000 (742.4690) mem 9655MB [2024-08-04 07:21:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][380/625] eta 0:01:04 lr 0.000496 wd 0.0500 time 0.2528 (0.2626) data time 0.0010 
(0.0026) model time 0.2518 (0.2591) loss 5.9823 (5.8624) grad_norm 1.6177 (2.3008) loss_scale 1024.0000 (749.8583) mem 9655MB [2024-08-04 07:21:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][390/625] eta 0:01:01 lr 0.000496 wd 0.0500 time 0.2570 (0.2624) data time 0.0006 (0.0025) model time 0.2564 (0.2590) loss 6.2196 (5.8635) grad_norm 1.8851 (2.2936) loss_scale 1024.0000 (756.8696) mem 9655MB [2024-08-04 07:21:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][400/625] eta 0:00:58 lr 0.000496 wd 0.0500 time 0.2552 (0.2622) data time 0.0007 (0.0025) model time 0.2545 (0.2588) loss 6.4312 (5.8587) grad_norm 1.8426 (2.2827) loss_scale 1024.0000 (763.5312) mem 9655MB [2024-08-04 07:21:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][410/625] eta 0:00:56 lr 0.000496 wd 0.0500 time 0.2572 (0.2620) data time 0.0012 (0.0025) model time 0.2560 (0.2587) loss 5.8805 (5.8576) grad_norm 2.2178 (2.2808) loss_scale 1024.0000 (769.8686) mem 9655MB [2024-08-04 07:21:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][420/625] eta 0:00:53 lr 0.000496 wd 0.0500 time 0.2575 (0.2624) data time 0.0012 (0.0024) model time 0.2563 (0.2592) loss 5.6651 (5.8521) grad_norm 2.8356 (2.2911) loss_scale 1024.0000 (775.9050) mem 9655MB [2024-08-04 07:21:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][430/625] eta 0:00:51 lr 0.000496 wd 0.0500 time 0.2569 (0.2622) data time 0.0006 (0.0024) model time 0.2563 (0.2591) loss 4.2905 (5.8508) grad_norm 4.6409 (2.3165) loss_scale 1024.0000 (781.6613) mem 9655MB [2024-08-04 07:21:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][440/625] eta 0:00:48 lr 0.000496 wd 0.0500 time 0.2577 (0.2621) data time 0.0010 (0.0024) model time 0.2566 (0.2589) loss 6.5539 (5.8558) grad_norm 2.9638 (2.3381) loss_scale 1024.0000 (787.1565) mem 9655MB [2024-08-04 07:21:30 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][450/625] eta 0:00:45 lr 0.000495 wd 0.0500 time 0.2549 (0.2619) data time 0.0007 (0.0023) model time 0.2542 (0.2588) loss 5.9334 (5.8559) grad_norm 1.5972 (2.3420) loss_scale 1024.0000 (792.4080) mem 9655MB [2024-08-04 07:21:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][460/625] eta 0:00:43 lr 0.000495 wd 0.0500 time 0.2561 (0.2619) data time 0.0009 (0.0023) model time 0.2552 (0.2588) loss 5.0571 (5.8515) grad_norm 1.9925 (2.3431) loss_scale 1024.0000 (797.4317) mem 9655MB [2024-08-04 07:21:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][470/625] eta 0:00:40 lr 0.000495 wd 0.0500 time 0.2547 (0.2618) data time 0.0009 (0.0023) model time 0.2538 (0.2587) loss 5.7153 (5.8476) grad_norm 1.6854 (2.3413) loss_scale 1024.0000 (802.2420) mem 9655MB [2024-08-04 07:21:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][480/625] eta 0:00:37 lr 0.000495 wd 0.0500 time 0.2507 (0.2617) data time 0.0010 (0.0022) model time 0.2498 (0.2587) loss 4.9669 (5.8481) grad_norm 3.0007 (2.3455) loss_scale 1024.0000 (806.8524) mem 9655MB [2024-08-04 07:21:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][490/625] eta 0:00:35 lr 0.000495 wd 0.0500 time 0.2584 (0.2616) data time 0.0017 (0.0022) model time 0.2567 (0.2586) loss 6.7805 (5.8487) grad_norm 1.5564 (2.3353) loss_scale 1024.0000 (811.2749) mem 9655MB [2024-08-04 07:21:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][500/625] eta 0:00:32 lr 0.000495 wd 0.0500 time 0.2539 (0.2615) data time 0.0010 (0.0022) model time 0.2529 (0.2585) loss 6.1438 (5.8542) grad_norm 2.1483 (2.3266) loss_scale 1024.0000 (815.5210) mem 9655MB [2024-08-04 07:21:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][510/625] eta 0:00:30 lr 0.000494 wd 0.0500 time 0.2554 (0.2614) data time 0.0008 
(0.0022) model time 0.2546 (0.2585) loss 5.0177 (5.8535) grad_norm 1.5091 (2.3132) loss_scale 1024.0000 (819.6008) mem 9655MB [2024-08-04 07:21:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][520/625] eta 0:00:27 lr 0.000494 wd 0.0500 time 0.2568 (0.2613) data time 0.0014 (0.0021) model time 0.2553 (0.2584) loss 6.6462 (5.8511) grad_norm 2.0003 (2.3134) loss_scale 1024.0000 (823.5240) mem 9655MB [2024-08-04 07:21:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][530/625] eta 0:00:24 lr 0.000494 wd 0.0500 time 0.2600 (0.2612) data time 0.0009 (0.0021) model time 0.2592 (0.2583) loss 6.5941 (5.8546) grad_norm 2.1837 (2.3265) loss_scale 1024.0000 (827.2994) mem 9655MB [2024-08-04 07:21:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][540/625] eta 0:00:22 lr 0.000494 wd 0.0500 time 0.2591 (0.2614) data time 0.0012 (0.0021) model time 0.2580 (0.2587) loss 6.0690 (5.8615) grad_norm 2.8391 (2.3351) loss_scale 1024.0000 (830.9353) mem 9655MB [2024-08-04 07:21:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][550/625] eta 0:00:19 lr 0.000494 wd 0.0500 time 0.2625 (0.2618) data time 0.0007 (0.0021) model time 0.2619 (0.2591) loss 6.8566 (5.8623) grad_norm 2.2262 (2.3346) loss_scale 1024.0000 (834.4392) mem 9655MB [2024-08-04 07:21:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][560/625] eta 0:00:17 lr 0.000494 wd 0.0500 time 0.2512 (0.2617) data time 0.0009 (0.0021) model time 0.2503 (0.2590) loss 6.0661 (5.8642) grad_norm 3.2981 (2.3361) loss_scale 1024.0000 (837.8182) mem 9655MB [2024-08-04 07:22:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][570/625] eta 0:00:14 lr 0.000494 wd 0.0500 time 0.2545 (0.2616) data time 0.0006 (0.0020) model time 0.2539 (0.2589) loss 6.5933 (5.8636) grad_norm 1.4055 (2.3498) loss_scale 1024.0000 (841.0788) mem 9655MB [2024-08-04 07:22:04 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][580/625] eta 0:00:11 lr 0.000493 wd 0.0500 time 0.2562 (0.2620) data time 0.0010 (0.0020) model time 0.2553 (0.2595) loss 5.8848 (5.8682) grad_norm 2.4015 (2.3633) loss_scale 1024.0000 (844.2272) mem 9655MB [2024-08-04 07:22:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][590/625] eta 0:00:09 lr 0.000493 wd 0.0500 time 0.2518 (0.2619) data time 0.0011 (0.0020) model time 0.2508 (0.2594) loss 5.1847 (5.8616) grad_norm 2.0706 (2.3600) loss_scale 1024.0000 (847.2690) mem 9655MB [2024-08-04 07:22:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][600/625] eta 0:00:06 lr 0.000493 wd 0.0500 time 0.2541 (0.2622) data time 0.0006 (0.0020) model time 0.2536 (0.2596) loss 5.3791 (5.8596) grad_norm 3.1125 (2.3627) loss_scale 1024.0000 (850.2097) mem 9655MB [2024-08-04 07:22:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][610/625] eta 0:00:03 lr 0.000493 wd 0.0500 time 0.2534 (0.2621) data time 0.0006 (0.0020) model time 0.2528 (0.2596) loss 5.6723 (5.8613) grad_norm 2.8767 (2.3682) loss_scale 1024.0000 (853.0540) mem 9655MB [2024-08-04 07:22:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][620/625] eta 0:00:01 lr 0.000493 wd 0.0500 time 0.2594 (0.2619) data time 0.0006 (0.0020) model time 0.2588 (0.2595) loss 5.1484 (5.8627) grad_norm 1.7044 (2.3630) loss_scale 1024.0000 (855.8068) mem 9655MB [2024-08-04 07:22:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 208 training takes 0:02:43 [2024-08-04 07:22:15 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 07:22:16 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 07:22:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.426 (0.426) Loss 0.6255 (0.6255) Acc@1 89.746 (89.746) Acc@5 98.730 (98.730) Mem 9655MB [2024-08-04 07:22:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.089) Loss 0.9888 (0.7686) Acc@1 79.248 (85.711) Acc@5 95.361 (97.470) Mem 9655MB [2024-08-04 07:22:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.073) Loss 1.1064 (0.9017) Acc@1 75.977 (82.299) Acc@5 94.482 (96.117) Mem 9655MB [2024-08-04 07:22:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.062 Acc@5 96.141 [2024-08-04 07:22:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.1% [2024-08-04 07:22:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.943 (0.943) Loss 0.5806 (0.5806) Acc@1 89.746 (89.746) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 07:22:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.136) Loss 0.9185 (0.7148) Acc@1 80.859 (86.262) Acc@5 95.801 (97.621) Mem 9655MB [2024-08-04 07:22:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.098) Loss 1.0381 (0.8399) Acc@1 76.172 (82.889) Acc@5 94.824 (96.308) Mem 9655MB [2024-08-04 07:22:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.598 Acc@5 96.309 [2024-08-04 07:22:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.6% [2024-08-04 07:22:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.60% [2024-08-04 07:22:20 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 07:22:21 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 07:22:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][0/625] eta 0:07:09 lr 0.000493 wd 0.0500 time 0.6867 (0.6867) data time 0.4375 (0.4375) model time 0.0000 (0.0000) loss 5.1254 (5.1254) grad_norm 2.1196 (2.1196) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:22:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][10/625] eta 0:03:00 lr 0.000493 wd 0.0500 time 0.2549 (0.2942) data time 0.0009 (0.0407) model time 0.0000 (0.0000) loss 5.8558 (5.8124) grad_norm 1.6146 (2.6024) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:22:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][20/625] eta 0:02:47 lr 0.000492 wd 0.0500 time 0.2560 (0.2762) data time 0.0006 (0.0218) model time 0.0000 (0.0000) loss 5.5571 (5.8208) grad_norm 1.3529 (2.2355) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:22:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][30/625] eta 0:02:40 lr 0.000492 wd 0.0500 time 0.2549 (0.2697) data time 0.0008 (0.0150) model time 0.0000 (0.0000) loss 6.3543 (5.8645) grad_norm 2.0765 (2.1225) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:22:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][40/625] eta 0:02:35 lr 0.000492 wd 0.0500 time 0.2523 (0.2665) data time 0.0008 (0.0116) model time 0.0000 (0.0000) loss 6.1905 (5.9522) grad_norm 5.2169 (2.1693) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:22:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][50/625] eta 0:02:33 lr 0.000492 wd 0.0500 time 0.3805 (0.2667) data time 0.0008 (0.0095) model time 0.0000 (0.0000) loss 5.4737 (5.8635) grad_norm 2.2597 (2.2387) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 07:22:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][60/625] eta 0:02:29 lr 0.000492 wd 0.0500 time 0.2682 (0.2650) data time 0.0006 (0.0081) model time 0.2676 (0.2556) loss 5.9729 (5.8780) grad_norm 2.8273 (2.2168) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:22:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][70/625] eta 0:02:28 lr 0.000492 wd 0.0500 time 0.2559 (0.2678) data time 0.0010 (0.0071) model time 0.2549 (0.2696) loss 5.8986 (5.8481) grad_norm 1.8288 (2.1776) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:22:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][80/625] eta 0:02:25 lr 0.000492 wd 0.0500 time 0.2561 (0.2662) data time 0.0007 (0.0064) model time 0.2554 (0.2645) loss 4.9373 (5.8487) grad_norm 2.8943 (2.2748) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:22:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][90/625] eta 0:02:21 lr 0.000491 wd 0.0500 time 0.2547 (0.2652) data time 0.0007 (0.0058) model time 0.2540 (0.2624) loss 6.3639 (5.8478) grad_norm 3.3738 (2.3687) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:22:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][100/625] eta 0:02:18 lr 0.000491 wd 0.0500 time 0.2573 (0.2644) data time 0.0008 (0.0053) model time 0.2565 (0.2612) loss 5.2700 (5.8620) grad_norm 3.7278 (2.5354) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:22:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][110/625] eta 0:02:16 lr 0.000491 wd 0.0500 time 0.2528 (0.2648) data time 0.0010 (0.0049) model time 0.2518 (0.2623) loss 6.0276 (5.8591) grad_norm 1.7118 (inf) loss_scale 512.0000 (991.7117) mem 9655MB [2024-08-04 07:22:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][120/625] eta 0:02:13 lr 0.000491 wd 0.0500 time 0.2490 (0.2641) 
data time 0.0009 (0.0046) model time 0.2482 (0.2613) loss 5.1350 (5.8506) grad_norm 2.0093 (inf) loss_scale 512.0000 (952.0661) mem 9655MB [2024-08-04 07:22:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][130/625] eta 0:02:11 lr 0.000491 wd 0.0500 time 0.2567 (0.2650) data time 0.0005 (0.0043) model time 0.2562 (0.2630) loss 4.5308 (5.8245) grad_norm 1.9736 (inf) loss_scale 512.0000 (918.4733) mem 9655MB [2024-08-04 07:22:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][140/625] eta 0:02:08 lr 0.000491 wd 0.0500 time 0.2565 (0.2658) data time 0.0008 (0.0040) model time 0.2557 (0.2643) loss 5.0405 (5.8282) grad_norm 2.2868 (inf) loss_scale 512.0000 (889.6454) mem 9655MB [2024-08-04 07:23:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][150/625] eta 0:02:05 lr 0.000490 wd 0.0500 time 0.2508 (0.2651) data time 0.0007 (0.0038) model time 0.2501 (0.2633) loss 5.3423 (5.8295) grad_norm 1.5875 (inf) loss_scale 512.0000 (864.6358) mem 9655MB [2024-08-04 07:23:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][160/625] eta 0:02:03 lr 0.000490 wd 0.0500 time 0.2550 (0.2645) data time 0.0007 (0.0036) model time 0.2543 (0.2626) loss 6.2511 (5.8258) grad_norm 2.2146 (inf) loss_scale 512.0000 (842.7329) mem 9655MB [2024-08-04 07:23:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][170/625] eta 0:02:00 lr 0.000490 wd 0.0500 time 0.2604 (0.2640) data time 0.0007 (0.0035) model time 0.2596 (0.2620) loss 6.1056 (5.8393) grad_norm 2.8638 (inf) loss_scale 512.0000 (823.3918) mem 9655MB [2024-08-04 07:23:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][180/625] eta 0:01:57 lr 0.000490 wd 0.0500 time 0.2578 (0.2636) data time 0.0008 (0.0033) model time 0.2570 (0.2615) loss 4.6740 (5.8367) grad_norm 1.6518 (inf) loss_scale 512.0000 (806.1878) mem 9655MB [2024-08-04 07:23:11 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][190/625] eta 0:01:54 lr 0.000490 wd 0.0500 time 0.2545 (0.2632) data time 0.0009 (0.0032) model time 0.2536 (0.2610) loss 5.5784 (5.8382) grad_norm 3.3095 (inf) loss_scale 512.0000 (790.7853) mem 9655MB [2024-08-04 07:23:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][200/625] eta 0:01:51 lr 0.000490 wd 0.0500 time 0.2599 (0.2628) data time 0.0010 (0.0031) model time 0.2589 (0.2606) loss 5.7125 (5.8474) grad_norm 2.4653 (inf) loss_scale 512.0000 (776.9154) mem 9655MB [2024-08-04 07:23:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][210/625] eta 0:01:48 lr 0.000490 wd 0.0500 time 0.2620 (0.2625) data time 0.0006 (0.0030) model time 0.2613 (0.2602) loss 5.3128 (5.8498) grad_norm 3.7324 (inf) loss_scale 512.0000 (764.3602) mem 9655MB [2024-08-04 07:23:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][220/625] eta 0:01:46 lr 0.000489 wd 0.0500 time 0.2554 (0.2622) data time 0.0008 (0.0029) model time 0.2547 (0.2599) loss 6.7979 (5.8582) grad_norm 1.6803 (inf) loss_scale 512.0000 (752.9412) mem 9655MB [2024-08-04 07:23:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][230/625] eta 0:01:43 lr 0.000489 wd 0.0500 time 0.2580 (0.2619) data time 0.0012 (0.0028) model time 0.2567 (0.2596) loss 5.8328 (5.8505) grad_norm 1.9353 (inf) loss_scale 512.0000 (742.5108) mem 9655MB [2024-08-04 07:23:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][240/625] eta 0:01:40 lr 0.000489 wd 0.0500 time 0.2575 (0.2617) data time 0.0006 (0.0027) model time 0.2569 (0.2594) loss 5.2597 (5.8355) grad_norm 2.4397 (inf) loss_scale 512.0000 (732.9461) mem 9655MB [2024-08-04 07:23:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][250/625] eta 0:01:38 lr 0.000489 wd 0.0500 time 0.2581 (0.2614) data time 0.0006 (0.0027) model time 0.2575 
(0.2592) loss 5.0565 (5.8239) grad_norm 3.3623 (inf) loss_scale 512.0000 (724.1434) mem 9655MB [2024-08-04 07:23:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][260/625] eta 0:01:35 lr 0.000489 wd 0.0500 time 0.2588 (0.2620) data time 0.0009 (0.0026) model time 0.2579 (0.2600) loss 6.0587 (5.8123) grad_norm 3.7458 (inf) loss_scale 512.0000 (716.0153) mem 9655MB [2024-08-04 07:23:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][270/625] eta 0:01:32 lr 0.000489 wd 0.0500 time 0.2546 (0.2618) data time 0.0008 (0.0025) model time 0.2539 (0.2598) loss 4.7870 (5.8217) grad_norm 2.6772 (inf) loss_scale 512.0000 (708.4871) mem 9655MB [2024-08-04 07:23:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][280/625] eta 0:01:30 lr 0.000488 wd 0.0500 time 0.2572 (0.2616) data time 0.0006 (0.0025) model time 0.2566 (0.2596) loss 5.9288 (5.8228) grad_norm 2.1269 (inf) loss_scale 512.0000 (701.4947) mem 9655MB [2024-08-04 07:23:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][290/625] eta 0:01:27 lr 0.000488 wd 0.0500 time 0.2585 (0.2614) data time 0.0008 (0.0024) model time 0.2577 (0.2594) loss 4.9913 (5.8284) grad_norm 2.2870 (inf) loss_scale 512.0000 (694.9828) mem 9655MB [2024-08-04 07:23:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][300/625] eta 0:01:24 lr 0.000488 wd 0.0500 time 0.2580 (0.2613) data time 0.0009 (0.0024) model time 0.2571 (0.2593) loss 5.0133 (5.8290) grad_norm 1.8756 (inf) loss_scale 512.0000 (688.9037) mem 9655MB [2024-08-04 07:23:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][310/625] eta 0:01:22 lr 0.000488 wd 0.0500 time 0.2565 (0.2611) data time 0.0008 (0.0023) model time 0.2557 (0.2591) loss 5.8296 (5.8227) grad_norm 2.1214 (inf) loss_scale 512.0000 (683.2154) mem 9655MB [2024-08-04 07:23:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[209/300][320/625] eta 0:01:19 lr 0.000488 wd 0.0500 time 0.2551 (0.2610) data time 0.0010 (0.0023) model time 0.2541 (0.2590) loss 6.1843 (5.8285) grad_norm 2.5151 (inf) loss_scale 512.0000 (677.8816) mem 9655MB [2024-08-04 07:23:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][330/625] eta 0:01:16 lr 0.000488 wd 0.0500 time 0.2528 (0.2608) data time 0.0011 (0.0022) model time 0.2517 (0.2588) loss 6.4574 (5.8275) grad_norm 2.1953 (inf) loss_scale 512.0000 (672.8701) mem 9655MB [2024-08-04 07:23:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][340/625] eta 0:01:14 lr 0.000488 wd 0.0500 time 0.2538 (0.2610) data time 0.0007 (0.0022) model time 0.2531 (0.2591) loss 6.7732 (5.8261) grad_norm 2.6858 (inf) loss_scale 512.0000 (668.1525) mem 9655MB [2024-08-04 07:23:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][350/625] eta 0:01:11 lr 0.000487 wd 0.0500 time 0.2609 (0.2609) data time 0.0007 (0.0022) model time 0.2602 (0.2590) loss 5.0599 (5.8232) grad_norm 1.7142 (inf) loss_scale 512.0000 (663.7037) mem 9655MB [2024-08-04 07:23:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][360/625] eta 0:01:09 lr 0.000487 wd 0.0500 time 0.2554 (0.2607) data time 0.0007 (0.0021) model time 0.2547 (0.2588) loss 4.4817 (5.8192) grad_norm 2.8445 (inf) loss_scale 512.0000 (659.5014) mem 9655MB [2024-08-04 07:23:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][370/625] eta 0:01:06 lr 0.000487 wd 0.0500 time 0.2540 (0.2611) data time 0.0008 (0.0021) model time 0.2532 (0.2593) loss 7.0040 (5.8236) grad_norm 3.0444 (inf) loss_scale 512.0000 (655.5256) mem 9655MB [2024-08-04 07:24:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][380/625] eta 0:01:03 lr 0.000487 wd 0.0500 time 0.2554 (0.2610) data time 0.0008 (0.0021) model time 0.2546 (0.2592) loss 5.6616 (5.8243) grad_norm 2.3848 (inf) loss_scale 
512.0000 (651.7585) mem 9655MB [2024-08-04 07:24:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][390/625] eta 0:01:01 lr 0.000487 wd 0.0500 time 0.2564 (0.2609) data time 0.0009 (0.0020) model time 0.2555 (0.2591) loss 6.9759 (5.8324) grad_norm 1.6772 (inf) loss_scale 512.0000 (648.1841) mem 9655MB [2024-08-04 07:24:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][400/625] eta 0:00:58 lr 0.000487 wd 0.0500 time 0.2540 (0.2608) data time 0.0009 (0.0020) model time 0.2531 (0.2590) loss 5.4586 (5.8399) grad_norm 2.1372 (inf) loss_scale 512.0000 (644.7880) mem 9655MB [2024-08-04 07:24:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][410/625] eta 0:00:56 lr 0.000487 wd 0.0500 time 0.2561 (0.2607) data time 0.0010 (0.0020) model time 0.2551 (0.2589) loss 5.0200 (5.8411) grad_norm 1.8303 (inf) loss_scale 512.0000 (641.5572) mem 9655MB [2024-08-04 07:24:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][420/625] eta 0:00:53 lr 0.000486 wd 0.0500 time 0.2559 (0.2606) data time 0.0009 (0.0020) model time 0.2550 (0.2589) loss 4.7103 (5.8344) grad_norm 3.1767 (inf) loss_scale 512.0000 (638.4798) mem 9655MB [2024-08-04 07:24:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][430/625] eta 0:00:50 lr 0.000486 wd 0.0500 time 0.2553 (0.2609) data time 0.0007 (0.0019) model time 0.2546 (0.2593) loss 4.8780 (5.8329) grad_norm 2.2041 (inf) loss_scale 512.0000 (635.5452) mem 9655MB [2024-08-04 07:24:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][440/625] eta 0:00:48 lr 0.000486 wd 0.0500 time 0.2564 (0.2608) data time 0.0006 (0.0019) model time 0.2557 (0.2592) loss 6.3861 (5.8381) grad_norm 2.2996 (inf) loss_scale 512.0000 (632.7438) mem 9655MB [2024-08-04 07:24:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][450/625] eta 0:00:45 lr 0.000486 wd 0.0500 time 0.2559 
(0.2608) data time 0.0008 (0.0019) model time 0.2551 (0.2591) loss 5.8097 (5.8411) grad_norm 3.1659 (inf) loss_scale 512.0000 (630.0665) mem 9655MB [2024-08-04 07:24:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][460/625] eta 0:00:43 lr 0.000486 wd 0.0500 time 0.2557 (0.2606) data time 0.0010 (0.0019) model time 0.2546 (0.2590) loss 6.2577 (5.8368) grad_norm 3.2368 (inf) loss_scale 512.0000 (627.5054) mem 9655MB [2024-08-04 07:24:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][470/625] eta 0:00:40 lr 0.000486 wd 0.0500 time 0.2579 (0.2609) data time 0.0009 (0.0018) model time 0.2571 (0.2593) loss 6.8281 (5.8374) grad_norm 1.7201 (inf) loss_scale 512.0000 (625.0531) mem 9655MB [2024-08-04 07:24:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][480/625] eta 0:00:37 lr 0.000485 wd 0.0500 time 0.2533 (0.2612) data time 0.0009 (0.0018) model time 0.2524 (0.2596) loss 4.6405 (5.8251) grad_norm 2.0615 (inf) loss_scale 512.0000 (622.7027) mem 9655MB [2024-08-04 07:24:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][490/625] eta 0:00:35 lr 0.000485 wd 0.0500 time 0.2532 (0.2610) data time 0.0008 (0.0018) model time 0.2524 (0.2595) loss 5.9409 (5.8256) grad_norm 1.2571 (inf) loss_scale 512.0000 (620.4481) mem 9655MB [2024-08-04 07:24:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][500/625] eta 0:00:32 lr 0.000485 wd 0.0500 time 0.2565 (0.2610) data time 0.0010 (0.0018) model time 0.2556 (0.2594) loss 6.6123 (5.8362) grad_norm 2.7906 (inf) loss_scale 512.0000 (618.2834) mem 9655MB [2024-08-04 07:24:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][510/625] eta 0:00:30 lr 0.000485 wd 0.0500 time 0.2590 (0.2617) data time 0.0009 (0.0018) model time 0.2581 (0.2602) loss 4.3238 (5.8295) grad_norm 1.8630 (inf) loss_scale 512.0000 (616.2035) mem 9655MB [2024-08-04 07:24:37 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][520/625] eta 0:00:27 lr 0.000485 wd 0.0500 time 0.2539 (0.2615) data time 0.0008 (0.0018) model time 0.2531 (0.2601) loss 5.4279 (5.8326) grad_norm 2.7048 (inf) loss_scale 512.0000 (614.2035) mem 9655MB [2024-08-04 07:24:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][530/625] eta 0:00:24 lr 0.000485 wd 0.0500 time 0.4363 (0.2621) data time 0.0009 (0.0017) model time 0.4354 (0.2607) loss 6.3331 (5.8400) grad_norm 1.5299 (inf) loss_scale 512.0000 (612.2787) mem 9655MB [2024-08-04 07:24:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][540/625] eta 0:00:22 lr 0.000485 wd 0.0500 time 0.2537 (0.2620) data time 0.0007 (0.0017) model time 0.2530 (0.2606) loss 5.9695 (5.8418) grad_norm 2.1352 (inf) loss_scale 512.0000 (610.4251) mem 9655MB [2024-08-04 07:24:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][550/625] eta 0:00:19 lr 0.000484 wd 0.0500 time 0.2572 (0.2619) data time 0.0009 (0.0017) model time 0.2563 (0.2605) loss 5.3447 (5.8406) grad_norm 2.1765 (inf) loss_scale 512.0000 (608.6388) mem 9655MB [2024-08-04 07:24:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][560/625] eta 0:00:17 lr 0.000484 wd 0.0500 time 0.2584 (0.2618) data time 0.0008 (0.0017) model time 0.2577 (0.2604) loss 6.3392 (5.8505) grad_norm 3.5362 (inf) loss_scale 512.0000 (606.9162) mem 9655MB [2024-08-04 07:24:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][570/625] eta 0:00:14 lr 0.000484 wd 0.0500 time 0.2529 (0.2617) data time 0.0008 (0.0017) model time 0.2521 (0.2603) loss 6.9749 (5.8519) grad_norm 1.4394 (inf) loss_scale 512.0000 (605.2539) mem 9655MB [2024-08-04 07:24:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][580/625] eta 0:00:11 lr 0.000484 wd 0.0500 time 0.2524 (0.2616) data time 0.0007 (0.0017) model time 0.2518 
(0.2602) loss 5.8594 (5.8508) grad_norm 4.0488 (inf) loss_scale 512.0000 (603.6489) mem 9655MB [2024-08-04 07:24:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][590/625] eta 0:00:09 lr 0.000484 wd 0.0500 time 0.2604 (0.2615) data time 0.0008 (0.0017) model time 0.2596 (0.2601) loss 6.1974 (5.8475) grad_norm 3.4633 (inf) loss_scale 512.0000 (602.0981) mem 9655MB [2024-08-04 07:24:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][600/625] eta 0:00:06 lr 0.000484 wd 0.0500 time 0.2765 (0.2615) data time 0.0005 (0.0017) model time 0.2760 (0.2601) loss 6.1766 (5.8521) grad_norm 2.0048 (inf) loss_scale 512.0000 (600.5990) mem 9655MB [2024-08-04 07:25:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][610/625] eta 0:00:03 lr 0.000484 wd 0.0500 time 0.2531 (0.2614) data time 0.0006 (0.0017) model time 0.2525 (0.2600) loss 5.1867 (5.8524) grad_norm 1.9027 (inf) loss_scale 512.0000 (599.1489) mem 9655MB [2024-08-04 07:25:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][620/625] eta 0:00:01 lr 0.000483 wd 0.0500 time 0.2512 (0.2613) data time 0.0006 (0.0016) model time 0.2506 (0.2598) loss 6.3771 (5.8537) grad_norm 2.1344 (inf) loss_scale 512.0000 (597.7456) mem 9655MB [2024-08-04 07:25:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 209 training takes 0:02:43 [2024-08-04 07:25:04 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 07:25:05 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 07:25:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.561 (0.561) Loss 0.6201 (0.6201) Acc@1 89.355 (89.355) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 07:25:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.102) Loss 0.9497 (0.7533) Acc@1 79.932 (85.844) Acc@5 95.898 (97.501) Mem 9655MB [2024-08-04 07:25:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 1.0674 (0.8757) Acc@1 76.318 (82.699) Acc@5 94.336 (96.208) Mem 9655MB [2024-08-04 07:25:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.346 Acc@5 96.203 [2024-08-04 07:25:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.3% [2024-08-04 07:25:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.35% [2024-08-04 07:25:06 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 07:25:07 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 07:25:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.506 (0.506) Loss 0.5811 (0.5811) Acc@1 89.648 (89.648) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 07:25:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.099) Loss 0.9194 (0.7143) Acc@1 80.859 (86.275) Acc@5 95.898 (97.630) Mem 9655MB [2024-08-04 07:25:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0371 (0.8394) Acc@1 76.221 (82.912) Acc@5 94.775 (96.312) Mem 9655MB [2024-08-04 07:25:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.622 Acc@5 96.307 [2024-08-04 07:25:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.6% [2024-08-04 07:25:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.62% [2024-08-04 07:25:09 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 07:25:09 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 07:25:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][0/625] eta 0:06:54 lr 0.000483 wd 0.0500 time 0.6626 (0.6626) data time 0.4130 (0.4130) model time 0.0000 (0.0000) loss 6.4981 (6.4981) grad_norm 2.1432 (2.1432) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:25:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][10/625] eta 0:02:59 lr 0.000483 wd 0.0500 time 0.2505 (0.2921) data time 0.0008 (0.0384) model time 0.0000 (0.0000) loss 5.6736 (5.7814) grad_norm 3.1529 (2.5102) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:25:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][20/625] eta 0:02:46 lr 0.000483 wd 0.0500 time 0.2582 (0.2754) data time 0.0009 (0.0206) model time 0.0000 (0.0000) loss 5.5584 (5.7765) grad_norm 1.9501 (2.4645) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:25:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][30/625] eta 0:02:40 lr 0.000483 wd 0.0500 time 0.2539 (0.2695) data time 0.0008 (0.0143) model time 0.0000 (0.0000) loss 4.2074 (5.7560) grad_norm 1.9097 (2.3634) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:25:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][40/625] eta 0:02:40 lr 0.000483 wd 0.0500 time 0.2534 (0.2751) data time 0.0009 (0.0110) model time 0.0000 (0.0000) loss 5.3625 (5.7637) grad_norm 1.4393 (2.2943) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:25:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][50/625] eta 0:02:37 lr 0.000483 wd 0.0500 time 0.3957 (0.2746) data time 0.0008 (0.0090) model time 0.0000 (0.0000) loss 6.1623 (5.7830) grad_norm 1.8905 (2.3243) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:25:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][60/625] eta 0:02:34 lr 0.000482 wd 0.0500 time 0.4092 (0.2740) data time 
0.0007 (0.0077) model time 0.4085 (0.2704) loss 5.5876 (5.7347) grad_norm 1.7963 (2.2850) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:25:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][70/625] eta 0:02:30 lr 0.000482 wd 0.0500 time 0.2542 (0.2716) data time 0.0008 (0.0067) model time 0.2534 (0.2629) loss 5.4357 (5.6924) grad_norm 1.7891 (2.3173) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:25:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][80/625] eta 0:02:26 lr 0.000482 wd 0.0500 time 0.2547 (0.2696) data time 0.0007 (0.0060) model time 0.2541 (0.2602) loss 4.7820 (5.7046) grad_norm 2.1437 (2.3334) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:25:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][90/625] eta 0:02:23 lr 0.000482 wd 0.0500 time 0.2585 (0.2680) data time 0.0009 (0.0055) model time 0.2575 (0.2588) loss 5.6125 (5.7167) grad_norm 1.6295 (2.4217) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:25:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][100/625] eta 0:02:20 lr 0.000482 wd 0.0500 time 0.2568 (0.2668) data time 0.0009 (0.0050) model time 0.2559 (0.2578) loss 6.3523 (5.7613) grad_norm 2.3920 (2.4953) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:25:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][110/625] eta 0:02:17 lr 0.000482 wd 0.0500 time 0.2580 (0.2677) data time 0.0011 (0.0047) model time 0.2568 (0.2608) loss 5.6691 (5.7676) grad_norm 3.2034 (2.5054) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:25:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][120/625] eta 0:02:14 lr 0.000481 wd 0.0500 time 0.2605 (0.2667) data time 0.0007 (0.0044) model time 0.2599 (0.2599) loss 7.0962 (5.7566) grad_norm 1.9472 (2.4753) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:25:44 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][130/625] eta 0:02:11 lr 0.000481 wd 0.0500 time 0.2546 (0.2659) data time 0.0008 (0.0041) model time 0.2537 (0.2594) loss 4.9065 (5.7597) grad_norm 1.7408 (2.4331) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:25:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][140/625] eta 0:02:08 lr 0.000481 wd 0.0500 time 0.2560 (0.2653) data time 0.0007 (0.0039) model time 0.2553 (0.2590) loss 6.4323 (5.7794) grad_norm 3.4226 (2.4139) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:25:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][150/625] eta 0:02:05 lr 0.000481 wd 0.0500 time 0.2540 (0.2646) data time 0.0007 (0.0037) model time 0.2533 (0.2586) loss 6.7194 (5.7814) grad_norm 2.5670 (2.3737) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:25:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][160/625] eta 0:02:02 lr 0.000481 wd 0.0500 time 0.2532 (0.2641) data time 0.0010 (0.0035) model time 0.2523 (0.2584) loss 6.1827 (5.7878) grad_norm 2.0849 (2.4170) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:25:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][170/625] eta 0:01:59 lr 0.000481 wd 0.0500 time 0.2548 (0.2637) data time 0.0008 (0.0034) model time 0.2540 (0.2581) loss 6.6838 (5.7957) grad_norm 2.2579 (2.4125) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:25:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][180/625] eta 0:01:57 lr 0.000481 wd 0.0500 time 0.2552 (0.2643) data time 0.0008 (0.0032) model time 0.2545 (0.2593) loss 5.8993 (5.7817) grad_norm 1.5759 (2.4207) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:26:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][190/625] eta 0:01:54 lr 0.000480 wd 0.0500 time 0.2513 (0.2638) data time 0.0011 (0.0031) 
model time 0.2502 (0.2589) loss 6.1805 (5.7914) grad_norm 1.7715 (2.4063) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:26:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][200/625] eta 0:01:51 lr 0.000480 wd 0.0500 time 0.2526 (0.2635) data time 0.0007 (0.0030) model time 0.2519 (0.2588) loss 4.8755 (5.7769) grad_norm 4.8550 (2.4270) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:26:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][210/625] eta 0:01:49 lr 0.000480 wd 0.0500 time 0.2526 (0.2631) data time 0.0010 (0.0029) model time 0.2516 (0.2585) loss 5.4026 (5.7818) grad_norm 2.7637 (2.4397) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:26:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][220/625] eta 0:01:46 lr 0.000480 wd 0.0500 time 0.2518 (0.2628) data time 0.0011 (0.0028) model time 0.2508 (0.2583) loss 5.5679 (5.8003) grad_norm 1.7190 (2.4460) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:26:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][230/625] eta 0:01:43 lr 0.000480 wd 0.0500 time 0.2548 (0.2626) data time 0.0010 (0.0027) model time 0.2538 (0.2582) loss 4.4980 (5.7944) grad_norm 1.5483 (2.4312) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:26:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][240/625] eta 0:01:41 lr 0.000480 wd 0.0500 time 0.2568 (0.2632) data time 0.0009 (0.0027) model time 0.2559 (0.2592) loss 5.3733 (5.7943) grad_norm 3.8880 (2.4337) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:26:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][250/625] eta 0:01:38 lr 0.000480 wd 0.0500 time 0.2547 (0.2629) data time 0.0008 (0.0026) model time 0.2539 (0.2590) loss 5.9447 (5.7800) grad_norm 1.9387 (2.4298) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:26:18 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [210/300][260/625] eta 0:01:35 lr 0.000479 wd 0.0500 time 0.2553 (0.2627) data time 0.0008 (0.0025) model time 0.2545 (0.2588) loss 4.7851 (5.7703) grad_norm 2.2792 (2.4462) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:26:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][270/625] eta 0:01:33 lr 0.000479 wd 0.0500 time 0.2538 (0.2624) data time 0.0007 (0.0025) model time 0.2531 (0.2587) loss 5.0463 (5.7777) grad_norm 2.8076 (2.4382) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:26:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][280/625] eta 0:01:31 lr 0.000479 wd 0.0500 time 0.6415 (0.2643) data time 0.0011 (0.0024) model time 0.6404 (0.2610) loss 6.2082 (5.7776) grad_norm 3.2955 (2.4480) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:26:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][290/625] eta 0:01:28 lr 0.000479 wd 0.0500 time 0.2563 (0.2646) data time 0.0006 (0.0024) model time 0.2556 (0.2616) loss 6.6792 (5.7889) grad_norm 2.0684 (2.4488) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:26:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][300/625] eta 0:01:25 lr 0.000479 wd 0.0500 time 0.2530 (0.2643) data time 0.0007 (0.0023) model time 0.2524 (0.2613) loss 4.6464 (5.7999) grad_norm 1.4641 (2.4331) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:26:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][310/625] eta 0:01:23 lr 0.000479 wd 0.0500 time 0.2549 (0.2641) data time 0.0011 (0.0023) model time 0.2538 (0.2611) loss 6.2302 (5.8029) grad_norm 1.5289 (2.4171) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:26:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][320/625] eta 0:01:20 lr 0.000478 wd 0.0500 time 0.2587 (0.2639) data time 0.0008 (0.0022) model time 0.2578 (0.2609) 
loss 5.5234 (5.7991) grad_norm 1.8551 (2.4123) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:26:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][330/625] eta 0:01:17 lr 0.000478 wd 0.0500 time 0.2578 (0.2637) data time 0.0006 (0.0022) model time 0.2572 (0.2607) loss 4.4587 (5.7915) grad_norm 2.9004 (2.4066) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:26:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][340/625] eta 0:01:15 lr 0.000478 wd 0.0500 time 0.2568 (0.2635) data time 0.0007 (0.0022) model time 0.2562 (0.2606) loss 6.1789 (5.7980) grad_norm 4.1533 (2.4064) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:26:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][350/625] eta 0:01:12 lr 0.000478 wd 0.0500 time 0.2603 (0.2632) data time 0.0005 (0.0021) model time 0.2598 (0.2604) loss 5.7573 (5.8024) grad_norm 3.9787 (2.4041) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:26:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][360/625] eta 0:01:09 lr 0.000478 wd 0.0500 time 0.2555 (0.2635) data time 0.0010 (0.0021) model time 0.2544 (0.2607) loss 6.7748 (5.8034) grad_norm 2.0722 (2.4045) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:26:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][370/625] eta 0:01:07 lr 0.000478 wd 0.0500 time 0.2568 (0.2637) data time 0.0010 (0.0021) model time 0.2557 (0.2611) loss 4.6516 (5.7892) grad_norm 1.5079 (2.3992) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:26:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][380/625] eta 0:01:04 lr 0.000478 wd 0.0500 time 0.2571 (0.2635) data time 0.0007 (0.0020) model time 0.2564 (0.2608) loss 7.0749 (5.7899) grad_norm 3.3053 (2.4053) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:26:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [210/300][390/625] eta 0:01:02 lr 0.000477 wd 0.0500 time 0.2585 (0.2639) data time 0.0006 (0.0020) model time 0.2579 (0.2614) loss 5.5328 (5.7839) grad_norm 1.6787 (2.4090) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:26:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][400/625] eta 0:00:59 lr 0.000477 wd 0.0500 time 0.2576 (0.2637) data time 0.0007 (0.0020) model time 0.2570 (0.2612) loss 6.4708 (5.7870) grad_norm 2.1461 (2.4020) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:26:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][410/625] eta 0:00:56 lr 0.000477 wd 0.0500 time 0.2493 (0.2635) data time 0.0008 (0.0019) model time 0.2485 (0.2610) loss 6.2321 (5.7836) grad_norm 2.2004 (2.3913) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:27:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][420/625] eta 0:00:53 lr 0.000477 wd 0.0500 time 0.2594 (0.2634) data time 0.0010 (0.0019) model time 0.2584 (0.2609) loss 5.9547 (5.7883) grad_norm 1.5625 (2.3850) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:27:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][430/625] eta 0:00:51 lr 0.000477 wd 0.0500 time 0.2588 (0.2632) data time 0.0009 (0.0019) model time 0.2579 (0.2608) loss 4.9226 (5.7936) grad_norm 1.9694 (2.3785) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:27:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][440/625] eta 0:00:48 lr 0.000477 wd 0.0500 time 0.2571 (0.2631) data time 0.0006 (0.0019) model time 0.2564 (0.2606) loss 6.0403 (5.7966) grad_norm 13.2309 (2.3932) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:27:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][450/625] eta 0:00:46 lr 0.000477 wd 0.0500 time 0.2536 (0.2629) data time 0.0009 (0.0019) model time 0.2527 (0.2605) loss 6.6396 (5.8090) grad_norm 2.8982 (2.3992) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:27:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][460/625] eta 0:00:43 lr 0.000476 wd 0.0500 time 0.2538 (0.2628) data time 0.0009 (0.0018) model time 0.2529 (0.2604) loss 5.1310 (5.8064) grad_norm 2.0696 (2.4009) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:27:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][470/625] eta 0:00:40 lr 0.000476 wd 0.0500 time 0.2616 (0.2627) data time 0.0007 (0.0018) model time 0.2610 (0.2603) loss 4.6708 (5.8049) grad_norm 2.2453 (2.3967) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:27:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][480/625] eta 0:00:38 lr 0.000476 wd 0.0500 time 0.2569 (0.2626) data time 0.0010 (0.0018) model time 0.2558 (0.2602) loss 5.8987 (5.8125) grad_norm 2.4427 (2.4017) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:27:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][490/625] eta 0:00:35 lr 0.000476 wd 0.0500 time 0.2523 (0.2624) data time 0.0008 (0.0018) model time 0.2515 (0.2601) loss 5.7539 (5.8106) grad_norm 2.7806 (2.4147) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:27:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][500/625] eta 0:00:32 lr 0.000476 wd 0.0500 time 0.2538 (0.2623) data time 0.0007 (0.0018) model time 0.2532 (0.2600) loss 6.4974 (5.8162) grad_norm 2.5998 (2.4477) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:27:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][510/625] eta 0:00:30 lr 0.000476 wd 0.0500 time 0.2532 (0.2625) data time 0.0008 (0.0017) model time 0.2524 (0.2603) loss 6.1190 (5.8181) grad_norm 3.0711 (2.4584) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:27:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][520/625] eta 0:00:27 lr 0.000475 wd 0.0500 time 0.2541 (0.2624) data time 0.0008 (0.0017) model time 0.2533 (0.2601) loss 6.0745 (5.8235) grad_norm 2.4236 (2.4663) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:27:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][530/625] eta 0:00:24 lr 0.000475 wd 0.0500 time 0.2545 (0.2626) data time 0.0006 (0.0017) model time 0.2539 (0.2604) loss 6.4263 (5.8206) grad_norm 3.6101 (2.4932) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:27:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][540/625] eta 0:00:22 lr 0.000475 wd 0.0500 time 0.2560 (0.2625) data time 0.0011 (0.0017) model time 0.2549 (0.2603) loss 5.6219 (5.8167) grad_norm 2.4761 (2.4998) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:27:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][550/625] eta 0:00:19 lr 0.000475 wd 0.0500 time 0.2538 (0.2623) data time 0.0009 (0.0017) model time 0.2529 (0.2602) loss 5.1862 (5.8243) grad_norm 1.4403 (2.4964) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:27:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][560/625] eta 0:00:17 lr 0.000475 wd 0.0500 time 0.2678 (0.2622) data time 0.0007 (0.0017) model time 0.2671 (0.2601) loss 6.9259 (5.8274) grad_norm 2.4232 (2.4901) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:27:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][570/625] eta 0:00:14 lr 0.000475 wd 0.0500 time 0.2526 (0.2621) data time 0.0009 (0.0017) model time 0.2518 (0.2600) loss 6.2432 (5.8227) grad_norm 2.3039 (2.4925) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:27:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][580/625] eta 0:00:11 lr 0.000475 wd 0.0500 time 0.2543 (0.2620) data time 0.0009 (0.0016) model time 0.2533 (0.2599) loss 5.0087 (5.8167) grad_norm 2.4873 (2.4895) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:27:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][590/625] eta 0:00:09 lr 0.000474 wd 0.0500 time 0.2582 (0.2619) data time 0.0006 (0.0016) model time 0.2576 (0.2598) loss 6.1984 (5.8189) grad_norm 2.1300 (2.4780) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:27:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][600/625] eta 0:00:06 lr 0.000474 wd 0.0500 time 0.2639 (0.2619) data time 0.0008 (0.0016) model time 0.2631 (0.2598) loss 6.5593 (5.8193) grad_norm 1.9666 (2.4666) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:27:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][610/625] eta 0:00:03 lr 0.000474 wd 0.0500 time 0.2524 (0.2618) data time 0.0003 (0.0016) model time 0.2521 (0.2597) loss 5.8400 (5.8142) grad_norm 2.0385 (2.4571) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:27:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][620/625] eta 0:00:01 lr 0.000474 wd 0.0500 time 0.2528 (0.2617) data time 0.0003 (0.0016) model time 0.2525 (0.2596) loss 6.2945 (5.8113) grad_norm 1.7908 (2.4616) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:27:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 210 training takes 0:02:43
[2024-08-04 07:27:53 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 07:27:53 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 07:27:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.469 (0.469) Loss 0.6289 (0.6289) Acc@1 89.355 (89.355) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 07:27:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.094) Loss 0.9814 (0.7630) Acc@1 78.906 (85.827) Acc@5 95.654 (97.541) Mem 9655MB
[2024-08-04 07:27:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.075) Loss 1.0762 (0.8917) Acc@1 76.367 (82.389) Acc@5 94.922 (96.215) Mem 9655MB
[2024-08-04 07:27:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 82.070 Acc@5 96.187
[2024-08-04 07:27:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.1%
[2024-08-04 07:27:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.746 (0.746) Loss 0.5815 (0.5815) Acc@1 89.648 (89.648) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 07:27:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.125) Loss 0.9185 (0.7144) Acc@1 80.811 (86.279) Acc@5 95.850 (97.630) Mem 9655MB
[2024-08-04 07:27:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0371 (0.8393) Acc@1 76.318 (82.947) Acc@5 95.020 (96.326) Mem 9655MB
[2024-08-04 07:27:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 82.650 Acc@5 96.327
[2024-08-04 07:27:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.6%
[2024-08-04 07:27:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.65%
[2024-08-04 07:27:57 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 07:27:58 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 07:27:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][0/625] eta 0:07:12 lr 0.000474 wd 0.0500 time 0.6919 (0.6919) data time 0.4486 (0.4486) model time 0.0000 (0.0000) loss 6.2990 (6.2990) grad_norm 2.8369 (2.8369) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:28:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][10/625] eta 0:03:08 lr 0.000474 wd 0.0500 time 0.3944 (0.3071) data time 0.0008 (0.0417) model time 0.0000 (0.0000) loss 5.6307 (5.8675) grad_norm 2.3917 (2.8002) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:28:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][20/625] eta 0:02:51 lr 0.000474 wd 0.0500 time 0.2694 (0.2833) data time 0.0012 (0.0223) model time 0.0000 (0.0000) loss 6.0715 (5.9074) grad_norm 3.6071 (2.9198) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:28:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][30/625] eta 0:02:43 lr 0.000473 wd 0.0500 time 0.2548 (0.2740) data time 0.0012 (0.0155) model time 0.0000 (0.0000) loss 5.9540 (5.8628) grad_norm 2.6519 (2.6055) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:28:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][40/625] eta 0:02:37 lr 0.000473 wd 0.0500 time 0.2582 (0.2696) data time 0.0007 (0.0120) model time 0.0000 (0.0000) loss 6.9398 (5.8828) grad_norm 2.1916 (2.7104) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:28:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][50/625] eta 0:02:35 lr 0.000473 wd 0.0500 time 0.2513 (0.2709) data time 0.0008 (0.0098) model time 0.0000 (0.0000) loss 4.7855 (5.8883) grad_norm 4.6963 (2.9458) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:28:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][60/625] eta 0:02:31 lr 0.000473 wd 0.0500 time 0.2668 (0.2688) data time 0.0007 (0.0083) model time 0.2660 (0.2569) loss 6.1230 (5.8320) grad_norm 1.5937 (2.7858) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:28:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][70/625] eta 0:02:28 lr 0.000473 wd 0.0500 time 0.2579 (0.2671) data time 0.0009 (0.0073) model time 0.2570 (0.2563) loss 5.7314 (5.8493) grad_norm 2.3984 (2.6851) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:28:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][80/625] eta 0:02:24 lr 0.000473 wd 0.0500 time 0.2601 (0.2656) data time 0.0006 (0.0065) model time 0.2595 (0.2558) loss 5.5349 (5.8715) grad_norm 2.5380 (2.6255) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:28:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][90/625] eta 0:02:21 lr 0.000473 wd 0.0500 time 0.2582 (0.2647) data time 0.0008 (0.0059) model time 0.2574 (0.2559) loss 6.0604 (5.8826) grad_norm 1.5424 (2.5406) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:28:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][100/625] eta 0:02:18 lr 0.000472 wd 0.0500 time 0.2559 (0.2639) data time 0.0009 (0.0054) model time 0.2550 (0.2557) loss 5.0754 (5.8695) grad_norm 2.0218 (2.5036) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:28:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][110/625] eta 0:02:16 lr 0.000472 wd 0.0500 time 0.2544 (0.2650) data time 0.0007 (0.0050) model time 0.2537 (0.2590) loss 5.5510 (5.8737) grad_norm 3.1972 (2.5543) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:28:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][120/625] eta 0:02:14 lr 0.000472 wd 0.0500 time 0.2576 (0.2656) data time 0.0009 (0.0047) model time 0.2568 (0.2608) loss 6.0549 (5.8867) grad_norm 1.5747 (2.5109) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:28:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][130/625] eta 0:02:11 lr 0.000472 wd 0.0500 time 0.2561 (0.2648) data time 0.0009 (0.0044) model time 0.2552 (0.2600) loss 6.3521 (5.8968) grad_norm 1.3256 (2.4606) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:28:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][140/625] eta 0:02:08 lr 0.000472 wd 0.0500 time 0.2579 (0.2642) data time 0.0009 (0.0041) model time 0.2569 (0.2594) loss 5.2705 (5.8948) grad_norm 1.6873 (2.4596) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:28:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][150/625] eta 0:02:05 lr 0.000472 wd 0.0500 time 0.2562 (0.2637) data time 0.0007 (0.0039) model time 0.2555 (0.2591) loss 5.3288 (5.8702) grad_norm 3.8046 (2.4685) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:28:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][160/625] eta 0:02:02 lr 0.000472 wd 0.0500 time 0.2574 (0.2644) data time 0.0006 (0.0037) model time 0.2568 (0.2605) loss 6.0607 (5.8621) grad_norm 2.3844 (2.5013) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:28:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][170/625] eta 0:02:00 lr 0.000471 wd 0.0500 time 0.2529 (0.2651) data time 0.0008 (0.0036) model time 0.2521 (0.2617) loss 6.3514 (5.8688) grad_norm 3.9220 (2.5090) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:28:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][180/625] eta 0:01:57 lr 0.000471 wd 0.0500 time 0.2568 (0.2646) data time 0.0008 (0.0034) model time 0.2560 (0.2612) loss 5.2638 (5.8732) grad_norm 2.1088 (2.5044) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:28:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][190/625] eta 0:01:54 lr 0.000471 wd 0.0500 time 0.2556 (0.2642) data time 0.0009 (0.0033) model time 0.2547 (0.2608) loss 5.7799 (5.8680) grad_norm 1.9667 (2.4776) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:28:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][200/625] eta 0:01:52 lr 0.000471 wd 0.0500 time 0.2541 (0.2639) data time 0.0009 (0.0032) model time 0.2532 (0.2606) loss 6.7187 (5.8484) grad_norm 2.0134 (2.4585) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:28:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][210/625] eta 0:01:49 lr 0.000471 wd 0.0500 time 0.2569 (0.2644) data time 0.0006 (0.0031) model time 0.2563 (0.2613) loss 6.4042 (5.8452) grad_norm 1.7579 (2.4361) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:28:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][220/625] eta 0:01:46 lr 0.000471 wd 0.0500 time 0.2574 (0.2640) data time 0.0011 (0.0030) model time 0.2562 (0.2610) loss 5.0616 (5.8463) grad_norm 1.4742 (2.4344) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:28:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][230/625] eta 0:01:45 lr 0.000470 wd 0.0500 time 0.4680 (0.2662) data time 0.0008 (0.0029) model time 0.4672 (0.2639) loss 6.0198 (5.8302) grad_norm 2.2922 (2.4260) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:29:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][240/625] eta 0:01:42 lr 0.000470 wd 0.0500 time 0.2565 (0.2666) data time 0.0011 (0.0028) model time 0.2555 (0.2645) loss 6.0301 (5.8241) grad_norm 1.8915 (2.4103) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:29:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][250/625] eta 0:01:39 lr 0.000470 wd 0.0500 time 0.2542 (0.2662) data time 0.0012 (0.0027) model time 0.2530 (0.2640) loss 5.0639 (5.8220) grad_norm 3.2231 (2.3950) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:29:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][260/625] eta 0:01:37 lr 0.000470 wd 0.0500 time 0.2566 (0.2658) data time 0.0011 (0.0027) model time 0.2555 (0.2636) loss 4.3799 (5.8131) grad_norm 2.5654 (2.4169) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:29:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][270/625] eta 0:01:34 lr 0.000470 wd 0.0500 time 0.2550 (0.2654) data time 0.0008 (0.0026) model time 0.2541 (0.2632) loss 5.9957 (5.8348) grad_norm 1.8780 (2.4008) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:29:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][280/625] eta 0:01:31 lr 0.000470 wd 0.0500 time 0.2589 (0.2650) data time 0.0008 (0.0025) model time 0.2581 (0.2628) loss 6.3593 (5.8405) grad_norm 6.4445 (2.4076) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:29:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][290/625] eta 0:01:28 lr 0.000470 wd 0.0500 time 0.2558 (0.2647) data time 0.0009 (0.0025) model time 0.2549 (0.2624) loss 5.9099 (5.8469) grad_norm 3.4020 (2.4170) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:29:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][300/625] eta 0:01:25 lr 0.000469 wd 0.0500 time 0.2659 (0.2644) data time 0.0009 (0.0024) model time 0.2650 (0.2622) loss 6.1890 (5.8592) grad_norm 1.7764 (2.4086) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:29:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][310/625] eta 0:01:23 lr 0.000469 wd 0.0500 time 0.2550 (0.2641) data time 0.0009 (0.0024) model time 0.2541 (0.2619) loss 5.7951 (5.8711) grad_norm 2.7004 (2.4028) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:29:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][320/625] eta 0:01:20 lr 0.000469 wd 0.0500 time 0.2676 (0.2640) data time 0.0009 (0.0023) model time 0.2667 (0.2617) loss 6.7579 (5.8635) grad_norm 2.8216 (2.3993) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:29:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][330/625] eta 0:01:17 lr 0.000469 wd 0.0500 time 0.2595 (0.2637) data time 0.0008 (0.0023) model time 0.2587 (0.2615) loss 5.5517 (5.8657) grad_norm 2.4116 (2.4111) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:29:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][340/625] eta 0:01:15 lr 0.000469 wd 0.0500 time 0.2556 (0.2635) data time 0.0010 (0.0023) model time 0.2546 (0.2613) loss 5.6117 (5.8563) grad_norm 2.4222 (2.4205) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:29:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][350/625] eta 0:01:12 lr 0.000469 wd 0.0500 time 0.2534 (0.2642) data time 0.0009 (0.0022) model time 0.2525 (0.2622) loss 6.2586 (5.8526) grad_norm 1.7801 (2.4084) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:29:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][360/625] eta 0:01:10 lr 0.000469 wd 0.0500 time 0.2536 (0.2651) data time 0.0009 (0.0022) model time 0.2527 (0.2632) loss 5.9683 (5.8541) grad_norm 1.9190 (2.4225) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:29:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][370/625] eta 0:01:07 lr 0.000468 wd 0.0500 time 0.2556 (0.2654) data time 0.0006 (0.0021) model time 0.2550 (0.2636) loss 6.4828 (5.8645) grad_norm 2.4083 (2.4247) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:29:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][380/625] eta 0:01:04 lr 0.000468 wd 0.0500 time 0.2561 (0.2651) data time 0.0014 (0.0021) model time 0.2547 (0.2633) loss 6.1674 (5.8534) grad_norm 3.2725 (2.4248) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:29:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][390/625] eta 0:01:02 lr 0.000468 wd 0.0500 time 0.2646 (0.2649) data time 0.0008 (0.0021) model time 0.2639 (0.2631) loss 4.8848 (5.8449) grad_norm 2.6053 (2.4109) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:29:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][400/625] eta 0:00:59 lr 0.000468 wd 0.0500 time 0.2495 (0.2647) data time 0.0007 (0.0021) model time 0.2488 (0.2628) loss 6.1425 (5.8391) grad_norm 4.1793 (2.4198) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:29:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][410/625] eta 0:00:56 lr 0.000468 wd 0.0500 time 0.2568 (0.2645) data time 0.0008 (0.0020) model time 0.2559 (0.2626) loss 5.0764 (5.8403) grad_norm 3.6546 (2.4203) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:29:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][420/625] eta 0:00:54 lr 0.000468 wd 0.0500 time 0.2517 (0.2642) data time 0.0008 (0.0020) model time 0.2508 (0.2624) loss 6.2057 (5.8397) grad_norm 2.0045 (2.4226) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:29:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][430/625] eta 0:00:51 lr 0.000468 wd 0.0500 time 0.2717 (0.2645) data time 0.0008 (0.0020) model time 0.2709 (0.2627) loss 6.2118 (5.8399) grad_norm 1.6374 (2.4152) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:29:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][440/625] eta 0:00:48 lr 0.000467 wd 0.0500 time 0.2559 (0.2643) data time 0.0007 (0.0020) model time 0.2553 (0.2625) loss 4.9103 (5.8324) grad_norm 2.6385 (2.4046) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:29:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][450/625] eta 0:00:46 lr 0.000467 wd 0.0500 time 0.2597 (0.2641) data time 0.0006 (0.0019) model time 0.2591 (0.2623) loss 6.0737 (5.8356) grad_norm 4.7124 (2.3972) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:30:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][460/625] eta 0:00:43 lr 0.000467 wd 0.0500 time 0.2525 (0.2639) data time 0.0009 (0.0019) model time 0.2516 (0.2621) loss 6.3635 (5.8298) grad_norm 3.1653 (2.4009) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:30:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][470/625] eta 0:00:40 lr 0.000467 wd 0.0500 time 0.2588 (0.2642) data time 0.0006 (0.0019) model time 0.2582 (0.2624) loss 4.9737 (5.8257) grad_norm 3.0832 (2.4069) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:30:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][480/625] eta 0:00:38 lr 0.000467 wd 0.0500 time 0.2557 (0.2640) data time 0.0008 (0.0019) model time 0.2549 (0.2623) loss 5.4957 (5.8231) grad_norm 1.8660 (2.4091) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:30:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][490/625] eta 0:00:35 lr 0.000467 wd 0.0500 time 0.2563 (0.2638) data time 0.0012 (0.0019) model time 0.2551 (0.2621) loss 5.0644 (5.8228) grad_norm 3.2835 (2.4013) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:30:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][500/625] eta 0:00:32 lr 0.000466 wd 0.0500 time 0.2599 (0.2637) data time 0.0006 (0.0018) model time 0.2593 (0.2619) loss 5.6023 (5.8237) grad_norm 1.7043 (2.3959) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:30:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][510/625] eta 0:00:30 lr 0.000466 wd 0.0500 time 0.2513 (0.2635) data time 0.0011 (0.0018) model time 0.2502 (0.2618) loss 4.4616 (5.8199) grad_norm 1.9069 (2.3991) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:30:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][520/625] eta 0:00:27 lr 0.000466 wd 0.0500 time 0.2506 (0.2635) data time 0.0010 (0.0018) model time 0.2496 (0.2618) loss 6.2634 (5.8215) grad_norm 2.4191 (2.3997) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:30:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][530/625] eta 0:00:25 lr 0.000466 wd 0.0500 time 0.2547 (0.2636) data time 0.0011 (0.0018) model time 0.2536 (0.2618) loss 6.0736 (5.8154) grad_norm 2.9095 (2.3964) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:30:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][540/625] eta 0:00:22 lr 0.000466 wd 0.0500 time 0.2512 (0.2637) data time 0.0010 (0.0018) model time 0.2502 (0.2620) loss 6.3203 (5.8176) grad_norm 1.5652 (2.3899) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:30:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][550/625] eta 0:00:19 lr 0.000466 wd 0.0500 time 0.2531 (0.2636) data time 0.0007 (0.0018) model time 0.2524 (0.2619) loss 5.9438 (5.8152) grad_norm 1.6200 (2.3826) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:30:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][560/625] eta 0:00:17 lr 0.000466 wd 0.0500 time 0.2517 (0.2635) data time 0.0008 (0.0017) model time 0.2509 (0.2618) loss 6.1057 (5.8200) grad_norm 1.6695 (2.3819) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:30:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][570/625] eta 0:00:14 lr 0.000465 wd 0.0500 time 0.2533 (0.2633) data time 0.0007 (0.0017) model time 0.2525 (0.2617) loss 5.5644 (5.8178) grad_norm 1.8524 (2.3800) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:30:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][580/625] eta 0:00:11 lr 0.000465 wd 0.0500 time 0.2547 (0.2632) data time 0.0011 (0.0017) model time 0.2536 (0.2615) loss 5.7010 (5.8159) grad_norm 1.6022 (2.3725) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:30:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][590/625] eta 0:00:09 lr 0.000465 wd 0.0500 time 0.2574 (0.2631) data time 0.0008 (0.0017) model time 0.2567 (0.2614) loss 5.7422 (5.8163) grad_norm 1.5830 (2.3668) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:30:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][600/625] eta 0:00:06 lr 0.000465 wd 0.0500 time 0.2573 (0.2630) data time 0.0009 (0.0017) model time 0.2565 (0.2613) loss 6.4115 (5.8184) grad_norm 3.5166 (2.3651) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:30:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][610/625] eta 0:00:03 lr 0.000465 wd 0.0500 time 0.2538 (0.2629) data time 0.0006 (0.0017) model time 0.2531 (0.2612) loss 5.5161 (5.8182) grad_norm 2.3886 (2.3602) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:30:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][620/625] eta 0:00:01 lr 0.000465 wd 0.0500 time 0.2472 (0.2628) data time 0.0005 (0.0017) model time 0.2467 (0.2611) loss 5.1362 (5.8142) grad_norm 1.8146 (2.3620) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:30:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 211 training takes 0:02:44
[2024-08-04 07:30:42 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 07:30:43 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 07:30:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.546 (0.546) Loss 0.6133 (0.6133) Acc@1 89.893 (89.893) Acc@5 98.779 (98.779) Mem 9655MB
[2024-08-04 07:30:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.061 (0.104) Loss 0.9990 (0.7557) Acc@1 79.443 (85.809) Acc@5 95.410 (97.541) Mem 9655MB
[2024-08-04 07:30:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.081) Loss 1.0762 (0.8869) Acc@1 75.879 (82.508) Acc@5 94.727 (96.233) Mem 9655MB
[2024-08-04 07:30:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 82.216 Acc@5 96.205
[2024-08-04 07:30:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.2%
[2024-08-04 07:30:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.873 (0.873) Loss 0.5811 (0.5811) Acc@1 89.697 (89.697) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 07:30:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.138) Loss 0.9185 (0.7140) Acc@1 80.762 (86.288) Acc@5 95.850 (97.643) Mem 9655MB
[2024-08-04 07:30:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.098) Loss 1.0352 (0.8389) Acc@1 76.270 (82.964) Acc@5 95.020 (96.340) Mem 9655MB
[2024-08-04 07:30:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 82.654 Acc@5 96.343
[2024-08-04 07:30:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.7%
[2024-08-04 07:30:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.65%
[2024-08-04 07:30:47 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 07:30:47 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 07:30:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][0/625] eta 0:08:41 lr 0.000465 wd 0.0500 time 0.8346 (0.8346) data time 0.5582 (0.5582) model time 0.0000 (0.0000) loss 5.3008 (5.3008) grad_norm 2.0838 (2.0838) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:30:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][10/625] eta 0:03:10 lr 0.000464 wd 0.0500 time 0.2569 (0.3097) data time 0.0009 (0.0516) model time 0.0000 (0.0000) loss 5.5955 (5.6062) grad_norm 1.9102 (2.4395) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:30:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][20/625] eta 0:02:51 lr 0.000464 wd 0.0500 time 0.2550 (0.2839) data time 0.0011 (0.0275) model time 0.0000 (0.0000) loss 6.2652 (5.7233) grad_norm 2.1497 (2.6057) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:30:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][30/625] eta 0:02:43 lr 0.000464 wd 0.0500 time 0.2570 (0.2749) data time 0.0009 (0.0189) model time 0.0000 (0.0000) loss 6.6341 (5.7236) grad_norm 2.9775 (2.5752) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:30:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][40/625] eta 0:02:41 lr 0.000464 wd 0.0500 time 0.2546 (0.2753) data time 0.0007 (0.0145) model time 0.0000 (0.0000) loss 6.2404 (5.6449) grad_norm 1.6289 (2.5446) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:31:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][50/625] eta 0:02:38 lr 0.000464 wd 0.0500 time 0.2585 (0.2749) data time 0.0006 (0.0118) model time 0.0000 (0.0000) loss 6.4851 (5.6669) grad_norm 2.5613 (2.5887) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:31:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][60/625] eta 0:02:33 lr 0.000464 wd 0.0500 time 0.2646 (0.2719) data time 0.0006 (0.0100) model time 0.2640 (0.2561) loss 5.7183 (5.6871) grad_norm 1.6278 (2.5797) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:31:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][70/625] eta 0:02:31 lr 0.000464 wd 0.0500 time 0.2545 (0.2727) data time 0.0008 (0.0087) model time 0.2537 (0.2661) loss 5.3106 (5.7341) grad_norm 3.2948 (2.5226) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:31:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][80/625] eta 0:02:27 lr 0.000463 wd 0.0500 time 0.2564 (0.2705) data time 0.0007 (0.0078) model time 0.2557 (0.2623) loss 5.7034 (5.7061) grad_norm 2.8969 (2.5717) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:31:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][90/625] eta 0:02:23 lr 0.000463 wd 0.0500 time 0.2530 (0.2689) data time 0.0009 (0.0070) model time 0.2521 (0.2603) loss 6.7672 (5.7103) grad_norm 4.4959 (2.6925) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:31:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][100/625] eta 0:02:20 lr 0.000463 wd 0.0500 time 0.2592 (0.2677) data time 0.0016 (0.0064) model time 0.2576 (0.2594) loss 5.9736 (5.7440) grad_norm 3.1945 (2.7130) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:31:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][110/625] eta 0:02:17 lr 0.000463 wd 0.0500 time 0.2563 (0.2666) data time 0.0010 (0.0060) model time 0.2553 (0.2586) loss 6.3191 (5.7578) grad_norm 2.4782 (2.6706) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:31:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][120/625] eta 0:02:14 lr 0.000463 wd 0.0500 time 0.2530 (0.2656) data time 0.0010 (0.0055) model time 0.2520 (0.2579) loss 5.8489 (5.7400) grad_norm 2.7096 (2.6187) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:31:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][130/625] eta 0:02:11 lr 0.000463 wd 0.0500 time 0.2530 (0.2649) data time 0.0009 (0.0052) model time 0.2521 (0.2577) loss 5.5036 (5.7446) grad_norm 3.2799 (2.6066) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:31:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][140/625] eta 0:02:08 lr 0.000463 wd 0.0500 time 0.2612 (0.2657) data time 0.0012 (0.0049) model time 0.2601 (0.2595) loss 5.1218 (5.7403) grad_norm 5.5151 (2.6595) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:31:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][150/625] eta 0:02:05 lr 0.000462 wd 0.0500 time 0.2584 (0.2651) data time 0.0009 (0.0046) model time 0.2575 (0.2591) loss 6.4062 (5.7599) grad_norm 2.0792 (2.7285) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:31:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][160/625] eta 0:02:03 lr 0.000462 wd 0.0500 time 0.2575 (0.2645) data time 0.0008 (0.0044) model time 0.2566 (0.2588) loss 5.4656 (5.7794) grad_norm 1.9939 (2.7115) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:31:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][170/625] eta 0:02:00 lr 0.000462 wd 0.0500 time 0.2558 (0.2640) data time 0.0008 (0.0042) model time 0.2550 (0.2585) loss 6.1126 (5.7838) grad_norm 2.0690 (2.6766) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:31:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][180/625] eta 0:01:57 lr 0.000462 wd 0.0500 time 0.2634 (0.2636) data time 0.0009 (0.0040) model time 0.2625 (0.2582) loss 6.8607 (5.8051) grad_norm 1.7878 (2.6471) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:31:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][190/625] eta 0:01:54 lr 0.000462 wd 0.0500 time 0.2579 (0.2638) data time 0.0008 (0.0039) model time 0.2571 (0.2588) loss 5.3429 (5.8193) grad_norm 2.5281 (2.6582) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:31:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][200/625] eta 0:01:51 lr 0.000462 wd 0.0500 time 0.2611 (0.2634) data time 0.0006 (0.0037) model time 0.2605 (0.2585) loss 5.8260 (5.8096) grad_norm 4.6824 (2.6637) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:31:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][210/625] eta 0:01:49 lr 0.000462 wd 0.0500 time 0.2576 (0.2630) data time 0.0009 (0.0036) model time 0.2567 (0.2583) loss 5.9826 (5.8246) grad_norm 3.3471 (2.6461) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:31:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][220/625] eta 0:01:46 lr 0.000461 wd 0.0500 time 0.2550 (0.2627) data time 0.0011 (0.0035) model time 0.2539 (0.2581) loss 5.7583 (5.8436) grad_norm 1.5947 (2.6285) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:31:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][230/625] eta 0:01:43 lr 0.000461 wd 0.0500 time 0.2556 (0.2633) data time 0.0007 (0.0034) model time 0.2549 (0.2590) loss 6.2701 (5.8593) grad_norm 3.1678 (2.6828) loss_scale 1024.0000 (516.4329) mem 9655MB
[2024-08-04 07:31:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][240/625] eta 0:01:41 lr 0.000461 wd 0.0500 time 0.2629 (0.2632) data time 0.0008 (0.0033) model time 0.2621 (0.2591) loss 6.5696 (5.8502) grad_norm 2.2537 (2.6666) loss_scale 1024.0000 (537.4938) mem 9655MB
[2024-08-04 07:31:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][250/625] eta 0:01:38 lr 0.000461 wd 0.0500 time 0.2595 (0.2629) data time 0.0008 (0.0032)
model time 0.2586 (0.2589) loss 4.6921 (5.8527) grad_norm 2.9486 (2.6409) loss_scale 1024.0000 (556.8765) mem 9655MB [2024-08-04 07:31:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][260/625] eta 0:01:35 lr 0.000461 wd 0.0500 time 0.2580 (0.2627) data time 0.0008 (0.0031) model time 0.2572 (0.2587) loss 6.0950 (5.8464) grad_norm 1.7049 (2.6074) loss_scale 1024.0000 (574.7739) mem 9655MB [2024-08-04 07:31:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][270/625] eta 0:01:33 lr 0.000461 wd 0.0500 time 0.2585 (0.2625) data time 0.0008 (0.0030) model time 0.2576 (0.2587) loss 6.2267 (5.8524) grad_norm 2.0186 (2.6266) loss_scale 1024.0000 (591.3506) mem 9655MB [2024-08-04 07:32:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][280/625] eta 0:01:30 lr 0.000460 wd 0.0500 time 0.2598 (0.2623) data time 0.0006 (0.0029) model time 0.2593 (0.2585) loss 5.4706 (5.8442) grad_norm 2.0270 (2.6496) loss_scale 1024.0000 (606.7473) mem 9655MB [2024-08-04 07:32:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][290/625] eta 0:01:27 lr 0.000460 wd 0.0500 time 0.2567 (0.2621) data time 0.0008 (0.0029) model time 0.2560 (0.2584) loss 5.8822 (5.8362) grad_norm 2.1911 (2.6340) loss_scale 1024.0000 (621.0859) mem 9655MB [2024-08-04 07:32:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][300/625] eta 0:01:25 lr 0.000460 wd 0.0500 time 0.2537 (0.2629) data time 0.0009 (0.0028) model time 0.2528 (0.2595) loss 5.9004 (5.8343) grad_norm 1.9765 (2.6136) loss_scale 1024.0000 (634.4718) mem 9655MB [2024-08-04 07:32:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][310/625] eta 0:01:22 lr 0.000460 wd 0.0500 time 0.2574 (0.2627) data time 0.0006 (0.0027) model time 0.2569 (0.2593) loss 6.2817 (5.8363) grad_norm 1.5119 (2.5993) loss_scale 1024.0000 (646.9968) mem 9655MB [2024-08-04 07:32:11 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][320/625] eta 0:01:20 lr 0.000460 wd 0.0500 time 0.2611 (0.2625) data time 0.0008 (0.0027) model time 0.2604 (0.2592) loss 5.7942 (5.8335) grad_norm 2.0931 (2.6204) loss_scale 1024.0000 (658.7414) mem 9655MB [2024-08-04 07:32:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][330/625] eta 0:01:17 lr 0.000460 wd 0.0500 time 0.4673 (0.2636) data time 0.0006 (0.0026) model time 0.4667 (0.2606) loss 6.5957 (5.8231) grad_norm 1.7116 (2.6027) loss_scale 1024.0000 (669.7764) mem 9655MB [2024-08-04 07:32:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][340/625] eta 0:01:15 lr 0.000460 wd 0.0500 time 0.2549 (0.2633) data time 0.0006 (0.0026) model time 0.2542 (0.2603) loss 6.0622 (5.8311) grad_norm 3.0821 (2.5918) loss_scale 1024.0000 (680.1642) mem 9655MB [2024-08-04 07:32:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][350/625] eta 0:01:12 lr 0.000459 wd 0.0500 time 0.2573 (0.2638) data time 0.0008 (0.0025) model time 0.2565 (0.2610) loss 6.5217 (5.8342) grad_norm 2.6715 (2.5903) loss_scale 1024.0000 (689.9601) mem 9655MB [2024-08-04 07:32:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][360/625] eta 0:01:09 lr 0.000459 wd 0.0500 time 0.2596 (0.2636) data time 0.0017 (0.0025) model time 0.2579 (0.2608) loss 6.7703 (5.8494) grad_norm 2.3960 (2.5951) loss_scale 1024.0000 (699.2133) mem 9655MB [2024-08-04 07:32:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][370/625] eta 0:01:07 lr 0.000459 wd 0.0500 time 0.2583 (0.2634) data time 0.0008 (0.0025) model time 0.2575 (0.2606) loss 5.9933 (5.8448) grad_norm 2.8159 (2.5917) loss_scale 1024.0000 (707.9677) mem 9655MB [2024-08-04 07:32:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][380/625] eta 0:01:04 lr 0.000459 wd 0.0500 time 0.2553 (0.2632) data time 0.0009 
(0.0024) model time 0.2544 (0.2604) loss 5.7716 (5.8416) grad_norm 2.5214 (2.5968) loss_scale 1024.0000 (716.2625) mem 9655MB [2024-08-04 07:32:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][390/625] eta 0:01:01 lr 0.000459 wd 0.0500 time 0.2558 (0.2630) data time 0.0009 (0.0024) model time 0.2549 (0.2603) loss 5.0107 (5.8342) grad_norm 4.0125 (2.5971) loss_scale 1024.0000 (724.1330) mem 9655MB [2024-08-04 07:32:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][400/625] eta 0:00:59 lr 0.000459 wd 0.0500 time 0.2543 (0.2633) data time 0.0009 (0.0023) model time 0.2534 (0.2607) loss 6.2187 (5.8308) grad_norm 2.1068 (2.6197) loss_scale 1024.0000 (731.6110) mem 9655MB [2024-08-04 07:32:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][410/625] eta 0:00:56 lr 0.000459 wd 0.0500 time 0.4505 (0.2636) data time 0.0006 (0.0023) model time 0.4499 (0.2610) loss 6.8260 (5.8281) grad_norm 2.7808 (2.6234) loss_scale 1024.0000 (738.7251) mem 9655MB [2024-08-04 07:32:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][420/625] eta 0:00:54 lr 0.000458 wd 0.0500 time 0.2536 (0.2634) data time 0.0008 (0.0023) model time 0.2528 (0.2609) loss 6.2267 (5.8209) grad_norm 1.7337 (2.6153) loss_scale 1024.0000 (745.5012) mem 9655MB [2024-08-04 07:32:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][430/625] eta 0:00:51 lr 0.000458 wd 0.0500 time 0.2500 (0.2633) data time 0.0009 (0.0023) model time 0.2491 (0.2608) loss 6.3323 (5.8215) grad_norm 2.1867 (2.6035) loss_scale 1024.0000 (751.9629) mem 9655MB [2024-08-04 07:32:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][440/625] eta 0:00:48 lr 0.000458 wd 0.0500 time 0.2558 (0.2631) data time 0.0008 (0.0022) model time 0.2551 (0.2606) loss 6.3697 (5.8193) grad_norm 2.5033 (2.5914) loss_scale 1024.0000 (758.1315) mem 9655MB [2024-08-04 07:32:46 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][450/625] eta 0:00:46 lr 0.000458 wd 0.0500 time 0.2579 (0.2630) data time 0.0008 (0.0022) model time 0.2571 (0.2605) loss 4.9019 (5.8118) grad_norm 4.7622 (2.6052) loss_scale 1024.0000 (764.0266) mem 9655MB [2024-08-04 07:32:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][460/625] eta 0:00:43 lr 0.000458 wd 0.0500 time 0.2554 (0.2636) data time 0.0007 (0.0022) model time 0.2547 (0.2612) loss 5.2741 (5.8136) grad_norm 1.8776 (2.5998) loss_scale 1024.0000 (769.6659) mem 9655MB [2024-08-04 07:32:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][470/625] eta 0:00:40 lr 0.000458 wd 0.0500 time 0.2547 (0.2635) data time 0.0008 (0.0022) model time 0.2539 (0.2611) loss 5.0943 (5.8142) grad_norm 2.5023 (2.5935) loss_scale 1024.0000 (775.0658) mem 9655MB [2024-08-04 07:32:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][480/625] eta 0:00:38 lr 0.000458 wd 0.0500 time 0.2521 (0.2633) data time 0.0009 (0.0021) model time 0.2512 (0.2610) loss 4.6212 (5.8129) grad_norm 1.3401 (2.5871) loss_scale 1024.0000 (780.2412) mem 9655MB [2024-08-04 07:32:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][490/625] eta 0:00:35 lr 0.000457 wd 0.0500 time 0.2612 (0.2632) data time 0.0008 (0.0021) model time 0.2603 (0.2609) loss 5.3172 (5.8169) grad_norm 1.5048 (2.5781) loss_scale 1024.0000 (785.2057) mem 9655MB [2024-08-04 07:32:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][500/625] eta 0:00:32 lr 0.000457 wd 0.0500 time 0.2582 (0.2631) data time 0.0006 (0.0021) model time 0.2576 (0.2607) loss 5.1504 (5.8163) grad_norm 2.0695 (2.5673) loss_scale 1024.0000 (789.9721) mem 9655MB [2024-08-04 07:33:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][510/625] eta 0:00:30 lr 0.000457 wd 0.0500 time 0.2528 (0.2632) data time 0.0009 
(0.0021) model time 0.2518 (0.2609) loss 5.4853 (5.8213) grad_norm 2.0419 (2.5579) loss_scale 1024.0000 (794.5519) mem 9655MB [2024-08-04 07:33:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][520/625] eta 0:00:27 lr 0.000457 wd 0.0500 time 0.2554 (0.2630) data time 0.0014 (0.0021) model time 0.2539 (0.2607) loss 6.3330 (5.8106) grad_norm 1.5601 (2.5462) loss_scale 1024.0000 (798.9559) mem 9655MB [2024-08-04 07:33:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][530/625] eta 0:00:24 lr 0.000457 wd 0.0500 time 0.2595 (0.2629) data time 0.0009 (0.0020) model time 0.2586 (0.2606) loss 5.4436 (5.8139) grad_norm 1.3430 (2.5364) loss_scale 1024.0000 (803.1940) mem 9655MB [2024-08-04 07:33:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][540/625] eta 0:00:22 lr 0.000457 wd 0.0500 time 0.2585 (0.2635) data time 0.0007 (0.0020) model time 0.2579 (0.2613) loss 5.9109 (5.8110) grad_norm 1.6815 (2.5262) loss_scale 1024.0000 (807.2754) mem 9655MB [2024-08-04 07:33:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][550/625] eta 0:00:19 lr 0.000456 wd 0.0500 time 0.2581 (0.2635) data time 0.0008 (0.0020) model time 0.2573 (0.2613) loss 6.4203 (5.8153) grad_norm 3.2598 (2.5184) loss_scale 1024.0000 (811.2087) mem 9655MB [2024-08-04 07:33:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][560/625] eta 0:00:17 lr 0.000456 wd 0.0500 time 0.2533 (0.2633) data time 0.0008 (0.0020) model time 0.2525 (0.2612) loss 5.3207 (5.8165) grad_norm 1.9301 (2.5019) loss_scale 1024.0000 (815.0018) mem 9655MB [2024-08-04 07:33:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][570/625] eta 0:00:14 lr 0.000456 wd 0.0500 time 0.2665 (0.2632) data time 0.0006 (0.0020) model time 0.2659 (0.2611) loss 6.2539 (5.8135) grad_norm 1.5961 (2.4905) loss_scale 1024.0000 (818.6620) mem 9655MB [2024-08-04 07:33:20 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][580/625] eta 0:00:11 lr 0.000456 wd 0.0500 time 0.2541 (0.2633) data time 0.0006 (0.0019) model time 0.2535 (0.2612) loss 6.1169 (5.8214) grad_norm 1.2269 (2.4873) loss_scale 1024.0000 (822.1962) mem 9655MB [2024-08-04 07:33:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][590/625] eta 0:00:09 lr 0.000456 wd 0.0500 time 0.2568 (0.2632) data time 0.0008 (0.0019) model time 0.2559 (0.2611) loss 6.3239 (5.8130) grad_norm 1.4372 (2.4791) loss_scale 1024.0000 (825.6108) mem 9655MB [2024-08-04 07:33:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][600/625] eta 0:00:06 lr 0.000456 wd 0.0500 time 0.2534 (0.2630) data time 0.0007 (0.0019) model time 0.2527 (0.2610) loss 6.6111 (5.8146) grad_norm 1.9724 (2.4788) loss_scale 1024.0000 (828.9118) mem 9655MB [2024-08-04 07:33:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][610/625] eta 0:00:03 lr 0.000456 wd 0.0500 time 0.2537 (0.2629) data time 0.0006 (0.0019) model time 0.2531 (0.2609) loss 5.6961 (5.8076) grad_norm 2.2972 (2.4776) loss_scale 1024.0000 (832.1047) mem 9655MB [2024-08-04 07:33:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][620/625] eta 0:00:01 lr 0.000455 wd 0.0500 time 0.2530 (0.2628) data time 0.0004 (0.0019) model time 0.2526 (0.2607) loss 7.0906 (5.8116) grad_norm 3.1625 (2.4750) loss_scale 1024.0000 (835.1948) mem 9655MB [2024-08-04 07:33:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 212 training takes 0:02:44 [2024-08-04 07:33:31 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 07:33:32 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 07:33:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.478 (0.478) Loss 0.6167 (0.6167) Acc@1 89.307 (89.307) Acc@5 98.389 (98.389) Mem 9655MB [2024-08-04 07:33:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 0.9521 (0.7458) Acc@1 80.762 (85.915) Acc@5 95.752 (97.488) Mem 9655MB [2024-08-04 07:33:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0410 (0.8703) Acc@1 77.246 (82.626) Acc@5 94.727 (96.182) Mem 9655MB [2024-08-04 07:33:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.298 Acc@5 96.153 [2024-08-04 07:33:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.3% [2024-08-04 07:33:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.856 (0.856) Loss 0.5811 (0.5811) Acc@1 89.795 (89.795) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 07:33:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.137) Loss 0.9185 (0.7139) Acc@1 80.713 (86.310) Acc@5 95.850 (97.643) Mem 9655MB [2024-08-04 07:33:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.103) Loss 1.0352 (0.8388) Acc@1 76.367 (82.968) Acc@5 95.117 (96.343) Mem 9655MB [2024-08-04 07:33:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.672 Acc@5 96.347 [2024-08-04 07:33:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.7% [2024-08-04 07:33:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.67% [2024-08-04 07:33:36 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 07:33:37 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 07:33:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][0/625] eta 0:08:21 lr 0.000455 wd 0.0500 time 0.8027 (0.8027) data time 0.5543 (0.5543) model time 0.0000 (0.0000) loss 6.2200 (6.2200) grad_norm 1.6291 (1.6291) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:33:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][10/625] eta 0:03:08 lr 0.000455 wd 0.0500 time 0.2628 (0.3067) data time 0.0007 (0.0512) model time 0.0000 (0.0000) loss 6.5924 (6.3128) grad_norm 2.4935 (2.0702) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:33:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][20/625] eta 0:02:51 lr 0.000455 wd 0.0500 time 0.2576 (0.2827) data time 0.0011 (0.0273) model time 0.0000 (0.0000) loss 6.0123 (5.8147) grad_norm 2.7239 (2.3614) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:33:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][30/625] eta 0:02:49 lr 0.000455 wd 0.0500 time 0.3818 (0.2848) data time 0.0009 (0.0188) model time 0.0000 (0.0000) loss 6.0653 (5.8345) grad_norm 2.4510 (2.3843) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:33:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][40/625] eta 0:02:48 lr 0.000455 wd 0.0500 time 0.2662 (0.2873) data time 0.0010 (0.0144) model time 0.0000 (0.0000) loss 6.8016 (5.9440) grad_norm 2.2493 (2.3543) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:33:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][50/625] eta 0:02:43 lr 0.000455 wd 0.0500 time 0.2593 (0.2849) data time 0.0006 (0.0118) model time 0.0000 (0.0000) loss 4.9438 (5.9063) grad_norm 1.5036 (2.2474) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 07:33:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][60/625] eta 0:02:38 lr 0.000454 wd 0.0500 time 0.2559 (0.2802) data time 0.0009 (0.0100) model time 0.2550 (0.2552) loss 5.7516 (5.9290) grad_norm 2.3374 (2.2308) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:33:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][70/625] eta 0:02:33 lr 0.000454 wd 0.0500 time 0.2541 (0.2768) data time 0.0011 (0.0088) model time 0.2531 (0.2551) loss 5.4906 (5.9540) grad_norm 1.4065 (2.2066) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:33:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][80/625] eta 0:02:29 lr 0.000454 wd 0.0500 time 0.2543 (0.2743) data time 0.0008 (0.0078) model time 0.2535 (0.2552) loss 6.2902 (5.9741) grad_norm 1.4340 (2.2447) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:34:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][90/625] eta 0:02:25 lr 0.000454 wd 0.0500 time 0.2551 (0.2724) data time 0.0010 (0.0070) model time 0.2541 (0.2555) loss 6.3782 (5.9852) grad_norm 1.9463 (2.2600) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:34:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][100/625] eta 0:02:22 lr 0.000454 wd 0.0500 time 0.2588 (0.2713) data time 0.0008 (0.0064) model time 0.2580 (0.2564) loss 6.4426 (5.9925) grad_norm 1.8847 (2.2534) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:34:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][110/625] eta 0:02:19 lr 0.000454 wd 0.0500 time 0.2579 (0.2702) data time 0.0009 (0.0060) model time 0.2570 (0.2567) loss 6.5418 (5.9887) grad_norm 1.6151 (2.2799) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:34:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][120/625] eta 0:02:16 lr 0.000454 wd 0.0500 time 0.2608 
(0.2708) data time 0.0012 (0.0055) model time 0.2597 (0.2596) loss 6.8729 (5.9917) grad_norm 3.0107 (2.3355) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:34:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][130/625] eta 0:02:14 lr 0.000453 wd 0.0500 time 0.2573 (0.2712) data time 0.0007 (0.0052) model time 0.2565 (0.2615) loss 6.3941 (5.9892) grad_norm 2.8914 (2.3594) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:34:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][140/625] eta 0:02:11 lr 0.000453 wd 0.0500 time 0.2600 (0.2703) data time 0.0007 (0.0049) model time 0.2593 (0.2610) loss 4.5700 (5.9679) grad_norm 3.6590 (2.4147) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:34:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][150/625] eta 0:02:07 lr 0.000453 wd 0.0500 time 0.2597 (0.2693) data time 0.0006 (0.0046) model time 0.2591 (0.2604) loss 4.7238 (5.9323) grad_norm 1.6121 (2.4242) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:34:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][160/625] eta 0:02:04 lr 0.000453 wd 0.0500 time 0.2547 (0.2686) data time 0.0007 (0.0044) model time 0.2540 (0.2600) loss 5.0510 (5.9330) grad_norm 1.5583 (2.4020) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:34:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][170/625] eta 0:02:01 lr 0.000453 wd 0.0500 time 0.2546 (0.2679) data time 0.0008 (0.0042) model time 0.2538 (0.2597) loss 5.3242 (5.9244) grad_norm 2.1110 (2.3965) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:34:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][180/625] eta 0:01:59 lr 0.000453 wd 0.0500 time 0.2587 (0.2694) data time 0.0007 (0.0040) model time 0.2580 (0.2624) loss 5.4179 (5.9253) grad_norm 2.7524 (2.3825) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 07:34:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][190/625] eta 0:01:56 lr 0.000453 wd 0.0500 time 0.2581 (0.2689) data time 0.0008 (0.0038) model time 0.2573 (0.2621) loss 6.9558 (5.9187) grad_norm 1.1534 (2.3773) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:34:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][200/625] eta 0:01:53 lr 0.000452 wd 0.0500 time 0.2575 (0.2682) data time 0.0007 (0.0037) model time 0.2567 (0.2616) loss 6.5482 (5.9178) grad_norm 1.6723 (2.3721) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:34:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][210/625] eta 0:01:51 lr 0.000452 wd 0.0500 time 0.2508 (0.2683) data time 0.0010 (0.0036) model time 0.2498 (0.2621) loss 5.5361 (5.9173) grad_norm 3.0467 (2.3723) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:34:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][220/625] eta 0:01:48 lr 0.000452 wd 0.0500 time 0.2575 (0.2678) data time 0.0006 (0.0034) model time 0.2569 (0.2617) loss 7.1908 (5.9233) grad_norm 5.6847 (2.4044) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:34:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][230/625] eta 0:01:45 lr 0.000452 wd 0.0500 time 0.2580 (0.2673) data time 0.0008 (0.0033) model time 0.2572 (0.2614) loss 6.4095 (5.9232) grad_norm 2.3731 (2.4133) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:34:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][240/625] eta 0:01:42 lr 0.000452 wd 0.0500 time 0.2540 (0.2668) data time 0.0007 (0.0032) model time 0.2533 (0.2610) loss 5.2273 (5.9223) grad_norm 3.0898 (2.4100) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:34:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][250/625] eta 0:01:39 lr 0.000452 wd 0.0500 time 0.2559 
(0.2664) data time 0.0009 (0.0031) model time 0.2550 (0.2607) loss 6.0973 (5.9213) grad_norm 3.6554 (2.3916) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:34:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][260/625] eta 0:01:37 lr 0.000452 wd 0.0500 time 0.2554 (0.2660) data time 0.0011 (0.0031) model time 0.2542 (0.2605) loss 6.2653 (5.9067) grad_norm 2.0449 (2.4334) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:34:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][270/625] eta 0:01:34 lr 0.000451 wd 0.0500 time 0.2549 (0.2657) data time 0.0009 (0.0030) model time 0.2540 (0.2603) loss 6.4915 (5.9069) grad_norm 2.5477 (2.4460) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:34:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][280/625] eta 0:01:31 lr 0.000451 wd 0.0500 time 0.2578 (0.2654) data time 0.0009 (0.0029) model time 0.2569 (0.2601) loss 5.8388 (5.8888) grad_norm 3.5823 (2.4687) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:34:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][290/625] eta 0:01:28 lr 0.000451 wd 0.0500 time 0.2563 (0.2651) data time 0.0009 (0.0028) model time 0.2554 (0.2599) loss 5.2572 (5.8647) grad_norm 2.1234 (2.4708) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:34:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][300/625] eta 0:01:26 lr 0.000451 wd 0.0500 time 0.2562 (0.2647) data time 0.0010 (0.0028) model time 0.2552 (0.2597) loss 6.0220 (5.8688) grad_norm 2.3251 (2.4743) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:34:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][310/625] eta 0:01:23 lr 0.000451 wd 0.0500 time 0.2543 (0.2645) data time 0.0008 (0.0027) model time 0.2535 (0.2595) loss 5.2490 (5.8661) grad_norm 2.7825 (2.4724) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 07:35:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][320/625] eta 0:01:20 lr 0.000451 wd 0.0500 time 0.2566 (0.2642) data time 0.0011 (0.0027) model time 0.2555 (0.2594) loss 6.2563 (5.8587) grad_norm 1.6802 (2.4635) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:35:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][330/625] eta 0:01:17 lr 0.000451 wd 0.0500 time 0.2562 (0.2640) data time 0.0007 (0.0026) model time 0.2556 (0.2593) loss 7.1322 (5.8639) grad_norm 1.6364 (2.4605) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:35:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][340/625] eta 0:01:15 lr 0.000450 wd 0.0500 time 0.2573 (0.2638) data time 0.0007 (0.0026) model time 0.2566 (0.2592) loss 6.0247 (5.8596) grad_norm 2.5022 (2.4717) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:35:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][350/625] eta 0:01:12 lr 0.000450 wd 0.0500 time 0.2621 (0.2636) data time 0.0008 (0.0025) model time 0.2613 (0.2590) loss 6.1272 (5.8644) grad_norm 2.4773 (2.4689) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:35:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][360/625] eta 0:01:09 lr 0.000450 wd 0.0500 time 0.2546 (0.2634) data time 0.0010 (0.0025) model time 0.2535 (0.2589) loss 5.7266 (5.8624) grad_norm 3.0543 (2.4808) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:35:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][370/625] eta 0:01:07 lr 0.000450 wd 0.0500 time 0.2537 (0.2632) data time 0.0008 (0.0024) model time 0.2529 (0.2588) loss 6.3084 (5.8588) grad_norm 1.4747 (2.4806) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:35:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][380/625] eta 0:01:04 lr 0.000450 wd 0.0500 time 0.3871 
(0.2634) data time 0.0008 (0.0024) model time 0.3863 (0.2591) loss 4.9992 (5.8651) grad_norm 2.1427 (2.4674) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:35:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][390/625] eta 0:01:01 lr 0.000450 wd 0.0500 time 0.2601 (0.2632) data time 0.0008 (0.0024) model time 0.2593 (0.2590) loss 6.4699 (5.8655) grad_norm 2.4864 (2.4593) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:35:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][400/625] eta 0:00:59 lr 0.000450 wd 0.0500 time 0.2539 (0.2634) data time 0.0009 (0.0023) model time 0.2530 (0.2594) loss 5.6330 (5.8664) grad_norm 2.2968 (2.4723) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:35:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][410/625] eta 0:00:56 lr 0.000449 wd 0.0500 time 0.2569 (0.2632) data time 0.0010 (0.0023) model time 0.2559 (0.2592) loss 5.2028 (5.8553) grad_norm 1.5162 (2.4716) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:35:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][420/625] eta 0:00:53 lr 0.000449 wd 0.0500 time 0.2567 (0.2631) data time 0.0009 (0.0023) model time 0.2558 (0.2591) loss 5.9932 (5.8485) grad_norm 2.6665 (2.4721) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:35:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][430/625] eta 0:00:51 lr 0.000449 wd 0.0500 time 0.2538 (0.2629) data time 0.0012 (0.0022) model time 0.2526 (0.2590) loss 5.6352 (5.8508) grad_norm 4.3190 (2.4771) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:35:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][440/625] eta 0:00:48 lr 0.000449 wd 0.0500 time 0.2573 (0.2632) data time 0.0007 (0.0022) model time 0.2566 (0.2594) loss 6.7712 (5.8444) grad_norm 1.7659 (2.5041) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 07:35:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][450/625] eta 0:00:46 lr 0.000449 wd 0.0500 time 0.2588 (0.2631) data time 0.0007 (0.0022) model time 0.2580 (0.2593) loss 5.8456 (5.8373) grad_norm 1.6979 (2.5221) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:35:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][460/625] eta 0:00:43 lr 0.000449 wd 0.0500 time 0.2554 (0.2637) data time 0.0011 (0.0022) model time 0.2543 (0.2601) loss 4.9517 (5.8309) grad_norm 4.6806 (2.5282) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:35:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][470/625] eta 0:00:40 lr 0.000448 wd 0.0500 time 0.2568 (0.2635) data time 0.0011 (0.0021) model time 0.2557 (0.2599) loss 5.9456 (5.8345) grad_norm 2.1176 (2.5275) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:35:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][480/625] eta 0:00:38 lr 0.000448 wd 0.0500 time 0.2520 (0.2634) data time 0.0010 (0.0021) model time 0.2510 (0.2599) loss 6.9544 (5.8309) grad_norm 1.9773 (2.5175) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:35:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][490/625] eta 0:00:35 lr 0.000448 wd 0.0500 time 0.2591 (0.2633) data time 0.0008 (0.0021) model time 0.2583 (0.2598) loss 6.6711 (5.8390) grad_norm 2.0120 (2.5021) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:35:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][500/625] eta 0:00:32 lr 0.000448 wd 0.0500 time 0.2568 (0.2634) data time 0.0009 (0.0021) model time 0.2559 (0.2600) loss 5.2026 (5.8417) grad_norm 5.7917 (2.4978) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:35:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][510/625] eta 0:00:30 lr 0.000448 wd 0.0500 time 0.2551 
(0.2641) data time 0.0010 (0.0020) model time 0.2542 (0.2608) loss 6.7484 (5.8452) grad_norm 1.9565 (2.5090) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:35:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][520/625] eta 0:00:27 lr 0.000448 wd 0.0500 time 0.2580 (0.2639) data time 0.0010 (0.0020) model time 0.2570 (0.2607) loss 6.1418 (5.8478) grad_norm 2.2129 (2.5191) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:35:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][530/625] eta 0:00:25 lr 0.000448 wd 0.0500 time 0.2540 (0.2638) data time 0.0011 (0.0020) model time 0.2528 (0.2606) loss 6.6981 (5.8499) grad_norm 1.2088 (2.5115) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:35:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][540/625] eta 0:00:22 lr 0.000447 wd 0.0500 time 0.2597 (0.2637) data time 0.0007 (0.0020) model time 0.2590 (0.2605) loss 6.2716 (5.8563) grad_norm 2.0809 (2.5023) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:36:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][550/625] eta 0:00:19 lr 0.000447 wd 0.0500 time 0.2566 (0.2635) data time 0.0009 (0.0020) model time 0.2557 (0.2604) loss 5.0870 (5.8563) grad_norm 1.5963 (2.5035) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:36:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][560/625] eta 0:00:17 lr 0.000447 wd 0.0500 time 0.2607 (0.2634) data time 0.0006 (0.0019) model time 0.2600 (0.2603) loss 5.3652 (5.8610) grad_norm 1.7905 (2.5113) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:36:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][570/625] eta 0:00:14 lr 0.000447 wd 0.0500 time 0.2569 (0.2633) data time 0.0006 (0.0019) model time 0.2562 (0.2602) loss 6.4975 (5.8582) grad_norm 2.6827 (2.5033) loss_scale 1024.0000 (1024.0000) mem 9655MB 
[2024-08-04 07:36:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][580/625] eta 0:00:11 lr 0.000447 wd 0.0500 time 0.2559 (0.2632) data time 0.0008 (0.0019) model time 0.2550 (0.2601) loss 5.5940 (5.8608) grad_norm 1.2701 (2.4929) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:36:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][590/625] eta 0:00:09 lr 0.000447 wd 0.0500 time 0.2513 (0.2631) data time 0.0010 (0.0019) model time 0.2504 (0.2601) loss 5.6055 (5.8579) grad_norm 2.3865 (2.4875) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:36:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][600/625] eta 0:00:06 lr 0.000447 wd 0.0500 time 0.2512 (0.2630) data time 0.0011 (0.0019) model time 0.2500 (0.2600) loss 6.1123 (5.8566) grad_norm 2.2956 (2.4778) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 07:36:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][610/625] eta 0:00:03 lr 0.000446 wd 0.0500 time 0.2526 (0.2629) data time 0.0004 (0.0019) model time 0.2522 (0.2599) loss 5.3313 (5.8601) grad_norm 1.7750 (inf) loss_scale 512.0000 (1015.6203) mem 9655MB [2024-08-04 07:36:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][620/625] eta 0:00:01 lr 0.000446 wd 0.0500 time 0.2534 (0.2627) data time 0.0005 (0.0018) model time 0.2529 (0.2598) loss 6.0005 (5.8613) grad_norm 2.4037 (inf) loss_scale 512.0000 (1007.5105) mem 9655MB [2024-08-04 07:36:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 213 training takes 0:02:44 [2024-08-04 07:36:21 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 07:36:21 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 07:36:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.637 (0.637) Loss 0.6011 (0.6011) Acc@1 89.502 (89.502) Acc@5 98.535 (98.535) Mem 9655MB [2024-08-04 07:36:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.113) Loss 0.9692 (0.7414) Acc@1 79.980 (85.875) Acc@5 95.557 (97.523) Mem 9655MB [2024-08-04 07:36:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.086) Loss 1.0566 (0.8704) Acc@1 75.977 (82.457) Acc@5 94.971 (96.231) Mem 9655MB [2024-08-04 07:36:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.172 Acc@5 96.217 [2024-08-04 07:36:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.2% [2024-08-04 07:36:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.978 (0.978) Loss 0.5815 (0.5815) Acc@1 89.795 (89.795) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 07:36:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.144) Loss 0.9189 (0.7140) Acc@1 80.713 (86.310) Acc@5 95.752 (97.634) Mem 9655MB [2024-08-04 07:36:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.102) Loss 1.0342 (0.8389) Acc@1 76.611 (82.996) Acc@5 94.971 (96.326) Mem 9655MB [2024-08-04 07:36:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.702 Acc@5 96.339 [2024-08-04 07:36:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.7% [2024-08-04 07:36:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.70% [2024-08-04 07:36:26 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 07:36:27 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 07:36:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][0/625] eta 0:07:00 lr 0.000446 wd 0.0500 time 0.6725 (0.6725) data time 0.4317 (0.4317) model time 0.0000 (0.0000) loss 5.0583 (5.0583) grad_norm 2.5665 (2.5665) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:36:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][10/625] eta 0:03:11 lr 0.000446 wd 0.0500 time 0.2494 (0.3113) data time 0.0006 (0.0401) model time 0.0000 (0.0000) loss 5.8334 (5.7060) grad_norm 1.8810 (inf) loss_scale 256.0000 (442.1818) mem 9655MB [2024-08-04 07:36:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][20/625] eta 0:02:56 lr 0.000446 wd 0.0500 time 0.2595 (0.2916) data time 0.0008 (0.0215) model time 0.0000 (0.0000) loss 5.6572 (5.6433) grad_norm 4.2282 (inf) loss_scale 256.0000 (353.5238) mem 9655MB [2024-08-04 07:36:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][30/625] eta 0:02:49 lr 0.000446 wd 0.0500 time 0.2559 (0.2847) data time 0.0011 (0.0148) model time 0.0000 (0.0000) loss 4.9323 (5.5855) grad_norm 1.7361 (inf) loss_scale 256.0000 (322.0645) mem 9655MB [2024-08-04 07:36:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][40/625] eta 0:02:42 lr 0.000446 wd 0.0500 time 0.2567 (0.2778) data time 0.0009 (0.0114) model time 0.0000 (0.0000) loss 5.3317 (5.6382) grad_norm 2.6703 (inf) loss_scale 256.0000 (305.9512) mem 9655MB [2024-08-04 07:36:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][50/625] eta 0:02:38 lr 0.000445 wd 0.0500 time 0.2551 (0.2759) data time 0.0011 (0.0094) model time 0.0000 (0.0000) loss 6.3662 (5.6550) grad_norm 1.5315 (inf) loss_scale 256.0000 (296.1569) mem 9655MB [2024-08-04 07:36:43 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][60/625] eta 0:02:33 lr 0.000445 wd 0.0500 time 0.2555 (0.2726) data time 0.0009 (0.0080) model time 0.2546 (0.2547) loss 5.5132 (5.6918) grad_norm 2.0738 (inf) loss_scale 256.0000 (289.5738) mem 9655MB [2024-08-04 07:36:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][70/625] eta 0:02:29 lr 0.000445 wd 0.0500 time 0.2548 (0.2702) data time 0.0008 (0.0070) model time 0.2539 (0.2549) loss 6.5417 (5.7198) grad_norm 1.8065 (inf) loss_scale 256.0000 (284.8451) mem 9655MB [2024-08-04 07:36:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][80/625] eta 0:02:26 lr 0.000445 wd 0.0500 time 0.2567 (0.2684) data time 0.0008 (0.0062) model time 0.2559 (0.2547) loss 5.5665 (5.7277) grad_norm 4.2714 (inf) loss_scale 256.0000 (281.2840) mem 9655MB [2024-08-04 07:36:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][90/625] eta 0:02:23 lr 0.000445 wd 0.0500 time 0.2543 (0.2690) data time 0.0011 (0.0057) model time 0.2532 (0.2592) loss 6.0919 (5.7496) grad_norm 2.9202 (inf) loss_scale 256.0000 (278.5055) mem 9655MB [2024-08-04 07:36:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][100/625] eta 0:02:21 lr 0.000445 wd 0.0500 time 0.2573 (0.2698) data time 0.0010 (0.0052) model time 0.2564 (0.2626) loss 4.9137 (5.7545) grad_norm 1.6228 (inf) loss_scale 256.0000 (276.2772) mem 9655MB [2024-08-04 07:36:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][110/625] eta 0:02:19 lr 0.000445 wd 0.0500 time 0.2570 (0.2704) data time 0.0008 (0.0048) model time 0.2561 (0.2648) loss 6.3629 (5.7571) grad_norm 2.7715 (inf) loss_scale 256.0000 (274.4505) mem 9655MB [2024-08-04 07:36:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][120/625] eta 0:02:16 lr 0.000444 wd 0.0500 time 0.2566 (0.2709) data time 0.0009 (0.0045) model time 0.2558 
(0.2663) loss 6.4011 (5.7683) grad_norm 2.8463 (inf) loss_scale 256.0000 (272.9256) mem 9655MB [2024-08-04 07:37:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][130/625] eta 0:02:13 lr 0.000444 wd 0.0500 time 0.2561 (0.2697) data time 0.0008 (0.0042) model time 0.2553 (0.2649) loss 5.7847 (5.7747) grad_norm 2.1936 (inf) loss_scale 256.0000 (271.6336) mem 9655MB [2024-08-04 07:37:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][140/625] eta 0:02:11 lr 0.000444 wd 0.0500 time 0.4848 (0.2719) data time 0.0009 (0.0040) model time 0.4839 (0.2686) loss 6.1418 (5.7708) grad_norm 2.1045 (inf) loss_scale 256.0000 (270.5248) mem 9655MB [2024-08-04 07:37:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][150/625] eta 0:02:08 lr 0.000444 wd 0.0500 time 0.2560 (0.2708) data time 0.0007 (0.0038) model time 0.2553 (0.2673) loss 5.1687 (5.7728) grad_norm 1.9991 (inf) loss_scale 256.0000 (269.5629) mem 9655MB [2024-08-04 07:37:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][160/625] eta 0:02:05 lr 0.000444 wd 0.0500 time 0.2580 (0.2698) data time 0.0006 (0.0036) model time 0.2574 (0.2661) loss 4.7468 (5.7700) grad_norm 4.3107 (inf) loss_scale 256.0000 (268.7205) mem 9655MB [2024-08-04 07:37:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][170/625] eta 0:02:02 lr 0.000444 wd 0.0500 time 0.2542 (0.2690) data time 0.0008 (0.0035) model time 0.2534 (0.2652) loss 4.6925 (5.7552) grad_norm 2.2851 (inf) loss_scale 256.0000 (267.9766) mem 9655MB [2024-08-04 07:37:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][180/625] eta 0:01:59 lr 0.000444 wd 0.0500 time 0.2604 (0.2683) data time 0.0008 (0.0033) model time 0.2596 (0.2644) loss 6.7533 (5.7563) grad_norm 2.2679 (inf) loss_scale 256.0000 (267.3149) mem 9655MB [2024-08-04 07:37:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[214/300][190/625] eta 0:01:56 lr 0.000443 wd 0.0500 time 0.3871 (0.2683) data time 0.0008 (0.0032) model time 0.3863 (0.2647) loss 6.1451 (5.7609) grad_norm 1.6721 (inf) loss_scale 256.0000 (266.7225) mem 9655MB [2024-08-04 07:37:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][200/625] eta 0:01:53 lr 0.000443 wd 0.0500 time 0.2552 (0.2677) data time 0.0008 (0.0031) model time 0.2544 (0.2640) loss 6.2358 (5.7694) grad_norm 1.8688 (inf) loss_scale 256.0000 (266.1891) mem 9655MB [2024-08-04 07:37:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][210/625] eta 0:01:51 lr 0.000443 wd 0.0500 time 0.2609 (0.2678) data time 0.0007 (0.0030) model time 0.2602 (0.2643) loss 4.8705 (5.7624) grad_norm 2.4140 (inf) loss_scale 256.0000 (265.7062) mem 9655MB [2024-08-04 07:37:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][220/625] eta 0:01:48 lr 0.000443 wd 0.0500 time 0.2684 (0.2673) data time 0.0007 (0.0029) model time 0.2677 (0.2637) loss 6.2084 (5.7725) grad_norm 1.4148 (inf) loss_scale 256.0000 (265.2670) mem 9655MB [2024-08-04 07:37:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][230/625] eta 0:01:45 lr 0.000443 wd 0.0500 time 0.2551 (0.2668) data time 0.0009 (0.0028) model time 0.2542 (0.2633) loss 6.1756 (5.7741) grad_norm 1.3611 (inf) loss_scale 256.0000 (264.8658) mem 9655MB [2024-08-04 07:37:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][240/625] eta 0:01:42 lr 0.000443 wd 0.0500 time 0.2585 (0.2673) data time 0.0009 (0.0027) model time 0.2576 (0.2640) loss 5.9415 (5.7667) grad_norm 2.0647 (inf) loss_scale 256.0000 (264.4979) mem 9655MB [2024-08-04 07:37:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][250/625] eta 0:01:40 lr 0.000443 wd 0.0500 time 0.2511 (0.2673) data time 0.0009 (0.0027) model time 0.2502 (0.2642) loss 5.8751 (5.7495) grad_norm 1.7132 (inf) loss_scale 
256.0000 (264.1594) mem 9655MB [2024-08-04 07:37:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][260/625] eta 0:01:37 lr 0.000442 wd 0.0500 time 0.2601 (0.2669) data time 0.0008 (0.0026) model time 0.2594 (0.2638) loss 4.5895 (5.7455) grad_norm 2.7847 (inf) loss_scale 256.0000 (263.8467) mem 9655MB [2024-08-04 07:37:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][270/625] eta 0:01:34 lr 0.000442 wd 0.0500 time 0.2577 (0.2665) data time 0.0009 (0.0025) model time 0.2568 (0.2634) loss 5.1974 (5.7336) grad_norm 2.1357 (inf) loss_scale 256.0000 (263.5572) mem 9655MB [2024-08-04 07:37:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][280/625] eta 0:01:31 lr 0.000442 wd 0.0500 time 0.2579 (0.2662) data time 0.0007 (0.0025) model time 0.2573 (0.2630) loss 5.2880 (5.7352) grad_norm 1.7987 (inf) loss_scale 256.0000 (263.2883) mem 9655MB [2024-08-04 07:37:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][290/625] eta 0:01:29 lr 0.000442 wd 0.0500 time 0.2567 (0.2659) data time 0.0008 (0.0024) model time 0.2560 (0.2628) loss 4.7730 (5.7284) grad_norm 1.5737 (inf) loss_scale 256.0000 (263.0378) mem 9655MB [2024-08-04 07:37:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][300/625] eta 0:01:26 lr 0.000442 wd 0.0500 time 0.2545 (0.2656) data time 0.0010 (0.0024) model time 0.2535 (0.2625) loss 5.4376 (5.7334) grad_norm 3.5807 (inf) loss_scale 256.0000 (262.8040) mem 9655MB [2024-08-04 07:37:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][310/625] eta 0:01:23 lr 0.000442 wd 0.0500 time 0.2584 (0.2653) data time 0.0010 (0.0023) model time 0.2574 (0.2623) loss 6.3470 (5.7349) grad_norm 3.2628 (inf) loss_scale 256.0000 (262.5852) mem 9655MB [2024-08-04 07:37:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][320/625] eta 0:01:20 lr 0.000442 wd 0.0500 time 0.2569 
(0.2650) data time 0.0008 (0.0023) model time 0.2561 (0.2620) loss 5.4510 (5.7327) grad_norm 4.4792 (inf) loss_scale 256.0000 (262.3801) mem 9655MB [2024-08-04 07:37:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][330/625] eta 0:01:18 lr 0.000441 wd 0.0500 time 0.2543 (0.2647) data time 0.0012 (0.0023) model time 0.2531 (0.2617) loss 5.7748 (5.7499) grad_norm 3.7630 (inf) loss_scale 256.0000 (262.1873) mem 9655MB [2024-08-04 07:37:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][340/625] eta 0:01:15 lr 0.000441 wd 0.0500 time 0.2546 (0.2645) data time 0.0008 (0.0022) model time 0.2538 (0.2615) loss 5.6657 (5.7573) grad_norm 2.5142 (inf) loss_scale 256.0000 (262.0059) mem 9655MB [2024-08-04 07:37:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][350/625] eta 0:01:12 lr 0.000441 wd 0.0500 time 0.2589 (0.2647) data time 0.0007 (0.0022) model time 0.2582 (0.2619) loss 4.7088 (5.7484) grad_norm 2.2008 (inf) loss_scale 256.0000 (261.8348) mem 9655MB [2024-08-04 07:38:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][360/625] eta 0:01:10 lr 0.000441 wd 0.0500 time 0.2565 (0.2651) data time 0.0007 (0.0022) model time 0.2559 (0.2624) loss 6.5942 (5.7471) grad_norm 1.7730 (inf) loss_scale 256.0000 (261.6731) mem 9655MB [2024-08-04 07:38:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][370/625] eta 0:01:07 lr 0.000441 wd 0.0500 time 0.2531 (0.2649) data time 0.0009 (0.0021) model time 0.2522 (0.2621) loss 5.0944 (5.7473) grad_norm 2.1898 (inf) loss_scale 256.0000 (261.5202) mem 9655MB [2024-08-04 07:38:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][380/625] eta 0:01:04 lr 0.000441 wd 0.0500 time 0.2589 (0.2646) data time 0.0007 (0.0021) model time 0.2582 (0.2620) loss 6.7609 (5.7571) grad_norm 3.3670 (inf) loss_scale 256.0000 (261.3753) mem 9655MB [2024-08-04 07:38:10 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][390/625] eta 0:01:02 lr 0.000441 wd 0.0500 time 0.2548 (0.2644) data time 0.0007 (0.0021) model time 0.2540 (0.2617) loss 6.0904 (5.7584) grad_norm 1.8633 (inf) loss_scale 256.0000 (261.2379) mem 9655MB [2024-08-04 07:38:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][400/625] eta 0:00:59 lr 0.000440 wd 0.0500 time 0.2577 (0.2642) data time 0.0008 (0.0020) model time 0.2569 (0.2616) loss 5.7229 (5.7629) grad_norm 1.6225 (inf) loss_scale 256.0000 (261.1072) mem 9655MB [2024-08-04 07:38:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][410/625] eta 0:00:56 lr 0.000440 wd 0.0500 time 0.2565 (0.2640) data time 0.0009 (0.0020) model time 0.2555 (0.2614) loss 6.1877 (5.7623) grad_norm 11.6278 (inf) loss_scale 256.0000 (260.9830) mem 9655MB [2024-08-04 07:38:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][420/625] eta 0:00:54 lr 0.000440 wd 0.0500 time 0.2599 (0.2638) data time 0.0005 (0.0020) model time 0.2593 (0.2612) loss 6.9580 (5.7579) grad_norm 4.4472 (inf) loss_scale 256.0000 (260.8646) mem 9655MB [2024-08-04 07:38:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][430/625] eta 0:00:51 lr 0.000440 wd 0.0500 time 0.2549 (0.2636) data time 0.0012 (0.0020) model time 0.2537 (0.2610) loss 6.1840 (5.7636) grad_norm 2.4761 (inf) loss_scale 256.0000 (260.7517) mem 9655MB [2024-08-04 07:38:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][440/625] eta 0:00:48 lr 0.000440 wd 0.0500 time 0.2554 (0.2635) data time 0.0012 (0.0019) model time 0.2542 (0.2609) loss 5.4769 (5.7602) grad_norm 2.4434 (inf) loss_scale 256.0000 (260.6440) mem 9655MB [2024-08-04 07:38:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][450/625] eta 0:00:46 lr 0.000440 wd 0.0500 time 0.2517 (0.2633) data time 0.0011 (0.0019) model time 0.2506 
(0.2608) loss 6.5263 (5.7653) grad_norm 2.2913 (inf) loss_scale 256.0000 (260.5410) mem 9655MB [2024-08-04 07:38:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][460/625] eta 0:00:43 lr 0.000440 wd 0.0500 time 0.2593 (0.2636) data time 0.0008 (0.0019) model time 0.2585 (0.2611) loss 5.7021 (5.7702) grad_norm 2.0823 (inf) loss_scale 256.0000 (260.4425) mem 9655MB [2024-08-04 07:38:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][470/625] eta 0:00:40 lr 0.000439 wd 0.0500 time 0.2556 (0.2635) data time 0.0009 (0.0019) model time 0.2547 (0.2610) loss 5.8983 (5.7710) grad_norm 2.5966 (inf) loss_scale 256.0000 (260.3482) mem 9655MB [2024-08-04 07:38:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][480/625] eta 0:00:38 lr 0.000439 wd 0.0500 time 0.2556 (0.2633) data time 0.0009 (0.0019) model time 0.2546 (0.2609) loss 6.2792 (5.7772) grad_norm 3.5487 (inf) loss_scale 256.0000 (260.2578) mem 9655MB [2024-08-04 07:38:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][490/625] eta 0:00:35 lr 0.000439 wd 0.0500 time 0.2564 (0.2632) data time 0.0007 (0.0018) model time 0.2556 (0.2607) loss 6.3585 (5.7790) grad_norm 1.4771 (inf) loss_scale 256.0000 (260.1711) mem 9655MB [2024-08-04 07:38:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][500/625] eta 0:00:32 lr 0.000439 wd 0.0500 time 0.2590 (0.2630) data time 0.0006 (0.0018) model time 0.2584 (0.2606) loss 5.7707 (5.7786) grad_norm 2.7717 (inf) loss_scale 256.0000 (260.0878) mem 9655MB [2024-08-04 07:38:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][510/625] eta 0:00:30 lr 0.000439 wd 0.0500 time 0.2579 (0.2632) data time 0.0009 (0.0018) model time 0.2570 (0.2608) loss 5.9131 (5.7759) grad_norm 1.6008 (inf) loss_scale 256.0000 (260.0078) mem 9655MB [2024-08-04 07:38:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[214/300][520/625] eta 0:00:27 lr 0.000439 wd 0.0500 time 0.2666 (0.2631) data time 0.0008 (0.0018) model time 0.2659 (0.2607) loss 5.6276 (5.7677) grad_norm 2.6072 (inf) loss_scale 256.0000 (259.9309) mem 9655MB [2024-08-04 07:38:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][530/625] eta 0:00:24 lr 0.000438 wd 0.0500 time 0.2557 (0.2629) data time 0.0009 (0.0018) model time 0.2548 (0.2606) loss 7.1369 (5.7741) grad_norm 1.8286 (inf) loss_scale 256.0000 (259.8569) mem 9655MB [2024-08-04 07:38:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][540/625] eta 0:00:22 lr 0.000438 wd 0.0500 time 0.2694 (0.2628) data time 0.0009 (0.0018) model time 0.2685 (0.2605) loss 5.3244 (5.7729) grad_norm 2.7004 (inf) loss_scale 256.0000 (259.7856) mem 9655MB [2024-08-04 07:38:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][550/625] eta 0:00:19 lr 0.000438 wd 0.0500 time 0.2548 (0.2627) data time 0.0007 (0.0017) model time 0.2541 (0.2604) loss 7.1067 (5.7761) grad_norm 2.0195 (inf) loss_scale 256.0000 (259.7169) mem 9655MB [2024-08-04 07:38:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][560/625] eta 0:00:17 lr 0.000438 wd 0.0500 time 0.4408 (0.2629) data time 0.0009 (0.0017) model time 0.4399 (0.2607) loss 5.5701 (5.7773) grad_norm 3.0972 (inf) loss_scale 256.0000 (259.6506) mem 9655MB [2024-08-04 07:38:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][570/625] eta 0:00:14 lr 0.000438 wd 0.0500 time 0.2572 (0.2628) data time 0.0009 (0.0017) model time 0.2563 (0.2606) loss 5.7221 (5.7845) grad_norm 1.7450 (inf) loss_scale 256.0000 (259.5867) mem 9655MB [2024-08-04 07:38:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][580/625] eta 0:00:11 lr 0.000438 wd 0.0500 time 0.2563 (0.2631) data time 0.0008 (0.0017) model time 0.2555 (0.2609) loss 6.0142 (5.7882) grad_norm 1.7311 (inf) loss_scale 
256.0000 (259.5250) mem 9655MB [2024-08-04 07:39:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][590/625] eta 0:00:09 lr 0.000438 wd 0.0500 time 0.2530 (0.2630) data time 0.0010 (0.0017) model time 0.2520 (0.2608) loss 5.8880 (5.7863) grad_norm 1.5174 (inf) loss_scale 256.0000 (259.4653) mem 9655MB [2024-08-04 07:39:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][600/625] eta 0:00:06 lr 0.000437 wd 0.0500 time 0.2699 (0.2629) data time 0.0009 (0.0017) model time 0.2690 (0.2607) loss 5.1248 (5.7875) grad_norm 2.5335 (inf) loss_scale 256.0000 (259.4077) mem 9655MB [2024-08-04 07:39:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][610/625] eta 0:00:03 lr 0.000437 wd 0.0500 time 0.2520 (0.2631) data time 0.0006 (0.0017) model time 0.2514 (0.2610) loss 5.7293 (5.7860) grad_norm 2.0338 (inf) loss_scale 256.0000 (259.3519) mem 9655MB [2024-08-04 07:39:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][620/625] eta 0:00:01 lr 0.000437 wd 0.0500 time 0.2542 (0.2629) data time 0.0005 (0.0016) model time 0.2536 (0.2608) loss 6.0739 (5.7880) grad_norm 2.2362 (inf) loss_scale 256.0000 (259.2979) mem 9655MB [2024-08-04 07:39:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 214 training takes 0:02:44 [2024-08-04 07:39:11 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 07:39:11 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 07:39:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.582 (0.582) Loss 0.6123 (0.6123) Acc@1 89.453 (89.453) Acc@5 98.730 (98.730) Mem 9655MB [2024-08-04 07:39:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.106) Loss 0.9727 (0.7479) Acc@1 79.443 (85.942) Acc@5 95.996 (97.599) Mem 9655MB [2024-08-04 07:39:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.082) Loss 1.0879 (0.8748) Acc@1 75.635 (82.692) Acc@5 95.215 (96.356) Mem 9655MB [2024-08-04 07:39:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.448 Acc@5 96.347 [2024-08-04 07:39:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-08-04 07:39:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.45% [2024-08-04 07:39:13 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 07:39:14 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 07:39:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.642 (0.642) Loss 0.5811 (0.5811) Acc@1 89.795 (89.795) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 07:39:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.110) Loss 0.9194 (0.7136) Acc@1 80.908 (86.381) Acc@5 95.850 (97.643) Mem 9655MB [2024-08-04 07:39:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.084) Loss 1.0342 (0.8387) Acc@1 76.660 (83.066) Acc@5 95.117 (96.336) Mem 9655MB [2024-08-04 07:39:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.762 Acc@5 96.347 [2024-08-04 07:39:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.8% [2024-08-04 07:39:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.76% [2024-08-04 07:39:16 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 07:39:16 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 07:39:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][0/625] eta 0:08:27 lr 0.000437 wd 0.0500 time 0.8127 (0.8127) data time 0.4632 (0.4632) model time 0.0000 (0.0000) loss 6.3177 (6.3177) grad_norm 1.9447 (1.9447) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 07:39:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][10/625] eta 0:03:23 lr 0.000437 wd 0.0500 time 0.3866 (0.3317) data time 0.0007 (0.0429) model time 0.0000 (0.0000) loss 5.6444 (6.1534) grad_norm 1.9211 (1.9401) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 07:39:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][20/625] eta 0:02:59 lr 0.000437 wd 0.0500 time 0.2583 (0.2963) data time 0.0006 (0.0229) model time 0.0000 (0.0000) loss 6.8372 (5.9895) grad_norm 2.1068 (2.0531) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 07:39:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][30/625] eta 0:02:48 lr 0.000437 wd 0.0500 time 0.2512 (0.2831) data time 0.0010 (0.0158) model time 0.0000 (0.0000) loss 5.5364 (5.9299) grad_norm 1.9022 (2.2801) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 07:39:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][40/625] eta 0:02:41 lr 0.000437 wd 0.0500 time 0.2627 (0.2767) data time 0.0006 (0.0122) model time 0.0000 (0.0000) loss 5.9915 (5.8516) grad_norm 2.0194 (2.3994) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 07:39:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][50/625] eta 0:02:36 lr 0.000436 wd 0.0500 time 0.2565 (0.2725) data time 0.0007 (0.0100) model time 0.0000 (0.0000) loss 7.0673 (5.9449) grad_norm 9.6819 (2.6133) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 07:39:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][60/625] eta 0:02:32 lr 0.000436 wd 0.0500 time 0.2521 (0.2697) data time 
0.0010 (0.0085) model time 0.2512 (0.2549) loss 5.9934 (5.9289) grad_norm 3.1876 (2.5720) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:39:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][70/625] eta 0:02:30 lr 0.000436 wd 0.0500 time 0.2575 (0.2707) data time 0.0006 (0.0074) model time 0.2569 (0.2655) loss 5.3092 (5.9006) grad_norm 2.3140 (2.5279) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:39:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][80/625] eta 0:02:26 lr 0.000436 wd 0.0500 time 0.2548 (0.2691) data time 0.0009 (0.0066) model time 0.2539 (0.2626) loss 5.2672 (5.9156) grad_norm 1.6287 (2.4981) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:39:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][90/625] eta 0:02:23 lr 0.000436 wd 0.0500 time 0.2608 (0.2677) data time 0.0009 (0.0060) model time 0.2599 (0.2606) loss 6.6402 (5.9067) grad_norm 2.7027 (2.4845) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:39:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][100/625] eta 0:02:19 lr 0.000436 wd 0.0500 time 0.2564 (0.2666) data time 0.0007 (0.0055) model time 0.2557 (0.2598) loss 5.6971 (5.8990) grad_norm 1.8940 (2.6253) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:39:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][110/625] eta 0:02:16 lr 0.000436 wd 0.0500 time 0.2541 (0.2657) data time 0.0009 (0.0051) model time 0.2532 (0.2591) loss 5.2262 (5.8664) grad_norm 2.4150 (2.8280) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:39:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][120/625] eta 0:02:13 lr 0.000435 wd 0.0500 time 0.2630 (0.2650) data time 0.0008 (0.0047) model time 0.2621 (0.2587) loss 6.4920 (5.8650) grad_norm 1.7880 (2.7863) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:39:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][130/625] eta 0:02:11 lr 0.000435 wd 0.0500 time 0.2592 (0.2658) data time 0.0006 (0.0044) model time 0.2586 (0.2607) loss 5.4238 (5.8654) grad_norm 3.0926 (2.8171) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:39:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][140/625] eta 0:02:08 lr 0.000435 wd 0.0500 time 0.2598 (0.2651) data time 0.0008 (0.0042) model time 0.2590 (0.2600) loss 5.3356 (5.8497) grad_norm 3.1757 (2.8163) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:39:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][150/625] eta 0:02:05 lr 0.000435 wd 0.0500 time 0.2533 (0.2645) data time 0.0008 (0.0040) model time 0.2525 (0.2595) loss 6.6261 (5.8741) grad_norm 1.8616 (2.7849) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:39:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][160/625] eta 0:02:02 lr 0.000435 wd 0.0500 time 0.2537 (0.2640) data time 0.0008 (0.0038) model time 0.2529 (0.2592) loss 6.6353 (5.8794) grad_norm 3.1992 (2.7532) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:40:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][170/625] eta 0:02:00 lr 0.000435 wd 0.0500 time 0.3875 (0.2644) data time 0.0010 (0.0036) model time 0.3864 (0.2600) loss 5.2678 (5.8683) grad_norm 2.1803 (2.7533) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:40:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][180/625] eta 0:01:57 lr 0.000435 wd 0.0500 time 0.2528 (0.2647) data time 0.0008 (0.0035) model time 0.2520 (0.2608) loss 4.5035 (5.8394) grad_norm 2.3892 (2.7462) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:40:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][190/625] eta 0:01:55 lr 0.000434 wd 0.0500 time 0.3720 (0.2650) data time 0.0008 (0.0033) model time 0.3711 (0.2613) loss 5.4947 (5.8402) grad_norm 3.9905 (2.7553) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:40:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][200/625] eta 0:01:52 lr 0.000434 wd 0.0500 time 0.4544 (0.2655) data time 0.0011 (0.0032) model time 0.4532 (0.2622) loss 6.7199 (5.8448) grad_norm 4.1844 (2.7451) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:40:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][210/625] eta 0:01:50 lr 0.000434 wd 0.0500 time 0.2592 (0.2666) data time 0.0008 (0.0031) model time 0.2584 (0.2638) loss 4.5636 (5.8331) grad_norm 2.3842 (2.7158) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:40:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][220/625] eta 0:01:47 lr 0.000434 wd 0.0500 time 0.2575 (0.2661) data time 0.0008 (0.0030) model time 0.2567 (0.2633) loss 5.7047 (5.8337) grad_norm 1.7331 (2.6917) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:40:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][230/625] eta 0:01:44 lr 0.000434 wd 0.0500 time 0.2551 (0.2657) data time 0.0009 (0.0029) model time 0.2542 (0.2629) loss 6.0230 (5.8273) grad_norm 1.1953 (2.6743) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:40:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][240/625] eta 0:01:42 lr 0.000434 wd 0.0500 time 0.2574 (0.2654) data time 0.0008 (0.0028) model time 0.2565 (0.2626) loss 6.5116 (5.8311) grad_norm 2.4802 (2.6512) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:40:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][250/625] eta 0:01:39 lr 0.000433 wd 0.0500 time 0.2555 (0.2650) data time 0.0006 (0.0028) model time 0.2549 (0.2622) loss 5.7642 (5.8238) grad_norm 2.7491 (2.6367) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:40:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][260/625] eta 0:01:36 lr 0.000433 wd 0.0500 time 0.2543 (0.2647) data time 0.0008 (0.0027) model time 0.2535 (0.2618) loss 6.6375 (5.8181) grad_norm 1.8838 (2.6183) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:40:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][270/625] eta 0:01:33 lr 0.000433 wd 0.0500 time 0.2579 (0.2644) data time 0.0013 (0.0026) model time 0.2566 (0.2616) loss 5.5439 (5.8192) grad_norm 2.1058 (2.6027) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:40:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][280/625] eta 0:01:31 lr 0.000433 wd 0.0500 time 0.2534 (0.2640) data time 0.0009 (0.0026) model time 0.2525 (0.2612) loss 7.0408 (5.8271) grad_norm 1.9523 (2.5771) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:40:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][290/625] eta 0:01:28 lr 0.000433 wd 0.0500 time 0.2581 (0.2638) data time 0.0008 (0.0025) model time 0.2573 (0.2610) loss 5.7639 (5.8197) grad_norm 2.2395 (2.6285) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:40:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][300/625] eta 0:01:25 lr 0.000433 wd 0.0500 time 0.4562 (0.2642) data time 0.0008 (0.0025) model time 0.4554 (0.2616) loss 5.0111 (5.8268) grad_norm 2.1882 (2.6223) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:40:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][310/625] eta 0:01:23 lr 0.000433 wd 0.0500 time 0.2534 (0.2640) data time 0.0007 (0.0024) model time 0.2527 (0.2614) loss 5.6737 (5.8322) grad_norm 1.7997 (2.6026) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:40:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][320/625] eta 0:01:20 lr 0.000432 wd 0.0500 time 0.2625 (0.2643) data time 0.0008 (0.0024) model time 0.2617 (0.2619) loss 5.9598 (5.8353) grad_norm 3.6782 (2.6152) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:40:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][330/625] eta 0:01:17 lr 0.000432 wd 0.0500 time 0.2537 (0.2641) data time 0.0012 (0.0023) model time 0.2525 (0.2616) loss 5.4323 (5.8360) grad_norm 2.0005 (2.6282) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:40:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][340/625] eta 0:01:15 lr 0.000432 wd 0.0500 time 0.2559 (0.2644) data time 0.0009 (0.0023) model time 0.2550 (0.2621) loss 6.8132 (5.8472) grad_norm 2.0994 (2.6164) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:40:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][350/625] eta 0:01:12 lr 0.000432 wd 0.0500 time 0.2559 (0.2648) data time 0.0008 (0.0022) model time 0.2551 (0.2625) loss 5.0598 (5.8489) grad_norm 1.9028 (2.5968) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:40:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][360/625] eta 0:01:10 lr 0.000432 wd 0.0500 time 0.2544 (0.2650) data time 0.0008 (0.0022) model time 0.2536 (0.2629) loss 5.2643 (5.8470) grad_norm 2.3922 (2.5829) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:40:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][370/625] eta 0:01:07 lr 0.000432 wd 0.0500 time 0.2556 (0.2649) data time 0.0008 (0.0022) model time 0.2548 (0.2627) loss 6.1167 (5.8489) grad_norm 2.5006 (2.5642) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:40:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][380/625] eta 0:01:04 lr 0.000432 wd 0.0500 time 0.2549 (0.2646) data time 0.0008 (0.0021) model time 0.2542 (0.2625) loss 5.9397 (5.8473) grad_norm 1.9610 (2.5636) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:41:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][390/625] eta 0:01:02 lr 0.000431 wd 0.0500 time 0.2545 (0.2644) data time 0.0009 (0.0021) model time 0.2537 (0.2623) loss 4.9708 (5.8494) grad_norm 3.7867 (2.5695) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:41:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][400/625] eta 0:00:59 lr 0.000431 wd 0.0500 time 0.2660 (0.2642) data time 0.0006 (0.0021) model time 0.2653 (0.2621) loss 6.9888 (5.8408) grad_norm 2.7251 (2.5735) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:41:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][410/625] eta 0:00:56 lr 0.000431 wd 0.0500 time 0.2567 (0.2641) data time 0.0012 (0.0021) model time 0.2555 (0.2620) loss 7.0987 (5.8436) grad_norm 1.6674 (2.5678) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:41:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][420/625] eta 0:00:54 lr 0.000431 wd 0.0500 time 0.2545 (0.2639) data time 0.0010 (0.0020) model time 0.2535 (0.2618) loss 6.9080 (5.8426) grad_norm 2.5464 (2.5626) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:41:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][430/625] eta 0:00:51 lr 0.000431 wd 0.0500 time 0.2580 (0.2637) data time 0.0008 (0.0020) model time 0.2572 (0.2616) loss 6.0428 (5.8439) grad_norm 1.7272 (2.5553) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:41:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][440/625] eta 0:00:48 lr 0.000431 wd 0.0500 time 0.2535 (0.2636) data time 0.0007 (0.0020) model time 0.2528 (0.2615) loss 6.3703 (5.8419) grad_norm 1.5114 (2.5486) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:41:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][450/625] eta 0:00:46 lr 0.000431 wd 0.0500 time 0.2545 (0.2643) data time 0.0007 (0.0020) model time 0.2538 (0.2623) loss 6.3200 (5.8406) grad_norm 3.1084 (2.5396) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:41:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][460/625] eta 0:00:43 lr 0.000430 wd 0.0500 time 0.2589 (0.2641) data time 0.0009 (0.0019) model time 0.2580 (0.2621) loss 4.7522 (5.8330) grad_norm 1.8949 (2.5513) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:41:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][470/625] eta 0:00:40 lr 0.000430 wd 0.0500 time 0.2549 (0.2644) data time 0.0010 (0.0019) model time 0.2539 (0.2625) loss 5.9909 (5.8306) grad_norm 1.7281 (2.5460) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:41:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][480/625] eta 0:00:38 lr 0.000430 wd 0.0500 time 0.2584 (0.2646) data time 0.0006 (0.0019) model time 0.2578 (0.2628) loss 5.1377 (5.8241) grad_norm 2.2378 (2.5290) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:41:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][490/625] eta 0:00:35 lr 0.000430 wd 0.0500 time 0.2538 (0.2645) data time 0.0007 (0.0019) model time 0.2531 (0.2626) loss 5.2460 (5.8271) grad_norm 1.6201 (2.5159) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:41:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][500/625] eta 0:00:33 lr 0.000430 wd 0.0500 time 0.2552 (0.2643) data time 0.0011 (0.0018) model time 0.2541 (0.2624) loss 6.0665 (5.8241) grad_norm 2.1666 (2.5026) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:41:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][510/625] eta 0:00:30 lr 0.000430 wd 0.0500 time 0.2516 (0.2641) data time 0.0009 (0.0018) model time 0.2507 (0.2623) loss 5.0137 (5.8245) grad_norm 5.5477 (2.4980) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:41:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][520/625] eta 0:00:27 lr 0.000430 wd 0.0500 time 0.2525 (0.2640) data time 0.0008 (0.0018) model time 0.2516 (0.2622) loss 5.6197 (5.8222) grad_norm 3.6754 (2.4910) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:41:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][530/625] eta 0:00:25 lr 0.000429 wd 0.0500 time 0.2572 (0.2639) data time 0.0006 (0.0018) model time 0.2566 (0.2621) loss 6.1082 (5.8229) grad_norm 1.2828 (2.4831) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:41:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][540/625] eta 0:00:22 lr 0.000429 wd 0.0500 time 0.2576 (0.2638) data time 0.0006 (0.0018) model time 0.2570 (0.2620) loss 6.8228 (5.8273) grad_norm 2.4209 (2.4759) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:41:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][550/625] eta 0:00:19 lr 0.000429 wd 0.0500 time 0.2573 (0.2637) data time 0.0007 (0.0018) model time 0.2565 (0.2619) loss 5.5246 (5.8316) grad_norm 4.9539 (2.4786) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:41:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][560/625] eta 0:00:17 lr 0.000429 wd 0.0500 time 0.2566 (0.2636) data time 0.0008 (0.0018) model time 0.2558 (0.2618) loss 5.6381 (5.8360) grad_norm 1.4741 (2.4713) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:41:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][570/625] eta 0:00:14 lr 0.000429 wd 0.0500 time 0.2587 (0.2635) data time 0.0009 (0.0017) model time 0.2578 (0.2617) loss 6.0345 (5.8393) grad_norm 1.5580 (2.4963) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:41:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][580/625] eta 0:00:11 lr 0.000429 wd 0.0500 time 0.2586 (0.2633) data time 0.0008 (0.0017) model time 0.2578 (0.2615) loss 6.7310 (5.8437) grad_norm 2.5566 (2.5055) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:41:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][590/625] eta 0:00:09 lr 0.000429 wd 0.0500 time 0.2582 (0.2632) data time 0.0011 (0.0017) model time 0.2571 (0.2614) loss 5.2242 (5.8420) grad_norm 1.3600 (2.4985) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:41:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][600/625] eta 0:00:06 lr 0.000428 wd 0.0500 time 0.2573 (0.2631) data time 0.0011 (0.0017) model time 0.2562 (0.2613) loss 6.8462 (5.8434) grad_norm 2.1410 (2.4904) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:41:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][610/625] eta 0:00:03 lr 0.000428 wd 0.0500 time 0.2538 (0.2630) data time 0.0004 (0.0017) model time 0.2534 (0.2612) loss 5.7750 (5.8413) grad_norm 3.4600 (2.4857) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:41:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][620/625] eta 0:00:01 lr 0.000428 wd 0.0500 time 0.2541 (0.2629) data time 0.0004 (0.0017) model time 0.2537 (0.2611) loss 4.4993 (5.8359) grad_norm 1.9528 (2.4835) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:42:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 215 training takes 0:02:44
[2024-08-04 07:42:00 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 07:42:01 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 07:42:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.611 (0.611) Loss 0.6240 (0.6240) Acc@1 89.502 (89.502) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 07:42:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.109) Loss 0.9639 (0.7579) Acc@1 80.371 (86.035) Acc@5 95.752 (97.563) Mem 9655MB
[2024-08-04 07:42:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.083) Loss 1.0889 (0.8782) Acc@1 76.025 (82.917) Acc@5 94.678 (96.370) Mem 9655MB
[2024-08-04 07:42:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.548 Acc@5 96.363
[2024-08-04 07:42:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.5%
[2024-08-04 07:42:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.55%
[2024-08-04 07:42:03 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-08-04 07:42:03 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-08-04 07:42:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.586 (0.586) Loss 0.5815 (0.5815) Acc@1 89.844 (89.844) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 07:42:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.107) Loss 0.9185 (0.7136) Acc@1 80.859 (86.435) Acc@5 95.898 (97.652) Mem 9655MB
[2024-08-04 07:42:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.082) Loss 1.0332 (0.8385) Acc@1 76.611 (83.117) Acc@5 95.264 (96.359) Mem 9655MB
[2024-08-04 07:42:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.802 Acc@5 96.369
[2024-08-04 07:42:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.8%
[2024-08-04 07:42:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.80%
[2024-08-04 07:42:05 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 07:42:06 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 07:42:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][0/625] eta 0:08:02 lr 0.000428 wd 0.0500 time 0.7723 (0.7723) data time 0.4639 (0.4639) model time 0.0000 (0.0000) loss 6.3396 (6.3396) grad_norm 3.7329 (3.7329) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:42:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][10/625] eta 0:03:17 lr 0.000428 wd 0.0500 time 0.2593 (0.3213) data time 0.0007 (0.0431) model time 0.0000 (0.0000) loss 6.8350 (6.1789) grad_norm 1.9813 (2.4920) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:42:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][20/625] eta 0:02:55 lr 0.000428 wd 0.0500 time 0.2553 (0.2900) data time 0.0009 (0.0230) model time 0.0000 (0.0000) loss 5.1583 (6.0402) grad_norm 1.6209 (2.2902) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:42:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][30/625] eta 0:02:53 lr 0.000428 wd 0.0500 time 0.2578 (0.2915) data time 0.0009 (0.0159) model time 0.0000 (0.0000) loss 6.5126 (6.0390) grad_norm 1.9177 (2.1900) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:42:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][40/625] eta 0:02:48 lr 0.000428 wd 0.0500 time 0.2580 (0.2880) data time 0.0007 (0.0122) model time 0.0000 (0.0000) loss 6.1377 (6.0545) grad_norm 4.2797 (2.5644) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:42:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][50/625] eta 0:02:42 lr 0.000427 wd 0.0500 time 0.2565 (0.2818) data time 0.0007 (0.0100) model time 0.0000 (0.0000) loss 5.4751 (5.9150) grad_norm 2.2050 (2.6063) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:42:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][60/625] eta 0:02:36 lr 0.000427 wd 0.0500 time 0.2563 (0.2777) data time 0.0012 (0.0085) model time 0.2550 (0.2558) loss 5.8093 (5.8711) grad_norm 1.9112 (2.5700) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:42:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][70/625] eta 0:02:32 lr 0.000427 wd 0.0500 time 0.2563 (0.2748) data time 0.0007 (0.0075) model time 0.2556 (0.2560) loss 5.3207 (5.8260) grad_norm 1.4756 (2.5419) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:42:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][80/625] eta 0:02:28 lr 0.000427 wd 0.0500 time 0.2563 (0.2724) data time 0.0010 (0.0067) model time 0.2553 (0.2554) loss 5.9428 (5.8243) grad_norm 2.2698 (2.5197) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:42:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][90/625] eta 0:02:24 lr 0.000427 wd 0.0500 time 0.2546 (0.2704) data time 0.0011 (0.0060) model time 0.2535 (0.2548) loss 6.1176 (5.8480) grad_norm 1.7412 (2.4524) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:42:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][100/625] eta 0:02:21 lr 0.000427 wd 0.0500 time 0.2544 (0.2690) data time 0.0008 (0.0055) model time 0.2536 (0.2551) loss 6.0180 (5.8412) grad_norm 2.7565 (2.4220) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:42:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][110/625] eta 0:02:17 lr 0.000427 wd 0.0500 time 0.2578 (0.2678) data time 0.0009 (0.0051) model time 0.2569 (0.2550) loss 6.5640 (5.8388) grad_norm 1.5241 (2.4283) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:42:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][120/625] eta 0:02:14 lr 0.000426 wd 0.0500 time 0.2594 (0.2671) data time 0.0009 (0.0048) model time 0.2586 (0.2554) loss 4.3386 (5.8327) grad_norm 3.9544 (2.5869) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:42:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][130/625] eta 0:02:11 lr 0.000426 wd 0.0500 time 0.2585 (0.2662) data time 0.0009 (0.0045) model time 0.2576 (0.2553) loss 5.1861 (5.8060) grad_norm 3.4054 (2.5854) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:42:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][140/625] eta 0:02:08 lr 0.000426 wd 0.0500 time 0.2591 (0.2656) data time 0.0009 (0.0042) model time 0.2582 (0.2554) loss 6.5882 (5.7961) grad_norm 2.5830 (2.5985) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:42:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][150/625] eta 0:02:06 lr 0.000426 wd 0.0500 time 0.2548 (0.2673) data time 0.0006 (0.0040) model time 0.2542 (0.2589) loss 5.8671 (5.8008) grad_norm 2.2292 (2.5696) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:42:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][160/625] eta 0:02:03 lr 0.000426 wd 0.0500 time 0.2560 (0.2666) data time 0.0009 (0.0038) model time 0.2551 (0.2586) loss 5.2743 (5.8140) grad_norm 2.4509 (2.5513) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:42:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][170/625] eta 0:02:01 lr 0.000426 wd 0.0500 time 0.2570 (0.2660) data time 0.0006 (0.0036) model time 0.2564 (0.2584) loss 5.6734 (5.8166) grad_norm 2.2169 (2.5649) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:42:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][180/625] eta 0:01:58 lr 0.000426 wd 0.0500 time 0.2506 (0.2655) data time 0.0010 (0.0035) model time 0.2495 (0.2582) loss 5.5652 (5.8251) grad_norm 2.4252 (2.5974) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:42:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][190/625] eta 0:01:55 lr 0.000425 wd 0.0500 time 0.2546 (0.2650) data time 0.0007 (0.0034) model time 0.2540 (0.2580) loss 6.0512 (5.8308) grad_norm 1.2846 (2.5819) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:42:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][200/625] eta 0:01:52 lr 0.000425 wd 0.0500 time 0.2534 (0.2646) data time 0.0011 (0.0032) model time 0.2523 (0.2578) loss 6.5665 (5.8474) grad_norm 1.6232 (2.6060) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:43:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][210/625] eta 0:01:49 lr 0.000425 wd 0.0500 time 0.2542 (0.2642) data time 0.0008 (0.0031) model time 0.2534 (0.2577) loss 5.5873 (5.8586) grad_norm 2.4651 (2.6158) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:43:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][220/625] eta 0:01:46 lr 0.000425 wd 0.0500 time 0.2555 (0.2639) data time 0.0009 (0.0031) model time 0.2546 (0.2576) loss 4.5741 (5.8435) grad_norm 1.8337 (2.5838) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:43:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][230/625] eta 0:01:44 lr 0.000425 wd 0.0500 time 0.2533 (0.2636) data time 0.0009 (0.0030) model time 0.2524 (0.2575) loss 5.8973 (5.8455) grad_norm 1.4326 (2.5717) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:43:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][240/625] eta 0:01:41 lr 0.000425 wd 0.0500 time 0.2597 (0.2633) data time 0.0006 (0.0029) model time 0.2591 (0.2574) loss 5.5019 (5.8328) grad_norm 4.1678 (2.5581) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:43:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][250/625] eta 0:01:38 lr 0.000425 wd 0.0500 time 0.2603 (0.2631) data time 0.0008 (0.0028) model time 0.2595 (0.2573) loss 5.5024 (5.8407) grad_norm 2.0557 (2.5475) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:43:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][260/625] eta 0:01:36 lr 0.000424 wd 0.0500 time 0.2676 (0.2635) data time 0.0009 (0.0027) model time 0.2666 (0.2581) loss 6.4528 (5.8399) grad_norm 2.0253 (2.5238) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:43:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][270/625] eta 0:01:33 lr 0.000424 wd 0.0500 time 0.4658 (0.2645) data time 0.0009 (0.0027) model time 0.4649 (0.2595) loss 5.5579 (5.8268) grad_norm 1.9043 (2.5088) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:43:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][280/625] eta 0:01:31 lr 0.000424 wd 0.0500 time 0.2614 (0.2650) data time 0.0008 (0.0026) model time 0.2606 (0.2603) loss 5.3276 (5.8271) grad_norm 1.8907 (2.5218) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:43:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][290/625] eta 0:01:28 lr 0.000424 wd 0.0500 time 0.2561 (0.2646) data time 0.0010 (0.0025) model time 0.2552 (0.2600) loss 5.8565 (5.8164) grad_norm 6.3480 (2.5531) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:43:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][300/625] eta 0:01:25 lr 0.000424 wd 0.0500 time 0.2534 (0.2644) data time 0.0010 (0.0025) model time 0.2525 (0.2599) loss 5.5519 (5.8160) grad_norm 3.0515 (2.5507) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:43:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][310/625] eta 0:01:23 lr 0.000424 wd 0.0500 time 0.2561 (0.2641) data time 0.0006 (0.0024) model time 0.2555 (0.2597) loss 6.0323 (5.8078) grad_norm 3.0145 (2.5922) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:43:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][320/625] eta 0:01:20 lr 0.000424 wd 0.0500 time 0.2582 (0.2638) data time 0.0010 (0.0024) model time 0.2573 (0.2595) loss 5.5861 (5.8168) grad_norm 1.6807 (2.5856) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:43:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][330/625] eta 0:01:18 lr 0.000423 wd 0.0500 time 0.4587 (0.2648) data time 0.0007 (0.0024) model time 0.4579 (0.2608) loss 6.0755 (5.8202) grad_norm 1.6107 (2.5652) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:43:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][340/625] eta 0:01:15 lr 0.000423 wd 0.0500 time 0.2528 (0.2646) data time 0.0008 (0.0023) model time 0.2521 (0.2606) loss 5.7203 (5.8213) grad_norm 2.1830 (2.5472) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:43:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][350/625] eta 0:01:12 lr 0.000423 wd 0.0500 time 0.2580 (0.2643) data time 0.0007 (0.0023) model time 0.2573 (0.2604) loss 6.2338 (5.8275) grad_norm 1.2537 (2.5256) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:43:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][360/625] eta 0:01:10 lr 0.000423 wd 0.0500 time 0.2571 (0.2644) data time 0.0009 (0.0022) model time 0.2561 (0.2606) loss 5.9766 (5.8291) grad_norm 1.8353 (2.5073) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:43:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][370/625] eta 0:01:07 lr 0.000423 wd 0.0500 time 0.2545 (0.2642) data time 0.0008 (0.0022) model time 0.2537 (0.2605) loss 6.4182 (5.8389) grad_norm 2.0729 (2.5060) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:43:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][380/625] eta 0:01:04 lr 0.000423 wd 0.0500 time 0.2539 (0.2640) data time 0.0009 (0.0022) model time 0.2530 (0.2603) loss 5.6207 (5.8397) grad_norm 1.5878 (2.5061) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:43:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][390/625] eta 0:01:02 lr 0.000422 wd 0.0500 time 0.2580 (0.2638) data time 0.0006 (0.0021) model time 0.2574 (0.2602) loss 4.5264 (5.8355) grad_norm 2.1580 (2.4963) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:43:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][400/625] eta 0:00:59 lr 0.000422 wd 0.0500 time 0.2507 (0.2637) data time 0.0008 (0.0021) model time 0.2499 (0.2601) loss 5.3138 (5.8348) grad_norm 2.6527 (2.4988) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:43:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][410/625] eta 0:00:56 lr 0.000422 wd 0.0500 time 0.4525 (0.2641) data time 0.0007 (0.0021) model time 0.4519 (0.2606) loss 4.5921 (5.8256) grad_norm 2.5044 (2.5109) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:43:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][420/625] eta 0:00:54 lr 0.000422 wd 0.0500 time 0.2539 (0.2643) data time 0.0008 (0.0020) model time 0.2531 (0.2610) loss 6.0365 (5.8243) grad_norm 2.3606 (2.5127) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:44:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][430/625] eta 0:00:51 lr 0.000422 wd 0.0500 time 0.4698 (0.2647) data time 0.0006 (0.0020) model time 0.4691 (0.2614) loss 4.6951 (5.8200) grad_norm 2.1493 (2.5129) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:44:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][440/625] eta 0:00:49 lr 0.000422 wd 0.0500 time 0.2586 (0.2649) data time 0.0007 (0.0020) model time 0.2579 (0.2617) loss 5.4470 (5.8150) grad_norm 4.3247 (2.5568) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:44:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][450/625] eta 0:00:46 lr 0.000422 wd 0.0500 time 0.2631 (0.2656) data time 0.0006 (0.0020) model time 0.2624 (0.2625) loss 5.5613 (5.8040) grad_norm 1.9305 (2.5569) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:44:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][460/625] eta 0:00:43 lr 0.000421 wd 0.0500 time 0.2509 (0.2658) data time 0.0012 (0.0020) model time 0.2497 (0.2629) loss 5.4668 (5.8042) grad_norm 1.9860 (2.5427) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:44:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][470/625] eta 0:00:41 lr 0.000421 wd 0.0500 time 0.2526 (0.2656) data time 0.0008 (0.0019) model time 0.2518 (0.2627) loss 5.0022 (5.8061) grad_norm 2.5066 (2.5346) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:44:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][480/625] eta 0:00:38 lr 0.000421 wd 0.0500 time 0.2529 (0.2654) data time 0.0009 (0.0019) model time 0.2520 (0.2625) loss 6.0629 (5.8133) grad_norm 2.2286 (2.5193) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:44:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][490/625] eta 0:00:35 lr 0.000421 wd 0.0500 time 0.2527 (0.2652) data time 0.0011 (0.0019) model time 0.2516 (0.2623) loss 5.4867 (5.8097) grad_norm 1.3720 (2.5142) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:44:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][500/625] eta 0:00:33 lr 0.000421 wd 0.0500 time 0.2551 (0.2651) data time 0.0011 (0.0019) model time 0.2540 (0.2622) loss 7.3943 (5.8101) grad_norm 1.8343 (2.5113) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:44:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][510/625] eta 0:00:30 lr 0.000421 wd 0.0500 time 0.2555 (0.2649) data time 0.0006 (0.0019) model time 0.2548 (0.2621) loss 5.1349 (5.8153) grad_norm 2.5079 (2.5008) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:44:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][520/625] eta 0:00:27 lr 0.000421 wd 0.0500 time 0.2603 (0.2647) data time 0.0012 (0.0019) model time 0.2591 (0.2619) loss 5.0487 (5.8146) grad_norm 1.4106 (2.5012) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:44:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][530/625] eta 0:00:25 lr 0.000420 wd 0.0500 time 0.2544 (0.2646) data time 0.0012 (0.0018) model time 0.2533 (0.2618) loss 5.8141 (5.8142) grad_norm 2.0043 (2.5027) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:44:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][540/625] eta 0:00:22 lr 0.000420 wd 0.0500 time 0.2551 (0.2644) data time 0.0006 (0.0018) model time 0.2545 (0.2617) loss 6.5031 (5.8187) grad_norm 2.0219 (2.5000) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:44:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][550/625] eta 0:00:19 lr 0.000420 wd 0.0500 time 0.2556 (0.2643) data time 0.0010 (0.0018) model time 0.2546 (0.2615) loss 6.1441 (5.8226) grad_norm 2.0908 (2.4901) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:44:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][560/625] eta 0:00:17 lr 0.000420 wd 0.0500 time 0.2602 (0.2642) data time 0.0009 (0.0018) model time 0.2593 (0.2614) loss 6.1903 (5.8201) grad_norm 1.8051 (2.4912) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:44:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][570/625] eta 0:00:14 lr 0.000420 wd 0.0500 time 0.2552 (0.2640) data time 0.0008 (0.0018) model time 0.2544 (0.2613) loss 5.5201 (5.8152) grad_norm 3.6091 (2.5067) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:44:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][580/625] eta 0:00:11 lr 0.000420 wd 0.0500 time 0.2577 (0.2639) data time 0.0007 (0.0018) model time 0.2570 (0.2612) loss 5.4004 (5.8145) grad_norm 1.9126 (2.5047) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:44:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][590/625] eta 0:00:09 lr 0.000420 wd 0.0500 time 0.2566 (0.2637) data time 0.0008 (0.0017) model time 0.2557 (0.2611) loss 6.6700 (5.8130) grad_norm 2.2831 (2.4988) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:44:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][600/625] eta 0:00:06 lr 0.000419 wd 0.0500 time 0.2563 (0.2636) data time 0.0008 (0.0017) model time 0.2554 (0.2610) loss 4.7297 (5.8090) grad_norm 2.2084 (2.4940) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:44:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][610/625] eta 0:00:03 lr 0.000419 wd 0.0500 time 0.2531 (0.2635) data time 0.0004 (0.0017) model time 0.2527 (0.2609) loss 6.0002 (5.8071) grad_norm 2.2061 (2.4864) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:44:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][620/625] eta 0:00:01 lr 0.000419 wd 0.0500 time 0.2535 (0.2634) data time 0.0004 (0.0017) model time 0.2531 (0.2607) loss 6.3685 (5.8087) grad_norm 1.7164 (2.4783) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:44:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 216 training takes 0:02:44
[2024-08-04 07:44:50 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 07:44:51 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 07:44:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.543 (0.543) Loss 0.6030 (0.6030) Acc@1 89.697 (89.697) Acc@5 98.535 (98.535) Mem 9655MB
[2024-08-04 07:44:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.102) Loss 0.9482 (0.7288) Acc@1 80.225 (86.173) Acc@5 95.752 (97.585) Mem 9655MB
[2024-08-04 07:44:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 1.0371 (0.8514) Acc@1 76.904 (82.875) Acc@5 94.727 (96.331) Mem 9655MB
[2024-08-04 07:44:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.566 Acc@5 96.333
[2024-08-04 07:44:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.6%
[2024-08-04 07:44:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.57%
[2024-08-04 07:44:53 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-08-04 07:44:53 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-08-04 07:44:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.634 (0.634) Loss 0.5815 (0.5815) Acc@1 89.844 (89.844) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 07:44:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.110) Loss 0.9185 (0.7135) Acc@1 80.859 (86.448) Acc@5 95.801 (97.643) Mem 9655MB
[2024-08-04 07:44:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.084) Loss 1.0332 (0.8382) Acc@1 76.562 (83.129) Acc@5 95.264 (96.359) Mem 9655MB
[2024-08-04 07:44:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.819 Acc@5 96.373
[2024-08-04 07:44:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.8%
[2024-08-04 07:44:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.82%
[2024-08-04 07:44:55 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 07:44:56 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 07:44:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][0/625] eta 0:08:04 lr 0.000419 wd 0.0500 time 0.7756 (0.7756) data time 0.4742 (0.4742) model time 0.0000 (0.0000) loss 5.2844 (5.2844) grad_norm 2.8199 (2.8199) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:44:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][10/625] eta 0:03:08 lr 0.000419 wd 0.0500 time 0.2550 (0.3068) data time 0.0008 (0.0440) model time 0.0000 (0.0000) loss 6.0937 (5.5266) grad_norm 1.6415 (2.2307) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:45:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][20/625] eta 0:02:57 lr 0.000419 wd 0.0500 time 0.2572 (0.2927) data time 0.0008 (0.0235) model time 0.0000 (0.0000) loss 5.4097 (5.8258) grad_norm 1.7109 (2.1166) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:45:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][30/625] eta 0:02:51 lr 0.000419 wd 0.0500 time 0.2536 (0.2875) data time 0.0007 (0.0163) model time 0.0000 (0.0000) loss 5.0784 (5.8087) grad_norm 1.5547 (2.1430) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:45:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][40/625] eta 0:02:46 lr 0.000419 wd 0.0500 time 0.2567 (0.2842) data time 0.0008 (0.0125) model time 0.0000 (0.0000) loss 5.5674 (5.8155) grad_norm 1.9214 (2.2101) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:45:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][50/625] eta 0:02:40 lr 0.000418 wd 0.0500 time 0.2534 (0.2786) data time 0.0010 (0.0103) model time 0.0000 (0.0000) loss 5.5164 (5.8397) grad_norm 1.5183 (2.0995) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:45:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][60/625] eta 0:02:35 lr 0.000418 wd 0.0500 time 0.2548 (0.2749) data time 0.0009 (0.0087) model time 0.2539 (0.2548) loss 5.9998 (5.7539) grad_norm 5.5313 (2.1374) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:45:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][70/625] eta 0:02:31 lr 0.000418 wd 0.0500 time 0.2539 (0.2722) data time 0.0010 (0.0077) model time 0.2528 (0.2548) loss 5.2831 (5.7008) grad_norm 2.9895 (2.2723) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:45:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][80/625] eta 0:02:27 lr 0.000418 wd 0.0500 time 0.2605 (0.2701) data time 0.0007 (0.0068) model time 0.2598 (0.2546) loss 5.9840 (5.7381) grad_norm 2.3095 (2.3036) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:45:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][90/625] eta 0:02:23 lr 0.000418 wd 0.0500 time 0.2544 (0.2685) data time 0.0009 (0.0062) model time 0.2536 (0.2545) loss 5.9139 (5.7614) grad_norm 1.8342 (2.2924) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:45:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][100/625] eta 0:02:21 lr 0.000418 wd 0.0500 time 0.2537 (0.2692) data time 0.0008 (0.0057) model time 0.2529 (0.2586) loss 6.5362 (5.7723) grad_norm 1.6166 (2.2516) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:45:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][110/625] eta 0:02:19 lr 0.000418 wd 0.0500 time 0.2566 (0.2710) data time 0.0010 (0.0052) model time 0.2556 (0.2636) loss 5.2198 (5.7743) grad_norm 5.9581 (2.2694) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:45:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][120/625] eta 0:02:16 lr 0.000417 wd 0.0500 time 0.2535 (0.2698) data time 0.0010 (0.0049) model time 0.2525 (0.2624) loss 6.4689 (5.7620) grad_norm 1.6739 (2.2821) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:45:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][130/625] eta 0:02:13 lr 0.000417 wd 0.0500 time 0.2563 (0.2688) data time 0.0006 (0.0046) model time 0.2557 (0.2616) loss 4.6843 (5.7433) grad_norm 3.0757 (2.2947) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 07:45:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][140/625] eta 0:02:11 lr 0.000417 wd 0.0500 time 0.4670 (0.2707) data time 0.0009 (0.0043) model time 0.4661 (0.2652) loss 6.2065 (5.7511) grad_norm 2.1384 (2.3027) loss_scale 512.0000 (270.5248) mem 9655MB
[2024-08-04 07:45:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][150/625] eta 0:02:08 lr 0.000417 wd 0.0500 time 0.2586 (0.2698) data time 0.0014 (0.0041) model time 0.2572 (0.2643) loss 6.4597 (5.7567) grad_norm 2.3786 (2.3582) loss_scale 512.0000 (286.5166) mem 9655MB
[2024-08-04 07:45:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][160/625] eta 0:02:05 lr 0.000417 wd 0.0500 time 0.2563 (0.2690) data time 0.0009 (0.0039) model time 0.2554 (0.2636) loss 5.5321 (5.7593) grad_norm 4.0436 (2.4034) loss_scale 512.0000 (300.5217) mem 9655MB
[2024-08-04 07:45:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][170/625] eta 0:02:03 lr 0.000417 wd 0.0500 time 0.2549 (0.2706) data time 0.0008 (0.0037) model time 0.2541 (0.2662) loss 6.3031 (5.7611) grad_norm 3.4095 (2.4369) loss_scale 512.0000 (312.8889) mem 9655MB
[2024-08-04 07:45:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][180/625] eta 0:02:00 lr 0.000417 wd 0.0500 time 0.2573 (0.2698) data time 0.0005 (0.0036) model time 0.2567 (0.2654) loss 4.8664 (5.7475) grad_norm 1.7485 (2.4579) loss_scale 512.0000 (323.8895) mem 9655MB
[2024-08-04 07:45:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][190/625] eta 0:01:57 lr 0.000416 wd 0.0500 time 0.2555 (0.2691) data time 0.0007 (0.0034) model time 0.2548 (0.2647) loss 5.5331 (5.7539) grad_norm 1.4436 (2.4580) loss_scale 512.0000 (333.7382) mem 9655MB
[2024-08-04 07:45:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][200/625] eta 0:01:54 lr 0.000416 wd 0.0500 time 0.2588 (0.2684) data time 0.0009 (0.0033) model time 0.2579 (0.2640) loss 4.9360 (5.7665) grad_norm 2.4620 (2.4846) loss_scale 512.0000 (342.6070) mem 9655MB
[2024-08-04 07:45:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][210/625] eta 0:01:51 lr 0.000416 wd 0.0500 time 0.2571 (0.2678) data time 0.0008 (0.0032) model time 0.2563 (0.2634) loss 5.3597 (5.7649) grad_norm 2.3410 (2.4842) loss_scale 512.0000 (350.6351) mem 9655MB
[2024-08-04 07:45:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][220/625] eta 0:01:48 lr 0.000416 wd 0.0500 time 0.2585 (0.2673) data time 0.0006 (0.0031) model time 0.2579 (0.2629) loss 5.8180 (5.7747) grad_norm 2.2375 (2.5192) loss_scale 512.0000 (357.9367) mem 9655MB
[2024-08-04 07:45:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][230/625] eta 0:01:45 lr 0.000416 wd 0.0500 time 0.2557 (0.2668) data time 0.0006 (0.0030) model time 0.2550 (0.2625) loss 4.7792 (5.7682) grad_norm 2.4782 (2.5037) loss_scale 512.0000 (364.6061) mem 9655MB
[2024-08-04 07:46:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][240/625] eta 0:01:42 lr 0.000416 wd 0.0500 time 0.2583 (0.2664) data time 0.0007 (0.0029) model time 0.2576 (0.2621) loss 5.7366 (5.7680) grad_norm 2.4524 (2.4827) loss_scale 512.0000 (370.7220) mem 9655MB
[2024-08-04 07:46:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][250/625] eta 0:01:39 lr 0.000416 wd 0.0500 time 0.2610 (0.2660) data time 0.0006 (0.0028) model time 0.2604 (0.2618) loss 4.5387 (5.7587) grad_norm 2.4623 (2.4623) loss_scale 512.0000 (376.3506) mem 9655MB
[2024-08-04 07:46:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][260/625] eta 0:01:36 lr 0.000415 wd 0.0500 time 0.2578 (0.2656) data time 0.0008 (0.0028) model time 0.2570 (0.2615) loss 6.4923 (5.7616) grad_norm 1.9406 (2.4465) loss_scale 512.0000 (381.5479) mem 9655MB
[2024-08-04 07:46:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][270/625] eta 0:01:34 lr 0.000415 wd 0.0500 time 0.2566 (0.2653) data time 0.0006 (0.0027) model time 0.2560 (0.2613) loss 5.3700 (5.7706) grad_norm 1.5502 (2.4465) loss_scale 512.0000 (386.3616) mem 9655MB
[2024-08-04 07:46:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][280/625] eta 0:01:31 lr 0.000415 wd 0.0500 time 0.2604 (0.2650) data time 0.0010 (0.0026) model time 0.2594 (0.2611) loss 6.5075 (5.7808) grad_norm 2.2541 (2.4314) loss_scale 512.0000 (390.8327) mem 9655MB
[2024-08-04 07:46:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][290/625] eta 0:01:28 lr 0.000415 wd 0.0500 time 0.2546 (0.2647) data time 0.0009 (0.0026) model time 0.2537 (0.2608) loss 5.9187 (5.7738) grad_norm 2.3517 (2.4351) loss_scale 512.0000 (394.9966) mem 9655MB
[2024-08-04 07:46:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][300/625] eta 0:01:25 lr 0.000415 wd 0.0500 time 0.2538 (0.2644) data time 0.0007 (0.0025) model time 0.2530 (0.2605) loss 6.7164 (5.7758) grad_norm 3.1650 (2.4255) loss_scale 512.0000 (398.8837) mem 9655MB
[2024-08-04 07:46:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][310/625] eta 0:01:23 lr 0.000415 wd 0.0500 time 0.2551 (0.2641) data time 0.0009 (0.0025) model time 0.2542 (0.2603) loss 5.2434 (5.7740) grad_norm 1.8900 (2.4091) loss_scale 512.0000 (402.5209) mem 9655MB
[2024-08-04 07:46:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][320/625] eta 0:01:20 lr 0.000415 wd 0.0500 time 0.2588 (0.2638) data time 0.0006 (0.0024) model time 0.2582 (0.2601) loss 6.4813 (5.7866) grad_norm 1.5687 (2.4522) loss_scale 512.0000 (405.9315) mem 9655MB
[2024-08-04 07:46:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][330/625] eta 0:01:17 lr 0.000414 wd 0.0500 time 0.2557 (0.2636) data time 0.0007 (0.0024) model time 0.2550 (0.2599) loss 6.4325 (5.7850) grad_norm 1.4867 (2.4593) loss_scale 512.0000 (409.1360) mem 9655MB
[2024-08-04 07:46:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][340/625] eta 0:01:15 lr 0.000414 wd 0.0500 time 0.2578 (0.2644) data time 0.0009 (0.0023) model time 0.2570 (0.2610) loss 6.1113 (5.7762) grad_norm 2.3164 (2.4781) loss_scale 512.0000 (412.1525) mem 9655MB
[2024-08-04 07:46:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][350/625] eta 0:01:12 lr 0.000414 wd 0.0500 time 0.2569 (0.2642) data time 0.0008 (0.0023) model time 0.2561 (0.2608) loss 6.0147 (5.7716) grad_norm 1.7666 (2.4735) loss_scale 512.0000 (414.9972) mem 9655MB
[2024-08-04 07:46:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][360/625] eta 0:01:10 lr 0.000414 wd 0.0500 time 0.2577 (0.2645) data time 0.0008 (0.0023) model time 0.2569 (0.2612) loss 5.6824 (5.7712) grad_norm 2.2825 (2.4623) loss_scale 512.0000 (417.6842) mem 9655MB
[2024-08-04 07:46:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][370/625] eta 0:01:07 lr 0.000414 wd 0.0500 time 0.2532 (0.2642) data time 0.0008 (0.0022) model time 0.2524 (0.2610) loss 4.8065 (5.7654) grad_norm 2.3740 (2.4532) loss_scale 512.0000 (420.2264) mem 9655MB
[2024-08-04 07:46:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][380/625] eta 0:01:04 lr 0.000414 wd 0.0500 time 0.2561 (0.2640) data time 0.0009 (0.0022) model time 0.2553 (0.2608) loss 6.0018 (5.7646) grad_norm 1.3392 (2.4505) loss_scale 512.0000 (422.6352) mem 9655MB
[2024-08-04 07:46:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][390/625] eta 0:01:01 lr 0.000414 wd 0.0500 time 0.2573 (0.2638) data time 0.0014 (0.0022) model time 0.2559 (0.2607) loss 5.1404 (5.7654) grad_norm 3.8586 (2.4640) loss_scale 512.0000 (424.9207) mem 9655MB
[2024-08-04 07:46:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][400/625] eta 0:00:59 lr 0.000413 wd 0.0500 time 0.2664 (0.2637) data time 0.0006 (0.0021) model time 0.2659 (0.2606) loss 5.0061 (5.7602) grad_norm 1.4817 (2.4824) loss_scale 512.0000 (427.0923) mem 9655MB
[2024-08-04 07:46:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][410/625] eta 0:00:56 lr 0.000413 wd 0.0500 time 0.2567 (0.2636) data time 0.0011 (0.0021) model time 0.2556 (0.2605) loss 4.5529 (5.7569) grad_norm 2.5953 (2.4886) loss_scale 512.0000 (429.1582) mem 9655MB
[2024-08-04 07:46:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][420/625] eta 0:00:54 lr 0.000413 wd 0.0500 time 0.2528 (0.2643) data time 0.0007 (0.0021) model time 0.2520 (0.2614) loss 5.6909 (5.7576) grad_norm 2.2374 (2.4771) loss_scale 512.0000 (431.1259) mem 9655MB
[2024-08-04 07:46:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][430/625] eta 0:00:51 lr 0.000413 wd 0.0500 time 0.2558 (0.2641) data time 0.0009 (0.0020) model time 0.2548 (0.2613) loss 6.9272 (5.7552) grad_norm 1.9628 (2.5101) loss_scale 512.0000 (433.0023) mem 9655MB
[2024-08-04 07:46:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][440/625] eta 0:00:48 lr 0.000413 wd 0.0500 time 0.2589 (0.2640) data time 0.0006 (0.0020) model time 0.2582 (0.2611) loss 6.3835 (5.7596) grad_norm 3.0732 (2.5014) loss_scale 512.0000 (434.7937) mem 9655MB
[2024-08-04 07:46:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][450/625] eta 0:00:46 lr 0.000413 wd 0.0500 time 0.2590 (0.2638) data time 0.0008 (0.0020) model time 0.2582 (0.2610) loss 5.0704 (5.7559) grad_norm 1.9380 (2.4917) loss_scale 512.0000 (436.5055) mem 9655MB
[2024-08-04 07:46:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][460/625] eta 0:00:43 lr 0.000413 wd 0.0500 time 0.2781 (0.2643) data time 0.0009 (0.0020) model time 0.2772 (0.2616) loss 5.6775 (5.7605) grad_norm 2.2929 (2.4992) loss_scale 512.0000 (438.1432) mem 9655MB
[2024-08-04 07:47:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][470/625] eta 0:00:40 lr 0.000412 wd 0.0500 time 0.2563 (0.2641) data time 0.0006 (0.0020) model time 0.2557 (0.2614) loss 5.6348 (5.7617) grad_norm 1.9792 (2.4936) loss_scale 512.0000 (439.7113) mem 9655MB
[2024-08-04 07:47:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][480/625] eta 0:00:38 lr 0.000412 wd 0.0500 time 0.2633 (0.2640) data time 0.0008 (0.0019) model time 0.2625 (0.2613) loss 4.8652 (5.7610) grad_norm 1.6683 (2.4949) loss_scale 512.0000 (441.2141) mem 9655MB
[2024-08-04 07:47:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][490/625] eta 0:00:35 lr 0.000412 wd 0.0500 time 0.2600 (0.2642) data time 0.0008 (0.0019) model time 0.2591 (0.2616) loss 6.2457 (5.7648) grad_norm 3.6515 (2.4990) loss_scale 512.0000 (442.6558) mem 9655MB
[2024-08-04 07:47:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][500/625] eta 0:00:33 lr 0.000412 wd 0.0500 time 0.2558 (0.2640) data time 0.0011 (0.0019) model time 0.2546 (0.2614) loss 5.8890 (5.7697) grad_norm 1.7628 (2.5008) loss_scale 512.0000 (444.0399) mem 9655MB
[2024-08-04 07:47:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][510/625] eta 0:00:30 lr 0.000412 wd 0.0500 time 0.2528 (0.2639) data time 0.0006 (0.0019) model time 0.2522 (0.2613) loss 5.6381 (5.7675) grad_norm 1.8011 (2.4956) loss_scale 512.0000 (445.3699) mem 9655MB
[2024-08-04 07:47:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][520/625] eta 0:00:27 lr 0.000412 wd 0.0500 time 0.2559 (0.2641) data time 0.0008 (0.0019) model time 0.2551 (0.2616) loss 6.3142 (5.7683) grad_norm 1.8267 (2.4917) loss_scale 512.0000 (446.6488) mem 9655MB
[2024-08-04 07:47:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][530/625] eta 0:00:25 lr 0.000412 wd 0.0500 time 0.2545 (0.2640) data time 0.0010 (0.0018) model time 0.2535 (0.2615) loss 6.2516 (5.7688) grad_norm 2.0576 (2.4927) loss_scale 512.0000 (447.8795) mem 9655MB
[2024-08-04 07:47:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][540/625] eta 0:00:22 lr 0.000411 wd 0.0500 time 0.2578 (0.2643) data time 0.0009 (0.0018) model time 0.2569 (0.2619) loss 5.6254 (5.7671) grad_norm 2.2296 (2.4879) loss_scale 512.0000 (449.0647) mem 9655MB
[2024-08-04 07:47:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][550/625] eta 0:00:19 lr 0.000411 wd 0.0500 time 0.2521 (0.2645) data time 0.0008 (0.0018) model time 0.2513 (0.2621) loss 6.9401 (5.7622) grad_norm 1.8984 (2.4854) loss_scale 512.0000 (450.2069) mem 9655MB
[2024-08-04 07:47:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][560/625] eta 0:00:17 lr 0.000411 wd 0.0500 time 0.2553 (0.2644) data time 0.0008 (0.0018) model time 0.2545 (0.2620) loss 5.3149 (5.7624) grad_norm 2.0552 (2.4844) loss_scale 512.0000 (451.3084) mem 9655MB
[2024-08-04 07:47:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][570/625] eta 0:00:14 lr 0.000411 wd 0.0500 time 0.2496 (0.2642) data time 0.0007 (0.0018) model time 0.2489 (0.2619) loss 4.4418 (5.7628) grad_norm 2.6139 (2.4917) loss_scale 512.0000 (452.3713) mem 9655MB
[2024-08-04 07:47:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][580/625] eta 0:00:11 lr 0.000411 wd 0.0500 time 0.2523 (0.2641) data time 0.0011 (0.0018) model time 0.2512 (0.2617) loss 5.2028 (5.7620) grad_norm 2.4075 (2.4850) loss_scale 512.0000 (453.3976) mem 9655MB
[2024-08-04 07:47:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][590/625] eta 0:00:09 lr 0.000411 wd 0.0500 time 0.2554 (0.2639) data time 0.0009 (0.0017) model time 0.2545 (0.2616) loss 6.1367 (5.7627) grad_norm 2.7800 (2.4877) loss_scale 512.0000 (454.3892) mem 9655MB
[2024-08-04 07:47:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][600/625] eta 0:00:06 lr 0.000411 wd 0.0500 time 0.2554 (0.2642) data time 0.0008 (0.0017) model time 0.2545 (0.2619) loss 5.2082 (5.7611) grad_norm 1.5016 (2.4789) loss_scale 512.0000 (455.3478) mem 9655MB
[2024-08-04 07:47:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][610/625] eta 0:00:03 lr 0.000410 wd 0.0500 time 0.2542 (0.2640) data time 0.0006 (0.0017) model time 0.2536 (0.2618) loss 5.7379 (5.7584) grad_norm 1.7444 (2.4774) loss_scale 512.0000 (456.2750) mem 9655MB
[2024-08-04 07:47:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][620/625] eta 0:00:01 lr 0.000410 wd 0.0500 time 0.2548 (0.2639) data time 0.0007 (0.0017) model time 0.2541 (0.2616) loss 4.9432 (5.7515) grad_norm 2.2773 (2.4784) loss_scale 512.0000 (457.1723) mem 9655MB
[2024-08-04 07:47:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 217 training takes 0:02:44
[2024-08-04 07:47:40 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 07:47:41 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 07:47:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.487 (0.487) Loss 0.6260 (0.6260) Acc@1 89.258 (89.258) Acc@5 98.291 (98.291) Mem 9655MB
[2024-08-04 07:47:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.100) Loss 0.9717 (0.7424) Acc@1 79.639 (86.133) Acc@5 95.508 (97.470) Mem 9655MB
[2024-08-04 07:47:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.0439 (0.8647) Acc@1 76.367 (82.875) Acc@5 95.410 (96.294) Mem 9655MB
[2024-08-04 07:47:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.540 Acc@5 96.283
[2024-08-04 07:47:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.5%
[2024-08-04 07:47:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.952 (0.952) Loss 0.5815 (0.5815) Acc@1 89.746 (89.746) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 07:47:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.141) Loss 0.9189 (0.7135) Acc@1 80.762 (86.439) Acc@5 95.752 (97.652) Mem 9655MB
[2024-08-04 07:47:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.100) Loss 1.0332 (0.8383) Acc@1 76.562 (83.122) Acc@5 95.264 (96.366) Mem 9655MB
[2024-08-04 07:47:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.812 Acc@5 96.375
[2024-08-04 07:47:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.8%
[2024-08-04 07:47:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][0/625] eta 0:13:36 lr 0.000410 wd 0.0500 time 1.3065 (1.3065) data time 0.9138 (0.9138) model time 0.0000 (0.0000) loss 5.5667 (5.5667) grad_norm 2.6406 (2.6406) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:47:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][10/625] eta 0:03:57 lr 0.000410 wd 0.0500 time 0.2573 (0.3861) data time 0.0007 (0.0839) model time 0.0000 (0.0000) loss 5.5775 (5.7141) grad_norm 2.4867 (2.1728) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:47:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][20/625] eta 0:03:16 lr 0.000410 wd 0.0500 time 0.2572 (0.3251) data time 0.0007 (0.0444) model time 0.0000 (0.0000) loss 6.1751 (5.6515) grad_norm 2.3458 (2.1010) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:47:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][30/625] eta 0:03:00 lr 0.000410 wd 0.0500 time 0.2565 (0.3031) data time 0.0011 (0.0304) model time 0.0000 (0.0000) loss 5.5965 (5.7585) grad_norm 1.3821 (1.9987) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:47:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][40/625] eta 0:02:50 lr 0.000410 wd 0.0500 time 0.2555 (0.2917) data time 0.0008 (0.0232) model time 0.0000 (0.0000) loss 5.9856 (5.7759) grad_norm 5.4068 (2.1202) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:48:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][50/625] eta 0:02:43 lr 0.000410 wd 0.0500 time 0.2546 (0.2846) data time 0.0009 (0.0188) model time 0.0000 (0.0000) loss 5.4283 (5.7771) grad_norm 1.8806 (2.2129) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:48:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][60/625] eta 0:02:38 lr 0.000409 wd 0.0500 time 0.2544 (0.2800) data time 0.0008 (0.0159) model time 0.2537 (0.2555) loss 6.2863 (5.8104) grad_norm 2.9172 (2.2789) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:48:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][70/625] eta 0:02:33 lr 0.000409 wd 0.0500 time 0.2570 (0.2768) data time 0.0007 (0.0138) model time 0.2564 (0.2561) loss 5.2309 (5.8341) grad_norm 3.5536 (2.3977) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:48:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][80/625] eta 0:02:30 lr 0.000409 wd 0.0500 time 0.2570 (0.2767) data time 0.0008 (0.0122) model time 0.2561 (0.2624) loss 6.0738 (5.8077) grad_norm 3.9106 (2.4240) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:48:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][90/625] eta 0:02:26 lr 0.000409 wd 0.0500 time 0.2562 (0.2746) data time 0.0009 (0.0110) model time 0.2552 (0.2609) loss 6.3003 (5.7994) grad_norm 2.0357 (2.3829) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:48:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][100/625] eta 0:02:23 lr 0.000409 wd 0.0500 time 0.2645 (0.2728) data time 0.0007 (0.0100) model time 0.2638 (0.2598) loss 5.0628 (5.7808) grad_norm 2.3033 (2.3550) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:48:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][110/625] eta 0:02:19 lr 0.000409 wd 0.0500 time 0.2589 (0.2713) data time 0.0008 (0.0092) model time 0.2582 (0.2590) loss 5.9605 (5.7921) grad_norm 1.4629 (2.3957) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:48:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][120/625] eta 0:02:17 lr 0.000409 wd 0.0500 time 0.2583 (0.2718) data time 0.0006 (0.0085) model time 0.2576 (0.2616) loss 5.9807 (5.8048) grad_norm 1.3795 (2.3651) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:48:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][130/625] eta 0:02:14 lr 0.000408 wd 0.0500 time 0.2543 (0.2721) data time 0.0006 (0.0079) model time 0.2537 (0.2633) loss 5.5294 (5.8230) grad_norm 1.6267 (2.3476) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:48:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][140/625] eta 0:02:11 lr 0.000408 wd 0.0500 time 0.2534 (0.2709) data time 0.0011 (0.0074) model time 0.2523 (0.2623) loss 5.0036 (5.8202) grad_norm 2.9016 (2.3510) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:48:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][150/625] eta 0:02:08 lr 0.000408 wd 0.0500 time 0.2567 (0.2714) data time 0.0008 (0.0070) model time 0.2560 (0.2637) loss 4.3602 (5.8245) grad_norm 1.6874 (2.3419) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:48:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][160/625] eta 0:02:05 lr 0.000408 wd 0.0500 time 0.2571 (0.2704) data time 0.0007 (0.0066) model time 0.2565 (0.2629) loss 6.2457 (5.8018) grad_norm 6.9112 (2.4001) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:48:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][170/625] eta 0:02:02 lr 0.000408 wd 0.0500 time 0.3742 (0.2703) data time 0.0008 (0.0063) model time 0.3734 (0.2633) loss 5.9635 (5.8049) grad_norm 2.0960 (2.3965) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:48:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][180/625] eta 0:01:59 lr 0.000408 wd 0.0500 time 0.2558 (0.2696) data time 0.0009 (0.0060) model time 0.2549 (0.2627) loss 5.0794 (5.7865) grad_norm 2.0058 (2.3715) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:48:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][190/625] eta 0:01:57 lr 0.000408 wd 0.0500 time 0.2536 (0.2697) data time 0.0009 (0.0057) model time 0.2527 (0.2633) loss 6.1526 (5.7896) grad_norm 3.2243 (2.3716) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:48:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][200/625] eta 0:01:54 lr 0.000407 wd 0.0500 time 0.2608 (0.2690) data time 0.0007 (0.0055) model time 0.2602 (0.2628) loss 4.6980 (5.7680) grad_norm 2.5040 (2.3689) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:48:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][210/625] eta 0:01:51 lr 0.000407 wd 0.0500 time 0.2531 (0.2684) data time 0.0008 (0.0053) model time 0.2523 (0.2624) loss 6.0892 (5.7553) grad_norm 1.5966 (2.3928) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:48:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][220/625] eta 0:01:48 lr 0.000407 wd 0.0500 time 0.2545 (0.2679) data time 0.0008 (0.0051) model time 0.2537 (0.2619) loss 6.0511 (5.7639) grad_norm 2.6433 (2.3817) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:48:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][230/625] eta 0:01:45 lr 0.000407 wd 0.0500 time 0.2548 (0.2674) data time 0.0008 (0.0049) model time 0.2540 (0.2616) loss 5.3718 (5.7586) grad_norm 1.5565 (2.3656) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:48:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][240/625] eta 0:01:42 lr 0.000407 wd 0.0500 time 0.2666 (0.2670) data time 0.0008 (0.0047) model time 0.2658 (0.2613) loss 6.3299 (5.7726) grad_norm 1.8188 (2.3523) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:48:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][250/625] eta 0:01:39 lr 0.000407 wd 0.0500 time 0.2534 (0.2666) data time 0.0008 (0.0046) model time 0.2526 (0.2610) loss 5.8310 (5.7737) grad_norm 2.5017 (2.3505) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:48:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][260/625] eta 0:01:37 lr 0.000407 wd 0.0500 time 0.2529 (0.2662) data time 0.0010 (0.0044) model time 0.2519 (0.2608) loss 6.0858 (5.7747) grad_norm 1.4797 (2.3622) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:48:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][270/625] eta
0:01:34 lr 0.000406 wd 0.0500 time 0.2550 (0.2658) data time 0.0009 (0.0043) model time 0.2541 (0.2605) loss 6.3944 (5.7736) grad_norm 2.4798 (2.3433) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:49:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][280/625] eta 0:01:31 lr 0.000406 wd 0.0500 time 0.2732 (0.2656) data time 0.0007 (0.0042) model time 0.2725 (0.2604) loss 5.2289 (5.7822) grad_norm 1.6251 (2.3195) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:49:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][290/625] eta 0:01:28 lr 0.000406 wd 0.0500 time 0.2530 (0.2653) data time 0.0008 (0.0041) model time 0.2522 (0.2602) loss 5.5177 (5.7785) grad_norm 2.8125 (2.3150) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:49:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][300/625] eta 0:01:26 lr 0.000406 wd 0.0500 time 0.4288 (0.2656) data time 0.0008 (0.0040) model time 0.4280 (0.2608) loss 5.1558 (5.7739) grad_norm 2.4740 (2.3144) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:49:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][310/625] eta 0:01:23 lr 0.000406 wd 0.0500 time 0.2523 (0.2653) data time 0.0009 (0.0039) model time 0.2514 (0.2605) loss 5.6002 (5.7703) grad_norm 1.9764 (2.3153) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:49:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][320/625] eta 0:01:20 lr 0.000406 wd 0.0500 time 0.2523 (0.2650) data time 0.0008 (0.0038) model time 0.2515 (0.2604) loss 7.3922 (5.7814) grad_norm 2.1289 (2.3168) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:49:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][330/625] eta 0:01:18 lr 0.000406 wd 0.0500 time 0.2551 (0.2648) data time 0.0011 (0.0037) model time 0.2540 (0.2602) loss 6.9590 (5.7783) grad_norm 1.5013 (2.3107) loss_scale 
512.0000 (512.0000) mem 9655MB [2024-08-04 07:49:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][340/625] eta 0:01:15 lr 0.000405 wd 0.0500 time 0.4697 (0.2651) data time 0.0008 (0.0036) model time 0.4689 (0.2607) loss 5.7952 (5.7827) grad_norm 1.9637 (2.2984) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:49:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][350/625] eta 0:01:13 lr 0.000405 wd 0.0500 time 0.2541 (0.2665) data time 0.0007 (0.0035) model time 0.2533 (0.2625) loss 6.2890 (5.7814) grad_norm 1.7690 (2.3293) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:49:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][360/625] eta 0:01:10 lr 0.000405 wd 0.0500 time 0.2568 (0.2666) data time 0.0008 (0.0035) model time 0.2560 (0.2627) loss 5.2326 (5.7830) grad_norm 2.1263 (2.3218) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:49:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][370/625] eta 0:01:07 lr 0.000405 wd 0.0500 time 0.2677 (0.2665) data time 0.0010 (0.0034) model time 0.2667 (0.2626) loss 6.2815 (5.7942) grad_norm 1.5513 (2.3163) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:49:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][380/625] eta 0:01:05 lr 0.000405 wd 0.0500 time 0.2561 (0.2663) data time 0.0009 (0.0033) model time 0.2552 (0.2625) loss 5.1185 (5.7942) grad_norm 1.6913 (2.3128) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:49:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][390/625] eta 0:01:02 lr 0.000405 wd 0.0500 time 0.2490 (0.2665) data time 0.0008 (0.0033) model time 0.2482 (0.2629) loss 4.7900 (5.7912) grad_norm 3.7615 (2.3199) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:49:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][400/625] eta 0:00:59 lr 0.000405 wd 
0.0500 time 0.2544 (0.2663) data time 0.0018 (0.0032) model time 0.2526 (0.2627) loss 6.4348 (5.7927) grad_norm 2.9572 (2.3205) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:49:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][410/625] eta 0:00:57 lr 0.000404 wd 0.0500 time 0.2536 (0.2665) data time 0.0009 (0.0032) model time 0.2527 (0.2630) loss 5.9171 (5.7927) grad_norm 3.3581 (2.3298) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:49:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][420/625] eta 0:00:54 lr 0.000404 wd 0.0500 time 0.2531 (0.2662) data time 0.0007 (0.0031) model time 0.2524 (0.2628) loss 4.6999 (5.7965) grad_norm 4.1525 (2.3441) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:49:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][430/625] eta 0:00:51 lr 0.000404 wd 0.0500 time 0.2548 (0.2660) data time 0.0008 (0.0031) model time 0.2539 (0.2626) loss 4.9383 (5.7944) grad_norm 2.1042 (2.3374) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:49:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][440/625] eta 0:00:49 lr 0.000404 wd 0.0500 time 0.2560 (0.2658) data time 0.0009 (0.0030) model time 0.2552 (0.2624) loss 5.4469 (5.7988) grad_norm 2.3307 (2.3360) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:49:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][450/625] eta 0:00:46 lr 0.000404 wd 0.0500 time 0.2595 (0.2656) data time 0.0008 (0.0030) model time 0.2587 (0.2622) loss 5.4743 (5.7898) grad_norm 1.3747 (2.3511) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:49:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][460/625] eta 0:00:43 lr 0.000404 wd 0.0500 time 0.2555 (0.2654) data time 0.0008 (0.0029) model time 0.2547 (0.2621) loss 5.2020 (5.7906) grad_norm 4.4513 (2.3531) loss_scale 512.0000 (512.0000) mem 9655MB 
[2024-08-04 07:49:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][470/625] eta 0:00:41 lr 0.000404 wd 0.0500 time 0.2653 (0.2653) data time 0.0007 (0.0029) model time 0.2646 (0.2619) loss 6.0804 (5.7869) grad_norm 4.4511 (2.3599) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:49:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][480/625] eta 0:00:38 lr 0.000404 wd 0.0500 time 0.2558 (0.2651) data time 0.0014 (0.0028) model time 0.2544 (0.2618) loss 5.6184 (5.7962) grad_norm 2.0398 (2.3688) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:49:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][490/625] eta 0:00:35 lr 0.000403 wd 0.0500 time 0.2574 (0.2649) data time 0.0007 (0.0028) model time 0.2568 (0.2616) loss 4.9342 (5.8006) grad_norm 3.6259 (2.3675) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:49:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][500/625] eta 0:00:33 lr 0.000403 wd 0.0500 time 0.2567 (0.2647) data time 0.0007 (0.0028) model time 0.2560 (0.2615) loss 4.6842 (5.8024) grad_norm 2.0673 (2.3607) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:50:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][510/625] eta 0:00:30 lr 0.000403 wd 0.0500 time 0.2502 (0.2646) data time 0.0010 (0.0027) model time 0.2492 (0.2614) loss 5.4861 (5.7981) grad_norm 2.2944 (2.3562) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:50:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][520/625] eta 0:00:27 lr 0.000403 wd 0.0500 time 0.2555 (0.2644) data time 0.0011 (0.0027) model time 0.2544 (0.2613) loss 4.8675 (5.7966) grad_norm 2.9655 (2.3614) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:50:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][530/625] eta 0:00:25 lr 0.000403 wd 0.0500 time 0.2586 (0.2647) data 
time 0.0008 (0.0027) model time 0.2578 (0.2616) loss 6.9814 (5.7982) grad_norm 1.5160 (2.3680) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:50:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][540/625] eta 0:00:22 lr 0.000403 wd 0.0500 time 0.2537 (0.2645) data time 0.0008 (0.0026) model time 0.2529 (0.2615) loss 6.2270 (5.7922) grad_norm 3.5068 (2.3639) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:50:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][550/625] eta 0:00:19 lr 0.000403 wd 0.0500 time 0.2540 (0.2646) data time 0.0008 (0.0026) model time 0.2532 (0.2616) loss 5.1771 (5.7898) grad_norm 2.4020 (2.3623) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:50:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][560/625] eta 0:00:17 lr 0.000402 wd 0.0500 time 0.2565 (0.2645) data time 0.0009 (0.0026) model time 0.2556 (0.2615) loss 6.5647 (5.7919) grad_norm 4.2038 (2.3678) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:50:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][570/625] eta 0:00:14 lr 0.000402 wd 0.0500 time 0.2603 (0.2644) data time 0.0009 (0.0026) model time 0.2594 (0.2614) loss 6.1576 (5.8008) grad_norm 1.6307 (2.3646) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:50:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][580/625] eta 0:00:11 lr 0.000402 wd 0.0500 time 0.2522 (0.2644) data time 0.0007 (0.0025) model time 0.2514 (0.2615) loss 4.8435 (5.7964) grad_norm 2.4543 (2.3652) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:50:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][590/625] eta 0:00:09 lr 0.000402 wd 0.0500 time 0.2580 (0.2643) data time 0.0009 (0.0025) model time 0.2571 (0.2614) loss 6.5572 (5.8012) grad_norm 2.1823 (2.3764) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:50:24 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][600/625] eta 0:00:06 lr 0.000402 wd 0.0500 time 0.2556 (0.2642) data time 0.0013 (0.0025) model time 0.2543 (0.2613) loss 6.8185 (5.8077) grad_norm 2.5293 (2.3915) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:50:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][610/625] eta 0:00:03 lr 0.000402 wd 0.0500 time 0.2554 (0.2647) data time 0.0006 (0.0025) model time 0.2548 (0.2619) loss 5.7740 (5.8072) grad_norm 2.3233 (2.3910) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:50:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][620/625] eta 0:00:01 lr 0.000402 wd 0.0500 time 0.2549 (0.2645) data time 0.0006 (0.0024) model time 0.2543 (0.2617) loss 6.7892 (5.8113) grad_norm 2.2140 (2.3904) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:50:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 218 training takes 0:02:45 [2024-08-04 07:50:30 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 07:50:31 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 07:50:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.593 (0.593) Loss 0.6323 (0.6323) Acc@1 89.844 (89.844) Acc@5 98.730 (98.730) Mem 9655MB [2024-08-04 07:50:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.108) Loss 0.9595 (0.7627) Acc@1 81.592 (86.279) Acc@5 95.898 (97.590) Mem 9655MB [2024-08-04 07:50:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.083) Loss 1.0771 (0.8908) Acc@1 76.270 (82.829) Acc@5 95.020 (96.345) Mem 9655MB [2024-08-04 07:50:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.440 Acc@5 96.357 [2024-08-04 07:50:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-08-04 07:50:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.878 (0.878) Loss 0.5815 (0.5815) Acc@1 89.795 (89.795) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 07:50:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.139) Loss 0.9180 (0.7131) Acc@1 80.664 (86.448) Acc@5 95.801 (97.670) Mem 9655MB [2024-08-04 07:50:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.099) Loss 1.0332 (0.8377) Acc@1 76.855 (83.166) Acc@5 95.215 (96.375) Mem 9655MB [2024-08-04 07:50:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.841 Acc@5 96.387 [2024-08-04 07:50:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.8% [2024-08-04 07:50:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.84% [2024-08-04 07:50:35 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 07:50:36 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 07:50:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][0/625] eta 0:08:04 lr 0.000401 wd 0.0500 time 0.7747 (0.7747) data time 0.5226 (0.5226) model time 0.0000 (0.0000) loss 5.7695 (5.7695) grad_norm 3.2910 (3.2910) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:50:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][10/625] eta 0:03:08 lr 0.000401 wd 0.0500 time 0.2602 (0.3063) data time 0.0007 (0.0484) model time 0.0000 (0.0000) loss 4.7503 (5.2772) grad_norm 3.3762 (2.7374) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:50:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][20/625] eta 0:02:57 lr 0.000401 wd 0.0500 time 0.2569 (0.2929) data time 0.0009 (0.0258) model time 0.0000 (0.0000) loss 5.9600 (5.4959) grad_norm 4.7512 (2.6622) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:50:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][30/625] eta 0:02:47 lr 0.000401 wd 0.0500 time 0.2557 (0.2809) data time 0.0007 (0.0178) model time 0.0000 (0.0000) loss 4.5532 (5.5067) grad_norm 3.0654 (2.7799) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:50:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][40/625] eta 0:02:40 lr 0.000401 wd 0.0500 time 0.2601 (0.2747) data time 0.0008 (0.0137) model time 0.0000 (0.0000) loss 6.1004 (5.5821) grad_norm 2.1755 (2.6133) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:50:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][50/625] eta 0:02:35 lr 0.000401 wd 0.0500 time 0.2536 (0.2709) data time 0.0009 (0.0112) model time 0.0000 (0.0000) loss 6.1183 (5.6584) grad_norm 3.0944 (2.5241) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 
07:50:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][60/625] eta 0:02:32 lr 0.000401 wd 0.0500 time 0.3843 (0.2706) data time 0.0008 (0.0095) model time 0.3835 (0.2683) loss 6.6337 (5.6642) grad_norm 2.3738 (2.5595) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:50:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][70/625] eta 0:02:32 lr 0.000400 wd 0.0500 time 0.2520 (0.2740) data time 0.0009 (0.0083) model time 0.2511 (0.2811) loss 5.6636 (5.6270) grad_norm 2.1091 (2.5793) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:50:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][80/625] eta 0:02:28 lr 0.000400 wd 0.0500 time 0.2539 (0.2718) data time 0.0009 (0.0074) model time 0.2531 (0.2724) loss 5.4932 (5.6304) grad_norm 2.0260 (2.5317) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:51:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][90/625] eta 0:02:24 lr 0.000400 wd 0.0500 time 0.2483 (0.2701) data time 0.0011 (0.0066) model time 0.2472 (0.2681) loss 5.6348 (5.6555) grad_norm 4.3465 (2.5609) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:51:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][100/625] eta 0:02:21 lr 0.000400 wd 0.0500 time 0.2562 (0.2687) data time 0.0007 (0.0061) model time 0.2555 (0.2655) loss 5.7532 (5.6991) grad_norm 1.5337 (2.5755) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:51:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][110/625] eta 0:02:19 lr 0.000400 wd 0.0500 time 0.2576 (0.2715) data time 0.0010 (0.0056) model time 0.2566 (0.2711) loss 6.3600 (5.7081) grad_norm 1.7533 (2.5547) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:51:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][120/625] eta 0:02:16 lr 0.000400 wd 0.0500 time 0.2589 (0.2705) data time 0.0008 
(0.0052) model time 0.2580 (0.2693) loss 5.5118 (5.7175) grad_norm 1.5985 (2.5018) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:51:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][130/625] eta 0:02:13 lr 0.000400 wd 0.0500 time 0.2552 (0.2695) data time 0.0007 (0.0049) model time 0.2546 (0.2678) loss 4.9152 (5.7163) grad_norm 2.6331 (2.4897) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:51:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][140/625] eta 0:02:10 lr 0.000400 wd 0.0500 time 0.2575 (0.2686) data time 0.0009 (0.0046) model time 0.2566 (0.2664) loss 5.9519 (5.7363) grad_norm 1.8767 (2.5417) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:51:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][150/625] eta 0:02:07 lr 0.000399 wd 0.0500 time 0.2541 (0.2686) data time 0.0007 (0.0044) model time 0.2534 (0.2665) loss 4.7711 (5.7378) grad_norm 1.3695 (2.5430) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:51:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][160/625] eta 0:02:04 lr 0.000399 wd 0.0500 time 0.2631 (0.2679) data time 0.0007 (0.0042) model time 0.2623 (0.2656) loss 6.1758 (5.7477) grad_norm 1.9190 (2.5156) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:51:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][170/625] eta 0:02:01 lr 0.000399 wd 0.0500 time 0.2449 (0.2673) data time 0.0011 (0.0040) model time 0.2438 (0.2648) loss 6.3070 (5.7468) grad_norm 2.2314 (2.4908) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:51:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][180/625] eta 0:01:59 lr 0.000399 wd 0.0500 time 0.4537 (0.2678) data time 0.0009 (0.0038) model time 0.4527 (0.2656) loss 6.0419 (5.7534) grad_norm 2.1226 (2.4669) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:51:27 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][190/625] eta 0:01:56 lr 0.000399 wd 0.0500 time 0.2550 (0.2672) data time 0.0011 (0.0037) model time 0.2539 (0.2649) loss 6.0651 (5.7543) grad_norm 1.5144 (2.4335) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:51:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][200/625] eta 0:01:53 lr 0.000399 wd 0.0500 time 0.2539 (0.2666) data time 0.0009 (0.0035) model time 0.2529 (0.2642) loss 4.9401 (5.7459) grad_norm 1.9718 (2.4331) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:51:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][210/625] eta 0:01:50 lr 0.000399 wd 0.0500 time 0.2544 (0.2661) data time 0.0009 (0.0034) model time 0.2534 (0.2637) loss 6.4336 (5.7602) grad_norm 3.1741 (2.4115) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:51:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][220/625] eta 0:01:47 lr 0.000398 wd 0.0500 time 0.2568 (0.2657) data time 0.0012 (0.0033) model time 0.2556 (0.2632) loss 6.2226 (5.7637) grad_norm 2.5229 (2.4302) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:51:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][230/625] eta 0:01:44 lr 0.000398 wd 0.0500 time 0.2561 (0.2653) data time 0.0008 (0.0032) model time 0.2552 (0.2628) loss 5.6933 (5.7574) grad_norm 3.0651 (2.4547) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:51:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][240/625] eta 0:01:42 lr 0.000398 wd 0.0500 time 0.4560 (0.2658) data time 0.0008 (0.0031) model time 0.4552 (0.2634) loss 5.0359 (5.7515) grad_norm 2.2313 (2.4571) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:51:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][250/625] eta 0:01:39 lr 0.000398 wd 0.0500 time 0.2567 (0.2655) data time 0.0006 (0.0030) 
model time 0.2561 (0.2632) loss 6.6238 (5.7545) grad_norm 1.9630 (2.4454) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:51:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][260/625] eta 0:01:36 lr 0.000398 wd 0.0500 time 0.2561 (0.2651) data time 0.0008 (0.0029) model time 0.2553 (0.2628) loss 5.4084 (5.7361) grad_norm 2.1752 (2.4409) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:51:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][270/625] eta 0:01:34 lr 0.000398 wd 0.0500 time 0.2576 (0.2656) data time 0.0008 (0.0029) model time 0.2568 (0.2634) loss 6.5320 (5.7325) grad_norm 2.2592 (2.4390) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:51:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][280/625] eta 0:01:31 lr 0.000398 wd 0.0500 time 0.2566 (0.2652) data time 0.0008 (0.0028) model time 0.2559 (0.2630) loss 6.4362 (5.7296) grad_norm 1.9061 (2.4368) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:51:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][290/625] eta 0:01:28 lr 0.000397 wd 0.0500 time 0.2542 (0.2655) data time 0.0010 (0.0027) model time 0.2532 (0.2634) loss 6.1848 (5.7383) grad_norm 2.5167 (2.4434) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:51:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][300/625] eta 0:01:26 lr 0.000397 wd 0.0500 time 0.2556 (0.2652) data time 0.0007 (0.0027) model time 0.2549 (0.2631) loss 5.5916 (5.7453) grad_norm 3.4488 (2.4662) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:51:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][310/625] eta 0:01:23 lr 0.000397 wd 0.0500 time 0.2548 (0.2649) data time 0.0007 (0.0026) model time 0.2541 (0.2628) loss 5.1778 (5.7351) grad_norm 1.3393 (2.4621) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:52:01 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [219/300][320/625] eta 0:01:20 lr 0.000397 wd 0.0500 time 0.2549 (0.2646) data time 0.0010 (0.0026) model time 0.2539 (0.2625) loss 5.7313 (5.7219) grad_norm 3.7765 (2.4669) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:52:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][330/625] eta 0:01:17 lr 0.000397 wd 0.0500 time 0.2576 (0.2644) data time 0.0008 (0.0025) model time 0.2568 (0.2623) loss 6.4362 (5.7231) grad_norm 2.8800 (2.4644) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:52:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][340/625] eta 0:01:15 lr 0.000397 wd 0.0500 time 0.2562 (0.2642) data time 0.0008 (0.0025) model time 0.2554 (0.2621) loss 5.0968 (5.7238) grad_norm 2.3540 (2.4572) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:52:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][350/625] eta 0:01:12 lr 0.000397 wd 0.0500 time 0.2562 (0.2640) data time 0.0009 (0.0024) model time 0.2554 (0.2618) loss 4.9258 (5.7256) grad_norm 1.7252 (2.4555) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:52:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][360/625] eta 0:01:09 lr 0.000396 wd 0.0500 time 0.2571 (0.2641) data time 0.0007 (0.0024) model time 0.2564 (0.2621) loss 6.2579 (5.7191) grad_norm 2.3865 (2.4497) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:52:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][370/625] eta 0:01:07 lr 0.000396 wd 0.0500 time 0.2533 (0.2639) data time 0.0009 (0.0023) model time 0.2524 (0.2618) loss 5.7174 (5.7155) grad_norm 1.8260 (2.4397) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:52:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][380/625] eta 0:01:04 lr 0.000396 wd 0.0500 time 0.2575 (0.2637) data time 0.0008 (0.0023) model time 0.2567 (0.2617) 
loss 6.4037 (5.7215) grad_norm 1.8914 (2.4303) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:52:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][390/625] eta 0:01:01 lr 0.000396 wd 0.0500 time 0.2557 (0.2635) data time 0.0006 (0.0023) model time 0.2551 (0.2615) loss 4.9193 (5.7266) grad_norm 1.9550 (2.4221) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:52:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][400/625] eta 0:00:59 lr 0.000396 wd 0.0500 time 0.2631 (0.2640) data time 0.0007 (0.0022) model time 0.2624 (0.2620) loss 5.4097 (5.7298) grad_norm 2.3169 (2.4260) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:52:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][410/625] eta 0:00:56 lr 0.000396 wd 0.0500 time 0.2540 (0.2637) data time 0.0009 (0.0022) model time 0.2531 (0.2618) loss 6.2776 (5.7357) grad_norm 3.9052 (2.4318) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:52:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][420/625] eta 0:00:54 lr 0.000396 wd 0.0500 time 0.2596 (0.2642) data time 0.0011 (0.0022) model time 0.2585 (0.2623) loss 6.2492 (5.7388) grad_norm 3.1349 (2.4484) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:52:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][430/625] eta 0:00:51 lr 0.000395 wd 0.0500 time 0.4313 (0.2644) data time 0.0008 (0.0021) model time 0.4305 (0.2626) loss 5.9513 (5.7370) grad_norm 2.9616 (2.4671) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:52:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][440/625] eta 0:00:49 lr 0.000395 wd 0.0500 time 0.2675 (0.2651) data time 0.0007 (0.0021) model time 0.2668 (0.2635) loss 5.0406 (5.7352) grad_norm 1.8951 (2.4592) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:52:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [219/300][450/625] eta 0:00:46 lr 0.000395 wd 0.0500 time 0.2565 (0.2649) data time 0.0008 (0.0021) model time 0.2556 (0.2633) loss 4.9730 (5.7346) grad_norm 3.1820 (2.4503) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:52:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][460/625] eta 0:00:43 lr 0.000395 wd 0.0500 time 0.2568 (0.2648) data time 0.0009 (0.0021) model time 0.2559 (0.2631) loss 6.0346 (5.7344) grad_norm 2.6862 (2.4554) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:52:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][470/625] eta 0:00:41 lr 0.000395 wd 0.0500 time 0.2550 (0.2646) data time 0.0010 (0.0020) model time 0.2540 (0.2630) loss 5.9547 (5.7391) grad_norm 3.2332 (2.4565) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:52:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][480/625] eta 0:00:38 lr 0.000395 wd 0.0500 time 0.2564 (0.2645) data time 0.0008 (0.0020) model time 0.2556 (0.2628) loss 6.4003 (5.7477) grad_norm 2.2753 (2.4596) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:52:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][490/625] eta 0:00:35 lr 0.000395 wd 0.0500 time 0.2581 (0.2643) data time 0.0008 (0.0020) model time 0.2573 (0.2626) loss 5.9341 (5.7505) grad_norm 1.8196 (2.4465) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:52:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][500/625] eta 0:00:33 lr 0.000394 wd 0.0500 time 0.2578 (0.2641) data time 0.0009 (0.0020) model time 0.2569 (0.2624) loss 5.5950 (5.7545) grad_norm 1.6353 (2.4436) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:52:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][510/625] eta 0:00:30 lr 0.000394 wd 0.0500 time 0.2556 (0.2640) data time 0.0009 (0.0019) model time 0.2548 (0.2623) loss 7.0558 (5.7598) grad_norm 
1.8319 (2.4419) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:52:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][520/625] eta 0:00:27 lr 0.000394 wd 0.0500 time 0.2532 (0.2638) data time 0.0006 (0.0019) model time 0.2526 (0.2621) loss 5.2892 (5.7601) grad_norm 1.8314 (2.4446) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:52:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][530/625] eta 0:00:25 lr 0.000394 wd 0.0500 time 0.2535 (0.2636) data time 0.0006 (0.0019) model time 0.2528 (0.2619) loss 5.8636 (5.7626) grad_norm 2.0117 (2.4390) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:52:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][540/625] eta 0:00:22 lr 0.000394 wd 0.0500 time 0.2582 (0.2639) data time 0.0006 (0.0019) model time 0.2576 (0.2622) loss 6.4289 (5.7686) grad_norm 1.9386 (2.4430) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:53:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][550/625] eta 0:00:19 lr 0.000394 wd 0.0500 time 0.2548 (0.2637) data time 0.0007 (0.0019) model time 0.2541 (0.2621) loss 5.4482 (5.7670) grad_norm 1.6499 (2.4421) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:53:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][560/625] eta 0:00:17 lr 0.000394 wd 0.0500 time 0.2692 (0.2637) data time 0.0006 (0.0019) model time 0.2686 (0.2620) loss 7.0543 (5.7641) grad_norm 2.2301 (2.4369) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:53:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][570/625] eta 0:00:14 lr 0.000394 wd 0.0500 time 0.2564 (0.2635) data time 0.0007 (0.0018) model time 0.2557 (0.2619) loss 5.7731 (5.7643) grad_norm 1.8199 (2.4306) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:53:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][580/625] eta 0:00:11 lr 0.000393 wd 0.0500 time 0.2538 (0.2637) data time 0.0009 (0.0018) model time 0.2529 (0.2621) loss 5.9999 (5.7609) grad_norm 1.5534 (2.4224) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:53:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][590/625] eta 0:00:09 lr 0.000393 wd 0.0500 time 0.2571 (0.2636) data time 0.0007 (0.0018) model time 0.2564 (0.2620) loss 4.9665 (5.7625) grad_norm 1.8243 (2.4163) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:53:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][600/625] eta 0:00:06 lr 0.000393 wd 0.0500 time 0.2570 (0.2635) data time 0.0008 (0.0018) model time 0.2562 (0.2619) loss 5.2928 (5.7634) grad_norm 2.1068 (2.4104) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:53:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][610/625] eta 0:00:03 lr 0.000393 wd 0.0500 time 0.2524 (0.2634) data time 0.0005 (0.0018) model time 0.2519 (0.2618) loss 6.7428 (5.7570) grad_norm 2.3022 (2.4071) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:53:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][620/625] eta 0:00:01 lr 0.000393 wd 0.0500 time 0.2542 (0.2632) data time 0.0006 (0.0018) model time 0.2536 (0.2616) loss 6.2090 (5.7526) grad_norm 2.6371 (2.4024) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:53:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 219 training takes 0:02:44
[2024-08-04 07:53:20 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 07:53:21 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 07:53:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.551 (0.551) Loss 0.6279 (0.6279) Acc@1 89.404 (89.404) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 07:53:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.101) Loss 0.9722 (0.7553) Acc@1 80.664 (85.995) Acc@5 95.850 (97.661) Mem 9655MB
[2024-08-04 07:53:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.0527 (0.8801) Acc@1 77.734 (82.812) Acc@5 95.117 (96.356) Mem 9655MB
[2024-08-04 07:53:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.548 Acc@5 96.337
[2024-08-04 07:53:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.5%
[2024-08-04 07:53:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.014 (1.014) Loss 0.5820 (0.5820) Acc@1 89.746 (89.746) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 07:53:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.147) Loss 0.9180 (0.7131) Acc@1 80.615 (86.430) Acc@5 95.752 (97.670) Mem 9655MB
[2024-08-04 07:53:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.103) Loss 1.0322 (0.8374) Acc@1 76.807 (83.154) Acc@5 95.361 (96.405) Mem 9655MB
[2024-08-04 07:53:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.835 Acc@5 96.417
[2024-08-04 07:53:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.8%
[2024-08-04 07:53:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][0/625] eta 0:13:01 lr 0.000393 wd 0.0500 time 1.2500 (1.2500) data time 0.6612 (0.6612) model time 0.0000 (0.0000) loss 6.0664 (6.0664) grad_norm 1.6934 (1.6934) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:53:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][10/625] eta 0:03:32 lr 0.000393 wd 0.0500 time 0.2524 (0.3463) data time 0.0014 (0.0610) model time 0.0000 (0.0000) loss 5.7248 (5.6985) grad_norm 2.6681 (2.0448) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:53:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][20/625] eta 0:03:03 lr 0.000392 wd 0.0500 time 0.2538 (0.3032) data time 0.0011 (0.0324) model time 0.0000 (0.0000) loss 6.0753 (5.4661) grad_norm 1.9422 (1.9168) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:53:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][30/625] eta 0:02:54 lr 0.000392 wd 0.0500 time 0.2556 (0.2939) data time 0.0008 (0.0222) model time 0.0000 (0.0000) loss 4.8076 (5.4805) grad_norm 2.0643 (2.0126) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:53:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][40/625] eta 0:02:51 lr 0.000392 wd 0.0500 time 0.2560 (0.2924) data time 0.0011 (0.0170) model time 0.0000 (0.0000) loss 5.7880 (5.4766) grad_norm 3.1005 (2.1561) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:53:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][50/625] eta 0:02:45 lr 0.000392 wd 0.0500 time 0.2560 (0.2886) data time 0.0007 (0.0139) model time 0.0000 (0.0000) loss 4.3567 (5.4723) grad_norm 3.7014 (2.2157) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:53:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][60/625] eta 0:02:40 lr 0.000392 wd 0.0500 time 0.2752 (0.2837) data time 0.0007 (0.0118) model time 0.2745 (0.2578) loss 4.7286 (5.5035) grad_norm 1.9528 (2.3731) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:53:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][70/625] eta 0:02:37 lr 0.000392 wd 0.0500 time 0.2550 (0.2847) data time 0.0008 (0.0102) model time 0.2542 (0.2737) loss 5.9832 (5.5327) grad_norm 2.3619 (2.6939) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:53:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][80/625] eta 0:02:34 lr 0.000392 wd 0.0500 time 0.2522 (0.2834) data time 0.0010 (0.0091) model time 0.2512 (0.2736) loss 5.0890 (5.5427) grad_norm 1.5774 (2.6392) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:53:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][90/625] eta 0:02:30 lr 0.000391 wd 0.0500 time 0.2572 (0.2805) data time 0.0007 (0.0082) model time 0.2566 (0.2691) loss 4.6837 (5.5593) grad_norm 3.5067 (2.6007) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:53:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][100/625] eta 0:02:26 lr 0.000391 wd 0.0500 time 0.2538 (0.2798) data time 0.0008 (0.0075) model time 0.2530 (0.2698) loss 4.8063 (5.5730) grad_norm 1.7990 (2.5569) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:53:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][110/625] eta 0:02:22 lr 0.000391 wd 0.0500 time 0.2554 (0.2776) data time 0.0009 (0.0069) model time 0.2546 (0.2673) loss 6.0717 (5.5880) grad_norm 1.5539 (2.5090) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:53:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][120/625] eta 0:02:19 lr 0.000391 wd 0.0500 time 0.2527 (0.2759) data time 0.0012 (0.0064) model time 0.2515 (0.2658) loss 5.2168 (5.6030) grad_norm 2.8577 (2.5081) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:54:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][130/625] eta 0:02:15 lr 0.000391 wd 0.0500 time 0.2562 (0.2744) data time 0.0008 (0.0060) model time 0.2554 (0.2644) loss 6.2387 (5.6114) grad_norm 2.6524 (2.4966) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:54:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][140/625] eta 0:02:12 lr 0.000391 wd 0.0500 time 0.2575 (0.2742) data time 0.0007 (0.0056) model time 0.2568 (0.2651) loss 5.7928 (5.6052) grad_norm 3.8969 (2.4792) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:54:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][150/625] eta 0:02:10 lr 0.000391 wd 0.0500 time 0.2509 (0.2739) data time 0.0006 (0.0053) model time 0.2502 (0.2654) loss 5.7536 (5.6173) grad_norm 1.5858 (2.4621) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:54:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][160/625] eta 0:02:06 lr 0.000391 wd 0.0500 time 0.2545 (0.2728) data time 0.0008 (0.0050) model time 0.2537 (0.2646) loss 5.8664 (5.6461) grad_norm 2.6458 (2.4323) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:54:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][170/625] eta 0:02:03 lr 0.000390 wd 0.0500 time 0.2590 (0.2720) data time 0.0009 (0.0048) model time 0.2581 (0.2640) loss 4.9437 (5.6506) grad_norm 2.1396 (2.4042) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:54:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][180/625] eta 0:02:00 lr 0.000390 wd 0.0500 time 0.2542 (0.2712) data time 0.0010 (0.0046) model time 0.2531 (0.2634) loss 6.1009 (5.6708) grad_norm 2.3427 (2.3815) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:54:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][190/625] eta 0:01:57 lr 0.000390 wd 0.0500 time 0.2561 (0.2704) data time 0.0009 (0.0044) model time 0.2551 (0.2628) loss 6.6430 (5.6909) grad_norm 1.6816 (2.3726) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:54:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][200/625] eta 0:01:54 lr 0.000390 wd 0.0500 time 0.2544 (0.2697) data time 0.0010 (0.0043) model time 0.2533 (0.2623) loss 6.8040 (5.7015) grad_norm 1.5845 (2.3626) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:54:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][210/625] eta 0:01:51 lr 0.000390 wd 0.0500 time 0.2571 (0.2691) data time 0.0008 (0.0041) model time 0.2563 (0.2618) loss 5.3417 (5.6999) grad_norm 3.4701 (2.3793) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:54:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][220/625] eta 0:01:48 lr 0.000390 wd 0.0500 time 0.2564 (0.2685) data time 0.0008 (0.0040) model time 0.2555 (0.2615) loss 6.3314 (5.7193) grad_norm 1.7299 (2.3592) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:54:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][230/625] eta 0:01:46 lr 0.000390 wd 0.0500 time 0.2582 (0.2689) data time 0.0009 (0.0038) model time 0.2573 (0.2623) loss 6.0050 (5.7173) grad_norm 1.6965 (2.3340) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:54:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][240/625] eta 0:01:43 lr 0.000389 wd 0.0500 time 0.2541 (0.2699) data time 0.0010 (0.0037) model time 0.2531 (0.2639) loss 6.4673 (5.7152) grad_norm 2.1947 (2.3274) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:54:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][250/625] eta 0:01:41 lr 0.000389 wd 0.0500 time 0.2577 (0.2694) data time 0.0007 (0.0036) model time 0.2570 (0.2635) loss 4.8404 (5.7061) grad_norm 3.1281 (2.3715) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 07:54:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][260/625] eta 0:01:38 lr 0.000389 wd 0.0500 time 0.2569 (0.2689) data time 0.0008 (0.0035) model time 0.2562 (0.2631) loss 5.0236 (5.7083) grad_norm 2.1526 (2.3590) loss_scale 1024.0000 (517.8851) mem 9655MB
[2024-08-04 07:54:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][270/625] eta 0:01:35 lr 0.000389 wd 0.0500 time 0.2563 (0.2684) data time 0.0010 (0.0034) model time 0.2554 (0.2628) loss 4.9899 (5.7233) grad_norm 1.9883 (2.3504) loss_scale 1024.0000 (536.5609) mem 9655MB
[2024-08-04 07:54:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][280/625] eta 0:01:32 lr 0.000389 wd 0.0500 time 0.2562 (0.2680) data time 0.0007 (0.0033) model time 0.2555 (0.2625) loss 4.5630 (5.7196) grad_norm 2.3724 (2.3578) loss_scale 1024.0000 (553.9075) mem 9655MB
[2024-08-04 07:54:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][290/625] eta 0:01:29 lr 0.000389 wd 0.0500 time 0.2597 (0.2676) data time 0.0008 (0.0032) model time 0.2589 (0.2622) loss 6.1644 (5.7316) grad_norm 2.1914 (2.3706) loss_scale 1024.0000 (570.0619) mem 9655MB
[2024-08-04 07:54:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][300/625] eta 0:01:26 lr 0.000389 wd 0.0500 time 0.2615 (0.2673) data time 0.0006 (0.0032) model time 0.2610 (0.2619) loss 6.2505 (5.7223) grad_norm 1.4198 (2.3863) loss_scale 1024.0000 (585.1429) mem 9655MB
[2024-08-04 07:54:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][310/625] eta 0:01:24 lr 0.000388 wd 0.0500 time 0.2579 (0.2669) data time 0.0007 (0.0031) model time 0.2572 (0.2616) loss 5.9732 (5.7251) grad_norm 1.9425 (2.3915) loss_scale 1024.0000 (599.2540) mem 9655MB
[2024-08-04 07:54:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][320/625] eta 0:01:21 lr 0.000388 wd 0.0500 time 0.2568 (0.2665) data time 0.0009 (0.0030) model time 0.2559 (0.2614) loss 5.1187 (5.7255) grad_norm 2.1507 (2.3796) loss_scale 1024.0000 (612.4860) mem 9655MB
[2024-08-04 07:54:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][330/625] eta 0:01:18 lr 0.000388 wd 0.0500 time 0.2544 (0.2666) data time 0.0007 (0.0030) model time 0.2538 (0.2617) loss 6.4194 (5.7345) grad_norm 1.6768 (2.3794) loss_scale 1024.0000 (624.9184) mem 9655MB
[2024-08-04 07:54:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][340/625] eta 0:01:16 lr 0.000388 wd 0.0500 time 0.2547 (0.2667) data time 0.0010 (0.0029) model time 0.2537 (0.2619) loss 6.1425 (5.7392) grad_norm 2.4365 (2.3960) loss_scale 1024.0000 (636.6217) mem 9655MB
[2024-08-04 07:54:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][350/625] eta 0:01:13 lr 0.000388 wd 0.0500 time 0.2683 (0.2664) data time 0.0010 (0.0029) model time 0.2673 (0.2617) loss 5.4842 (5.7366) grad_norm 2.2113 (2.3790) loss_scale 1024.0000 (647.6581) mem 9655MB
[2024-08-04 07:55:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][360/625] eta 0:01:10 lr 0.000388 wd 0.0500 time 0.2565 (0.2662) data time 0.0019 (0.0028) model time 0.2546 (0.2615) loss 5.9650 (5.7438) grad_norm 1.6787 (2.3825) loss_scale 1024.0000 (658.0831) mem 9655MB
[2024-08-04 07:55:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][370/625] eta 0:01:07 lr 0.000388 wd 0.0500 time 0.2543 (0.2664) data time 0.0010 (0.0028) model time 0.2533 (0.2619) loss 5.4755 (5.7390) grad_norm 3.0571 (2.3853) loss_scale 1024.0000 (667.9461) mem 9655MB
[2024-08-04 07:55:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][380/625] eta 0:01:05 lr 0.000387 wd 0.0500 time 0.2524 (0.2667) data time 0.0010 (0.0027) model time 0.2515 (0.2623) loss 6.5008 (5.7489) grad_norm 1.4590 (2.3704) loss_scale 1024.0000 (677.2913) mem 9655MB
[2024-08-04 07:55:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][390/625] eta 0:01:02 lr 0.000387 wd 0.0500 time 0.2542 (0.2664) data time 0.0006 (0.0027) model time 0.2536 (0.2621) loss 6.2509 (5.7525) grad_norm 3.9917 (2.3719) loss_scale 1024.0000 (686.1586) mem 9655MB
[2024-08-04 07:55:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][400/625] eta 0:00:59 lr 0.000387 wd 0.0500 time 0.2582 (0.2662) data time 0.0006 (0.0026) model time 0.2576 (0.2620) loss 5.8462 (5.7481) grad_norm 2.0470 (2.3823) loss_scale 1024.0000 (694.5835) mem 9655MB
[2024-08-04 07:55:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][410/625] eta 0:00:57 lr 0.000387 wd 0.0500 time 0.2585 (0.2660) data time 0.0008 (0.0026) model time 0.2576 (0.2618) loss 5.6369 (5.7493) grad_norm 2.7143 (2.3775) loss_scale 1024.0000 (702.5985) mem 9655MB
[2024-08-04 07:55:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][420/625] eta 0:00:54 lr 0.000387 wd 0.0500 time 0.2544 (0.2662) data time 0.0007 (0.0025) model time 0.2536 (0.2622) loss 5.6350 (5.7382) grad_norm 2.1289 (2.3757) loss_scale 1024.0000 (710.2328) mem 9655MB
[2024-08-04 07:55:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][430/625] eta 0:00:51 lr 0.000387 wd 0.0500 time 0.2545 (0.2660) data time 0.0007 (0.0025) model time 0.2538 (0.2620) loss 5.5892 (5.7449) grad_norm 2.9885 (2.3833) loss_scale 1024.0000 (717.5128) mem 9655MB
[2024-08-04 07:55:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][440/625] eta 0:00:49 lr 0.000387 wd 0.0500 time 0.2532 (0.2658) data time 0.0007 (0.0025) model time 0.2526 (0.2618) loss 5.9819 (5.7457) grad_norm 2.0384 (2.3845) loss_scale 1024.0000 (724.4626) mem 9655MB
[2024-08-04 07:55:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][450/625] eta 0:00:46 lr 0.000387 wd 0.0500 time 0.2546 (0.2659) data time 0.0010 (0.0024) model time 0.2536 (0.2620) loss 6.4582 (5.7462) grad_norm 1.7380 (2.3728) loss_scale 1024.0000 (731.1042) mem 9655MB
[2024-08-04 07:55:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][460/625] eta 0:00:43 lr 0.000386 wd 0.0500 time 0.2541 (0.2657) data time 0.0009 (0.0024) model time 0.2531 (0.2619) loss 4.6308 (5.7496) grad_norm 2.4820 (2.3749) loss_scale 1024.0000 (737.4577) mem 9655MB
[2024-08-04 07:55:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][470/625] eta 0:00:41 lr 0.000386 wd 0.0500 time 0.2556 (0.2655) data time 0.0007 (0.0024) model time 0.2549 (0.2617) loss 6.6022 (5.7495) grad_norm 2.9198 (2.3661) loss_scale 1024.0000 (743.5414) mem 9655MB
[2024-08-04 07:55:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][480/625] eta 0:00:38 lr 0.000386 wd 0.0500 time 0.2598 (0.2657) data time 0.0006 (0.0023) model time 0.2592 (0.2620) loss 5.5629 (5.7540) grad_norm 3.4078 (2.3778) loss_scale 1024.0000 (749.3721) mem 9655MB
[2024-08-04 07:55:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][490/625] eta 0:00:35 lr 0.000386 wd 0.0500 time 0.2554 (0.2655) data time 0.0008 (0.0023) model time 0.2547 (0.2618) loss 6.7993 (5.7659) grad_norm 4.0785 (2.3994) loss_scale 1024.0000 (754.9654) mem 9655MB
[2024-08-04 07:55:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][500/625] eta 0:00:33 lr 0.000386 wd 0.0500 time 0.2540 (0.2653) data time 0.0010 (0.0023) model time 0.2530 (0.2617) loss 5.7610 (5.7609) grad_norm 2.5522 (2.4072) loss_scale 1024.0000 (760.3353) mem 9655MB
[2024-08-04 07:55:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][510/625] eta 0:00:30 lr 0.000386 wd 0.0500 time 0.2568 (0.2652) data time 0.0012 (0.0023) model time 0.2556 (0.2616) loss 5.7037 (5.7619) grad_norm 2.6820 (2.3979) loss_scale 1024.0000 (765.4951) mem 9655MB
[2024-08-04 07:55:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][520/625] eta 0:00:27 lr 0.000386 wd 0.0500 time 0.2556 (0.2650) data time 0.0007 (0.0022) model time 0.2549 (0.2615) loss 6.5081 (5.7669) grad_norm 2.0203 (2.4015) loss_scale 1024.0000 (770.4568) mem 9655MB
[2024-08-04 07:55:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][530/625] eta 0:00:25 lr 0.000385 wd 0.0500 time 0.2545 (0.2648) data time 0.0009 (0.0022) model time 0.2537 (0.2613) loss 5.9783 (5.7601) grad_norm 1.9859 (2.4106) loss_scale 1024.0000 (775.2316) mem 9655MB
[2024-08-04 07:55:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][540/625] eta 0:00:22 lr 0.000385 wd 0.0500 time 0.2581 (0.2647) data time 0.0007 (0.0022) model time 0.2574 (0.2612) loss 4.6832 (5.7587) grad_norm 1.5880 (2.4157) loss_scale 1024.0000 (779.8299) mem 9655MB
[2024-08-04 07:55:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][550/625] eta 0:00:19 lr 0.000385 wd 0.0500 time 0.2585 (0.2648) data time 0.0007 (0.0022) model time 0.2578 (0.2615) loss 5.4677 (5.7574) grad_norm 1.4876 (2.4122) loss_scale 1024.0000 (784.2613) mem 9655MB
[2024-08-04 07:55:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][560/625] eta 0:00:17 lr 0.000385 wd 0.0500 time 0.2575 (0.2649) data time 0.0007 (0.0021) model time 0.2568 (0.2616) loss 5.9077 (5.7599) grad_norm 2.9486 (2.4175) loss_scale 1024.0000 (788.5348) mem 9655MB
[2024-08-04 07:55:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][570/625] eta 0:00:14 lr 0.000385 wd 0.0500 time 0.4430 (0.2651) data time 0.0010 (0.0021) model time 0.4420 (0.2618) loss 4.9272 (5.7584) grad_norm 2.3193 (2.4195) loss_scale 1024.0000 (792.6585) mem 9655MB
[2024-08-04 07:55:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][580/625] eta 0:00:11 lr 0.000385 wd 0.0500 time 0.2528 (0.2649) data time 0.0012 (0.0021) model time 0.2516 (0.2617) loss 5.6192 (5.7604) grad_norm 3.8417 (2.4360) loss_scale 1024.0000 (796.6403) mem 9655MB
[2024-08-04 07:56:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][590/625] eta 0:00:09 lr 0.000385 wd 0.0500 time 0.2575 (0.2648) data time 0.0008 (0.0021) model time 0.2567 (0.2616) loss 6.7029 (5.7666) grad_norm 2.3601 (2.4339) loss_scale 1024.0000 (800.4873) mem 9655MB
[2024-08-04 07:56:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][600/625] eta 0:00:06 lr 0.000384 wd 0.0500 time 0.2600 (0.2650) data time 0.0008 (0.0021) model time 0.2592 (0.2618) loss 5.5893 (5.7662) grad_norm 1.8657 (2.4315) loss_scale 1024.0000 (804.2063) mem 9655MB
[2024-08-04 07:56:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][610/625] eta 0:00:03 lr 0.000384 wd 0.0500 time 0.2531 (0.2649) data time 0.0006 (0.0021) model time 0.2525 (0.2618) loss 6.6378 (5.7699) grad_norm 3.7908 (2.4425) loss_scale 1024.0000 (807.8036) mem 9655MB
[2024-08-04 07:56:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][620/625] eta 0:00:01 lr 0.000384 wd 0.0500 time 0.2518 (0.2647) data time 0.0006 (0.0020) model time 0.2513 (0.2616) loss 6.2773 (5.7720) grad_norm 2.4778 (2.4505) loss_scale 1024.0000 (811.2850) mem 9655MB
[2024-08-04 07:56:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 220 training takes 0:02:45
[2024-08-04 07:56:10 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 07:56:11 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 07:56:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.558 (0.558) Loss 0.6094 (0.6094) Acc@1 89.600 (89.600) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 07:56:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.103) Loss 0.9541 (0.7426) Acc@1 79.297 (85.933) Acc@5 96.094 (97.581) Mem 9655MB
[2024-08-04 07:56:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 1.0439 (0.8657) Acc@1 77.783 (82.822) Acc@5 94.971 (96.273) Mem 9655MB
[2024-08-04 07:56:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.540 Acc@5 96.275
[2024-08-04 07:56:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.5%
[2024-08-04 07:56:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.993 (0.993) Loss 0.5825 (0.5825) Acc@1 89.697 (89.697) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 07:56:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.144) Loss 0.9170 (0.7129) Acc@1 80.566 (86.404) Acc@5 95.752 (97.643) Mem 9655MB
[2024-08-04 07:56:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.102) Loss 1.0312 (0.8371) Acc@1 76.807 (83.152) Acc@5 95.410 (96.412) Mem 9655MB
[2024-08-04 07:56:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.829 Acc@5 96.425
[2024-08-04 07:56:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.8%
[2024-08-04 07:56:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][0/625] eta 0:13:08 lr 0.000384 wd 0.0500 time 1.2614 (1.2614) data time 0.8571 (0.8571) model time 0.0000 (0.0000) loss 5.9764 (5.9764) grad_norm 2.9732 (2.9732) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:56:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][10/625] eta 0:03:33 lr 0.000384 wd 0.0500 time 0.2591 (0.3472) data time 0.0009 (0.0787) model time 0.0000 (0.0000) loss 6.9432 (6.1561) grad_norm 1.4995 (2.2429) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:56:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][20/625] eta 0:03:03 lr 0.000384 wd 0.0500 time 0.2598 (0.3038) data time 0.0008 (0.0417) model time 0.0000 (0.0000) loss 5.4165 (5.8108) grad_norm 2.5703 (2.1375) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:56:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][30/625] eta 0:02:51 lr 0.000384 wd 0.0500 time 0.2525 (0.2884) data time 0.0008 (0.0285) model time 0.0000 (0.0000) loss 5.8872 (5.8968) grad_norm 1.8948 (2.3166) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:56:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][40/625] eta 0:02:44 lr 0.000384 wd 0.0500 time 0.2559 (0.2806) data time 0.0009 (0.0218) model time 0.0000 (0.0000) loss 5.6488 (5.8940) grad_norm 1.7249 (2.2962) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:56:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][50/625] eta 0:02:38 lr 0.000383 wd 0.0500 time 0.2596 (0.2758) data time 0.0008 (0.0177) model time 0.0000 (0.0000) loss 6.0174 (5.8356) grad_norm 3.1757 (2.3622) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:56:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][60/625] eta 0:02:34 lr 0.000383 wd 0.0500 time 0.2655 (0.2731) data time 0.0008 (0.0150) model time 0.2647 (0.2581) loss 5.6123 (5.7857) grad_norm 2.6217 (2.4074) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:56:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][70/625] eta 0:02:30 lr 0.000383 wd 0.0500 time 0.2648 (0.2709) data time 0.0008 (0.0130) model time 0.2640 (0.2573) loss 5.4548 (5.7730) grad_norm 2.0415 (2.3589) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:56:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][80/625] eta 0:02:26 lr 0.000383 wd 0.0500 time 0.2549 (0.2692) data time 0.0008 (0.0115) model time 0.2542 (0.2569) loss 6.0243 (5.8066) grad_norm 2.2057 (2.3034) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:56:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][90/625] eta 0:02:23 lr 0.000383 wd 0.0500 time 0.2567 (0.2678) data time 0.0009 (0.0104) model time 0.2558 (0.2565) loss 6.3728 (5.8093) grad_norm 1.7301 (2.2691) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:56:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][100/625] eta 0:02:19 lr 0.000383 wd 0.0500 time 0.2542 (0.2666) data time 0.0009 (0.0094) model time 0.2533 (0.2563) loss 5.7175 (5.8030) grad_norm 1.5143 (2.2568) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:56:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][110/625] eta 0:02:18 lr 0.000383 wd 0.0500 time 0.2601 (0.2693) data time 0.0008 (0.0087) model time 0.2593 (0.2628) loss 6.3870 (5.8060) grad_norm 1.7830 (2.2556) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:56:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][120/625] eta 0:02:16 lr 0.000382 wd 0.0500 time 0.2571 (0.2699) data time 0.0010 (0.0081) model time 0.2562 (0.2646) loss 5.4134 (5.8202) grad_norm 2.6022 (2.2682) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:56:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][130/625] eta 0:02:13 lr 0.000382 wd 0.0500 time 0.2517 (0.2703) data time 0.0008 (0.0075) model time 0.2509 (0.2658) loss 6.9414 (5.7980) grad_norm 2.6312 (2.2553) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:56:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][140/625] eta 0:02:10 lr 0.000382 wd 0.0500 time 0.2564 (0.2694) data time 0.0010 (0.0070) model time 0.2554 (0.2648) loss 6.3848 (5.8340) grad_norm 1.9588 (2.2533) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:56:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][150/625] eta 0:02:07 lr 0.000382 wd 0.0500 time 0.2526 (0.2695) data time 0.0010 (0.0066) model time 0.2516 (0.2653) loss 5.7367 (5.8319) grad_norm 5.1348 (2.2865) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:56:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][160/625] eta 0:02:04 lr 0.000382 wd 0.0500 time 0.2589 (0.2687) data time 0.0008 (0.0063) model time 0.2581 (0.2644) loss 4.6177 (5.8096) grad_norm 2.0904 (2.3135) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:57:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][170/625] eta 0:02:01 lr 0.000382 wd 0.0500 time 0.2653 (0.2680) data time 0.0006 (0.0060) model time 0.2647 (0.2637) loss 5.9191 (5.8064) grad_norm 3.6712 (2.3192) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:57:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][180/625] eta 0:01:59 lr 0.000382 wd 0.0500 time 0.2530 (0.2684) data time 0.0010 (0.0057) model time 0.2521 (0.2645) loss 6.2562 (5.8186) grad_norm 2.9929 (2.3390) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:57:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][190/625] eta 0:01:56 lr 0.000381 wd 0.0500 time 0.2559 (0.2679) data time 0.0006 (0.0054) model time 0.2553 (0.2641) loss 6.2332 (5.8246) grad_norm 1.6677 (2.3297) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:57:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][200/625] eta 0:01:53 lr 0.000381 wd 0.0500 time 0.2687 (0.2675) data time 0.0007 (0.0052) model time 0.2680 (0.2637) loss 4.8877 (5.8059) grad_norm 1.8523 (2.3403) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:57:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][210/625] eta 0:01:50 lr 0.000381 wd 0.0500 time 0.2552 (0.2670) data time 0.0008 (0.0050) model time 0.2544 (0.2632) loss 6.3396 (5.8042) grad_norm 3.5826 (2.4467) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:57:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][220/625] eta 0:01:48 lr 0.000381 wd 0.0500 time 0.4443 (0.2673) data time 0.0008 (0.0048) model time 0.4435 (0.2638) loss 6.2997 (5.8212) grad_norm 2.0813 (2.4569) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:57:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][230/625] eta 0:01:45 lr 0.000381 wd 0.0500 time 0.2554 (0.2668) data time 0.0008 (0.0047) model time 0.2545 (0.2633) loss 6.7370 (5.8168) grad_norm 2.0451 (2.4617) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:57:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][240/625] eta 0:01:42 lr 0.000381 wd 0.0500 time 0.2571 (0.2673) data time 0.0009 (0.0045) model time 0.2562 (0.2640) loss 6.6143 (5.8057) grad_norm 1.9431 (2.5012) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:57:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][250/625] eta 0:01:40 lr 0.000381 wd 0.0500 time 0.2535 (0.2669) data time 0.0010 (0.0044) model time 0.2525 (0.2636) loss 6.7613 (5.8171) grad_norm 2.5510 (2.4767) loss_scale 1024.0000 (1024.0000) mem 9655MB
[2024-08-04 07:57:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][260/625] eta 0:01:37 lr 0.000381 wd 0.0500 time 0.2563 (0.2665) data time 0.0007 (0.0042) model time 0.2556 (0.2633) loss 5.8304 (5.8156) grad_norm 3.5718 (inf) loss_scale 512.0000 (1006.3448) mem 9655MB
[2024-08-04 07:57:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][270/625] eta 0:01:34 lr 0.000380 wd 0.0500 time 0.2535 (0.2661) data time 0.0009 (0.0041) model time 0.2526 (0.2629) loss 5.7176 (5.8121) grad_norm 1.8171 (inf) loss_scale 512.0000 (988.1033) mem 9655MB
[2024-08-04 07:57:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][280/625] eta 0:01:31 lr 0.000380 wd 0.0500 time 0.2585 (0.2658) data time 0.0010 (0.0040) model time 0.2575 (0.2626) loss 5.9039 (5.8163) grad_norm 2.4616 (inf) loss_scale 512.0000 (971.1601) mem 9655MB
[2024-08-04 07:57:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][290/625] eta 0:01:28 lr 0.000380 wd 0.0500 time 0.2527 (0.2654) data time 0.0008 (0.0039) model time 0.2519 (0.2623) loss 5.0894 (5.8166) grad_norm 1.8406 (inf) loss_scale 512.0000 (955.3814) mem 9655MB
[2024-08-04 07:57:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][300/625] eta 0:01:26 lr 0.000380 wd 0.0500 time 0.2544 (0.2656) data time 0.0009 (0.0038) model time 0.2535 (0.2626) loss 6.2679 (5.8190) grad_norm 1.9509 (inf) loss_scale 512.0000 (940.6512) mem 9655MB
[2024-08-04 07:57:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][310/625] eta 0:01:23 lr 0.000380 wd 0.0500 time 0.2532 (0.2653) data time 0.0009 (0.0037) model time 0.2523 (0.2623) loss 6.2742 (5.8220) grad_norm 1.6106 (inf) loss_scale 512.0000 (926.8682) mem 9655MB
[2024-08-04 07:57:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][320/625] eta 0:01:20 lr 0.000380 wd 0.0500 time 0.2541 (0.2650) data time 0.0009 (0.0036) model time 0.2532 (0.2621) loss 5.4588 (5.8144) grad_norm 3.2080 (inf) loss_scale 512.0000 (913.9439) mem 9655MB
[2024-08-04 07:57:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][330/625] eta 0:01:18 lr 0.000380 wd 0.0500 time 0.2553 (0.2648) data time 0.0009 (0.0035) model time 0.2544 (0.2618) loss 6.1187 (5.8088) 
grad_norm 2.8695 (inf) loss_scale 512.0000 (901.8006) mem 9655MB [2024-08-04 07:57:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][340/625] eta 0:01:15 lr 0.000379 wd 0.0500 time 0.2630 (0.2646) data time 0.0008 (0.0035) model time 0.2622 (0.2617) loss 5.8455 (5.7998) grad_norm 1.8680 (inf) loss_scale 512.0000 (890.3695) mem 9655MB [2024-08-04 07:57:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][350/625] eta 0:01:12 lr 0.000379 wd 0.0500 time 0.2560 (0.2649) data time 0.0008 (0.0034) model time 0.2553 (0.2621) loss 5.9189 (5.7983) grad_norm 1.9491 (inf) loss_scale 512.0000 (879.5897) mem 9655MB [2024-08-04 07:57:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][360/625] eta 0:01:10 lr 0.000379 wd 0.0500 time 0.2542 (0.2652) data time 0.0011 (0.0033) model time 0.2532 (0.2625) loss 5.9093 (5.8005) grad_norm 1.5863 (inf) loss_scale 512.0000 (869.4072) mem 9655MB [2024-08-04 07:57:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][370/625] eta 0:01:07 lr 0.000379 wd 0.0500 time 0.2547 (0.2656) data time 0.0013 (0.0033) model time 0.2534 (0.2630) loss 4.8397 (5.8026) grad_norm 1.7875 (inf) loss_scale 512.0000 (859.7736) mem 9655MB [2024-08-04 07:57:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][380/625] eta 0:01:04 lr 0.000379 wd 0.0500 time 0.2590 (0.2653) data time 0.0008 (0.0032) model time 0.2582 (0.2627) loss 6.2097 (5.7942) grad_norm 4.9409 (inf) loss_scale 512.0000 (850.6457) mem 9655MB [2024-08-04 07:57:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][390/625] eta 0:01:02 lr 0.000379 wd 0.0500 time 0.2571 (0.2651) data time 0.0007 (0.0031) model time 0.2564 (0.2625) loss 4.7815 (5.7894) grad_norm 1.6563 (inf) loss_scale 512.0000 (841.9847) mem 9655MB [2024-08-04 07:58:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][400/625] eta 0:00:59 lr 
0.000379 wd 0.0500 time 0.2576 (0.2648) data time 0.0009 (0.0031) model time 0.2567 (0.2623) loss 5.7717 (5.7900) grad_norm 4.7176 (inf) loss_scale 512.0000 (833.7556) mem 9655MB [2024-08-04 07:58:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][410/625] eta 0:00:56 lr 0.000378 wd 0.0500 time 0.4655 (0.2651) data time 0.0007 (0.0030) model time 0.4648 (0.2626) loss 6.0555 (5.7909) grad_norm 3.0493 (inf) loss_scale 512.0000 (825.9270) mem 9655MB [2024-08-04 07:58:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][420/625] eta 0:00:54 lr 0.000378 wd 0.0500 time 0.2523 (0.2653) data time 0.0007 (0.0030) model time 0.2516 (0.2629) loss 5.2166 (5.7908) grad_norm 2.0600 (inf) loss_scale 512.0000 (818.4703) mem 9655MB [2024-08-04 07:58:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][430/625] eta 0:00:51 lr 0.000378 wd 0.0500 time 0.2543 (0.2654) data time 0.0007 (0.0029) model time 0.2536 (0.2631) loss 4.8178 (5.7891) grad_norm 3.3058 (inf) loss_scale 512.0000 (811.3596) mem 9655MB [2024-08-04 07:58:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][440/625] eta 0:00:49 lr 0.000378 wd 0.0500 time 0.2606 (0.2652) data time 0.0008 (0.0029) model time 0.2598 (0.2629) loss 6.9632 (5.7863) grad_norm 1.8664 (inf) loss_scale 512.0000 (804.5714) mem 9655MB [2024-08-04 07:58:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][450/625] eta 0:00:46 lr 0.000378 wd 0.0500 time 0.2492 (0.2650) data time 0.0007 (0.0028) model time 0.2485 (0.2627) loss 6.6032 (5.7950) grad_norm 2.1137 (inf) loss_scale 512.0000 (798.0843) mem 9655MB [2024-08-04 07:58:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][460/625] eta 0:00:43 lr 0.000378 wd 0.0500 time 0.2635 (0.2653) data time 0.0012 (0.0028) model time 0.2622 (0.2630) loss 5.1040 (5.7876) grad_norm 3.0347 (inf) loss_scale 512.0000 (791.8785) mem 9655MB 
[2024-08-04 07:58:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][470/625] eta 0:00:41 lr 0.000378 wd 0.0500 time 0.2559 (0.2654) data time 0.0009 (0.0028) model time 0.2550 (0.2632) loss 6.2407 (5.7860) grad_norm 2.5139 (inf) loss_scale 512.0000 (785.9363) mem 9655MB [2024-08-04 07:58:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][480/625] eta 0:00:38 lr 0.000378 wd 0.0500 time 0.2526 (0.2652) data time 0.0006 (0.0027) model time 0.2520 (0.2630) loss 5.6874 (5.7833) grad_norm 2.6844 (inf) loss_scale 512.0000 (780.2412) mem 9655MB [2024-08-04 07:58:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][490/625] eta 0:00:35 lr 0.000377 wd 0.0500 time 0.2574 (0.2650) data time 0.0009 (0.0027) model time 0.2565 (0.2629) loss 5.8340 (5.7750) grad_norm 1.4411 (inf) loss_scale 512.0000 (774.7780) mem 9655MB [2024-08-04 07:58:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][500/625] eta 0:00:33 lr 0.000377 wd 0.0500 time 0.2536 (0.2649) data time 0.0007 (0.0026) model time 0.2529 (0.2627) loss 7.2748 (5.7759) grad_norm 2.3623 (inf) loss_scale 512.0000 (769.5329) mem 9655MB [2024-08-04 07:58:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][510/625] eta 0:00:30 lr 0.000377 wd 0.0500 time 0.2522 (0.2647) data time 0.0009 (0.0026) model time 0.2512 (0.2625) loss 4.6609 (5.7706) grad_norm 1.7566 (inf) loss_scale 512.0000 (764.4932) mem 9655MB [2024-08-04 07:58:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][520/625] eta 0:00:27 lr 0.000377 wd 0.0500 time 0.2548 (0.2648) data time 0.0009 (0.0026) model time 0.2539 (0.2626) loss 5.5893 (5.7735) grad_norm 3.0262 (inf) loss_scale 512.0000 (759.6468) mem 9655MB [2024-08-04 07:58:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][530/625] eta 0:00:25 lr 0.000377 wd 0.0500 time 0.4206 (0.2653) data time 0.0008 
(0.0025) model time 0.4198 (0.2633) loss 4.8743 (5.7753) grad_norm 2.1693 (inf) loss_scale 512.0000 (754.9831) mem 9655MB [2024-08-04 07:58:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][540/625] eta 0:00:22 lr 0.000377 wd 0.0500 time 0.2574 (0.2655) data time 0.0008 (0.0025) model time 0.2565 (0.2635) loss 5.4754 (5.7772) grad_norm 2.4684 (inf) loss_scale 512.0000 (750.4917) mem 9655MB [2024-08-04 07:58:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][550/625] eta 0:00:19 lr 0.000377 wd 0.0500 time 0.2556 (0.2653) data time 0.0008 (0.0025) model time 0.2548 (0.2633) loss 6.4947 (5.7771) grad_norm 1.8137 (inf) loss_scale 512.0000 (746.1633) mem 9655MB [2024-08-04 07:58:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][560/625] eta 0:00:17 lr 0.000376 wd 0.0500 time 0.2557 (0.2651) data time 0.0006 (0.0025) model time 0.2551 (0.2631) loss 4.8953 (5.7776) grad_norm 3.6642 (inf) loss_scale 512.0000 (741.9893) mem 9655MB [2024-08-04 07:58:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][570/625] eta 0:00:14 lr 0.000376 wd 0.0500 time 0.2579 (0.2650) data time 0.0008 (0.0024) model time 0.2572 (0.2630) loss 6.1250 (5.7752) grad_norm 3.7256 (inf) loss_scale 512.0000 (737.9615) mem 9655MB [2024-08-04 07:58:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][580/625] eta 0:00:11 lr 0.000376 wd 0.0500 time 0.2544 (0.2648) data time 0.0009 (0.0024) model time 0.2535 (0.2629) loss 6.0845 (5.7778) grad_norm 2.7842 (inf) loss_scale 512.0000 (734.0723) mem 9655MB [2024-08-04 07:58:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][590/625] eta 0:00:09 lr 0.000376 wd 0.0500 time 0.2565 (0.2647) data time 0.0009 (0.0024) model time 0.2556 (0.2627) loss 6.1745 (5.7789) grad_norm 2.7189 (inf) loss_scale 512.0000 (730.3147) mem 9655MB [2024-08-04 07:58:54 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [221/300][600/625] eta 0:00:06 lr 0.000376 wd 0.0500 time 0.2388 (0.2652) data time 0.0008 (0.0024) model time 0.2381 (0.2633) loss 6.0350 (5.7807) grad_norm 3.8753 (inf) loss_scale 512.0000 (726.6822) mem 9655MB [2024-08-04 07:58:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][610/625] eta 0:00:03 lr 0.000376 wd 0.0500 time 0.2524 (0.2651) data time 0.0004 (0.0023) model time 0.2520 (0.2632) loss 6.5572 (5.7862) grad_norm 1.9285 (inf) loss_scale 512.0000 (723.1686) mem 9655MB [2024-08-04 07:58:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][620/625] eta 0:00:01 lr 0.000376 wd 0.0500 time 0.2536 (0.2649) data time 0.0004 (0.0023) model time 0.2533 (0.2630) loss 6.3397 (5.7878) grad_norm 3.8125 (inf) loss_scale 512.0000 (719.7681) mem 9655MB [2024-08-04 07:59:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 221 training takes 0:02:45 [2024-08-04 07:59:01 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 07:59:01 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 07:59:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.626 (0.626) Loss 0.5913 (0.5913) Acc@1 89.746 (89.746) Acc@5 98.535 (98.535) Mem 9655MB [2024-08-04 07:59:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.109) Loss 0.9404 (0.7320) Acc@1 79.688 (86.062) Acc@5 95.898 (97.528) Mem 9655MB [2024-08-04 07:59:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.084) Loss 1.0381 (0.8530) Acc@1 77.295 (82.861) Acc@5 94.873 (96.308) Mem 9655MB [2024-08-04 07:59:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.626 Acc@5 96.297 [2024-08-04 07:59:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.6% [2024-08-04 07:59:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.63% [2024-08-04 07:59:03 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 07:59:03 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 07:59:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.513 (0.513) Loss 0.5820 (0.5820) Acc@1 89.551 (89.551) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 07:59:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.102) Loss 0.9175 (0.7126) Acc@1 80.615 (86.386) Acc@5 95.898 (97.647) Mem 9655MB [2024-08-04 07:59:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 1.0303 (0.8368) Acc@1 77.002 (83.154) Acc@5 95.459 (96.431) Mem 9655MB [2024-08-04 07:59:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.843 Acc@5 96.435 [2024-08-04 07:59:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.8% [2024-08-04 07:59:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.84% [2024-08-04 07:59:05 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 07:59:06 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 07:59:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][0/625] eta 0:09:25 lr 0.000376 wd 0.0500 time 0.9044 (0.9044) data time 0.6607 (0.6607) model time 0.0000 (0.0000) loss 6.3413 (6.3413) grad_norm 1.3314 (1.3314) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:59:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][10/625] eta 0:03:14 lr 0.000375 wd 0.0500 time 0.2520 (0.3164) data time 0.0006 (0.0609) model time 0.0000 (0.0000) loss 6.1004 (5.8245) grad_norm 2.0370 (2.0472) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:59:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][20/625] eta 0:02:53 lr 0.000375 wd 0.0500 time 0.2559 (0.2874) data time 0.0008 (0.0323) model time 0.0000 (0.0000) loss 6.0878 (5.8078) grad_norm 3.3347 (2.6129) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:59:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][30/625] eta 0:02:45 lr 0.000375 wd 0.0500 time 0.2591 (0.2779) data time 0.0011 (0.0222) model time 0.0000 (0.0000) loss 5.0149 (5.7843) grad_norm 2.3037 (2.6818) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:59:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][40/625] eta 0:02:39 lr 0.000375 wd 0.0500 time 0.2650 (0.2729) data time 0.0006 (0.0170) model time 0.0000 (0.0000) loss 5.5765 (5.8128) grad_norm 3.9474 (2.6291) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:59:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][50/625] eta 0:02:37 lr 0.000375 wd 0.0500 time 0.2577 (0.2738) data time 0.0008 (0.0139) model time 0.0000 (0.0000) loss 5.6556 (5.8462) grad_norm 3.0656 (2.6431) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:59:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][60/625] eta 0:02:34 lr 0.000375 wd 0.0500 time 0.3988 (0.2732) data time 
0.0006 (0.0118) model time 0.3982 (0.2689) loss 5.4935 (5.8474) grad_norm 2.5914 (2.6090) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:59:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][70/625] eta 0:02:30 lr 0.000375 wd 0.0500 time 0.2560 (0.2708) data time 0.0006 (0.0102) model time 0.2553 (0.2622) loss 4.8948 (5.8203) grad_norm 3.2939 (2.5923) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:59:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][80/625] eta 0:02:26 lr 0.000374 wd 0.0500 time 0.2595 (0.2691) data time 0.0009 (0.0091) model time 0.2586 (0.2602) loss 5.1210 (5.8378) grad_norm 2.0202 (2.5489) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:59:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][90/625] eta 0:02:23 lr 0.000374 wd 0.0500 time 0.2663 (0.2679) data time 0.0008 (0.0082) model time 0.2655 (0.2595) loss 6.7091 (5.8507) grad_norm 1.4327 (2.5087) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:59:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][100/625] eta 0:02:20 lr 0.000374 wd 0.0500 time 0.2644 (0.2671) data time 0.0006 (0.0074) model time 0.2638 (0.2594) loss 5.8609 (5.8609) grad_norm 3.0511 (2.5143) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 07:59:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][110/625] eta 0:02:17 lr 0.000374 wd 0.0500 time 0.2547 (0.2677) data time 0.0009 (0.0069) model time 0.2538 (0.2616) loss 6.3554 (5.8267) grad_norm 1.8611 (inf) loss_scale 256.0000 (500.4685) mem 9655MB [2024-08-04 07:59:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][120/625] eta 0:02:14 lr 0.000374 wd 0.0500 time 0.2819 (0.2670) data time 0.0006 (0.0064) model time 0.2812 (0.2611) loss 5.3684 (5.7953) grad_norm 1.9616 (inf) loss_scale 256.0000 (480.2645) mem 9655MB [2024-08-04 07:59:41 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][130/625] eta 0:02:11 lr 0.000374 wd 0.0500 time 0.2437 (0.2662) data time 0.0010 (0.0060) model time 0.2427 (0.2604) loss 6.5951 (5.7946) grad_norm 1.9169 (inf) loss_scale 256.0000 (463.1450) mem 9655MB [2024-08-04 07:59:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][140/625] eta 0:02:08 lr 0.000374 wd 0.0500 time 0.2593 (0.2654) data time 0.0008 (0.0056) model time 0.2585 (0.2598) loss 5.6072 (5.7998) grad_norm 2.3187 (inf) loss_scale 256.0000 (448.4539) mem 9655MB [2024-08-04 07:59:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][150/625] eta 0:02:05 lr 0.000373 wd 0.0500 time 0.2574 (0.2649) data time 0.0006 (0.0053) model time 0.2568 (0.2594) loss 5.2448 (5.8002) grad_norm 2.8319 (inf) loss_scale 256.0000 (435.7086) mem 9655MB [2024-08-04 07:59:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][160/625] eta 0:02:02 lr 0.000373 wd 0.0500 time 0.2553 (0.2643) data time 0.0007 (0.0050) model time 0.2546 (0.2590) loss 4.6661 (5.7768) grad_norm 2.0575 (inf) loss_scale 256.0000 (424.5466) mem 9655MB [2024-08-04 07:59:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][170/625] eta 0:02:00 lr 0.000373 wd 0.0500 time 0.2678 (0.2640) data time 0.0007 (0.0048) model time 0.2671 (0.2589) loss 6.1390 (5.7759) grad_norm 3.2177 (inf) loss_scale 256.0000 (414.6901) mem 9655MB [2024-08-04 07:59:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][180/625] eta 0:01:57 lr 0.000373 wd 0.0500 time 0.2603 (0.2636) data time 0.0008 (0.0046) model time 0.2595 (0.2586) loss 5.2939 (5.7608) grad_norm 1.5854 (inf) loss_scale 256.0000 (405.9227) mem 9655MB [2024-08-04 07:59:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][190/625] eta 0:01:54 lr 0.000373 wd 0.0500 time 0.2699 (0.2639) data time 0.0008 (0.0044) model time 0.2691 
(0.2593) loss 4.9989 (5.7450) grad_norm 2.2184 (inf) loss_scale 256.0000 (398.0733) mem 9655MB [2024-08-04 07:59:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][200/625] eta 0:01:51 lr 0.000373 wd 0.0500 time 0.2559 (0.2635) data time 0.0009 (0.0042) model time 0.2550 (0.2591) loss 6.1099 (5.7459) grad_norm 2.2210 (inf) loss_scale 256.0000 (391.0050) mem 9655MB [2024-08-04 08:00:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][210/625] eta 0:01:49 lr 0.000373 wd 0.0500 time 0.2580 (0.2632) data time 0.0010 (0.0041) model time 0.2570 (0.2588) loss 5.6636 (5.7458) grad_norm 2.0524 (inf) loss_scale 256.0000 (384.6066) mem 9655MB [2024-08-04 08:00:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][220/625] eta 0:01:46 lr 0.000373 wd 0.0500 time 0.2541 (0.2630) data time 0.0009 (0.0039) model time 0.2532 (0.2588) loss 6.1225 (5.7474) grad_norm 2.2945 (inf) loss_scale 256.0000 (378.7873) mem 9655MB [2024-08-04 08:00:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][230/625] eta 0:01:43 lr 0.000372 wd 0.0500 time 0.2566 (0.2627) data time 0.0006 (0.0038) model time 0.2560 (0.2586) loss 5.9809 (5.7572) grad_norm 1.9892 (inf) loss_scale 256.0000 (373.4719) mem 9655MB [2024-08-04 08:00:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][240/625] eta 0:01:41 lr 0.000372 wd 0.0500 time 0.4437 (0.2640) data time 0.0007 (0.0037) model time 0.4430 (0.2605) loss 4.6140 (5.7447) grad_norm 3.3686 (inf) loss_scale 256.0000 (368.5975) mem 9655MB [2024-08-04 08:00:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][250/625] eta 0:01:38 lr 0.000372 wd 0.0500 time 0.2586 (0.2638) data time 0.0008 (0.0036) model time 0.2578 (0.2603) loss 6.0807 (5.7518) grad_norm 3.6766 (inf) loss_scale 256.0000 (364.1116) mem 9655MB [2024-08-04 08:00:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[222/300][260/625] eta 0:01:36 lr 0.000372 wd 0.0500 time 0.2603 (0.2635) data time 0.0008 (0.0035) model time 0.2595 (0.2600) loss 5.7651 (5.7472) grad_norm 1.6127 (inf) loss_scale 256.0000 (359.9693) mem 9655MB [2024-08-04 08:00:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][270/625] eta 0:01:33 lr 0.000372 wd 0.0500 time 0.2571 (0.2647) data time 0.0008 (0.0034) model time 0.2563 (0.2616) loss 6.3570 (5.7515) grad_norm 2.2053 (inf) loss_scale 256.0000 (356.1328) mem 9655MB [2024-08-04 08:00:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][280/625] eta 0:01:31 lr 0.000372 wd 0.0500 time 0.2547 (0.2650) data time 0.0006 (0.0033) model time 0.2541 (0.2621) loss 6.9272 (5.7673) grad_norm 1.5815 (inf) loss_scale 256.0000 (352.5694) mem 9655MB [2024-08-04 08:00:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][290/625] eta 0:01:28 lr 0.000372 wd 0.0500 time 0.2559 (0.2651) data time 0.0008 (0.0032) model time 0.2552 (0.2623) loss 4.8647 (5.7607) grad_norm 4.0718 (inf) loss_scale 256.0000 (349.2509) mem 9655MB [2024-08-04 08:00:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][300/625] eta 0:01:26 lr 0.000371 wd 0.0500 time 0.2543 (0.2648) data time 0.0010 (0.0031) model time 0.2534 (0.2620) loss 6.2962 (5.7579) grad_norm 2.6111 (inf) loss_scale 256.0000 (346.1528) mem 9655MB [2024-08-04 08:00:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][310/625] eta 0:01:23 lr 0.000371 wd 0.0500 time 0.2512 (0.2645) data time 0.0009 (0.0031) model time 0.2503 (0.2617) loss 5.4337 (5.7596) grad_norm 1.9411 (inf) loss_scale 256.0000 (343.2540) mem 9655MB [2024-08-04 08:00:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][320/625] eta 0:01:20 lr 0.000371 wd 0.0500 time 0.2619 (0.2642) data time 0.0009 (0.0030) model time 0.2610 (0.2615) loss 4.6907 (5.7601) grad_norm 1.8156 (inf) loss_scale 
256.0000 (340.5358) mem 9655MB [2024-08-04 08:00:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][330/625] eta 0:01:17 lr 0.000371 wd 0.0500 time 0.2559 (0.2640) data time 0.0007 (0.0029) model time 0.2551 (0.2612) loss 6.1404 (5.7625) grad_norm 1.5916 (inf) loss_scale 256.0000 (337.9819) mem 9655MB [2024-08-04 08:00:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][340/625] eta 0:01:15 lr 0.000371 wd 0.0500 time 0.2568 (0.2637) data time 0.0008 (0.0029) model time 0.2561 (0.2610) loss 5.7140 (5.7579) grad_norm 2.4122 (inf) loss_scale 256.0000 (335.5777) mem 9655MB [2024-08-04 08:00:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][350/625] eta 0:01:12 lr 0.000371 wd 0.0500 time 0.2561 (0.2635) data time 0.0008 (0.0028) model time 0.2553 (0.2608) loss 6.2486 (5.7720) grad_norm 2.0305 (inf) loss_scale 256.0000 (333.3105) mem 9655MB [2024-08-04 08:00:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][360/625] eta 0:01:09 lr 0.000371 wd 0.0500 time 0.2609 (0.2639) data time 0.0008 (0.0028) model time 0.2600 (0.2613) loss 5.6661 (5.7786) grad_norm 1.9742 (inf) loss_scale 256.0000 (331.1690) mem 9655MB [2024-08-04 08:00:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][370/625] eta 0:01:07 lr 0.000370 wd 0.0500 time 0.2532 (0.2637) data time 0.0010 (0.0027) model time 0.2522 (0.2611) loss 6.5850 (5.7808) grad_norm 2.0058 (inf) loss_scale 256.0000 (329.1429) mem 9655MB [2024-08-04 08:00:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][380/625] eta 0:01:04 lr 0.000370 wd 0.0500 time 0.2585 (0.2635) data time 0.0007 (0.0027) model time 0.2578 (0.2609) loss 5.3674 (5.7775) grad_norm 1.7377 (inf) loss_scale 256.0000 (327.2231) mem 9655MB [2024-08-04 08:00:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][390/625] eta 0:01:01 lr 0.000370 wd 0.0500 time 0.2589 
(0.2637) data time 0.0009 (0.0026) model time 0.2581 (0.2613) loss 5.9654 (5.7757) grad_norm 2.4433 (inf) loss_scale 256.0000 (325.4015) mem 9655MB [2024-08-04 08:00:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][400/625] eta 0:00:59 lr 0.000370 wd 0.0500 time 0.2551 (0.2636) data time 0.0007 (0.0026) model time 0.2544 (0.2611) loss 5.5243 (5.7762) grad_norm 2.1341 (inf) loss_scale 256.0000 (323.6708) mem 9655MB [2024-08-04 08:00:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][410/625] eta 0:00:56 lr 0.000370 wd 0.0500 time 0.2551 (0.2634) data time 0.0008 (0.0025) model time 0.2543 (0.2610) loss 6.5070 (5.7795) grad_norm 3.9829 (inf) loss_scale 256.0000 (322.0243) mem 9655MB [2024-08-04 08:00:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][420/625] eta 0:00:54 lr 0.000370 wd 0.0500 time 0.2534 (0.2636) data time 0.0009 (0.0025) model time 0.2525 (0.2613) loss 6.2265 (5.7873) grad_norm 1.6291 (inf) loss_scale 256.0000 (320.4561) mem 9655MB [2024-08-04 08:00:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][430/625] eta 0:00:51 lr 0.000370 wd 0.0500 time 0.2524 (0.2635) data time 0.0008 (0.0025) model time 0.2516 (0.2611) loss 6.3424 (5.7844) grad_norm 2.5032 (inf) loss_scale 256.0000 (318.9606) mem 9655MB [2024-08-04 08:01:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][440/625] eta 0:00:48 lr 0.000370 wd 0.0500 time 0.2565 (0.2633) data time 0.0010 (0.0024) model time 0.2555 (0.2610) loss 7.0365 (5.7918) grad_norm 2.1847 (inf) loss_scale 256.0000 (317.5329) mem 9655MB [2024-08-04 08:01:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][450/625] eta 0:00:46 lr 0.000369 wd 0.0500 time 0.2512 (0.2631) data time 0.0007 (0.0024) model time 0.2505 (0.2608) loss 6.4037 (5.7857) grad_norm 2.4236 (inf) loss_scale 256.0000 (316.1685) mem 9655MB [2024-08-04 08:01:07 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][460/625] eta 0:00:43 lr 0.000369 wd 0.0500 time 0.2543 (0.2629) data time 0.0009 (0.0024) model time 0.2534 (0.2606) loss 5.6631 (5.7920) grad_norm 1.4248 (inf) loss_scale 256.0000 (314.8633) mem 9655MB [2024-08-04 08:01:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][470/625] eta 0:00:40 lr 0.000369 wd 0.0500 time 0.2567 (0.2630) data time 0.0008 (0.0024) model time 0.2559 (0.2608) loss 5.9389 (5.7864) grad_norm 1.3318 (inf) loss_scale 256.0000 (313.6136) mem 9655MB [2024-08-04 08:01:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][480/625] eta 0:00:38 lr 0.000369 wd 0.0500 time 0.2557 (0.2633) data time 0.0012 (0.0023) model time 0.2546 (0.2611) loss 5.1520 (5.7889) grad_norm 2.1165 (inf) loss_scale 256.0000 (312.4158) mem 9655MB [2024-08-04 08:01:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][490/625] eta 0:00:35 lr 0.000369 wd 0.0500 time 0.2552 (0.2631) data time 0.0010 (0.0023) model time 0.2542 (0.2610) loss 4.8954 (5.7819) grad_norm 1.8035 (inf) loss_scale 256.0000 (311.2668) mem 9655MB [2024-08-04 08:01:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][500/625] eta 0:00:32 lr 0.000369 wd 0.0500 time 0.2589 (0.2630) data time 0.0008 (0.0023) model time 0.2581 (0.2608) loss 6.1738 (5.7807) grad_norm 1.7258 (inf) loss_scale 256.0000 (310.1637) mem 9655MB [2024-08-04 08:01:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][510/625] eta 0:00:30 lr 0.000369 wd 0.0500 time 0.2615 (0.2635) data time 0.0007 (0.0022) model time 0.2607 (0.2614) loss 6.2131 (5.7777) grad_norm 1.6063 (inf) loss_scale 256.0000 (309.1037) mem 9655MB [2024-08-04 08:01:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][520/625] eta 0:00:27 lr 0.000368 wd 0.0500 time 0.2629 (0.2633) data time 0.0007 (0.0022) model time 0.2622 
(0.2613) loss 6.3748 (5.7825) grad_norm 2.0990 (inf) loss_scale 256.0000 (308.0845) mem 9655MB [2024-08-04 08:01:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][530/625] eta 0:00:25 lr 0.000368 wd 0.0500 time 0.2578 (0.2632) data time 0.0006 (0.0022) model time 0.2572 (0.2611) loss 6.0232 (5.7808) grad_norm 1.7155 (inf) loss_scale 256.0000 (307.1036) mem 9655MB [2024-08-04 08:01:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][540/625] eta 0:00:22 lr 0.000368 wd 0.0500 time 0.2578 (0.2631) data time 0.0006 (0.0022) model time 0.2572 (0.2610) loss 5.1570 (5.7818) grad_norm 1.7694 (inf) loss_scale 256.0000 (306.1590) mem 9655MB [2024-08-04 08:01:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][550/625] eta 0:00:19 lr 0.000368 wd 0.0500 time 0.2592 (0.2630) data time 0.0006 (0.0021) model time 0.2586 (0.2609) loss 6.3882 (5.7812) grad_norm 4.2762 (inf) loss_scale 256.0000 (305.2486) mem 9655MB [2024-08-04 08:01:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][560/625] eta 0:00:17 lr 0.000368 wd 0.0500 time 0.2571 (0.2628) data time 0.0007 (0.0021) model time 0.2564 (0.2608) loss 6.2195 (5.7859) grad_norm 1.6436 (inf) loss_scale 256.0000 (304.3708) mem 9655MB [2024-08-04 08:01:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][570/625] eta 0:00:14 lr 0.000368 wd 0.0500 time 0.2532 (0.2627) data time 0.0007 (0.0021) model time 0.2525 (0.2607) loss 5.9975 (5.7821) grad_norm 7.1287 (inf) loss_scale 256.0000 (303.5236) mem 9655MB [2024-08-04 08:01:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][580/625] eta 0:00:11 lr 0.000368 wd 0.0500 time 0.2555 (0.2626) data time 0.0006 (0.0021) model time 0.2548 (0.2606) loss 6.2864 (5.7813) grad_norm 2.3019 (inf) loss_scale 256.0000 (302.7057) mem 9655MB [2024-08-04 08:01:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[222/300][590/625] eta 0:00:09 lr 0.000368 wd 0.0500 time 0.2578 (0.2628) data time 0.0008 (0.0021) model time 0.2570 (0.2608) loss 5.7830 (5.7804) grad_norm 2.1588 (inf) loss_scale 256.0000 (301.9154) mem 9655MB [2024-08-04 08:01:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][600/625] eta 0:00:06 lr 0.000367 wd 0.0500 time 0.2546 (0.2627) data time 0.0010 (0.0020) model time 0.2536 (0.2607) loss 4.8276 (5.7795) grad_norm 2.6996 (inf) loss_scale 256.0000 (301.1514) mem 9655MB [2024-08-04 08:01:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][610/625] eta 0:00:03 lr 0.000367 wd 0.0500 time 0.2543 (0.2626) data time 0.0006 (0.0020) model time 0.2537 (0.2606) loss 5.6884 (5.7768) grad_norm 2.8585 (inf) loss_scale 256.0000 (300.4124) mem 9655MB [2024-08-04 08:01:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][620/625] eta 0:00:01 lr 0.000367 wd 0.0500 time 0.2536 (0.2624) data time 0.0005 (0.0020) model time 0.2530 (0.2604) loss 4.9683 (5.7730) grad_norm 2.1472 (inf) loss_scale 256.0000 (299.6973) mem 9655MB [2024-08-04 08:01:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 222 training takes 0:02:43 [2024-08-04 08:01:50 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 08:01:50 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 08:01:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.482 (0.482) Loss 0.6089 (0.6089) Acc@1 89.648 (89.648) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 08:01:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 0.9346 (0.7363) Acc@1 79.980 (86.075) Acc@5 96.289 (97.656) Mem 9655MB [2024-08-04 08:01:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0527 (0.8537) Acc@1 76.270 (82.975) Acc@5 94.873 (96.452) Mem 9655MB [2024-08-04 08:01:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.616 Acc@5 96.399 [2024-08-04 08:01:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.6% [2024-08-04 08:01:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.776 (0.776) Loss 0.5820 (0.5820) Acc@1 89.648 (89.648) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 08:01:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.130) Loss 0.9170 (0.7127) Acc@1 80.518 (86.399) Acc@5 95.850 (97.656) Mem 9655MB [2024-08-04 08:01:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.094) Loss 1.0303 (0.8365) Acc@1 77.051 (83.150) Acc@5 95.459 (96.445) Mem 9655MB [2024-08-04 08:01:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.843 Acc@5 96.449 [2024-08-04 08:01:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.8% [2024-08-04 08:01:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][0/625] eta 0:11:27 lr 0.000367 wd 0.0500 time 1.0992 (1.0992) data time 0.4176 (0.4176) model time 0.0000 (0.0000) loss 5.6071 (5.6071) grad_norm 1.4145 (1.4145) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:01:58 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [223/300][10/625] eta 0:03:24 lr 0.000367 wd 0.0500 time 0.2531 (0.3330) data time 0.0010 (0.0388) model time 0.0000 (0.0000) loss 6.0855 (5.9139) grad_norm 1.9859 (2.7008) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:02:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][20/625] eta 0:03:02 lr 0.000367 wd 0.0500 time 0.2573 (0.3015) data time 0.0007 (0.0207) model time 0.0000 (0.0000) loss 4.8698 (5.7023) grad_norm 3.4610 (2.8332) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:02:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][30/625] eta 0:02:54 lr 0.000367 wd 0.0500 time 0.2578 (0.2934) data time 0.0008 (0.0143) model time 0.0000 (0.0000) loss 6.0957 (5.7898) grad_norm 1.8133 (3.0113) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:02:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][40/625] eta 0:02:48 lr 0.000367 wd 0.0500 time 0.2514 (0.2885) data time 0.0007 (0.0111) model time 0.0000 (0.0000) loss 4.7808 (5.7317) grad_norm 1.9648 (2.8185) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:02:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][50/625] eta 0:02:44 lr 0.000366 wd 0.0500 time 0.2566 (0.2860) data time 0.0009 (0.0091) model time 0.0000 (0.0000) loss 5.8710 (5.7874) grad_norm 2.3954 (2.7587) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:02:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][60/625] eta 0:02:39 lr 0.000366 wd 0.0500 time 0.3846 (0.2830) data time 0.0008 (0.0077) model time 0.3838 (0.2667) loss 6.1308 (5.7590) grad_norm 2.2264 (2.6480) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:02:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][70/625] eta 0:02:34 lr 0.000366 wd 0.0500 time 0.2540 (0.2793) data time 0.0008 (0.0068) model time 0.2532 (0.2612) loss 
5.5004 (5.7378) grad_norm 3.3554 (2.5477) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:02:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][80/625] eta 0:02:30 lr 0.000366 wd 0.0500 time 0.2545 (0.2763) data time 0.0009 (0.0060) model time 0.2535 (0.2589) loss 5.7259 (5.7371) grad_norm 2.0991 (2.5126) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:02:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][90/625] eta 0:02:26 lr 0.000366 wd 0.0500 time 0.2542 (0.2742) data time 0.0012 (0.0055) model time 0.2530 (0.2581) loss 6.5042 (5.7716) grad_norm 1.4714 (2.4631) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:02:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][100/625] eta 0:02:23 lr 0.000366 wd 0.0500 time 0.2562 (0.2742) data time 0.0008 (0.0050) model time 0.2553 (0.2612) loss 5.6258 (5.7619) grad_norm 2.0305 (2.5011) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:02:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][110/625] eta 0:02:21 lr 0.000366 wd 0.0500 time 0.2524 (0.2743) data time 0.0009 (0.0047) model time 0.2515 (0.2635) loss 5.4665 (5.7648) grad_norm 1.5919 (2.4873) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:02:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][120/625] eta 0:02:17 lr 0.000365 wd 0.0500 time 0.2549 (0.2728) data time 0.0011 (0.0044) model time 0.2538 (0.2622) loss 5.0101 (5.7415) grad_norm 1.9432 (2.4470) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:02:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][130/625] eta 0:02:14 lr 0.000365 wd 0.0500 time 0.2664 (0.2717) data time 0.0010 (0.0041) model time 0.2654 (0.2617) loss 5.0718 (5.7547) grad_norm 2.1272 (2.4011) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:02:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO 
Train: [223/300][140/625] eta 0:02:11 lr 0.000365 wd 0.0500 time 0.2581 (0.2721) data time 0.0012 (0.0039) model time 0.2570 (0.2633) loss 6.2046 (5.7478) grad_norm 2.4463 (2.4180) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:02:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][150/625] eta 0:02:08 lr 0.000365 wd 0.0500 time 0.2686 (0.2711) data time 0.0012 (0.0037) model time 0.2674 (0.2626) loss 5.2292 (5.7568) grad_norm 1.9737 (2.4283) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:02:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][160/625] eta 0:02:05 lr 0.000365 wd 0.0500 time 0.2531 (0.2701) data time 0.0009 (0.0035) model time 0.2521 (0.2618) loss 6.1906 (5.7499) grad_norm 1.5834 (2.4347) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:02:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][170/625] eta 0:02:02 lr 0.000365 wd 0.0500 time 0.2513 (0.2693) data time 0.0008 (0.0034) model time 0.2505 (0.2612) loss 6.2778 (5.7373) grad_norm 1.6081 (2.4510) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:02:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][180/625] eta 0:01:59 lr 0.000365 wd 0.0500 time 0.2553 (0.2686) data time 0.0008 (0.0032) model time 0.2545 (0.2608) loss 6.0168 (5.7426) grad_norm 1.5118 (2.4326) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:02:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][190/625] eta 0:01:56 lr 0.000364 wd 0.0500 time 0.2550 (0.2679) data time 0.0010 (0.0031) model time 0.2540 (0.2603) loss 6.2884 (5.7419) grad_norm 1.7456 (2.4583) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:02:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][200/625] eta 0:01:53 lr 0.000364 wd 0.0500 time 0.2578 (0.2674) data time 0.0010 (0.0030) model time 0.2567 (0.2601) loss 5.5888 (5.7501) grad_norm 
3.9525 (2.5202) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:02:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][210/625] eta 0:01:50 lr 0.000364 wd 0.0500 time 0.2559 (0.2669) data time 0.0008 (0.0029) model time 0.2551 (0.2598) loss 5.6297 (5.7497) grad_norm 2.3383 (2.5261) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:02:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][220/625] eta 0:01:48 lr 0.000364 wd 0.0500 time 0.2554 (0.2673) data time 0.0007 (0.0028) model time 0.2547 (0.2607) loss 6.6907 (5.7404) grad_norm 2.4989 (2.5140) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:02:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][230/625] eta 0:01:45 lr 0.000364 wd 0.0500 time 0.2518 (0.2683) data time 0.0008 (0.0028) model time 0.2511 (0.2624) loss 5.1937 (5.7374) grad_norm 1.5520 (2.4863) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:02:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][240/625] eta 0:01:43 lr 0.000364 wd 0.0500 time 0.2550 (0.2678) data time 0.0007 (0.0027) model time 0.2543 (0.2619) loss 6.4067 (5.7468) grad_norm 3.4329 (2.4835) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:03:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][250/625] eta 0:01:40 lr 0.000364 wd 0.0500 time 0.2576 (0.2674) data time 0.0008 (0.0026) model time 0.2568 (0.2616) loss 4.5936 (5.7582) grad_norm 5.2598 (2.5248) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:03:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][260/625] eta 0:01:37 lr 0.000364 wd 0.0500 time 0.2589 (0.2670) data time 0.0007 (0.0025) model time 0.2582 (0.2614) loss 6.7101 (5.7701) grad_norm 2.3342 (2.5504) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:03:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][270/625] eta 
0:01:34 lr 0.000363 wd 0.0500 time 0.2535 (0.2666) data time 0.0009 (0.0025) model time 0.2526 (0.2611) loss 6.0613 (5.7558) grad_norm 3.0418 (2.5862) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:03:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][280/625] eta 0:01:31 lr 0.000363 wd 0.0500 time 0.2583 (0.2662) data time 0.0011 (0.0024) model time 0.2572 (0.2609) loss 4.4928 (5.7483) grad_norm 1.5789 (2.6029) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:03:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][290/625] eta 0:01:29 lr 0.000363 wd 0.0500 time 0.2510 (0.2659) data time 0.0008 (0.0024) model time 0.2502 (0.2606) loss 6.2876 (5.7468) grad_norm 2.3291 (2.5981) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:03:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][300/625] eta 0:01:26 lr 0.000363 wd 0.0500 time 0.2533 (0.2660) data time 0.0007 (0.0023) model time 0.2526 (0.2610) loss 5.3365 (5.7453) grad_norm 9.4426 (2.6020) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:03:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][310/625] eta 0:01:23 lr 0.000363 wd 0.0500 time 0.2562 (0.2657) data time 0.0006 (0.0023) model time 0.2556 (0.2608) loss 7.1075 (5.7477) grad_norm 2.0600 (2.6027) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:03:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][320/625] eta 0:01:20 lr 0.000363 wd 0.0500 time 0.2554 (0.2654) data time 0.0009 (0.0022) model time 0.2545 (0.2605) loss 6.7375 (5.7533) grad_norm 2.3130 (2.6096) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:03:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][330/625] eta 0:01:18 lr 0.000363 wd 0.0500 time 0.2571 (0.2655) data time 0.0007 (0.0022) model time 0.2564 (0.2608) loss 6.1402 (5.7498) grad_norm 1.2501 (2.5929) loss_scale 
256.0000 (256.0000) mem 9655MB [2024-08-04 08:03:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][340/625] eta 0:01:15 lr 0.000362 wd 0.0500 time 0.2550 (0.2656) data time 0.0011 (0.0022) model time 0.2539 (0.2610) loss 6.3402 (5.7498) grad_norm 3.1293 (2.5876) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:03:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][350/625] eta 0:01:12 lr 0.000362 wd 0.0500 time 0.2565 (0.2653) data time 0.0008 (0.0021) model time 0.2557 (0.2608) loss 5.6526 (5.7449) grad_norm 2.4159 (2.6370) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:03:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][360/625] eta 0:01:10 lr 0.000362 wd 0.0500 time 0.2535 (0.2650) data time 0.0008 (0.0021) model time 0.2528 (0.2606) loss 5.0581 (5.7507) grad_norm 2.6118 (2.6415) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:03:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][370/625] eta 0:01:07 lr 0.000362 wd 0.0500 time 0.2583 (0.2652) data time 0.0006 (0.0021) model time 0.2577 (0.2610) loss 6.5744 (5.7508) grad_norm 2.4262 (2.6316) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:03:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][380/625] eta 0:01:04 lr 0.000362 wd 0.0500 time 0.2551 (0.2650) data time 0.0009 (0.0020) model time 0.2542 (0.2607) loss 4.5639 (5.7568) grad_norm 3.7701 (2.6268) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:03:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][390/625] eta 0:01:02 lr 0.000362 wd 0.0500 time 0.2543 (0.2647) data time 0.0009 (0.0020) model time 0.2534 (0.2606) loss 6.4826 (5.7625) grad_norm 2.7675 (2.6480) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:03:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][400/625] eta 0:00:59 lr 0.000362 wd 
0.0500 time 0.2604 (0.2650) data time 0.0008 (0.0020) model time 0.2596 (0.2610) loss 5.8627 (5.7688) grad_norm 3.8453 (2.6438) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:03:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][410/625] eta 0:00:56 lr 0.000362 wd 0.0500 time 0.2556 (0.2648) data time 0.0010 (0.0020) model time 0.2546 (0.2608) loss 5.3158 (5.7691) grad_norm 3.0178 (2.6454) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:03:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][420/625] eta 0:00:54 lr 0.000361 wd 0.0500 time 0.2622 (0.2646) data time 0.0006 (0.0019) model time 0.2615 (0.2607) loss 6.4528 (5.7640) grad_norm 3.4811 (2.6463) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:03:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][430/625] eta 0:00:51 lr 0.000361 wd 0.0500 time 0.2571 (0.2644) data time 0.0008 (0.0019) model time 0.2562 (0.2606) loss 4.6260 (5.7695) grad_norm 1.9267 (2.6515) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:03:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][440/625] eta 0:00:48 lr 0.000361 wd 0.0500 time 0.2541 (0.2647) data time 0.0009 (0.0019) model time 0.2532 (0.2609) loss 5.5790 (5.7669) grad_norm 1.3665 (2.6356) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:03:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][450/625] eta 0:00:46 lr 0.000361 wd 0.0500 time 0.2547 (0.2645) data time 0.0010 (0.0019) model time 0.2538 (0.2608) loss 6.2534 (5.7666) grad_norm 2.6711 (2.6263) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:03:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][460/625] eta 0:00:43 lr 0.000361 wd 0.0500 time 0.2530 (0.2643) data time 0.0006 (0.0018) model time 0.2524 (0.2607) loss 6.0639 (5.7685) grad_norm 2.5596 (2.6185) loss_scale 256.0000 (256.0000) mem 9655MB 
[2024-08-04 08:03:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][470/625] eta 0:00:40 lr 0.000361 wd 0.0500 time 0.2564 (0.2641) data time 0.0007 (0.0018) model time 0.2557 (0.2605) loss 6.9255 (5.7730) grad_norm 2.7912 (2.6160) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:04:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][480/625] eta 0:00:38 lr 0.000361 wd 0.0500 time 0.2564 (0.2643) data time 0.0012 (0.0018) model time 0.2552 (0.2608) loss 4.8557 (5.7700) grad_norm 1.7022 (2.6105) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:04:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][490/625] eta 0:00:35 lr 0.000360 wd 0.0500 time 0.2556 (0.2641) data time 0.0007 (0.0018) model time 0.2548 (0.2607) loss 5.0935 (5.7642) grad_norm 1.7085 (2.6034) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:04:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][500/625] eta 0:00:33 lr 0.000360 wd 0.0500 time 0.2589 (0.2643) data time 0.0006 (0.0018) model time 0.2583 (0.2609) loss 5.9328 (5.7646) grad_norm 14.5893 (2.6189) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:04:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][510/625] eta 0:00:30 lr 0.000360 wd 0.0500 time 0.2531 (0.2642) data time 0.0008 (0.0018) model time 0.2523 (0.2608) loss 6.7693 (5.7678) grad_norm 1.6016 (2.6029) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:04:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][520/625] eta 0:00:27 lr 0.000360 wd 0.0500 time 0.2563 (0.2643) data time 0.0010 (0.0017) model time 0.2553 (0.2610) loss 5.8938 (5.7628) grad_norm 2.1823 (2.6310) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:04:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][530/625] eta 0:00:25 lr 0.000360 wd 0.0500 time 0.2522 (0.2641) data 
time 0.0011 (0.0017) model time 0.2511 (0.2608) loss 6.0731 (5.7671) grad_norm 2.2690 (2.6308) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:04:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][540/625] eta 0:00:22 lr 0.000360 wd 0.0500 time 0.2557 (0.2644) data time 0.0007 (0.0017) model time 0.2551 (0.2612) loss 4.6565 (5.7667) grad_norm 2.4713 (2.6283) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:04:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][550/625] eta 0:00:19 lr 0.000360 wd 0.0500 time 0.2544 (0.2642) data time 0.0009 (0.0017) model time 0.2535 (0.2611) loss 6.5134 (5.7633) grad_norm 1.9506 (2.6431) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:04:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][560/625] eta 0:00:17 lr 0.000360 wd 0.0500 time 0.2562 (0.2641) data time 0.0008 (0.0017) model time 0.2554 (0.2609) loss 6.5074 (5.7665) grad_norm 1.7352 (2.6346) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:04:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][570/625] eta 0:00:14 lr 0.000359 wd 0.0500 time 0.2546 (0.2640) data time 0.0012 (0.0017) model time 0.2534 (0.2608) loss 4.3242 (5.7629) grad_norm 1.6154 (2.6250) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:04:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][580/625] eta 0:00:11 lr 0.000359 wd 0.0500 time 0.2560 (0.2638) data time 0.0011 (0.0017) model time 0.2548 (0.2607) loss 5.8526 (5.7678) grad_norm 1.4675 (2.6159) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:04:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][590/625] eta 0:00:09 lr 0.000359 wd 0.0500 time 0.2474 (0.2637) data time 0.0010 (0.0017) model time 0.2465 (0.2606) loss 4.3716 (5.7615) grad_norm 3.5959 (2.6611) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:04:33 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][600/625] eta 0:00:06 lr 0.000359 wd 0.0500 time 0.2568 (0.2636) data time 0.0006 (0.0016) model time 0.2562 (0.2605) loss 5.1032 (5.7595) grad_norm 2.0225 (2.6554) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:04:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][610/625] eta 0:00:03 lr 0.000359 wd 0.0500 time 0.2539 (0.2634) data time 0.0006 (0.0016) model time 0.2533 (0.2604) loss 6.5203 (5.7595) grad_norm 2.6369 (2.6554) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:04:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][620/625] eta 0:00:01 lr 0.000359 wd 0.0500 time 0.2542 (0.2633) data time 0.0006 (0.0016) model time 0.2536 (0.2603) loss 4.5195 (5.7574) grad_norm 1.9875 (2.6429) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:04:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 223 training takes 0:02:44 [2024-08-04 08:04:39 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 08:04:39 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 08:04:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.504 (0.504) Loss 0.6069 (0.6069) Acc@1 89.014 (89.014) Acc@5 98.438 (98.438) Mem 9655MB [2024-08-04 08:04:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 0.9482 (0.7342) Acc@1 80.029 (86.075) Acc@5 95.947 (97.607) Mem 9655MB [2024-08-04 08:04:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0430 (0.8550) Acc@1 76.562 (82.836) Acc@5 95.117 (96.405) Mem 9655MB [2024-08-04 08:04:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.566 Acc@5 96.405 [2024-08-04 08:04:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.6% [2024-08-04 08:04:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.746 (0.746) Loss 0.5820 (0.5820) Acc@1 89.697 (89.697) Acc@5 98.535 (98.535) Mem 9655MB [2024-08-04 08:04:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.128) Loss 0.9165 (0.7126) Acc@1 80.615 (86.439) Acc@5 95.898 (97.647) Mem 9655MB [2024-08-04 08:04:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 1.0283 (0.8364) Acc@1 76.953 (83.166) Acc@5 95.459 (96.438) Mem 9655MB [2024-08-04 08:04:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.855 Acc@5 96.443 [2024-08-04 08:04:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.9% [2024-08-04 08:04:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.86% [2024-08-04 08:04:43 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 08:04:44 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 08:04:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][0/625] eta 0:07:31 lr 0.000359 wd 0.0500 time 0.7219 (0.7219) data time 0.4800 (0.4800) model time 0.0000 (0.0000) loss 5.1320 (5.1320) grad_norm 1.4803 (1.4803) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:04:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][10/625] eta 0:03:16 lr 0.000359 wd 0.0500 time 0.2547 (0.3196) data time 0.0007 (0.0444) model time 0.0000 (0.0000) loss 4.4404 (5.5263) grad_norm 1.6903 (2.0814) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:04:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][20/625] eta 0:02:55 lr 0.000358 wd 0.0500 time 0.2599 (0.2900) data time 0.0008 (0.0237) model time 0.0000 (0.0000) loss 5.3450 (5.5154) grad_norm 1.7344 (2.1814) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:04:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][30/625] eta 0:02:45 lr 0.000358 wd 0.0500 time 0.2572 (0.2789) data time 0.0009 (0.0164) model time 0.0000 (0.0000) loss 6.5623 (5.5575) grad_norm 2.6000 (2.2630) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:04:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][40/625] eta 0:02:42 lr 0.000358 wd 0.0500 time 0.2578 (0.2780) data time 0.0009 (0.0126) model time 0.0000 (0.0000) loss 5.7518 (5.6415) grad_norm 1.3256 (2.2377) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:04:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][50/625] eta 0:02:37 lr 0.000358 wd 0.0500 time 0.2543 (0.2736) data time 0.0008 (0.0103) model time 0.0000 (0.0000) loss 4.7422 (5.6578) grad_norm 1.8481 (2.2364) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 
08:05:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][60/625] eta 0:02:32 lr 0.000358 wd 0.0500 time 0.2566 (0.2707) data time 0.0006 (0.0088) model time 0.2560 (0.2546) loss 5.4721 (5.6568) grad_norm 1.7881 (2.2705) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:05:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][70/625] eta 0:02:30 lr 0.000358 wd 0.0500 time 0.2548 (0.2711) data time 0.0008 (0.0077) model time 0.2540 (0.2637) loss 5.6319 (5.6840) grad_norm 3.1411 (2.2701) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:05:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][80/625] eta 0:02:26 lr 0.000358 wd 0.0500 time 0.2586 (0.2693) data time 0.0009 (0.0069) model time 0.2577 (0.2608) loss 5.9372 (5.6789) grad_norm 1.7999 (2.2162) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:05:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][90/625] eta 0:02:23 lr 0.000357 wd 0.0500 time 0.2564 (0.2679) data time 0.0008 (0.0062) model time 0.2556 (0.2597) loss 5.5766 (5.6879) grad_norm 3.1056 (2.2727) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:05:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][100/625] eta 0:02:21 lr 0.000357 wd 0.0500 time 0.2562 (0.2687) data time 0.0008 (0.0057) model time 0.2554 (0.2626) loss 6.6685 (5.7296) grad_norm 3.1149 (2.2648) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:05:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][110/625] eta 0:02:18 lr 0.000357 wd 0.0500 time 0.2562 (0.2693) data time 0.0007 (0.0053) model time 0.2555 (0.2647) loss 6.1596 (5.7649) grad_norm 3.0238 (2.3857) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:05:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][120/625] eta 0:02:15 lr 0.000357 wd 0.0500 time 0.2546 (0.2683) data time 0.0008 
(0.0049) model time 0.2538 (0.2634) loss 5.6017 (5.7345) grad_norm 2.0908 (2.4403) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:05:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][130/625] eta 0:02:12 lr 0.000357 wd 0.0500 time 0.2583 (0.2673) data time 0.0007 (0.0046) model time 0.2577 (0.2624) loss 5.9973 (5.7635) grad_norm 2.1855 (2.4797) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:05:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][140/625] eta 0:02:10 lr 0.000357 wd 0.0500 time 0.2407 (0.2694) data time 0.0010 (0.0044) model time 0.2397 (0.2660) loss 5.6357 (5.7520) grad_norm 2.2176 (2.4687) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:05:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][150/625] eta 0:02:07 lr 0.000357 wd 0.0500 time 0.2545 (0.2684) data time 0.0008 (0.0042) model time 0.2537 (0.2647) loss 4.9927 (5.7388) grad_norm 1.5994 (2.4487) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:05:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][160/625] eta 0:02:04 lr 0.000357 wd 0.0500 time 0.2492 (0.2676) data time 0.0008 (0.0040) model time 0.2485 (0.2637) loss 5.8992 (5.7283) grad_norm 2.8106 (2.4739) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:05:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][170/625] eta 0:02:01 lr 0.000356 wd 0.0500 time 0.2533 (0.2669) data time 0.0009 (0.0038) model time 0.2524 (0.2629) loss 6.1796 (5.7424) grad_norm 2.4638 (2.4848) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:05:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][180/625] eta 0:01:58 lr 0.000356 wd 0.0500 time 0.2585 (0.2662) data time 0.0006 (0.0036) model time 0.2579 (0.2623) loss 4.9697 (5.7277) grad_norm 2.7470 (2.4903) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:05:35 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][190/625] eta 0:01:55 lr 0.000356 wd 0.0500 time 0.2563 (0.2656) data time 0.0012 (0.0035) model time 0.2551 (0.2617) loss 4.6779 (5.7097) grad_norm 2.0620 (2.4831) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:05:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][200/625] eta 0:01:52 lr 0.000356 wd 0.0500 time 0.2515 (0.2651) data time 0.0008 (0.0034) model time 0.2507 (0.2612) loss 5.3735 (5.7269) grad_norm 2.5749 (2.4764) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:05:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][210/625] eta 0:01:50 lr 0.000356 wd 0.0500 time 0.2529 (0.2662) data time 0.0007 (0.0033) model time 0.2522 (0.2628) loss 4.8715 (5.7225) grad_norm 2.0733 (2.4875) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:05:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][220/625] eta 0:01:47 lr 0.000356 wd 0.0500 time 0.2555 (0.2657) data time 0.0009 (0.0032) model time 0.2546 (0.2623) loss 5.9332 (5.7132) grad_norm 1.4224 (2.4852) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:05:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][230/625] eta 0:01:45 lr 0.000356 wd 0.0500 time 0.2556 (0.2660) data time 0.0011 (0.0031) model time 0.2546 (0.2629) loss 5.6312 (5.7145) grad_norm 1.6075 (2.4636) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:05:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][240/625] eta 0:01:42 lr 0.000355 wd 0.0500 time 0.2557 (0.2665) data time 0.0007 (0.0030) model time 0.2550 (0.2635) loss 4.5052 (5.7186) grad_norm 1.8543 (2.4841) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:05:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][250/625] eta 0:01:39 lr 0.000355 wd 0.0500 time 0.2552 (0.2660) data time 0.0009 (0.0029) 
model time 0.2543 (0.2631) loss 4.9423 (5.7075) grad_norm 1.5404 (2.4812) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:05:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][260/625] eta 0:01:36 lr 0.000355 wd 0.0500 time 0.2548 (0.2657) data time 0.0008 (0.0028) model time 0.2539 (0.2628) loss 6.1684 (5.7138) grad_norm 2.2404 (2.5179) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:05:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][270/625] eta 0:01:34 lr 0.000355 wd 0.0500 time 0.2532 (0.2653) data time 0.0009 (0.0027) model time 0.2523 (0.2624) loss 5.8738 (5.7063) grad_norm 3.3362 (2.5355) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:05:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][280/625] eta 0:01:31 lr 0.000355 wd 0.0500 time 0.2518 (0.2651) data time 0.0009 (0.0027) model time 0.2509 (0.2622) loss 6.9368 (5.7091) grad_norm inf (inf) loss_scale 128.0000 (255.5445) mem 9655MB [2024-08-04 08:06:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][290/625] eta 0:01:28 lr 0.000355 wd 0.0500 time 0.2543 (0.2654) data time 0.0009 (0.0026) model time 0.2535 (0.2626) loss 6.2267 (5.7126) grad_norm 1.5427 (inf) loss_scale 128.0000 (251.1615) mem 9655MB [2024-08-04 08:06:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][300/625] eta 0:01:26 lr 0.000355 wd 0.0500 time 0.2528 (0.2651) data time 0.0007 (0.0026) model time 0.2521 (0.2623) loss 5.2707 (5.7113) grad_norm 2.8406 (inf) loss_scale 128.0000 (247.0698) mem 9655MB [2024-08-04 08:06:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][310/625] eta 0:01:23 lr 0.000355 wd 0.0500 time 0.2507 (0.2648) data time 0.0007 (0.0025) model time 0.2499 (0.2621) loss 5.1729 (5.7097) grad_norm 5.9927 (inf) loss_scale 128.0000 (243.2412) mem 9655MB [2024-08-04 08:06:09 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [224/300][320/625] eta 0:01:20 lr 0.000354 wd 0.0500 time 0.2582 (0.2645) data time 0.0008 (0.0025) model time 0.2574 (0.2618) loss 5.4866 (5.7218) grad_norm 1.7937 (inf) loss_scale 128.0000 (239.6511) mem 9655MB [2024-08-04 08:06:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][330/625] eta 0:01:17 lr 0.000354 wd 0.0500 time 0.2548 (0.2642) data time 0.0009 (0.0024) model time 0.2538 (0.2615) loss 6.6556 (5.7223) grad_norm 2.5701 (inf) loss_scale 128.0000 (236.2779) mem 9655MB [2024-08-04 08:06:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][340/625] eta 0:01:15 lr 0.000354 wd 0.0500 time 0.2527 (0.2646) data time 0.0007 (0.0024) model time 0.2520 (0.2620) loss 6.3844 (5.7166) grad_norm 2.3335 (inf) loss_scale 128.0000 (233.1026) mem 9655MB [2024-08-04 08:06:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][350/625] eta 0:01:12 lr 0.000354 wd 0.0500 time 0.2543 (0.2649) data time 0.0011 (0.0023) model time 0.2532 (0.2624) loss 6.1454 (5.7268) grad_norm 2.2384 (inf) loss_scale 128.0000 (230.1083) mem 9655MB [2024-08-04 08:06:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][360/625] eta 0:01:10 lr 0.000354 wd 0.0500 time 0.2545 (0.2647) data time 0.0011 (0.0023) model time 0.2534 (0.2622) loss 5.2472 (5.7333) grad_norm 2.3525 (inf) loss_scale 128.0000 (227.2798) mem 9655MB [2024-08-04 08:06:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][370/625] eta 0:01:07 lr 0.000354 wd 0.0500 time 0.2574 (0.2650) data time 0.0006 (0.0023) model time 0.2568 (0.2627) loss 6.4403 (5.7407) grad_norm 1.4581 (inf) loss_scale 128.0000 (224.6038) mem 9655MB [2024-08-04 08:06:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][380/625] eta 0:01:04 lr 0.000354 wd 0.0500 time 0.2522 (0.2648) data time 0.0009 (0.0022) model time 0.2513 (0.2625) loss 4.7795 (5.7442) 
grad_norm 1.3736 (inf) loss_scale 128.0000 (222.0682) mem 9655MB [2024-08-04 08:06:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][390/625] eta 0:01:02 lr 0.000353 wd 0.0500 time 0.2520 (0.2646) data time 0.0008 (0.0022) model time 0.2512 (0.2623) loss 6.7496 (5.7469) grad_norm 3.3540 (inf) loss_scale 128.0000 (219.6624) mem 9655MB [2024-08-04 08:06:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][400/625] eta 0:00:59 lr 0.000353 wd 0.0500 time 0.2519 (0.2644) data time 0.0009 (0.0022) model time 0.2509 (0.2621) loss 5.9574 (5.7487) grad_norm 2.3985 (inf) loss_scale 128.0000 (217.3766) mem 9655MB [2024-08-04 08:06:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][410/625] eta 0:00:56 lr 0.000353 wd 0.0500 time 0.2571 (0.2642) data time 0.0008 (0.0021) model time 0.2563 (0.2619) loss 4.4200 (5.7525) grad_norm 4.0528 (inf) loss_scale 128.0000 (215.2019) mem 9655MB [2024-08-04 08:06:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][420/625] eta 0:00:54 lr 0.000353 wd 0.0500 time 0.2575 (0.2640) data time 0.0007 (0.0021) model time 0.2568 (0.2617) loss 6.3705 (5.7603) grad_norm 2.4168 (inf) loss_scale 128.0000 (213.1306) mem 9655MB [2024-08-04 08:06:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][430/625] eta 0:00:51 lr 0.000353 wd 0.0500 time 0.2534 (0.2638) data time 0.0007 (0.0021) model time 0.2527 (0.2615) loss 5.7722 (5.7584) grad_norm 3.0621 (inf) loss_scale 128.0000 (211.1555) mem 9655MB [2024-08-04 08:06:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][440/625] eta 0:00:48 lr 0.000353 wd 0.0500 time 0.2630 (0.2641) data time 0.0007 (0.0021) model time 0.2623 (0.2618) loss 5.9519 (5.7537) grad_norm 2.6223 (inf) loss_scale 128.0000 (209.2698) mem 9655MB [2024-08-04 08:06:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][450/625] eta 0:00:46 lr 
0.000353 wd 0.0500 time 0.2558 (0.2643) data time 0.0009 (0.0020) model time 0.2549 (0.2622) loss 5.0182 (5.7534) grad_norm 3.4918 (inf) loss_scale 128.0000 (207.4678) mem 9655MB [2024-08-04 08:06:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][460/625] eta 0:00:43 lr 0.000353 wd 0.0500 time 0.2564 (0.2646) data time 0.0009 (0.0020) model time 0.2555 (0.2625) loss 5.8283 (5.7571) grad_norm 3.4482 (inf) loss_scale 128.0000 (205.7440) mem 9655MB [2024-08-04 08:06:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][470/625] eta 0:00:41 lr 0.000352 wd 0.0500 time 0.2510 (0.2647) data time 0.0010 (0.0020) model time 0.2500 (0.2627) loss 4.6625 (5.7595) grad_norm 2.2759 (inf) loss_scale 128.0000 (204.0934) mem 9655MB [2024-08-04 08:06:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][480/625] eta 0:00:38 lr 0.000352 wd 0.0500 time 0.2515 (0.2646) data time 0.0007 (0.0020) model time 0.2508 (0.2625) loss 5.7830 (5.7619) grad_norm 1.6458 (inf) loss_scale 128.0000 (202.5114) mem 9655MB [2024-08-04 08:06:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][490/625] eta 0:00:35 lr 0.000352 wd 0.0500 time 0.2554 (0.2644) data time 0.0007 (0.0019) model time 0.2547 (0.2624) loss 6.4282 (5.7609) grad_norm 2.3009 (inf) loss_scale 128.0000 (200.9939) mem 9655MB [2024-08-04 08:06:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][500/625] eta 0:00:33 lr 0.000352 wd 0.0500 time 0.2522 (0.2643) data time 0.0009 (0.0019) model time 0.2513 (0.2622) loss 6.6154 (5.7564) grad_norm 1.7755 (inf) loss_scale 128.0000 (199.5369) mem 9655MB [2024-08-04 08:06:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][510/625] eta 0:00:30 lr 0.000352 wd 0.0500 time 0.2512 (0.2645) data time 0.0010 (0.0019) model time 0.2502 (0.2625) loss 4.5156 (5.7520) grad_norm 1.7873 (inf) loss_scale 128.0000 (198.1370) mem 9655MB 
[2024-08-04 08:07:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][520/625] eta 0:00:27 lr 0.000352 wd 0.0500 time 0.2539 (0.2645) data time 0.0007 (0.0019) model time 0.2532 (0.2626) loss 5.3272 (5.7525) grad_norm 1.8364 (inf) loss_scale 128.0000 (196.7908) mem 9655MB [2024-08-04 08:07:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][530/625] eta 0:00:25 lr 0.000352 wd 0.0500 time 0.2543 (0.2651) data time 0.0006 (0.0019) model time 0.2536 (0.2633) loss 5.8094 (5.7543) grad_norm 1.4499 (inf) loss_scale 128.0000 (195.4953) mem 9655MB [2024-08-04 08:07:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][540/625] eta 0:00:22 lr 0.000351 wd 0.0500 time 0.2558 (0.2650) data time 0.0008 (0.0018) model time 0.2550 (0.2631) loss 4.8102 (5.7542) grad_norm 1.4787 (inf) loss_scale 128.0000 (194.2477) mem 9655MB [2024-08-04 08:07:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][550/625] eta 0:00:19 lr 0.000351 wd 0.0500 time 0.2513 (0.2648) data time 0.0011 (0.0018) model time 0.2502 (0.2629) loss 5.2583 (5.7543) grad_norm 2.4029 (inf) loss_scale 128.0000 (193.0454) mem 9655MB [2024-08-04 08:07:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][560/625] eta 0:00:17 lr 0.000351 wd 0.0500 time 0.2538 (0.2647) data time 0.0010 (0.0018) model time 0.2528 (0.2628) loss 6.0661 (5.7563) grad_norm 1.9017 (inf) loss_scale 128.0000 (191.8859) mem 9655MB [2024-08-04 08:07:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][570/625] eta 0:00:14 lr 0.000351 wd 0.0500 time 0.2575 (0.2645) data time 0.0008 (0.0018) model time 0.2567 (0.2626) loss 6.0436 (5.7524) grad_norm 2.1283 (inf) loss_scale 128.0000 (190.7671) mem 9655MB [2024-08-04 08:07:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][580/625] eta 0:00:11 lr 0.000351 wd 0.0500 time 0.2557 (0.2643) data time 0.0007 
(0.0018) model time 0.2550 (0.2625) loss 6.9632 (5.7587) grad_norm 3.6376 (inf) loss_scale 128.0000 (189.6867) mem 9655MB [2024-08-04 08:07:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][590/625] eta 0:00:09 lr 0.000351 wd 0.0500 time 0.2517 (0.2642) data time 0.0009 (0.0018) model time 0.2508 (0.2623) loss 5.8171 (5.7608) grad_norm 1.7744 (inf) loss_scale 128.0000 (188.6430) mem 9655MB [2024-08-04 08:07:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][600/625] eta 0:00:06 lr 0.000351 wd 0.0500 time 0.2561 (0.2644) data time 0.0007 (0.0018) model time 0.2554 (0.2625) loss 5.8755 (5.7552) grad_norm 1.6246 (inf) loss_scale 128.0000 (187.6339) mem 9655MB [2024-08-04 08:07:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][610/625] eta 0:00:03 lr 0.000351 wd 0.0500 time 0.2539 (0.2643) data time 0.0006 (0.0017) model time 0.2534 (0.2624) loss 6.4822 (5.7554) grad_norm 2.0275 (inf) loss_scale 128.0000 (186.6579) mem 9655MB [2024-08-04 08:07:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][620/625] eta 0:00:01 lr 0.000350 wd 0.0500 time 0.2532 (0.2641) data time 0.0003 (0.0017) model time 0.2529 (0.2623) loss 4.2618 (5.7547) grad_norm 2.5990 (inf) loss_scale 128.0000 (185.7134) mem 9655MB [2024-08-04 08:07:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 224 training takes 0:02:45 [2024-08-04 08:07:29 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 08:07:30 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 08:07:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.553 (0.553) Loss 0.5991 (0.5991) Acc@1 90.283 (90.283) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 08:07:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.102) Loss 0.9453 (0.7421) Acc@1 80.469 (86.337) Acc@5 96.045 (97.572) Mem 9655MB [2024-08-04 08:07:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 1.0859 (0.8680) Acc@1 77.100 (83.105) Acc@5 94.385 (96.340) Mem 9655MB [2024-08-04 08:07:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.806 Acc@5 96.351 [2024-08-04 08:07:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.8% [2024-08-04 08:07:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.81% [2024-08-04 08:07:31 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 08:07:32 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 08:07:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.499 (0.499) Loss 0.5815 (0.5815) Acc@1 89.648 (89.648) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 08:07:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 0.9180 (0.7125) Acc@1 80.420 (86.448) Acc@5 95.850 (97.661) Mem 9655MB [2024-08-04 08:07:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0283 (0.8363) Acc@1 76.807 (83.175) Acc@5 95.361 (96.452) Mem 9655MB [2024-08-04 08:07:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.857 Acc@5 96.457 [2024-08-04 08:07:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.9% [2024-08-04 08:07:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.86% [2024-08-04 08:07:34 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 08:07:34 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 08:07:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][0/625] eta 0:07:31 lr 0.000350 wd 0.0500 time 0.7227 (0.7227) data time 0.4785 (0.4785) model time 0.0000 (0.0000) loss 5.8087 (5.8087) grad_norm 2.2711 (2.2711) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:07:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][10/625] eta 0:03:03 lr 0.000350 wd 0.0500 time 0.2534 (0.2979) data time 0.0012 (0.0443) model time 0.0000 (0.0000) loss 5.9618 (5.9575) grad_norm 6.0205 (2.9986) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:07:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][20/625] eta 0:02:52 lr 0.000350 wd 0.0500 time 0.2564 (0.2851) data time 0.0008 (0.0237) model time 0.0000 (0.0000) loss 6.4138 (5.8220) grad_norm 2.2471 (2.8251) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:07:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][30/625] eta 0:02:43 lr 0.000350 wd 0.0500 time 0.2512 (0.2754) data time 0.0008 (0.0163) model time 0.0000 (0.0000) loss 5.8368 (5.6938) grad_norm 3.0729 (2.7084) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:07:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][40/625] eta 0:02:38 lr 0.000350 wd 0.0500 time 0.2505 (0.2705) data time 0.0008 (0.0126) model time 0.0000 (0.0000) loss 5.4972 (5.5931) grad_norm 1.8036 (2.6332) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:07:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][50/625] eta 0:02:36 lr 0.000350 wd 0.0500 time 0.2580 (0.2717) data time 0.0006 (0.0103) model time 0.0000 (0.0000) loss 5.2752 (5.6601) grad_norm 2.5738 (2.6905) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:07:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][60/625] eta 0:02:32 lr 0.000350 wd 0.0500 time 0.2608 (0.2693) data time 
0.0008 (0.0088) model time 0.2601 (0.2559) loss 4.8904 (5.6767) grad_norm 1.8029 (2.6572) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:07:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][70/625] eta 0:02:28 lr 0.000349 wd 0.0500 time 0.2551 (0.2674) data time 0.0009 (0.0077) model time 0.2542 (0.2554) loss 6.3612 (5.6808) grad_norm 2.0464 (2.6469) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:07:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][80/625] eta 0:02:26 lr 0.000349 wd 0.0500 time 0.2531 (0.2681) data time 0.0010 (0.0069) model time 0.2521 (0.2611) loss 6.6385 (5.6886) grad_norm 1.7967 (2.6467) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:07:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][90/625] eta 0:02:22 lr 0.000349 wd 0.0500 time 0.2543 (0.2668) data time 0.0008 (0.0062) model time 0.2536 (0.2595) loss 4.9509 (5.7057) grad_norm 1.7753 (2.5969) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:08:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][100/625] eta 0:02:19 lr 0.000349 wd 0.0500 time 0.2540 (0.2657) data time 0.0011 (0.0057) model time 0.2529 (0.2585) loss 4.5118 (5.7157) grad_norm 2.0729 (2.5699) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:08:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][110/625] eta 0:02:17 lr 0.000349 wd 0.0500 time 0.2549 (0.2665) data time 0.0009 (0.0053) model time 0.2540 (0.2611) loss 5.6060 (5.7564) grad_norm 1.5626 (2.5664) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:08:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][120/625] eta 0:02:14 lr 0.000349 wd 0.0500 time 0.2543 (0.2656) data time 0.0009 (0.0049) model time 0.2534 (0.2602) loss 5.5042 (5.7503) grad_norm 2.1246 (2.5303) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:08:09 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][130/625] eta 0:02:11 lr 0.000349 wd 0.0500 time 0.2570 (0.2649) data time 0.0007 (0.0046) model time 0.2562 (0.2596) loss 5.0216 (5.7517) grad_norm 2.8392 (2.5178) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:08:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][140/625] eta 0:02:08 lr 0.000348 wd 0.0500 time 0.2577 (0.2642) data time 0.0008 (0.0043) model time 0.2569 (0.2591) loss 5.8079 (5.7402) grad_norm 1.8214 (2.5170) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:08:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][150/625] eta 0:02:05 lr 0.000348 wd 0.0500 time 0.2555 (0.2636) data time 0.0007 (0.0041) model time 0.2548 (0.2585) loss 6.3282 (5.7386) grad_norm 2.4390 (2.4964) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:08:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][160/625] eta 0:02:02 lr 0.000348 wd 0.0500 time 0.2537 (0.2631) data time 0.0010 (0.0039) model time 0.2527 (0.2582) loss 6.4419 (5.7224) grad_norm 1.7178 (2.4549) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:08:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][170/625] eta 0:01:59 lr 0.000348 wd 0.0500 time 0.2561 (0.2626) data time 0.0008 (0.0037) model time 0.2553 (0.2578) loss 5.3819 (5.7295) grad_norm 2.6092 (2.4394) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:08:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][180/625] eta 0:01:57 lr 0.000348 wd 0.0500 time 0.2653 (0.2641) data time 0.0009 (0.0036) model time 0.2644 (0.2601) loss 5.7450 (5.7252) grad_norm 2.0201 (2.4358) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:08:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][190/625] eta 0:01:55 lr 0.000348 wd 0.0500 time 0.2551 (0.2645) data time 0.0006 (0.0034) 
model time 0.2545 (0.2610) loss 6.6378 (5.7280) grad_norm 2.9323 (2.4410) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:08:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][200/625] eta 0:01:52 lr 0.000348 wd 0.0500 time 0.2561 (0.2652) data time 0.0007 (0.0033) model time 0.2554 (0.2620) loss 5.7007 (5.7429) grad_norm 1.8906 (2.4472) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:08:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][210/625] eta 0:01:49 lr 0.000348 wd 0.0500 time 0.2539 (0.2648) data time 0.0009 (0.0032) model time 0.2530 (0.2616) loss 6.2764 (5.7546) grad_norm 2.8217 (2.4484) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:08:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][220/625] eta 0:01:47 lr 0.000347 wd 0.0500 time 0.2547 (0.2644) data time 0.0008 (0.0031) model time 0.2540 (0.2613) loss 5.3749 (5.7534) grad_norm 4.2779 (2.4840) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:08:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][230/625] eta 0:01:44 lr 0.000347 wd 0.0500 time 0.2588 (0.2649) data time 0.0008 (0.0030) model time 0.2580 (0.2621) loss 5.9984 (5.7525) grad_norm 2.7121 (2.4773) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:08:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][240/625] eta 0:01:41 lr 0.000347 wd 0.0500 time 0.2575 (0.2646) data time 0.0006 (0.0029) model time 0.2569 (0.2617) loss 5.5508 (5.7414) grad_norm 1.7643 (2.5019) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:08:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][250/625] eta 0:01:39 lr 0.000347 wd 0.0500 time 0.2544 (0.2642) data time 0.0008 (0.0029) model time 0.2536 (0.2613) loss 5.8565 (5.7278) grad_norm 2.6886 (2.5446) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:08:43 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [225/300][260/625] eta 0:01:36 lr 0.000347 wd 0.0500 time 0.2561 (0.2640) data time 0.0007 (0.0028) model time 0.2554 (0.2611) loss 5.3650 (5.7360) grad_norm 1.8387 (2.5275) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:08:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][270/625] eta 0:01:33 lr 0.000347 wd 0.0500 time 0.2604 (0.2637) data time 0.0008 (0.0027) model time 0.2596 (0.2609) loss 5.5955 (5.7359) grad_norm 1.7998 (2.5156) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:08:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][280/625] eta 0:01:30 lr 0.000347 wd 0.0500 time 0.2547 (0.2634) data time 0.0010 (0.0026) model time 0.2536 (0.2606) loss 5.7685 (5.7340) grad_norm 5.0250 (2.5248) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:08:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][290/625] eta 0:01:28 lr 0.000346 wd 0.0500 time 0.2564 (0.2643) data time 0.0007 (0.0026) model time 0.2557 (0.2617) loss 6.8698 (5.7293) grad_norm 2.5033 (2.5656) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:08:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][300/625] eta 0:01:25 lr 0.000346 wd 0.0500 time 0.2556 (0.2640) data time 0.0009 (0.0025) model time 0.2547 (0.2615) loss 5.4713 (5.7315) grad_norm 5.0697 (2.5731) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:08:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][310/625] eta 0:01:23 lr 0.000346 wd 0.0500 time 0.2552 (0.2637) data time 0.0009 (0.0025) model time 0.2544 (0.2612) loss 5.3453 (5.7252) grad_norm 3.4168 (2.5880) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:08:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][320/625] eta 0:01:20 lr 0.000346 wd 0.0500 time 0.2577 (0.2635) data time 0.0008 (0.0024) model time 0.2569 (0.2611) 
loss 6.0033 (5.7249) grad_norm 2.0563 (2.5809) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:09:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][330/625] eta 0:01:18 lr 0.000346 wd 0.0500 time 0.2517 (0.2645) data time 0.0006 (0.0024) model time 0.2511 (0.2623) loss 5.3116 (5.7206) grad_norm 1.6471 (2.5588) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:09:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][340/625] eta 0:01:15 lr 0.000346 wd 0.0500 time 0.2552 (0.2643) data time 0.0008 (0.0023) model time 0.2544 (0.2621) loss 6.8166 (5.7231) grad_norm 1.7782 (2.5432) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:09:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][350/625] eta 0:01:12 lr 0.000346 wd 0.0500 time 0.2589 (0.2641) data time 0.0009 (0.0023) model time 0.2581 (0.2619) loss 4.5191 (5.7301) grad_norm 1.3543 (2.5243) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:09:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][360/625] eta 0:01:09 lr 0.000346 wd 0.0500 time 0.2628 (0.2639) data time 0.0006 (0.0023) model time 0.2622 (0.2617) loss 7.0869 (5.7428) grad_norm 2.1191 (2.5175) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:09:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][370/625] eta 0:01:07 lr 0.000345 wd 0.0500 time 0.2549 (0.2636) data time 0.0009 (0.0022) model time 0.2540 (0.2614) loss 5.9093 (5.7414) grad_norm 1.8604 (2.5044) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:09:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][380/625] eta 0:01:04 lr 0.000345 wd 0.0500 time 0.2551 (0.2634) data time 0.0007 (0.0022) model time 0.2544 (0.2612) loss 4.7563 (5.7365) grad_norm 3.3340 (2.5380) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:09:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [225/300][390/625] eta 0:01:01 lr 0.000345 wd 0.0500 time 0.2522 (0.2632) data time 0.0012 (0.0022) model time 0.2510 (0.2610) loss 5.3226 (5.7367) grad_norm 3.0921 (2.5445) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:09:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][400/625] eta 0:00:59 lr 0.000345 wd 0.0500 time 0.2515 (0.2635) data time 0.0007 (0.0021) model time 0.2508 (0.2614) loss 5.9097 (5.7386) grad_norm 3.1392 (2.5546) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:09:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][410/625] eta 0:00:56 lr 0.000345 wd 0.0500 time 0.4696 (0.2639) data time 0.0009 (0.0021) model time 0.4687 (0.2618) loss 4.8508 (5.7359) grad_norm 1.6191 (2.5731) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:09:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][420/625] eta 0:00:54 lr 0.000345 wd 0.0500 time 0.2567 (0.2636) data time 0.0009 (0.0021) model time 0.2559 (0.2616) loss 6.2634 (5.7370) grad_norm 10.5429 (2.5970) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:09:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][430/625] eta 0:00:51 lr 0.000345 wd 0.0500 time 0.2504 (0.2635) data time 0.0007 (0.0020) model time 0.2497 (0.2614) loss 6.1423 (5.7391) grad_norm 2.2536 (2.6073) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:09:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][440/625] eta 0:00:48 lr 0.000345 wd 0.0500 time 0.2566 (0.2633) data time 0.0007 (0.0020) model time 0.2559 (0.2613) loss 5.3976 (5.7414) grad_norm 2.4433 (2.6075) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:09:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][450/625] eta 0:00:46 lr 0.000344 wd 0.0500 time 0.2553 (0.2632) data time 0.0009 (0.0020) model time 0.2544 (0.2612) loss 4.7194 (5.7345) 
grad_norm 2.7703 (2.6033) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:09:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][460/625] eta 0:00:43 lr 0.000344 wd 0.0500 time 0.2582 (0.2634) data time 0.0008 (0.0020) model time 0.2574 (0.2615) loss 6.1734 (5.7393) grad_norm 2.9197 (2.6102) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:09:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][470/625] eta 0:00:40 lr 0.000344 wd 0.0500 time 0.2501 (0.2636) data time 0.0009 (0.0019) model time 0.2492 (0.2617) loss 5.9486 (5.7391) grad_norm 1.9160 (2.6065) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:09:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][480/625] eta 0:00:38 lr 0.000344 wd 0.0500 time 0.2573 (0.2635) data time 0.0010 (0.0019) model time 0.2563 (0.2616) loss 5.6912 (5.7490) grad_norm 2.6137 (2.5949) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:09:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][490/625] eta 0:00:35 lr 0.000344 wd 0.0500 time 0.2581 (0.2639) data time 0.0010 (0.0019) model time 0.2572 (0.2620) loss 5.3565 (5.7467) grad_norm 3.8666 (2.5964) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:09:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][500/625] eta 0:00:32 lr 0.000344 wd 0.0500 time 0.2594 (0.2637) data time 0.0016 (0.0019) model time 0.2578 (0.2619) loss 6.7347 (5.7490) grad_norm 3.0261 (2.6109) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:09:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][510/625] eta 0:00:30 lr 0.000344 wd 0.0500 time 0.2548 (0.2638) data time 0.0009 (0.0019) model time 0.2539 (0.2620) loss 5.7580 (5.7485) grad_norm 2.0120 (2.6052) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:09:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[225/300][520/625] eta 0:00:27 lr 0.000343 wd 0.0500 time 0.2600 (0.2637) data time 0.0008 (0.0018) model time 0.2592 (0.2619) loss 6.3260 (5.7444) grad_norm 1.7043 (2.5925) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:09:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][530/625] eta 0:00:25 lr 0.000343 wd 0.0500 time 0.2530 (0.2635) data time 0.0011 (0.0018) model time 0.2519 (0.2617) loss 5.8806 (5.7348) grad_norm 4.1653 (2.6063) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:09:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][540/625] eta 0:00:22 lr 0.000343 wd 0.0500 time 0.2608 (0.2634) data time 0.0010 (0.0018) model time 0.2598 (0.2616) loss 5.9166 (5.7312) grad_norm 2.2443 (2.6200) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:09:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][550/625] eta 0:00:19 lr 0.000343 wd 0.0500 time 0.2517 (0.2632) data time 0.0009 (0.0018) model time 0.2508 (0.2614) loss 4.9014 (5.7322) grad_norm 5.8258 (2.6645) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:10:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][560/625] eta 0:00:17 lr 0.000343 wd 0.0500 time 0.2583 (0.2631) data time 0.0008 (0.0018) model time 0.2575 (0.2613) loss 5.4176 (5.7295) grad_norm 2.4433 (2.6739) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:10:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][570/625] eta 0:00:14 lr 0.000343 wd 0.0500 time 0.2600 (0.2633) data time 0.0008 (0.0018) model time 0.2593 (0.2616) loss 5.7527 (5.7248) grad_norm 1.3380 (2.6794) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:10:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][580/625] eta 0:00:11 lr 0.000343 wd 0.0500 time 0.2647 (0.2632) data time 0.0008 (0.0018) model time 0.2639 (0.2615) loss 4.6099 (5.7187) grad_norm 2.4728 
(2.6766) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:10:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][590/625] eta 0:00:09 lr 0.000343 wd 0.0500 time 0.2560 (0.2631) data time 0.0008 (0.0017) model time 0.2552 (0.2614) loss 6.3130 (5.7235) grad_norm 1.7557 (2.7568) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:10:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][600/625] eta 0:00:06 lr 0.000342 wd 0.0500 time 0.2542 (0.2630) data time 0.0008 (0.0017) model time 0.2534 (0.2613) loss 5.5984 (5.7263) grad_norm 2.2494 (2.7485) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:10:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][610/625] eta 0:00:03 lr 0.000342 wd 0.0500 time 0.2529 (0.2629) data time 0.0004 (0.0017) model time 0.2525 (0.2611) loss 5.5609 (5.7273) grad_norm 3.2615 (2.7634) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:10:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][620/625] eta 0:00:01 lr 0.000342 wd 0.0500 time 0.2538 (0.2627) data time 0.0003 (0.0017) model time 0.2534 (0.2610) loss 5.5985 (5.7308) grad_norm 1.8372 (2.7550) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:10:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 225 training takes 0:02:44 [2024-08-04 08:10:18 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 08:10:19 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 08:10:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.517 (0.517) Loss 0.6118 (0.6118) Acc@1 89.941 (89.941) Acc@5 98.486 (98.486) Mem 9655MB [2024-08-04 08:10:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 0.9512 (0.7344) Acc@1 79.834 (86.235) Acc@5 95.996 (97.528) Mem 9655MB [2024-08-04 08:10:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0430 (0.8581) Acc@1 76.953 (83.094) Acc@5 95.264 (96.387) Mem 9655MB [2024-08-04 08:10:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.823 Acc@5 96.391 [2024-08-04 08:10:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.8% [2024-08-04 08:10:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.82% [2024-08-04 08:10:21 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 08:10:21 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 08:10:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.518 (0.518) Loss 0.5825 (0.5825) Acc@1 89.600 (89.600) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 08:10:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 0.9170 (0.7123) Acc@1 80.420 (86.470) Acc@5 95.850 (97.656) Mem 9655MB [2024-08-04 08:10:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0283 (0.8359) Acc@1 77.051 (83.205) Acc@5 95.264 (96.456) Mem 9655MB [2024-08-04 08:10:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.885 Acc@5 96.457 [2024-08-04 08:10:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.9% [2024-08-04 08:10:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.89% [2024-08-04 08:10:23 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 08:10:24 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 08:10:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][0/625] eta 0:07:48 lr 0.000342 wd 0.0500 time 0.7490 (0.7490) data time 0.5099 (0.5099) model time 0.0000 (0.0000) loss 6.3494 (6.3494) grad_norm 2.7005 (2.7005) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:10:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][10/625] eta 0:03:12 lr 0.000342 wd 0.0500 time 0.3859 (0.3132) data time 0.0006 (0.0472) model time 0.0000 (0.0000) loss 4.8356 (5.8433) grad_norm 1.9085 (1.9357) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:10:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][20/625] eta 0:02:58 lr 0.000342 wd 0.0500 time 0.2577 (0.2946) data time 0.0008 (0.0251) model time 0.0000 (0.0000) loss 6.2080 (5.7494) grad_norm 2.7959 (2.2289) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:10:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][30/625] eta 0:02:51 lr 0.000342 wd 0.0500 time 0.2604 (0.2887) data time 0.0007 (0.0173) model time 0.0000 (0.0000) loss 5.0163 (5.8065) grad_norm 2.1241 (2.2833) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:10:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][40/625] eta 0:02:44 lr 0.000342 wd 0.0500 time 0.2574 (0.2805) data time 0.0006 (0.0133) model time 0.0000 (0.0000) loss 6.2029 (5.8323) grad_norm 2.8088 (2.2507) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:10:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][50/625] eta 0:02:41 lr 0.000341 wd 0.0500 time 0.2568 (0.2801) data time 0.0007 (0.0109) model time 0.0000 (0.0000) loss 5.7979 (5.8221) grad_norm 2.1340 (2.2396) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:10:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][60/625] eta 0:02:37 lr 0.000341 wd 0.0500 time 0.3828 (0.2781) data time 
0.0007 (0.0093) model time 0.3821 (0.2668) loss 5.0183 (5.7663) grad_norm 2.6576 (2.3055) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:10:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][70/625] eta 0:02:32 lr 0.000341 wd 0.0500 time 0.2599 (0.2750) data time 0.0007 (0.0081) model time 0.2592 (0.2609) loss 5.9576 (5.7593) grad_norm 2.1051 (2.2878) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:10:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][80/625] eta 0:02:28 lr 0.000341 wd 0.0500 time 0.2677 (0.2727) data time 0.0009 (0.0072) model time 0.2668 (0.2591) loss 4.8448 (5.7394) grad_norm 2.1840 (2.2607) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:10:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][90/625] eta 0:02:24 lr 0.000341 wd 0.0500 time 0.2585 (0.2708) data time 0.0009 (0.0065) model time 0.2576 (0.2580) loss 4.9468 (5.7084) grad_norm 1.7303 (2.2996) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:10:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][100/625] eta 0:02:21 lr 0.000341 wd 0.0500 time 0.2568 (0.2693) data time 0.0008 (0.0060) model time 0.2561 (0.2573) loss 6.0343 (5.6981) grad_norm 2.1592 (2.2721) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:10:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][110/625] eta 0:02:18 lr 0.000341 wd 0.0500 time 0.2535 (0.2681) data time 0.0010 (0.0055) model time 0.2525 (0.2569) loss 5.0397 (5.7026) grad_norm 1.8293 (2.2523) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:10:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][120/625] eta 0:02:14 lr 0.000341 wd 0.0500 time 0.2533 (0.2671) data time 0.0011 (0.0052) model time 0.2522 (0.2566) loss 4.8506 (5.7135) grad_norm 2.6292 (2.2658) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:10:59 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][130/625] eta 0:02:11 lr 0.000340 wd 0.0500 time 0.2567 (0.2663) data time 0.0007 (0.0048) model time 0.2560 (0.2565) loss 5.5129 (5.7008) grad_norm 3.8419 (2.3461) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:11:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][140/625] eta 0:02:10 lr 0.000340 wd 0.0500 time 0.4186 (0.2681) data time 0.0009 (0.0046) model time 0.4177 (0.2604) loss 6.0221 (5.6941) grad_norm 3.4395 (2.3840) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:11:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][150/625] eta 0:02:06 lr 0.000340 wd 0.0500 time 0.2567 (0.2673) data time 0.0008 (0.0043) model time 0.2558 (0.2598) loss 4.8408 (5.7044) grad_norm 2.3231 (2.3807) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:11:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][160/625] eta 0:02:03 lr 0.000340 wd 0.0500 time 0.2565 (0.2666) data time 0.0015 (0.0041) model time 0.2550 (0.2593) loss 5.9049 (5.6885) grad_norm 2.3041 (2.3834) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:11:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][170/625] eta 0:02:01 lr 0.000340 wd 0.0500 time 0.2571 (0.2660) data time 0.0009 (0.0040) model time 0.2563 (0.2590) loss 6.7851 (5.6902) grad_norm 2.9280 (2.3864) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:11:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][180/625] eta 0:01:58 lr 0.000340 wd 0.0500 time 0.2542 (0.2655) data time 0.0011 (0.0038) model time 0.2531 (0.2587) loss 6.6143 (5.6979) grad_norm 1.7106 (2.3694) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:11:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][190/625] eta 0:01:55 lr 0.000340 wd 0.0500 time 0.2551 (0.2667) data time 0.0008 (0.0036) 
model time 0.2543 (0.2608) loss 6.1959 (5.6929) grad_norm 3.2594 (2.4822) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:11:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][200/625] eta 0:01:53 lr 0.000339 wd 0.0500 time 0.2618 (0.2662) data time 0.0009 (0.0035) model time 0.2608 (0.2605) loss 5.9551 (5.7033) grad_norm 2.9137 (2.5178) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:11:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][210/625] eta 0:01:50 lr 0.000339 wd 0.0500 time 0.2521 (0.2664) data time 0.0007 (0.0034) model time 0.2514 (0.2611) loss 6.1251 (5.7263) grad_norm 2.6172 (2.5032) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:11:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][220/625] eta 0:01:48 lr 0.000339 wd 0.0500 time 0.2546 (0.2667) data time 0.0009 (0.0033) model time 0.2537 (0.2617) loss 5.2531 (5.7342) grad_norm 1.4912 (2.4876) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:11:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][230/625] eta 0:01:45 lr 0.000339 wd 0.0500 time 0.2588 (0.2672) data time 0.0013 (0.0032) model time 0.2574 (0.2626) loss 5.5077 (5.7331) grad_norm 2.3770 (2.5103) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:11:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][240/625] eta 0:01:42 lr 0.000339 wd 0.0500 time 0.2581 (0.2667) data time 0.0007 (0.0031) model time 0.2574 (0.2622) loss 6.0397 (5.7336) grad_norm 2.0704 (2.5098) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:11:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][250/625] eta 0:01:39 lr 0.000339 wd 0.0500 time 0.2544 (0.2662) data time 0.0010 (0.0030) model time 0.2534 (0.2617) loss 6.5617 (5.7341) grad_norm 3.9419 (2.5102) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:11:33 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [226/300][260/625] eta 0:01:37 lr 0.000339 wd 0.0500 time 0.2529 (0.2658) data time 0.0010 (0.0029) model time 0.2518 (0.2614) loss 6.3913 (5.7277) grad_norm 3.0803 (2.5267) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:11:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][270/625] eta 0:01:34 lr 0.000339 wd 0.0500 time 0.2583 (0.2655) data time 0.0008 (0.0029) model time 0.2575 (0.2611) loss 5.4908 (5.7252) grad_norm 2.2856 (2.5380) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:11:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][280/625] eta 0:01:31 lr 0.000338 wd 0.0500 time 0.2559 (0.2651) data time 0.0008 (0.0028) model time 0.2550 (0.2608) loss 5.6901 (5.7271) grad_norm 2.6229 (2.5294) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:11:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][290/625] eta 0:01:28 lr 0.000338 wd 0.0500 time 0.2602 (0.2648) data time 0.0008 (0.0027) model time 0.2594 (0.2605) loss 4.6460 (5.7380) grad_norm 2.8314 (2.5349) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:11:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][300/625] eta 0:01:25 lr 0.000338 wd 0.0500 time 0.2569 (0.2645) data time 0.0009 (0.0027) model time 0.2560 (0.2603) loss 5.7572 (5.7462) grad_norm 2.5899 (2.5478) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:11:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][310/625] eta 0:01:23 lr 0.000338 wd 0.0500 time 0.2575 (0.2642) data time 0.0006 (0.0026) model time 0.2568 (0.2601) loss 5.2851 (5.7443) grad_norm 2.3003 (2.5558) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:11:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][320/625] eta 0:01:20 lr 0.000338 wd 0.0500 time 0.2576 (0.2640) data time 0.0009 (0.0026) model time 0.2566 (0.2599) 
loss 6.0484 (5.7551) grad_norm 1.6742 (2.5568) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:11:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][330/625] eta 0:01:17 lr 0.000338 wd 0.0500 time 0.2552 (0.2637) data time 0.0008 (0.0025) model time 0.2544 (0.2597) loss 5.2792 (5.7571) grad_norm 2.5073 (2.5495) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:11:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][340/625] eta 0:01:15 lr 0.000338 wd 0.0500 time 0.2551 (0.2645) data time 0.0006 (0.0025) model time 0.2545 (0.2608) loss 6.0171 (5.7580) grad_norm 3.4678 (2.5420) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:11:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][350/625] eta 0:01:12 lr 0.000337 wd 0.0500 time 0.2555 (0.2643) data time 0.0006 (0.0024) model time 0.2549 (0.2606) loss 4.7342 (5.7514) grad_norm 2.7357 (2.5363) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:11:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][360/625] eta 0:01:10 lr 0.000337 wd 0.0500 time 0.2614 (0.2646) data time 0.0008 (0.0024) model time 0.2605 (0.2611) loss 6.9209 (5.7464) grad_norm 3.5589 (2.5494) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:12:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][370/625] eta 0:01:07 lr 0.000337 wd 0.0500 time 0.2551 (0.2644) data time 0.0009 (0.0023) model time 0.2543 (0.2609) loss 4.5029 (5.7417) grad_norm 2.0051 (2.5399) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:12:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][380/625] eta 0:01:04 lr 0.000337 wd 0.0500 time 0.2578 (0.2642) data time 0.0008 (0.0023) model time 0.2570 (0.2607) loss 4.9259 (5.7412) grad_norm 2.0343 (2.5280) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:12:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [226/300][390/625] eta 0:01:02 lr 0.000337 wd 0.0500 time 0.2529 (0.2639) data time 0.0010 (0.0023) model time 0.2519 (0.2605) loss 5.5374 (5.7391) grad_norm 3.2568 (2.5252) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:12:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][400/625] eta 0:00:59 lr 0.000337 wd 0.0500 time 0.2577 (0.2642) data time 0.0010 (0.0022) model time 0.2567 (0.2609) loss 5.0441 (5.7377) grad_norm 3.2026 (2.5381) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:12:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][410/625] eta 0:00:56 lr 0.000337 wd 0.0500 time 0.2514 (0.2640) data time 0.0011 (0.0022) model time 0.2503 (0.2607) loss 5.1695 (5.7471) grad_norm 1.6936 (2.5466) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:12:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][420/625] eta 0:00:54 lr 0.000337 wd 0.0500 time 0.2543 (0.2638) data time 0.0006 (0.0022) model time 0.2536 (0.2606) loss 6.4208 (5.7493) grad_norm 2.3858 (2.5384) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:12:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][430/625] eta 0:00:51 lr 0.000336 wd 0.0500 time 0.2632 (0.2637) data time 0.0006 (0.0022) model time 0.2626 (0.2605) loss 4.8972 (5.7461) grad_norm 2.0353 (2.5278) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:12:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][440/625] eta 0:00:48 lr 0.000336 wd 0.0500 time 0.2564 (0.2639) data time 0.0008 (0.0021) model time 0.2556 (0.2608) loss 5.9533 (5.7435) grad_norm 3.1155 (2.5321) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:12:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][450/625] eta 0:00:46 lr 0.000336 wd 0.0500 time 0.2550 (0.2637) data time 0.0011 (0.0021) model time 0.2539 (0.2607) loss 5.5925 (5.7341) grad_norm 
1.5782 (2.5303) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:12:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][460/625] eta 0:00:43 lr 0.000336 wd 0.0500 time 0.2564 (0.2643) data time 0.0007 (0.0021) model time 0.2558 (0.2613) loss 5.1389 (5.7365) grad_norm 2.1831 (2.5240) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:12:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][470/625] eta 0:00:40 lr 0.000336 wd 0.0500 time 0.2560 (0.2641) data time 0.0009 (0.0020) model time 0.2550 (0.2612) loss 6.3609 (5.7379) grad_norm 4.6818 (2.6171) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:12:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][480/625] eta 0:00:38 lr 0.000336 wd 0.0500 time 0.2563 (0.2639) data time 0.0007 (0.0020) model time 0.2556 (0.2610) loss 4.9214 (5.7317) grad_norm 2.7233 (2.6237) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:12:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][490/625] eta 0:00:35 lr 0.000336 wd 0.0500 time 0.2501 (0.2638) data time 0.0010 (0.0020) model time 0.2491 (0.2609) loss 6.4689 (5.7423) grad_norm 1.6227 (2.6197) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:12:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][500/625] eta 0:00:33 lr 0.000336 wd 0.0500 time 0.2519 (0.2640) data time 0.0008 (0.0020) model time 0.2510 (0.2612) loss 5.7402 (5.7454) grad_norm 2.4164 (2.6214) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:12:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][510/625] eta 0:00:30 lr 0.000335 wd 0.0500 time 0.2545 (0.2641) data time 0.0008 (0.0020) model time 0.2537 (0.2614) loss 6.2104 (5.7396) grad_norm 2.2252 (2.6180) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:12:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][520/625] eta 
0:00:27 lr 0.000335 wd 0.0500 time 0.2562 (0.2640) data time 0.0008 (0.0019) model time 0.2554 (0.2612) loss 5.4307 (5.7439) grad_norm 1.9678 (2.6402) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:12:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][530/625] eta 0:00:25 lr 0.000335 wd 0.0500 time 0.2522 (0.2638) data time 0.0009 (0.0019) model time 0.2514 (0.2611) loss 6.4137 (5.7457) grad_norm 1.8262 (2.6440) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:12:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][540/625] eta 0:00:22 lr 0.000335 wd 0.0500 time 0.2562 (0.2637) data time 0.0009 (0.0019) model time 0.2552 (0.2610) loss 6.5868 (5.7402) grad_norm 3.2175 (2.6420) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:12:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][550/625] eta 0:00:19 lr 0.000335 wd 0.0500 time 0.2539 (0.2635) data time 0.0007 (0.0019) model time 0.2532 (0.2609) loss 6.0302 (5.7412) grad_norm 3.8393 (2.6496) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:12:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][560/625] eta 0:00:17 lr 0.000335 wd 0.0500 time 0.2529 (0.2634) data time 0.0007 (0.0019) model time 0.2522 (0.2607) loss 5.7485 (5.7408) grad_norm 4.1334 (2.6625) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:12:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][570/625] eta 0:00:14 lr 0.000335 wd 0.0500 time 0.2540 (0.2632) data time 0.0009 (0.0019) model time 0.2531 (0.2606) loss 5.8216 (5.7380) grad_norm 1.2277 (2.6731) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:12:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][580/625] eta 0:00:11 lr 0.000335 wd 0.0500 time 0.2525 (0.2632) data time 0.0007 (0.0018) model time 0.2518 (0.2606) loss 5.7323 (5.7452) grad_norm 3.6195 (2.6751) loss_scale 
128.0000 (128.0000) mem 9655MB [2024-08-04 08:12:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][590/625] eta 0:00:09 lr 0.000334 wd 0.0500 time 0.2534 (0.2631) data time 0.0010 (0.0018) model time 0.2524 (0.2605) loss 6.3711 (5.7403) grad_norm 5.0886 (2.6901) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:13:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][600/625] eta 0:00:06 lr 0.000334 wd 0.0500 time 0.2554 (0.2632) data time 0.0011 (0.0018) model time 0.2543 (0.2606) loss 5.2643 (5.7361) grad_norm 2.3587 (2.7031) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:13:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][610/625] eta 0:00:03 lr 0.000334 wd 0.0500 time 0.2536 (0.2630) data time 0.0006 (0.0018) model time 0.2530 (0.2605) loss 6.2295 (5.7401) grad_norm 1.7307 (2.6963) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:13:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][620/625] eta 0:00:01 lr 0.000334 wd 0.0500 time 0.2528 (0.2629) data time 0.0004 (0.0018) model time 0.2523 (0.2604) loss 5.2829 (5.7422) grad_norm 1.9080 (2.6889) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:13:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 226 training takes 0:02:44 [2024-08-04 08:13:08 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 08:13:09 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 08:13:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.512 (0.512) Loss 0.6104 (0.6104) Acc@1 89.111 (89.111) Acc@5 98.779 (98.779) Mem 9655MB [2024-08-04 08:13:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.099) Loss 0.9434 (0.7350) Acc@1 80.371 (86.346) Acc@5 95.898 (97.647) Mem 9655MB [2024-08-04 08:13:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0371 (0.8540) Acc@1 76.807 (83.082) Acc@5 95.020 (96.382) Mem 9655MB [2024-08-04 08:13:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.800 Acc@5 96.395 [2024-08-04 08:13:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.8% [2024-08-04 08:13:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.718 (0.718) Loss 0.5820 (0.5820) Acc@1 89.648 (89.648) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 08:13:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.127) Loss 0.9170 (0.7123) Acc@1 80.664 (86.457) Acc@5 95.947 (97.674) Mem 9655MB [2024-08-04 08:13:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 1.0293 (0.8359) Acc@1 77.246 (83.205) Acc@5 95.312 (96.477) Mem 9655MB [2024-08-04 08:13:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.889 Acc@5 96.477 [2024-08-04 08:13:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.9% [2024-08-04 08:13:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.89% [2024-08-04 08:13:13 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 08:13:13 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 08:13:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][0/625] eta 0:07:52 lr 0.000334 wd 0.0500 time 0.7554 (0.7554) data time 0.4995 (0.4995) model time 0.0000 (0.0000) loss 6.8962 (6.8962) grad_norm 1.9761 (1.9761) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:13:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][10/625] eta 0:03:27 lr 0.000334 wd 0.0500 time 0.2529 (0.3377) data time 0.0008 (0.0463) model time 0.0000 (0.0000) loss 6.2986 (5.6270) grad_norm 1.7530 (1.9805) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:13:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][20/625] eta 0:03:00 lr 0.000334 wd 0.0500 time 0.2571 (0.2985) data time 0.0008 (0.0247) model time 0.0000 (0.0000) loss 6.3540 (5.7631) grad_norm 1.3726 (1.8913) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:13:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][30/625] eta 0:02:49 lr 0.000334 wd 0.0500 time 0.2543 (0.2844) data time 0.0005 (0.0170) model time 0.0000 (0.0000) loss 5.3878 (5.7689) grad_norm 3.0622 (2.0749) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:13:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][40/625] eta 0:02:42 lr 0.000333 wd 0.0500 time 0.2554 (0.2773) data time 0.0012 (0.0131) model time 0.0000 (0.0000) loss 6.4123 (5.8184) grad_norm 2.7324 (2.1594) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:13:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][50/625] eta 0:02:36 lr 0.000333 wd 0.0500 time 0.2553 (0.2729) data time 0.0009 (0.0107) model time 0.0000 (0.0000) loss 5.4148 (5.7677) grad_norm 2.7536 (2.2708) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 
08:13:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][60/625] eta 0:02:32 lr 0.000333 wd 0.0500 time 0.2538 (0.2702) data time 0.0007 (0.0091) model time 0.2531 (0.2554) loss 6.3915 (5.7402) grad_norm 2.2813 (2.2859) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:13:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][70/625] eta 0:02:28 lr 0.000333 wd 0.0500 time 0.2536 (0.2679) data time 0.0010 (0.0079) model time 0.2526 (0.2544) loss 6.1191 (5.7594) grad_norm 1.9522 (2.2656) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:13:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][80/625] eta 0:02:27 lr 0.000333 wd 0.0500 time 0.2642 (0.2711) data time 0.0008 (0.0071) model time 0.2634 (0.2670) loss 6.7653 (5.7508) grad_norm 3.6805 (2.3700) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:13:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][90/625] eta 0:02:24 lr 0.000333 wd 0.0500 time 0.2542 (0.2693) data time 0.0011 (0.0064) model time 0.2531 (0.2637) loss 6.8394 (5.7793) grad_norm 1.9084 (2.3486) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:13:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][100/625] eta 0:02:21 lr 0.000333 wd 0.0500 time 0.4443 (0.2698) data time 0.0008 (0.0059) model time 0.4435 (0.2656) loss 5.7159 (5.7563) grad_norm 1.7609 (2.2960) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:13:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][110/625] eta 0:02:18 lr 0.000332 wd 0.0500 time 0.2571 (0.2685) data time 0.0010 (0.0054) model time 0.2561 (0.2639) loss 6.4084 (5.7749) grad_norm 1.8407 (2.2666) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:13:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][120/625] eta 0:02:15 lr 0.000332 wd 0.0500 time 0.2561 (0.2675) data time 0.0006 
(0.0051) model time 0.2555 (0.2626) loss 5.5930 (5.7818) grad_norm 2.8697 (2.2983) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:13:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][130/625] eta 0:02:11 lr 0.000332 wd 0.0500 time 0.2576 (0.2665) data time 0.0009 (0.0048) model time 0.2567 (0.2615) loss 5.0219 (5.8077) grad_norm 3.1245 (2.3518) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:13:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][140/625] eta 0:02:09 lr 0.000332 wd 0.0500 time 0.2666 (0.2671) data time 0.0006 (0.0045) model time 0.2660 (0.2628) loss 5.6467 (5.8005) grad_norm 2.5419 (2.3680) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:13:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][150/625] eta 0:02:07 lr 0.000332 wd 0.0500 time 0.2573 (0.2677) data time 0.0006 (0.0042) model time 0.2568 (0.2642) loss 6.3853 (5.7953) grad_norm 2.0611 (2.3570) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:13:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][160/625] eta 0:02:04 lr 0.000332 wd 0.0500 time 0.2552 (0.2670) data time 0.0011 (0.0040) model time 0.2541 (0.2633) loss 6.5503 (5.7712) grad_norm 1.7979 (2.3533) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:13:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][170/625] eta 0:02:01 lr 0.000332 wd 0.0500 time 0.2527 (0.2664) data time 0.0007 (0.0039) model time 0.2519 (0.2627) loss 6.1149 (5.7660) grad_norm 1.5455 (2.3577) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:14:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][180/625] eta 0:01:58 lr 0.000332 wd 0.0500 time 0.2658 (0.2660) data time 0.0013 (0.0037) model time 0.2645 (0.2624) loss 6.5376 (5.7687) grad_norm 1.9537 (2.3482) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:14:04 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][190/625] eta 0:01:55 lr 0.000331 wd 0.0500 time 0.2567 (0.2655) data time 0.0011 (0.0036) model time 0.2556 (0.2619) loss 6.0766 (5.7551) grad_norm 4.5979 (2.3547) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:14:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][200/625] eta 0:01:52 lr 0.000331 wd 0.0500 time 0.2577 (0.2651) data time 0.0007 (0.0034) model time 0.2569 (0.2614) loss 6.6862 (5.7499) grad_norm 2.4735 (2.3474) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:14:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][210/625] eta 0:01:50 lr 0.000331 wd 0.0500 time 0.2546 (0.2652) data time 0.0007 (0.0033) model time 0.2539 (0.2618) loss 5.9943 (5.7468) grad_norm 2.8212 (2.3591) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:14:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][220/625] eta 0:01:47 lr 0.000331 wd 0.0500 time 0.2523 (0.2648) data time 0.0008 (0.0032) model time 0.2515 (0.2615) loss 4.9907 (5.7524) grad_norm 1.8365 (2.3647) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:14:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][230/625] eta 0:01:44 lr 0.000331 wd 0.0500 time 0.2563 (0.2644) data time 0.0009 (0.0031) model time 0.2554 (0.2611) loss 6.2655 (5.7609) grad_norm 1.9026 (2.3678) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:14:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][240/625] eta 0:01:41 lr 0.000331 wd 0.0500 time 0.2560 (0.2641) data time 0.0011 (0.0030) model time 0.2549 (0.2608) loss 6.3449 (5.7552) grad_norm 3.4551 (2.3821) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:14:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][250/625] eta 0:01:39 lr 0.000331 wd 0.0500 time 0.2505 (0.2643) data time 0.0009 (0.0029) 
model time 0.2495 (0.2612) loss 6.2669 (5.7644) grad_norm 2.0315 (2.3791) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:14:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][260/625] eta 0:01:36 lr 0.000331 wd 0.0500 time 0.2521 (0.2640) data time 0.0010 (0.0028) model time 0.2512 (0.2609) loss 5.9649 (5.7775) grad_norm 1.7445 (2.3688) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:14:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][270/625] eta 0:01:33 lr 0.000330 wd 0.0500 time 0.2488 (0.2637) data time 0.0008 (0.0028) model time 0.2481 (0.2606) loss 5.2239 (5.7809) grad_norm 1.7594 (2.3660) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:14:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][280/625] eta 0:01:30 lr 0.000330 wd 0.0500 time 0.2515 (0.2634) data time 0.0010 (0.0027) model time 0.2505 (0.2604) loss 5.1825 (5.7838) grad_norm 1.6194 (2.3679) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:14:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][290/625] eta 0:01:28 lr 0.000330 wd 0.0500 time 0.2519 (0.2631) data time 0.0008 (0.0027) model time 0.2512 (0.2601) loss 6.1279 (5.7888) grad_norm 2.8682 (2.3898) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:14:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][300/625] eta 0:01:25 lr 0.000330 wd 0.0500 time 0.4346 (0.2635) data time 0.0008 (0.0026) model time 0.4337 (0.2606) loss 5.5572 (5.7929) grad_norm 2.3465 (2.4026) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:14:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][310/625] eta 0:01:22 lr 0.000330 wd 0.0500 time 0.2541 (0.2632) data time 0.0008 (0.0026) model time 0.2533 (0.2603) loss 4.8885 (5.7927) grad_norm 2.9081 (2.4338) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:14:37 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [227/300][320/625] eta 0:01:20 lr 0.000330 wd 0.0500 time 0.2534 (0.2630) data time 0.0011 (0.0025) model time 0.2523 (0.2601) loss 4.9580 (5.7896) grad_norm 2.2755 (2.4295) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:14:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][330/625] eta 0:01:17 lr 0.000330 wd 0.0500 time 0.4628 (0.2634) data time 0.0016 (0.0025) model time 0.4612 (0.2607) loss 6.3081 (5.7910) grad_norm 3.6156 (2.4470) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:14:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][340/625] eta 0:01:15 lr 0.000330 wd 0.0500 time 0.4413 (0.2642) data time 0.0011 (0.0024) model time 0.4402 (0.2617) loss 5.2333 (5.7865) grad_norm 3.6127 (2.4541) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:14:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][350/625] eta 0:01:12 lr 0.000329 wd 0.0500 time 0.2551 (0.2639) data time 0.0008 (0.0024) model time 0.2542 (0.2614) loss 5.7458 (5.7810) grad_norm 2.5230 (2.4535) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:14:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][360/625] eta 0:01:09 lr 0.000329 wd 0.0500 time 0.2562 (0.2637) data time 0.0008 (0.0023) model time 0.2554 (0.2612) loss 5.3551 (5.7774) grad_norm 3.5914 (2.4734) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:14:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][370/625] eta 0:01:07 lr 0.000329 wd 0.0500 time 0.2561 (0.2640) data time 0.0009 (0.0023) model time 0.2551 (0.2616) loss 6.3724 (5.7694) grad_norm 4.4196 (2.5016) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:14:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][380/625] eta 0:01:04 lr 0.000329 wd 0.0500 time 0.2540 (0.2638) data time 0.0006 (0.0023) model time 0.2533 (0.2614) 
loss 5.9319 (5.7638) grad_norm 3.6099 (2.5348) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:14:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][390/625] eta 0:01:01 lr 0.000329 wd 0.0500 time 0.2566 (0.2636) data time 0.0007 (0.0022) model time 0.2560 (0.2613) loss 6.7336 (5.7614) grad_norm 2.6083 (2.5243) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:14:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][400/625] eta 0:00:59 lr 0.000329 wd 0.0500 time 0.2564 (0.2634) data time 0.0009 (0.0022) model time 0.2555 (0.2611) loss 5.8615 (5.7687) grad_norm 2.4491 (2.5174) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 08:15:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][410/625] eta 0:00:56 lr 0.000329 wd 0.0500 time 0.2571 (0.2632) data time 0.0013 (0.0022) model time 0.2558 (0.2609) loss 5.9444 (5.7743) grad_norm 2.3730 (2.5150) loss_scale 256.0000 (129.8686) mem 9655MB [2024-08-04 08:15:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][420/625] eta 0:00:53 lr 0.000328 wd 0.0500 time 0.2541 (0.2631) data time 0.0009 (0.0021) model time 0.2532 (0.2608) loss 5.4873 (5.7717) grad_norm 2.0529 (2.5121) loss_scale 256.0000 (132.8646) mem 9655MB [2024-08-04 08:15:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][430/625] eta 0:00:51 lr 0.000328 wd 0.0500 time 0.2520 (0.2629) data time 0.0008 (0.0021) model time 0.2512 (0.2606) loss 5.4403 (5.7736) grad_norm 1.6923 (2.4991) loss_scale 256.0000 (135.7216) mem 9655MB [2024-08-04 08:15:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][440/625] eta 0:00:48 lr 0.000328 wd 0.0500 time 0.2564 (0.2636) data time 0.0013 (0.0021) model time 0.2552 (0.2615) loss 5.8230 (5.7779) grad_norm 1.6473 (2.4832) loss_scale 256.0000 (138.4490) mem 9655MB [2024-08-04 08:15:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [227/300][450/625] eta 0:00:46 lr 0.000328 wd 0.0500 time 0.2546 (0.2638) data time 0.0010 (0.0021) model time 0.2536 (0.2616) loss 4.6957 (5.7730) grad_norm 2.5315 (2.4850) loss_scale 256.0000 (141.0554) mem 9655MB [2024-08-04 08:15:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][460/625] eta 0:00:43 lr 0.000328 wd 0.0500 time 0.2520 (0.2640) data time 0.0007 (0.0020) model time 0.2513 (0.2620) loss 6.3256 (5.7723) grad_norm 2.2671 (2.4904) loss_scale 256.0000 (143.5488) mem 9655MB [2024-08-04 08:15:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][470/625] eta 0:00:40 lr 0.000328 wd 0.0500 time 0.2521 (0.2638) data time 0.0011 (0.0020) model time 0.2510 (0.2618) loss 5.2934 (5.7750) grad_norm 1.9958 (2.4837) loss_scale 256.0000 (145.9363) mem 9655MB [2024-08-04 08:15:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][480/625] eta 0:00:38 lr 0.000328 wd 0.0500 time 0.2532 (0.2637) data time 0.0009 (0.0020) model time 0.2523 (0.2616) loss 6.1171 (5.7744) grad_norm 2.9014 (2.4765) loss_scale 256.0000 (148.2245) mem 9655MB [2024-08-04 08:15:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][490/625] eta 0:00:35 lr 0.000328 wd 0.0500 time 0.2559 (0.2635) data time 0.0008 (0.0020) model time 0.2551 (0.2615) loss 5.4934 (5.7780) grad_norm 2.2930 (2.4674) loss_scale 256.0000 (150.4196) mem 9655MB [2024-08-04 08:15:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][500/625] eta 0:00:32 lr 0.000327 wd 0.0500 time 0.2633 (0.2634) data time 0.0008 (0.0019) model time 0.2625 (0.2614) loss 5.8998 (5.7779) grad_norm 2.1425 (2.4677) loss_scale 256.0000 (152.5269) mem 9655MB [2024-08-04 08:15:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][510/625] eta 0:00:30 lr 0.000327 wd 0.0500 time 0.2583 (0.2635) data time 0.0012 (0.0019) model time 0.2571 (0.2615) loss 5.7312 (5.7816) grad_norm 
2.7743 (2.4717) loss_scale 256.0000 (154.5519) mem 9655MB [2024-08-04 08:15:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][520/625] eta 0:00:27 lr 0.000327 wd 0.0500 time 0.2580 (0.2634) data time 0.0009 (0.0019) model time 0.2571 (0.2614) loss 5.6372 (5.7784) grad_norm 3.1068 (2.4944) loss_scale 256.0000 (156.4990) mem 9655MB [2024-08-04 08:15:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][530/625] eta 0:00:25 lr 0.000327 wd 0.0500 time 0.4415 (0.2636) data time 0.0009 (0.0019) model time 0.4406 (0.2617) loss 5.7884 (5.7790) grad_norm 2.2872 (2.5024) loss_scale 256.0000 (158.3729) mem 9655MB [2024-08-04 08:15:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][540/625] eta 0:00:22 lr 0.000327 wd 0.0500 time 0.2568 (0.2635) data time 0.0011 (0.0019) model time 0.2558 (0.2615) loss 4.9652 (5.7857) grad_norm 2.0756 (2.4983) loss_scale 256.0000 (160.1774) mem 9655MB [2024-08-04 08:15:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][550/625] eta 0:00:19 lr 0.000327 wd 0.0500 time 0.2594 (0.2634) data time 0.0005 (0.0019) model time 0.2589 (0.2614) loss 4.7973 (5.7804) grad_norm 1.5341 (2.4920) loss_scale 256.0000 (161.9165) mem 9655MB [2024-08-04 08:15:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][560/625] eta 0:00:17 lr 0.000327 wd 0.0500 time 0.2574 (0.2632) data time 0.0007 (0.0018) model time 0.2568 (0.2613) loss 4.7299 (5.7773) grad_norm 1.6144 (2.4921) loss_scale 256.0000 (163.5936) mem 9655MB [2024-08-04 08:15:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][570/625] eta 0:00:14 lr 0.000327 wd 0.0500 time 0.2540 (0.2631) data time 0.0009 (0.0018) model time 0.2531 (0.2612) loss 6.1698 (5.7746) grad_norm 1.8345 (2.4860) loss_scale 256.0000 (165.2119) mem 9655MB [2024-08-04 08:15:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][580/625] eta 
0:00:11 lr 0.000326 wd 0.0500 time 0.2594 (0.2630) data time 0.0010 (0.0018) model time 0.2584 (0.2611) loss 5.8730 (5.7728) grad_norm 3.1221 (2.4843) loss_scale 256.0000 (166.7745) mem 9655MB [2024-08-04 08:15:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][590/625] eta 0:00:09 lr 0.000326 wd 0.0500 time 0.2552 (0.2629) data time 0.0010 (0.0018) model time 0.2543 (0.2610) loss 6.2965 (5.7730) grad_norm 1.3473 (2.4878) loss_scale 256.0000 (168.2843) mem 9655MB [2024-08-04 08:15:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][600/625] eta 0:00:06 lr 0.000326 wd 0.0500 time 0.2586 (0.2628) data time 0.0011 (0.0018) model time 0.2575 (0.2609) loss 6.5710 (5.7710) grad_norm 1.5030 (2.5038) loss_scale 256.0000 (169.7438) mem 9655MB [2024-08-04 08:15:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][610/625] eta 0:00:03 lr 0.000326 wd 0.0500 time 0.2527 (0.2630) data time 0.0006 (0.0018) model time 0.2521 (0.2611) loss 5.9858 (5.7686) grad_norm 2.4744 (2.4998) loss_scale 256.0000 (171.1555) mem 9655MB [2024-08-04 08:15:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][620/625] eta 0:00:01 lr 0.000326 wd 0.0500 time 0.2540 (0.2628) data time 0.0006 (0.0017) model time 0.2534 (0.2610) loss 5.4649 (5.7638) grad_norm 2.2295 (2.4966) loss_scale 256.0000 (172.5217) mem 9655MB [2024-08-04 08:15:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 227 training takes 0:02:44 [2024-08-04 08:15:57 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 08:15:58 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 08:15:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.521 (0.521) Loss 0.5986 (0.5986) Acc@1 89.990 (89.990) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 08:15:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 0.9424 (0.7247) Acc@1 80.469 (86.550) Acc@5 95.752 (97.599) Mem 9655MB [2024-08-04 08:15:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0381 (0.8526) Acc@1 77.393 (83.189) Acc@5 94.580 (96.361) Mem 9655MB [2024-08-04 08:16:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.857 Acc@5 96.373 [2024-08-04 08:16:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.9% [2024-08-04 08:16:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.86% [2024-08-04 08:16:00 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 08:16:00 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 08:16:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.483 (0.483) Loss 0.5815 (0.5815) Acc@1 89.648 (89.648) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 08:16:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 0.9170 (0.7123) Acc@1 80.664 (86.457) Acc@5 95.996 (97.683) Mem 9655MB [2024-08-04 08:16:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0283 (0.8357) Acc@1 77.246 (83.224) Acc@5 95.361 (96.468) Mem 9655MB [2024-08-04 08:16:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.909 Acc@5 96.471 [2024-08-04 08:16:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.9% [2024-08-04 08:16:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.91% [2024-08-04 08:16:02 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 08:16:03 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 08:16:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][0/625] eta 0:07:53 lr 0.000326 wd 0.0500 time 0.7574 (0.7574) data time 0.5145 (0.5145) model time 0.0000 (0.0000) loss 5.6666 (5.6666) grad_norm 3.8928 (3.8928) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:16:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][10/625] eta 0:03:11 lr 0.000326 wd 0.0500 time 0.2557 (0.3106) data time 0.0008 (0.0477) model time 0.0000 (0.0000) loss 4.3696 (5.6167) grad_norm 1.9027 (2.9354) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:16:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][20/625] eta 0:02:57 lr 0.000326 wd 0.0500 time 0.2589 (0.2937) data time 0.0009 (0.0254) model time 0.0000 (0.0000) loss 5.7617 (5.5034) grad_norm 2.6353 (2.9609) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:16:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][30/625] eta 0:02:47 lr 0.000325 wd 0.0500 time 0.2577 (0.2816) data time 0.0008 (0.0175) model time 0.0000 (0.0000) loss 6.7665 (5.6537) grad_norm 3.7305 (3.1096) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:16:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][40/625] eta 0:02:43 lr 0.000325 wd 0.0500 time 0.2596 (0.2797) data time 0.0008 (0.0135) model time 0.0000 (0.0000) loss 5.5534 (5.7086) grad_norm 6.3065 (3.1332) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:16:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][50/625] eta 0:02:38 lr 0.000325 wd 0.0500 time 0.2547 (0.2753) data time 0.0008 (0.0110) model time 0.0000 (0.0000) loss 6.1897 (5.7407) grad_norm 1.4172 (2.9975) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:16:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][60/625] eta 0:02:34 lr 0.000325 wd 0.0500 time 0.3648 (0.2739) data time 
0.0007 (0.0094) model time 0.3641 (0.2659) loss 5.7077 (5.7205) grad_norm 2.2859 (3.0782) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:16:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][70/625] eta 0:02:33 lr 0.000325 wd 0.0500 time 0.2542 (0.2768) data time 0.0007 (0.0082) model time 0.2535 (0.2799) loss 7.0355 (5.7076) grad_norm 2.8722 (3.0331) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:16:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][80/625] eta 0:02:30 lr 0.000325 wd 0.0500 time 0.2541 (0.2759) data time 0.0009 (0.0073) model time 0.2531 (0.2761) loss 6.3646 (5.7292) grad_norm 3.9231 (3.2779) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:16:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][90/625] eta 0:02:26 lr 0.000325 wd 0.0500 time 0.2524 (0.2737) data time 0.0010 (0.0066) model time 0.2514 (0.2709) loss 6.3429 (5.7692) grad_norm 2.0327 (3.3017) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:16:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][100/625] eta 0:02:22 lr 0.000325 wd 0.0500 time 0.2559 (0.2719) data time 0.0007 (0.0060) model time 0.2552 (0.2676) loss 6.5818 (5.7695) grad_norm 1.5940 (3.1895) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:16:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][110/625] eta 0:02:19 lr 0.000324 wd 0.0500 time 0.2587 (0.2705) data time 0.0007 (0.0056) model time 0.2580 (0.2655) loss 6.4947 (5.7708) grad_norm 3.7431 (3.1911) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:16:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][120/625] eta 0:02:16 lr 0.000324 wd 0.0500 time 0.2505 (0.2707) data time 0.0007 (0.0052) model time 0.2498 (0.2664) loss 6.8882 (5.7645) grad_norm 3.9104 (3.1281) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:16:38 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][130/625] eta 0:02:13 lr 0.000324 wd 0.0500 time 0.2553 (0.2697) data time 0.0008 (0.0048) model time 0.2544 (0.2652) loss 4.6825 (5.7426) grad_norm 2.8152 (3.0894) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:16:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][140/625] eta 0:02:10 lr 0.000324 wd 0.0500 time 0.2549 (0.2687) data time 0.0009 (0.0046) model time 0.2541 (0.2640) loss 7.4534 (5.7568) grad_norm 2.0883 (3.0822) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:16:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][150/625] eta 0:02:07 lr 0.000324 wd 0.0500 time 0.2612 (0.2679) data time 0.0009 (0.0043) model time 0.2604 (0.2632) loss 6.3979 (5.7729) grad_norm 2.6289 (3.0454) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:16:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][160/625] eta 0:02:04 lr 0.000324 wd 0.0500 time 0.2584 (0.2680) data time 0.0008 (0.0041) model time 0.2576 (0.2637) loss 4.5222 (5.7581) grad_norm 2.1358 (3.0128) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:16:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][170/625] eta 0:02:01 lr 0.000324 wd 0.0500 time 0.2550 (0.2672) data time 0.0006 (0.0039) model time 0.2544 (0.2629) loss 5.7843 (5.7502) grad_norm 2.2922 (2.9942) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:16:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][180/625] eta 0:01:59 lr 0.000324 wd 0.0500 time 0.2534 (0.2675) data time 0.0009 (0.0038) model time 0.2525 (0.2636) loss 5.7904 (5.7637) grad_norm 1.5000 (2.9471) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:16:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][190/625] eta 0:01:56 lr 0.000323 wd 0.0500 time 0.2584 (0.2680) data time 0.0012 (0.0036) 
model time 0.2572 (0.2644) loss 6.1867 (5.7678) grad_norm 1.5770 (2.8893) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:16:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][200/625] eta 0:01:53 lr 0.000323 wd 0.0500 time 0.2499 (0.2682) data time 0.0009 (0.0035) model time 0.2490 (0.2649) loss 5.5654 (5.7559) grad_norm 3.3703 (2.8757) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:16:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][210/625] eta 0:01:51 lr 0.000323 wd 0.0500 time 0.2537 (0.2683) data time 0.0010 (0.0034) model time 0.2527 (0.2651) loss 5.0622 (5.7562) grad_norm 2.1290 (2.8415) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:17:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][220/625] eta 0:01:48 lr 0.000323 wd 0.0500 time 0.2585 (0.2677) data time 0.0008 (0.0032) model time 0.2577 (0.2645) loss 5.9973 (5.7487) grad_norm 1.6508 (2.8007) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:17:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][230/625] eta 0:01:45 lr 0.000323 wd 0.0500 time 0.2597 (0.2672) data time 0.0007 (0.0031) model time 0.2590 (0.2640) loss 4.3225 (5.7492) grad_norm 1.7862 (2.7917) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:17:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][240/625] eta 0:01:42 lr 0.000323 wd 0.0500 time 0.2547 (0.2667) data time 0.0007 (0.0030) model time 0.2540 (0.2635) loss 6.6242 (5.7580) grad_norm 4.1196 (2.8276) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:17:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][250/625] eta 0:01:39 lr 0.000323 wd 0.0500 time 0.2540 (0.2663) data time 0.0008 (0.0030) model time 0.2532 (0.2631) loss 6.8630 (5.7657) grad_norm 3.3054 (2.8433) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:17:12 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [228/300][260/625] eta 0:01:37 lr 0.000323 wd 0.0500 time 0.2520 (0.2659) data time 0.0010 (0.0029) model time 0.2510 (0.2627) loss 6.3705 (5.7719) grad_norm 1.6697 (2.8077) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:17:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][270/625] eta 0:01:34 lr 0.000322 wd 0.0500 time 0.2549 (0.2655) data time 0.0008 (0.0028) model time 0.2541 (0.2624) loss 5.6194 (5.7682) grad_norm 2.7585 (2.7960) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:17:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][280/625] eta 0:01:31 lr 0.000322 wd 0.0500 time 0.2554 (0.2652) data time 0.0013 (0.0027) model time 0.2541 (0.2620) loss 5.9857 (5.7700) grad_norm 1.8037 (2.7696) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:17:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][290/625] eta 0:01:28 lr 0.000322 wd 0.0500 time 0.2546 (0.2649) data time 0.0008 (0.0027) model time 0.2539 (0.2617) loss 4.6768 (5.7725) grad_norm 2.9867 (2.8426) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:17:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][300/625] eta 0:01:26 lr 0.000322 wd 0.0500 time 0.2553 (0.2646) data time 0.0007 (0.0026) model time 0.2546 (0.2615) loss 5.0455 (5.7660) grad_norm 3.7317 (2.8503) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:17:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][310/625] eta 0:01:23 lr 0.000322 wd 0.0500 time 0.2559 (0.2644) data time 0.0008 (0.0026) model time 0.2550 (0.2614) loss 5.8605 (5.7712) grad_norm 1.9181 (2.8426) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:17:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][320/625] eta 0:01:20 lr 0.000322 wd 0.0500 time 0.2566 (0.2641) data time 0.0008 (0.0025) model time 0.2558 (0.2611) 
loss 5.8733 (5.7714) grad_norm 2.4759 (2.8191) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:17:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][330/625] eta 0:01:18 lr 0.000322 wd 0.0500 time 0.2545 (0.2645) data time 0.0006 (0.0025) model time 0.2539 (0.2617) loss 6.4281 (5.7787) grad_norm 1.8700 (2.7908) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:17:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][340/625] eta 0:01:15 lr 0.000321 wd 0.0500 time 0.2574 (0.2643) data time 0.0006 (0.0024) model time 0.2568 (0.2615) loss 6.2712 (5.7842) grad_norm 2.0920 (2.7654) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:17:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][350/625] eta 0:01:12 lr 0.000321 wd 0.0500 time 0.2553 (0.2641) data time 0.0010 (0.0024) model time 0.2543 (0.2613) loss 6.8413 (5.7871) grad_norm 1.5711 (2.7423) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:17:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][360/625] eta 0:01:10 lr 0.000321 wd 0.0500 time 0.2551 (0.2642) data time 0.0007 (0.0023) model time 0.2543 (0.2615) loss 6.8775 (5.7867) grad_norm 2.2620 (2.7252) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:17:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][370/625] eta 0:01:07 lr 0.000321 wd 0.0500 time 0.2566 (0.2640) data time 0.0007 (0.0023) model time 0.2559 (0.2613) loss 6.2354 (5.7890) grad_norm 10.5692 (2.7423) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:17:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][380/625] eta 0:01:04 lr 0.000321 wd 0.0500 time 0.2538 (0.2638) data time 0.0009 (0.0023) model time 0.2529 (0.2612) loss 6.0550 (5.7968) grad_norm 2.1547 (2.7420) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:17:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [228/300][390/625] eta 0:01:01 lr 0.000321 wd 0.0500 time 0.2577 (0.2636) data time 0.0008 (0.0022) model time 0.2569 (0.2610) loss 5.4328 (5.7882) grad_norm 1.6536 (2.7306) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:17:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][400/625] eta 0:00:59 lr 0.000321 wd 0.0500 time 0.2590 (0.2635) data time 0.0007 (0.0022) model time 0.2583 (0.2608) loss 5.5377 (5.7878) grad_norm 2.1144 (2.7046) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:17:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][410/625] eta 0:00:56 lr 0.000321 wd 0.0500 time 0.2539 (0.2633) data time 0.0007 (0.0022) model time 0.2532 (0.2607) loss 5.4439 (5.7874) grad_norm 2.3910 (2.7973) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:17:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][420/625] eta 0:00:54 lr 0.000320 wd 0.0500 time 0.2542 (0.2636) data time 0.0009 (0.0021) model time 0.2533 (0.2611) loss 5.1046 (5.7896) grad_norm 3.2461 (2.7885) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:17:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][430/625] eta 0:00:51 lr 0.000320 wd 0.0500 time 0.2591 (0.2634) data time 0.0008 (0.0021) model time 0.2583 (0.2609) loss 5.3627 (5.7880) grad_norm 2.3024 (2.7821) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:17:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][440/625] eta 0:00:48 lr 0.000320 wd 0.0500 time 0.2610 (0.2633) data time 0.0008 (0.0021) model time 0.2601 (0.2608) loss 5.9686 (5.7919) grad_norm 2.3186 (2.7729) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:18:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][450/625] eta 0:00:46 lr 0.000320 wd 0.0500 time 0.2531 (0.2631) data time 0.0009 (0.0021) model time 0.2522 (0.2607) loss 5.5487 (5.7848) grad_norm 
3.5894 (2.7639) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:18:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][460/625] eta 0:00:43 lr 0.000320 wd 0.0500 time 0.2543 (0.2638) data time 0.0009 (0.0020) model time 0.2534 (0.2614) loss 6.0792 (5.7808) grad_norm 3.5471 (2.7791) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:18:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][470/625] eta 0:00:40 lr 0.000320 wd 0.0500 time 0.2560 (0.2636) data time 0.0008 (0.0020) model time 0.2552 (0.2613) loss 5.7699 (5.7771) grad_norm 3.4733 (2.8041) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:18:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][480/625] eta 0:00:38 lr 0.000320 wd 0.0500 time 0.2541 (0.2638) data time 0.0009 (0.0020) model time 0.2532 (0.2616) loss 6.4924 (5.7708) grad_norm 2.3204 (2.7897) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:18:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][490/625] eta 0:00:35 lr 0.000320 wd 0.0500 time 0.2562 (0.2639) data time 0.0008 (0.0020) model time 0.2554 (0.2617) loss 6.3978 (5.7687) grad_norm 3.2887 (2.7846) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:18:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][500/625] eta 0:00:32 lr 0.000319 wd 0.0500 time 0.2530 (0.2637) data time 0.0008 (0.0019) model time 0.2522 (0.2615) loss 5.0804 (5.7718) grad_norm 1.6820 (2.7724) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:18:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][510/625] eta 0:00:30 lr 0.000319 wd 0.0500 time 0.2548 (0.2638) data time 0.0010 (0.0019) model time 0.2538 (0.2616) loss 5.3184 (5.7723) grad_norm 2.6331 (2.7666) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:18:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][520/625] eta 
0:00:27 lr 0.000319 wd 0.0500 time 0.2537 (0.2636) data time 0.0007 (0.0019) model time 0.2529 (0.2615) loss 7.0677 (5.7766) grad_norm 4.8737 (2.7645) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:18:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][530/625] eta 0:00:25 lr 0.000319 wd 0.0500 time 0.2564 (0.2635) data time 0.0009 (0.0019) model time 0.2555 (0.2614) loss 5.9032 (5.7736) grad_norm 4.2526 (2.7561) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:18:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][540/625] eta 0:00:22 lr 0.000319 wd 0.0500 time 0.2544 (0.2634) data time 0.0009 (0.0019) model time 0.2536 (0.2612) loss 6.2561 (5.7723) grad_norm 2.7963 (2.7515) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:18:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][550/625] eta 0:00:19 lr 0.000319 wd 0.0500 time 0.2580 (0.2632) data time 0.0009 (0.0018) model time 0.2571 (0.2611) loss 5.7848 (5.7683) grad_norm 3.6004 (2.7505) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:18:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][560/625] eta 0:00:17 lr 0.000319 wd 0.0500 time 0.2675 (0.2631) data time 0.0006 (0.0018) model time 0.2669 (0.2610) loss 4.8328 (5.7715) grad_norm 3.5243 (2.7521) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:18:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][570/625] eta 0:00:14 lr 0.000319 wd 0.0500 time 0.2537 (0.2630) data time 0.0006 (0.0018) model time 0.2530 (0.2609) loss 6.3389 (5.7698) grad_norm 1.6747 (2.7467) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:18:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][580/625] eta 0:00:11 lr 0.000318 wd 0.0500 time 0.2538 (0.2629) data time 0.0007 (0.0018) model time 0.2531 (0.2608) loss 5.9162 (5.7739) grad_norm 4.3027 (2.7486) loss_scale 
256.0000 (256.0000) mem 9655MB [2024-08-04 08:18:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][590/625] eta 0:00:09 lr 0.000318 wd 0.0500 time 0.2573 (0.2628) data time 0.0011 (0.0018) model time 0.2562 (0.2607) loss 5.9912 (5.7737) grad_norm 2.7990 (2.7479) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:18:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][600/625] eta 0:00:06 lr 0.000318 wd 0.0500 time 0.2566 (0.2630) data time 0.0006 (0.0018) model time 0.2560 (0.2609) loss 5.9980 (5.7708) grad_norm 3.8118 (2.7450) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:18:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][610/625] eta 0:00:03 lr 0.000318 wd 0.0500 time 0.2543 (0.2629) data time 0.0006 (0.0018) model time 0.2537 (0.2609) loss 6.3652 (5.7669) grad_norm 1.4694 (2.7369) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:18:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][620/625] eta 0:00:01 lr 0.000318 wd 0.0500 time 0.2543 (0.2627) data time 0.0005 (0.0017) model time 0.2538 (0.2607) loss 6.5216 (5.7659) grad_norm 1.9748 (2.7307) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:18:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 228 training takes 0:02:44 [2024-08-04 08:18:47 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 08:18:47 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 08:18:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.505 (0.505) Loss 0.5918 (0.5918) Acc@1 89.404 (89.404) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 08:18:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.053 (0.099) Loss 0.9346 (0.7228) Acc@1 80.371 (86.279) Acc@5 95.703 (97.594) Mem 9655MB
[2024-08-04 08:18:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0264 (0.8438) Acc@1 77.539 (83.119) Acc@5 95.068 (96.391) Mem 9655MB
[2024-08-04 08:18:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.792 Acc@5 96.389
[2024-08-04 08:18:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.8%
[2024-08-04 08:18:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.753 (0.753) Loss 0.5815 (0.5815) Acc@1 89.697 (89.697) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 08:18:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.127) Loss 0.9160 (0.7121) Acc@1 80.615 (86.439) Acc@5 96.094 (97.701) Mem 9655MB
[2024-08-04 08:18:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 1.0273 (0.8354) Acc@1 77.148 (83.224) Acc@5 95.361 (96.489) Mem 9655MB
[2024-08-04 08:18:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.909 Acc@5 96.489
[2024-08-04 08:18:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.9%
[2024-08-04 08:18:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][0/625] eta 0:11:36 lr 0.000318 wd 0.0500 time 1.1144 (1.1144) data time 0.5551 (0.5551) model time 0.0000 (0.0000) loss 6.1427 (6.1427) grad_norm 3.0516 (3.0516) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:18:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][10/625] eta 0:03:45 lr 0.000318 wd 0.0500 time 0.2549 (0.3662) data time 0.0008 (0.0513) model time 0.0000 (0.0000) loss 5.7490 (6.0303) grad_norm 2.9690 (3.1634) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:18:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][20/625] eta 0:03:14 lr 0.000318 wd 0.0500 time 0.2493 (0.3213) data time 0.0011 (0.0273) model time 0.0000 (0.0000) loss 4.8572 (5.7325) grad_norm 2.7636 (2.7334) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:19:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][30/625] eta 0:03:02 lr 0.000317 wd 0.0500 time 0.2585 (0.3068) data time 0.0010 (0.0188) model time 0.0000 (0.0000) loss 6.4484 (5.7868) grad_norm 2.6525 (2.7748) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:19:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][40/625] eta 0:02:52 lr 0.000317 wd 0.0500 time 0.2566 (0.2946) data time 0.0006 (0.0144) model time 0.0000 (0.0000) loss 4.5646 (5.6985) grad_norm 2.2207 (2.5936) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:19:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][50/625] eta 0:02:47 lr 0.000317 wd 0.0500 time 0.2565 (0.2906) data time 0.0011 (0.0118) model time 0.0000 (0.0000) loss 5.3489 (5.7534) grad_norm 3.3151 (2.5641) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:19:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][60/625] eta 0:02:40 lr 0.000317 wd 0.0500 time 0.2595 (0.2849) data time 0.0009 (0.0100) model time 0.2586 (0.2550) loss 6.1661 (5.7672) grad_norm 3.2180 (2.5004) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:19:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][70/625] eta 0:02:35 lr 0.000317 wd 0.0500 time 0.2560 (0.2809) data time 0.0006 (0.0087) model time 0.2554 (0.2552) loss 5.5804 (5.7847) grad_norm 2.1245 (2.4524) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:19:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][80/625] eta 0:02:31 lr 0.000317 wd 0.0500 time 0.2549 (0.2776) data time 0.0008 (0.0077) model time 0.2540 (0.2547) loss 6.2824 (5.7604) grad_norm 2.6096 (2.4268) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:19:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][90/625] eta 0:02:27 lr 0.000317 wd 0.0500 time 0.2491 (0.2752) data time 0.0007 (0.0070) model time 0.2484 (0.2546) loss 7.1816 (5.7634) grad_norm 3.0886 (2.4103) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:19:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][100/625] eta 0:02:23 lr 0.000317 wd 0.0500 time 0.2536 (0.2733) data time 0.0009 (0.0064) model time 0.2526 (0.2547) loss 5.6086 (5.7573) grad_norm 2.9630 (2.3838) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:19:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][110/625] eta 0:02:20 lr 0.000316 wd 0.0500 time 0.2550 (0.2734) data time 0.0010 (0.0059) model time 0.2540 (0.2579) loss 6.1266 (5.7467) grad_norm 2.0248 (2.3571) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:19:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][120/625] eta 0:02:17 lr 0.000316 wd 0.0500 time 0.2544 (0.2719) data time 0.0011 (0.0055) model time 0.2533 (0.2574) loss 5.9468 (5.7370) grad_norm 1.7834 (2.3547) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:19:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][130/625] eta 0:02:14 lr 0.000316 wd 0.0500 time 0.2613 (0.2707) data time 0.0009 (0.0052) model time 0.2604 (0.2571) loss 6.4488 (5.7446) grad_norm 1.9989 (2.3820) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:19:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][140/625] eta 0:02:10 lr 0.000316 wd 0.0500 time 0.2539 (0.2697) data time 0.0007 (0.0049) model time 0.2532 (0.2569) loss 5.3858 (5.7187) grad_norm 1.5865 (2.4291) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:19:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][150/625] eta 0:02:08 lr 0.000316 wd 0.0500 time 0.2545 (0.2697) data time 0.0009 (0.0046) model time 0.2536 (0.2581) loss 4.7571 (5.7092) grad_norm 3.1080 (2.4506) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:19:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][160/625] eta 0:02:05 lr 0.000316 wd 0.0500 time 0.2543 (0.2700) data time 0.0010 (0.0044) model time 0.2533 (0.2596) loss 5.3402 (5.6940) grad_norm 1.9367 (2.4245) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:19:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][170/625] eta 0:02:02 lr 0.000316 wd 0.0500 time 0.2558 (0.2692) data time 0.0007 (0.0042) model time 0.2551 (0.2592) loss 5.5046 (5.6966) grad_norm 2.6042 (2.4248) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:19:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][180/625] eta 0:01:59 lr 0.000316 wd 0.0500 time 0.2543 (0.2685) data time 0.0010 (0.0040) model time 0.2533 (0.2590) loss 5.1913 (5.7007) grad_norm 1.7827 (2.4744) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:19:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][190/625] eta 0:01:56 lr 0.000315 wd 0.0500 time 0.2550 (0.2679) data time 0.0010 (0.0038) model time 0.2540 (0.2587) loss 6.0709 (5.7218) grad_norm 2.8363 (2.4723) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:19:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][200/625] eta 0:01:53 lr 0.000315 wd 0.0500 time 0.2546 (0.2673) data time 0.0011 (0.0037) model time 0.2535 (0.2585) loss 5.6362 (5.7239) grad_norm 45.9268 (2.7293) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:19:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][210/625] eta 0:01:50 lr 0.000315 wd 0.0500 time 0.2577 (0.2669) data time 0.0008 (0.0036) model time 0.2569 (0.2584) loss 6.5577 (5.7418) grad_norm 2.1849 (2.7174) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:19:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][220/625] eta 0:01:47 lr 0.000315 wd 0.0500 time 0.2536 (0.2664) data time 0.0009 (0.0034) model time 0.2527 (0.2582) loss 6.0354 (5.7457) grad_norm 2.6875 (2.7418) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:19:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][230/625] eta 0:01:45 lr 0.000315 wd 0.0500 time 0.2552 (0.2668) data time 0.0008 (0.0033) model time 0.2544 (0.2591) loss 5.5351 (5.7265) grad_norm 2.3883 (2.7357) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:19:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][240/625] eta 0:01:42 lr 0.000315 wd 0.0500 time 0.2550 (0.2664) data time 0.0009 (0.0032) model time 0.2541 (0.2589) loss 6.2185 (5.7309) grad_norm 2.9505 (2.7161) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:19:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][250/625] eta 0:01:40 lr 0.000315 wd 0.0500 time 0.2564 (0.2673) data time 0.0008 (0.0031) model time 0.2556 (0.2605) loss 5.9579 (5.7455) grad_norm 2.9241 (2.7130) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:20:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][260/625] eta 0:01:37 lr 0.000315 wd 0.0500 time 0.2578 (0.2669) data time 0.0008 (0.0030) model time 0.2570 (0.2602) loss 6.2783 (5.7379) grad_norm 1.8374 (2.6836) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:20:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][270/625] eta 0:01:34 lr 0.000314 wd 0.0500 time 0.2540 (0.2672) data time 0.0010 (0.0030) model time 0.2531 (0.2608) loss 6.3077 (5.7390) grad_norm 5.1325 (2.6819) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:20:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][280/625] eta 0:01:32 lr 0.000314 wd 0.0500 time 0.2580 (0.2668) data time 0.0010 (0.0029) model time 0.2570 (0.2605) loss 5.2517 (5.7382) grad_norm 2.0842 (2.6716) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:20:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][290/625] eta 0:01:29 lr 0.000314 wd 0.0500 time 0.2505 (0.2663) data time 0.0007 (0.0028) model time 0.2498 (0.2603) loss 6.7067 (5.7353) grad_norm 3.4093 (2.6640) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:20:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][300/625] eta 0:01:26 lr 0.000314 wd 0.0500 time 0.2575 (0.2660) data time 0.0008 (0.0028) model time 0.2566 (0.2601) loss 4.6439 (5.7372) grad_norm 1.5099 (2.6520) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:20:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][310/625] eta 0:01:23 lr 0.000314 wd 0.0500 time 0.2580 (0.2657) data time 0.0011 (0.0027) model time 0.2569 (0.2599) loss 6.0261 (5.7451) grad_norm 1.9706 (2.6382) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:20:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][320/625] eta 0:01:20 lr 0.000314 wd 0.0500 time 0.2577 (0.2655) data time 0.0007 (0.0027) model time 0.2570 (0.2598) loss 5.5911 (5.7496) grad_norm 2.7129 (2.6292) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:20:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][330/625] eta 0:01:18 lr 0.000314 wd 0.0500 time 0.2583 (0.2658) data time 0.0007 (0.0026) model time 0.2576 (0.2604) loss 5.4301 (5.7462) grad_norm 1.3519 (2.6067) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:20:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][340/625] eta 0:01:15 lr 0.000314 wd 0.0500 time 0.2529 (0.2661) data time 0.0008 (0.0026) model time 0.2520 (0.2609) loss 6.4916 (5.7495) grad_norm 1.8437 (2.5874) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:20:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][350/625] eta 0:01:13 lr 0.000313 wd 0.0500 time 0.2538 (0.2658) data time 0.0008 (0.0025) model time 0.2530 (0.2607) loss 5.8691 (5.7505) grad_norm 2.9307 (2.6052) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:20:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][360/625] eta 0:01:10 lr 0.000313 wd 0.0500 time 0.2557 (0.2655) data time 0.0017 (0.0025) model time 0.2540 (0.2605) loss 5.0794 (5.7429) grad_norm 1.4803 (2.5971) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:20:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][370/625] eta 0:01:07 lr 0.000313 wd 0.0500 time 0.2515 (0.2662) data time 0.0007 (0.0024) model time 0.2508 (0.2614) loss 5.6592 (5.7463) grad_norm 1.7275 (2.5926) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:20:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][380/625] eta 0:01:05 lr 0.000313 wd 0.0500 time 0.2566 (0.2664) data time 0.0006 (0.0024) model time 0.2560 (0.2618) loss 4.4195 (5.7478) grad_norm 9.7140 (2.5977) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:20:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][390/625] eta 0:01:02 lr 0.000313 wd 0.0500 time 0.2535 (0.2662) data time 0.0007 (0.0024) model time 0.2528 (0.2616) loss 7.3469 (5.7508) grad_norm 1.5300 (2.5998) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:20:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][400/625] eta 0:00:59 lr 0.000313 wd 0.0500 time 0.2556 (0.2659) data time 0.0009 (0.0023) model time 0.2547 (0.2614) loss 6.4609 (5.7592) grad_norm 1.4107 (2.5943) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:20:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][410/625] eta 0:00:57 lr 0.000313 wd 0.0500 time 0.2551 (0.2657) data time 0.0007 (0.0023) model time 0.2544 (0.2613) loss 5.0717 (5.7550) grad_norm 1.8307 (2.6050) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:20:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][420/625] eta 0:00:54 lr 0.000313 wd 0.0500 time 0.4612 (0.2660) data time 0.0007 (0.0023) model time 0.4605 (0.2616) loss 5.6870 (5.7540) grad_norm 2.0360 (2.6113) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:20:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][430/625] eta 0:00:51 lr 0.000312 wd 0.0500 time 0.2534 (0.2657) data time 0.0007 (0.0022) model time 0.2527 (0.2615) loss 5.3927 (5.7430) grad_norm 1.8885 (2.6104) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:20:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][440/625] eta 0:00:49 lr 0.000312 wd 0.0500 time 0.2524 (0.2655) data time 0.0008 (0.0022) model time 0.2515 (0.2613) loss 6.3141 (5.7394) grad_norm 2.1727 (2.6064) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:20:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][450/625] eta 0:00:46 lr 0.000312 wd 0.0500 time 0.2557 (0.2657) data time 0.0008 (0.0022) model time 0.2549 (0.2616) loss 6.3332 (5.7357) grad_norm 1.9645 (2.5973) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:20:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][460/625] eta 0:00:43 lr 0.000312 wd 0.0500 time 0.2581 (0.2659) data time 0.0015 (0.0021) model time 0.2566 (0.2619) loss 5.8098 (5.7417) grad_norm 2.7329 (2.5934) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:20:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][470/625] eta 0:00:41 lr 0.000312 wd 0.0500 time 0.2540 (0.2657) data time 0.0009 (0.0021) model time 0.2531 (0.2617) loss 5.7536 (5.7440) grad_norm 2.4856 (2.5964) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:20:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][480/625] eta 0:00:38 lr 0.000312 wd 0.0500 time 0.2574 (0.2655) data time 0.0007 (0.0021) model time 0.2566 (0.2616) loss 6.6871 (5.7428) grad_norm 2.8058 (2.6123) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:21:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][490/625] eta 0:00:35 lr 0.000312 wd 0.0500 time 0.4472 (0.2657) data time 0.0009 (0.0021) model time 0.4462 (0.2619) loss 6.9674 (5.7446) grad_norm 2.2477 (2.6120) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:21:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][500/625] eta 0:00:33 lr 0.000312 wd 0.0500 time 0.2582 (0.2655) data time 0.0009 (0.0020) model time 0.2574 (0.2617) loss 4.9895 (5.7524) grad_norm 1.4082 (2.6070) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:21:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][510/625] eta 0:00:30 lr 0.000311 wd 0.0500 time 0.2534 (0.2653) data time 0.0010 (0.0020) model time 0.2524 (0.2616) loss 4.6477 (5.7530) grad_norm 1.6981 (2.5934) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:21:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][520/625] eta 0:00:27 lr 0.000311 wd 0.0500 time 0.2556 (0.2651) data time 0.0008 (0.0020) model time 0.2548 (0.2614) loss 5.5117 (5.7453) grad_norm 2.6860 (2.5964) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:21:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][530/625] eta 0:00:25 lr 0.000311 wd 0.0500 time 0.2566 (0.2649) data time 0.0006 (0.0020) model time 0.2560 (0.2613) loss 5.4822 (5.7442) grad_norm 2.9294 (2.5963) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:21:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][540/625] eta 0:00:22 lr 0.000311 wd 0.0500 time 0.2518 (0.2648) data time 0.0007 (0.0020) model time 0.2511 (0.2611) loss 4.9668 (5.7402) grad_norm 3.5029 (2.5983) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:21:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][550/625] eta 0:00:19 lr 0.000311 wd 0.0500 time 0.2575 (0.2649) data time 0.0009 (0.0019) model time 0.2565 (0.2614) loss 6.1516 (5.7424) grad_norm 2.6241 (2.5872) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:21:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][560/625] eta 0:00:17 lr 0.000311 wd 0.0500 time 0.2571 (0.2648) data time 0.0009 (0.0019) model time 0.2562 (0.2613) loss 6.1432 (5.7403) grad_norm 1.8563 (2.5910) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:21:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][570/625] eta 0:00:14 lr 0.000311 wd 0.0500 time 0.2561 (0.2646) data time 0.0010 (0.0019) model time 0.2550 (0.2611) loss 6.9695 (5.7413) grad_norm 3.8319 (2.5865) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:21:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][580/625] eta 0:00:11 lr 0.000311 wd 0.0500 time 0.2553 (0.2645) data time 0.0009 (0.0019) model time 0.2544 (0.2610) loss 5.9106 (5.7405) grad_norm 1.9618 (2.5795) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:21:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][590/625] eta 0:00:09 lr 0.000310 wd 0.0500 time 0.2629 (0.2643) data time 0.0008 (0.0019) model time 0.2621 (0.2609) loss 5.8392 (5.7409) grad_norm 2.1233 (2.5752) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:21:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][600/625] eta 0:00:06 lr 0.000310 wd 0.0500 time 0.2550 (0.2642) data time 0.0006 (0.0019) model time 0.2543 (0.2608) loss 4.6714 (5.7365) grad_norm 1.6640 (2.5701) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:21:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][610/625] eta 0:00:03 lr 0.000310 wd 0.0500 time 0.2530 (0.2641) data time 0.0005 (0.0019) model time 0.2525 (0.2607) loss 5.9799 (5.7375) grad_norm 2.5742 (2.5686) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:21:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][620/625] eta 0:00:01 lr 0.000310 wd 0.0500 time 0.2531 (0.2639) data time 0.0005 (0.0018) model time 0.2526 (0.2605) loss 4.8414 (5.7359) grad_norm 2.0037 (2.5636) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:21:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 229 training takes 0:02:45
[2024-08-04 08:21:36 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 08:21:37 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 08:21:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.519 (0.519) Loss 0.6299 (0.6299) Acc@1 89.795 (89.795) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 08:21:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 0.9653 (0.7547) Acc@1 80.420 (86.324) Acc@5 96.631 (97.701) Mem 9655MB
[2024-08-04 08:21:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0488 (0.8787) Acc@1 78.369 (83.173) Acc@5 95.361 (96.380) Mem 9655MB
[2024-08-04 08:21:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.841 Acc@5 96.393
[2024-08-04 08:21:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.8%
[2024-08-04 08:21:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.751 (0.751) Loss 0.5820 (0.5820) Acc@1 89.648 (89.648) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 08:21:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.124) Loss 0.9160 (0.7118) Acc@1 80.566 (86.461) Acc@5 96.094 (97.718) Mem 9655MB
[2024-08-04 08:21:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.091) Loss 1.0264 (0.8350) Acc@1 77.148 (83.243) Acc@5 95.361 (96.489) Mem 9655MB
[2024-08-04 08:21:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.925 Acc@5 96.489
[2024-08-04 08:21:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.9%
[2024-08-04 08:21:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.93%
[2024-08-04 08:21:41 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 08:21:41 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 08:21:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][0/625] eta 0:08:20 lr 0.000310 wd 0.0500 time 0.8006 (0.8006) data time 0.5631 (0.5631) model time 0.0000 (0.0000) loss 6.5327 (6.5327) grad_norm 2.5683 (2.5683) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:21:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][10/625] eta 0:03:26 lr 0.000310 wd 0.0500 time 0.2576 (0.3353) data time 0.0008 (0.0523) model time 0.0000 (0.0000) loss 4.8225 (5.8360) grad_norm 5.7640 (2.3739) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:21:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][20/625] eta 0:03:00 lr 0.000310 wd 0.0500 time 0.2583 (0.2983) data time 0.0007 (0.0278) model time 0.0000 (0.0000) loss 7.2486 (5.8692) grad_norm 2.2913 (2.1874) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:21:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][30/625] eta 0:02:49 lr 0.000310 wd 0.0500 time 0.2566 (0.2847) data time 0.0008 (0.0191) model time 0.0000 (0.0000) loss 5.7831 (5.8345) grad_norm 4.7506 (2.5099) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:21:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][40/625] eta 0:02:42 lr 0.000309 wd 0.0500 time 0.2538 (0.2775) data time 0.0009 (0.0147) model time 0.0000 (0.0000) loss 5.7384 (5.7411) grad_norm 3.0298 (2.4518) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:21:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][50/625] eta 0:02:37 lr 0.000309 wd 0.0500 time 0.2516 (0.2732) data time 0.0009 (0.0120) model time 0.0000 (0.0000) loss 6.3300 (5.7223) grad_norm 1.9405 (2.4164) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:21:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][60/625] eta 0:02:34 lr 0.000309 wd 0.0500 time 0.3995 (0.2729) data time 0.0008 (0.0102) model time 0.3987 (0.2704) loss 6.2017 (5.6954) grad_norm 2.9509 (2.4310) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:22:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][70/625] eta 0:02:31 lr 0.000309 wd 0.0500 time 0.2548 (0.2733) data time 0.0006 (0.0089) model time 0.2542 (0.2726) loss 6.3547 (5.7623) grad_norm 2.1264 (2.4798) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:22:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][80/625] eta 0:02:28 lr 0.000309 wd 0.0500 time 0.2547 (0.2734) data time 0.0008 (0.0079) model time 0.2539 (0.2728) loss 5.6330 (5.7689) grad_norm 3.2831 (2.4833) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:22:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][90/625] eta 0:02:25 lr 0.000309 wd 0.0500 time 0.2596 (0.2714) data time 0.0007 (0.0071) model time 0.2588 (0.2683) loss 4.9860 (5.7337) grad_norm 1.4183 (2.4980) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:22:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][100/625] eta 0:02:22 lr 0.000309 wd 0.0500 time 0.2548 (0.2715) data time 0.0009 (0.0065) model time 0.2539 (0.2689) loss 5.3164 (5.7032) grad_norm 2.0342 (2.4412) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:22:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][110/625] eta 0:02:20 lr 0.000309 wd 0.0500 time 0.2572 (0.2720) data time 0.0006 (0.0060) model time 0.2566 (0.2701) loss 6.2339 (5.6883) grad_norm 1.7809 (2.4557) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:22:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][120/625] eta 0:02:16 lr 0.000308 wd 0.0500 time 0.2511 (0.2707) data time 0.0007 (0.0056) model time 0.2503 (0.2679) loss 4.7733 (5.6622) grad_norm 4.0739 (2.4264) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:22:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][130/625] eta 0:02:13 lr 0.000308 wd 0.0500 time 0.2547 (0.2695) data time 0.0007 (0.0052) model time 0.2539 (0.2663) loss 5.5226 (5.6599) grad_norm 5.2481 (2.4450) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:22:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][140/625] eta 0:02:10 lr 0.000308 wd 0.0500 time 0.2594 (0.2698) data time 0.0009 (0.0049) model time 0.2585 (0.2669) loss 6.4196 (5.6635) grad_norm 2.9760 (2.4785) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:22:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][150/625] eta 0:02:08 lr 0.000308 wd 0.0500 time 0.2547 (0.2700) data time 0.0009 (0.0047) model time 0.2537 (0.2675) loss 5.5389 (5.6622) grad_norm 1.9779 (2.4657) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:22:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][160/625] eta 0:02:05 lr 0.000308 wd 0.0500 time 0.2585 (0.2691) data time 0.0008 (0.0044) model time 0.2577 (0.2663) loss 5.0779 (5.6714) grad_norm 2.2259 (2.4456) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:22:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][170/625] eta 0:02:02 lr 0.000308 wd 0.0500 time 0.2534 (0.2695) data time 0.0009 (0.0042) model time 0.2525 (0.2670) loss 4.9808 (5.6674) grad_norm 3.7306 (2.4506) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:22:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][180/625] eta 0:01:59 lr 0.000308 wd 0.0500 time 0.2572 (0.2687) data time 0.0009 (0.0041) model time 0.2562 (0.2661) loss 6.6327 (5.6780) grad_norm 1.9607 (2.4187) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:22:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][190/625] eta 0:01:56 lr 0.000308 wd 0.0500 time 0.2612 (0.2688) data time 0.0008 (0.0039) model time 0.2604 (0.2663) loss 6.0823 (5.6827) grad_norm 2.1738 (2.3861) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:22:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][200/625] eta 0:01:53 lr 0.000307 wd 0.0500 time 0.2562 (0.2682) data time 0.0006 (0.0037) model time 0.2556 (0.2656) loss 4.5549 (5.6726) grad_norm 1.5020 (2.3880) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:22:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][210/625] eta 0:01:51 lr 0.000307 wd 0.0500 time 0.2553 (0.2676) data time 0.0008 (0.0036) model time 0.2545 (0.2649) loss 5.5220 (5.6721) grad_norm 1.9452 (2.3717) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:22:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][220/625] eta 0:01:48 lr 0.000307 wd 0.0500 time 0.2581 (0.2671) data time 0.0008 (0.0035) model time 0.2574 (0.2643) loss 4.2143 (5.6678) grad_norm 2.8977 (2.3505) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:22:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][230/625] eta 0:01:45 lr 0.000307 wd 0.0500 time 0.2558 (0.2672) data time 0.0007 (0.0034) model time 0.2552 (0.2646) loss 5.6345 (5.6600) grad_norm 1.6949 (2.3360) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:22:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][240/625] eta 0:01:42 lr 0.000307 wd 0.0500 time 0.2574 (0.2667) data time 0.0006 (0.0033) model time 0.2569 (0.2641) loss 5.4432 (5.6481) grad_norm 2.6087 (2.3733) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:22:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][250/625] eta 0:01:39 lr 0.000307 wd 0.0500 time 0.2578 (0.2663) data time 0.0010 (0.0032) model time 0.2568 (0.2637) loss 6.0760 (5.6552) grad_norm 2.8876 (2.4089) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:22:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][260/625] eta 0:01:37 lr 0.000307 wd 0.0500 time 0.2594 (0.2660) data time 0.0006 (0.0031) model time 0.2588 (0.2633) loss 6.0067 (5.6627) grad_norm 3.4396 (2.4503) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:22:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][270/625] eta 0:01:34 lr 0.000307 wd 0.0500 time 0.2597 (0.2656) data time 0.0007 (0.0030) model time 0.2590 (0.2629) loss 5.6325 (5.6758) grad_norm 2.1373 (2.4647) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:22:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][280/625] eta 0:01:31 lr 0.000306 wd 0.0500 time 0.2626 (0.2653) data time 0.0008 (0.0029) model time 0.2618 (0.2626) loss 6.1408 (5.6764) grad_norm 1.6943 (2.4560) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:22:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][290/625] eta 0:01:28 lr 0.000306 wd 0.0500 time 0.2542 (0.2650) data time 0.0010 (0.0029) model time 0.2532 (0.2623) loss 5.8556 (5.6761) grad_norm 2.7732 (2.5082) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:23:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][300/625] eta 0:01:26 lr 0.000306 wd 0.0500 time 0.2540 (0.2647) data time 0.0007 (0.0028) model time 0.2534 (0.2620) loss 5.9891 (5.6749) grad_norm 3.6965 (2.5513) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:23:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][310/625] eta 0:01:23 lr 0.000306 wd 0.0500 time 0.2588 (0.2644) data time 0.0008 (0.0027) model time 0.2580 (0.2618) loss 5.1439 (5.6885) grad_norm 2.0016 (2.5554) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:23:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][320/625] eta 0:01:20 lr 0.000306 wd 0.0500 time 0.2555 (0.2642) data time 0.0009 (0.0027) model time 0.2546 (0.2615) loss 4.8321 (5.6790) grad_norm 2.9653 (2.5518) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:23:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][330/625] eta 0:01:18 lr 0.000306 wd 0.0500 time 0.2460 (0.2648) data time 0.0014 (0.0026) model time 0.2447 (0.2623) loss 5.8707 (5.6699) grad_norm 2.5508 (2.5567) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:23:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][340/625] eta 0:01:15 lr 0.000306 wd 0.0500 time 0.2562 (0.2645) data time 0.0009 (0.0026) model time 0.2553 (0.2621) loss 4.8647 (5.6734) grad_norm 2.5775 (2.5415) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:23:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][350/625] eta 0:01:12 lr 0.000306 wd 0.0500 time 0.2498 (0.2642) data time 0.0007 (0.0025) model time 0.2490 (0.2618) loss 4.7502 (5.6773) grad_norm 2.7610 (2.5362) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:23:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][360/625] eta 0:01:09 lr 0.000305 wd 0.0500 time 0.2564 (0.2640) data time 0.0010 (0.0025) model time 0.2553 (0.2616) loss 5.7329 (5.6722) grad_norm 1.7324 (2.5378) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:23:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][370/625] eta 0:01:07 lr 0.000305 wd 0.0500 time 0.2524 (0.2638) data time 0.0006 (0.0025) model time 0.2518 (0.2614) loss 4.8071 (5.6738) grad_norm 1.3890 (2.5203) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:23:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][380/625] eta 0:01:04 lr 0.000305 wd 0.0500 time 0.2533 (0.2636) data time 0.0007 (0.0024) model time 0.2526 (0.2612) loss 6.0107 (5.6742) grad_norm 1.4574 (2.5027) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:23:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][390/625] eta 0:01:01 lr 0.000305 wd 0.0500 time 0.2541 (0.2634) data time 0.0008 (0.0024) model time 0.2534 (0.2610) loss 5.8537 (5.6689) grad_norm 1.6332 (2.4923) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:23:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][400/625] eta 0:00:59 lr 0.000305 wd 0.0500 time 0.2515 (0.2632) data time 0.0009 (0.0024) model time 0.2506 (0.2608) loss 5.6109 (5.6719) grad_norm 2.3095 (2.4794) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:23:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][410/625] eta 0:00:56 lr 0.000305 wd 0.0500 time 0.2593 (0.2636) data time 0.0008 (0.0023) model time 0.2584 (0.2613) loss 5.3895 (5.6729) grad_norm 2.8857 (2.4761) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:23:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][420/625] eta 0:00:53 lr 0.000305 wd 0.0500 time 0.2589 (0.2634) data time 0.0012 (0.0023) model time 0.2578 (0.2611) loss 5.9959 (5.6764) grad_norm 1.5791 (2.4649) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:23:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][430/625] eta 0:00:51 lr 0.000305 wd 0.0500 time 0.2580 (0.2632) data time 0.0010 (0.0023) model time 0.2570 (0.2609) loss 6.7539 (5.6779) grad_norm 2.3733 (2.4728) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:23:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][440/625] eta 0:00:48 lr 0.000304 wd 0.0500 time 0.2530 (0.2630) data time 0.0006 (0.0022) model time 0.2524 (0.2607) loss 5.9441 (5.6807) grad_norm 1.4277 (2.4785) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:23:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [230/300][450/625] eta 0:00:46 lr 0.000304 wd 0.0500 time 0.2548 (0.2629) data time 0.0007 (0.0022) model time 0.2541 (0.2606) loss 5.3763 (5.6757) grad_norm 1.7842 (2.4790) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:23:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][460/625] eta 0:00:43 lr 0.000304 wd 0.0500 time 0.2549 (0.2632) data time 0.0007 (0.0022) model time 0.2542 (0.2610) loss 6.2982 (5.6777) grad_norm 2.3223 (2.4695) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:23:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][470/625] eta 0:00:40 lr 0.000304 wd 0.0500 time 0.2380 (0.2638) data time 0.0007 (0.0022) model time 0.2373 (0.2617) loss 5.9371 (5.6846) grad_norm 1.6689 (2.4564) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:23:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][480/625] eta 0:00:38 lr 0.000304 wd 0.0500 time 0.2742 (0.2641) data time 0.0011 (0.0021) model time 0.2731 (0.2620) loss 4.7310 (5.6865) grad_norm 3.0476 (2.4474) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:23:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][490/625] eta 0:00:35 lr 0.000304 wd 0.0500 time 0.2579 (0.2639) data time 0.0011 (0.0021) model time 0.2568 (0.2618) loss 6.3853 (5.6850) grad_norm 2.6565 (2.4611) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:23:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][500/625] eta 0:00:32 lr 0.000304 wd 0.0500 time 0.2570 (0.2638) data time 0.0006 (0.0021) model time 0.2565 (0.2617) loss 5.8926 (5.6837) grad_norm 1.3195 (2.4743) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:23:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][510/625] eta 0:00:30 lr 0.000304 wd 0.0500 time 0.2555 (0.2636) data time 0.0008 (0.0021) model time 0.2548 (0.2616) loss 5.0005 (5.6784) grad_norm 
3.3006 (2.5019) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:23:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][520/625] eta 0:00:27 lr 0.000303 wd 0.0500 time 0.2552 (0.2635) data time 0.0009 (0.0020) model time 0.2543 (0.2615) loss 5.8971 (5.6824) grad_norm 1.5988 (2.5071) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:24:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][530/625] eta 0:00:25 lr 0.000303 wd 0.0500 time 0.2584 (0.2634) data time 0.0006 (0.0020) model time 0.2578 (0.2614) loss 6.4006 (5.6771) grad_norm 2.1672 (2.5001) loss_scale 512.0000 (256.4821) mem 9655MB [2024-08-04 08:24:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][540/625] eta 0:00:22 lr 0.000303 wd 0.0500 time 0.2672 (0.2633) data time 0.0010 (0.0020) model time 0.2662 (0.2613) loss 6.3277 (5.6714) grad_norm 2.0776 (2.5064) loss_scale 512.0000 (261.2052) mem 9655MB [2024-08-04 08:24:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][550/625] eta 0:00:19 lr 0.000303 wd 0.0500 time 0.2506 (0.2631) data time 0.0007 (0.0020) model time 0.2500 (0.2612) loss 6.0335 (5.6734) grad_norm 5.5118 (2.5086) loss_scale 512.0000 (265.7568) mem 9655MB [2024-08-04 08:24:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][560/625] eta 0:00:17 lr 0.000303 wd 0.0500 time 0.2580 (0.2633) data time 0.0006 (0.0020) model time 0.2574 (0.2614) loss 5.9803 (5.6763) grad_norm 3.0595 (2.5298) loss_scale 512.0000 (270.1462) mem 9655MB [2024-08-04 08:24:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][570/625] eta 0:00:14 lr 0.000303 wd 0.0500 time 0.2607 (0.2633) data time 0.0007 (0.0019) model time 0.2600 (0.2613) loss 6.3542 (5.6793) grad_norm 2.5762 (2.5411) loss_scale 512.0000 (274.3818) mem 9655MB [2024-08-04 08:24:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][580/625] eta 
0:00:11 lr 0.000303 wd 0.0500 time 0.2565 (0.2631) data time 0.0007 (0.0019) model time 0.2558 (0.2612) loss 5.2145 (5.6811) grad_norm 2.0718 (2.5455) loss_scale 512.0000 (278.4716) mem 9655MB [2024-08-04 08:24:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][590/625] eta 0:00:09 lr 0.000303 wd 0.0500 time 0.2545 (0.2630) data time 0.0008 (0.0019) model time 0.2537 (0.2611) loss 6.3105 (5.6797) grad_norm 2.3960 (2.5462) loss_scale 512.0000 (282.4230) mem 9655MB [2024-08-04 08:24:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][600/625] eta 0:00:06 lr 0.000302 wd 0.0500 time 0.2538 (0.2629) data time 0.0010 (0.0019) model time 0.2528 (0.2610) loss 5.1318 (5.6795) grad_norm 1.7588 (2.5500) loss_scale 512.0000 (286.2429) mem 9655MB [2024-08-04 08:24:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][610/625] eta 0:00:03 lr 0.000302 wd 0.0500 time 0.2539 (0.2628) data time 0.0006 (0.0019) model time 0.2533 (0.2609) loss 6.5586 (5.6793) grad_norm 3.9425 (2.5617) loss_scale 512.0000 (289.9378) mem 9655MB [2024-08-04 08:24:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][620/625] eta 0:00:01 lr 0.000302 wd 0.0500 time 0.2536 (0.2627) data time 0.0005 (0.0019) model time 0.2531 (0.2608) loss 6.3341 (5.6789) grad_norm 1.9162 (2.5543) loss_scale 512.0000 (293.5137) mem 9655MB [2024-08-04 08:24:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 230 training takes 0:02:44 [2024-08-04 08:24:25 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 08:24:26 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 08:24:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.484 (0.484) Loss 0.5991 (0.5991) Acc@1 89.453 (89.453) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 08:24:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 0.9565 (0.7325) Acc@1 80.371 (86.239) Acc@5 96.387 (97.749) Mem 9655MB [2024-08-04 08:24:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0459 (0.8570) Acc@1 77.490 (83.015) Acc@5 94.971 (96.515) Mem 9655MB [2024-08-04 08:24:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.720 Acc@5 96.511 [2024-08-04 08:24:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.7% [2024-08-04 08:24:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.667 (0.667) Loss 0.5815 (0.5815) Acc@1 89.697 (89.697) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 08:24:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.122) Loss 0.9160 (0.7114) Acc@1 80.469 (86.457) Acc@5 96.094 (97.723) Mem 9655MB [2024-08-04 08:24:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.090) Loss 1.0254 (0.8347) Acc@1 77.002 (83.217) Acc@5 95.312 (96.482) Mem 9655MB [2024-08-04 08:24:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.917 Acc@5 96.487 [2024-08-04 08:24:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.9% [2024-08-04 08:24:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][0/625] eta 0:10:51 lr 0.000302 wd 0.0500 time 1.0427 (1.0427) data time 0.6477 (0.6477) model time 0.0000 (0.0000) loss 6.5254 (6.5254) grad_norm 1.4133 (1.4133) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:24:34 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [231/300][10/625] eta 0:03:32 lr 0.000302 wd 0.0500 time 0.2556 (0.3450) data time 0.0010 (0.0597) model time 0.0000 (0.0000) loss 4.4838 (5.5712) grad_norm 1.6699 (2.3658) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:24:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][20/625] eta 0:03:09 lr 0.000302 wd 0.0500 time 0.4600 (0.3125) data time 0.0009 (0.0318) model time 0.0000 (0.0000) loss 5.1704 (5.6219) grad_norm 3.3907 (2.8359) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:24:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][30/625] eta 0:02:55 lr 0.000302 wd 0.0500 time 0.2561 (0.2942) data time 0.0007 (0.0218) model time 0.0000 (0.0000) loss 6.5448 (5.5938) grad_norm 2.6544 (2.6560) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:24:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][40/625] eta 0:02:46 lr 0.000302 wd 0.0500 time 0.2548 (0.2847) data time 0.0009 (0.0167) model time 0.0000 (0.0000) loss 6.1864 (5.6845) grad_norm 2.4556 (2.5237) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:24:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][50/625] eta 0:02:42 lr 0.000302 wd 0.0500 time 0.2555 (0.2830) data time 0.0009 (0.0136) model time 0.0000 (0.0000) loss 5.3767 (5.6408) grad_norm 1.5934 (2.4928) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:24:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][60/625] eta 0:02:39 lr 0.000301 wd 0.0500 time 0.2608 (0.2823) data time 0.0008 (0.0115) model time 0.2600 (0.2777) loss 5.0877 (5.6379) grad_norm 2.0448 (2.4594) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:24:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][70/625] eta 0:02:36 lr 0.000301 wd 0.0500 time 0.2545 (0.2814) data time 0.0011 (0.0100) model time 0.2534 (0.2764) loss 
5.8143 (5.6621) grad_norm 2.7386 (2.4042) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:24:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][80/625] eta 0:02:32 lr 0.000301 wd 0.0500 time 0.2549 (0.2798) data time 0.0010 (0.0089) model time 0.2539 (0.2733) loss 6.6275 (5.6834) grad_norm 2.5037 (2.4871) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:24:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][90/625] eta 0:02:28 lr 0.000301 wd 0.0500 time 0.2543 (0.2772) data time 0.0011 (0.0080) model time 0.2532 (0.2688) loss 4.6127 (5.6976) grad_norm 1.7162 (2.4595) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:24:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][100/625] eta 0:02:24 lr 0.000301 wd 0.0500 time 0.2538 (0.2752) data time 0.0011 (0.0073) model time 0.2528 (0.2662) loss 6.6302 (5.6822) grad_norm 1.5294 (2.4144) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:25:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][110/625] eta 0:02:21 lr 0.000301 wd 0.0500 time 0.2516 (0.2755) data time 0.0009 (0.0068) model time 0.2507 (0.2681) loss 5.2772 (5.7190) grad_norm 1.9132 (2.4764) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:25:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][120/625] eta 0:02:18 lr 0.000301 wd 0.0500 time 0.2557 (0.2739) data time 0.0009 (0.0063) model time 0.2548 (0.2662) loss 6.4570 (5.7257) grad_norm 2.4810 (2.5004) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:25:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][130/625] eta 0:02:14 lr 0.000301 wd 0.0500 time 0.2604 (0.2726) data time 0.0008 (0.0059) model time 0.2597 (0.2649) loss 5.1932 (5.7123) grad_norm 1.7319 (2.4806) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:25:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO 
Train: [231/300][140/625] eta 0:02:12 lr 0.000300 wd 0.0500 time 0.2559 (0.2730) data time 0.0011 (0.0056) model time 0.2548 (0.2663) loss 6.4349 (5.7300) grad_norm 3.1373 (2.5694) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:25:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][150/625] eta 0:02:10 lr 0.000300 wd 0.0500 time 0.2525 (0.2747) data time 0.0007 (0.0053) model time 0.2519 (0.2695) loss 5.7400 (5.7479) grad_norm 1.6868 (2.5732) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:25:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][160/625] eta 0:02:07 lr 0.000300 wd 0.0500 time 0.2530 (0.2736) data time 0.0009 (0.0050) model time 0.2521 (0.2682) loss 6.4714 (5.7699) grad_norm 2.3240 (2.5500) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:25:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][170/625] eta 0:02:04 lr 0.000300 wd 0.0500 time 0.2558 (0.2726) data time 0.0010 (0.0047) model time 0.2548 (0.2672) loss 5.9879 (5.7637) grad_norm 1.4815 (2.5223) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:25:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][180/625] eta 0:02:00 lr 0.000300 wd 0.0500 time 0.2533 (0.2717) data time 0.0008 (0.0045) model time 0.2525 (0.2663) loss 5.7281 (5.7418) grad_norm 3.0030 (2.5352) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:25:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][190/625] eta 0:01:57 lr 0.000300 wd 0.0500 time 0.2553 (0.2709) data time 0.0008 (0.0043) model time 0.2544 (0.2655) loss 6.3829 (5.7401) grad_norm 2.8979 (2.5384) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:25:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][200/625] eta 0:01:54 lr 0.000300 wd 0.0500 time 0.2583 (0.2702) data time 0.0008 (0.0042) model time 0.2575 (0.2648) loss 5.6009 (5.7416) grad_norm 
2.2314 (2.5790) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:25:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][210/625] eta 0:01:52 lr 0.000300 wd 0.0500 time 0.2584 (0.2700) data time 0.0008 (0.0040) model time 0.2577 (0.2649) loss 6.2845 (5.7327) grad_norm 1.4522 (2.5661) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:25:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][220/625] eta 0:01:49 lr 0.000299 wd 0.0500 time 0.2573 (0.2695) data time 0.0007 (0.0039) model time 0.2567 (0.2645) loss 5.7564 (5.7279) grad_norm 2.0835 (2.5596) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:25:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][230/625] eta 0:01:46 lr 0.000299 wd 0.0500 time 0.2515 (0.2689) data time 0.0009 (0.0038) model time 0.2506 (0.2639) loss 5.9160 (5.7280) grad_norm 5.7614 (2.5836) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:25:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][240/625] eta 0:01:43 lr 0.000299 wd 0.0500 time 0.4154 (0.2699) data time 0.0009 (0.0036) model time 0.4145 (0.2654) loss 6.1072 (5.7231) grad_norm 3.1599 (2.6005) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:25:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][250/625] eta 0:01:41 lr 0.000299 wd 0.0500 time 0.2562 (0.2694) data time 0.0009 (0.0035) model time 0.2553 (0.2650) loss 5.0094 (5.7097) grad_norm 2.2313 (2.5907) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:25:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][260/625] eta 0:01:38 lr 0.000299 wd 0.0500 time 0.2608 (0.2689) data time 0.0007 (0.0034) model time 0.2601 (0.2646) loss 5.8338 (5.7037) grad_norm 3.3132 (2.6050) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:25:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][270/625] eta 
0:01:35 lr 0.000299 wd 0.0500 time 0.2575 (0.2685) data time 0.0008 (0.0033) model time 0.2567 (0.2642) loss 5.3991 (5.7049) grad_norm 2.5344 (2.6009) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:25:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][280/625] eta 0:01:32 lr 0.000299 wd 0.0500 time 0.2547 (0.2680) data time 0.0011 (0.0033) model time 0.2535 (0.2638) loss 6.1078 (5.7143) grad_norm 1.5940 (2.6079) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:25:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][290/625] eta 0:01:29 lr 0.000299 wd 0.0500 time 0.2509 (0.2676) data time 0.0008 (0.0032) model time 0.2500 (0.2634) loss 5.2292 (5.7162) grad_norm 2.3174 (2.5990) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:25:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][300/625] eta 0:01:26 lr 0.000298 wd 0.0500 time 0.2613 (0.2673) data time 0.0008 (0.0031) model time 0.2605 (0.2631) loss 5.8421 (5.7241) grad_norm 2.3088 (2.5922) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:25:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][310/625] eta 0:01:24 lr 0.000298 wd 0.0500 time 0.2566 (0.2669) data time 0.0011 (0.0030) model time 0.2556 (0.2628) loss 5.2858 (5.7170) grad_norm 2.0758 (2.5896) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:25:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][320/625] eta 0:01:21 lr 0.000298 wd 0.0500 time 0.2563 (0.2665) data time 0.0012 (0.0030) model time 0.2551 (0.2625) loss 6.6168 (5.7265) grad_norm 2.7445 (2.5818) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:25:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][330/625] eta 0:01:18 lr 0.000298 wd 0.0500 time 0.2574 (0.2669) data time 0.0006 (0.0029) model time 0.2568 (0.2630) loss 5.7172 (5.7277) grad_norm 2.8182 (2.5783) loss_scale 
512.0000 (512.0000) mem 9655MB [2024-08-04 08:26:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][340/625] eta 0:01:15 lr 0.000298 wd 0.0500 time 0.2552 (0.2665) data time 0.0010 (0.0028) model time 0.2542 (0.2627) loss 5.6257 (5.7403) grad_norm 2.6277 (2.5668) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:26:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][350/625] eta 0:01:13 lr 0.000298 wd 0.0500 time 0.2544 (0.2668) data time 0.0010 (0.0028) model time 0.2535 (0.2631) loss 6.6323 (5.7435) grad_norm 3.2873 (2.5653) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:26:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][360/625] eta 0:01:10 lr 0.000298 wd 0.0500 time 0.2546 (0.2665) data time 0.0008 (0.0027) model time 0.2538 (0.2628) loss 5.8990 (5.7505) grad_norm 3.5611 (2.5760) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:26:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][370/625] eta 0:01:08 lr 0.000298 wd 0.0500 time 0.2535 (0.2667) data time 0.0010 (0.0027) model time 0.2525 (0.2632) loss 5.7986 (5.7442) grad_norm 2.7561 (2.5859) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:26:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][380/625] eta 0:01:05 lr 0.000297 wd 0.0500 time 0.2536 (0.2668) data time 0.0011 (0.0026) model time 0.2526 (0.2633) loss 5.9181 (5.7432) grad_norm 1.6031 (2.5689) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:26:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][390/625] eta 0:01:02 lr 0.000297 wd 0.0500 time 0.2545 (0.2670) data time 0.0010 (0.0026) model time 0.2535 (0.2636) loss 4.7915 (5.7370) grad_norm 1.8064 (2.5549) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:26:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][400/625] eta 0:01:00 lr 0.000297 wd 
0.0500 time 0.2585 (0.2667) data time 0.0009 (0.0026) model time 0.2575 (0.2634) loss 5.5746 (5.7380) grad_norm 1.6807 (2.5461) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:26:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][410/625] eta 0:00:57 lr 0.000297 wd 0.0500 time 0.2535 (0.2665) data time 0.0010 (0.0025) model time 0.2525 (0.2632) loss 4.9889 (5.7417) grad_norm 2.9304 (2.5529) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:26:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][420/625] eta 0:00:54 lr 0.000297 wd 0.0500 time 0.2537 (0.2668) data time 0.0009 (0.0025) model time 0.2529 (0.2636) loss 6.5322 (5.7425) grad_norm 2.5014 (2.5471) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:26:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][430/625] eta 0:00:51 lr 0.000297 wd 0.0500 time 0.2557 (0.2666) data time 0.0010 (0.0025) model time 0.2547 (0.2634) loss 5.6871 (5.7478) grad_norm 2.3154 (2.5505) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:26:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][440/625] eta 0:00:49 lr 0.000297 wd 0.0500 time 0.2564 (0.2668) data time 0.0006 (0.0024) model time 0.2558 (0.2637) loss 5.9809 (5.7482) grad_norm 1.8204 (2.5400) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:26:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][450/625] eta 0:00:46 lr 0.000297 wd 0.0500 time 0.3953 (0.2668) data time 0.0006 (0.0024) model time 0.3948 (0.2638) loss 4.7009 (5.7483) grad_norm 2.4666 (2.5483) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:26:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][460/625] eta 0:00:43 lr 0.000296 wd 0.0500 time 0.2557 (0.2665) data time 0.0008 (0.0024) model time 0.2549 (0.2635) loss 6.1400 (5.7486) grad_norm 2.0953 (2.5689) loss_scale 512.0000 (512.0000) mem 9655MB 
[2024-08-04 08:26:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][470/625] eta 0:00:41 lr 0.000296 wd 0.0500 time 0.2541 (0.2668) data time 0.0006 (0.0023) model time 0.2535 (0.2638) loss 5.0571 (5.7473) grad_norm 4.4416 (2.5976) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:26:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][480/625] eta 0:00:38 lr 0.000296 wd 0.0500 time 0.2585 (0.2666) data time 0.0010 (0.0023) model time 0.2575 (0.2637) loss 5.6930 (5.7462) grad_norm 2.6236 (2.5917) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:26:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][490/625] eta 0:00:35 lr 0.000296 wd 0.0500 time 0.2552 (0.2664) data time 0.0007 (0.0023) model time 0.2545 (0.2635) loss 5.0786 (5.7393) grad_norm 3.0190 (2.5939) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:26:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][500/625] eta 0:00:33 lr 0.000296 wd 0.0500 time 0.2510 (0.2661) data time 0.0007 (0.0023) model time 0.2503 (0.2633) loss 4.4794 (5.7425) grad_norm inf (inf) loss_scale 256.0000 (511.4890) mem 9655MB [2024-08-04 08:26:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][510/625] eta 0:00:30 lr 0.000296 wd 0.0500 time 0.2563 (0.2659) data time 0.0008 (0.0022) model time 0.2555 (0.2631) loss 5.5732 (5.7389) grad_norm 1.3781 (inf) loss_scale 256.0000 (506.4892) mem 9655MB [2024-08-04 08:26:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][520/625] eta 0:00:27 lr 0.000296 wd 0.0500 time 0.2686 (0.2658) data time 0.0006 (0.0022) model time 0.2680 (0.2629) loss 5.5013 (5.7354) grad_norm 2.6150 (inf) loss_scale 256.0000 (501.6814) mem 9655MB [2024-08-04 08:26:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][530/625] eta 0:00:25 lr 0.000296 wd 0.0500 time 0.2530 (0.2656) data time 0.0011 
(0.0022) model time 0.2519 (0.2628) loss 6.5010 (5.7404) grad_norm 1.6109 (inf) loss_scale 256.0000 (497.0546) mem 9655MB [2024-08-04 08:26:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][540/625] eta 0:00:22 lr 0.000295 wd 0.0500 time 0.2539 (0.2654) data time 0.0011 (0.0022) model time 0.2528 (0.2626) loss 5.8710 (5.7352) grad_norm 2.9769 (inf) loss_scale 256.0000 (492.5989) mem 9655MB [2024-08-04 08:26:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][550/625] eta 0:00:19 lr 0.000295 wd 0.0500 time 0.2552 (0.2652) data time 0.0009 (0.0021) model time 0.2543 (0.2625) loss 6.1399 (5.7363) grad_norm 2.6704 (inf) loss_scale 256.0000 (488.3049) mem 9655MB [2024-08-04 08:26:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][560/625] eta 0:00:17 lr 0.000295 wd 0.0500 time 0.2598 (0.2651) data time 0.0009 (0.0021) model time 0.2589 (0.2623) loss 6.0005 (5.7372) grad_norm 3.8126 (inf) loss_scale 256.0000 (484.1640) mem 9655MB [2024-08-04 08:27:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][570/625] eta 0:00:14 lr 0.000295 wd 0.0500 time 0.2585 (0.2649) data time 0.0008 (0.0021) model time 0.2577 (0.2622) loss 4.8651 (5.7361) grad_norm 8.7226 (inf) loss_scale 256.0000 (480.1681) mem 9655MB [2024-08-04 08:27:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][580/625] eta 0:00:11 lr 0.000295 wd 0.0500 time 0.2532 (0.2648) data time 0.0009 (0.0021) model time 0.2523 (0.2621) loss 5.9110 (5.7372) grad_norm 1.7203 (inf) loss_scale 256.0000 (476.3098) mem 9655MB [2024-08-04 08:27:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][590/625] eta 0:00:09 lr 0.000295 wd 0.0500 time 0.2578 (0.2647) data time 0.0010 (0.0021) model time 0.2568 (0.2620) loss 5.0183 (5.7386) grad_norm 1.9570 (inf) loss_scale 256.0000 (472.5821) mem 9655MB [2024-08-04 08:27:09 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [231/300][600/625] eta 0:00:06 lr 0.000295 wd 0.0500 time 0.2523 (0.2645) data time 0.0008 (0.0020) model time 0.2515 (0.2618) loss 5.8591 (5.7401) grad_norm 2.4154 (inf) loss_scale 256.0000 (468.9784) mem 9655MB [2024-08-04 08:27:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][610/625] eta 0:00:03 lr 0.000295 wd 0.0500 time 0.2534 (0.2644) data time 0.0004 (0.0020) model time 0.2530 (0.2617) loss 6.0764 (5.7377) grad_norm 1.8820 (inf) loss_scale 256.0000 (465.4926) mem 9655MB [2024-08-04 08:27:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][620/625] eta 0:00:01 lr 0.000294 wd 0.0500 time 0.2529 (0.2642) data time 0.0003 (0.0020) model time 0.2525 (0.2616) loss 5.0138 (5.7347) grad_norm 1.8994 (inf) loss_scale 256.0000 (462.1192) mem 9655MB [2024-08-04 08:27:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 231 training takes 0:02:45 [2024-08-04 08:27:15 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 08:27:15 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 08:27:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.538 (0.538) Loss 0.5928 (0.5928) Acc@1 90.039 (90.039) Acc@5 98.730 (98.730) Mem 9655MB [2024-08-04 08:27:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.100) Loss 0.9160 (0.7166) Acc@1 80.664 (86.497) Acc@5 96.338 (97.665) Mem 9655MB [2024-08-04 08:27:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.0312 (0.8410) Acc@1 76.514 (83.215) Acc@5 94.775 (96.431) Mem 9655MB [2024-08-04 08:27:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.921 Acc@5 96.431 [2024-08-04 08:27:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.9% [2024-08-04 08:27:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.92% [2024-08-04 08:27:17 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 08:27:18 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 08:27:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.592 (0.592) Loss 0.5820 (0.5820) Acc@1 89.697 (89.697) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 08:27:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.104) Loss 0.9160 (0.7112) Acc@1 80.566 (86.475) Acc@5 96.191 (97.723) Mem 9655MB
[2024-08-04 08:27:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.081) Loss 1.0254 (0.8343) Acc@1 77.100 (83.224) Acc@5 95.312 (96.489) Mem 9655MB
[2024-08-04 08:27:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.933 Acc@5 96.493
[2024-08-04 08:27:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.9%
[2024-08-04 08:27:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.93%
[2024-08-04 08:27:20 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 08:27:20 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 08:27:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][0/625] eta 0:07:22 lr 0.000294 wd 0.0500 time 0.7080 (0.7080) data time 0.4558 (0.4558) model time 0.0000 (0.0000) loss 6.4921 (6.4921) grad_norm 1.3165 (1.3165) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:27:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][10/625] eta 0:03:13 lr 0.000294 wd 0.0500 time 0.2578 (0.3152) data time 0.0011 (0.0424) model time 0.0000 (0.0000) loss 5.1930 (5.6433) grad_norm 2.3357 (2.2509) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:27:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][20/625] eta 0:02:53 lr 0.000294 wd 0.0500 time 0.2546 (0.2868) data time 0.0009 (0.0227) model time 0.0000 (0.0000) loss 5.0849 (5.7702) grad_norm 2.0330 (2.2982) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:27:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][30/625] eta 0:02:45 lr 0.000294 wd 0.0500 time 0.2541 (0.2774) data time 0.0008 (0.0156) model time 0.0000 (0.0000) loss 6.1993 (5.6920) grad_norm 2.0579 (2.3019) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:27:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][40/625] eta 0:02:40 lr 0.000294 wd 0.0500 time 0.2569 (0.2749) data time 0.0015 (0.0121) model time 0.0000 (0.0000) loss 6.1551 (5.7038) grad_norm 1.5560 (2.5175) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:27:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][50/625] eta 0:02:36 lr 0.000294 wd 0.0500 time 0.2536 (0.2714) data time 0.0008 (0.0099) model time 0.0000 (0.0000) loss 6.4157 (5.6459) grad_norm 1.5898 (2.4522) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:27:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][60/625] eta 0:02:31 lr 0.000294 wd 0.0500 time 0.2622 (0.2688) data time 0.0008 (0.0085) model time 0.2614 (0.2548) loss 5.8549 (5.6453) grad_norm 1.9748 (2.4035) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:27:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][70/625] eta 0:02:29 lr 0.000294 wd 0.0500 time 0.2589 (0.2696) data time 0.0008 (0.0074) model time 0.2581 (0.2641) loss 5.2939 (5.6550) grad_norm 2.3020 (2.5481) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:27:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][80/625] eta 0:02:26 lr 0.000293 wd 0.0500 time 0.2568 (0.2679) data time 0.0006 (0.0066) model time 0.2562 (0.2611) loss 5.8521 (5.6863) grad_norm 2.2656 (2.5927) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:27:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][90/625] eta 0:02:23 lr 0.000293 wd 0.0500 time 0.2562 (0.2689) data time 0.0009 (0.0060) model time 0.2553 (0.2647) loss 6.0213 (5.6977) grad_norm 1.9705 (2.6949) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:27:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][100/625] eta 0:02:21 lr 0.000293 wd 0.0500 time 0.2589 (0.2697) data time 0.0008 (0.0055) model time 0.2581 (0.2670) loss 4.9732 (5.7205) grad_norm 2.5994 (2.6563) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:27:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][110/625] eta 0:02:18 lr 0.000293 wd 0.0500 time 0.2523 (0.2684) data time 0.0009 (0.0051) model time 0.2514 (0.2648) loss 6.3413 (5.7304) grad_norm 4.1594 (2.6407) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:27:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][120/625] eta 0:02:15 lr 0.000293 wd 0.0500 time 0.2530 (0.2674) data time 0.0010 (0.0047) model time 0.2520 (0.2636) loss 6.0128 (5.7224) grad_norm 4.2327 (2.7658) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:27:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][130/625] eta 0:02:11 lr 0.000293 wd 0.0500 time 0.2674 (0.2667) data time 0.0009 (0.0044) model time 0.2665 (0.2627) loss 5.8221 (5.7244) grad_norm 1.6832 (2.7793) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:27:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][140/625] eta 0:02:10 lr 0.000293 wd 0.0500 time 0.4659 (0.2688) data time 0.0010 (0.0042) model time 0.4649 (0.2663) loss 5.2232 (5.7244) grad_norm 2.3752 (2.7318) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:28:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][150/625] eta 0:02:07 lr 0.000293 wd 0.0500 time 0.2568 (0.2694) data time 0.0006 (0.0040) model time 0.2562 (0.2674) loss 6.3167 (5.7440) grad_norm 1.6167 (2.7181) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:28:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][160/625] eta 0:02:04 lr 0.000292 wd 0.0500 time 0.2646 (0.2686) data time 0.0008 (0.0038) model time 0.2638 (0.2663) loss 5.7847 (5.7202) grad_norm 1.8034 (2.7519) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:28:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][170/625] eta 0:02:02 lr 0.000292 wd 0.0500 time 0.2542 (0.2702) data time 0.0006 (0.0037) model time 0.2536 (0.2688) loss 5.3216 (5.6965) grad_norm 3.1728 (2.8095) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:28:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][180/625] eta 0:01:59 lr 0.000292 wd 0.0500 time 0.2592 (0.2695) data time 0.0010 (0.0035) model time 0.2582 (0.2677) loss 4.8724 (5.6973) grad_norm 3.5606 (2.8429) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:28:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][190/625] eta 0:01:56 lr 0.000292 wd 0.0500 time 0.2572 (0.2689) data time 0.0010 (0.0034) model time 0.2563 (0.2670) loss 5.2060 (5.6908) grad_norm 2.4969 (2.8473) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:28:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][200/625] eta 0:01:54 lr 0.000292 wd 0.0500 time 0.2568 (0.2683) data time 0.0006 (0.0033) model time 0.2562 (0.2662) loss 5.4700 (5.6917) grad_norm 1.3483 (2.8228) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:28:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][210/625] eta 0:01:51 lr 0.000292 wd 0.0500 time 0.2551 (0.2676) data time 0.0008 (0.0032) model time 0.2544 (0.2654) loss 6.2483 (5.7005) grad_norm 3.0293 (2.8059) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:28:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][220/625] eta 0:01:48 lr 0.000292 wd 0.0500 time 0.2546 (0.2671) data time 0.0011 (0.0031) model time 0.2536 (0.2648) loss 4.8159 (5.6969) grad_norm 3.3306 (2.7865) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:28:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][230/625] eta 0:01:45 lr 0.000292 wd 0.0500 time 0.2565 (0.2666) data time 0.0009 (0.0030) model time 0.2555 (0.2643) loss 5.6608 (5.7061) grad_norm 1.7932 (2.7708) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:28:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][240/625] eta 0:01:42 lr 0.000291 wd 0.0500 time 0.2532 (0.2669) data time 0.0009 (0.0029) model time 0.2523 (0.2646) loss 5.8319 (5.7190) grad_norm 3.4568 (2.7610) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:28:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][250/625] eta 0:01:39 lr 0.000291 wd 0.0500 time 0.2594 (0.2664) data time 0.0012 (0.0028) model time 0.2581 (0.2642) loss 4.5009 (5.7108) grad_norm 2.1362 (2.7315) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:28:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][260/625] eta 0:01:37 lr 0.000291 wd 0.0500 time 0.2567 (0.2660) data time 0.0009 (0.0027) model time 0.2558 (0.2637) loss 5.8776 (5.7148) grad_norm 27.4726 (2.8092) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:28:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][270/625] eta 0:01:34 lr 0.000291 wd 0.0500 time 0.2553 (0.2657) data time 0.0011 (0.0027) model time 0.2543 (0.2633) loss 5.4464 (5.7070) grad_norm 2.8188 (2.7963) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:28:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][280/625] eta 0:01:31 lr 0.000291 wd 0.0500 time 0.2546 (0.2653) data time 0.0009 (0.0026) model time 0.2537 (0.2629) loss 4.7851 (5.7044) grad_norm 2.2394 (2.7696) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:28:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][290/625] eta 0:01:28 lr 0.000291 wd 0.0500 time 0.2531 (0.2650) data time 0.0008 (0.0026) model time 0.2523 (0.2626) loss 6.7009 (5.7222) grad_norm 2.8148 (2.7665) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:28:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][300/625] eta 0:01:26 lr 0.000291 wd 0.0500 time 0.2548 (0.2657) data time 0.0008 (0.0025) model time 0.2540 (0.2635) loss 4.6127 (5.7071) grad_norm 3.7745 (2.7670) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:28:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][310/625] eta 0:01:23 lr 0.000291 wd 0.0500 time 0.2532 (0.2654) data time 0.0008 (0.0025) model time 0.2524 (0.2632) loss 5.8811 (5.7008) grad_norm 1.9489 (2.7572) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:28:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][320/625] eta 0:01:20 lr 0.000291 wd 0.0500 time 0.2548 (0.2650) data time 0.0008 (0.0024) model time 0.2540 (0.2628) loss 6.2764 (5.7099) grad_norm 4.4574 (2.7551) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:28:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][330/625] eta 0:01:18 lr 0.000290 wd 0.0500 time 0.2596 (0.2648) data time 0.0006 (0.0024) model time 0.2591 (0.2626) loss 5.5675 (5.7193) grad_norm 1.9177 (2.7431) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:28:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][340/625] eta 0:01:15 lr 0.000290 wd 0.0500 time 0.2574 (0.2645) data time 0.0010 (0.0023) model time 0.2564 (0.2623) loss 4.4132 (5.7108) grad_norm 2.6662 (2.7374) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:28:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][350/625] eta 0:01:12 lr 0.000290 wd 0.0500 time 0.2625 (0.2643) data time 0.0008 (0.0023) model time 0.2617 (0.2621) loss 5.2028 (5.7133) grad_norm 3.4958 (2.7299) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:28:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][360/625] eta 0:01:09 lr 0.000290 wd 0.0500 time 0.2564 (0.2641) data time 0.0008 (0.0023) model time 0.2556 (0.2619) loss 6.5637 (5.7166) grad_norm 2.5363 (2.7176) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:28:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][370/625] eta 0:01:07 lr 0.000290 wd 0.0500 time 0.2522 (0.2639) data time 0.0008 (0.0022) model time 0.2514 (0.2617) loss 4.7743 (5.7043) grad_norm 3.0479 (2.7165) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:29:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][380/625] eta 0:01:04 lr 0.000290 wd 0.0500 time 0.2627 (0.2642) data time 0.0006 (0.0022) model time 0.2621 (0.2621) loss 5.8429 (5.7058) grad_norm 2.5227 (2.7042) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:29:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][390/625] eta 0:01:02 lr 0.000290 wd 0.0500 time 0.2556 (0.2640) data time 0.0010 (0.0022) model time 0.2547 (0.2619) loss 6.1703 (5.7030) grad_norm 2.5595 (2.7174) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:29:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][400/625] eta 0:00:59 lr 0.000290 wd 0.0500 time 0.2553 (0.2641) data time 0.0006 (0.0021) model time 0.2547 (0.2621) loss 4.3361 (5.7028) grad_norm 2.3857 (2.7060) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:29:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][410/625] eta 0:00:56 lr 0.000289 wd 0.0500 time 0.4658 (0.2644) data time 0.0008 (0.0021) model time 0.4650 (0.2624) loss 4.8367 (5.6971) grad_norm 3.0356 (2.7070) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:29:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][420/625] eta 0:00:54 lr 0.000289 wd 0.0500 time 0.2580 (0.2651) data time 0.0007 (0.0021) model time 0.2573 (0.2632) loss 5.0014 (5.7001) grad_norm 1.9506 (2.6957) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:29:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][430/625] eta 0:00:51 lr 0.000289 wd 0.0500 time 0.2556 (0.2648) data time 0.0008 (0.0021) model time 0.2548 (0.2630) loss 5.4038 (5.7008) grad_norm 1.8611 (2.7051) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:29:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][440/625] eta 0:00:48 lr 0.000289 wd 0.0500 time 0.2577 (0.2647) data time 0.0006 (0.0020) model time 0.2571 (0.2628) loss 5.8666 (5.7032) grad_norm 2.3199 (2.6977) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:29:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][450/625] eta 0:00:46 lr 0.000289 wd 0.0500 time 0.2545 (0.2645) data time 0.0010 (0.0020) model time 0.2535 (0.2626) loss 6.4730 (5.7071) grad_norm 1.5299 (2.6910) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:29:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][460/625] eta 0:00:43 lr 0.000289 wd 0.0500 time 0.2595 (0.2647) data time 0.0009 (0.0020) model time 0.2586 (0.2628) loss 6.3884 (5.7135) grad_norm 1.7707 (2.6792) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:29:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][470/625] eta 0:00:40 lr 0.000289 wd 0.0500 time 0.2553 (0.2645) data time 0.0008 (0.0020) model time 0.2544 (0.2627) loss 5.5058 (5.7119) grad_norm 2.1503 (2.6730) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:29:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][480/625] eta 0:00:38 lr 0.000289 wd 0.0500 time 0.2583 (0.2647) data time 0.0008 (0.0019) model time 0.2575 (0.2629) loss 5.8361 (5.7218) grad_norm 2.1316 (2.6594) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:29:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][490/625] eta 0:00:35 lr 0.000288 wd 0.0500 time 0.2542 (0.2645) data time 0.0008 (0.0019) model time 0.2534 (0.2627) loss 5.9520 (5.7206) grad_norm 1.5042 (2.6466) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:29:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][500/625] eta 0:00:33 lr 0.000288 wd 0.0500 time 0.2629 (0.2644) data time 0.0006 (0.0019) model time 0.2623 (0.2626) loss 4.6107 (5.7171) grad_norm 2.0720 (2.6342) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:29:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][510/625] eta 0:00:30 lr 0.000288 wd 0.0500 time 0.2544 (0.2642) data time 0.0011 (0.0019) model time 0.2533 (0.2624) loss 5.4106 (5.7193) grad_norm 1.9919 (2.6297) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:29:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][520/625] eta 0:00:27 lr 0.000288 wd 0.0500 time 0.2525 (0.2640) data time 0.0007 (0.0019) model time 0.2518 (0.2622) loss 4.3607 (5.7178) grad_norm 1.5721 (2.6205) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:29:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][530/625] eta 0:00:25 lr 0.000288 wd 0.0500 time 0.2523 (0.2638) data time 0.0009 (0.0018) model time 0.2514 (0.2621) loss 5.1037 (5.7163) grad_norm 11.2997 (2.6369) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:29:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][540/625] eta 0:00:22 lr 0.000288 wd 0.0500 time 0.2555 (0.2637) data time 0.0007 (0.0018) model time 0.2547 (0.2619) loss 6.2447 (5.7166) grad_norm 2.8347 (2.6507) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:29:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][550/625] eta 0:00:19 lr 0.000288 wd 0.0500 time 0.2581 (0.2635) data time 0.0009 (0.0018) model time 0.2571 (0.2618) loss 5.2738 (5.7101) grad_norm 1.8545 (2.6488) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:29:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][560/625] eta 0:00:17 lr 0.000288 wd 0.0500 time 0.2585 (0.2637) data time 0.0005 (0.0018) model time 0.2580 (0.2620) loss 6.0814 (5.7112) grad_norm 3.7353 (2.6436) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:29:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][570/625] eta 0:00:14 lr 0.000287 wd 0.0500 time 0.2589 (0.2636) data time 0.0008 (0.0018) model time 0.2581 (0.2618) loss 5.6818 (5.7164) grad_norm 1.9862 (2.6418) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:29:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][580/625] eta 0:00:11 lr 0.000287 wd 0.0500 time 0.2566 (0.2634) data time 0.0008 (0.0018) model time 0.2558 (0.2617) loss 4.9144 (5.7139) grad_norm 3.2588 (2.6354) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:29:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][590/625] eta 0:00:09 lr 0.000287 wd 0.0500 time 0.2659 (0.2634) data time 0.0009 (0.0017) model time 0.2650 (0.2616) loss 5.3210 (5.7118) grad_norm 2.6194 (2.6334) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:29:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][600/625] eta 0:00:06 lr 0.000287 wd 0.0500 time 0.2579 (0.2636) data time 0.0006 (0.0017) model time 0.2573 (0.2619) loss 4.9298 (5.7152) grad_norm 2.8174 (2.6349) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:30:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][610/625] eta 0:00:03 lr 0.000287 wd 0.0500 time 0.2510 (0.2635) data time 0.0006 (0.0017) model time 0.2504 (0.2618) loss 4.9363 (5.7170) grad_norm 3.3004 (2.6319) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:30:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][620/625] eta 0:00:01 lr 0.000287 wd 0.0500 time 0.2533 (0.2633) data time 0.0005 (0.0017) model time 0.2528 (0.2616) loss 5.7015 (5.7179) grad_norm 1.6533 (2.6285) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:30:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 232 training takes 0:02:44
[2024-08-04 08:30:05 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 08:30:05 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 08:30:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.632 (0.632) Loss 0.6162 (0.6162) Acc@1 89.893 (89.893) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 08:30:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.109) Loss 0.9307 (0.7344) Acc@1 80.518 (86.488) Acc@5 96.094 (97.585) Mem 9655MB
[2024-08-04 08:30:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.083) Loss 1.0479 (0.8597) Acc@1 77.490 (83.103) Acc@5 94.824 (96.375) Mem 9655MB
[2024-08-04 08:30:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.817 Acc@5 96.381
[2024-08-04 08:30:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.8%
[2024-08-04 08:30:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.694 (0.694) Loss 0.5820 (0.5820) Acc@1 89.697 (89.697) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 08:30:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.123) Loss 0.9160 (0.7112) Acc@1 80.420 (86.488) Acc@5 96.143 (97.714) Mem 9655MB
[2024-08-04 08:30:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.091) Loss 1.0234 (0.8342) Acc@1 77.197 (83.245) Acc@5 95.361 (96.484) Mem 9655MB
[2024-08-04 08:30:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.957 Acc@5 96.497
[2024-08-04 08:30:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.0%
[2024-08-04 08:30:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.96%
[2024-08-04 08:30:09 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 08:30:10 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 08:30:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][0/625] eta 0:07:34 lr 0.000287 wd 0.0500 time 0.7273 (0.7273) data time 0.4885 (0.4885) model time 0.0000 (0.0000) loss 5.1198 (5.1198) grad_norm 3.1362 (3.1362) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:30:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][10/625] eta 0:03:14 lr 0.000287 wd 0.0500 time 0.2560 (0.3170) data time 0.0005 (0.0452) model time 0.0000 (0.0000) loss 4.4657 (5.3103) grad_norm 1.8649 (2.2217) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:30:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][20/625] eta 0:02:54 lr 0.000287 wd 0.0500 time 0.2583 (0.2880) data time 0.0006 (0.0240) model time 0.0000 (0.0000) loss 5.1434 (5.6160) grad_norm 2.0709 (2.2131) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:30:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][30/625] eta 0:02:44 lr 0.000286 wd 0.0500 time 0.2545 (0.2772) data time 0.0010 (0.0166) model time 0.0000 (0.0000) loss 5.7968 (5.6864) grad_norm 2.9742 (2.5261) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:30:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][40/625] eta 0:02:39 lr 0.000286 wd 0.0500 time 0.2544 (0.2723) data time 0.0008 (0.0128) model time 0.0000 (0.0000) loss 5.3194 (5.7452) grad_norm 1.5085 (2.4139) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:30:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][50/625] eta 0:02:36 lr 0.000286 wd 0.0500 time 0.2504 (0.2728) data time 0.0010 (0.0105) model time 0.0000 (0.0000) loss 6.6122 (5.7094) grad_norm 5.0902 (2.3729) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:30:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][60/625] eta 0:02:32 lr 0.000286 wd 0.0500 time 0.2559 (0.2701) data time 0.0007 (0.0089) model time 0.2552 (0.2553) loss 4.7312 (5.7926) grad_norm 1.3364 (2.5289) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:30:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][70/625] eta 0:02:32 lr 0.000286 wd 0.0500 time 0.2556 (0.2741) data time 0.0006 (0.0078) model time 0.2549 (0.2765) loss 4.9535 (5.8166) grad_norm 1.9523 (2.6170) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:30:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][80/625] eta 0:02:28 lr 0.000286 wd 0.0500 time 0.2537 (0.2719) data time 0.0009 (0.0070) model time 0.2528 (0.2694) loss 4.9067 (5.7850) grad_norm 2.6455 (2.6422) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:30:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][90/625] eta 0:02:24 lr 0.000286 wd 0.0500 time 0.2550 (0.2703) data time 0.0010 (0.0063) model time 0.2540 (0.2661) loss 5.2705 (5.7852) grad_norm 2.5661 (2.5857) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:30:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][100/625] eta 0:02:21 lr 0.000286 wd 0.0500 time 0.2538 (0.2688) data time 0.0007 (0.0058) model time 0.2531 (0.2638) loss 5.8839 (5.7801) grad_norm 2.0265 (2.5415) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:30:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][110/625] eta 0:02:18 lr 0.000285 wd 0.0500 time 0.2537 (0.2695) data time 0.0007 (0.0053) model time 0.2530 (0.2658) loss 5.7961 (5.7748) grad_norm 1.9207 (2.4932) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:30:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][120/625] eta 0:02:15 lr 0.000285 wd 0.0500 time 0.2555 (0.2683) data time 0.0007 (0.0049) model time 0.2548 (0.2642) loss 4.9195 (5.7601) grad_norm 2.4551 (2.4806) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:30:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][130/625] eta 0:02:12 lr 0.000285 wd 0.0500 time 0.2586 (0.2674) data time 0.0009 (0.0046) model time 0.2576 (0.2630) loss 6.3331 (5.7829) grad_norm 2.6230 (2.6005) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:30:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][140/625] eta 0:02:09 lr 0.000285 wd 0.0500 time 0.2592 (0.2666) data time 0.0008 (0.0044) model time 0.2584 (0.2622) loss 5.6266 (5.7866) grad_norm 2.2885 (2.5826) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:30:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][150/625] eta 0:02:06 lr 0.000285 wd 0.0500 time 0.2538 (0.2668) data time 0.0020 (0.0042) model time 0.2518 (0.2627) loss 5.3172 (5.7594) grad_norm 1.5760 (2.5479) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:30:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][160/625] eta 0:02:04 lr 0.000285 wd 0.0500 time 0.2570 (0.2672) data time 0.0007 (0.0040) model time 0.2563 (0.2637) loss 5.5458 (5.7405) grad_norm 2.0028 (2.5162) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:30:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][170/625] eta 0:02:01 lr 0.000285 wd 0.0500 time 0.2587 (0.2675) data time 0.0008 (0.0038) model time 0.2579 (0.2643) loss 5.7198 (5.7432) grad_norm 2.0181 (2.5448) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:30:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][180/625] eta 0:01:58 lr 0.000285 wd 0.0500 time 0.2542 (0.2668) data time 0.0009 (0.0036) model time 0.2533 (0.2636) loss 6.0716 (5.7448) grad_norm 1.9292 (2.5273) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:31:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][190/625] eta 0:01:55 lr 0.000285 wd 0.0500 time 0.2523 (0.2662) data time 0.0009 (0.0035) model time 0.2514 (0.2629) loss 5.3410 (5.7505) grad_norm 2.5866 (2.5467) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:31:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][200/625] eta 0:01:52 lr 0.000284 wd 0.0500 time 0.2558 (0.2657) data time 0.0007 (0.0034) model time 0.2551 (0.2623) loss 6.2208 (5.7442) grad_norm 3.1842 (2.5335) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:31:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][210/625] eta 0:01:50 lr 0.000284 wd 0.0500 time 0.2571 (0.2658) data time 0.0010 (0.0033) model time 0.2562 (0.2626) loss 6.1563 (5.7501) grad_norm 1.8496 (2.5020) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:31:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][220/625] eta 0:01:48 lr 0.000284 wd 0.0500 time 0.4483 (0.2668) data time 0.0008 (0.0032) model time 0.4475 (0.2641) loss 6.4271 (5.7431) grad_norm 1.9213 (2.4814) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:31:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][230/625] eta 0:01:45 lr 0.000284 wd 0.0500 time 0.2548 (0.2664) data time 0.0008 (0.0031) model time 0.2539 (0.2636) loss 6.6597 (5.7356) grad_norm 1.8219 (2.4847) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:31:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][240/625] eta 0:01:42 lr 0.000284 wd 0.0500 time 0.2569 (0.2659) data time 0.0012 (0.0030) model time 0.2557 (0.2631) loss 5.5597 (5.7448) grad_norm 1.4542 (2.4695) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:31:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][250/625] eta 0:01:39 lr 0.000284 wd 0.0500 time 0.2587 (0.2655) data time 0.0006 (0.0029) model time 0.2582 (0.2627) loss 5.7488 (5.7507) grad_norm 2.1459 (2.4572) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:31:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][260/625] eta 0:01:36 lr 0.000284 wd 0.0500 time 0.2568 (0.2652) data time 0.0008 (0.0028) model time 0.2560 (0.2623) loss 5.8086 (5.7609) grad_norm 1.7762 (2.4357) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:31:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][270/625] eta 0:01:34 lr 0.000284 wd 0.0500 time 0.2507 (0.2649) data time 0.0008 (0.0028) model time 0.2499 (0.2621) loss 4.6142 (5.7567) grad_norm 2.9504 (2.4447) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:31:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][280/625] eta 0:01:31 lr 0.000283 wd 0.0500 time 0.2583 (0.2646) data time 0.0009 (0.0027) model time 0.2575 (0.2618) loss 4.6352 (5.7714) grad_norm 1.9237 (2.4620) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:31:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][290/625] eta 0:01:28 lr 0.000283 wd 0.0500 time 0.2605 (0.2643) data time 0.0006 (0.0026) model time 0.2599 (0.2615) loss 5.3763 (5.7701) grad_norm 4.0403 (2.4819) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:31:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][300/625] eta 0:01:26 lr 0.000283 wd 0.0500 time 0.4676 (0.2647) data time 0.0010 (0.0026) model time 0.4665 (0.2621) loss 5.1842 (5.7614) grad_norm 2.1057 (2.5066) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:31:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][310/625] eta 0:01:23 lr 0.000283 wd 0.0500 time 0.2544 (0.2644) data time 0.0007 (0.0025) model time 0.2537 (0.2618) loss 5.2340 (5.7534) grad_norm 1.6487 (2.4944) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:31:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][320/625] eta 0:01:20 lr 0.000283 wd 0.0500 time 0.2543 (0.2642) data time 0.0009 (0.0025) model time 0.2534 (0.2616) loss 6.1566 (5.7443) grad_norm 2.4738 (2.4931) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:31:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][330/625] eta 0:01:17 lr 0.000283 wd 0.0500 time 0.2549 (0.2639) data time 0.0010 (0.0024) model time 0.2539 (0.2614) loss 6.1786 (5.7375) grad_norm 3.7801 (2.5014) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:31:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][340/625] eta 0:01:15 lr 0.000283 wd 0.0500 time 0.4431 (0.2642) data time 0.0009 (0.0024) model time 0.4422 (0.2618) loss 6.4066 (5.7412) grad_norm 2.3399 (2.5035) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:31:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][350/625] eta 0:01:12 lr 0.000283 wd 0.0500 time 0.2538 (0.2650) data time 0.0009 (0.0024) model time 0.2528 (0.2627) loss 5.0911 (5.7424) grad_norm 1.5268 (2.5068) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:31:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][360/625] eta 0:01:10 lr 0.000282 wd 0.0500 time 0.2539 (0.2647) data time 0.0008 (0.0023) model time 0.2531 (0.2624) loss 5.3032 (5.7365) grad_norm 1.8968 (2.5128) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:31:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][370/625] eta 0:01:07 lr 0.000282 wd 0.0500 time 0.2593 (0.2645) data time 0.0008 (0.0023) model time 0.2585 (0.2622) loss 6.2807 (5.7416) grad_norm 1.9693 (2.5250) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:31:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][380/625] eta 0:01:04 lr 0.000282 wd 0.0500 time 0.2576 (0.2643) data time 0.0007 (0.0022) model time 0.2568 (0.2620) loss 5.8790 (5.7427) grad_norm 2.5166 (2.5424) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:31:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][390/625] eta 0:01:02 lr 0.000282 wd 0.0500 time 0.2524 (0.2641) data time 0.0011 (0.0022) model time 0.2514 (0.2618) loss 4.7989 (5.7379) grad_norm 1.8431 (2.5325) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:31:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][400/625] eta 0:00:59 lr 0.000282 wd 0.0500 time 0.2562 (0.2639) data time 0.0010 (0.0022) model time 0.2551 (0.2616) loss 4.9248 (5.7401) grad_norm 2.7300 (2.5264) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:31:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][410/625] eta 0:00:56 lr 0.000282 wd 0.0500 time 0.2518 (0.2642) data time 0.0007 (0.0021) model time 0.2511 (0.2620) loss 5.0917 (5.7369) grad_norm 2.7433 (2.5243) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:32:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][420/625] eta 0:00:54 lr 0.000282 wd 0.0500 time 0.2574 (0.2640) data time 0.0012 (0.0021) model time 0.2562 (0.2618) loss 7.0473 (5.7389) grad_norm 1.6622 (2.5276) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:32:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][430/625] eta 0:00:51 lr 0.000282 wd 0.0500 time 0.3921 (0.2641) data time 0.0006 (0.0021) model time 0.3915 (0.2620) loss 6.3668 (5.7426) grad_norm 2.1809 (2.5124) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:32:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][440/625] eta 0:00:48 lr 0.000281 wd 0.0500 time 0.2551 (0.2639) data time 0.0010 (0.0021) model time 0.2542 (0.2618) loss 5.6139 (5.7382) grad_norm 3.5655 (2.5230) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:32:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [233/300][450/625] eta 0:00:46 lr 0.000281 wd 0.0500 time 0.2536 (0.2637) data time 0.0010 (0.0020) model time 0.2526 (0.2616) loss 5.2671 (5.7418) grad_norm 2.7184 (2.5192) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:32:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][460/625] eta 0:00:43 lr 0.000281 wd 0.0500 time 0.2565 (0.2635) data time 0.0006 (0.0020) model time 0.2559 (0.2614) loss 4.8113 (5.7399) grad_norm 4.2243 (2.5233) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:32:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][470/625] eta 0:00:40 lr 0.000281 wd 0.0500 time 0.2554 (0.2634) data time 0.0009 (0.0020) model time 0.2544 (0.2613) loss 4.4170 (5.7451) grad_norm 2.0441 (2.5361) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:32:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][480/625] eta 0:00:38 lr 0.000281 wd 0.0500 time 0.2722 (0.2633) data time 0.0008 (0.0020) model time 0.2713 (0.2612) loss 6.4338 (5.7484) grad_norm 3.1904 (2.5467) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:32:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][490/625] eta 0:00:35 lr 0.000281 wd 0.0500 time 0.2552 (0.2631) data time 0.0008 (0.0019) model time 0.2544 (0.2610) loss 5.8496 (5.7525) grad_norm 2.7924 (2.5461) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:32:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][500/625] eta 0:00:32 lr 0.000281 wd 0.0500 time 0.2554 (0.2630) data time 0.0009 (0.0019) model time 0.2544 (0.2610) loss 6.1769 (5.7592) grad_norm 1.8256 (2.5377) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:32:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][510/625] eta 0:00:30 lr 0.000281 wd 0.0500 time 0.2559 (0.2633) data time 0.0008 (0.0019) model time 0.2551 (0.2613) loss 5.0138 (5.7564) grad_norm 
2.4415 (2.5597) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:32:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][520/625] eta 0:00:27 lr 0.000281 wd 0.0500 time 0.2570 (0.2634) data time 0.0008 (0.0019) model time 0.2562 (0.2614) loss 6.0324 (5.7542) grad_norm 1.7476 (2.5568) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:32:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][530/625] eta 0:00:25 lr 0.000280 wd 0.0500 time 0.2555 (0.2632) data time 0.0009 (0.0019) model time 0.2546 (0.2613) loss 6.2697 (5.7570) grad_norm 8.9115 (2.5841) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:32:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][540/625] eta 0:00:22 lr 0.000280 wd 0.0500 time 0.2559 (0.2631) data time 0.0008 (0.0018) model time 0.2551 (0.2611) loss 5.4872 (5.7629) grad_norm 2.1421 (2.5840) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:32:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][550/625] eta 0:00:19 lr 0.000280 wd 0.0500 time 0.2540 (0.2630) data time 0.0008 (0.0018) model time 0.2532 (0.2610) loss 5.1661 (5.7603) grad_norm 3.8006 (2.5784) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:32:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][560/625] eta 0:00:17 lr 0.000280 wd 0.0500 time 0.2599 (0.2632) data time 0.0008 (0.0018) model time 0.2591 (0.2613) loss 5.4151 (5.7594) grad_norm 1.6543 (2.5792) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:32:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][570/625] eta 0:00:14 lr 0.000280 wd 0.0500 time 0.2587 (0.2635) data time 0.0011 (0.0018) model time 0.2576 (0.2616) loss 5.2112 (5.7568) grad_norm 2.3155 (2.5733) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:32:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][580/625] eta 
0:00:11 lr 0.000280 wd 0.0500 time 0.2541 (0.2633) data time 0.0009 (0.0018) model time 0.2532 (0.2615) loss 5.8671 (5.7561) grad_norm 1.7082 (2.5665) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:32:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][590/625] eta 0:00:09 lr 0.000280 wd 0.0500 time 0.2503 (0.2632) data time 0.0009 (0.0018) model time 0.2494 (0.2614) loss 6.0934 (5.7569) grad_norm 1.7470 (2.5570) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:32:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][600/625] eta 0:00:06 lr 0.000280 wd 0.0500 time 0.2582 (0.2631) data time 0.0006 (0.0018) model time 0.2576 (0.2612) loss 6.2012 (5.7569) grad_norm 2.1173 (2.5509) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:32:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][610/625] eta 0:00:03 lr 0.000279 wd 0.0500 time 0.2516 (0.2630) data time 0.0004 (0.0018) model time 0.2512 (0.2611) loss 5.6997 (5.7584) grad_norm 2.8267 (2.5580) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:32:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][620/625] eta 0:00:01 lr 0.000279 wd 0.0500 time 0.2547 (0.2628) data time 0.0005 (0.0017) model time 0.2541 (0.2610) loss 5.9838 (5.7577) grad_norm 1.8376 (2.5691) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 08:32:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 233 training takes 0:02:44 [2024-08-04 08:32:54 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 08:32:55 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 08:32:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.522 (0.522) Loss 0.6299 (0.6299) Acc@1 89.453 (89.453) Acc@5 98.730 (98.730) Mem 9655MB
[2024-08-04 08:32:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 0.9268 (0.7496) Acc@1 81.494 (86.408) Acc@5 96.582 (97.718) Mem 9655MB
[2024-08-04 08:32:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0342 (0.8664) Acc@1 77.441 (83.159) Acc@5 94.971 (96.494) Mem 9655MB
[2024-08-04 08:32:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.867 Acc@5 96.505
[2024-08-04 08:32:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.9%
[2024-08-04 08:32:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.767 (0.767) Loss 0.5820 (0.5820) Acc@1 89.795 (89.795) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 08:32:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.128) Loss 0.9150 (0.7113) Acc@1 80.420 (86.515) Acc@5 96.191 (97.727) Mem 9655MB
[2024-08-04 08:32:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 1.0244 (0.8341) Acc@1 77.246 (83.271) Acc@5 95.361 (96.484) Mem 9655MB
[2024-08-04 08:32:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.979 Acc@5 96.493
[2024-08-04 08:32:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.0%
[2024-08-04 08:32:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.98%
[2024-08-04 08:32:59 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 08:32:59 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 08:33:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][0/625] eta 0:07:17 lr 0.000279 wd 0.0500 time 0.7001 (0.7001) data time 0.4491 (0.4491) model time 0.0000 (0.0000) loss 4.8160 (4.8160) grad_norm 2.4716 (2.4716) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:33:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][10/625] eta 0:03:02 lr 0.000279 wd 0.0500 time 0.2600 (0.2967) data time 0.0009 (0.0417) model time 0.0000 (0.0000) loss 6.2909 (5.6253) grad_norm 2.5277 (2.2420) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:33:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][20/625] eta 0:02:48 lr 0.000279 wd 0.0500 time 0.2631 (0.2779) data time 0.0009 (0.0223) model time 0.0000 (0.0000) loss 5.1338 (5.5433) grad_norm 3.2019 (2.4644) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:33:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][30/625] eta 0:02:41 lr 0.000279 wd 0.0500 time 0.2503 (0.2709) data time 0.0010 (0.0154) model time 0.0000 (0.0000) loss 5.4483 (5.6002) grad_norm 3.8444 (2.7840) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:33:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][40/625] eta 0:02:36 lr 0.000279 wd 0.0500 time 0.2633 (0.2676) data time 0.0014 (0.0119) model time 0.0000 (0.0000) loss 5.9040 (5.6828) grad_norm 3.5782 (3.0375) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:33:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][50/625] eta 0:02:32 lr 0.000279 wd 0.0500 time 0.2555 (0.2654) data time 0.0008 (0.0098) model time 0.0000 (0.0000) loss 4.9166 (5.6527) grad_norm 4.5041 (2.9597) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:33:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][60/625] eta 0:02:30 lr 0.000279 wd 0.0500 time 0.3966 (0.2662) data time 0.0009 (0.0083) model time 0.3958 (0.2690) loss 5.0941 (5.6594) grad_norm 2.2276 (2.8771) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:33:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][70/625] eta 0:02:28 lr 0.000278 wd 0.0500 time 0.2589 (0.2677) data time 0.0005 (0.0073) model time 0.2584 (0.2725) loss 5.3220 (5.6551) grad_norm 1.5456 (2.7693) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:33:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][80/625] eta 0:02:25 lr 0.000278 wd 0.0500 time 0.2626 (0.2662) data time 0.0005 (0.0065) model time 0.2621 (0.2665) loss 6.6281 (5.6971) grad_norm 1.9239 (2.7153) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:33:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][90/625] eta 0:02:21 lr 0.000278 wd 0.0500 time 0.2561 (0.2650) data time 0.0010 (0.0058) model time 0.2551 (0.2635) loss 6.6247 (5.6932) grad_norm 3.0701 (2.7541) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:33:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][100/625] eta 0:02:19 lr 0.000278 wd 0.0500 time 0.2607 (0.2660) data time 0.0006 (0.0054) model time 0.2601 (0.2657) loss 5.3307 (5.6880) grad_norm 1.9998 (2.7492) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:33:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][110/625] eta 0:02:18 lr 0.000278 wd 0.0500 time 0.2570 (0.2683) data time 0.0008 (0.0050) model time 0.2562 (0.2699) loss 5.4893 (5.7223) grad_norm 2.4105 (2.7456) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:33:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][120/625] eta 0:02:15 lr 0.000278 wd 0.0500 time 0.2581 (0.2674) data time 0.0008 (0.0046) model time 0.2574 (0.2680) loss 5.4653 (5.7039) grad_norm 1.8008 (2.7116) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:33:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][130/625] eta 0:02:11 lr 0.000278 wd 0.0500 time 0.2529 (0.2665) data time 0.0010 (0.0043) model time 0.2519 (0.2663) loss 5.4994 (5.7073) grad_norm 2.1373 (2.6898) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:33:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][140/625] eta 0:02:09 lr 0.000278 wd 0.0500 time 0.2566 (0.2670) data time 0.0009 (0.0041) model time 0.2557 (0.2670) loss 6.1044 (5.6946) grad_norm 1.9295 (2.6640) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:33:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][150/625] eta 0:02:07 lr 0.000277 wd 0.0500 time 0.2676 (0.2676) data time 0.0010 (0.0039) model time 0.2666 (0.2678) loss 6.2702 (5.6955) grad_norm 1.5273 (2.6379) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:33:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][160/625] eta 0:02:04 lr 0.000277 wd 0.0500 time 0.2550 (0.2681) data time 0.0011 (0.0037) model time 0.2540 (0.2684) loss 6.5307 (5.6900) grad_norm 2.9184 (2.6421) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:33:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][170/625] eta 0:02:01 lr 0.000277 wd 0.0500 time 0.2546 (0.2674) data time 0.0009 (0.0035) model time 0.2537 (0.2673) loss 6.3472 (5.6973) grad_norm 2.1722 (2.6858) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:33:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][180/625] eta 0:01:59 lr 0.000277 wd 0.0500 time 0.2556 (0.2679) data time 0.0007 (0.0034) model time 0.2549 (0.2679) loss 6.6742 (5.6950) grad_norm 5.0714 (2.6791) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:33:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][190/625] eta 0:01:56 lr 0.000277 wd 0.0500 time 0.2521 (0.2672) data time 0.0009 (0.0033) model time 0.2512 (0.2670) loss 5.5261 (5.7044) grad_norm 3.0413 (2.6932) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:33:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][200/625] eta 0:01:53 lr 0.000277 wd 0.0500 time 0.2535 (0.2667) data time 0.0007 (0.0032) model time 0.2528 (0.2662) loss 6.6392 (5.7231) grad_norm 1.6970 (2.6887) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:33:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][210/625] eta 0:01:50 lr 0.000277 wd 0.0500 time 0.2571 (0.2662) data time 0.0008 (0.0031) model time 0.2563 (0.2656) loss 5.6068 (5.7339) grad_norm 3.1206 (2.6955) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:33:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][220/625] eta 0:01:47 lr 0.000277 wd 0.0500 time 0.2575 (0.2658) data time 0.0010 (0.0030) model time 0.2565 (0.2649) loss 5.5474 (5.7235) grad_norm 3.6149 (2.6913) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:34:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][230/625] eta 0:01:45 lr 0.000277 wd 0.0500 time 0.4385 (0.2661) data time 0.0008 (0.0029) model time 0.4377 (0.2653) loss 5.0261 (5.7222) grad_norm 1.6012 (2.7110) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:34:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][240/625] eta 0:01:42 lr 0.000276 wd 0.0500 time 0.2566 (0.2657) data time 0.0008 (0.0028) model time 0.2558 (0.2648) loss 5.6349 (5.7273) grad_norm 1.5751 (2.6908) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:34:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][250/625] eta 0:01:39 lr 0.000276 wd 0.0500 time 0.2576 (0.2652) data time 0.0008 (0.0027) model time 0.2568 (0.2642) loss 6.2871 (5.7226) grad_norm 1.9319 (2.6844) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:34:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][260/625] eta 0:01:36 lr 0.000276 wd 0.0500 time 0.2658 (0.2657) data time 0.0008 (0.0027) model time 0.2650 (0.2648) loss 4.3561 (5.7202) grad_norm 2.6091 (2.6697) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:34:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][270/625] eta 0:01:34 lr 0.000276 wd 0.0500 time 0.2558 (0.2653) data time 0.0009 (0.0026) model time 0.2549 (0.2644) loss 5.2957 (5.7244) grad_norm 1.7227 (2.6475) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:34:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][280/625] eta 0:01:31 lr 0.000276 wd 0.0500 time 0.2559 (0.2655) data time 0.0008 (0.0025) model time 0.2551 (0.2646) loss 4.8722 (5.7185) grad_norm 2.0541 (2.6532) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:34:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][290/625] eta 0:01:28 lr 0.000276 wd 0.0500 time 0.2589 (0.2656) data time 0.0006 (0.0025) model time 0.2583 (0.2647) loss 6.2883 (5.7266) grad_norm 1.5944 (2.6454) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:34:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][300/625] eta 0:01:26 lr 0.000276 wd 0.0500 time 0.2561 (0.2654) data time 0.0007 (0.0024) model time 0.2554 (0.2644) loss 4.8977 (5.7271) grad_norm 2.5058 (2.6271) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:34:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][310/625] eta 0:01:23 lr 0.000276 wd 0.0500 time 0.2517 (0.2651) data time 0.0008 (0.0024) model time 0.2508 (0.2640) loss 5.3208 (5.7325) grad_norm 2.6450 (2.6382) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:34:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][320/625] eta 0:01:20 lr 0.000275 wd 0.0500 time 0.2558 (0.2648) data time 0.0009 (0.0023) model time 0.2549 (0.2637) loss 6.1619 (5.7379) grad_norm 3.6080 (2.6729) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:34:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][330/625] eta 0:01:18 lr 0.000275 wd 0.0500 time 0.2552 (0.2652) data time 0.0010 (0.0023) model time 0.2543 (0.2642) loss 5.8509 (5.7482) grad_norm 1.9013 (2.6993) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:34:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][340/625] eta 0:01:15 lr 0.000275 wd 0.0500 time 0.2503 (0.2649) data time 0.0008 (0.0023) model time 0.2495 (0.2639) loss 6.5909 (5.7473) grad_norm 2.6231 (2.7118) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:34:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][350/625] eta 0:01:12 lr 0.000275 wd 0.0500 time 0.2498 (0.2647) data time 0.0009 (0.0022) model time 0.2489 (0.2636) loss 5.6337 (5.7440) grad_norm 2.2865 (2.7051) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:34:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][360/625] eta 0:01:10 lr 0.000275 wd 0.0500 time 0.2540 (0.2648) data time 0.0009 (0.0022) model time 0.2531 (0.2638) loss 6.2233 (5.7377) grad_norm 2.2152 (2.6958) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:34:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][370/625] eta 0:01:07 lr 0.000275 wd 0.0500 time 0.2582 (0.2646) data time 0.0006 (0.0022) model time 0.2576 (0.2635) loss 4.6342 (5.7380) grad_norm 2.2854 (2.6918) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:34:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][380/625] eta 0:01:04 lr 0.000275 wd 0.0500 time 0.2555 (0.2643) data time 0.0008 (0.0021) model time 0.2547 (0.2632) loss 6.1371 (5.7385) grad_norm 1.5809 (2.6765) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:34:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][390/625] eta 0:01:02 lr 0.000275 wd 0.0500 time 0.2557 (0.2641) data time 0.0008 (0.0021) model time 0.2548 (0.2630) loss 5.0935 (5.7401) grad_norm 2.2705 (2.6609) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:34:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][400/625] eta 0:00:59 lr 0.000274 wd 0.0500 time 0.2543 (0.2647) data time 0.0007 (0.0021) model time 0.2536 (0.2637) loss 6.1438 (5.7428) grad_norm 1.9846 (2.6473) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:34:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][410/625] eta 0:00:56 lr 0.000274 wd 0.0500 time 0.2542 (0.2649) data time 0.0007 (0.0020) model time 0.2535 (0.2639) loss 6.6495 (5.7478) grad_norm 2.2804 (2.6324) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:34:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][420/625] eta 0:00:54 lr 0.000274 wd 0.0500 time 0.2557 (0.2647) data time 0.0008 (0.0020) model time 0.2549 (0.2637) loss 5.8525 (5.7433) grad_norm 2.4393 (2.6215) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:34:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][430/625] eta 0:00:51 lr 0.000274 wd 0.0500 time 0.2569 (0.2650) data time 0.0006 (0.0020) model time 0.2563 (0.2640) loss 5.4462 (5.7507) grad_norm 3.9875 (2.6352) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:34:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][440/625] eta 0:00:48 lr 0.000274 wd 0.0500 time 0.2566 (0.2648) data time 0.0006 (0.0020) model time 0.2560 (0.2638) loss 4.9825 (5.7484) grad_norm 6.4094 (2.6526) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:34:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][450/625] eta 0:00:46 lr 0.000274 wd 0.0500 time 0.2593 (0.2646) data time 0.0007 (0.0019) model time 0.2586 (0.2636) loss 5.7992 (5.7477) grad_norm 2.5282 (2.6521) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:35:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][460/625] eta 0:00:43 lr 0.000274 wd 0.0500 time 0.2545 (0.2650) data time 0.0008 (0.0019) model time 0.2537 (0.2640) loss 4.4616 (5.7480) grad_norm 2.5056 (2.6469) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:35:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][470/625] eta 0:00:41 lr 0.000274 wd 0.0500 time 0.2581 (0.2648) data time 0.0006 (0.0019) model time 0.2575 (0.2638) loss 5.1870 (5.7480) grad_norm 3.6836 (2.7040) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:35:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][480/625] eta 0:00:38 lr 0.000274 wd 0.0500 time 0.2583 (0.2647) data time 0.0006 (0.0019) model time 0.2577 (0.2636) loss 6.4694 (5.7495) grad_norm 1.6574 (2.7016) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:35:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][490/625] eta 0:00:35 lr 0.000273 wd 0.0500 time 0.2551 (0.2645) data time 0.0007 (0.0018) model time 0.2544 (0.2635) loss 5.1367 (5.7434) grad_norm 3.5080 (2.7023) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:35:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][500/625] eta 0:00:33 lr 0.000273 wd 0.0500 time 0.2551 (0.2643) data time 0.0008 (0.0018) model time 0.2543 (0.2633) loss 5.7205 (5.7464) grad_norm 2.3022 (2.6997) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:35:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][510/625] eta 0:00:30 lr 0.000273 wd 0.0500 time 0.2537 (0.2646) data time 0.0010 (0.0018) model time 0.2527 (0.2636) loss 6.8010 (5.7510) grad_norm 1.5879 (2.6996) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:35:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][520/625] eta 0:00:27 lr 0.000273 wd 0.0500 time 0.2544 (0.2644) data time 0.0012 (0.0018) model time 0.2532 (0.2634) loss 5.6794 (5.7458) grad_norm 2.1889 (2.8059) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:35:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][530/625] eta 0:00:25 lr 0.000273 wd 0.0500 time 0.2554 (0.2643) data time 0.0008 (0.0018) model time 0.2546 (0.2633) loss 5.3480 (5.7409) grad_norm 1.8228 (2.8014) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:35:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][540/625] eta 0:00:22 lr 0.000273 wd 0.0500 time 0.2565 (0.2645) data time 0.0006 (0.0018) model time 0.2558 (0.2635) loss 5.3942 (5.7409) grad_norm 1.7687 (2.7913) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:35:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][550/625] eta 0:00:19 lr 0.000273 wd 0.0500 time 0.2529 (0.2643) data time 0.0010 (0.0017) model time 0.2519 (0.2633) loss 6.7077 (5.7436) grad_norm 1.6656 (2.7831) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:35:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][560/625] eta 0:00:17 lr 0.000273 wd 0.0500 time 0.2560 (0.2642) data time 0.0012 (0.0017) model time 0.2548 (0.2631) loss 5.1435 (5.7470) grad_norm 3.0900 (2.7753) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:35:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][570/625] eta 0:00:14 lr 0.000272 wd 0.0500 time 0.2577 (0.2641) data time 0.0011 (0.0017) model time 0.2566 (0.2630) loss 6.6168 (5.7453) grad_norm 1.4520 (2.7582) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:35:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][580/625] eta 0:00:11 lr 0.000272 wd 0.0500 time 0.2586 (0.2640) data time 0.0009 (0.0017) model time 0.2576 (0.2629) loss 5.0466 (5.7439) grad_norm 1.7429 (2.7462) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:35:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][590/625] eta 0:00:09 lr 0.000272 wd 0.0500 time 0.2532 (0.2638) data time 0.0005 (0.0017) model time 0.2527 (0.2627) loss 5.9106 (5.7429) grad_norm 1.9204 (2.7379) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:35:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][600/625] eta 0:00:06 lr 0.000272 wd 0.0500 time 0.2571 (0.2637) data time 0.0009 (0.0017) model time 0.2562 (0.2626) loss 5.2594 (5.7473) grad_norm 1.7681 (2.7449) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:35:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][610/625] eta 0:00:03 lr 0.000272 wd 0.0500 time 0.2515 (0.2636) data time 0.0006 (0.0017) model time 0.2509 (0.2625) loss 5.3956 (5.7489) grad_norm 2.8364 (2.7463) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:35:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][620/625] eta 0:00:01 lr 0.000272 wd 0.0500 time 0.2527 (0.2634) data time 0.0005 (0.0017) model time 0.2522 (0.2623) loss 5.4561 (5.7481) grad_norm 2.9877 (2.7407) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 08:35:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 234 training takes 0:02:44
[2024-08-04 08:35:44 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 08:35:44 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 08:35:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.487 (0.487) Loss 0.6133 (0.6133) Acc@1 89.111 (89.111) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 08:35:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.095) Loss 0.9395 (0.7353) Acc@1 81.152 (86.373) Acc@5 96.191 (97.736) Mem 9655MB
[2024-08-04 08:35:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0303 (0.8508) Acc@1 77.344 (83.373) Acc@5 95.264 (96.517) Mem 9655MB
[2024-08-04 08:35:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.071 Acc@5 96.499
[2024-08-04 08:35:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.1%
[2024-08-04 08:35:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.07%
[2024-08-04 08:35:46 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-08-04 08:35:47 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-08-04 08:35:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.501 (0.501) Loss 0.5830 (0.5830) Acc@1 89.746 (89.746) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 08:35:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.099) Loss 0.9155 (0.7112) Acc@1 80.469 (86.501) Acc@5 96.045 (97.701) Mem 9655MB [2024-08-04 08:35:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0244 (0.8339) Acc@1 77.246 (83.273) Acc@5 95.312 (96.470) Mem 9655MB [2024-08-04 08:35:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.983 Acc@5 96.481 [2024-08-04 08:35:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.0% [2024-08-04 08:35:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.98% [2024-08-04 08:35:49 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 08:35:49 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 08:35:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][0/625] eta 0:07:49 lr 0.000272 wd 0.0500 time 0.7509 (0.7509) data time 0.5079 (0.5079) model time 0.0000 (0.0000) loss 5.9744 (5.9744) grad_norm 3.5854 (3.5854) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:35:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][10/625] eta 0:03:14 lr 0.000272 wd 0.0500 time 0.4053 (0.3156) data time 0.0006 (0.0470) model time 0.0000 (0.0000) loss 6.1712 (5.9108) grad_norm 3.8107 (3.2943) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:35:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][20/625] eta 0:02:57 lr 0.000272 wd 0.0500 time 0.2580 (0.2928) data time 0.0010 (0.0251) model time 0.0000 (0.0000) loss 4.3803 (5.7536) grad_norm 3.6003 (3.2862) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:35:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][30/625] eta 0:02:50 lr 0.000271 wd 0.0500 time 0.2524 (0.2864) data time 0.0010 (0.0173) model time 0.0000 (0.0000) loss 5.4770 (5.6382) grad_norm 1.8321 (2.9571) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:36:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][40/625] eta 0:02:43 lr 0.000271 wd 0.0500 time 0.2558 (0.2790) data time 0.0010 (0.0133) model time 0.0000 (0.0000) loss 6.0959 (5.6114) grad_norm 2.8965 (2.8495) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:36:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][50/625] eta 0:02:37 lr 0.000271 wd 0.0500 time 0.2535 (0.2745) data time 0.0011 (0.0109) model time 0.0000 (0.0000) loss 5.7172 (5.5744) grad_norm 2.1533 (2.7622) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:36:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][60/625] eta 0:02:33 lr 0.000271 wd 0.0500 time 0.2543 (0.2714) data time 
0.0009 (0.0093) model time 0.2534 (0.2543) loss 5.3832 (5.6181) grad_norm 1.5449 (2.7030) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:36:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][70/625] eta 0:02:29 lr 0.000271 wd 0.0500 time 0.2562 (0.2691) data time 0.0008 (0.0081) model time 0.2554 (0.2543) loss 5.7086 (5.5868) grad_norm 2.0482 (2.6862) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:36:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][80/625] eta 0:02:25 lr 0.000271 wd 0.0500 time 0.2520 (0.2674) data time 0.0007 (0.0072) model time 0.2513 (0.2544) loss 6.3200 (5.6144) grad_norm 2.6374 (2.6847) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:36:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][90/625] eta 0:02:22 lr 0.000271 wd 0.0500 time 0.2529 (0.2662) data time 0.0008 (0.0065) model time 0.2521 (0.2546) loss 6.5738 (5.6570) grad_norm 3.3880 (2.7087) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:36:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][100/625] eta 0:02:20 lr 0.000271 wd 0.0500 time 0.2545 (0.2671) data time 0.0008 (0.0060) model time 0.2536 (0.2585) loss 5.5865 (5.6491) grad_norm 2.3390 (2.6769) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:36:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][110/625] eta 0:02:17 lr 0.000271 wd 0.0500 time 0.2563 (0.2662) data time 0.0008 (0.0055) model time 0.2555 (0.2582) loss 4.7001 (5.6414) grad_norm 2.4417 (2.6611) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:36:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][120/625] eta 0:02:13 lr 0.000270 wd 0.0500 time 0.2558 (0.2653) data time 0.0007 (0.0051) model time 0.2551 (0.2576) loss 4.6085 (5.6447) grad_norm 2.1489 (2.6603) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:36:24 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][130/625] eta 0:02:10 lr 0.000270 wd 0.0500 time 0.2563 (0.2646) data time 0.0008 (0.0048) model time 0.2556 (0.2573) loss 4.8111 (5.6478) grad_norm 3.0614 (2.6430) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:36:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][140/625] eta 0:02:08 lr 0.000270 wd 0.0500 time 0.2555 (0.2639) data time 0.0009 (0.0045) model time 0.2546 (0.2570) loss 5.9562 (5.6577) grad_norm 2.3341 (2.6480) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:36:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][150/625] eta 0:02:05 lr 0.000270 wd 0.0500 time 0.2563 (0.2635) data time 0.0009 (0.0043) model time 0.2554 (0.2570) loss 5.5703 (5.6439) grad_norm 1.5957 (2.6160) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:36:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][160/625] eta 0:02:03 lr 0.000270 wd 0.0500 time 0.2581 (0.2653) data time 0.0006 (0.0041) model time 0.2575 (0.2601) loss 6.1691 (5.6585) grad_norm 1.6720 (2.5721) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:36:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][170/625] eta 0:02:00 lr 0.000270 wd 0.0500 time 0.2603 (0.2649) data time 0.0010 (0.0039) model time 0.2593 (0.2598) loss 6.3197 (5.6479) grad_norm 2.2926 (2.5460) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:36:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][180/625] eta 0:01:57 lr 0.000270 wd 0.0500 time 0.2543 (0.2645) data time 0.0008 (0.0038) model time 0.2534 (0.2596) loss 6.7658 (5.6504) grad_norm 1.8845 (2.5545) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:36:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][190/625] eta 0:01:54 lr 0.000270 wd 0.0500 time 0.2570 (0.2640) data time 0.0007 (0.0036) 
model time 0.2564 (0.2592) loss 6.2743 (5.6547) grad_norm 1.6007 (2.5550) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:36:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][200/625] eta 0:01:51 lr 0.000269 wd 0.0500 time 0.2529 (0.2635) data time 0.0009 (0.0035) model time 0.2520 (0.2588) loss 5.1330 (5.6587) grad_norm 2.8201 (2.5481) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:36:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][210/625] eta 0:01:49 lr 0.000269 wd 0.0500 time 0.2554 (0.2638) data time 0.0008 (0.0034) model time 0.2546 (0.2594) loss 6.9447 (5.6695) grad_norm 1.6089 (2.5631) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:36:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][220/625] eta 0:01:46 lr 0.000269 wd 0.0500 time 0.2545 (0.2634) data time 0.0011 (0.0032) model time 0.2533 (0.2591) loss 6.3873 (5.6682) grad_norm 1.6481 (2.5403) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:36:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][230/625] eta 0:01:44 lr 0.000269 wd 0.0500 time 0.2550 (0.2645) data time 0.0007 (0.0031) model time 0.2544 (0.2607) loss 5.2058 (5.6706) grad_norm 4.0954 (2.5297) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:36:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][240/625] eta 0:01:41 lr 0.000269 wd 0.0500 time 0.2569 (0.2649) data time 0.0010 (0.0031) model time 0.2559 (0.2613) loss 6.5182 (5.6715) grad_norm 1.7338 (2.5222) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:36:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][250/625] eta 0:01:39 lr 0.000269 wd 0.0500 time 0.2558 (0.2646) data time 0.0006 (0.0030) model time 0.2551 (0.2611) loss 5.9478 (5.6736) grad_norm 2.6056 (2.5203) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:36:58 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [235/300][260/625] eta 0:01:36 lr 0.000269 wd 0.0500 time 0.4521 (0.2651) data time 0.0007 (0.0029) model time 0.4514 (0.2618) loss 6.3356 (5.6779) grad_norm 2.9987 (2.5289) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:37:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][270/625] eta 0:01:34 lr 0.000269 wd 0.0500 time 0.2527 (0.2655) data time 0.0009 (0.0028) model time 0.2518 (0.2624) loss 5.9090 (5.6780) grad_norm 2.8646 (2.5459) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:37:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][280/625] eta 0:01:31 lr 0.000269 wd 0.0500 time 0.2568 (0.2652) data time 0.0009 (0.0028) model time 0.2559 (0.2622) loss 5.5753 (5.6747) grad_norm 3.0947 (2.5402) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:37:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][290/625] eta 0:01:28 lr 0.000268 wd 0.0500 time 0.2533 (0.2650) data time 0.0007 (0.0027) model time 0.2527 (0.2620) loss 5.1205 (5.6813) grad_norm 2.3177 (2.5255) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:37:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][300/625] eta 0:01:26 lr 0.000268 wd 0.0500 time 0.2545 (0.2652) data time 0.0010 (0.0026) model time 0.2535 (0.2623) loss 6.4022 (5.6861) grad_norm 2.5269 (2.5245) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:37:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][310/625] eta 0:01:23 lr 0.000268 wd 0.0500 time 0.2612 (0.2649) data time 0.0009 (0.0026) model time 0.2602 (0.2620) loss 5.1575 (5.6795) grad_norm 2.9219 (2.5356) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:37:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][320/625] eta 0:01:20 lr 0.000268 wd 0.0500 time 0.2532 (0.2646) data time 0.0007 (0.0025) model time 0.2525 (0.2617) 
loss 6.1924 (5.6827) grad_norm 2.0352 (2.5473) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:37:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][330/625] eta 0:01:17 lr 0.000268 wd 0.0500 time 0.2547 (0.2643) data time 0.0007 (0.0025) model time 0.2539 (0.2615) loss 5.1855 (5.6896) grad_norm 2.1530 (2.5527) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:37:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][340/625] eta 0:01:15 lr 0.000268 wd 0.0500 time 0.2586 (0.2640) data time 0.0007 (0.0024) model time 0.2580 (0.2613) loss 5.8491 (5.6790) grad_norm 2.0622 (2.5510) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:37:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][350/625] eta 0:01:12 lr 0.000268 wd 0.0500 time 0.2580 (0.2638) data time 0.0010 (0.0024) model time 0.2570 (0.2611) loss 4.4725 (5.6772) grad_norm 2.6524 (2.5682) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:37:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][360/625] eta 0:01:10 lr 0.000268 wd 0.0500 time 0.2559 (0.2642) data time 0.0008 (0.0023) model time 0.2551 (0.2615) loss 6.4861 (5.6816) grad_norm 2.8219 (2.5935) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:37:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][370/625] eta 0:01:07 lr 0.000267 wd 0.0500 time 0.2576 (0.2645) data time 0.0008 (0.0023) model time 0.2568 (0.2619) loss 5.3869 (5.6853) grad_norm 1.6380 (2.5895) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:37:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][380/625] eta 0:01:04 lr 0.000267 wd 0.0500 time 0.3744 (0.2646) data time 0.0008 (0.0023) model time 0.3736 (0.2621) loss 5.7910 (5.6887) grad_norm 2.0595 (2.5736) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:37:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [235/300][390/625] eta 0:01:02 lr 0.000267 wd 0.0500 time 0.2501 (0.2644) data time 0.0007 (0.0022) model time 0.2494 (0.2619) loss 4.7545 (5.6876) grad_norm 3.2331 (2.5721) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:37:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][400/625] eta 0:00:59 lr 0.000267 wd 0.0500 time 0.2601 (0.2646) data time 0.0009 (0.0022) model time 0.2592 (0.2622) loss 4.9462 (5.6899) grad_norm 2.3435 (2.5786) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:37:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][410/625] eta 0:00:56 lr 0.000267 wd 0.0500 time 0.2600 (0.2647) data time 0.0009 (0.0022) model time 0.2591 (0.2624) loss 5.2590 (5.6854) grad_norm 36.4385 (2.6585) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:37:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][420/625] eta 0:00:54 lr 0.000267 wd 0.0500 time 0.2574 (0.2649) data time 0.0007 (0.0021) model time 0.2567 (0.2626) loss 6.0360 (5.6750) grad_norm 1.8900 (2.6522) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:37:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][430/625] eta 0:00:51 lr 0.000267 wd 0.0500 time 0.2530 (0.2647) data time 0.0009 (0.0021) model time 0.2522 (0.2624) loss 6.2549 (5.6775) grad_norm 1.4752 (2.6358) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:37:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][440/625] eta 0:00:48 lr 0.000267 wd 0.0500 time 0.2569 (0.2645) data time 0.0007 (0.0021) model time 0.2562 (0.2623) loss 5.1299 (5.6731) grad_norm 3.7362 (2.6374) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:37:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][450/625] eta 0:00:46 lr 0.000267 wd 0.0500 time 0.2555 (0.2647) data time 0.0009 (0.0021) model time 0.2546 (0.2625) loss 6.2354 (5.6708) 
grad_norm 2.3847 (2.6558) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:37:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][460/625] eta 0:00:43 lr 0.000266 wd 0.0500 time 0.2553 (0.2645) data time 0.0008 (0.0020) model time 0.2545 (0.2623) loss 5.7490 (5.6735) grad_norm 1.6971 (2.6466) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:37:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][470/625] eta 0:00:40 lr 0.000266 wd 0.0500 time 0.2532 (0.2643) data time 0.0011 (0.0020) model time 0.2521 (0.2621) loss 5.7756 (5.6780) grad_norm 2.8352 (2.6327) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:37:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][480/625] eta 0:00:38 lr 0.000266 wd 0.0500 time 0.2590 (0.2641) data time 0.0008 (0.0020) model time 0.2582 (0.2620) loss 5.0914 (5.6731) grad_norm 1.7859 (2.6228) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:37:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][490/625] eta 0:00:35 lr 0.000266 wd 0.0500 time 0.2573 (0.2640) data time 0.0007 (0.0020) model time 0.2566 (0.2618) loss 5.9447 (5.6809) grad_norm 1.9694 (2.6167) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:38:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][500/625] eta 0:00:33 lr 0.000266 wd 0.0500 time 0.2573 (0.2646) data time 0.0008 (0.0020) model time 0.2565 (0.2626) loss 5.5837 (5.6756) grad_norm 3.0592 (2.6111) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:38:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][510/625] eta 0:00:30 lr 0.000266 wd 0.0500 time 0.2554 (0.2649) data time 0.0009 (0.0019) model time 0.2545 (0.2629) loss 6.7020 (5.6857) grad_norm 2.5044 (2.6467) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:38:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[235/300][520/625] eta 0:00:27 lr 0.000266 wd 0.0500 time 0.2579 (0.2647) data time 0.0010 (0.0019) model time 0.2569 (0.2627) loss 4.9814 (5.6882) grad_norm 3.0095 (2.6814) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:38:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][530/625] eta 0:00:25 lr 0.000266 wd 0.0500 time 0.2593 (0.2646) data time 0.0008 (0.0019) model time 0.2585 (0.2626) loss 6.1218 (5.6913) grad_norm 2.0705 (2.6748) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:38:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][540/625] eta 0:00:22 lr 0.000265 wd 0.0500 time 0.2590 (0.2645) data time 0.0007 (0.0019) model time 0.2582 (0.2625) loss 4.4403 (5.6884) grad_norm 2.2569 (2.6625) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:38:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][550/625] eta 0:00:19 lr 0.000265 wd 0.0500 time 0.2505 (0.2647) data time 0.0010 (0.0019) model time 0.2495 (0.2627) loss 4.9786 (5.6869) grad_norm 3.9281 (2.6666) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:38:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][560/625] eta 0:00:17 lr 0.000265 wd 0.0500 time 0.2633 (0.2646) data time 0.0008 (0.0019) model time 0.2625 (0.2626) loss 4.2510 (5.6820) grad_norm 2.4797 (2.6744) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:38:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][570/625] eta 0:00:14 lr 0.000265 wd 0.0500 time 0.2528 (0.2648) data time 0.0007 (0.0018) model time 0.2520 (0.2629) loss 6.8498 (5.6832) grad_norm 4.3125 (2.6725) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:38:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][580/625] eta 0:00:11 lr 0.000265 wd 0.0500 time 0.2569 (0.2647) data time 0.0008 (0.0018) model time 0.2561 (0.2628) loss 6.2393 (5.6771) grad_norm 1.9384 
(2.6658) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:38:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][590/625] eta 0:00:09 lr 0.000265 wd 0.0500 time 0.2581 (0.2646) data time 0.0006 (0.0018) model time 0.2575 (0.2627) loss 5.5649 (5.6787) grad_norm 3.4765 (2.6937) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:38:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][600/625] eta 0:00:06 lr 0.000265 wd 0.0500 time 0.2555 (0.2644) data time 0.0007 (0.0018) model time 0.2548 (0.2625) loss 5.8422 (5.6797) grad_norm 2.3909 (2.6871) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:38:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][610/625] eta 0:00:03 lr 0.000265 wd 0.0500 time 0.2527 (0.2643) data time 0.0006 (0.0018) model time 0.2521 (0.2624) loss 6.1433 (5.6798) grad_norm 2.1063 (2.6849) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:38:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][620/625] eta 0:00:01 lr 0.000265 wd 0.0500 time 0.2548 (0.2641) data time 0.0003 (0.0018) model time 0.2545 (0.2622) loss 5.6545 (5.6791) grad_norm 2.2533 (2.6858) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:38:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 235 training takes 0:02:45 [2024-08-04 08:38:34 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 08:38:35 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 08:38:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.533 (0.533) Loss 0.6157 (0.6157) Acc@1 90.039 (90.039) Acc@5 98.438 (98.438) Mem 9655MB [2024-08-04 08:38:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.101) Loss 0.9502 (0.7375) Acc@1 80.371 (86.461) Acc@5 95.996 (97.665) Mem 9655MB [2024-08-04 08:38:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.0264 (0.8587) Acc@1 78.711 (83.403) Acc@5 95.215 (96.482) Mem 9655MB [2024-08-04 08:38:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.073 Acc@5 96.457 [2024-08-04 08:38:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.1% [2024-08-04 08:38:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.07% [2024-08-04 08:38:36 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 08:38:37 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 08:38:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.552 (0.552) Loss 0.5825 (0.5825) Acc@1 89.844 (89.844) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 08:38:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.101) Loss 0.9150 (0.7112) Acc@1 80.420 (86.537) Acc@5 96.045 (97.714) Mem 9655MB [2024-08-04 08:38:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.0234 (0.8337) Acc@1 77.148 (83.310) Acc@5 95.312 (96.484) Mem 9655MB [2024-08-04 08:38:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.019 Acc@5 96.491 [2024-08-04 08:38:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.0% [2024-08-04 08:38:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.02% [2024-08-04 08:38:39 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 08:38:39 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 08:38:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][0/625] eta 0:07:52 lr 0.000264 wd 0.0500 time 0.7561 (0.7561) data time 0.5143 (0.5143) model time 0.0000 (0.0000) loss 5.5433 (5.5433) grad_norm 4.7299 (4.7299) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:38:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][10/625] eta 0:03:05 lr 0.000264 wd 0.0500 time 0.2610 (0.3011) data time 0.0005 (0.0476) model time 0.0000 (0.0000) loss 6.6012 (5.5925) grad_norm 4.3658 (3.1061) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:38:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][20/625] eta 0:02:48 lr 0.000264 wd 0.0500 time 0.2520 (0.2793) data time 0.0009 (0.0254) model time 0.0000 (0.0000) loss 6.5398 (5.6119) grad_norm 2.7401 (3.3560) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:38:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][30/625] eta 0:02:41 lr 0.000264 wd 0.0500 time 0.2519 (0.2717) data time 0.0007 (0.0175) model time 0.0000 (0.0000) loss 5.6436 (5.6790) grad_norm 1.6268 (3.2656) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:38:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][40/625] eta 0:02:36 lr 0.000264 wd 0.0500 time 0.2581 (0.2680) data time 0.0008 (0.0134) model time 0.0000 (0.0000) loss 5.7937 (5.6643) grad_norm 9.2077 (3.3125) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:38:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][50/625] eta 0:02:32 lr 0.000264 wd 0.0500 time 0.2608 (0.2658) data time 0.0008 (0.0110) model time 0.0000 (0.0000) loss 6.8140 (5.6701) grad_norm 1.8560 (3.1364) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:38:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][60/625] eta 0:02:29 lr 0.000264 wd 0.0500 time 0.2539 (0.2642) data time 
0.0006 (0.0093) model time 0.2533 (0.2555) loss 5.5003 (5.6384) grad_norm 2.0291 (3.0300) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:38:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][70/625] eta 0:02:25 lr 0.000264 wd 0.0500 time 0.2550 (0.2630) data time 0.0009 (0.0081) model time 0.2541 (0.2553) loss 5.6150 (5.6537) grad_norm 1.8560 (2.8895) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:39:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][80/625] eta 0:02:22 lr 0.000264 wd 0.0500 time 0.2579 (0.2622) data time 0.0009 (0.0073) model time 0.2570 (0.2551) loss 6.0115 (5.6407) grad_norm 1.9620 (2.7612) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:39:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][90/625] eta 0:02:19 lr 0.000263 wd 0.0500 time 0.2558 (0.2615) data time 0.0009 (0.0066) model time 0.2549 (0.2552) loss 5.6209 (5.6149) grad_norm 1.8670 (2.6710) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:39:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][100/625] eta 0:02:18 lr 0.000263 wd 0.0500 time 0.2561 (0.2644) data time 0.0006 (0.0060) model time 0.2555 (0.2620) loss 6.4478 (5.6140) grad_norm 2.5695 (2.6606) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:39:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][110/625] eta 0:02:16 lr 0.000263 wd 0.0500 time 0.2560 (0.2655) data time 0.0008 (0.0056) model time 0.2551 (0.2643) loss 4.9111 (5.6145) grad_norm 1.6450 (2.6592) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:39:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][120/625] eta 0:02:14 lr 0.000263 wd 0.0500 time 0.2585 (0.2663) data time 0.0009 (0.0052) model time 0.2576 (0.2657) loss 4.9057 (5.6027) grad_norm 1.4586 (2.6855) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:39:14 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][130/625] eta 0:02:11 lr 0.000263 wd 0.0500 time 0.2544 (0.2655) data time 0.0010 (0.0049) model time 0.2534 (0.2643) loss 5.3650 (5.6009) grad_norm 2.2664 (2.6529) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:39:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][140/625] eta 0:02:08 lr 0.000263 wd 0.0500 time 0.2542 (0.2648) data time 0.0007 (0.0046) model time 0.2535 (0.2633) loss 6.4531 (5.6305) grad_norm 2.5347 (2.6528) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:39:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][150/625] eta 0:02:06 lr 0.000263 wd 0.0500 time 0.2548 (0.2655) data time 0.0020 (0.0043) model time 0.2527 (0.2643) loss 6.5530 (5.6533) grad_norm 1.8323 (2.6673) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:39:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][160/625] eta 0:02:03 lr 0.000263 wd 0.0500 time 0.2535 (0.2648) data time 0.0010 (0.0041) model time 0.2525 (0.2634) loss 4.8423 (5.6668) grad_norm 2.6097 (2.6562) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:39:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][170/625] eta 0:02:00 lr 0.000262 wd 0.0500 time 0.2505 (0.2643) data time 0.0007 (0.0040) model time 0.2499 (0.2627) loss 6.0523 (5.6698) grad_norm 2.0058 (2.6260) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:39:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][180/625] eta 0:01:57 lr 0.000262 wd 0.0500 time 0.2650 (0.2639) data time 0.0006 (0.0038) model time 0.2644 (0.2623) loss 6.1118 (5.6759) grad_norm 3.1983 (2.6058) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:39:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][190/625] eta 0:01:54 lr 0.000262 wd 0.0500 time 0.2569 (0.2635) data time 0.0009 (0.0036) 
model time 0.2560 (0.2617) loss 5.5360 (5.6746) grad_norm 2.0810 (2.5994) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:39:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][200/625] eta 0:01:51 lr 0.000262 wd 0.0500 time 0.2575 (0.2631) data time 0.0008 (0.0035) model time 0.2568 (0.2612) loss 5.9362 (5.6711) grad_norm 1.8826 (2.6113) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:39:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][210/625] eta 0:01:49 lr 0.000262 wd 0.0500 time 0.2543 (0.2636) data time 0.0011 (0.0034) model time 0.2531 (0.2620) loss 6.0142 (5.6929) grad_norm 1.6857 (2.6285) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:39:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][220/625] eta 0:01:46 lr 0.000262 wd 0.0500 time 0.4313 (0.2641) data time 0.0008 (0.0033) model time 0.4304 (0.2626) loss 5.7853 (5.6979) grad_norm 2.0933 (2.6701) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:39:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][230/625] eta 0:01:44 lr 0.000262 wd 0.0500 time 0.2560 (0.2647) data time 0.0008 (0.0032) model time 0.2552 (0.2635) loss 5.3161 (5.6907) grad_norm 2.4928 (2.6460) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:39:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][240/625] eta 0:01:42 lr 0.000262 wd 0.0500 time 0.2575 (0.2653) data time 0.0007 (0.0031) model time 0.2567 (0.2642) loss 6.3284 (5.6934) grad_norm 2.6246 (2.6282) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:39:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][250/625] eta 0:01:39 lr 0.000262 wd 0.0500 time 0.2529 (0.2649) data time 0.0011 (0.0030) model time 0.2518 (0.2637) loss 5.0637 (5.6778) grad_norm 2.3818 (2.6149) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:39:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][260/625] eta 0:01:36 lr 0.000261 wd 0.0500 time 0.2541 (0.2646) data time 0.0008 (0.0029) model time 0.2533 (0.2633) loss 5.9374 (5.6788) grad_norm 2.8323 (2.6023) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:39:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][270/625] eta 0:01:33 lr 0.000261 wd 0.0500 time 0.2541 (0.2644) data time 0.0010 (0.0028) model time 0.2531 (0.2631) loss 5.2760 (5.6841) grad_norm 1.6470 (2.5939) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:39:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][280/625] eta 0:01:31 lr 0.000261 wd 0.0500 time 0.2549 (0.2641) data time 0.0007 (0.0028) model time 0.2541 (0.2628) loss 5.3064 (5.6845) grad_norm 3.3356 (2.5939) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:39:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][290/625] eta 0:01:28 lr 0.000261 wd 0.0500 time 0.2583 (0.2643) data time 0.0011 (0.0027) model time 0.2572 (0.2631) loss 7.2621 (5.6875) grad_norm 2.5406 (2.6144) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:39:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][300/625] eta 0:01:25 lr 0.000261 wd 0.0500 time 0.2535 (0.2645) data time 0.0008 (0.0026) model time 0.2527 (0.2633) loss 6.3817 (5.6856) grad_norm 1.6701 (2.6410) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:40:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][310/625] eta 0:01:23 lr 0.000261 wd 0.0500 time 0.2598 (0.2642) data time 0.0006 (0.0026) model time 0.2592 (0.2630) loss 5.9352 (5.6963) grad_norm 1.6462 (2.6308) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:40:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][320/625] eta 0:01:20 lr 0.000261 wd 0.0500 time 0.2540 (0.2640) data time 0.0007 (0.0025) model time 0.2533 (0.2627) loss 4.4230 (5.6942) grad_norm 1.9420 (2.6253) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:40:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][330/625] eta 0:01:17 lr 0.000261 wd 0.0500 time 0.4611 (0.2644) data time 0.0009 (0.0025) model time 0.4602 (0.2632) loss 5.8592 (5.6864) grad_norm 7.5392 (2.6325) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:40:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][340/625] eta 0:01:15 lr 0.000260 wd 0.0500 time 0.2505 (0.2646) data time 0.0008 (0.0025) model time 0.2497 (0.2635) loss 6.3260 (5.6923) grad_norm 1.7574 (2.6319) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:40:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][350/625] eta 0:01:12 lr 0.000260 wd 0.0500 time 0.2577 (0.2644) data time 0.0008 (0.0024) model time 0.2569 (0.2632) loss 5.7948 (5.6907) grad_norm 1.9265 (2.6180) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:40:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][360/625] eta 0:01:09 lr 0.000260 wd 0.0500 time 0.2589 (0.2641) data time 0.0006 (0.0024) model time 0.2583 (0.2629) loss 4.5481 (5.6847) grad_norm 2.3072 (2.6342) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:40:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][370/625] eta 0:01:07 lr 0.000260 wd 0.0500 time 0.2575 (0.2639) data time 0.0009 (0.0023) model time 0.2566 (0.2627) loss 6.0568 (5.6898) grad_norm 1.9684 (2.6448) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:40:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][380/625] eta 0:01:04 lr 0.000260 wd 0.0500 time 0.2592 (0.2637) data time 0.0006 (0.0023) model time 0.2586 (0.2624) loss 4.7262 (5.6897) grad_norm 3.2944 (2.6571) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:40:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][390/625] eta 0:01:01 lr 0.000260 wd 0.0500 time 0.2513 (0.2635) data time 0.0008 (0.0023) model time 0.2505 (0.2622) loss 6.1765 (5.6905) grad_norm 3.0523 (2.6811) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:40:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][400/625] eta 0:00:59 lr 0.000260 wd 0.0500 time 0.2602 (0.2636) data time 0.0006 (0.0022) model time 0.2596 (0.2624) loss 6.1770 (5.6993) grad_norm 1.8603 (2.6786) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:40:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][410/625] eta 0:00:56 lr 0.000260 wd 0.0500 time 0.2646 (0.2635) data time 0.0008 (0.0022) model time 0.2638 (0.2622) loss 5.0457 (5.6917) grad_norm 3.8513 (2.7172) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:40:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][420/625] eta 0:00:53 lr 0.000260 wd 0.0500 time 0.2547 (0.2633) data time 0.0006 (0.0022) model time 0.2541 (0.2620) loss 4.5834 (5.6865) grad_norm 3.7778 (2.7431) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:40:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][430/625] eta 0:00:51 lr 0.000259 wd 0.0500 time 0.2539 (0.2631) data time 0.0011 (0.0021) model time 0.2528 (0.2618) loss 5.2995 (5.6809) grad_norm 2.1563 (2.7310) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:40:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][440/625] eta 0:00:48 lr 0.000259 wd 0.0500 time 0.2575 (0.2630) data time 0.0007 (0.0021) model time 0.2568 (0.2617) loss 5.9109 (5.6832) grad_norm 2.0688 (2.7248) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:40:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][450/625] eta 0:00:45 lr 0.000259 wd 0.0500 time 0.2493 (0.2628) data time 0.0007 (0.0021) model time 0.2486 (0.2615) loss 4.4959 (5.6800) grad_norm 2.5771 (2.7136) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:40:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][460/625] eta 0:00:43 lr 0.000259 wd 0.0500 time 0.2555 (0.2631) data time 0.0006 (0.0021) model time 0.2549 (0.2618) loss 5.9528 (5.6838) grad_norm 2.0418 (2.7077) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:40:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][470/625] eta 0:00:40 lr 0.000259 wd 0.0500 time 0.2566 (0.2634) data time 0.0010 (0.0020) model time 0.2556 (0.2621) loss 6.4144 (5.6927) grad_norm 2.0699 (2.6982) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:40:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][480/625] eta 0:00:38 lr 0.000259 wd 0.0500 time 0.2552 (0.2632) data time 0.0019 (0.0020) model time 0.2534 (0.2619) loss 5.5469 (5.6986) grad_norm 2.4579 (2.6903) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:40:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][490/625] eta 0:00:35 lr 0.000259 wd 0.0500 time 0.2474 (0.2630) data time 0.0009 (0.0020) model time 0.2466 (0.2618) loss 6.7569 (5.7054) grad_norm 4.0723 (2.6953) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:40:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][500/625] eta 0:00:32 lr 0.000259 wd 0.0500 time 0.2575 (0.2629) data time 0.0007 (0.0020) model time 0.2568 (0.2616) loss 5.7422 (5.7057) grad_norm 2.4396 (2.6849) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:40:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][510/625] eta 0:00:30 lr 0.000259 wd 0.0500 time 0.2544 (0.2631) data time 0.0007 (0.0019) model time 0.2537 (0.2619) loss 6.6489 (5.7084) grad_norm 2.9653 (2.6843) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:40:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][520/625] eta 0:00:27 lr 0.000258 wd 0.0500 time 0.2577 (0.2634) data time 0.0009 (0.0019) model time 0.2568 (0.2622) loss 5.6138 (5.7062) grad_norm 1.8995 (2.6873) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:40:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][530/625] eta 0:00:25 lr 0.000258 wd 0.0500 time 0.2549 (0.2636) data time 0.0009 (0.0019) model time 0.2540 (0.2624) loss 5.6358 (5.7084) grad_norm 1.8450 (2.6901) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:41:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][540/625] eta 0:00:22 lr 0.000258 wd 0.0500 time 0.2538 (0.2637) data time 0.0009 (0.0019) model time 0.2529 (0.2625) loss 5.1797 (5.7046) grad_norm 2.8938 (2.6868) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:41:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][550/625] eta 0:00:19 lr 0.000258 wd 0.0500 time 0.2533 (0.2636) data time 0.0010 (0.0019) model time 0.2523 (0.2624) loss 5.3280 (5.7052) grad_norm 4.4890 (2.6992) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:41:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][560/625] eta 0:00:17 lr 0.000258 wd 0.0500 time 0.4352 (0.2637) data time 0.0009 (0.0019) model time 0.4343 (0.2626) loss 6.3143 (5.7067) grad_norm 2.3052 (2.7004) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:41:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][570/625] eta 0:00:14 lr 0.000258 wd 0.0500 time 0.2551 (0.2636) data time 0.0007 (0.0018) model time 0.2544 (0.2624) loss 6.0625 (5.7056) grad_norm 2.8278 (2.6960) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:41:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][580/625] eta 0:00:11 lr 0.000258 wd 0.0500 time 0.2647 (0.2635) data time 0.0008 (0.0018) model time 0.2639 (0.2623) loss 6.4876 (5.7049) grad_norm 2.0802 (2.6886) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:41:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][590/625] eta 0:00:09 lr 0.000258 wd 0.0500 time 0.2515 (0.2633) data time 0.0007 (0.0018) model time 0.2508 (0.2622) loss 4.6616 (5.6993) grad_norm 1.8588 (2.6902) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:41:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][600/625] eta 0:00:06 lr 0.000257 wd 0.0500 time 0.2573 (0.2636) data time 0.0008 (0.0018) model time 0.2566 (0.2624) loss 5.4680 (5.6999) grad_norm 2.5437 (2.6920) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:41:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][610/625] eta 0:00:03 lr 0.000257 wd 0.0500 time 0.2539 (0.2635) data time 0.0006 (0.0018) model time 0.2533 (0.2623) loss 5.8347 (5.6993) grad_norm 3.0545 (2.6979) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:41:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][620/625] eta 0:00:01 lr 0.000257 wd 0.0500 time 0.2535 (0.2633) data time 0.0003 (0.0018) model time 0.2531 (0.2621) loss 4.6539 (5.6964) grad_norm 3.4477 (2.7098) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:41:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 236 training takes 0:02:44
[2024-08-04 08:41:24 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 08:41:25 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 08:41:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.491 (0.491) Loss 0.5928 (0.5928) Acc@1 90.088 (90.088) Acc@5 98.535 (98.535) Mem 9655MB
[2024-08-04 08:41:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 0.9243 (0.7200) Acc@1 80.176 (86.497) Acc@5 96.094 (97.701) Mem 9655MB
[2024-08-04 08:41:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0283 (0.8389) Acc@1 76.270 (83.382) Acc@5 95.020 (96.503) Mem 9655MB
[2024-08-04 08:41:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 83.093 Acc@5 96.511
[2024-08-04 08:41:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.1%
[2024-08-04 08:41:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.09%
[2024-08-04 08:41:26 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-08-04 08:41:27 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-08-04 08:41:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.478 (0.478) Loss 0.5825 (0.5825) Acc@1 89.844 (89.844) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 08:41:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.094) Loss 0.9146 (0.7109) Acc@1 80.371 (86.523) Acc@5 96.094 (97.723) Mem 9655MB
[2024-08-04 08:41:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.075) Loss 1.0234 (0.8333) Acc@1 77.100 (83.305) Acc@5 95.264 (96.482) Mem 9655MB
[2024-08-04 08:41:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 83.013 Acc@5 96.491
[2024-08-04 08:41:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.0%
[2024-08-04 08:41:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][0/625] eta 0:10:55 lr 0.000257 wd 0.0500 time 1.0486 (1.0486) data time 0.5865 (0.5865) model time 0.0000 (0.0000) loss 5.9459 (5.9459) grad_norm 2.0995 (2.0995) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:41:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][10/625] eta 0:03:29 lr 0.000257 wd 0.0500 time 0.2536 (0.3403) data time 0.0007 (0.0541) model time 0.0000 (0.0000) loss 6.2258 (5.7864) grad_norm 2.5315 (2.3871) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:41:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][20/625] eta 0:03:01 lr 0.000257 wd 0.0500 time 0.2545 (0.2996) data time 0.0008 (0.0288) model time 0.0000 (0.0000) loss 5.4763 (5.6444) grad_norm 2.6310 (2.2296) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:41:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][30/625] eta 0:02:49 lr 0.000257 wd 0.0500 time 0.2577 (0.2856) data time 0.0009 (0.0198) model time 0.0000 (0.0000) loss 6.1466 (5.5212) grad_norm 3.2239 (2.4136) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:41:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][40/625] eta 0:02:45 lr 0.000257 wd 0.0500 time 0.2569 (0.2831) data time 0.0008 (0.0152) model time 0.0000 (0.0000) loss 6.2239 (5.6169) grad_norm 2.6543 (2.5097) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:41:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][50/625] eta 0:02:41 lr 0.000257 wd 0.0500 time 0.2627 (0.2817) data time 0.0008 (0.0124) model time 0.0000 (0.0000) loss 6.2892 (5.6485) grad_norm 3.6106 (3.2660) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:41:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][60/625] eta 0:02:36 lr 0.000257 wd 0.0500 time 0.2576 (0.2779) data time 0.0011 (0.0105) model time 0.2565 (0.2571) loss 5.5610 (5.6744) grad_norm 2.3756 (3.2270) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:41:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][70/625] eta 0:02:32 lr 0.000256 wd 0.0500 time 0.2535 (0.2747) data time 0.0007 (0.0092) model time 0.2529 (0.2559) loss 6.5897 (5.6740) grad_norm 3.3305 (3.2208) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:41:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][80/625] eta 0:02:29 lr 0.000256 wd 0.0500 time 0.2572 (0.2747) data time 0.0009 (0.0082) model time 0.2563 (0.2617) loss 4.8966 (5.7442) grad_norm 11.4611 (3.3279) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:41:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][90/625] eta 0:02:25 lr 0.000256 wd 0.0500 time 0.2537 (0.2728) data time 0.0006 (0.0074) model time 0.2531 (0.2604) loss 6.8259 (5.7707) grad_norm 2.6457 (3.1833) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:41:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][100/625] eta 0:02:22 lr 0.000256 wd 0.0500 time 0.2568 (0.2711) data time 0.0008 (0.0068) model time 0.2561 (0.2593) loss 5.3306 (5.7714) grad_norm 2.9708 (3.0743) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:41:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][110/625] eta 0:02:19 lr 0.000256 wd 0.0500 time 0.2553 (0.2714) data time 0.0007 (0.0062) model time 0.2546 (0.2617) loss 6.5259 (5.8022) grad_norm 2.7045 (3.0280) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:42:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][120/625] eta 0:02:16 lr 0.000256 wd 0.0500 time 0.2521 (0.2702) data time 0.0008 (0.0058) model time 0.2513 (0.2608) loss 6.2425 (5.7974) grad_norm 1.5779 (2.9814) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:42:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][130/625] eta 0:02:13 lr 0.000256 wd 0.0500 time 0.2503 (0.2692) data time 0.0007 (0.0054) model time 0.2497 (0.2602) loss 5.8796 (5.7962) grad_norm 2.8734 (2.9854) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:42:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][140/625] eta 0:02:10 lr 0.000256 wd 0.0500 time 0.2558 (0.2682) data time 0.0012 (0.0051) model time 0.2547 (0.2596) loss 4.8861 (5.7681) grad_norm 4.0650 (3.0230) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:42:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][150/625] eta 0:02:07 lr 0.000255 wd 0.0500 time 0.2563 (0.2674) data time 0.0009 (0.0048) model time 0.2554 (0.2591) loss 5.6625 (5.7757) grad_norm 11.5342 (3.1020) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:42:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][160/625] eta 0:02:03 lr 0.000255 wd 0.0500 time 0.2567 (0.2667) data time 0.0010 (0.0046) model time 0.2557 (0.2587) loss 5.6980 (5.7916) grad_norm 2.9360 (3.2002) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:42:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][170/625] eta 0:02:01 lr 0.000255 wd 0.0500 time 0.2572 (0.2661) data time 0.0008 (0.0044) model time 0.2564 (0.2585) loss 6.5792 (5.8075) grad_norm 2.1357 (3.1698) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:42:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][180/625] eta 0:01:58 lr 0.000255 wd 0.0500 time 0.2564 (0.2664) data time 0.0007 (0.0042) model time 0.2558 (0.2594) loss 6.0088 (5.7904) grad_norm 3.7337 (3.1629) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:42:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][190/625] eta 0:01:55 lr 0.000255 wd 0.0500 time 0.2566 (0.2659) data time 0.0008 (0.0040) model time 0.2558 (0.2592) loss 5.6662 (5.7939) grad_norm 1.3116 (3.1302) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:42:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][200/625] eta 0:01:53 lr 0.000255 wd 0.0500 time 0.2584 (0.2665) data time 0.0008 (0.0039) model time 0.2576 (0.2604) loss 5.6399 (5.8074) grad_norm 4.8591 (3.1224) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:42:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][210/625] eta 0:01:50 lr 0.000255 wd 0.0500 time 0.2590 (0.2660) data time 0.0008 (0.0037) model time 0.2582 (0.2600) loss 6.1232 (5.8017) grad_norm 1.9054 (3.1214) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:42:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][220/625] eta 0:01:47 lr 0.000255 wd 0.0500 time 0.4577 (0.2665) data time 0.0009 (0.0036) model time 0.4568 (0.2609) loss 5.8329 (5.8036) grad_norm 1.8290 (3.0685) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:42:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][230/625] eta 0:01:45 lr 0.000255 wd 0.0500 time 0.2580 (0.2661) data time 0.0008 (0.0035) model time 0.2573 (0.2607) loss 4.9630 (5.8049) grad_norm 1.9239 (3.0349) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:42:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][240/625] eta 0:01:42 lr 0.000254 wd 0.0500 time 0.4776 (0.2666) data time 0.0007 (0.0034) model time 0.4770 (0.2616) loss 6.1223 (5.7960) grad_norm 2.3405 (3.0005) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:42:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][250/625] eta 0:01:40 lr 0.000254 wd 0.0500 time 0.2546 (0.2670) data time 0.0008 (0.0033) model time 0.2537 (0.2622) loss 5.1529 (5.7884) grad_norm 3.9684 (2.9763) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:42:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][260/625] eta 0:01:37 lr 0.000254 wd 0.0500 time 0.2671 (0.2666) data time 0.0008 (0.0032) model time 0.2663 (0.2620) loss 6.0652 (5.7783) grad_norm 1.8211 (2.9660) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:42:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][270/625] eta 0:01:34 lr 0.000254 wd 0.0500 time 0.2531 (0.2669) data time 0.0010 (0.0031) model time 0.2521 (0.2625) loss 5.9825 (5.7839) grad_norm 2.2280 (2.9542) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:42:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][280/625] eta 0:01:31 lr 0.000254 wd 0.0500 time 0.2554 (0.2665) data time 0.0009 (0.0030) model time 0.2544 (0.2622) loss 6.3779 (5.7862) grad_norm 1.5623 (2.9444) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:42:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][290/625] eta 0:01:29 lr 0.000254 wd 0.0500 time 0.2579 (0.2665) data time 0.0008 (0.0030) model time 0.2571 (0.2624) loss 6.3926 (5.7832) grad_norm 5.6726 (2.9303) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:42:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][300/625] eta 0:01:26 lr 0.000254 wd 0.0500 time 0.2533 (0.2662) data time 0.0008 (0.0029) model time 0.2525 (0.2620) loss 5.8950 (5.7814) grad_norm 1.5042 (2.9285) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:42:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][310/625] eta 0:01:23 lr 0.000254 wd 0.0500 time 0.2582 (0.2659) data time 0.0009 (0.0028) model time 0.2573 (0.2618) loss 5.9947 (5.7753) grad_norm 1.9542 (2.9053) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:42:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][320/625] eta 0:01:20 lr 0.000254 wd 0.0500 time 0.2524 (0.2655) data time 0.0008 (0.0028) model time 0.2516 (0.2615) loss 6.4841 (5.7640) grad_norm 2.2426 (2.9167) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:42:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][330/625] eta 0:01:18 lr 0.000253 wd 0.0500 time 0.2540 (0.2653) data time 0.0006 (0.0027) model time 0.2534 (0.2613) loss 6.3395 (5.7700) grad_norm 2.1422 (2.9028) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:42:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][340/625] eta 0:01:15 lr 0.000253 wd 0.0500 time 0.2585 (0.2650) data time 0.0009 (0.0027) model time 0.2577 (0.2611) loss 5.4323 (5.7651) grad_norm 1.8917 (2.9127) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:43:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][350/625] eta 0:01:12 lr 0.000253 wd 0.0500 time 0.2606 (0.2648) data time 0.0008 (0.0026) model time 0.2598 (0.2609) loss 6.4333 (5.7602) grad_norm 2.5444 (2.9015) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:43:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][360/625] eta 0:01:10 lr 0.000253 wd 0.0500 time 0.2617 (0.2646) data time 0.0009 (0.0026) model time 0.2608 (0.2608) loss 6.4515 (5.7555) grad_norm 1.5612 (2.9143) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:43:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][370/625] eta 0:01:07 lr 0.000253 wd 0.0500 time 0.2552 (0.2643) data time 0.0017 (0.0025) model time 0.2535 (0.2606) loss 5.8111 (5.7528) grad_norm 1.6307 (2.9082) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:43:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][380/625] eta 0:01:04 lr 0.000253 wd 0.0500 time 0.2584 (0.2641) data time 0.0009 (0.0025) model time 0.2575 (0.2604) loss 5.9031 (5.7521) grad_norm 2.2588 (2.8879) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:43:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][390/625] eta 0:01:02 lr 0.000253 wd 0.0500 time 0.2557 (0.2639) data time 0.0010 (0.0025) model time 0.2547 (0.2603) loss 5.0374 (5.7515) grad_norm 3.9481 (2.8808) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:43:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][400/625] eta 0:00:59 lr 0.000253 wd 0.0500 time 0.2546 (0.2638) data time 0.0007 (0.0024) model time 0.2538 (0.2602) loss 5.1403 (5.7446) grad_norm 3.3969 (2.8674) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:43:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][410/625] eta 0:00:56 lr 0.000252 wd 0.0500 time 0.2551 (0.2636) data time 0.0011 (0.0024) model time 0.2540 (0.2601) loss 5.7847 (5.7469) grad_norm 2.0235 (2.8703) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:43:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][420/625] eta 0:00:54 lr 0.000252 wd 0.0500 time 0.2575 (0.2638) data time 0.0006 (0.0024) model time 0.2569 (0.2604) loss 5.0547 (5.7526) grad_norm 1.9245 (2.8793) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:43:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][430/625] eta 0:00:51 lr 0.000252 wd 0.0500 time 0.2635 (0.2636) data time 0.0007 (0.0023) model time 0.2628 (0.2602) loss 6.4413 (5.7546) grad_norm 1.9405 (2.9069) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:43:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][440/625] eta 0:00:48 lr 0.000252 wd 0.0500 time 0.2554 (0.2635) data time 0.0009 (0.0023) model time 0.2545 (0.2601) loss 6.0830 (5.7446) grad_norm 1.8214 (2.9117) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:43:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][450/625] eta 0:00:46 lr 0.000252 wd 0.0500 time 0.2535 (0.2633) data time 0.0009 (0.0023) model time 0.2527 (0.2600) loss 6.9898 (5.7493) grad_norm 7.1199 (2.9200) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:43:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][460/625] eta 0:00:43 lr 0.000252 wd 0.0500 time 0.2586 (0.2636) data time 0.0008 (0.0022) model time 0.2578 (0.2604) loss 5.5864 (5.7470) grad_norm 2.5822 (2.9423) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:43:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][470/625] eta 0:00:40 lr 0.000252 wd 0.0500 time 0.2582 (0.2638) data time 0.0006 (0.0022) model time 0.2575 (0.2607) loss 4.8734 (5.7434) grad_norm 2.9390 (2.9278) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:43:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][480/625] eta 0:00:38 lr 0.000252 wd 0.0500 time 0.2546 (0.2641) data time 0.0011 (0.0022) model time 0.2534 (0.2611) loss 5.5774 (5.7377) grad_norm 1.7491 (2.9090) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:43:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][490/625] eta 0:00:35 lr 0.000252 wd 0.0500 time 0.2562 (0.2640) data time 0.0006 (0.0021) model time 0.2556 (0.2610) loss 4.6316 (5.7312) grad_norm 2.2490 (2.9017) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:43:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][500/625] eta 0:00:32 lr 0.000251 wd 0.0500 time 0.2530 (0.2639) data time 0.0008 (0.0021) model time 0.2522 (0.2609) loss 4.5496 (5.7238) grad_norm 2.2877 (2.9026) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:43:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][510/625] eta 0:00:30 lr 0.000251 wd 0.0500 time 0.2583 (0.2644) data time 0.0008 (0.0021) model time 0.2575 (0.2616) loss 6.5429 (5.7273) grad_norm 2.2192 (2.8991) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:43:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][520/625] eta 0:00:27 lr 0.000251 wd 0.0500 time 0.2523 (0.2643) data time 0.0007 (0.0021) model time 0.2515 (0.2614) loss 5.7281 (5.7310) grad_norm 2.7118 (2.8877) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:43:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][530/625] eta 0:00:25 lr 0.000251 wd 0.0500 time 0.2548 (0.2641) data time 0.0007 (0.0020) model time 0.2541 (0.2613) loss 4.6824 (5.7299) grad_norm 2.7020 (2.8773) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:43:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][540/625] eta 0:00:22 lr 0.000251 wd 0.0500 time 0.2537 (0.2640) data time 0.0009 (0.0020) model time 0.2528 (0.2612) loss 4.4999 (5.7289) grad_norm 3.4003 (2.8836) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:43:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][550/625] eta 0:00:19 lr 0.000251 wd 0.0500 time 0.2565 (0.2641) data time 0.0007 (0.0020) model time 0.2557 (0.2613) loss 5.9762 (5.7344) grad_norm 2.2021 (2.8886) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:43:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][560/625] eta 0:00:17 lr 0.000251 wd 0.0500 time 0.2560 (0.2639) data time 0.0010 (0.0020) model time 0.2550 (0.2612) loss 6.2974 (5.7292) grad_norm 1.7481 (2.8814) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:43:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][570/625] eta 0:00:14 lr 0.000251 wd 0.0500 time 0.2699 (0.2641) data time 0.0010 (0.0020) model time 0.2688 (0.2614) loss 5.5890 (5.7295) grad_norm 1.8254 (2.8658) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:44:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][580/625] eta 0:00:11 lr 0.000251 wd 0.0500 time 0.2539 (0.2639) data time 0.0008 (0.0019) model time 0.2531 (0.2613) loss 5.3322 (5.7332) grad_norm 1.9455 (2.8596) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:44:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][590/625] eta 0:00:09 lr 0.000250 wd 0.0500 time 0.2585 (0.2638) data time 0.0008 (0.0019) model time 0.2577 (0.2612) loss 5.1313 (5.7330) grad_norm 2.8015 (2.8470) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:44:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][600/625] eta 0:00:06 lr 0.000250 wd 0.0500 time 0.2551 (0.2637) data time 0.0008 (0.0019) model time 0.2543 (0.2610) loss 5.7603 (5.7345) grad_norm 3.2346 (2.8396) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:44:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][610/625] eta 0:00:03 lr 0.000250 wd 0.0500 time 0.2546 (0.2635) data time 0.0003 (0.0019) model time 0.2543 (0.2609) loss 4.7026 (5.7336) grad_norm 1.9814 (2.8330) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:44:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][620/625] eta 0:00:01 lr 0.000250 wd 0.0500 time 0.2523 (0.2634) data time 0.0005 (0.0019) model time 0.2517 (0.2608) loss 4.5779 (5.7318) grad_norm 4.2740 (2.8352) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:44:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 237 training takes 0:02:44
[2024-08-04 08:44:13 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 08:44:14 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 08:44:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.458 (0.458) Loss 0.6074 (0.6074) Acc@1 89.600 (89.600) Acc@5 98.584 (98.584) Mem 9655MB
[2024-08-04 08:44:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.094) Loss 0.9146 (0.7251) Acc@1 81.348 (86.510) Acc@5 96.289 (97.692) Mem 9655MB
[2024-08-04 08:44:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.075) Loss 1.0322 (0.8437) Acc@1 76.709 (83.317) Acc@5 94.971 (96.477) Mem 9655MB
[2024-08-04 08:44:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 83.041 Acc@5 96.495
[2024-08-04 08:44:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.0%
[2024-08-04 08:44:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.787 (0.787) Loss 0.5830 (0.5830) Acc@1 89.795 (89.795) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 08:44:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.126) Loss 0.9131 (0.7106) Acc@1 80.469 (86.492) Acc@5 96.045 (97.727) Mem 9655MB
[2024-08-04 08:44:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0225 (0.8329) Acc@1 77.051 (83.294) Acc@5 95.264 (96.498) Mem 9655MB
[2024-08-04 08:44:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 83.007 Acc@5 96.503
[2024-08-04 08:44:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.0%
[2024-08-04 08:44:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][0/625] eta 0:10:37 lr 0.000250 wd 0.0500 time 1.0192 (1.0192) data time 0.7039 (0.7039) model time 0.0000 (0.0000) loss 5.6371 (5.6371) grad_norm 1.9214 (1.9214) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:44:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][10/625] eta 0:03:30 lr 0.000250 wd 0.0500 time 0.2541 (0.3417) data time 0.0009 (0.0649) model time 0.0000 (0.0000) loss 5.4157 (5.5794) grad_norm 1.4032 (2.0970) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:44:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][20/625] eta 0:03:07 lr 0.000250 wd 0.0500 time 0.2569 (0.3092) data time 0.0009 (0.0344) model time 0.0000 (0.0000) loss 5.7963 (5.6956) grad_norm 2.0674 (2.0415) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:44:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][30/625] eta 0:02:56 lr 0.000250 wd 0.0500 time 0.2616 (0.2960) data time 0.0007 (0.0236) model time 0.0000 (0.0000) loss 5.2297 (5.7849) grad_norm 2.2500 (2.0870) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:44:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][40/625] eta 0:02:47 lr 0.000250 wd 0.0500 time 0.2553 (0.2861) data time 0.0008 (0.0181) model time 0.0000 (0.0000) loss 4.9927 (5.7214) grad_norm 1.7799 (2.0586) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:44:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][50/625] eta 0:02:41 lr 0.000249 wd 0.0500 time 0.2530 (0.2804) data time 0.0009 (0.0147) model time 0.0000 (0.0000) loss 6.6110 (5.7195) grad_norm 2.6216 (2.4740) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:44:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][60/625] eta 0:02:37 lr 0.000249 wd 0.0500 time 0.4186 (0.2795) data time 0.0006 (0.0124) model time 0.4180 (0.2744) loss 5.9444 (5.7234) grad_norm 2.6667 (2.5263) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:44:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][70/625] eta 0:02:33 lr 0.000249 wd 0.0500 time 0.2574 (0.2763) data time 0.0010 (0.0108) model time 0.2564 (0.2649) loss 6.6312 (5.7111) grad_norm 4.4881 (2.8256) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:44:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][80/625] eta 0:02:29 lr 0.000249 wd 0.0500 time 0.2548 (0.2740) data time 0.0007 (0.0096) model time 0.2541 (0.2624) loss 4.7823 (5.7090) grad_norm 6.6581 (3.0477) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:44:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][90/625] eta 0:02:25 lr 0.000249 wd 0.0500 time 0.2560 (0.2722) data time 0.0008 (0.0086) model time 0.2552 (0.2609) loss 4.9969 (5.7119) grad_norm 1.4786 (2.9365) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:44:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][100/625] eta 0:02:22 lr 0.000249 wd 0.0500 time 0.2523 (0.2707) data time 0.0006 (0.0079) model time 0.2516 (0.2599) loss 5.7445 (5.7263) grad_norm 2.5683 (2.8670) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:44:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][110/625] eta 0:02:19 lr 0.000249 wd 0.0500 time 0.2584 (0.2714) data time 0.0009 (0.0072) model time 0.2575 (0.2628) loss 6.3091 (5.7240) grad_norm 1.9255 (2.8056) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:44:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][120/625] eta 0:02:16 lr 0.000249 wd 0.0500 time 0.2577 (0.2702) data time 0.0010 (0.0067) model
time 0.2567 (0.2618) loss 5.3279 (5.7459) grad_norm 1.7851 (2.7986) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:44:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][130/625] eta 0:02:13 lr 0.000249 wd 0.0500 time 0.2612 (0.2691) data time 0.0006 (0.0063) model time 0.2606 (0.2610) loss 5.9960 (5.7471) grad_norm 2.1005 (2.8351) loss_scale 1024.0000 (535.4504) mem 9655MB [2024-08-04 08:44:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][140/625] eta 0:02:10 lr 0.000248 wd 0.0500 time 0.2559 (0.2694) data time 0.0011 (0.0059) model time 0.2548 (0.2623) loss 5.7389 (5.7565) grad_norm 4.0527 (2.8685) loss_scale 1024.0000 (570.0993) mem 9655MB [2024-08-04 08:44:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][150/625] eta 0:02:08 lr 0.000248 wd 0.0500 time 0.2539 (0.2697) data time 0.0010 (0.0056) model time 0.2529 (0.2634) loss 5.0560 (5.7613) grad_norm 3.4777 (2.8513) loss_scale 1024.0000 (600.1589) mem 9655MB [2024-08-04 08:45:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][160/625] eta 0:02:05 lr 0.000248 wd 0.0500 time 0.2579 (0.2701) data time 0.0006 (0.0053) model time 0.2572 (0.2644) loss 5.2821 (5.7415) grad_norm 4.3314 (2.8682) loss_scale 1024.0000 (626.4845) mem 9655MB [2024-08-04 08:45:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][170/625] eta 0:02:02 lr 0.000248 wd 0.0500 time 0.2578 (0.2694) data time 0.0007 (0.0050) model time 0.2570 (0.2638) loss 5.2395 (5.7456) grad_norm 3.8695 (2.8474) loss_scale 1024.0000 (649.7310) mem 9655MB [2024-08-04 08:45:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][180/625] eta 0:01:59 lr 0.000248 wd 0.0500 time 0.2597 (0.2687) data time 0.0008 (0.0048) model time 0.2588 (0.2632) loss 6.3927 (5.7450) grad_norm 2.7263 (2.8550) loss_scale 1024.0000 (670.4088) mem 9655MB [2024-08-04 08:45:09 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [238/300][190/625] eta 0:01:56 lr 0.000248 wd 0.0500 time 0.2597 (0.2689) data time 0.0007 (0.0046) model time 0.2589 (0.2638) loss 6.9556 (5.7521) grad_norm 17.1351 (2.9149) loss_scale 1024.0000 (688.9215) mem 9655MB [2024-08-04 08:45:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][200/625] eta 0:01:54 lr 0.000248 wd 0.0500 time 0.2555 (0.2682) data time 0.0007 (0.0044) model time 0.2548 (0.2632) loss 5.8936 (5.7487) grad_norm 2.5159 (2.9086) loss_scale 1024.0000 (705.5920) mem 9655MB [2024-08-04 08:45:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][210/625] eta 0:01:51 lr 0.000248 wd 0.0500 time 0.2533 (0.2677) data time 0.0008 (0.0042) model time 0.2525 (0.2627) loss 6.3914 (5.7418) grad_norm 3.5754 (2.8843) loss_scale 1024.0000 (720.6825) mem 9655MB [2024-08-04 08:45:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][220/625] eta 0:01:48 lr 0.000248 wd 0.0500 time 0.2651 (0.2672) data time 0.0008 (0.0041) model time 0.2643 (0.2623) loss 6.8482 (5.7430) grad_norm 1.7456 (2.8718) loss_scale 1024.0000 (734.4072) mem 9655MB [2024-08-04 08:45:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][230/625] eta 0:01:45 lr 0.000247 wd 0.0500 time 0.2561 (0.2668) data time 0.0012 (0.0040) model time 0.2549 (0.2620) loss 5.2229 (5.7446) grad_norm 2.1002 (2.8620) loss_scale 1024.0000 (746.9437) mem 9655MB [2024-08-04 08:45:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][240/625] eta 0:01:42 lr 0.000247 wd 0.0500 time 0.2569 (0.2672) data time 0.0007 (0.0038) model time 0.2561 (0.2627) loss 5.6343 (5.7477) grad_norm 2.7079 (2.8502) loss_scale 1024.0000 (758.4398) mem 9655MB [2024-08-04 08:45:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][250/625] eta 0:01:40 lr 0.000247 wd 0.0500 time 0.2589 (0.2667) data time 0.0009 (0.0037) model time 0.2580 
(0.2623) loss 4.9093 (5.7456) grad_norm 2.5880 (2.8346) loss_scale 1024.0000 (769.0199) mem 9655MB [2024-08-04 08:45:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][260/625] eta 0:01:37 lr 0.000247 wd 0.0500 time 0.2587 (0.2671) data time 0.0009 (0.0036) model time 0.2578 (0.2629) loss 6.3017 (5.7410) grad_norm 1.9112 (2.8219) loss_scale 1024.0000 (778.7893) mem 9655MB [2024-08-04 08:45:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][270/625] eta 0:01:34 lr 0.000247 wd 0.0500 time 0.2594 (0.2667) data time 0.0006 (0.0035) model time 0.2589 (0.2626) loss 6.2168 (5.7485) grad_norm 1.9630 (2.8046) loss_scale 1024.0000 (787.8376) mem 9655MB [2024-08-04 08:45:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][280/625] eta 0:01:31 lr 0.000247 wd 0.0500 time 0.2604 (0.2663) data time 0.0009 (0.0034) model time 0.2596 (0.2622) loss 5.5920 (5.7573) grad_norm 1.7600 (2.7930) loss_scale 1024.0000 (796.2420) mem 9655MB [2024-08-04 08:45:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][290/625] eta 0:01:29 lr 0.000247 wd 0.0500 time 0.2538 (0.2663) data time 0.0010 (0.0033) model time 0.2528 (0.2623) loss 6.1741 (5.7520) grad_norm 3.1034 (2.7778) loss_scale 1024.0000 (804.0687) mem 9655MB [2024-08-04 08:45:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][300/625] eta 0:01:26 lr 0.000247 wd 0.0500 time 0.2555 (0.2659) data time 0.0009 (0.0033) model time 0.2546 (0.2621) loss 5.4603 (5.7429) grad_norm 2.1606 (2.8046) loss_scale 1024.0000 (811.3754) mem 9655MB [2024-08-04 08:45:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][310/625] eta 0:01:23 lr 0.000247 wd 0.0500 time 0.2557 (0.2665) data time 0.0009 (0.0032) model time 0.2548 (0.2628) loss 5.4012 (5.7362) grad_norm 2.2171 (2.7978) loss_scale 1024.0000 (818.2122) mem 9655MB [2024-08-04 08:45:43 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [238/300][320/625] eta 0:01:21 lr 0.000246 wd 0.0500 time 0.2600 (0.2661) data time 0.0006 (0.0031) model time 0.2594 (0.2625) loss 5.7100 (5.7315) grad_norm 1.6052 (2.7702) loss_scale 1024.0000 (824.6231) mem 9655MB [2024-08-04 08:45:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][330/625] eta 0:01:18 lr 0.000246 wd 0.0500 time 0.2569 (0.2665) data time 0.0007 (0.0031) model time 0.2562 (0.2630) loss 6.5831 (5.7289) grad_norm 2.5891 (2.7557) loss_scale 1024.0000 (830.6465) mem 9655MB [2024-08-04 08:45:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][340/625] eta 0:01:15 lr 0.000246 wd 0.0500 time 0.2555 (0.2662) data time 0.0016 (0.0030) model time 0.2539 (0.2627) loss 6.2377 (5.7370) grad_norm 2.0626 (2.7449) loss_scale 1024.0000 (836.3167) mem 9655MB [2024-08-04 08:45:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][350/625] eta 0:01:13 lr 0.000246 wd 0.0500 time 0.2599 (0.2659) data time 0.0007 (0.0029) model time 0.2592 (0.2625) loss 5.6680 (5.7238) grad_norm 3.2790 (2.7312) loss_scale 1024.0000 (841.6638) mem 9655MB [2024-08-04 08:45:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][360/625] eta 0:01:10 lr 0.000246 wd 0.0500 time 0.2686 (0.2657) data time 0.0006 (0.0029) model time 0.2680 (0.2623) loss 5.2978 (5.7164) grad_norm 2.7758 (2.7275) loss_scale 1024.0000 (846.7147) mem 9655MB [2024-08-04 08:45:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][370/625] eta 0:01:07 lr 0.000246 wd 0.0500 time 0.2567 (0.2654) data time 0.0010 (0.0028) model time 0.2557 (0.2621) loss 5.0244 (5.7176) grad_norm 1.9363 (2.7131) loss_scale 1024.0000 (851.4933) mem 9655MB [2024-08-04 08:45:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][380/625] eta 0:01:04 lr 0.000246 wd 0.0500 time 0.2665 (0.2651) data time 0.0007 (0.0028) model time 0.2658 
(0.2619) loss 5.7590 (5.7172) grad_norm 3.4102 (2.6998) loss_scale 1024.0000 (856.0210) mem 9655MB [2024-08-04 08:46:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][390/625] eta 0:01:02 lr 0.000246 wd 0.0500 time 0.2534 (0.2649) data time 0.0007 (0.0027) model time 0.2527 (0.2617) loss 6.5380 (5.7218) grad_norm 9.7617 (2.7178) loss_scale 1024.0000 (860.3171) mem 9655MB [2024-08-04 08:46:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][400/625] eta 0:00:59 lr 0.000245 wd 0.0500 time 0.2509 (0.2652) data time 0.0007 (0.0027) model time 0.2501 (0.2621) loss 5.7477 (5.7271) grad_norm 4.8514 (2.7326) loss_scale 1024.0000 (864.3990) mem 9655MB [2024-08-04 08:46:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][410/625] eta 0:00:56 lr 0.000245 wd 0.0500 time 0.2602 (0.2650) data time 0.0010 (0.0026) model time 0.2593 (0.2619) loss 4.5391 (5.7271) grad_norm 2.2626 (2.7551) loss_scale 1024.0000 (868.2822) mem 9655MB [2024-08-04 08:46:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][420/625] eta 0:00:54 lr 0.000245 wd 0.0500 time 0.2594 (0.2648) data time 0.0009 (0.0026) model time 0.2585 (0.2618) loss 6.0919 (5.7250) grad_norm 1.5814 (2.7505) loss_scale 1024.0000 (871.9810) mem 9655MB [2024-08-04 08:46:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][430/625] eta 0:00:51 lr 0.000245 wd 0.0500 time 0.2610 (0.2646) data time 0.0008 (0.0026) model time 0.2602 (0.2616) loss 5.7292 (5.7295) grad_norm 2.7581 (2.7484) loss_scale 1024.0000 (875.5081) mem 9655MB [2024-08-04 08:46:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][440/625] eta 0:00:48 lr 0.000245 wd 0.0500 time 0.2559 (0.2648) data time 0.0010 (0.0025) model time 0.2549 (0.2618) loss 5.4802 (5.7213) grad_norm 2.8290 (inf) loss_scale 512.0000 (871.9093) mem 9655MB [2024-08-04 08:46:17 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [238/300][450/625] eta 0:00:46 lr 0.000245 wd 0.0500 time 0.2563 (0.2646) data time 0.0007 (0.0025) model time 0.2556 (0.2617) loss 5.9958 (5.7240) grad_norm 1.7499 (inf) loss_scale 512.0000 (863.9290) mem 9655MB [2024-08-04 08:46:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][460/625] eta 0:00:43 lr 0.000245 wd 0.0500 time 0.2664 (0.2645) data time 0.0006 (0.0025) model time 0.2657 (0.2616) loss 5.8611 (5.7177) grad_norm 1.9600 (inf) loss_scale 512.0000 (856.2950) mem 9655MB [2024-08-04 08:46:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][470/625] eta 0:00:40 lr 0.000245 wd 0.0500 time 0.2512 (0.2644) data time 0.0008 (0.0024) model time 0.2504 (0.2615) loss 5.8103 (5.7268) grad_norm 1.4295 (inf) loss_scale 512.0000 (848.9851) mem 9655MB [2024-08-04 08:46:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][480/625] eta 0:00:38 lr 0.000245 wd 0.0500 time 0.2539 (0.2647) data time 0.0009 (0.0024) model time 0.2530 (0.2619) loss 4.4269 (5.7215) grad_norm 1.7372 (inf) loss_scale 512.0000 (841.9792) mem 9655MB [2024-08-04 08:46:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][490/625] eta 0:00:35 lr 0.000244 wd 0.0500 time 0.2568 (0.2645) data time 0.0008 (0.0024) model time 0.2560 (0.2617) loss 6.0660 (5.7168) grad_norm 2.6449 (inf) loss_scale 512.0000 (835.2587) mem 9655MB [2024-08-04 08:46:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][500/625] eta 0:00:33 lr 0.000244 wd 0.0500 time 0.2565 (0.2647) data time 0.0007 (0.0023) model time 0.2559 (0.2620) loss 5.7924 (5.7173) grad_norm 1.4232 (inf) loss_scale 512.0000 (828.8064) mem 9655MB [2024-08-04 08:46:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][510/625] eta 0:00:30 lr 0.000244 wd 0.0500 time 0.2553 (0.2645) data time 0.0007 (0.0023) model time 0.2545 (0.2618) loss 4.3456 (5.7145) 
grad_norm 4.3497 (inf) loss_scale 512.0000 (822.6067) mem 9655MB [2024-08-04 08:46:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][520/625] eta 0:00:27 lr 0.000244 wd 0.0500 time 0.2564 (0.2646) data time 0.0010 (0.0023) model time 0.2554 (0.2620) loss 6.0987 (5.7182) grad_norm 3.0196 (inf) loss_scale 512.0000 (816.6449) mem 9655MB [2024-08-04 08:46:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][530/625] eta 0:00:25 lr 0.000244 wd 0.0500 time 0.2535 (0.2648) data time 0.0007 (0.0023) model time 0.2528 (0.2622) loss 4.8738 (5.7118) grad_norm 2.4201 (inf) loss_scale 512.0000 (810.9077) mem 9655MB [2024-08-04 08:46:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][540/625] eta 0:00:22 lr 0.000244 wd 0.0500 time 0.2530 (0.2646) data time 0.0008 (0.0022) model time 0.2523 (0.2620) loss 5.6760 (5.7144) grad_norm 4.7259 (inf) loss_scale 512.0000 (805.3826) mem 9655MB [2024-08-04 08:46:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][550/625] eta 0:00:19 lr 0.000244 wd 0.0500 time 0.2562 (0.2645) data time 0.0008 (0.0022) model time 0.2553 (0.2619) loss 6.5490 (5.7138) grad_norm 4.1235 (inf) loss_scale 512.0000 (800.0581) mem 9655MB [2024-08-04 08:46:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][560/625] eta 0:00:17 lr 0.000244 wd 0.0500 time 0.2569 (0.2643) data time 0.0008 (0.0022) model time 0.2561 (0.2618) loss 6.3485 (5.7183) grad_norm 3.9967 (inf) loss_scale 512.0000 (794.9234) mem 9655MB [2024-08-04 08:46:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][570/625] eta 0:00:14 lr 0.000244 wd 0.0500 time 0.2530 (0.2642) data time 0.0007 (0.0022) model time 0.2523 (0.2616) loss 5.8304 (5.7177) grad_norm 5.0105 (inf) loss_scale 512.0000 (789.9685) mem 9655MB [2024-08-04 08:46:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][580/625] eta 0:00:11 lr 
0.000243 wd 0.0500 time 0.2518 (0.2640) data time 0.0008 (0.0022) model time 0.2510 (0.2615) loss 4.8996 (5.7165) grad_norm 6.1981 (inf) loss_scale 512.0000 (785.1842) mem 9655MB [2024-08-04 08:46:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][590/625] eta 0:00:09 lr 0.000243 wd 0.0500 time 0.2516 (0.2639) data time 0.0009 (0.0021) model time 0.2507 (0.2614) loss 5.8340 (5.7115) grad_norm 1.9579 (inf) loss_scale 512.0000 (780.5618) mem 9655MB [2024-08-04 08:46:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][600/625] eta 0:00:06 lr 0.000243 wd 0.0500 time 0.2518 (0.2640) data time 0.0008 (0.0021) model time 0.2510 (0.2615) loss 5.2399 (5.7099) grad_norm 10.5568 (inf) loss_scale 512.0000 (776.0932) mem 9655MB [2024-08-04 08:46:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][610/625] eta 0:00:03 lr 0.000243 wd 0.0500 time 0.2525 (0.2638) data time 0.0005 (0.0021) model time 0.2520 (0.2614) loss 6.5299 (5.7101) grad_norm 2.4446 (inf) loss_scale 512.0000 (771.7709) mem 9655MB [2024-08-04 08:47:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][620/625] eta 0:00:01 lr 0.000243 wd 0.0500 time 0.2596 (0.2637) data time 0.0005 (0.0021) model time 0.2591 (0.2613) loss 5.7704 (5.7101) grad_norm 2.7269 (inf) loss_scale 512.0000 (767.5878) mem 9655MB [2024-08-04 08:47:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 238 training takes 0:02:44 [2024-08-04 08:47:02 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 08:47:03 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 08:47:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.493 (0.493) Loss 0.6064 (0.6064) Acc@1 89.893 (89.893) Acc@5 98.438 (98.438) Mem 9655MB
[2024-08-04 08:47:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 0.9141 (0.7188) Acc@1 81.250 (86.692) Acc@5 96.289 (97.710) Mem 9655MB
[2024-08-04 08:47:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0459 (0.8471) Acc@1 76.074 (83.350) Acc@5 95.361 (96.505) Mem 9655MB
[2024-08-04 08:47:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.077 Acc@5 96.501
[2024-08-04 08:47:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.1%
[2024-08-04 08:47:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.693 (0.693) Loss 0.5830 (0.5830) Acc@1 89.844 (89.844) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 08:47:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.121) Loss 0.9121 (0.7105) Acc@1 80.518 (86.506) Acc@5 96.094 (97.732) Mem 9655MB
[2024-08-04 08:47:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.089) Loss 1.0225 (0.8328) Acc@1 77.051 (83.310) Acc@5 95.264 (96.494) Mem 9655MB
[2024-08-04 08:47:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.017 Acc@5 96.495
[2024-08-04 08:47:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.0%
[2024-08-04 08:47:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][0/625] eta 0:12:05 lr 0.000243 wd 0.0500 time 1.1612 (1.1612) data time 0.8045 (0.8045) model time 0.0000 (0.0000) loss 6.1926 (6.1926) grad_norm 2.8088 (2.8088) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:47:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][10/625] eta 0:03:36 lr 0.000243 wd 0.0500 time 0.3945 (0.3518) data time 0.0009 (0.0739) model time 0.0000 (0.0000) loss 5.5841 (5.8086) grad_norm 2.8491 (2.4392) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:47:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][20/625] eta 0:03:05 lr 0.000243 wd 0.0500 time 0.2552 (0.3069) data time 0.0007 (0.0392) model time 0.0000 (0.0000) loss 5.7517 (5.5760) grad_norm 3.5512 (2.4951) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:47:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][30/625] eta 0:02:52 lr 0.000243 wd 0.0500 time 0.2545 (0.2905) data time 0.0012 (0.0268) model time 0.0000 (0.0000) loss 4.9118 (5.5292) grad_norm 1.8859 (2.4848) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:47:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][40/625] eta 0:02:46 lr 0.000243 wd 0.0500 time 0.2531 (0.2850) data time 0.0008 (0.0205) model time 0.0000 (0.0000) loss 7.0667 (5.5560) grad_norm 2.4721 (2.3998) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:47:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][50/625] eta 0:02:40 lr 0.000242 wd 0.0500 time 0.2560 (0.2793) data time 0.0007 (0.0167) model time 0.0000 (0.0000) loss 6.1239 (5.5654) grad_norm 2.8589 (2.3306) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:47:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][60/625] eta 0:02:35 lr 0.000242 wd 0.0500 time 0.2662 (0.2755) data time 0.0010 (0.0141) model time 0.2652 (0.2549) loss 6.2814 (5.5386) grad_norm 2.3934 (2.2352) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:47:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][70/625] eta 0:02:33 lr 0.000242 wd 0.0500 time 0.2564 (0.2762) data time 0.0006 (0.0123) model time 0.2559 (0.2673) loss 5.7430 (5.5789) grad_norm 2.3529 (2.2357) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:47:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][80/625] eta 0:02:31 lr 0.000242 wd 0.0500 time 0.2566 (0.2782) data time 0.0011 (0.0109) model time 0.2556 (0.2755) loss 5.7961 (5.6056) grad_norm 2.5334 (2.2429) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:47:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][90/625] eta 0:02:27 lr 0.000242 wd 0.0500 time 0.2570 (0.2761) data time 0.0010 (0.0098) model time 0.2560 (0.2712) loss 4.2531 (5.5711) grad_norm 2.2064 (2.3586) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:47:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][100/625] eta 0:02:23 lr 0.000242 wd 0.0500 time 0.2550 (0.2741) data time 0.0008 (0.0089) model time 0.2542 (0.2678) loss 4.5183 (5.5687) grad_norm 2.2561 (2.4253) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:47:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][110/625] eta 0:02:21 lr 0.000242 wd 0.0500 time 0.2545 (0.2740) data time 0.0009 (0.0082) model time 0.2536 (0.2685) loss 5.6037 (5.5756) grad_norm 1.5846 (2.4217) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:47:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][120/625] eta 0:02:17 lr 0.000242 wd 0.0500 time 0.2579 (0.2726) data time 0.0007 (0.0076) model time 0.2572 (0.2668) loss 4.5585 (5.5717) grad_norm 1.7227 (2.4370) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:47:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][130/625] eta 0:02:14 lr 0.000242 wd 0.0500 time 0.2576 (0.2715) data time 0.0007 (0.0071) model time 0.2569 (0.2654) loss 5.7571 (5.5995) grad_norm 3.2038 (2.4602) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:47:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][140/625] eta 0:02:11 lr 0.000241 wd 0.0500 time 0.2576 (0.2703) data time 0.0008 (0.0067) model time 0.2567 (0.2642) loss 5.4750 (5.5942) grad_norm 2.2686 (2.5058) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:47:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][150/625] eta 0:02:08 lr 0.000241 wd 0.0500 time 0.2589 (0.2705) data time 0.0006 (0.0063) model time 0.2583 (0.2651) loss 6.7617 (5.5976) grad_norm 1.7779 (2.5198) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:47:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][160/625] eta 0:02:06 lr 0.000241 wd 0.0500 time 0.2577 (0.2710) data time 0.0008 (0.0060) model time 0.2568 (0.2662) loss 5.6629 (5.6043) grad_norm 1.4342 (2.5050) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:47:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][170/625] eta 0:02:02 lr 0.000241 wd 0.0500 time 0.2548 (0.2701) data time 0.0010 (0.0057) model time 0.2538 (0.2652) loss 6.2242 (5.6021) grad_norm 2.1934 (2.5270) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:47:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][180/625] eta 0:02:00 lr 0.000241 wd 0.0500 time 0.2576 (0.2701) data time 0.0008 (0.0054) model time 0.2569 (0.2655) loss 6.3106 (5.6336) grad_norm 2.4021 (2.5040) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:47:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][190/625] eta 0:01:57 lr 0.000241 wd 0.0500 time 0.2582 (0.2693) data time 0.0006 (0.0052) model time 0.2576 (0.2647) loss 4.9545 (5.6176) grad_norm 4.9337 (2.5259) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:48:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][200/625] eta 0:01:54 lr 0.000241 wd 0.0500 time 0.2765 (0.2696) data time 0.0008 (0.0050) model time 0.2757 (0.2654) loss 7.8995 (5.6312) grad_norm 2.3963 (2.5059) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:48:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][210/625] eta 0:01:51 lr 0.000241 wd 0.0500 time 0.2549 (0.2696) data time 0.0009 (0.0048) model time 0.2540 (0.2656) loss 5.9682 (5.6317) grad_norm 1.7542 (2.4749) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:48:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][220/625] eta 0:01:48 lr 0.000240 wd 0.0500 time 0.2536 (0.2690) data time 0.0007 (0.0046) model time 0.2529 (0.2650) loss 5.9934 (5.6460) grad_norm 1.4914 (2.4543) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:48:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][230/625] eta 0:01:46 lr 0.000240 wd 0.0500 time 0.2536 (0.2684) data time 0.0007 (0.0044) model time 0.2529 (0.2644) loss 6.1760 (5.6553) grad_norm 2.8065 (2.4629) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:48:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][240/625] eta 0:01:43 lr 0.000240 wd 0.0500 time 0.2573 (0.2679) data time 0.0008 (0.0043) model time 0.2564 (0.2639) loss 6.2658 (5.6657) grad_norm 2.0443 (2.4688) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:48:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][250/625] eta 0:01:40 lr 0.000240 wd 0.0500 time 0.2533 (0.2679) data time 0.0011 (0.0042) model time 0.2522 (0.2640) loss 6.5598 (5.6734) grad_norm 2.3968 (2.4687) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:48:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][260/625] eta 0:01:37 lr 0.000240 wd 0.0500 time 0.2537 (0.2674) data time 0.0008 (0.0040) model time 0.2528 (0.2636) loss 6.0680 (5.6768) grad_norm 1.8720 (2.4618) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:48:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][270/625] eta 0:01:34 lr 0.000240 wd 0.0500 time 0.2522 (0.2670) data time 0.0010 (0.0039) model time 0.2511 (0.2632) loss 4.5491 (5.6787) grad_norm 3.7712 (2.4673) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:48:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][280/625] eta 0:01:31 lr 0.000240 wd 0.0500 time 0.2570 (0.2667) data time 0.0006 (0.0038) model time 0.2565 (0.2629) loss 4.6812 (5.6755) grad_norm 2.3535 (2.4703) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:48:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][290/625] eta 0:01:29 lr 0.000240 wd 0.0500 time 0.2577 (0.2663) data time 0.0008 (0.0037) model time 0.2569 (0.2626) loss 6.4521 (5.6745) grad_norm 3.8883 (2.4813) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:48:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][300/625] eta 0:01:26 lr 0.000240 wd 0.0500 time 0.2548 (0.2664) data time 0.0008 (0.0036) model time 0.2540 (0.2628) loss 5.6775 (5.6725) grad_norm 3.2310 (2.4892) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:48:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][310/625] eta 0:01:23 lr 0.000239 wd 0.0500 time 0.2551 (0.2661) data time 0.0008 (0.0035) model time 0.2543 (0.2625) loss 5.3257 (5.6655) grad_norm 3.8383 (2.4937) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:48:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][320/625] eta 0:01:21 lr 0.000239 wd 0.0500 time 0.2535 (0.2658) data time 0.0007 (0.0035) model time 0.2528 (0.2623) loss 5.1735 (5.6584) grad_norm 2.7208 (2.5024) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:48:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][330/625] eta 0:01:18 lr 0.000239 wd 0.0500 time 0.4570 (0.2667) data time 0.0008 (0.0034) model time 0.4562 (0.2635) loss 6.3386 (5.6621) grad_norm 1.7311 (2.4902) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:48:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][340/625] eta 0:01:15 lr 0.000239 wd 0.0500 time 0.2533 (0.2664) data time 0.0011 (0.0033) model time 0.2522 (0.2632) loss 5.7514 (5.6647) grad_norm 1.7079 (2.4844) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:48:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][350/625] eta 0:01:13 lr 0.000239 wd 0.0500 time 0.2565 (0.2666) data time 0.0009 (0.0032) model time 0.2555 (0.2635) loss 5.3175 (5.6670) grad_norm 1.5652 (2.4684) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:48:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][360/625] eta 0:01:10 lr 0.000239 wd 0.0500 time 0.2605 (0.2663) data time 0.0006 (0.0032) model time 0.2599 (0.2633) loss 6.3340 (5.6717) grad_norm 1.7689 (2.4765) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:48:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][370/625] eta 0:01:07 lr 0.000239 wd 0.0500 time 0.2542 (0.2666) data time 0.0011 (0.0031) model time 0.2531 (0.2636) loss 5.2758 (5.6699) grad_norm 2.2253 (2.4748) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:48:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][380/625] eta 0:01:05 lr 0.000239 wd 0.0500 time 0.2608 (0.2664) data time 0.0010 (0.0031) model time 0.2598 (0.2634) loss 5.8914 (5.6694) grad_norm 2.3870 (2.4714) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:48:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][390/625] eta 0:01:02 lr 0.000239 wd 0.0500 time 0.2562 (0.2662) data time 0.0010 (0.0030) model time 0.2552 (0.2632) loss 6.2049 (5.6672) grad_norm 2.1335 (2.4661) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:48:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][400/625] eta 0:00:59 lr 0.000238 wd 0.0500 time 0.2609 (0.2659) data time 0.0006 (0.0030) model time 0.2603 (0.2630) loss 4.8158 (5.6670) grad_norm 1.6386 (2.4521) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:48:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][410/625] eta 0:00:57 lr 0.000238 wd 0.0500 time 0.2564 (0.2657) data time 0.0007 (0.0029) model time 0.2557 (0.2628) loss 5.7473 (5.6679) grad_norm 1.9791 (2.4517) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:48:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][420/625] eta 0:00:54 lr 0.000238 wd 0.0500 time 0.2573 (0.2659) data time 0.0007 (0.0029) model time 0.2567 (0.2631) loss 5.1795 (5.6577) grad_norm 1.5350 (2.4618) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:49:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][430/625] eta 0:00:51 lr 0.000238 wd 0.0500 time 0.2541 (0.2656) data time 0.0008 (0.0028) model time 0.2533 (0.2629) loss 6.1856 (5.6604) grad_norm 1.5472 (2.4475) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:49:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][440/625] eta 0:00:49 lr 0.000238 wd 0.0500 time 0.2713 (0.2655) data time 0.0009 (0.0028) model time 0.2704 (0.2627) loss 4.8980 (5.6546) grad_norm 1.7125 (2.4361) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:49:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][450/625] eta 0:00:46 lr 0.000238 wd 0.0500 time 0.2548 (0.2655) data time 0.0007 (0.0027) model time 0.2541 (0.2628) loss 6.2384 (5.6571) grad_norm 3.0462 (2.4338) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:49:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][460/625] eta 0:00:43 lr 0.000238 wd 0.0500 time 0.2565 (0.2653) data time 0.0007 (0.0027) model time 0.2558 (0.2626) loss 6.5782 (5.6595) grad_norm 3.4987 (2.4369) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 08:49:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][470/625] eta 0:00:41 lr 0.000238 wd 0.0500 time 0.2581 (0.2651) data time 0.0006 (0.0027) model time 0.2575 (0.2625) loss 5.8990 (5.6593) grad_norm 3.5852 (2.4464) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:49:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][480/625] eta 0:00:38 lr 0.000238 wd 0.0500 time 0.2565 (0.2650) data time 0.0011 (0.0026) model time 0.2554 (0.2623) loss 6.1815 (5.6702) grad_norm 2.4685 (2.4662) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:49:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][490/625] eta 0:00:35 lr 0.000237 wd 0.0500 time 0.2575 (0.2648) data time 0.0007 (0.0026) model time 0.2568 (0.2622) loss 4.7284 (5.6656) grad_norm 4.2312 (2.4751) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:49:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][500/625] eta 0:00:33 lr 0.000237 wd 0.0500 time 0.2550 (0.2650) data time 0.0007 (0.0026) model time 0.2543 (0.2625) loss 4.8902 (5.6616) grad_norm 1.8350 (2.4777) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:49:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][510/625] eta 0:00:30 lr 0.000237 wd 0.0500 time 0.2572 (0.2653) data time 0.0007 (0.0025) model time 0.2565 (0.2627) loss 5.7114 (5.6636) grad_norm 2.2694 (2.4770) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:49:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][520/625] eta 0:00:27 lr 0.000237 wd 0.0500 time 0.2520 (0.2661) data time 0.0010 (0.0025) model time 0.2510 (0.2637) loss 5.1552 (5.6560) grad_norm 2.9754 (2.4729) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:49:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][530/625] eta 0:00:25 lr 0.000237 wd 0.0500 time 0.2548 (0.2659) data 
time 0.0010 (0.0025) model time 0.2538 (0.2635) loss 4.2663 (5.6541) grad_norm 2.5869 (2.4686) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:49:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][540/625] eta 0:00:22 lr 0.000237 wd 0.0500 time 0.2557 (0.2657) data time 0.0006 (0.0024) model time 0.2550 (0.2633) loss 4.4513 (5.6516) grad_norm 2.0668 (2.4579) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:49:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][550/625] eta 0:00:19 lr 0.000237 wd 0.0500 time 0.2539 (0.2657) data time 0.0007 (0.0024) model time 0.2532 (0.2634) loss 4.5266 (5.6485) grad_norm 2.4687 (2.4503) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:49:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][560/625] eta 0:00:17 lr 0.000237 wd 0.0500 time 0.2553 (0.2656) data time 0.0008 (0.0024) model time 0.2545 (0.2632) loss 5.2533 (5.6500) grad_norm 2.1525 (2.4665) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:49:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][570/625] eta 0:00:14 lr 0.000237 wd 0.0500 time 0.2553 (0.2657) data time 0.0009 (0.0024) model time 0.2544 (0.2634) loss 6.2518 (5.6498) grad_norm 2.6891 (2.4752) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:49:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][580/625] eta 0:00:11 lr 0.000236 wd 0.0500 time 0.2579 (0.2655) data time 0.0008 (0.0023) model time 0.2570 (0.2632) loss 5.5595 (5.6550) grad_norm 1.5638 (2.4697) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:49:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][590/625] eta 0:00:09 lr 0.000236 wd 0.0500 time 0.2527 (0.2653) data time 0.0009 (0.0023) model time 0.2518 (0.2630) loss 6.9013 (5.6588) grad_norm 1.6676 (2.4660) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:49:46 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][600/625] eta 0:00:06 lr 0.000236 wd 0.0500 time 0.2546 (0.2652) data time 0.0007 (0.0023) model time 0.2539 (0.2629) loss 5.9566 (5.6646) grad_norm 3.5259 (2.4566) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:49:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][610/625] eta 0:00:03 lr 0.000236 wd 0.0500 time 0.2525 (0.2650) data time 0.0004 (0.0023) model time 0.2521 (0.2628) loss 4.6419 (5.6579) grad_norm 1.7591 (2.4502) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:49:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][620/625] eta 0:00:01 lr 0.000236 wd 0.0500 time 0.2532 (0.2648) data time 0.0006 (0.0022) model time 0.2526 (0.2626) loss 5.5621 (5.6557) grad_norm 2.1476 (2.4461) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:49:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 239 training takes 0:02:45 [2024-08-04 08:49:52 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 08:49:53 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 08:49:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.567 (0.567) Loss 0.5850 (0.5850) Acc@1 90.234 (90.234) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 08:49:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.106) Loss 0.9390 (0.7212) Acc@1 80.811 (86.572) Acc@5 96.045 (97.665) Mem 9655MB [2024-08-04 08:49:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.081) Loss 1.0205 (0.8448) Acc@1 77.734 (83.443) Acc@5 95.117 (96.468) Mem 9655MB [2024-08-04 08:49:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.159 Acc@5 96.465 [2024-08-04 08:49:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.2% [2024-08-04 08:49:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.16% [2024-08-04 08:49:55 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 08:49:55 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 08:49:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.522 (0.522) Loss 0.5825 (0.5825) Acc@1 89.795 (89.795) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 08:49:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.099) Loss 0.9106 (0.7101) Acc@1 80.420 (86.506) Acc@5 96.094 (97.732) Mem 9655MB [2024-08-04 08:49:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0215 (0.8324) Acc@1 77.100 (83.338) Acc@5 95.312 (96.501) Mem 9655MB [2024-08-04 08:49:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.045 Acc@5 96.501 [2024-08-04 08:49:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.0% [2024-08-04 08:49:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.05% [2024-08-04 08:49:57 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 08:49:58 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 08:49:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][0/625] eta 0:08:12 lr 0.000236 wd 0.0500 time 0.7881 (0.7881) data time 0.5480 (0.5480) model time 0.0000 (0.0000) loss 5.5008 (5.5008) grad_norm 2.2172 (2.2172) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:50:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][10/625] eta 0:03:28 lr 0.000236 wd 0.0500 time 0.2533 (0.3392) data time 0.0011 (0.0506) model time 0.0000 (0.0000) loss 5.1604 (5.5043) grad_norm 1.7308 (2.5458) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:50:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][20/625] eta 0:03:02 lr 0.000236 wd 0.0500 time 0.2551 (0.3010) data time 0.0006 (0.0270) model time 0.0000 (0.0000) loss 5.3718 (5.5747) grad_norm 3.7638 (2.4043) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:50:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][30/625] eta 0:02:50 lr 0.000236 wd 0.0500 time 0.2560 (0.2864) data time 0.0018 (0.0186) model time 0.0000 (0.0000) loss 4.9924 (5.4859) grad_norm 1.4259 (2.6508) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:50:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][40/625] eta 0:02:43 lr 0.000236 wd 0.0500 time 0.2532 (0.2791) data time 0.0009 (0.0143) model time 0.0000 (0.0000) loss 5.7677 (5.4676) grad_norm 2.8740 (2.5681) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:50:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][50/625] eta 0:02:37 lr 0.000235 wd 0.0500 time 0.2506 (0.2747) data time 0.0010 (0.0118) model time 0.0000 (0.0000) loss 5.2749 (5.4808) grad_norm 3.0181 (2.5683) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:50:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][60/625] eta 0:02:34 lr 0.000235 wd 0.0500 time 0.2566 (0.2734) data time 
0.0008 (0.0100) model time 0.2558 (0.2658) loss 4.4861 (5.5233) grad_norm 1.7868 (2.4831) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:50:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][70/625] eta 0:02:30 lr 0.000235 wd 0.0500 time 0.2569 (0.2709) data time 0.0008 (0.0087) model time 0.2562 (0.2604) loss 6.2970 (5.5508) grad_norm 1.7001 (2.4575) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:50:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][80/625] eta 0:02:26 lr 0.000235 wd 0.0500 time 0.2601 (0.2692) data time 0.0008 (0.0077) model time 0.2593 (0.2590) loss 4.7431 (5.5499) grad_norm 1.9429 (2.4819) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:50:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][90/625] eta 0:02:23 lr 0.000235 wd 0.0500 time 0.2533 (0.2677) data time 0.0010 (0.0070) model time 0.2523 (0.2578) loss 5.8440 (5.5532) grad_norm 2.0361 (2.4772) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:50:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][100/625] eta 0:02:19 lr 0.000235 wd 0.0500 time 0.2564 (0.2666) data time 0.0009 (0.0064) model time 0.2556 (0.2575) loss 5.9029 (5.5382) grad_norm 2.8219 (2.4631) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:50:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][110/625] eta 0:02:16 lr 0.000235 wd 0.0500 time 0.2549 (0.2657) data time 0.0010 (0.0059) model time 0.2539 (0.2572) loss 6.1143 (5.5187) grad_norm 3.2584 (2.5196) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:50:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][120/625] eta 0:02:14 lr 0.000235 wd 0.0500 time 0.2550 (0.2663) data time 0.0009 (0.0055) model time 0.2542 (0.2593) loss 5.7247 (5.5229) grad_norm 3.4971 (2.6155) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:50:32 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][130/625] eta 0:02:11 lr 0.000235 wd 0.0500 time 0.2513 (0.2655) data time 0.0020 (0.0051) model time 0.2493 (0.2587) loss 6.1395 (5.5408) grad_norm 2.2401 (2.6134) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:50:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][140/625] eta 0:02:08 lr 0.000234 wd 0.0500 time 0.2574 (0.2648) data time 0.0008 (0.0048) model time 0.2566 (0.2583) loss 6.2556 (5.5544) grad_norm 2.2123 (2.6429) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:50:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][150/625] eta 0:02:06 lr 0.000234 wd 0.0500 time 0.2533 (0.2657) data time 0.0009 (0.0046) model time 0.2524 (0.2602) loss 5.5703 (5.5714) grad_norm 2.3546 (2.6123) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:50:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][160/625] eta 0:02:03 lr 0.000234 wd 0.0500 time 0.2581 (0.2652) data time 0.0006 (0.0044) model time 0.2575 (0.2598) loss 5.2548 (5.5538) grad_norm 1.6945 (2.5735) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:50:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][170/625] eta 0:02:00 lr 0.000234 wd 0.0500 time 0.2602 (0.2646) data time 0.0011 (0.0042) model time 0.2591 (0.2594) loss 5.0799 (5.5685) grad_norm 2.8717 (2.5803) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:50:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][180/625] eta 0:01:57 lr 0.000234 wd 0.0500 time 0.2604 (0.2642) data time 0.0008 (0.0040) model time 0.2596 (0.2591) loss 5.4837 (5.5770) grad_norm 2.7075 (2.5825) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:50:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][190/625] eta 0:01:54 lr 0.000234 wd 0.0500 time 0.2541 (0.2637) data time 0.0009 (0.0038) 
model time 0.2533 (0.2588) loss 5.5673 (5.5709) grad_norm 2.0546 (2.5938) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:50:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][200/625] eta 0:01:52 lr 0.000234 wd 0.0500 time 0.2596 (0.2643) data time 0.0009 (0.0037) model time 0.2587 (0.2599) loss 5.7583 (5.5813) grad_norm 2.6193 (2.5707) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:50:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][210/625] eta 0:01:49 lr 0.000234 wd 0.0500 time 0.2560 (0.2639) data time 0.0010 (0.0036) model time 0.2550 (0.2595) loss 5.4681 (5.5927) grad_norm 1.5284 (2.5798) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:50:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][220/625] eta 0:01:46 lr 0.000234 wd 0.0500 time 0.2582 (0.2635) data time 0.0006 (0.0034) model time 0.2576 (0.2593) loss 5.2089 (5.6018) grad_norm 2.1782 (2.5925) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:50:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][230/625] eta 0:01:43 lr 0.000233 wd 0.0500 time 0.2536 (0.2632) data time 0.0011 (0.0033) model time 0.2525 (0.2590) loss 5.3886 (5.6014) grad_norm 4.8297 (2.5942) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:51:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][240/625] eta 0:01:41 lr 0.000233 wd 0.0500 time 0.2581 (0.2637) data time 0.0007 (0.0032) model time 0.2575 (0.2598) loss 5.3250 (5.5942) grad_norm 3.2844 (2.6043) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:51:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][250/625] eta 0:01:38 lr 0.000233 wd 0.0500 time 0.2563 (0.2634) data time 0.0007 (0.0031) model time 0.2556 (0.2596) loss 6.3685 (5.5952) grad_norm 1.8166 (2.5883) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:51:06 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [240/300][260/625] eta 0:01:36 lr 0.000233 wd 0.0500 time 0.2601 (0.2632) data time 0.0010 (0.0031) model time 0.2590 (0.2594) loss 4.7038 (5.6000) grad_norm 2.7754 (2.5752) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:51:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][270/625] eta 0:01:33 lr 0.000233 wd 0.0500 time 0.2554 (0.2629) data time 0.0010 (0.0030) model time 0.2544 (0.2592) loss 6.3860 (5.6001) grad_norm 3.7252 (2.5868) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:51:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][280/625] eta 0:01:30 lr 0.000233 wd 0.0500 time 0.2523 (0.2626) data time 0.0008 (0.0029) model time 0.2515 (0.2590) loss 5.3653 (5.5942) grad_norm 2.8216 (2.5901) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:51:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][290/625] eta 0:01:27 lr 0.000233 wd 0.0500 time 0.2574 (0.2624) data time 0.0009 (0.0028) model time 0.2565 (0.2588) loss 5.7396 (5.6043) grad_norm 2.5790 (2.6253) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:51:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][300/625] eta 0:01:25 lr 0.000233 wd 0.0500 time 0.2575 (0.2622) data time 0.0007 (0.0028) model time 0.2569 (0.2587) loss 5.2182 (5.6028) grad_norm 2.1200 (2.6356) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:51:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][310/625] eta 0:01:22 lr 0.000233 wd 0.0500 time 0.2541 (0.2626) data time 0.0008 (0.0027) model time 0.2533 (0.2593) loss 5.1693 (5.5988) grad_norm 1.9190 (2.6280) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:51:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][320/625] eta 0:01:20 lr 0.000232 wd 0.0500 time 0.2615 (0.2624) data time 0.0008 (0.0027) model time 0.2607 (0.2592) 
loss 6.0519 (5.5918) grad_norm 2.6230 (2.6174) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:51:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][330/625] eta 0:01:17 lr 0.000232 wd 0.0500 time 0.4674 (0.2636) data time 0.0008 (0.0026) model time 0.4666 (0.2606) loss 5.1403 (5.5963) grad_norm 1.4595 (2.6023) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:51:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][340/625] eta 0:01:15 lr 0.000232 wd 0.0500 time 0.2616 (0.2634) data time 0.0007 (0.0026) model time 0.2609 (0.2604) loss 5.9637 (5.5881) grad_norm 3.9851 (2.6385) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:51:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][350/625] eta 0:01:12 lr 0.000232 wd 0.0500 time 0.2525 (0.2637) data time 0.0007 (0.0025) model time 0.2519 (0.2609) loss 6.3346 (5.6037) grad_norm 2.0333 (2.6387) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:51:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][360/625] eta 0:01:09 lr 0.000232 wd 0.0500 time 0.2598 (0.2635) data time 0.0010 (0.0025) model time 0.2589 (0.2607) loss 6.0180 (5.6128) grad_norm 2.2694 (2.6521) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:51:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][370/625] eta 0:01:07 lr 0.000232 wd 0.0500 time 0.2547 (0.2633) data time 0.0007 (0.0024) model time 0.2540 (0.2605) loss 5.8345 (5.6122) grad_norm 2.6656 (2.6594) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:51:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][380/625] eta 0:01:04 lr 0.000232 wd 0.0500 time 0.3868 (0.2634) data time 0.0007 (0.0024) model time 0.3861 (0.2607) loss 6.5016 (5.6131) grad_norm 3.9252 (2.6565) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:51:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [240/300][390/625] eta 0:01:01 lr 0.000232 wd 0.0500 time 0.2571 (0.2632) data time 0.0006 (0.0024) model time 0.2565 (0.2605) loss 5.7296 (5.6140) grad_norm 1.8658 (2.6415) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:51:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][400/625] eta 0:00:59 lr 0.000232 wd 0.0500 time 0.2528 (0.2630) data time 0.0010 (0.0023) model time 0.2518 (0.2604) loss 5.9049 (5.6197) grad_norm 3.1178 (2.6333) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:51:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][410/625] eta 0:00:56 lr 0.000231 wd 0.0500 time 0.2522 (0.2633) data time 0.0009 (0.0023) model time 0.2513 (0.2608) loss 5.3611 (5.6174) grad_norm 2.2147 (2.6325) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:51:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][420/625] eta 0:00:53 lr 0.000231 wd 0.0500 time 0.2558 (0.2632) data time 0.0011 (0.0023) model time 0.2547 (0.2606) loss 6.4363 (5.6196) grad_norm 1.8049 (2.6224) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:51:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][430/625] eta 0:00:51 lr 0.000231 wd 0.0500 time 0.2612 (0.2630) data time 0.0008 (0.0022) model time 0.2604 (0.2605) loss 5.4964 (5.6251) grad_norm 2.4241 (2.6223) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:51:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][440/625] eta 0:00:48 lr 0.000231 wd 0.0500 time 0.2564 (0.2633) data time 0.0011 (0.0022) model time 0.2553 (0.2609) loss 6.5308 (5.6266) grad_norm 2.4598 (2.6195) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:51:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][450/625] eta 0:00:46 lr 0.000231 wd 0.0500 time 0.2533 (0.2632) data time 0.0009 (0.0022) model time 0.2524 (0.2608) loss 6.2046 (5.6312) grad_norm 
2.7294 (2.6077) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:51:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][460/625] eta 0:00:43 lr 0.000231 wd 0.0500 time 0.2591 (0.2634) data time 0.0007 (0.0022) model time 0.2584 (0.2611) loss 4.9582 (5.6297) grad_norm 2.7373 (2.5959) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:52:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][470/625] eta 0:00:40 lr 0.000231 wd 0.0500 time 0.2570 (0.2633) data time 0.0007 (0.0021) model time 0.2563 (0.2609) loss 5.4818 (5.6347) grad_norm 1.4945 (2.5872) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:52:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][480/625] eta 0:00:38 lr 0.000231 wd 0.0500 time 0.2557 (0.2635) data time 0.0009 (0.0021) model time 0.2548 (0.2612) loss 5.0936 (5.6357) grad_norm 1.5785 (2.5896) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:52:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][490/625] eta 0:00:35 lr 0.000231 wd 0.0500 time 0.2584 (0.2634) data time 0.0009 (0.0021) model time 0.2575 (0.2611) loss 3.9066 (5.6354) grad_norm 2.8225 (2.5872) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:52:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][500/625] eta 0:00:32 lr 0.000230 wd 0.0500 time 0.2549 (0.2632) data time 0.0006 (0.0021) model time 0.2543 (0.2610) loss 4.8651 (5.6391) grad_norm 3.5605 (2.5912) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:52:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][510/625] eta 0:00:30 lr 0.000230 wd 0.0500 time 0.2561 (0.2634) data time 0.0009 (0.0020) model time 0.2552 (0.2612) loss 6.7462 (5.6357) grad_norm 3.8989 (2.5953) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:52:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][520/625] eta 
0:00:27 lr 0.000230 wd 0.0500 time 0.2542 (0.2633) data time 0.0012 (0.0020) model time 0.2530 (0.2611) loss 5.8378 (5.6383) grad_norm 8.1168 (2.6077) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:52:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][530/625] eta 0:00:24 lr 0.000230 wd 0.0500 time 0.2550 (0.2631) data time 0.0007 (0.0020) model time 0.2544 (0.2610) loss 6.0971 (5.6328) grad_norm 5.9763 (2.6307) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:52:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][540/625] eta 0:00:22 lr 0.000230 wd 0.0500 time 0.2528 (0.2630) data time 0.0009 (0.0020) model time 0.2519 (0.2609) loss 6.1311 (5.6340) grad_norm 4.2094 (2.6298) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:52:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][550/625] eta 0:00:19 lr 0.000230 wd 0.0500 time 0.2556 (0.2631) data time 0.0007 (0.0020) model time 0.2549 (0.2609) loss 4.9314 (5.6355) grad_norm 1.8901 (2.6300) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:52:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][560/625] eta 0:00:17 lr 0.000230 wd 0.0500 time 0.2677 (0.2630) data time 0.0010 (0.0019) model time 0.2667 (0.2609) loss 4.9097 (5.6313) grad_norm 2.0065 (2.6220) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:52:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][570/625] eta 0:00:14 lr 0.000230 wd 0.0500 time 0.2576 (0.2629) data time 0.0008 (0.0019) model time 0.2568 (0.2608) loss 4.7926 (5.6285) grad_norm 2.1660 (2.6250) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:52:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][580/625] eta 0:00:11 lr 0.000230 wd 0.0500 time 0.2537 (0.2628) data time 0.0010 (0.0019) model time 0.2527 (0.2606) loss 5.7703 (5.6315) grad_norm 3.0455 (2.6226) loss_scale 
512.0000 (512.0000) mem 9655MB [2024-08-04 08:52:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][590/625] eta 0:00:09 lr 0.000229 wd 0.0500 time 0.2550 (0.2626) data time 0.0007 (0.0019) model time 0.2544 (0.2605) loss 4.9821 (5.6359) grad_norm 2.7739 (2.6237) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:52:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][600/625] eta 0:00:06 lr 0.000229 wd 0.0500 time 0.2567 (0.2631) data time 0.0009 (0.0019) model time 0.2558 (0.2611) loss 5.4182 (5.6391) grad_norm 1.4615 (2.6138) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:52:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][610/625] eta 0:00:03 lr 0.000229 wd 0.0500 time 0.2534 (0.2630) data time 0.0004 (0.0019) model time 0.2530 (0.2610) loss 6.3080 (5.6403) grad_norm 1.6189 (2.6064) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:52:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][620/625] eta 0:00:01 lr 0.000229 wd 0.0500 time 0.2542 (0.2628) data time 0.0006 (0.0018) model time 0.2536 (0.2608) loss 5.4454 (5.6365) grad_norm 3.1513 (2.5990) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:52:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 240 training takes 0:02:44 [2024-08-04 08:52:42 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 08:52:43 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 08:52:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.463 (0.463) Loss 0.6050 (0.6050) Acc@1 89.893 (89.893) Acc@5 98.584 (98.584) Mem 9655MB [2024-08-04 08:52:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.094) Loss 0.9053 (0.7199) Acc@1 81.592 (86.674) Acc@5 96.631 (97.803) Mem 9655MB [2024-08-04 08:52:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.075) Loss 1.0098 (0.8377) Acc@1 77.930 (83.522) Acc@5 95.410 (96.605) Mem 9655MB [2024-08-04 08:52:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.205 Acc@5 96.577 [2024-08-04 08:52:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.2% [2024-08-04 08:52:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.21% [2024-08-04 08:52:44 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 08:52:45 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 08:52:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.471 (0.471) Loss 0.5825 (0.5825) Acc@1 89.795 (89.795) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 08:52:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 0.9116 (0.7104) Acc@1 80.469 (86.523) Acc@5 96.045 (97.732) Mem 9655MB [2024-08-04 08:52:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0195 (0.8323) Acc@1 77.246 (83.373) Acc@5 95.410 (96.515) Mem 9655MB [2024-08-04 08:52:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.079 Acc@5 96.509 [2024-08-04 08:52:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.1% [2024-08-04 08:52:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.08% [2024-08-04 08:52:47 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 08:52:47 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 08:52:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][0/625] eta 0:07:27 lr 0.000229 wd 0.0500 time 0.7159 (0.7159) data time 0.4768 (0.4768) model time 0.0000 (0.0000) loss 6.0090 (6.0090) grad_norm 2.7377 (2.7377) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:52:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][10/625] eta 0:03:14 lr 0.000229 wd 0.0500 time 0.2535 (0.3160) data time 0.0007 (0.0442) model time 0.0000 (0.0000) loss 6.1980 (5.5078) grad_norm 1.9796 (2.3699) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:52:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][20/625] eta 0:02:57 lr 0.000229 wd 0.0500 time 0.2546 (0.2933) data time 0.0009 (0.0235) model time 0.0000 (0.0000) loss 4.9474 (5.5024) grad_norm 2.1175 (2.5281) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:52:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][30/625] eta 0:02:50 lr 0.000229 wd 0.0500 time 0.2506 (0.2873) data time 0.0009 (0.0163) model time 0.0000 (0.0000) loss 6.1306 (5.4949) grad_norm 1.3610 (2.4492) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:52:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][40/625] eta 0:02:45 lr 0.000229 wd 0.0500 time 0.2556 (0.2829) data time 0.0009 (0.0125) model time 0.0000 (0.0000) loss 6.3234 (5.6484) grad_norm 9.9180 (2.5649) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:53:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][50/625] eta 0:02:41 lr 0.000229 wd 0.0500 time 0.2597 (0.2813) data time 0.0008 (0.0103) model time 0.0000 (0.0000) loss 5.2435 (5.6689) grad_norm 4.3128 (2.5071) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:53:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][60/625] eta 0:02:36 lr 0.000228 wd 0.0500 time 0.2547 (0.2771) data time 
0.0007 (0.0087) model time 0.2541 (0.2548) loss 5.7808 (5.6574) grad_norm 2.0247 (2.4169) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:53:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][70/625] eta 0:02:33 lr 0.000228 wd 0.0500 time 0.2546 (0.2767) data time 0.0009 (0.0076) model time 0.2537 (0.2641) loss 6.3216 (5.6655) grad_norm 3.1320 (2.4044) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:53:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][80/625] eta 0:02:29 lr 0.000228 wd 0.0500 time 0.2559 (0.2742) data time 0.0008 (0.0068) model time 0.2551 (0.2611) loss 5.1232 (5.6839) grad_norm 4.0662 (2.5037) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:53:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][90/625] eta 0:02:25 lr 0.000228 wd 0.0500 time 0.2549 (0.2721) data time 0.0009 (0.0061) model time 0.2540 (0.2595) loss 6.2817 (5.6778) grad_norm 2.2148 (2.5300) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:53:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][100/625] eta 0:02:22 lr 0.000228 wd 0.0500 time 0.2545 (0.2706) data time 0.0009 (0.0056) model time 0.2536 (0.2588) loss 6.4421 (5.6638) grad_norm 7.7760 (2.6034) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:53:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][110/625] eta 0:02:19 lr 0.000228 wd 0.0500 time 0.2537 (0.2710) data time 0.0007 (0.0052) model time 0.2531 (0.2613) loss 4.6489 (5.6470) grad_norm 3.5808 (2.6183) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:53:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][120/625] eta 0:02:16 lr 0.000228 wd 0.0500 time 0.2571 (0.2698) data time 0.0009 (0.0049) model time 0.2562 (0.2605) loss 5.7989 (5.6675) grad_norm 1.2871 (2.5941) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:53:23 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][130/625] eta 0:02:13 lr 0.000228 wd 0.0500 time 0.2567 (0.2687) data time 0.0007 (0.0046) model time 0.2560 (0.2598) loss 6.0219 (5.6588) grad_norm 1.7708 (2.5876) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:53:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][140/625] eta 0:02:09 lr 0.000228 wd 0.0500 time 0.2525 (0.2678) data time 0.0011 (0.0043) model time 0.2514 (0.2592) loss 5.3908 (5.6606) grad_norm 1.5680 (2.5697) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:53:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][150/625] eta 0:02:06 lr 0.000227 wd 0.0500 time 0.2647 (0.2671) data time 0.0008 (0.0041) model time 0.2638 (0.2589) loss 5.0848 (5.6767) grad_norm 2.8344 (2.5784) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:53:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][160/625] eta 0:02:03 lr 0.000227 wd 0.0500 time 0.2605 (0.2665) data time 0.0006 (0.0039) model time 0.2599 (0.2587) loss 6.3516 (5.6677) grad_norm 2.3444 (2.5602) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:53:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][170/625] eta 0:02:00 lr 0.000227 wd 0.0500 time 0.2603 (0.2659) data time 0.0009 (0.0037) model time 0.2594 (0.2584) loss 6.5954 (5.6614) grad_norm 2.1176 (2.5282) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:53:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][180/625] eta 0:01:58 lr 0.000227 wd 0.0500 time 0.2565 (0.2653) data time 0.0009 (0.0036) model time 0.2556 (0.2582) loss 6.7622 (5.6768) grad_norm 1.7538 (2.4983) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:53:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][190/625] eta 0:01:55 lr 0.000227 wd 0.0500 time 0.2537 (0.2648) data time 0.0011 (0.0034) 
model time 0.2526 (0.2579) loss 5.4319 (5.6784) grad_norm 3.8582 (2.4713) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:53:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][200/625] eta 0:01:52 lr 0.000227 wd 0.0500 time 0.2543 (0.2653) data time 0.0009 (0.0033) model time 0.2534 (0.2589) loss 5.4750 (5.6780) grad_norm 1.6214 (2.4529) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:53:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][210/625] eta 0:01:50 lr 0.000227 wd 0.0500 time 0.2687 (0.2659) data time 0.0010 (0.0032) model time 0.2677 (0.2600) loss 6.0161 (5.6730) grad_norm 1.4227 (2.5659) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:53:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][220/625] eta 0:01:47 lr 0.000227 wd 0.0500 time 0.2564 (0.2654) data time 0.0007 (0.0031) model time 0.2557 (0.2597) loss 6.7929 (5.6597) grad_norm 2.4322 (2.5557) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:53:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][230/625] eta 0:01:44 lr 0.000227 wd 0.0500 time 0.2568 (0.2650) data time 0.0006 (0.0030) model time 0.2562 (0.2595) loss 5.9913 (5.6660) grad_norm 1.5775 (2.5536) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:53:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][240/625] eta 0:01:41 lr 0.000226 wd 0.0500 time 0.2596 (0.2647) data time 0.0009 (0.0029) model time 0.2587 (0.2593) loss 6.4101 (5.6536) grad_norm 1.4763 (2.5472) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:53:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][250/625] eta 0:01:39 lr 0.000226 wd 0.0500 time 0.2605 (0.2644) data time 0.0007 (0.0028) model time 0.2598 (0.2591) loss 5.7071 (5.6511) grad_norm 2.9420 (2.5381) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:53:56 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [241/300][260/625] eta 0:01:36 lr 0.000226 wd 0.0500 time 0.2565 (0.2640) data time 0.0006 (0.0028) model time 0.2560 (0.2589) loss 5.9695 (5.6586) grad_norm 1.9486 (2.5938) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:53:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][270/625] eta 0:01:33 lr 0.000226 wd 0.0500 time 0.2594 (0.2645) data time 0.0007 (0.0027) model time 0.2587 (0.2596) loss 6.0954 (5.6592) grad_norm 4.6978 (2.6198) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:54:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][280/625] eta 0:01:31 lr 0.000226 wd 0.0500 time 0.2559 (0.2642) data time 0.0007 (0.0026) model time 0.2552 (0.2595) loss 6.1426 (5.6627) grad_norm 3.3017 (2.6051) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:54:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][290/625] eta 0:01:28 lr 0.000226 wd 0.0500 time 0.2558 (0.2639) data time 0.0007 (0.0026) model time 0.2551 (0.2592) loss 6.3107 (5.6719) grad_norm 3.0396 (2.5915) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:54:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][300/625] eta 0:01:25 lr 0.000226 wd 0.0500 time 0.2627 (0.2640) data time 0.0007 (0.0025) model time 0.2620 (0.2596) loss 4.4782 (5.6716) grad_norm 2.8866 (2.6075) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:54:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][310/625] eta 0:01:23 lr 0.000226 wd 0.0500 time 0.2562 (0.2638) data time 0.0009 (0.0025) model time 0.2553 (0.2594) loss 5.6977 (5.6688) grad_norm 1.6972 (2.6359) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:54:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][320/625] eta 0:01:20 lr 0.000226 wd 0.0500 time 0.2575 (0.2635) data time 0.0010 (0.0024) model time 0.2566 (0.2592) 
loss 6.1902 (5.6640) grad_norm 2.9874 (2.6425) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:54:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][330/625] eta 0:01:17 lr 0.000226 wd 0.0500 time 0.2572 (0.2633) data time 0.0006 (0.0024) model time 0.2565 (0.2591) loss 5.3508 (5.6629) grad_norm 2.1759 (2.6268) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:54:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][340/625] eta 0:01:14 lr 0.000225 wd 0.0500 time 0.2565 (0.2631) data time 0.0011 (0.0023) model time 0.2554 (0.2590) loss 6.1217 (5.6799) grad_norm 2.0102 (2.6423) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:54:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][350/625] eta 0:01:12 lr 0.000225 wd 0.0500 time 0.2545 (0.2629) data time 0.0009 (0.0023) model time 0.2537 (0.2588) loss 4.8984 (5.6862) grad_norm 2.1068 (2.6729) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:54:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][360/625] eta 0:01:09 lr 0.000225 wd 0.0500 time 0.2548 (0.2630) data time 0.0006 (0.0022) model time 0.2542 (0.2591) loss 5.3000 (5.6898) grad_norm 2.5004 (2.6675) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:54:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][370/625] eta 0:01:07 lr 0.000225 wd 0.0500 time 0.2508 (0.2633) data time 0.0008 (0.0022) model time 0.2500 (0.2595) loss 6.1254 (5.6926) grad_norm 2.2501 (2.6499) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:54:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][380/625] eta 0:01:04 lr 0.000225 wd 0.0500 time 0.4012 (0.2635) data time 0.0010 (0.0022) model time 0.4002 (0.2598) loss 6.1234 (5.6929) grad_norm 1.8951 (2.6437) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:54:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [241/300][390/625] eta 0:01:01 lr 0.000225 wd 0.0500 time 0.2563 (0.2633) data time 0.0009 (0.0021) model time 0.2554 (0.2597) loss 5.8813 (5.6956) grad_norm 2.3968 (2.6432) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:54:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][400/625] eta 0:00:59 lr 0.000225 wd 0.0500 time 0.2681 (0.2636) data time 0.0008 (0.0021) model time 0.2674 (0.2601) loss 5.7120 (5.6975) grad_norm 2.4434 (2.6352) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:54:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][410/625] eta 0:00:56 lr 0.000225 wd 0.0500 time 0.4505 (0.2639) data time 0.0007 (0.0021) model time 0.4498 (0.2605) loss 6.7500 (5.7006) grad_norm 3.7475 (2.6397) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:54:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][420/625] eta 0:00:54 lr 0.000225 wd 0.0500 time 0.2555 (0.2642) data time 0.0007 (0.0021) model time 0.2548 (0.2609) loss 4.5415 (5.6986) grad_norm 3.7705 (2.6483) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:54:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][430/625] eta 0:00:51 lr 0.000224 wd 0.0500 time 0.2567 (0.2640) data time 0.0006 (0.0020) model time 0.2561 (0.2607) loss 6.4078 (5.7026) grad_norm 2.5232 (2.6429) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:54:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][440/625] eta 0:00:48 lr 0.000224 wd 0.0500 time 0.2556 (0.2638) data time 0.0009 (0.0020) model time 0.2547 (0.2606) loss 4.4992 (5.7000) grad_norm 1.9463 (2.6416) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:54:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][450/625] eta 0:00:46 lr 0.000224 wd 0.0500 time 0.2526 (0.2640) data time 0.0009 (0.0020) model time 0.2517 (0.2608) loss 6.5556 (5.6959) grad_norm 
2.0980 (2.6427) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:54:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][460/625] eta 0:00:43 lr 0.000224 wd 0.0500 time 0.2591 (0.2638) data time 0.0006 (0.0020) model time 0.2585 (0.2607) loss 6.5659 (5.6934) grad_norm 2.2122 (2.6338) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:54:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][470/625] eta 0:00:40 lr 0.000224 wd 0.0500 time 0.2539 (0.2636) data time 0.0006 (0.0019) model time 0.2533 (0.2605) loss 6.5535 (5.6951) grad_norm 3.0642 (2.6268) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:54:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][480/625] eta 0:00:38 lr 0.000224 wd 0.0500 time 0.2546 (0.2634) data time 0.0008 (0.0019) model time 0.2538 (0.2604) loss 6.4501 (5.7028) grad_norm 3.3270 (2.6339) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:54:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][490/625] eta 0:00:35 lr 0.000224 wd 0.0500 time 0.2551 (0.2633) data time 0.0008 (0.0019) model time 0.2544 (0.2602) loss 5.9839 (5.7084) grad_norm 1.7634 (2.6346) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:54:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][500/625] eta 0:00:32 lr 0.000224 wd 0.0500 time 0.2622 (0.2631) data time 0.0006 (0.0019) model time 0.2616 (0.2602) loss 6.1507 (5.7136) grad_norm 3.4516 (2.6497) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:55:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][510/625] eta 0:00:30 lr 0.000224 wd 0.0500 time 0.2597 (0.2630) data time 0.0007 (0.0019) model time 0.2590 (0.2600) loss 4.9807 (5.7113) grad_norm 3.9370 (2.6538) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:55:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][520/625] eta 
0:00:27 lr 0.000223 wd 0.0500 time 0.2555 (0.2629) data time 0.0008 (0.0018) model time 0.2546 (0.2600) loss 6.4976 (5.7156) grad_norm 1.9225 (2.6506) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:55:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][530/625] eta 0:00:24 lr 0.000223 wd 0.0500 time 0.2578 (0.2628) data time 0.0008 (0.0018) model time 0.2570 (0.2599) loss 6.5078 (5.7213) grad_norm 2.6377 (2.6709) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:55:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][540/625] eta 0:00:22 lr 0.000223 wd 0.0500 time 0.2602 (0.2626) data time 0.0007 (0.0018) model time 0.2595 (0.2598) loss 6.6892 (5.7234) grad_norm 2.1546 (2.6766) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:55:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][550/625] eta 0:00:19 lr 0.000223 wd 0.0500 time 0.2586 (0.2626) data time 0.0007 (0.0018) model time 0.2580 (0.2597) loss 6.0247 (5.7206) grad_norm 2.7803 (2.6936) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:55:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][560/625] eta 0:00:17 lr 0.000223 wd 0.0500 time 0.4356 (0.2628) data time 0.0009 (0.0018) model time 0.4347 (0.2600) loss 6.4624 (5.7226) grad_norm 3.2660 (2.6972) loss_scale 1024.0000 (512.9127) mem 9655MB [2024-08-04 08:55:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][570/625] eta 0:00:14 lr 0.000223 wd 0.0500 time 0.2580 (0.2626) data time 0.0009 (0.0018) model time 0.2571 (0.2599) loss 5.7495 (5.7212) grad_norm 2.4714 (2.6931) loss_scale 1024.0000 (521.8634) mem 9655MB [2024-08-04 08:55:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][580/625] eta 0:00:11 lr 0.000223 wd 0.0500 time 0.2560 (0.2625) data time 0.0008 (0.0017) model time 0.2552 (0.2598) loss 5.5269 (5.7191) grad_norm 2.4196 (2.6923) loss_scale 
1024.0000 (530.5060) mem 9655MB [2024-08-04 08:55:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][590/625] eta 0:00:09 lr 0.000223 wd 0.0500 time 0.2514 (0.2624) data time 0.0011 (0.0017) model time 0.2503 (0.2597) loss 5.6893 (5.7209) grad_norm 2.0241 (2.6984) loss_scale 1024.0000 (538.8562) mem 9655MB [2024-08-04 08:55:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][600/625] eta 0:00:06 lr 0.000223 wd 0.0500 time 0.2561 (0.2626) data time 0.0009 (0.0017) model time 0.2552 (0.2600) loss 5.6724 (5.7272) grad_norm 1.8935 (2.6949) loss_scale 1024.0000 (546.9285) mem 9655MB [2024-08-04 08:55:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][610/625] eta 0:00:03 lr 0.000222 wd 0.0500 time 0.2536 (0.2625) data time 0.0006 (0.0017) model time 0.2530 (0.2599) loss 5.7522 (5.7259) grad_norm 2.0468 (2.6924) loss_scale 1024.0000 (554.7365) mem 9655MB [2024-08-04 08:55:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][620/625] eta 0:00:01 lr 0.000222 wd 0.0500 time 0.2540 (0.2624) data time 0.0006 (0.0017) model time 0.2535 (0.2597) loss 6.7641 (5.7288) grad_norm 2.3989 (2.6879) loss_scale 1024.0000 (562.2931) mem 9655MB [2024-08-04 08:55:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 241 training takes 0:02:44 [2024-08-04 08:55:32 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 08:55:32 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 08:55:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.506 (0.506) Loss 0.5820 (0.5820) Acc@1 90.332 (90.332) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 08:55:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 0.9131 (0.7122) Acc@1 81.006 (86.737) Acc@5 96.191 (97.781) Mem 9655MB [2024-08-04 08:55:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0400 (0.8344) Acc@1 77.344 (83.575) Acc@5 95.068 (96.580) Mem 9655MB [2024-08-04 08:55:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.283 Acc@5 96.577 [2024-08-04 08:55:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-08-04 08:55:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.28% [2024-08-04 08:55:34 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 08:55:34 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 08:55:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.501 (0.501) Loss 0.5830 (0.5830) Acc@1 89.844 (89.844) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 08:55:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 0.9097 (0.7101) Acc@1 80.566 (86.572) Acc@5 96.143 (97.736) Mem 9655MB [2024-08-04 08:55:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0215 (0.8321) Acc@1 77.393 (83.424) Acc@5 95.410 (96.508) Mem 9655MB [2024-08-04 08:55:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.127 Acc@5 96.497 [2024-08-04 08:55:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.1% [2024-08-04 08:55:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.13% [2024-08-04 08:55:36 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 08:55:37 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 08:55:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][0/625] eta 0:07:09 lr 0.000222 wd 0.0500 time 0.6870 (0.6870) data time 0.4470 (0.4470) model time 0.0000 (0.0000) loss 5.0589 (5.0589) grad_norm 4.1975 (4.1975) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 08:55:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][10/625] eta 0:03:13 lr 0.000222 wd 0.0500 time 0.2556 (0.3144) data time 0.0006 (0.0414) model time 0.0000 (0.0000) loss 6.6052 (5.7350) grad_norm 1.7613 (2.5159) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 08:55:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][20/625] eta 0:02:53 lr 0.000222 wd 0.0500 time 0.2509 (0.2870) data time 0.0010 (0.0222) model time 0.0000 (0.0000) loss 6.1041 (5.6314) grad_norm 2.1000 (2.3973) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 08:55:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][30/625] eta 0:02:48 lr 0.000222 wd 0.0500 time 0.2525 (0.2837) data time 0.0008 (0.0153) model time 0.0000 (0.0000) loss 5.2463 (5.6442) grad_norm 1.8980 (2.5133) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 08:55:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][40/625] eta 0:02:43 lr 0.000222 wd 0.0500 time 0.2536 (0.2799) data time 0.0016 (0.0118) model time 0.0000 (0.0000) loss 5.7785 (5.6343) grad_norm 2.1835 (2.4607) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 08:55:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][50/625] eta 0:02:38 lr 0.000222 wd 0.0500 time 0.2546 (0.2752) data time 0.0008 (0.0097) model time 0.0000 (0.0000) loss 6.7278 (5.6543) grad_norm 2.2411 (2.5636) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 08:55:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][60/625] eta 0:02:36 lr 0.000222 wd 0.0500 time 0.3878 (0.2766) 
data time 0.0016 (0.0082) model time 0.3862 (0.2830) loss 6.2284 (5.6795) grad_norm 4.0125 (2.6415) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 08:55:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][70/625] eta 0:02:33 lr 0.000222 wd 0.0500 time 0.2540 (0.2768) data time 0.0011 (0.0072) model time 0.2529 (0.2803) loss 5.4928 (5.6743) grad_norm 1.8252 (2.5847) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 08:55:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][80/625] eta 0:02:29 lr 0.000221 wd 0.0500 time 0.2522 (0.2742) data time 0.0008 (0.0064) model time 0.2514 (0.2716) loss 5.2110 (5.6910) grad_norm 3.6245 (2.6256) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 08:56:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][90/625] eta 0:02:26 lr 0.000221 wd 0.0500 time 0.2536 (0.2744) data time 0.0008 (0.0058) model time 0.2527 (0.2724) loss 6.5930 (5.7007) grad_norm 3.5491 (2.7276) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 08:56:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][100/625] eta 0:02:23 lr 0.000221 wd 0.0500 time 0.2606 (0.2725) data time 0.0007 (0.0053) model time 0.2599 (0.2690) loss 6.0467 (5.7318) grad_norm 3.5287 (2.7481) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 08:56:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][110/625] eta 0:02:19 lr 0.000221 wd 0.0500 time 0.2533 (0.2710) data time 0.0009 (0.0049) model time 0.2524 (0.2665) loss 4.7582 (5.7304) grad_norm 2.1395 (2.7222) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 08:56:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][120/625] eta 0:02:16 lr 0.000221 wd 0.0500 time 0.2531 (0.2697) data time 0.0008 (0.0046) model time 0.2522 (0.2649) loss 4.7914 (5.7208) grad_norm 3.7118 (2.7531) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 
08:56:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][130/625] eta 0:02:13 lr 0.000221 wd 0.0500 time 0.2530 (0.2687) data time 0.0010 (0.0043) model time 0.2521 (0.2637) loss 5.9809 (5.7424) grad_norm 2.3284 (2.7368) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 08:56:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][140/625] eta 0:02:10 lr 0.000221 wd 0.0500 time 0.2555 (0.2693) data time 0.0007 (0.0041) model time 0.2548 (0.2651) loss 5.1415 (5.7263) grad_norm 3.4147 (2.7696) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 08:56:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][150/625] eta 0:02:08 lr 0.000221 wd 0.0500 time 0.2552 (0.2697) data time 0.0006 (0.0039) model time 0.2546 (0.2659) loss 6.2688 (5.7361) grad_norm 2.9645 (2.7953) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 08:56:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][160/625] eta 0:02:05 lr 0.000221 wd 0.0500 time 0.2545 (0.2702) data time 0.0010 (0.0037) model time 0.2535 (0.2669) loss 5.5007 (5.7350) grad_norm 1.6950 (2.7972) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 08:56:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][170/625] eta 0:02:02 lr 0.000221 wd 0.0500 time 0.2541 (0.2703) data time 0.0006 (0.0035) model time 0.2535 (0.2673) loss 4.4097 (5.7128) grad_norm 1.9334 (2.7704) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 08:56:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][180/625] eta 0:01:59 lr 0.000220 wd 0.0500 time 0.2540 (0.2696) data time 0.0008 (0.0034) model time 0.2532 (0.2664) loss 5.7300 (5.7197) grad_norm 1.7758 (2.8360) loss_scale 1024.0000 (1024.0000) mem 9655MB [2024-08-04 08:56:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][190/625] eta 0:01:56 lr 0.000220 wd 0.0500 time 0.2524 (0.2689) data 
time 0.0011 (0.0033) model time 0.2513 (0.2657) loss 6.0729 (5.7333) grad_norm 2.3044 (inf) loss_scale 512.0000 (1013.2775) mem 9655MB [2024-08-04 08:56:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][200/625] eta 0:01:54 lr 0.000220 wd 0.0500 time 0.2596 (0.2683) data time 0.0008 (0.0032) model time 0.2588 (0.2650) loss 6.0934 (5.7374) grad_norm 1.5846 (inf) loss_scale 512.0000 (988.3383) mem 9655MB [2024-08-04 08:56:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][210/625] eta 0:01:51 lr 0.000220 wd 0.0500 time 0.2557 (0.2677) data time 0.0012 (0.0031) model time 0.2545 (0.2643) loss 6.2017 (5.7395) grad_norm 1.8921 (inf) loss_scale 512.0000 (965.7630) mem 9655MB [2024-08-04 08:56:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][220/625] eta 0:01:48 lr 0.000220 wd 0.0500 time 0.2571 (0.2671) data time 0.0011 (0.0030) model time 0.2560 (0.2638) loss 6.2521 (5.7317) grad_norm 2.4847 (inf) loss_scale 512.0000 (945.2308) mem 9655MB [2024-08-04 08:56:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][230/625] eta 0:01:45 lr 0.000220 wd 0.0500 time 0.2569 (0.2667) data time 0.0007 (0.0029) model time 0.2562 (0.2633) loss 5.4722 (5.7241) grad_norm 2.5656 (inf) loss_scale 512.0000 (926.4762) mem 9655MB [2024-08-04 08:56:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][240/625] eta 0:01:42 lr 0.000220 wd 0.0500 time 0.2558 (0.2671) data time 0.0009 (0.0028) model time 0.2549 (0.2640) loss 5.0728 (5.7125) grad_norm 3.5028 (inf) loss_scale 512.0000 (909.2780) mem 9655MB [2024-08-04 08:56:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][250/625] eta 0:01:39 lr 0.000220 wd 0.0500 time 0.2493 (0.2666) data time 0.0007 (0.0027) model time 0.2486 (0.2635) loss 6.3885 (5.7194) grad_norm 1.6164 (inf) loss_scale 512.0000 (893.4502) mem 9655MB [2024-08-04 08:56:46 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [242/300][260/625] eta 0:01:37 lr 0.000220 wd 0.0500 time 0.2535 (0.2662) data time 0.0007 (0.0027) model time 0.2528 (0.2631) loss 4.8012 (5.7240) grad_norm 3.6119 (inf) loss_scale 512.0000 (878.8352) mem 9655MB [2024-08-04 08:56:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][270/625] eta 0:01:34 lr 0.000219 wd 0.0500 time 0.2598 (0.2658) data time 0.0007 (0.0026) model time 0.2591 (0.2627) loss 4.8289 (5.7189) grad_norm 3.1835 (inf) loss_scale 512.0000 (865.2989) mem 9655MB [2024-08-04 08:56:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][280/625] eta 0:01:31 lr 0.000219 wd 0.0500 time 0.2526 (0.2655) data time 0.0006 (0.0025) model time 0.2520 (0.2624) loss 5.0135 (5.7196) grad_norm 2.0904 (inf) loss_scale 512.0000 (852.7260) mem 9655MB [2024-08-04 08:56:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][290/625] eta 0:01:28 lr 0.000219 wd 0.0500 time 0.2601 (0.2653) data time 0.0006 (0.0025) model time 0.2595 (0.2622) loss 6.1589 (5.7150) grad_norm 1.9370 (inf) loss_scale 512.0000 (841.0172) mem 9655MB [2024-08-04 08:56:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][300/625] eta 0:01:26 lr 0.000219 wd 0.0500 time 0.2543 (0.2650) data time 0.0008 (0.0024) model time 0.2534 (0.2620) loss 5.2936 (5.6992) grad_norm 3.3741 (inf) loss_scale 512.0000 (830.0864) mem 9655MB [2024-08-04 08:56:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][310/625] eta 0:01:23 lr 0.000219 wd 0.0500 time 0.2544 (0.2647) data time 0.0010 (0.0024) model time 0.2534 (0.2617) loss 5.0537 (5.6942) grad_norm 1.8498 (inf) loss_scale 512.0000 (819.8585) mem 9655MB [2024-08-04 08:57:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][320/625] eta 0:01:20 lr 0.000219 wd 0.0500 time 0.2587 (0.2645) data time 0.0007 (0.0023) model time 0.2580 (0.2616) loss 6.1384 (5.6893) 
grad_norm 1.5170 (inf) loss_scale 512.0000 (810.2679) mem 9655MB [2024-08-04 08:57:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][330/625] eta 0:01:18 lr 0.000219 wd 0.0500 time 0.2574 (0.2644) data time 0.0008 (0.0023) model time 0.2566 (0.2615) loss 4.9009 (5.6842) grad_norm 1.5702 (inf) loss_scale 512.0000 (801.2568) mem 9655MB [2024-08-04 08:57:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][340/625] eta 0:01:15 lr 0.000219 wd 0.0500 time 0.2562 (0.2647) data time 0.0007 (0.0023) model time 0.2555 (0.2620) loss 5.4386 (5.6827) grad_norm 4.4456 (inf) loss_scale 512.0000 (792.7742) mem 9655MB [2024-08-04 08:57:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][350/625] eta 0:01:12 lr 0.000219 wd 0.0500 time 0.2544 (0.2650) data time 0.0007 (0.0022) model time 0.2537 (0.2623) loss 5.7598 (5.6838) grad_norm 2.8815 (inf) loss_scale 512.0000 (784.7749) mem 9655MB [2024-08-04 08:57:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][360/625] eta 0:01:10 lr 0.000218 wd 0.0500 time 0.2632 (0.2648) data time 0.0010 (0.0022) model time 0.2623 (0.2622) loss 5.9292 (5.6806) grad_norm 2.2787 (inf) loss_scale 512.0000 (777.2188) mem 9655MB [2024-08-04 08:57:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][370/625] eta 0:01:07 lr 0.000218 wd 0.0500 time 0.2558 (0.2646) data time 0.0008 (0.0022) model time 0.2550 (0.2620) loss 6.4868 (5.6865) grad_norm 2.0161 (inf) loss_scale 512.0000 (770.0701) mem 9655MB [2024-08-04 08:57:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][380/625] eta 0:01:04 lr 0.000218 wd 0.0500 time 0.2603 (0.2644) data time 0.0010 (0.0021) model time 0.2593 (0.2618) loss 6.2129 (5.6862) grad_norm 2.1833 (inf) loss_scale 512.0000 (763.2966) mem 9655MB [2024-08-04 08:57:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][390/625] eta 0:01:02 lr 
0.000218 wd 0.0500 time 0.2571 (0.2642) data time 0.0008 (0.0021) model time 0.2563 (0.2616) loss 6.0011 (5.6896) grad_norm 5.8254 (inf) loss_scale 512.0000 (756.8696) mem 9655MB [2024-08-04 08:57:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][400/625] eta 0:00:59 lr 0.000218 wd 0.0500 time 0.2547 (0.2646) data time 0.0007 (0.0021) model time 0.2539 (0.2621) loss 6.0953 (5.6852) grad_norm 2.9065 (inf) loss_scale 512.0000 (750.7631) mem 9655MB [2024-08-04 08:57:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][410/625] eta 0:00:56 lr 0.000218 wd 0.0500 time 0.2537 (0.2644) data time 0.0012 (0.0020) model time 0.2525 (0.2619) loss 6.3509 (5.6818) grad_norm 6.1342 (inf) loss_scale 512.0000 (744.9538) mem 9655MB [2024-08-04 08:57:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][420/625] eta 0:00:54 lr 0.000218 wd 0.0500 time 0.2564 (0.2646) data time 0.0006 (0.0020) model time 0.2558 (0.2622) loss 6.3463 (5.6885) grad_norm 1.6050 (inf) loss_scale 512.0000 (739.4204) mem 9655MB [2024-08-04 08:57:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][430/625] eta 0:00:51 lr 0.000218 wd 0.0500 time 0.2543 (0.2645) data time 0.0007 (0.0020) model time 0.2536 (0.2621) loss 6.7576 (5.6949) grad_norm 2.0234 (inf) loss_scale 512.0000 (734.1439) mem 9655MB [2024-08-04 08:57:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][440/625] eta 0:00:48 lr 0.000218 wd 0.0500 time 0.2613 (0.2643) data time 0.0008 (0.0020) model time 0.2605 (0.2619) loss 5.7755 (5.6897) grad_norm 1.4695 (inf) loss_scale 512.0000 (729.1066) mem 9655MB [2024-08-04 08:57:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][450/625] eta 0:00:46 lr 0.000218 wd 0.0500 time 0.2601 (0.2644) data time 0.0007 (0.0019) model time 0.2594 (0.2621) loss 6.4414 (5.6916) grad_norm 3.5774 (inf) loss_scale 512.0000 (724.2927) mem 9655MB 
[2024-08-04 08:57:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][460/625] eta 0:00:43 lr 0.000217 wd 0.0500 time 0.2627 (0.2642) data time 0.0010 (0.0019) model time 0.2618 (0.2619) loss 5.0757 (5.7004) grad_norm 1.8069 (inf) loss_scale 512.0000 (719.6876) mem 9655MB [2024-08-04 08:57:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][470/625] eta 0:00:40 lr 0.000217 wd 0.0500 time 0.2556 (0.2641) data time 0.0011 (0.0019) model time 0.2544 (0.2618) loss 5.3978 (5.6991) grad_norm 2.1379 (inf) loss_scale 512.0000 (715.2781) mem 9655MB [2024-08-04 08:57:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][480/625] eta 0:00:38 lr 0.000217 wd 0.0500 time 0.2735 (0.2640) data time 0.0010 (0.0019) model time 0.2725 (0.2617) loss 6.0084 (5.6994) grad_norm 1.9290 (inf) loss_scale 512.0000 (711.0520) mem 9655MB [2024-08-04 08:57:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][490/625] eta 0:00:35 lr 0.000217 wd 0.0500 time 0.2541 (0.2638) data time 0.0007 (0.0019) model time 0.2534 (0.2616) loss 5.3163 (5.6956) grad_norm 1.5738 (inf) loss_scale 512.0000 (706.9980) mem 9655MB [2024-08-04 08:57:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][500/625] eta 0:00:32 lr 0.000217 wd 0.0500 time 0.2567 (0.2637) data time 0.0013 (0.0019) model time 0.2554 (0.2614) loss 6.0289 (5.7001) grad_norm 1.5328 (inf) loss_scale 512.0000 (703.1058) mem 9655MB [2024-08-04 08:57:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][510/625] eta 0:00:30 lr 0.000217 wd 0.0500 time 0.2575 (0.2639) data time 0.0008 (0.0018) model time 0.2567 (0.2617) loss 5.2023 (5.7012) grad_norm 11.8592 (inf) loss_scale 512.0000 (699.3659) mem 9655MB [2024-08-04 08:57:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][520/625] eta 0:00:27 lr 0.000217 wd 0.0500 time 0.2565 (0.2642) data time 0.0010 
(0.0018) model time 0.2555 (0.2620) loss 4.8143 (5.7013) grad_norm 2.7672 (inf) loss_scale 512.0000 (695.7697) mem 9655MB [2024-08-04 08:57:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][530/625] eta 0:00:25 lr 0.000217 wd 0.0500 time 0.2575 (0.2640) data time 0.0008 (0.0018) model time 0.2567 (0.2619) loss 5.2285 (5.6958) grad_norm 3.4605 (inf) loss_scale 512.0000 (692.3089) mem 9655MB [2024-08-04 08:57:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][540/625] eta 0:00:22 lr 0.000217 wd 0.0500 time 0.2600 (0.2639) data time 0.0010 (0.0018) model time 0.2590 (0.2618) loss 6.1644 (5.6954) grad_norm 2.8664 (inf) loss_scale 512.0000 (688.9760) mem 9655MB [2024-08-04 08:58:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][550/625] eta 0:00:19 lr 0.000216 wd 0.0500 time 0.2551 (0.2641) data time 0.0008 (0.0018) model time 0.2543 (0.2620) loss 5.0827 (5.6981) grad_norm 2.1133 (inf) loss_scale 512.0000 (685.7641) mem 9655MB [2024-08-04 08:58:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][560/625] eta 0:00:17 lr 0.000216 wd 0.0500 time 0.4447 (0.2647) data time 0.0010 (0.0018) model time 0.4437 (0.2626) loss 5.5508 (5.7003) grad_norm 3.2921 (inf) loss_scale 512.0000 (682.6667) mem 9655MB [2024-08-04 08:58:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][570/625] eta 0:00:14 lr 0.000216 wd 0.0500 time 0.2554 (0.2645) data time 0.0006 (0.0017) model time 0.2548 (0.2625) loss 5.2399 (5.6963) grad_norm 1.9405 (inf) loss_scale 512.0000 (679.6778) mem 9655MB [2024-08-04 08:58:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][580/625] eta 0:00:11 lr 0.000216 wd 0.0500 time 0.2568 (0.2644) data time 0.0007 (0.0017) model time 0.2561 (0.2624) loss 5.9996 (5.6907) grad_norm 2.8868 (inf) loss_scale 512.0000 (676.7917) mem 9655MB [2024-08-04 08:58:13 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [242/300][590/625] eta 0:00:09 lr 0.000216 wd 0.0500 time 0.2552 (0.2642) data time 0.0006 (0.0017) model time 0.2546 (0.2623) loss 6.1365 (5.6954) grad_norm 2.3718 (inf) loss_scale 512.0000 (674.0034) mem 9655MB [2024-08-04 08:58:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][600/625] eta 0:00:06 lr 0.000216 wd 0.0500 time 0.2637 (0.2645) data time 0.0006 (0.0017) model time 0.2630 (0.2625) loss 5.0305 (5.6955) grad_norm 2.4850 (inf) loss_scale 512.0000 (671.3078) mem 9655MB [2024-08-04 08:58:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][610/625] eta 0:00:03 lr 0.000216 wd 0.0500 time 0.2521 (0.2644) data time 0.0006 (0.0017) model time 0.2515 (0.2624) loss 5.8076 (5.6883) grad_norm 2.2152 (inf) loss_scale 512.0000 (668.7005) mem 9655MB [2024-08-04 08:58:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][620/625] eta 0:00:01 lr 0.000216 wd 0.0500 time 0.2618 (0.2642) data time 0.0004 (0.0017) model time 0.2614 (0.2623) loss 5.9634 (5.6858) grad_norm 1.6706 (inf) loss_scale 512.0000 (666.1771) mem 9655MB [2024-08-04 08:58:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 242 training takes 0:02:45 [2024-08-04 08:58:22 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 08:58:22 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 08:58:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.511 (0.511) Loss 0.5923 (0.5923) Acc@1 89.600 (89.600) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 08:58:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 0.8975 (0.7172) Acc@1 82.178 (86.665) Acc@5 95.996 (97.745) Mem 9655MB [2024-08-04 08:58:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0273 (0.8349) Acc@1 77.148 (83.615) Acc@5 94.922 (96.552) Mem 9655MB [2024-08-04 08:58:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.295 Acc@5 96.557 [2024-08-04 08:58:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-08-04 08:58:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.30% [2024-08-04 08:58:24 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 08:58:25 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 08:58:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.504 (0.504) Loss 0.5835 (0.5835) Acc@1 89.893 (89.893) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 08:58:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 0.9097 (0.7100) Acc@1 80.615 (86.581) Acc@5 96.240 (97.741) Mem 9655MB [2024-08-04 08:58:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0205 (0.8318) Acc@1 77.539 (83.429) Acc@5 95.215 (96.498) Mem 9655MB [2024-08-04 08:58:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.139 Acc@5 96.487 [2024-08-04 08:58:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.1% [2024-08-04 08:58:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.14% [2024-08-04 08:58:26 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 08:58:27 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 08:58:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][0/625] eta 0:07:55 lr 0.000216 wd 0.0500 time 0.7610 (0.7610) data time 0.5233 (0.5233) model time 0.0000 (0.0000) loss 4.9091 (4.9091) grad_norm 4.3348 (4.3348) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:58:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][10/625] eta 0:03:05 lr 0.000216 wd 0.0500 time 0.2550 (0.3014) data time 0.0008 (0.0484) model time 0.0000 (0.0000) loss 5.8687 (5.4610) grad_norm 5.9289 (3.9169) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:58:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][20/625] eta 0:02:49 lr 0.000215 wd 0.0500 time 0.2592 (0.2799) data time 0.0008 (0.0258) model time 0.0000 (0.0000) loss 5.9590 (5.5619) grad_norm 5.6997 (3.9242) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:58:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][30/625] eta 0:02:41 lr 0.000215 wd 0.0500 time 0.2575 (0.2719) data time 0.0010 (0.0178) model time 0.0000 (0.0000) loss 4.9151 (5.5836) grad_norm 3.4899 (3.5932) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:58:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][40/625] eta 0:02:36 lr 0.000215 wd 0.0500 time 0.2588 (0.2678) data time 0.0007 (0.0137) model time 0.0000 (0.0000) loss 6.1948 (5.5916) grad_norm 3.5491 (3.3336) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:58:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][50/625] eta 0:02:34 lr 0.000215 wd 0.0500 time 0.2556 (0.2692) data time 0.0008 (0.0112) model time 0.0000 (0.0000) loss 5.4012 (5.5797) grad_norm 2.3530 (3.2018) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:58:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][60/625] eta 0:02:32 lr 0.000215 wd 0.0500 time 0.3988 (0.2695) data time 
0.0008 (0.0095) model time 0.3980 (0.2699) loss 6.5604 (5.6587) grad_norm 3.0556 (3.0742) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:58:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][70/625] eta 0:02:29 lr 0.000215 wd 0.0500 time 0.2549 (0.2702) data time 0.0009 (0.0083) model time 0.2540 (0.2717) loss 6.1028 (5.6407) grad_norm 1.5275 (2.9701) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:58:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][80/625] eta 0:02:26 lr 0.000215 wd 0.0500 time 0.2547 (0.2684) data time 0.0008 (0.0074) model time 0.2539 (0.2660) loss 6.0131 (5.6499) grad_norm 3.3953 (2.8953) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:58:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][90/625] eta 0:02:22 lr 0.000215 wd 0.0500 time 0.2579 (0.2670) data time 0.0007 (0.0067) model time 0.2572 (0.2633) loss 6.3464 (5.6669) grad_norm 2.9879 (3.1500) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:58:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][100/625] eta 0:02:21 lr 0.000215 wd 0.0500 time 0.4467 (0.2696) data time 0.0008 (0.0061) model time 0.4458 (0.2691) loss 6.2175 (5.6354) grad_norm 3.0348 (3.1262) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:58:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][110/625] eta 0:02:18 lr 0.000214 wd 0.0500 time 0.2508 (0.2698) data time 0.0009 (0.0056) model time 0.2499 (0.2693) loss 6.5195 (5.6286) grad_norm 2.4526 (3.0647) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:58:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][120/625] eta 0:02:15 lr 0.000214 wd 0.0500 time 0.2562 (0.2685) data time 0.0006 (0.0052) model time 0.2555 (0.2671) loss 6.8613 (5.6237) grad_norm 2.0076 (3.0986) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:59:02 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][130/625] eta 0:02:12 lr 0.000214 wd 0.0500 time 0.2548 (0.2677) data time 0.0007 (0.0049) model time 0.2541 (0.2658) loss 5.8821 (5.6442) grad_norm 2.4772 (3.0959) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:59:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][140/625] eta 0:02:09 lr 0.000214 wd 0.0500 time 0.2575 (0.2668) data time 0.0009 (0.0046) model time 0.2566 (0.2646) loss 6.6151 (5.6532) grad_norm 1.7291 (3.0477) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:59:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][150/625] eta 0:02:06 lr 0.000214 wd 0.0500 time 0.2547 (0.2668) data time 0.0007 (0.0044) model time 0.2540 (0.2648) loss 4.5598 (5.6274) grad_norm 2.0068 (3.0076) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:59:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][160/625] eta 0:02:03 lr 0.000214 wd 0.0500 time 0.2610 (0.2662) data time 0.0008 (0.0041) model time 0.2602 (0.2639) loss 5.1448 (5.6141) grad_norm 2.2839 (3.0491) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:59:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][170/625] eta 0:02:01 lr 0.000214 wd 0.0500 time 0.2588 (0.2664) data time 0.0008 (0.0040) model time 0.2580 (0.2643) loss 4.6987 (5.6217) grad_norm 2.2576 (3.0036) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:59:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][180/625] eta 0:01:58 lr 0.000214 wd 0.0500 time 0.2621 (0.2659) data time 0.0007 (0.0038) model time 0.2614 (0.2637) loss 5.1421 (5.6235) grad_norm 1.5943 (3.0255) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:59:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][190/625] eta 0:01:55 lr 0.000214 wd 0.0500 time 0.2539 (0.2654) data time 0.0006 (0.0036) 
model time 0.2533 (0.2631) loss 5.7225 (5.6197) grad_norm 1.8946 (3.0124) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:59:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][200/625] eta 0:01:52 lr 0.000214 wd 0.0500 time 0.4187 (0.2658) data time 0.0008 (0.0035) model time 0.4179 (0.2637) loss 4.9945 (5.6260) grad_norm 2.1882 (2.9744) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:59:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][210/625] eta 0:01:50 lr 0.000213 wd 0.0500 time 0.2530 (0.2654) data time 0.0007 (0.0034) model time 0.2522 (0.2632) loss 6.3245 (5.6440) grad_norm 2.0480 (2.9448) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:59:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][220/625] eta 0:01:47 lr 0.000213 wd 0.0500 time 0.2581 (0.2649) data time 0.0006 (0.0033) model time 0.2575 (0.2627) loss 5.0522 (5.6336) grad_norm 2.2440 (2.9033) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:59:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][230/625] eta 0:01:44 lr 0.000213 wd 0.0500 time 0.2551 (0.2650) data time 0.0009 (0.0032) model time 0.2542 (0.2629) loss 5.1104 (5.6314) grad_norm 2.4692 (2.8950) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:59:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][240/625] eta 0:01:41 lr 0.000213 wd 0.0500 time 0.2557 (0.2647) data time 0.0008 (0.0031) model time 0.2549 (0.2626) loss 6.0071 (5.6420) grad_norm 9.9668 (2.9186) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:59:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][250/625] eta 0:01:39 lr 0.000213 wd 0.0500 time 0.2592 (0.2644) data time 0.0008 (0.0030) model time 0.2584 (0.2623) loss 5.2433 (5.6434) grad_norm 3.7168 (2.9050) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:59:36 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [243/300][260/625] eta 0:01:36 lr 0.000213 wd 0.0500 time 0.2552 (0.2648) data time 0.0008 (0.0029) model time 0.2544 (0.2628) loss 6.0177 (5.6391) grad_norm 1.6245 (2.8759) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:59:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][270/625] eta 0:01:33 lr 0.000213 wd 0.0500 time 0.2520 (0.2644) data time 0.0011 (0.0028) model time 0.2509 (0.2624) loss 4.8842 (5.6462) grad_norm 1.4958 (2.8726) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:59:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][280/625] eta 0:01:31 lr 0.000213 wd 0.0500 time 0.2560 (0.2642) data time 0.0007 (0.0028) model time 0.2553 (0.2621) loss 5.0443 (5.6532) grad_norm 1.9444 (2.8668) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:59:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][290/625] eta 0:01:28 lr 0.000213 wd 0.0500 time 0.2578 (0.2639) data time 0.0008 (0.0027) model time 0.2569 (0.2618) loss 6.3881 (5.6497) grad_norm 1.2525 (2.8458) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:59:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][300/625] eta 0:01:25 lr 0.000212 wd 0.0500 time 0.2556 (0.2641) data time 0.0008 (0.0027) model time 0.2548 (0.2621) loss 5.5866 (5.6542) grad_norm 1.7708 (2.8190) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:59:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][310/625] eta 0:01:23 lr 0.000212 wd 0.0500 time 0.2585 (0.2639) data time 0.0014 (0.0026) model time 0.2571 (0.2619) loss 6.9160 (5.6563) grad_norm 1.9827 (2.8117) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:59:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][320/625] eta 0:01:20 lr 0.000212 wd 0.0500 time 0.2570 (0.2637) data time 0.0010 (0.0026) model time 0.2560 (0.2617) 
loss 6.3631 (5.6463) grad_norm 4.6390 (2.8447) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:59:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][330/625] eta 0:01:17 lr 0.000212 wd 0.0500 time 0.4277 (0.2640) data time 0.0008 (0.0025) model time 0.4269 (0.2622) loss 5.8337 (5.6535) grad_norm 2.3955 (2.8718) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 08:59:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][340/625] eta 0:01:15 lr 0.000212 wd 0.0500 time 0.2538 (0.2638) data time 0.0008 (0.0025) model time 0.2530 (0.2620) loss 6.2164 (5.6557) grad_norm 1.9619 (2.8529) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:00:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][350/625] eta 0:01:12 lr 0.000212 wd 0.0500 time 0.2570 (0.2653) data time 0.0010 (0.0024) model time 0.2560 (0.2637) loss 5.0609 (5.6659) grad_norm 3.4514 (2.8519) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:00:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][360/625] eta 0:01:10 lr 0.000212 wd 0.0500 time 0.2541 (0.2650) data time 0.0007 (0.0024) model time 0.2534 (0.2634) loss 6.2014 (5.6797) grad_norm 2.9460 (2.8320) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:00:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][370/625] eta 0:01:07 lr 0.000212 wd 0.0500 time 0.2544 (0.2659) data time 0.0007 (0.0023) model time 0.2537 (0.2644) loss 6.1620 (5.6774) grad_norm 2.2310 (2.8205) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:00:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][380/625] eta 0:01:05 lr 0.000212 wd 0.0500 time 0.2511 (0.2657) data time 0.0009 (0.0023) model time 0.2501 (0.2642) loss 4.8194 (5.6745) grad_norm 4.2394 (inf) loss_scale 256.0000 (505.9528) mem 9655MB [2024-08-04 09:00:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO 
Train: [243/300][390/625] eta 0:01:02 lr 0.000212 wd 0.0500 time 0.2558 (0.2654) data time 0.0007 (0.0023) model time 0.2551 (0.2639) loss 5.6202 (5.6663) grad_norm 3.2042 (inf) loss_scale 256.0000 (499.5601) mem 9655MB [2024-08-04 09:00:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][400/625] eta 0:00:59 lr 0.000211 wd 0.0500 time 0.2555 (0.2652) data time 0.0007 (0.0022) model time 0.2548 (0.2637) loss 6.2274 (5.6661) grad_norm 1.8657 (inf) loss_scale 256.0000 (493.4863) mem 9655MB [2024-08-04 09:00:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][410/625] eta 0:00:56 lr 0.000211 wd 0.0500 time 0.2572 (0.2650) data time 0.0010 (0.0022) model time 0.2562 (0.2635) loss 5.7040 (5.6583) grad_norm 1.7877 (inf) loss_scale 256.0000 (487.7080) mem 9655MB [2024-08-04 09:00:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][420/625] eta 0:00:54 lr 0.000211 wd 0.0500 time 0.2527 (0.2652) data time 0.0011 (0.0022) model time 0.2516 (0.2637) loss 5.1738 (5.6560) grad_norm 4.4883 (inf) loss_scale 256.0000 (482.2043) mem 9655MB [2024-08-04 09:00:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][430/625] eta 0:00:51 lr 0.000211 wd 0.0500 time 0.2539 (0.2650) data time 0.0010 (0.0021) model time 0.2529 (0.2635) loss 4.5315 (5.6548) grad_norm 3.8510 (inf) loss_scale 256.0000 (476.9559) mem 9655MB [2024-08-04 09:00:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][440/625] eta 0:00:48 lr 0.000211 wd 0.0500 time 0.2531 (0.2648) data time 0.0008 (0.0021) model time 0.2523 (0.2633) loss 5.9270 (5.6529) grad_norm 1.5073 (inf) loss_scale 256.0000 (471.9456) mem 9655MB [2024-08-04 09:00:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][450/625] eta 0:00:46 lr 0.000211 wd 0.0500 time 0.2596 (0.2649) data time 0.0008 (0.0021) model time 0.2588 (0.2634) loss 6.3527 (5.6557) grad_norm 1.7368 (inf) 
loss_scale 256.0000 (467.1574) mem 9655MB [2024-08-04 09:00:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][460/625] eta 0:00:43 lr 0.000211 wd 0.0500 time 0.2577 (0.2647) data time 0.0007 (0.0021) model time 0.2571 (0.2632) loss 4.8321 (5.6564) grad_norm 2.9253 (inf) loss_scale 256.0000 (462.5770) mem 9655MB [2024-08-04 09:00:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][470/625] eta 0:00:41 lr 0.000211 wd 0.0500 time 0.2564 (0.2648) data time 0.0011 (0.0020) model time 0.2553 (0.2633) loss 6.7053 (5.6545) grad_norm 8.5291 (inf) loss_scale 256.0000 (458.1911) mem 9655MB [2024-08-04 09:00:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][480/625] eta 0:00:38 lr 0.000211 wd 0.0500 time 0.2497 (0.2646) data time 0.0009 (0.0020) model time 0.2487 (0.2631) loss 5.9085 (5.6494) grad_norm 2.0900 (inf) loss_scale 256.0000 (453.9875) mem 9655MB [2024-08-04 09:00:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][490/625] eta 0:00:35 lr 0.000210 wd 0.0500 time 0.2577 (0.2644) data time 0.0008 (0.0020) model time 0.2570 (0.2629) loss 5.4282 (5.6459) grad_norm 3.5990 (inf) loss_scale 256.0000 (449.9552) mem 9655MB [2024-08-04 09:00:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][500/625] eta 0:00:33 lr 0.000210 wd 0.0500 time 0.2554 (0.2643) data time 0.0009 (0.0020) model time 0.2545 (0.2628) loss 4.7331 (5.6421) grad_norm 2.8344 (inf) loss_scale 256.0000 (446.0838) mem 9655MB [2024-08-04 09:00:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][510/625] eta 0:00:30 lr 0.000210 wd 0.0500 time 0.2605 (0.2641) data time 0.0007 (0.0019) model time 0.2598 (0.2626) loss 6.1977 (5.6428) grad_norm 2.3782 (inf) loss_scale 256.0000 (442.3640) mem 9655MB [2024-08-04 09:00:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][520/625] eta 0:00:27 lr 0.000210 wd 0.0500 
time 0.2588 (0.2640) data time 0.0011 (0.0020) model time 0.2577 (0.2624) loss 5.8698 (5.6398) grad_norm 2.6264 (inf) loss_scale 256.0000 (438.7869) mem 9655MB [2024-08-04 09:00:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][530/625] eta 0:00:25 lr 0.000210 wd 0.0500 time 0.2544 (0.2638) data time 0.0008 (0.0019) model time 0.2537 (0.2623) loss 5.5845 (5.6413) grad_norm 3.3681 (inf) loss_scale 256.0000 (435.3446) mem 9655MB [2024-08-04 09:00:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][540/625] eta 0:00:22 lr 0.000210 wd 0.0500 time 0.2551 (0.2637) data time 0.0007 (0.0019) model time 0.2545 (0.2622) loss 5.4931 (5.6427) grad_norm 2.5482 (inf) loss_scale 256.0000 (432.0296) mem 9655MB [2024-08-04 09:00:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][550/625] eta 0:00:19 lr 0.000210 wd 0.0500 time 0.2536 (0.2636) data time 0.0007 (0.0019) model time 0.2528 (0.2621) loss 6.2877 (5.6422) grad_norm 1.7655 (inf) loss_scale 256.0000 (428.8348) mem 9655MB [2024-08-04 09:00:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][560/625] eta 0:00:17 lr 0.000210 wd 0.0500 time 0.2565 (0.2637) data time 0.0010 (0.0019) model time 0.2555 (0.2622) loss 4.9173 (5.6392) grad_norm 2.5126 (inf) loss_scale 256.0000 (425.7540) mem 9655MB [2024-08-04 09:00:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][570/625] eta 0:00:14 lr 0.000210 wd 0.0500 time 0.2581 (0.2636) data time 0.0006 (0.0019) model time 0.2575 (0.2621) loss 4.6267 (5.6355) grad_norm 2.3238 (inf) loss_scale 256.0000 (422.7811) mem 9655MB [2024-08-04 09:01:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][580/625] eta 0:00:11 lr 0.000210 wd 0.0500 time 0.2582 (0.2634) data time 0.0009 (0.0018) model time 0.2573 (0.2619) loss 5.2588 (5.6374) grad_norm 2.3989 (inf) loss_scale 256.0000 (419.9105) mem 9655MB [2024-08-04 09:01:03 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][590/625] eta 0:00:09 lr 0.000209 wd 0.0500 time 0.2570 (0.2633) data time 0.0008 (0.0018) model time 0.2562 (0.2618) loss 6.2240 (5.6372) grad_norm 2.0720 (inf) loss_scale 256.0000 (417.1371) mem 9655MB [2024-08-04 09:01:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][600/625] eta 0:00:06 lr 0.000209 wd 0.0500 time 0.2573 (0.2635) data time 0.0015 (0.0018) model time 0.2558 (0.2620) loss 5.9428 (5.6381) grad_norm 1.6670 (inf) loss_scale 256.0000 (414.4559) mem 9655MB [2024-08-04 09:01:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][610/625] eta 0:00:03 lr 0.000209 wd 0.0500 time 0.2525 (0.2634) data time 0.0004 (0.0018) model time 0.2521 (0.2619) loss 5.7426 (5.6393) grad_norm 2.9334 (inf) loss_scale 256.0000 (411.8625) mem 9655MB [2024-08-04 09:01:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][620/625] eta 0:00:01 lr 0.000209 wd 0.0500 time 0.2530 (0.2632) data time 0.0004 (0.0018) model time 0.2526 (0.2617) loss 6.5940 (5.6419) grad_norm 2.2940 (inf) loss_scale 256.0000 (409.3527) mem 9655MB [2024-08-04 09:01:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 243 training takes 0:02:44 [2024-08-04 09:01:11 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 09:01:12 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 09:01:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.513 (0.513) Loss 0.6006 (0.6006) Acc@1 89.648 (89.648) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 09:01:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.099) Loss 0.9062 (0.7216) Acc@1 81.787 (86.737) Acc@5 96.240 (97.683) Mem 9655MB [2024-08-04 09:01:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0342 (0.8411) Acc@1 77.002 (83.568) Acc@5 95.410 (96.515) Mem 9655MB [2024-08-04 09:01:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.263 Acc@5 96.533 [2024-08-04 09:01:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-08-04 09:01:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.784 (0.784) Loss 0.5840 (0.5840) Acc@1 89.893 (89.893) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 09:01:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.127) Loss 0.9092 (0.7100) Acc@1 80.615 (86.612) Acc@5 96.289 (97.741) Mem 9655MB [2024-08-04 09:01:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 1.0195 (0.8317) Acc@1 77.539 (83.445) Acc@5 95.117 (96.494) Mem 9655MB [2024-08-04 09:01:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.151 Acc@5 96.489 [2024-08-04 09:01:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.2% [2024-08-04 09:01:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.15% [2024-08-04 09:01:16 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 09:01:17 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 09:01:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][0/625] eta 0:07:38 lr 0.000209 wd 0.0500 time 0.7336 (0.7336) data time 0.4912 (0.4912) model time 0.0000 (0.0000) loss 5.6376 (5.6376) grad_norm 2.2940 (2.2940) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:01:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][10/625] eta 0:03:03 lr 0.000209 wd 0.0500 time 0.2560 (0.2981) data time 0.0008 (0.0455) model time 0.0000 (0.0000) loss 5.6101 (5.5169) grad_norm 2.0645 (1.9980) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:01:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][20/625] eta 0:02:54 lr 0.000209 wd 0.0500 time 0.2574 (0.2881) data time 0.0009 (0.0242) model time 0.0000 (0.0000) loss 5.9352 (5.7634) grad_norm 4.3497 (2.5051) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:01:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][30/625] eta 0:02:48 lr 0.000209 wd 0.0500 time 0.2561 (0.2835) data time 0.0012 (0.0167) model time 0.0000 (0.0000) loss 5.2014 (5.7732) grad_norm 2.2200 (2.6809) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:01:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][40/625] eta 0:02:43 lr 0.000209 wd 0.0500 time 0.2576 (0.2797) data time 0.0006 (0.0129) model time 0.0000 (0.0000) loss 6.1630 (5.7644) grad_norm 1.9568 (3.1833) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:01:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][50/625] eta 0:02:38 lr 0.000209 wd 0.0500 time 0.2537 (0.2752) data time 0.0007 (0.0105) model time 0.0000 (0.0000) loss 6.2250 (5.8018) grad_norm 1.5816 (3.1524) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 
09:01:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][60/625] eta 0:02:33 lr 0.000208 wd 0.0500 time 0.2589 (0.2722) data time 0.0010 (0.0090) model time 0.2579 (0.2558) loss 4.8521 (5.7550) grad_norm 2.6416 (3.1114) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:01:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][70/625] eta 0:02:29 lr 0.000208 wd 0.0500 time 0.2543 (0.2698) data time 0.0014 (0.0079) model time 0.2529 (0.2551) loss 4.5970 (5.6967) grad_norm 6.2199 (3.0494) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:01:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][80/625] eta 0:02:26 lr 0.000208 wd 0.0500 time 0.2606 (0.2681) data time 0.0007 (0.0070) model time 0.2599 (0.2550) loss 5.8638 (5.6880) grad_norm 2.2983 (2.9216) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:01:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][90/625] eta 0:02:22 lr 0.000208 wd 0.0500 time 0.2567 (0.2667) data time 0.0008 (0.0064) model time 0.2559 (0.2548) loss 6.2112 (5.7020) grad_norm 1.7295 (2.8552) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:01:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][100/625] eta 0:02:19 lr 0.000208 wd 0.0500 time 0.2532 (0.2657) data time 0.0009 (0.0058) model time 0.2523 (0.2550) loss 4.5296 (5.6747) grad_norm 2.2516 (2.7855) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:01:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][110/625] eta 0:02:16 lr 0.000208 wd 0.0500 time 0.2556 (0.2648) data time 0.0009 (0.0054) model time 0.2547 (0.2550) loss 4.4405 (5.6635) grad_norm 2.6432 (2.7737) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:01:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][120/625] eta 0:02:13 lr 0.000208 wd 0.0500 time 0.2558 (0.2640) data time 0.0009 
(0.0050) model time 0.2549 (0.2549) loss 6.3640 (5.6646) grad_norm 2.1227 (2.7225) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:01:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][130/625] eta 0:02:10 lr 0.000208 wd 0.0500 time 0.2558 (0.2635) data time 0.0009 (0.0047) model time 0.2549 (0.2550) loss 6.8556 (5.6718) grad_norm 2.3607 (2.7220) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:01:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][140/625] eta 0:02:08 lr 0.000208 wd 0.0500 time 0.2550 (0.2653) data time 0.0011 (0.0045) model time 0.2539 (0.2587) loss 4.6435 (5.6525) grad_norm 2.7644 (2.6934) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:01:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][150/625] eta 0:02:06 lr 0.000208 wd 0.0500 time 0.2532 (0.2655) data time 0.0010 (0.0042) model time 0.2523 (0.2595) loss 5.6743 (5.6574) grad_norm 1.8362 (2.7219) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:01:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][160/625] eta 0:02:03 lr 0.000207 wd 0.0500 time 0.2582 (0.2649) data time 0.0007 (0.0040) model time 0.2574 (0.2592) loss 4.5790 (5.6235) grad_norm 1.9889 (2.7252) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:02:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][170/625] eta 0:02:00 lr 0.000207 wd 0.0500 time 0.2558 (0.2644) data time 0.0011 (0.0038) model time 0.2547 (0.2589) loss 5.1824 (5.6221) grad_norm 1.8362 (2.7518) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:02:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][180/625] eta 0:01:57 lr 0.000207 wd 0.0500 time 0.2554 (0.2639) data time 0.0009 (0.0037) model time 0.2545 (0.2585) loss 6.2350 (5.6354) grad_norm 1.8052 (2.7111) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:02:07 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][190/625] eta 0:01:54 lr 0.000207 wd 0.0500 time 0.2567 (0.2635) data time 0.0009 (0.0035) model time 0.2558 (0.2582) loss 6.0743 (5.6302) grad_norm 2.5913 (2.7022) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:02:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][200/625] eta 0:01:51 lr 0.000207 wd 0.0500 time 0.2537 (0.2631) data time 0.0008 (0.0034) model time 0.2529 (0.2580) loss 5.7796 (5.6371) grad_norm 1.9987 (2.6675) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:02:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][210/625] eta 0:01:49 lr 0.000207 wd 0.0500 time 0.2560 (0.2635) data time 0.0011 (0.0033) model time 0.2549 (0.2588) loss 6.2792 (5.6420) grad_norm 2.1091 (2.6389) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:02:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][220/625] eta 0:01:46 lr 0.000207 wd 0.0500 time 0.2561 (0.2632) data time 0.0009 (0.0032) model time 0.2552 (0.2587) loss 5.7521 (5.6577) grad_norm 1.6875 (2.6367) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:02:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][230/625] eta 0:01:44 lr 0.000207 wd 0.0500 time 0.2505 (0.2638) data time 0.0007 (0.0031) model time 0.2498 (0.2596) loss 6.1793 (5.6665) grad_norm 1.7204 (2.6284) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:02:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][240/625] eta 0:01:41 lr 0.000207 wd 0.0500 time 0.2562 (0.2635) data time 0.0007 (0.0030) model time 0.2555 (0.2594) loss 6.1082 (5.6631) grad_norm 2.2110 (2.6285) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:02:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][250/625] eta 0:01:38 lr 0.000206 wd 0.0500 time 0.2566 (0.2632) data time 0.0008 (0.0029) 
model time 0.2558 (0.2592) loss 4.2402 (5.6504) grad_norm 3.1978 (2.6194) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:02:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][260/625] eta 0:01:36 lr 0.000206 wd 0.0500 time 0.2549 (0.2643) data time 0.0008 (0.0028) model time 0.2541 (0.2607) loss 6.1764 (5.6437) grad_norm 2.1107 (2.6099) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:02:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][270/625] eta 0:01:33 lr 0.000206 wd 0.0500 time 0.2544 (0.2640) data time 0.0007 (0.0028) model time 0.2536 (0.2605) loss 4.7518 (5.6325) grad_norm 2.1316 (2.5909) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:02:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][280/625] eta 0:01:30 lr 0.000206 wd 0.0500 time 0.2539 (0.2638) data time 0.0007 (0.0027) model time 0.2532 (0.2603) loss 5.2706 (5.6351) grad_norm 2.7082 (2.5812) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:02:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][290/625] eta 0:01:28 lr 0.000206 wd 0.0500 time 0.2587 (0.2635) data time 0.0007 (0.0026) model time 0.2579 (0.2601) loss 5.0375 (5.6220) grad_norm 1.9027 (2.5636) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:02:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][300/625] eta 0:01:25 lr 0.000206 wd 0.0500 time 0.2590 (0.2633) data time 0.0006 (0.0026) model time 0.2584 (0.2599) loss 6.1408 (5.6374) grad_norm 1.8211 (2.5512) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:02:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][310/625] eta 0:01:23 lr 0.000206 wd 0.0500 time 0.2526 (0.2637) data time 0.0008 (0.0025) model time 0.2517 (0.2605) loss 5.8189 (5.6287) grad_norm 3.7838 (2.5557) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:02:41 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [244/300][320/625] eta 0:01:20 lr 0.000206 wd 0.0500 time 0.2550 (0.2634) data time 0.0010 (0.0025) model time 0.2540 (0.2603) loss 6.0665 (5.6272) grad_norm 1.6082 (2.5419) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:02:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][330/625] eta 0:01:17 lr 0.000206 wd 0.0500 time 0.2592 (0.2632) data time 0.0012 (0.0024) model time 0.2580 (0.2601) loss 5.6617 (5.6245) grad_norm 2.3114 (2.5354) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:02:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][340/625] eta 0:01:14 lr 0.000206 wd 0.0500 time 0.2546 (0.2630) data time 0.0008 (0.0024) model time 0.2538 (0.2599) loss 6.0899 (5.6295) grad_norm 1.5683 (2.5217) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:02:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][350/625] eta 0:01:12 lr 0.000205 wd 0.0500 time 0.2528 (0.2628) data time 0.0010 (0.0024) model time 0.2518 (0.2598) loss 6.4127 (5.6308) grad_norm 2.8598 (2.5130) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:02:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][360/625] eta 0:01:09 lr 0.000205 wd 0.0500 time 0.2563 (0.2626) data time 0.0007 (0.0023) model time 0.2556 (0.2596) loss 6.0108 (5.6387) grad_norm 2.8904 (2.5420) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:02:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][370/625] eta 0:01:07 lr 0.000205 wd 0.0500 time 0.2556 (0.2630) data time 0.0009 (0.0023) model time 0.2548 (0.2601) loss 5.8991 (5.6404) grad_norm 2.7353 (2.5324) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:02:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][380/625] eta 0:01:04 lr 0.000205 wd 0.0500 time 0.2542 (0.2629) data time 0.0009 (0.0023) model time 0.2533 (0.2600) 
loss 4.6113 (5.6323) grad_norm 2.2356 (2.5229) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:02:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][390/625] eta 0:01:01 lr 0.000205 wd 0.0500 time 0.2549 (0.2627) data time 0.0009 (0.0022) model time 0.2540 (0.2598) loss 6.4565 (5.6273) grad_norm 2.6853 (2.5212) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:03:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][400/625] eta 0:00:59 lr 0.000205 wd 0.0500 time 0.2540 (0.2633) data time 0.0006 (0.0022) model time 0.2534 (0.2606) loss 6.0146 (5.6274) grad_norm 1.8018 (2.5138) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:03:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][410/625] eta 0:00:56 lr 0.000205 wd 0.0500 time 0.4676 (0.2636) data time 0.0009 (0.0022) model time 0.4667 (0.2610) loss 5.8980 (5.6302) grad_norm 3.3737 (2.5052) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:03:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][420/625] eta 0:00:54 lr 0.000205 wd 0.0500 time 0.2550 (0.2635) data time 0.0011 (0.0021) model time 0.2539 (0.2609) loss 4.9747 (5.6264) grad_norm 11.8383 (2.5263) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:03:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][430/625] eta 0:00:51 lr 0.000205 wd 0.0500 time 0.2549 (0.2637) data time 0.0009 (0.0021) model time 0.2540 (0.2612) loss 6.3918 (5.6267) grad_norm 2.2983 (2.5299) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:03:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][440/625] eta 0:00:48 lr 0.000205 wd 0.0500 time 0.2546 (0.2636) data time 0.0007 (0.0021) model time 0.2539 (0.2611) loss 5.6381 (5.6347) grad_norm 3.1053 (2.5214) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:03:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [244/300][450/625] eta 0:00:46 lr 0.000204 wd 0.0500 time 0.2585 (0.2639) data time 0.0014 (0.0020) model time 0.2571 (0.2615) loss 5.5498 (5.6353) grad_norm 1.5592 (2.5245) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:03:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][460/625] eta 0:00:43 lr 0.000204 wd 0.0500 time 0.2557 (0.2637) data time 0.0007 (0.0020) model time 0.2551 (0.2614) loss 5.6296 (5.6334) grad_norm 1.7757 (2.5248) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:03:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][470/625] eta 0:00:40 lr 0.000204 wd 0.0500 time 0.2549 (0.2640) data time 0.0008 (0.0020) model time 0.2541 (0.2617) loss 5.7310 (5.6333) grad_norm 2.6622 (2.5396) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:03:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][480/625] eta 0:00:38 lr 0.000204 wd 0.0500 time 0.2599 (0.2639) data time 0.0010 (0.0020) model time 0.2589 (0.2616) loss 6.0360 (5.6360) grad_norm 1.8782 (2.5486) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:03:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][490/625] eta 0:00:35 lr 0.000204 wd 0.0500 time 0.2537 (0.2637) data time 0.0012 (0.0020) model time 0.2525 (0.2614) loss 6.4597 (5.6420) grad_norm 3.0883 (2.5586) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:03:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][500/625] eta 0:00:32 lr 0.000204 wd 0.0500 time 0.2574 (0.2636) data time 0.0006 (0.0019) model time 0.2567 (0.2613) loss 5.9257 (5.6385) grad_norm 2.7376 (2.5551) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:03:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][510/625] eta 0:00:30 lr 0.000204 wd 0.0500 time 0.2577 (0.2642) data time 0.0007 (0.0019) model time 0.2570 (0.2621) loss 5.1873 (5.6322) grad_norm 
1.9189 (2.5581) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:03:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][520/625] eta 0:00:27 lr 0.000204 wd 0.0500 time 0.2572 (0.2645) data time 0.0012 (0.0019) model time 0.2561 (0.2624) loss 5.4677 (5.6335) grad_norm 1.8019 (2.5599) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:03:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][530/625] eta 0:00:25 lr 0.000204 wd 0.0500 time 0.2550 (0.2643) data time 0.0008 (0.0019) model time 0.2542 (0.2622) loss 4.8893 (5.6328) grad_norm 2.2013 (2.5554) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:03:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][540/625] eta 0:00:22 lr 0.000203 wd 0.0500 time 0.2568 (0.2645) data time 0.0006 (0.0019) model time 0.2562 (0.2624) loss 6.2761 (5.6275) grad_norm 2.5005 (2.5528) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:03:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][550/625] eta 0:00:19 lr 0.000203 wd 0.0500 time 0.2501 (0.2643) data time 0.0010 (0.0018) model time 0.2491 (0.2623) loss 6.3998 (5.6322) grad_norm 6.9398 (2.5555) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:03:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][560/625] eta 0:00:17 lr 0.000203 wd 0.0500 time 0.2529 (0.2642) data time 0.0007 (0.0018) model time 0.2523 (0.2621) loss 5.0516 (5.6348) grad_norm 2.7506 (2.5578) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:03:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][570/625] eta 0:00:14 lr 0.000203 wd 0.0500 time 0.2579 (0.2640) data time 0.0009 (0.0018) model time 0.2570 (0.2620) loss 5.8023 (5.6310) grad_norm 2.2571 (2.5501) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:03:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][580/625] eta 
0:00:11 lr 0.000203 wd 0.0500 time 0.2578 (0.2639) data time 0.0009 (0.0018) model time 0.2569 (0.2618) loss 5.5131 (5.6363) grad_norm 1.9331 (2.5452) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:03:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][590/625] eta 0:00:09 lr 0.000203 wd 0.0500 time 0.2524 (0.2637) data time 0.0010 (0.0018) model time 0.2514 (0.2617) loss 5.8605 (5.6379) grad_norm 2.0356 (2.5449) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:03:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][600/625] eta 0:00:06 lr 0.000203 wd 0.0500 time 0.2573 (0.2639) data time 0.0009 (0.0018) model time 0.2564 (0.2619) loss 5.9866 (5.6359) grad_norm 2.0182 (2.5391) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:03:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][610/625] eta 0:00:03 lr 0.000203 wd 0.0500 time 0.2517 (0.2641) data time 0.0006 (0.0018) model time 0.2511 (0.2622) loss 6.3604 (5.6372) grad_norm 3.5821 (2.5549) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:04:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][620/625] eta 0:00:01 lr 0.000203 wd 0.0500 time 0.2555 (0.2640) data time 0.0006 (0.0017) model time 0.2550 (0.2620) loss 6.0788 (5.6383) grad_norm 3.0410 (2.5661) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:04:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 244 training takes 0:02:44 [2024-08-04 09:04:01 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 09:04:02 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 09:04:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.463 (0.463) Loss 0.5952 (0.5952) Acc@1 90.137 (90.137) Acc@5 98.730 (98.730) Mem 9655MB [2024-08-04 09:04:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.093) Loss 0.9150 (0.7219) Acc@1 81.494 (86.852) Acc@5 96.387 (97.741) Mem 9655MB [2024-08-04 09:04:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.075) Loss 1.0186 (0.8426) Acc@1 77.734 (83.722) Acc@5 95.361 (96.559) Mem 9655MB [2024-08-04 09:04:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.411 Acc@5 96.543 [2024-08-04 09:04:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.4% [2024-08-04 09:04:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.41% [2024-08-04 09:04:04 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 09:04:04 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 09:04:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.521 (0.521) Loss 0.5840 (0.5840) Acc@1 89.893 (89.893) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 09:04:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 0.9087 (0.7100) Acc@1 80.615 (86.648) Acc@5 96.191 (97.741) Mem 9655MB [2024-08-04 09:04:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0186 (0.8315) Acc@1 77.539 (83.461) Acc@5 95.117 (96.501) Mem 9655MB [2024-08-04 09:04:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.165 Acc@5 96.491 [2024-08-04 09:04:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.2% [2024-08-04 09:04:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.17% [2024-08-04 09:04:06 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 09:04:07 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 09:04:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][0/625] eta 0:07:56 lr 0.000203 wd 0.0500 time 0.7629 (0.7629) data time 0.5182 (0.5182) model time 0.0000 (0.0000) loss 5.5871 (5.5871) grad_norm 2.4646 (2.4646) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:04:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][10/625] eta 0:03:14 lr 0.000202 wd 0.0500 time 0.2575 (0.3168) data time 0.0006 (0.0479) model time 0.0000 (0.0000) loss 4.7572 (5.6940) grad_norm 2.6486 (2.4854) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:04:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][20/625] eta 0:02:54 lr 0.000202 wd 0.0500 time 0.2586 (0.2883) data time 0.0005 (0.0255) model time 0.0000 (0.0000) loss 4.6998 (5.6021) grad_norm 3.5874 (2.5655) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:04:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][30/625] eta 0:02:48 lr 0.000202 wd 0.0500 time 0.2611 (0.2840) data time 0.0009 (0.0176) model time 0.0000 (0.0000) loss 6.4894 (5.5975) grad_norm 2.3652 (2.7129) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:04:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][40/625] eta 0:02:43 lr 0.000202 wd 0.0500 time 0.2565 (0.2802) data time 0.0009 (0.0135) model time 0.0000 (0.0000) loss 5.7509 (5.6575) grad_norm 7.8301 (2.7318) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:04:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][50/625] eta 0:02:42 lr 0.000202 wd 0.0500 time 0.2529 (0.2826) data time 0.0009 (0.0111) model time 0.0000 (0.0000) loss 6.2460 (5.6956) grad_norm 1.8902 (2.6667) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:04:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][60/625] eta 0:02:38 lr 0.000202 wd 0.0500 time 0.3575 (0.2800) data time 
0.0007 (0.0094) model time 0.3568 (0.2658) loss 6.4174 (5.7194) grad_norm 2.2849 (2.6646) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:04:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][70/625] eta 0:02:33 lr 0.000202 wd 0.0500 time 0.2679 (0.2768) data time 0.0010 (0.0082) model time 0.2670 (0.2610) loss 5.1857 (5.7195) grad_norm 2.7198 (2.8721) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:04:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][80/625] eta 0:02:30 lr 0.000202 wd 0.0500 time 0.2577 (0.2768) data time 0.0010 (0.0073) model time 0.2567 (0.2659) loss 6.3075 (5.7179) grad_norm 2.0516 (2.8512) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:04:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][90/625] eta 0:02:26 lr 0.000202 wd 0.0500 time 0.2573 (0.2747) data time 0.0009 (0.0066) model time 0.2564 (0.2635) loss 5.2899 (5.6978) grad_norm 1.7777 (2.8817) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:04:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][100/625] eta 0:02:23 lr 0.000202 wd 0.0500 time 0.2587 (0.2728) data time 0.0007 (0.0061) model time 0.2580 (0.2617) loss 5.5469 (5.6773) grad_norm 1.5461 (2.8209) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:04:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][110/625] eta 0:02:20 lr 0.000201 wd 0.0500 time 0.2500 (0.2729) data time 0.0010 (0.0056) model time 0.2490 (0.2636) loss 5.4252 (5.6826) grad_norm 2.1756 (2.7556) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:04:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][120/625] eta 0:02:17 lr 0.000201 wd 0.0500 time 0.2578 (0.2715) data time 0.0008 (0.0052) model time 0.2569 (0.2624) loss 5.7634 (5.6705) grad_norm 2.7363 (2.7337) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:04:42 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][130/625] eta 0:02:13 lr 0.000201 wd 0.0500 time 0.2525 (0.2702) data time 0.0009 (0.0049) model time 0.2516 (0.2614) loss 5.6628 (5.6603) grad_norm 3.0900 (2.7155) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:04:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][140/625] eta 0:02:10 lr 0.000201 wd 0.0500 time 0.2527 (0.2692) data time 0.0010 (0.0046) model time 0.2517 (0.2607) loss 5.9709 (5.6682) grad_norm 1.9728 (2.7083) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:04:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][150/625] eta 0:02:07 lr 0.000201 wd 0.0500 time 0.2470 (0.2684) data time 0.0007 (0.0044) model time 0.2464 (0.2602) loss 5.8368 (5.6582) grad_norm 2.3958 (2.6585) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:04:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][160/625] eta 0:02:04 lr 0.000201 wd 0.0500 time 0.2571 (0.2677) data time 0.0006 (0.0041) model time 0.2565 (0.2598) loss 5.2269 (5.6404) grad_norm 2.0946 (2.6448) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:04:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][170/625] eta 0:02:01 lr 0.000201 wd 0.0500 time 0.2569 (0.2670) data time 0.0008 (0.0040) model time 0.2561 (0.2594) loss 5.0025 (5.6493) grad_norm 4.9216 (2.6835) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:04:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][180/625] eta 0:01:59 lr 0.000201 wd 0.0500 time 0.2556 (0.2674) data time 0.0010 (0.0038) model time 0.2547 (0.2605) loss 4.8487 (5.6472) grad_norm 2.3708 (2.6926) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:04:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][190/625] eta 0:01:56 lr 0.000201 wd 0.0500 time 0.2553 (0.2669) data time 0.0010 (0.0037) 
model time 0.2543 (0.2602) loss 6.3770 (5.6540) grad_norm 3.4646 (2.7153) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:05:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][200/625] eta 0:01:53 lr 0.000201 wd 0.0500 time 0.2588 (0.2664) data time 0.0008 (0.0035) model time 0.2579 (0.2599) loss 5.4970 (5.6724) grad_norm 3.1178 (2.7248) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:05:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][210/625] eta 0:01:50 lr 0.000200 wd 0.0500 time 0.2559 (0.2658) data time 0.0008 (0.0034) model time 0.2550 (0.2595) loss 6.5145 (5.6701) grad_norm 1.7757 (2.7362) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:05:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][220/625] eta 0:01:47 lr 0.000200 wd 0.0500 time 0.2552 (0.2654) data time 0.0011 (0.0033) model time 0.2541 (0.2592) loss 6.4789 (5.6746) grad_norm 2.7142 (2.7416) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:05:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][230/625] eta 0:01:44 lr 0.000200 wd 0.0500 time 0.2567 (0.2650) data time 0.0009 (0.0032) model time 0.2559 (0.2591) loss 5.3468 (5.6738) grad_norm 2.5875 (2.7163) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:05:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][240/625] eta 0:01:41 lr 0.000200 wd 0.0500 time 0.2586 (0.2647) data time 0.0008 (0.0031) model time 0.2578 (0.2589) loss 6.3984 (5.6726) grad_norm 2.6470 (2.6935) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:05:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][250/625] eta 0:01:39 lr 0.000200 wd 0.0500 time 0.2589 (0.2644) data time 0.0009 (0.0030) model time 0.2580 (0.2587) loss 5.4242 (5.6678) grad_norm 2.1116 (2.6866) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:05:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][260/625] eta 0:01:36 lr 0.000200 wd 0.0500 time 0.2528 (0.2640) data time 0.0008 (0.0029) model time 0.2520 (0.2586) loss 6.2863 (5.6788) grad_norm 5.5467 (2.6789) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:05:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][270/625] eta 0:01:33 lr 0.000200 wd 0.0500 time 0.2522 (0.2637) data time 0.0010 (0.0029) model time 0.2512 (0.2584) loss 6.4582 (5.6842) grad_norm 2.5970 (2.6902) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:05:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][280/625] eta 0:01:30 lr 0.000200 wd 0.0500 time 0.2552 (0.2634) data time 0.0010 (0.0028) model time 0.2542 (0.2582) loss 5.8359 (5.6804) grad_norm 2.1295 (2.6923) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:05:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][290/625] eta 0:01:28 lr 0.000200 wd 0.0500 time 0.2633 (0.2638) data time 0.0006 (0.0027) model time 0.2627 (0.2588) loss 6.1343 (5.6809) grad_norm 2.0446 (2.6923) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:05:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][300/625] eta 0:01:25 lr 0.000200 wd 0.0500 time 0.2551 (0.2635) data time 0.0007 (0.0027) model time 0.2544 (0.2587) loss 6.5714 (5.6845) grad_norm 2.1054 (2.6871) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:05:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][310/625] eta 0:01:22 lr 0.000199 wd 0.0500 time 0.2530 (0.2633) data time 0.0008 (0.0026) model time 0.2522 (0.2585) loss 6.2280 (5.6795) grad_norm 2.2461 (2.6741) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:05:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][320/625] eta 0:01:20 lr 0.000199 wd 0.0500 time 0.2540 (0.2631) data time 0.0006 (0.0026) model time 0.2533 (0.2584) loss 6.4433 (5.6891) grad_norm 2.8734 (2.6872) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:05:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][330/625] eta 0:01:17 lr 0.000199 wd 0.0500 time 0.2579 (0.2629) data time 0.0008 (0.0025) model time 0.2571 (0.2584) loss 5.8918 (5.6972) grad_norm 2.8081 (2.6941) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:05:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][340/625] eta 0:01:14 lr 0.000199 wd 0.0500 time 0.2664 (0.2627) data time 0.0011 (0.0025) model time 0.2653 (0.2583) loss 5.3945 (5.6975) grad_norm 4.3031 (2.6859) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:05:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][350/625] eta 0:01:12 lr 0.000199 wd 0.0500 time 0.2583 (0.2636) data time 0.0008 (0.0024) model time 0.2575 (0.2595) loss 5.6473 (5.6969) grad_norm 2.1398 (2.6743) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:05:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][360/625] eta 0:01:09 lr 0.000199 wd 0.0500 time 0.2584 (0.2635) data time 0.0008 (0.0024) model time 0.2575 (0.2594) loss 6.0215 (5.6968) grad_norm 1.7626 (2.6658) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:05:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][370/625] eta 0:01:07 lr 0.000199 wd 0.0500 time 0.2560 (0.2633) data time 0.0006 (0.0024) model time 0.2554 (0.2593) loss 5.7632 (5.7026) grad_norm 2.3042 (2.6445) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:05:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][380/625] eta 0:01:04 lr 0.000199 wd 0.0500 time 0.2595 (0.2631) data time 0.0006 (0.0023) model time 0.2589 (0.2591) loss 6.0315 (5.6957) grad_norm 2.3438 (2.6399) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:05:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][390/625] eta 0:01:01 lr 0.000199 wd 0.0500 time 0.2555 (0.2630) data time 0.0008 (0.0023) model time 0.2547 (0.2591) loss 6.2299 (5.6977) grad_norm 1.4515 (2.6274) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:05:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][400/625] eta 0:00:59 lr 0.000199 wd 0.0500 time 0.2548 (0.2628) data time 0.0006 (0.0022) model time 0.2542 (0.2589) loss 5.9743 (5.6935) grad_norm 1.4751 (2.6207) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:05:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][410/625] eta 0:00:56 lr 0.000198 wd 0.0500 time 0.4223 (0.2630) data time 0.0008 (0.0022) model time 0.4215 (0.2593) loss 5.3933 (5.6934) grad_norm 1.9128 (2.6148) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:05:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][420/625] eta 0:00:53 lr 0.000198 wd 0.0500 time 0.2651 (0.2633) data time 0.0006 (0.0022) model time 0.2645 (0.2597) loss 4.7369 (5.6853) grad_norm 2.6941 (2.6066) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:06:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][430/625] eta 0:00:51 lr 0.000198 wd 0.0500 time 0.2490 (0.2632) data time 0.0009 (0.0022) model time 0.2480 (0.2596) loss 6.3281 (5.6830) grad_norm 1.4266 (2.5907) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:06:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][440/625] eta 0:00:48 lr 0.000198 wd 0.0500 time 0.2660 (0.2630) data time 0.0008 (0.0021) model time 0.2652 (0.2595) loss 5.2204 (5.6790) grad_norm 2.8482 (2.5916) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:06:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][450/625] eta 0:00:46 lr 0.000198 wd 0.0500 time 0.2568 (0.2629) data time 0.0007 (0.0021) model time 0.2561 (0.2594) loss 4.7435 (5.6754) grad_norm 2.4732 (2.5789) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:06:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][460/625] eta 0:00:43 lr 0.000198 wd 0.0500 time 0.2554 (0.2627) data time 0.0009 (0.0021) model time 0.2545 (0.2593) loss 6.8112 (5.6752) grad_norm 2.3551 (2.5681) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:06:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][470/625] eta 0:00:40 lr 0.000198 wd 0.0500 time 0.2530 (0.2626) data time 0.0009 (0.0021) model time 0.2520 (0.2592) loss 4.8645 (5.6709) grad_norm 2.0676 (2.5547) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:06:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][480/625] eta 0:00:38 lr 0.000198 wd 0.0500 time 0.2556 (0.2624) data time 0.0008 (0.0020) model time 0.2548 (0.2591) loss 4.8709 (5.6676) grad_norm 2.8944 (2.5582) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:06:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][490/625] eta 0:00:35 lr 0.000198 wd 0.0500 time 0.2558 (0.2623) data time 0.0009 (0.0020) model time 0.2549 (0.2590) loss 5.1280 (5.6721) grad_norm 2.0039 (2.5539) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:06:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][500/625] eta 0:00:32 lr 0.000197 wd 0.0500 time 0.2597 (0.2622) data time 0.0010 (0.0020) model time 0.2587 (0.2589) loss 5.0879 (5.6665) grad_norm 2.6374 (2.5499) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:06:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][510/625] eta 0:00:30 lr 0.000197 wd 0.0500 time 0.2589 (0.2624) data time 0.0006 (0.0020) model time 0.2584 (0.2592) loss 4.8844 (5.6668) grad_norm 2.5498 (2.5477) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:06:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][520/625] eta 0:00:27 lr 0.000197 wd 0.0500 time 0.2569 (0.2627) data time 0.0006 (0.0019) model time 0.2563 (0.2596) loss 6.4931 (5.6716) grad_norm 2.5008 (2.5404) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:06:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][530/625] eta 0:00:24 lr 0.000197 wd 0.0500 time 0.2528 (0.2626) data time 0.0011 (0.0019) model time 0.2518 (0.2595) loss 5.4285 (5.6749) grad_norm 2.6878 (2.5507) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:06:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][540/625] eta 0:00:22 lr 0.000197 wd 0.0500 time 0.2547 (0.2625) data time 0.0009 (0.0019) model time 0.2538 (0.2594) loss 4.7939 (5.6644) grad_norm 1.7671 (2.5526) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:06:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][550/625] eta 0:00:19 lr 0.000197 wd 0.0500 time 0.2597 (0.2624) data time 0.0007 (0.0019) model time 0.2589 (0.2594) loss 5.3774 (5.6678) grad_norm 2.4589 (2.5621) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:06:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][560/625] eta 0:00:17 lr 0.000197 wd 0.0500 time 0.2564 (0.2625) data time 0.0010 (0.0019) model time 0.2554 (0.2596) loss 6.9446 (5.6709) grad_norm 4.1106 (2.5725) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:06:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][570/625] eta 0:00:14 lr 0.000197 wd 0.0500 time 0.2559 (0.2627) data time 0.0012 (0.0019) model time 0.2547 (0.2598) loss 5.6935 (5.6696) grad_norm 2.9720 (2.5673) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:06:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][580/625] eta 0:00:11 lr 0.000197 wd 0.0500 time 0.2519 (0.2626) data time 0.0010 (0.0018) model time 0.2508 (0.2597) loss 6.1312 (5.6694) grad_norm 5.7122 (2.5711) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:06:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][590/625] eta 0:00:09 lr 0.000197 wd 0.0500 time 0.2553 (0.2628) data time 0.0007 (0.0018) model time 0.2546 (0.2600) loss 5.0492 (5.6644) grad_norm 2.9572 (2.5721) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:06:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][600/625] eta 0:00:06 lr 0.000196 wd 0.0500 time 0.2582 (0.2627) data time 0.0006 (0.0018) model time 0.2576 (0.2599) loss 5.6055 (5.6616) grad_norm 2.1793 (2.5710) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:06:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][610/625] eta 0:00:03 lr 0.000196 wd 0.0500 time 0.2520 (0.2625) data time 0.0004 (0.0018) model time 0.2517 (0.2598) loss 6.2605 (5.6692) grad_norm 2.9275 (2.5691) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:06:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][620/625] eta 0:00:01 lr 0.000196 wd 0.0500 time 0.2533 (0.2624) data time 0.0007 (0.0018) model time 0.2527 (0.2596) loss 6.2671 (5.6748) grad_norm 9.6039 (2.5880) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:06:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 245 training takes 0:02:43
[2024-08-04 09:06:51 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 09:06:51 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 09:06:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.488 (0.488) Loss 0.6069 (0.6069) Acc@1 90.088 (90.088) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 09:06:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 0.9229 (0.7285) Acc@1 81.543 (86.768) Acc@5 96.240 (97.714) Mem 9655MB
[2024-08-04 09:06:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0205 (0.8486) Acc@1 77.686 (83.619) Acc@5 95.605 (96.543) Mem 9655MB
[2024-08-04 09:06:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.249 Acc@5 96.565
[2024-08-04 09:06:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.2%
[2024-08-04 09:06:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.703 (0.703) Loss 0.5835 (0.5835) Acc@1 89.893 (89.893) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 09:06:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.126) Loss 0.9087 (0.7097) Acc@1 80.664 (86.652) Acc@5 96.240 (97.758) Mem 9655MB
[2024-08-04 09:06:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0186 (0.8312) Acc@1 77.686 (83.482) Acc@5 95.117 (96.519) Mem 9655MB
[2024-08-04 09:06:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.187 Acc@5 96.507
[2024-08-04 09:06:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.2%
[2024-08-04 09:06:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.19%
[2024-08-04 09:06:55 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 09:06:56 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 09:06:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][0/625] eta 0:07:37 lr 0.000196 wd 0.0500 time 0.7327 (0.7327) data time 0.4907 (0.4907) model time 0.0000 (0.0000) loss 5.8978 (5.8978) grad_norm 2.5124 (2.5124) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:06:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][10/625] eta 0:03:12 lr 0.000196 wd 0.0500 time 0.4031 (0.3122) data time 0.0008 (0.0454) model time 0.0000 (0.0000) loss 5.3541 (5.7288) grad_norm 1.8762 (2.3468) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:07:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][20/625] eta 0:02:52 lr 0.000196 wd 0.0500 time 0.2551 (0.2853) data time 0.0009 (0.0243) model time 0.0000 (0.0000) loss 5.2165 (5.5865) grad_norm 2.2306 (2.7375) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:07:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][30/625] eta 0:02:44 lr 0.000196 wd 0.0500 time 0.2548 (0.2758) data time 0.0006 (0.0167) model time 0.0000 (0.0000) loss 6.0828 (5.5829) grad_norm 4.3298 (3.0555) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:07:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][40/625] eta 0:02:38 lr 0.000196 wd 0.0500 time 0.2558 (0.2710) data time 0.0008 (0.0129) model time 0.0000 (0.0000) loss 5.8045 (5.5648) grad_norm 2.9580 (2.9404) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:07:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][50/625] eta 0:02:36 lr 0.000196 wd 0.0500 time 0.2529 (0.2720) data time 0.0009 (0.0105) model time 0.0000 (0.0000) loss 5.9563 (5.5682) grad_norm 1.9685 (2.7802) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:07:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][60/625] eta 0:02:33 lr 0.000196 wd 0.0500 time 0.3830 (0.2714) data time 0.0009 (0.0089) model time 0.3821 (0.2673) loss 5.8754 (5.5928) grad_norm 4.0376 (2.8043) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:07:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][70/625] eta 0:02:30 lr 0.000196 wd 0.0500 time 0.2582 (0.2719) data time 0.0006 (0.0078) model time 0.2575 (0.2708) loss 6.4292 (5.6155) grad_norm 1.5755 (2.7965) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:07:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][80/625] eta 0:02:28 lr 0.000195 wd 0.0500 time 0.2537 (0.2721) data time 0.0009 (0.0070) model time 0.2528 (0.2714) loss 5.8143 (5.6406) grad_norm 2.8403 (2.8227) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:07:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][90/625] eta 0:02:24 lr 0.000195 wd 0.0500 time 0.2569 (0.2707) data time 0.0009 (0.0063) model time 0.2560 (0.2680) loss 6.0752 (5.6674) grad_norm 2.0406 (2.9248) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:07:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][100/625] eta 0:02:23 lr 0.000195 wd 0.0500 time 0.2547 (0.2731) data time 0.0011 (0.0058) model time 0.2536 (0.2732) loss 5.3325 (5.6463) grad_norm 1.9926 (2.9243) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:07:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][110/625] eta 0:02:20 lr 0.000195 wd 0.0500 time 0.2558 (0.2731) data time 0.0007 (0.0053) model time 0.2552 (0.2731) loss 5.5679 (5.6268) grad_norm 2.4790 (3.0315) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:07:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][120/625] eta 0:02:17 lr 0.000195 wd 0.0500 time 0.2598 (0.2732) data time 0.0009 (0.0050) model time 0.2589 (0.2730) loss 4.5745 (5.6245) grad_norm 1.7128 (3.0730) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:07:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][130/625] eta 0:02:14 lr 0.000195 wd 0.0500 time 0.2547 (0.2721) data time 0.0008 (0.0047) model time 0.2539 (0.2712) loss 6.4139 (5.6259) grad_norm 1.6326 (3.0155) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:07:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][140/625] eta 0:02:13 lr 0.000195 wd 0.0500 time 0.2560 (0.2752) data time 0.0007 (0.0044) model time 0.2554 (0.2760) loss 5.4351 (5.6150) grad_norm 2.9650 (2.9791) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:07:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][150/625] eta 0:02:10 lr 0.000195 wd 0.0500 time 0.2577 (0.2748) data time 0.0006 (0.0042) model time 0.2571 (0.2753) loss 5.4041 (5.6148) grad_norm 2.6485 (2.9763) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:07:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][160/625] eta 0:02:07 lr 0.000195 wd 0.0500 time 0.2659 (0.2751) data time 0.0009 (0.0040) model time 0.2650 (0.2756) loss 5.9829 (5.6296) grad_norm 1.2719 (2.9252) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:07:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][170/625] eta 0:02:04 lr 0.000195 wd 0.0500 time 0.2553 (0.2740) data time 0.0008 (0.0038) model time 0.2545 (0.2739) loss 5.2676 (5.6246) grad_norm 1.7899 (2.9236) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:07:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][180/625] eta 0:02:01 lr 0.000194 wd 0.0500 time 0.2522 (0.2729) data time 0.0007 (0.0036) model time 0.2515 (0.2724) loss 6.9509 (5.6391) grad_norm 5.0703 (2.9578) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:07:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][190/625] eta 0:01:58 lr 0.000194 wd 0.0500 time 0.2551 (0.2721) data time 0.0008 (0.0035) model time 0.2543 (0.2711) loss 5.6385 (5.6552) grad_norm 2.2416 (2.9425) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:07:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][200/625] eta 0:01:55 lr 0.000194 wd 0.0500 time 0.2584 (0.2713) data time 0.0006 (0.0034) model time 0.2578 (0.2701) loss 5.1230 (5.6449) grad_norm 2.7328 (2.9053) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:07:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][210/625] eta 0:01:52 lr 0.000194 wd 0.0500 time 0.2514 (0.2715) data time 0.0011 (0.0032) model time 0.2504 (0.2703) loss 5.2829 (5.6485) grad_norm 2.1286 (2.8681) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:07:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][220/625] eta 0:01:49 lr 0.000194 wd 0.0500 time 0.2574 (0.2708) data time 0.0008 (0.0031) model time 0.2565 (0.2695) loss 5.4397 (5.6424) grad_norm 1.8296 (2.8466) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:07:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][230/625] eta 0:01:46 lr 0.000194 wd 0.0500 time 0.2570 (0.2702) data time 0.0012 (0.0030) model time 0.2558 (0.2687) loss 4.0629 (5.6310) grad_norm 1.9760 (2.8236) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:08:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][240/625] eta 0:01:44 lr 0.000194 wd 0.0500 time 0.2575 (0.2704) data time 0.0008 (0.0029) model time 0.2567 (0.2690) loss 5.1150 (5.6284) grad_norm 1.8924 (2.7865) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:08:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][250/625] eta 0:01:41 lr 0.000194 wd 0.0500 time 0.2566 (0.2698) data time 0.0010 (0.0029) model time 0.2556 (0.2683) loss 6.2831 (5.6340) grad_norm 6.3883 (2.7827) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:08:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][260/625] eta 0:01:38 lr 0.000194 wd 0.0500 time 0.2547 (0.2693) data time 0.0006 (0.0028) model time 0.2540 (0.2677) loss 4.2361 (5.6346) grad_norm 2.2390 (2.7670) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:08:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][270/625] eta 0:01:35 lr 0.000193 wd 0.0500 time 0.2544 (0.2687) data time 0.0011 (0.0027) model time 0.2533 (0.2671) loss 6.6955 (5.6395) grad_norm 2.5723 (2.7747) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:08:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][280/625] eta 0:01:32 lr 0.000193 wd 0.0500 time 0.2538 (0.2683) data time 0.0008 (0.0027) model time 0.2530 (0.2665) loss 6.1597 (5.6521) grad_norm 2.0593 (2.7581) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:08:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][290/625] eta 0:01:29 lr 0.000193 wd 0.0500 time 0.2562 (0.2678) data time 0.0009 (0.0026) model time 0.2553 (0.2660) loss 6.6676 (5.6486) grad_norm 1.8668 (2.7505) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:08:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][300/625] eta 0:01:26 lr 0.000193 wd 0.0500 time 0.2599 (0.2675) data time 0.0007 (0.0025) model time 0.2592 (0.2657) loss 4.8186 (5.6478) grad_norm 3.7310 (2.7525) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:08:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][310/625] eta 0:01:24 lr 0.000193 wd 0.0500 time 0.2570 (0.2671) data time 0.0006 (0.0025) model time 0.2563 (0.2652) loss 4.7604 (5.6563) grad_norm 2.5926 (2.7570) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:08:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][320/625] eta 0:01:21 lr 0.000193 wd 0.0500 time 0.2510 (0.2667) data time 0.0008 (0.0024) model time 0.2502 (0.2648) loss 4.4852 (5.6482) grad_norm 2.1792 (2.7350) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:08:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][330/625] eta 0:01:18 lr 0.000193 wd 0.0500 time 0.2571 (0.2664) data time 0.0009 (0.0024) model time 0.2562 (0.2645) loss 6.4384 (5.6456) grad_norm 2.0084 (2.7179) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:08:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][340/625] eta 0:01:15 lr 0.000193 wd 0.0500 time 0.2557 (0.2661) data time 0.0007 (0.0024) model time 0.2550 (0.2642) loss 5.2670 (5.6449) grad_norm 2.8414 (2.7072) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:08:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][350/625] eta 0:01:13 lr 0.000193 wd 0.0500 time 0.2607 (0.2667) data time 0.0007 (0.0023) model time 0.2601 (0.2648) loss 4.7849 (5.6510) grad_norm 2.2603 (2.7041) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:08:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][360/625] eta 0:01:10 lr 0.000193 wd 0.0500 time 0.2576 (0.2664) data time 0.0017 (0.0023) model time 0.2560 (0.2645) loss 5.3540 (5.6369) grad_norm 1.6450 (2.6917) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:08:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][370/625] eta 0:01:07 lr 0.000192 wd 0.0500 time 0.2549 (0.2666) data time 0.0006 (0.0022) model time 0.2543 (0.2648) loss 4.2018 (5.6301) grad_norm 2.4057 (2.6947) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:08:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][380/625] eta 0:01:05 lr 0.000192 wd 0.0500 time 0.2592 (0.2663) data time 0.0009 (0.0022) model time 0.2584 (0.2645) loss 4.8606 (5.6243) grad_norm 1.8457 (2.6946) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:08:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][390/625] eta 0:01:02 lr 0.000192 wd 0.0500 time 0.2574 (0.2661) data time 0.0011 (0.0022) model time 0.2563 (0.2643) loss 5.6331 (5.6161) grad_norm 1.7182 (2.7761) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:08:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][400/625] eta 0:00:59 lr 0.000192 wd 0.0500 time 0.2562 (0.2659) data time 0.0010 (0.0021) model time 0.2552 (0.2640) loss 5.0783 (5.6272) grad_norm 3.0329 (2.7671) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:08:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][410/625] eta 0:00:57 lr 0.000192 wd 0.0500 time 0.2616 (0.2656) data time 0.0012 (0.0021) model time 0.2604 (0.2638) loss 5.7337 (5.6183) grad_norm 3.6507 (2.7545) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:08:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][420/625] eta 0:00:54 lr 0.000192 wd 0.0500 time 0.2553 (0.2654) data time 0.0009 (0.0021) model time 0.2545 (0.2636) loss 4.7824 (5.6279) grad_norm 2.5664 (2.7415) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:08:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][430/625] eta 0:00:51 lr 0.000192 wd 0.0500 time 0.2567 (0.2652) data time 0.0009 (0.0021) model time 0.2558 (0.2633) loss 5.7949 (5.6334) grad_norm 2.1113 (2.7370) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:08:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][440/625] eta 0:00:49 lr 0.000192 wd 0.0500 time 0.2567 (0.2650) data time 0.0009 (0.0020) model time 0.2557 (0.2631) loss 6.1623 (5.6354) grad_norm 2.5303 (2.7203) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:08:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][450/625] eta 0:00:46 lr 0.000192 wd 0.0500 time 0.2601 (0.2648) data time 0.0006 (0.0020) model time 0.2595 (0.2629) loss 5.5063 (5.6391) grad_norm 1.7070 (2.7078) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:08:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][460/625] eta 0:00:43 lr 0.000192 wd 0.0500 time 0.2569 (0.2648) data time 0.0008 (0.0020) model time 0.2560 (0.2630) loss 4.5086 (5.6310) grad_norm 1.6367 (2.6983) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:09:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][470/625] eta 0:00:41 lr 0.000191 wd 0.0500 time 0.2546 (0.2646) data time 0.0009 (0.0020) model time 0.2538 (0.2628) loss 5.6188 (5.6342) grad_norm 1.9802 (2.7016) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:09:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][480/625] eta 0:00:38 lr 0.000191 wd 0.0500 time 0.2544 (0.2645) data time 0.0008 (0.0020) model time 0.2535 (0.2626) loss 5.9287 (5.6429) grad_norm 2.8878 (2.6921) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:09:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][490/625] eta 0:00:35 lr 0.000191 wd 0.0500 time 0.2602 (0.2643) data time 0.0008 (0.0019) model time 0.2593 (0.2625) loss 6.5219 (5.6495) grad_norm 2.4617 (2.6993) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:09:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][500/625] eta 0:00:33 lr 0.000191 wd 0.0500 time 0.2605 (0.2646) data time 0.0008 (0.0019) model time 0.2597 (0.2628) loss 5.0775 (5.6520) grad_norm 2.6433 (2.6955) loss_scale 512.0000 (258.0439) mem 9655MB
[2024-08-04 09:09:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][510/625] eta 0:00:30 lr 0.000191 wd 0.0500 time 0.2552 (0.2644) data time 0.0007 (0.0019) model time 0.2545 (0.2626) loss 6.5905 (5.6514) grad_norm 2.9519 (2.6891) loss_scale 512.0000 (263.0137) mem 9655MB
[2024-08-04 09:09:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][520/625] eta 0:00:27 lr 0.000191 wd 0.0500 time 0.2520 (0.2642) data time 0.0011 (0.0019) model time 0.2509 (0.2625) loss 6.1686 (5.6541) grad_norm 2.9078 (2.6848) loss_scale 512.0000 (267.7927) mem 9655MB
[2024-08-04 09:09:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][530/625] eta 0:00:25 lr 0.000191 wd 0.0500 time 0.2528 (0.2641) data time 0.0009 (0.0019) model time 0.2519 (0.2623) loss 5.6176 (5.6496) grad_norm 3.1659 (2.6956) loss_scale 512.0000 (272.3917) mem 9655MB
[2024-08-04 09:09:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][540/625] eta 0:00:22 lr 0.000191 wd 0.0500 time 0.2608 (0.2644) data time 0.0006 (0.0018) model time 0.2602 (0.2626) loss 5.7180 (5.6495) grad_norm 2.1559 (2.6994) loss_scale 512.0000 (276.8207) mem 9655MB
[2024-08-04 09:09:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][550/625] eta 0:00:19 lr 0.000191 wd 0.0500 time 0.2592 (0.2644) data time 0.0009 (0.0018) model time 0.2583 (0.2627) loss 5.2066 (5.6388) grad_norm 1.9535 (2.6981) loss_scale 512.0000 (281.0889) mem 9655MB
[2024-08-04 09:09:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][560/625] eta 0:00:17 lr 0.000191 wd 0.0500 time 0.2578 (0.2645) data time 0.0011 (0.0018) model time 0.2567 (0.2628) loss 4.6452 (5.6405) grad_norm 2.0261 (2.6877) loss_scale 512.0000 (285.2050) mem 9655MB
[2024-08-04 09:09:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][570/625] eta 0:00:14 lr 0.000190 wd 0.0500 time 0.2558 (0.2644) data time 0.0007 (0.0018) model time 0.2550 (0.2627) loss 6.4136 (5.6407) grad_norm 2.6062 (2.6896) loss_scale 512.0000 (289.1769) mem 9655MB
[2024-08-04 09:09:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][580/625] eta 0:00:11 lr 0.000190 wd 0.0500 time 0.2679 (0.2643) data time 0.0008 (0.0018) model time 0.2671 (0.2626) loss 5.0827 (5.6382) grad_norm 7.7769 (2.7069) loss_scale 512.0000 (293.0120) mem 9655MB
[2024-08-04 09:09:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][590/625] eta 0:00:09 lr 0.000190 wd 0.0500 time 0.2538 (0.2642) data time 0.0008 (0.0018) model time 0.2530 (0.2625) loss 4.7709 (5.6419) grad_norm 3.7827 (2.7114) loss_scale 512.0000 (296.7174) mem 9655MB
[2024-08-04 09:09:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][600/625] eta 0:00:06 lr 0.000190 wd 0.0500 time 0.2540 (0.2640) data time 0.0013 (0.0018) model time 0.2527 (0.2623) loss 5.9691 (5.6420) grad_norm 13.4270 (2.7257) loss_scale 512.0000 (300.2995) mem 9655MB
[2024-08-04 09:09:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][610/625] eta 0:00:03 lr 0.000190 wd 0.0500 time 0.2536 (0.2639) data time 0.0004 (0.0018) model time 0.2533 (0.2622) loss 5.5539 (5.6429) grad_norm 6.1979 (2.7314) loss_scale 512.0000 (303.7643) mem 9655MB
[2024-08-04 09:09:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][620/625] eta 0:00:01 lr 0.000190 wd 0.0500 time 0.2532 (0.2639) data time 0.0005 (0.0017) model time 0.2527 (0.2622) loss 5.4575 (5.6412) grad_norm 3.9840 (2.7351) loss_scale 512.0000 (307.1176) mem 9655MB
[2024-08-04 09:09:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 246 training takes 0:02:44
[2024-08-04 09:09:41 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 09:09:41 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 09:09:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.460 (0.460) Loss 0.5894 (0.5894) Acc@1 90.186 (90.186) Acc@5 98.730 (98.730) Mem 9655MB
[2024-08-04 09:09:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 0.9077 (0.7136) Acc@1 81.055 (86.741) Acc@5 96.143 (97.696) Mem 9655MB
[2024-08-04 09:09:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0215 (0.8325) Acc@1 77.832 (83.738) Acc@5 95.703 (96.563) Mem 9655MB
[2024-08-04 09:09:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.441 Acc@5 96.555
[2024-08-04 09:09:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.4%
[2024-08-04 09:09:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.44%
[2024-08-04 09:09:43 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-08-04 09:09:43 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-08-04 09:09:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.486 (0.486) Loss 0.5830 (0.5830) Acc@1 89.941 (89.941) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 09:09:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 0.9087 (0.7096) Acc@1 80.713 (86.661) Acc@5 96.191 (97.745) Mem 9655MB
[2024-08-04 09:09:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0176 (0.8309) Acc@1 77.686 (83.498) Acc@5 95.215 (96.519) Mem 9655MB
[2024-08-04 09:09:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.199 Acc@5 96.515
[2024-08-04 09:09:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.2%
[2024-08-04 09:09:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.20%
[2024-08-04 09:09:45 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 09:09:46 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 09:09:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][0/625] eta 0:07:57 lr 0.000190 wd 0.0500 time 0.7643 (0.7643) data time 0.5189 (0.5189) model time 0.0000 (0.0000) loss 6.1838 (6.1838) grad_norm 3.7083 (3.7083) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:09:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][10/625] eta 0:03:05 lr 0.000190 wd 0.0500 time 0.2553 (0.3022) data time 0.0011 (0.0481) model time 0.0000 (0.0000) loss 5.4699 (5.6589) grad_norm 3.1732 (3.3950) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:09:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][20/625] eta 0:02:49 lr 0.000190 wd 0.0500 time 0.2538 (0.2805) data time 0.0008 (0.0256) model time 0.0000 (0.0000) loss 5.4907 (5.7745) grad_norm 3.5998 (3.7948) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:09:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][30/625] eta 0:02:46 lr 0.000190 wd 0.0500 time 0.2581 (0.2793) data time 0.0007 (0.0176) model time 0.0000 (0.0000) loss 5.5702 (5.7300) grad_norm 2.6693 (3.4967) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:09:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][40/625] eta 0:02:41 lr 0.000190 wd 0.0500 time 0.2563 (0.2763) data time 0.0007 (0.0135) model time 0.0000 (0.0000) loss 5.1134 (5.7714) grad_norm 2.4826 (3.2867) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:10:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][50/625] eta 0:02:38 lr 0.000189 wd 0.0500 time 0.2492 (0.2761) data time 0.0008 (0.0111) model time 0.0000 (0.0000) loss 4.9539 (5.7080) grad_norm 1.5651 (3.1300) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:10:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][60/625] eta 0:02:34 lr 0.000189 wd 0.0500 time 0.2561 (0.2730) data time
0.0007 (0.0094) model time 0.2554 (0.2566) loss 5.5646 (5.7091) grad_norm 3.3945 (3.0414) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:10:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][70/625] eta 0:02:30 lr 0.000189 wd 0.0500 time 0.2581 (0.2709) data time 0.0008 (0.0082) model time 0.2573 (0.2567) loss 6.3471 (5.7128) grad_norm 1.5358 (2.9378) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:10:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][80/625] eta 0:02:26 lr 0.000189 wd 0.0500 time 0.2643 (0.2691) data time 0.0008 (0.0073) model time 0.2636 (0.2563) loss 5.8176 (5.7416) grad_norm 2.8067 (2.8682) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:10:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][90/625] eta 0:02:23 lr 0.000189 wd 0.0500 time 0.2539 (0.2676) data time 0.0007 (0.0066) model time 0.2532 (0.2559) loss 5.6762 (5.7364) grad_norm 1.7163 (2.8047) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:10:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][100/625] eta 0:02:19 lr 0.000189 wd 0.0500 time 0.2519 (0.2666) data time 0.0009 (0.0060) model time 0.2510 (0.2560) loss 5.9159 (5.7424) grad_norm 1.7684 (2.7480) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:10:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][110/625] eta 0:02:17 lr 0.000189 wd 0.0500 time 0.2602 (0.2675) data time 0.0006 (0.0056) model time 0.2595 (0.2594) loss 6.1921 (5.7444) grad_norm 8.6580 (2.7391) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:10:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][120/625] eta 0:02:14 lr 0.000189 wd 0.0500 time 0.2593 (0.2665) data time 0.0008 (0.0052) model time 0.2585 (0.2587) loss 5.6155 (5.7380) grad_norm 2.0485 (2.6758) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:10:21 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][130/625] eta 0:02:11 lr 0.000189 wd 0.0500 time 0.2615 (0.2657) data time 0.0013 (0.0049) model time 0.2602 (0.2583) loss 5.8872 (5.7203) grad_norm 4.0217 (2.6519) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:10:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][140/625] eta 0:02:09 lr 0.000189 wd 0.0500 time 0.2611 (0.2665) data time 0.0009 (0.0046) model time 0.2602 (0.2602) loss 6.3803 (5.7113) grad_norm 2.0969 (2.6784) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:10:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][150/625] eta 0:02:06 lr 0.000188 wd 0.0500 time 0.2565 (0.2667) data time 0.0007 (0.0043) model time 0.2558 (0.2610) loss 4.3477 (5.6928) grad_norm 1.7878 (2.6376) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:10:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][160/625] eta 0:02:03 lr 0.000188 wd 0.0500 time 0.2580 (0.2660) data time 0.0008 (0.0041) model time 0.2572 (0.2604) loss 6.3385 (5.7149) grad_norm 3.5791 (2.6351) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:10:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][170/625] eta 0:02:00 lr 0.000188 wd 0.0500 time 0.2550 (0.2654) data time 0.0007 (0.0039) model time 0.2543 (0.2600) loss 5.1304 (5.7212) grad_norm 2.1935 (2.6224) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:10:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][180/625] eta 0:01:58 lr 0.000188 wd 0.0500 time 0.2625 (0.2657) data time 0.0006 (0.0038) model time 0.2619 (0.2607) loss 5.8813 (5.7366) grad_norm 3.9758 (2.7612) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:10:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][190/625] eta 0:01:55 lr 0.000188 wd 0.0500 time 0.2567 (0.2661) data time 0.0008 (0.0036) 
model time 0.2558 (0.2616) loss 4.2627 (5.7329) grad_norm 2.0848 (2.7751) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:10:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][200/625] eta 0:01:52 lr 0.000188 wd 0.0500 time 0.2590 (0.2656) data time 0.0007 (0.0035) model time 0.2583 (0.2612) loss 4.6056 (5.7085) grad_norm 1.7846 (2.7584) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:10:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][210/625] eta 0:01:50 lr 0.000188 wd 0.0500 time 0.2548 (0.2658) data time 0.0009 (0.0034) model time 0.2539 (0.2616) loss 4.4467 (5.6941) grad_norm 2.0264 (2.7344) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:10:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][220/625] eta 0:01:47 lr 0.000188 wd 0.0500 time 0.2572 (0.2653) data time 0.0006 (0.0033) model time 0.2565 (0.2611) loss 6.4940 (5.6966) grad_norm 3.3725 (2.7227) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:10:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][230/625] eta 0:01:44 lr 0.000188 wd 0.0500 time 0.2564 (0.2657) data time 0.0008 (0.0032) model time 0.2556 (0.2618) loss 4.9194 (5.6892) grad_norm 1.4835 (2.7108) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:10:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][240/625] eta 0:01:42 lr 0.000188 wd 0.0500 time 0.2722 (0.2654) data time 0.0007 (0.0031) model time 0.2715 (0.2616) loss 6.1095 (5.6926) grad_norm 2.2477 (2.6902) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:10:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][250/625] eta 0:01:39 lr 0.000187 wd 0.0500 time 0.2547 (0.2658) data time 0.0010 (0.0030) model time 0.2537 (0.2622) loss 4.6552 (5.6912) grad_norm 2.0080 (2.6932) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:10:55 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [247/300][260/625] eta 0:01:36 lr 0.000187 wd 0.0500 time 0.2560 (0.2654) data time 0.0011 (0.0029) model time 0.2549 (0.2618) loss 5.4590 (5.6927) grad_norm 2.5282 (2.6936) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:10:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][270/625] eta 0:01:34 lr 0.000187 wd 0.0500 time 0.2573 (0.2658) data time 0.0007 (0.0029) model time 0.2566 (0.2625) loss 5.2515 (5.6906) grad_norm 2.3422 (2.6960) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:11:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][280/625] eta 0:01:31 lr 0.000187 wd 0.0500 time 0.2578 (0.2654) data time 0.0009 (0.0028) model time 0.2569 (0.2621) loss 5.8068 (5.6819) grad_norm 2.8420 (2.6858) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:11:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][290/625] eta 0:01:29 lr 0.000187 wd 0.0500 time 0.2573 (0.2658) data time 0.0006 (0.0027) model time 0.2567 (0.2627) loss 6.2274 (5.6770) grad_norm 1.5367 (2.6731) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:11:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][300/625] eta 0:01:26 lr 0.000187 wd 0.0500 time 0.2690 (0.2666) data time 0.0008 (0.0027) model time 0.2682 (0.2637) loss 5.3688 (5.6668) grad_norm 2.4550 (2.6606) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:11:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][310/625] eta 0:01:23 lr 0.000187 wd 0.0500 time 0.2613 (0.2662) data time 0.0009 (0.0026) model time 0.2604 (0.2634) loss 5.4118 (5.6600) grad_norm 2.3602 (2.6582) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:11:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][320/625] eta 0:01:21 lr 0.000187 wd 0.0500 time 0.2539 (0.2659) data time 0.0010 (0.0026) model time 0.2529 (0.2630) 
loss 6.2774 (5.6588) grad_norm 2.7073 (2.6498) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:11:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][330/625] eta 0:01:18 lr 0.000187 wd 0.0500 time 0.4371 (0.2668) data time 0.0009 (0.0025) model time 0.4362 (0.2641) loss 6.3255 (5.6577) grad_norm 4.1943 (2.6494) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:11:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][340/625] eta 0:01:15 lr 0.000187 wd 0.0500 time 0.2599 (0.2664) data time 0.0011 (0.0025) model time 0.2589 (0.2638) loss 5.9453 (5.6610) grad_norm 1.5971 (2.6483) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:11:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][350/625] eta 0:01:13 lr 0.000186 wd 0.0500 time 0.2611 (0.2661) data time 0.0008 (0.0024) model time 0.2603 (0.2635) loss 6.2398 (5.6575) grad_norm 2.1436 (2.6617) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:11:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][360/625] eta 0:01:10 lr 0.000186 wd 0.0500 time 0.2555 (0.2659) data time 0.0008 (0.0024) model time 0.2547 (0.2632) loss 6.2110 (5.6575) grad_norm 3.9737 (2.6964) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:11:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][370/625] eta 0:01:07 lr 0.000186 wd 0.0500 time 0.2538 (0.2661) data time 0.0009 (0.0023) model time 0.2529 (0.2636) loss 4.9958 (5.6528) grad_norm 2.5758 (2.7440) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:11:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][380/625] eta 0:01:05 lr 0.000186 wd 0.0500 time 0.2551 (0.2659) data time 0.0008 (0.0023) model time 0.2543 (0.2633) loss 5.9459 (5.6586) grad_norm 2.0700 (2.7538) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:11:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [247/300][390/625] eta 0:01:02 lr 0.000186 wd 0.0500 time 0.2549 (0.2656) data time 0.0008 (0.0023) model time 0.2541 (0.2631) loss 4.8916 (5.6612) grad_norm 2.4248 (2.7521) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:11:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][400/625] eta 0:00:59 lr 0.000186 wd 0.0500 time 0.2576 (0.2654) data time 0.0009 (0.0022) model time 0.2567 (0.2629) loss 5.5822 (5.6732) grad_norm 2.2077 (2.7887) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:11:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][410/625] eta 0:00:57 lr 0.000186 wd 0.0500 time 0.4356 (0.2656) data time 0.0007 (0.0022) model time 0.4349 (0.2631) loss 6.1908 (5.6752) grad_norm 2.8597 (2.7931) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:11:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][420/625] eta 0:00:54 lr 0.000186 wd 0.0500 time 0.2562 (0.2658) data time 0.0008 (0.0022) model time 0.2553 (0.2635) loss 6.0225 (5.6800) grad_norm 3.9160 (2.8362) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:11:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][430/625] eta 0:00:51 lr 0.000186 wd 0.0500 time 0.2594 (0.2656) data time 0.0009 (0.0021) model time 0.2585 (0.2633) loss 5.5566 (5.6827) grad_norm 2.9381 (2.8247) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:11:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][440/625] eta 0:00:49 lr 0.000186 wd 0.0500 time 0.2510 (0.2654) data time 0.0007 (0.0021) model time 0.2504 (0.2630) loss 6.3101 (5.6824) grad_norm 1.9772 (2.8135) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:11:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][450/625] eta 0:00:46 lr 0.000185 wd 0.0500 time 0.2597 (0.2652) data time 0.0010 (0.0021) model time 0.2587 (0.2629) loss 6.1086 (5.6782) grad_norm 
2.4225 (2.8084) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:11:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][460/625] eta 0:00:43 lr 0.000185 wd 0.0500 time 0.2530 (0.2650) data time 0.0008 (0.0021) model time 0.2522 (0.2627) loss 6.1375 (5.6803) grad_norm 2.0412 (2.8174) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:11:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][470/625] eta 0:00:41 lr 0.000185 wd 0.0500 time 0.2538 (0.2652) data time 0.0009 (0.0020) model time 0.2529 (0.2630) loss 5.9592 (5.6781) grad_norm 4.0928 (2.8128) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:11:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][480/625] eta 0:00:38 lr 0.000185 wd 0.0500 time 0.2555 (0.2650) data time 0.0010 (0.0020) model time 0.2545 (0.2628) loss 5.5412 (5.6773) grad_norm 2.1066 (2.8212) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:11:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][490/625] eta 0:00:35 lr 0.000185 wd 0.0500 time 0.2589 (0.2648) data time 0.0007 (0.0020) model time 0.2581 (0.2626) loss 6.2546 (5.6803) grad_norm 3.7914 (2.8474) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:11:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][500/625] eta 0:00:33 lr 0.000185 wd 0.0500 time 0.2600 (0.2647) data time 0.0009 (0.0020) model time 0.2591 (0.2624) loss 6.6255 (5.6828) grad_norm 1.8937 (2.8348) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:12:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][510/625] eta 0:00:30 lr 0.000185 wd 0.0500 time 0.2552 (0.2645) data time 0.0008 (0.0019) model time 0.2544 (0.2623) loss 5.3015 (5.6801) grad_norm 2.0440 (2.8266) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:12:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][520/625] eta 
0:00:27 lr 0.000185 wd 0.0500 time 0.2477 (0.2644) data time 0.0006 (0.0019) model time 0.2471 (0.2622) loss 6.8953 (5.6871) grad_norm 2.6169 (2.8157) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:12:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][530/625] eta 0:00:25 lr 0.000185 wd 0.0500 time 0.2628 (0.2642) data time 0.0008 (0.0019) model time 0.2620 (0.2621) loss 4.9084 (5.6890) grad_norm 1.7354 (inf) loss_scale 256.0000 (510.5537) mem 9655MB [2024-08-04 09:12:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][540/625] eta 0:00:22 lr 0.000185 wd 0.0500 time 0.2604 (0.2645) data time 0.0006 (0.0019) model time 0.2598 (0.2623) loss 5.2677 (5.6926) grad_norm 2.3593 (inf) loss_scale 256.0000 (505.8484) mem 9655MB [2024-08-04 09:12:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][550/625] eta 0:00:19 lr 0.000185 wd 0.0500 time 0.2563 (0.2643) data time 0.0008 (0.0019) model time 0.2555 (0.2622) loss 5.9657 (5.6893) grad_norm 1.7965 (inf) loss_scale 256.0000 (501.3140) mem 9655MB [2024-08-04 09:12:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][560/625] eta 0:00:17 lr 0.000184 wd 0.0500 time 0.2549 (0.2642) data time 0.0009 (0.0018) model time 0.2539 (0.2621) loss 5.5583 (5.6917) grad_norm 2.2803 (inf) loss_scale 256.0000 (496.9412) mem 9655MB [2024-08-04 09:12:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][570/625] eta 0:00:14 lr 0.000184 wd 0.0500 time 0.2594 (0.2641) data time 0.0007 (0.0018) model time 0.2587 (0.2620) loss 5.6572 (5.6868) grad_norm 3.0887 (inf) loss_scale 256.0000 (492.7215) mem 9655MB [2024-08-04 09:12:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][580/625] eta 0:00:11 lr 0.000184 wd 0.0500 time 0.2595 (0.2639) data time 0.0007 (0.0018) model time 0.2588 (0.2618) loss 5.8800 (5.6861) grad_norm 2.1279 (inf) loss_scale 256.0000 (488.6472) mem 
9655MB [2024-08-04 09:12:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][590/625] eta 0:00:09 lr 0.000184 wd 0.0500 time 0.2536 (0.2638) data time 0.0008 (0.0018) model time 0.2528 (0.2617) loss 5.8250 (5.6840) grad_norm 2.7705 (inf) loss_scale 256.0000 (484.7107) mem 9655MB [2024-08-04 09:12:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][600/625] eta 0:00:06 lr 0.000184 wd 0.0500 time 0.2590 (0.2638) data time 0.0006 (0.0018) model time 0.2584 (0.2617) loss 5.3071 (5.6809) grad_norm 6.1732 (inf) loss_scale 256.0000 (480.9052) mem 9655MB [2024-08-04 09:12:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][610/625] eta 0:00:03 lr 0.000184 wd 0.0500 time 0.2540 (0.2640) data time 0.0004 (0.0018) model time 0.2535 (0.2619) loss 5.5339 (5.6823) grad_norm 1.7463 (inf) loss_scale 256.0000 (477.2242) mem 9655MB [2024-08-04 09:12:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][620/625] eta 0:00:01 lr 0.000184 wd 0.0500 time 0.2553 (0.2638) data time 0.0005 (0.0018) model time 0.2548 (0.2618) loss 5.9113 (5.6823) grad_norm 4.3085 (inf) loss_scale 256.0000 (473.6618) mem 9655MB [2024-08-04 09:12:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 247 training takes 0:02:44 [2024-08-04 09:12:31 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 09:12:31 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 09:12:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.467 (0.467) Loss 0.5874 (0.5874) Acc@1 90.283 (90.283) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 09:12:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.094) Loss 0.8999 (0.7124) Acc@1 81.104 (86.719) Acc@5 96.631 (97.812) Mem 9655MB [2024-08-04 09:12:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0127 (0.8308) Acc@1 77.344 (83.575) Acc@5 95.361 (96.608) Mem 9655MB [2024-08-04 09:12:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.237 Acc@5 96.607 [2024-08-04 09:12:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.2% [2024-08-04 09:12:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.763 (0.763) Loss 0.5830 (0.5830) Acc@1 89.941 (89.941) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 09:12:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.125) Loss 0.9077 (0.7092) Acc@1 80.811 (86.679) Acc@5 96.240 (97.745) Mem 9655MB [2024-08-04 09:12:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0176 (0.8306) Acc@1 77.686 (83.496) Acc@5 95.312 (96.526) Mem 9655MB [2024-08-04 09:12:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.197 Acc@5 96.521 [2024-08-04 09:12:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.2% [2024-08-04 09:12:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][0/625] eta 0:10:52 lr 0.000184 wd 0.0500 time 1.0439 (1.0439) data time 0.7417 (0.7417) model time 0.0000 (0.0000) loss 5.3775 (5.3775) grad_norm 1.5207 (1.5207) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:12:39 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [248/300][10/625] eta 0:03:22 lr 0.000184 wd 0.0500 time 0.2490 (0.3286) data time 0.0007 (0.0684) model time 0.0000 (0.0000) loss 6.2539 (5.5848) grad_norm 2.4109 (2.6559) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:12:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][20/625] eta 0:02:57 lr 0.000184 wd 0.0500 time 0.2546 (0.2939) data time 0.0006 (0.0362) model time 0.0000 (0.0000) loss 4.5766 (5.6836) grad_norm 3.1681 (2.5484) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:12:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][30/625] eta 0:02:50 lr 0.000183 wd 0.0500 time 0.2487 (0.2873) data time 0.0010 (0.0248) model time 0.0000 (0.0000) loss 6.3394 (5.6354) grad_norm 4.1003 (2.5796) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:12:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][40/625] eta 0:02:48 lr 0.000183 wd 0.0500 time 0.2520 (0.2885) data time 0.0008 (0.0190) model time 0.0000 (0.0000) loss 5.9488 (5.6128) grad_norm 3.0893 (2.6121) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:12:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][50/625] eta 0:02:44 lr 0.000183 wd 0.0500 time 0.2539 (0.2857) data time 0.0010 (0.0155) model time 0.0000 (0.0000) loss 5.4561 (5.6467) grad_norm 2.1277 (2.7766) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:12:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][60/625] eta 0:02:38 lr 0.000183 wd 0.0500 time 0.2559 (0.2808) data time 0.0009 (0.0131) model time 0.2550 (0.2549) loss 5.6224 (5.6466) grad_norm 2.4609 (2.7328) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:12:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][70/625] eta 0:02:33 lr 0.000183 wd 0.0500 time 0.2605 (0.2774) data time 0.0008 (0.0114) model time 0.2597 (0.2551) loss 
5.8337 (5.6860) grad_norm 2.5144 (2.7400) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:12:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][80/625] eta 0:02:29 lr 0.000183 wd 0.0500 time 0.2768 (0.2750) data time 0.0006 (0.0101) model time 0.2761 (0.2556) loss 5.9338 (5.6579) grad_norm 3.4647 (2.7092) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:13:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][90/625] eta 0:02:25 lr 0.000183 wd 0.0500 time 0.2551 (0.2728) data time 0.0008 (0.0091) model time 0.2543 (0.2553) loss 4.8925 (5.6629) grad_norm 3.4829 (2.7166) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:13:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][100/625] eta 0:02:23 lr 0.000183 wd 0.0500 time 0.2571 (0.2730) data time 0.0007 (0.0083) model time 0.2564 (0.2590) loss 6.1190 (5.6568) grad_norm 4.4601 (2.7191) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:13:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][110/625] eta 0:02:19 lr 0.000183 wd 0.0500 time 0.2544 (0.2715) data time 0.0010 (0.0077) model time 0.2534 (0.2584) loss 5.2128 (5.6474) grad_norm 3.7384 (2.7865) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:13:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][120/625] eta 0:02:16 lr 0.000183 wd 0.0500 time 0.2685 (0.2702) data time 0.0006 (0.0071) model time 0.2679 (0.2580) loss 5.7418 (5.6394) grad_norm 1.7788 (2.8323) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:13:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][130/625] eta 0:02:13 lr 0.000183 wd 0.0500 time 0.2588 (0.2692) data time 0.0009 (0.0066) model time 0.2580 (0.2576) loss 5.8311 (5.6217) grad_norm 2.9368 (2.8353) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:13:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO 
Train: [248/300][140/625] eta 0:02:10 lr 0.000182 wd 0.0500 time 0.2553 (0.2683) data time 0.0008 (0.0062) model time 0.2544 (0.2574) loss 5.2959 (5.6114) grad_norm 2.2405 (2.8382) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:13:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][150/625] eta 0:02:07 lr 0.000182 wd 0.0500 time 0.2544 (0.2674) data time 0.0007 (0.0059) model time 0.2537 (0.2571) loss 4.1692 (5.5876) grad_norm 2.0950 (2.8018) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:13:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][160/625] eta 0:02:04 lr 0.000182 wd 0.0500 time 0.2546 (0.2667) data time 0.0010 (0.0056) model time 0.2536 (0.2570) loss 5.7187 (5.5769) grad_norm 1.9197 (2.7666) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:13:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][170/625] eta 0:02:01 lr 0.000182 wd 0.0500 time 0.2518 (0.2661) data time 0.0011 (0.0053) model time 0.2507 (0.2568) loss 6.5415 (5.5735) grad_norm 2.3243 (2.7521) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:13:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][180/625] eta 0:01:58 lr 0.000182 wd 0.0500 time 0.2599 (0.2657) data time 0.0007 (0.0050) model time 0.2592 (0.2568) loss 4.7087 (5.5729) grad_norm 2.6572 (2.8222) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:13:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][190/625] eta 0:01:55 lr 0.000182 wd 0.0500 time 0.2580 (0.2651) data time 0.0008 (0.0048) model time 0.2571 (0.2567) loss 6.1972 (5.5762) grad_norm 1.8967 (2.8472) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:13:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][200/625] eta 0:01:52 lr 0.000182 wd 0.0500 time 0.2514 (0.2656) data time 0.0011 (0.0046) model time 0.2503 (0.2578) loss 6.2990 (5.5747) grad_norm 
1.7210 (2.8119) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:13:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][210/625] eta 0:01:50 lr 0.000182 wd 0.0500 time 0.2532 (0.2658) data time 0.0007 (0.0045) model time 0.2524 (0.2585) loss 6.2968 (5.5825) grad_norm 1.7142 (2.7727) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:13:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][220/625] eta 0:01:47 lr 0.000182 wd 0.0500 time 0.2529 (0.2654) data time 0.0009 (0.0043) model time 0.2520 (0.2583) loss 5.3549 (5.5790) grad_norm 1.4725 (2.7459) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:13:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][230/625] eta 0:01:44 lr 0.000182 wd 0.0500 time 0.2543 (0.2657) data time 0.0006 (0.0042) model time 0.2537 (0.2591) loss 6.4004 (5.5848) grad_norm 3.4089 (2.7578) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:13:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][240/625] eta 0:01:42 lr 0.000181 wd 0.0500 time 0.2544 (0.2661) data time 0.0009 (0.0040) model time 0.2535 (0.2599) loss 5.7107 (5.5978) grad_norm 1.6404 (2.7783) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:13:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][250/625] eta 0:01:39 lr 0.000181 wd 0.0500 time 0.2568 (0.2661) data time 0.0007 (0.0039) model time 0.2562 (0.2602) loss 6.7078 (5.6087) grad_norm 8.0528 (2.7985) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:13:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][260/625] eta 0:01:36 lr 0.000181 wd 0.0500 time 0.2580 (0.2657) data time 0.0007 (0.0038) model time 0.2573 (0.2599) loss 5.1519 (5.6037) grad_norm 2.2008 (2.8060) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:13:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][270/625] eta 
0:01:34 lr 0.000181 wd 0.0500 time 0.2581 (0.2654) data time 0.0009 (0.0037) model time 0.2572 (0.2598) loss 5.7775 (5.5982) grad_norm 3.5541 (2.7953) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:13:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][280/625] eta 0:01:31 lr 0.000181 wd 0.0500 time 0.2575 (0.2651) data time 0.0008 (0.0036) model time 0.2567 (0.2595) loss 5.8780 (5.5934) grad_norm 5.6700 (2.8388) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:13:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][290/625] eta 0:01:28 lr 0.000181 wd 0.0500 time 0.2567 (0.2652) data time 0.0007 (0.0035) model time 0.2560 (0.2599) loss 4.8632 (5.5980) grad_norm 2.3665 (2.8206) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:13:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][300/625] eta 0:01:26 lr 0.000181 wd 0.0500 time 0.2553 (0.2649) data time 0.0007 (0.0034) model time 0.2546 (0.2597) loss 5.8864 (5.5961) grad_norm 6.0104 (2.8300) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:13:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][310/625] eta 0:01:23 lr 0.000181 wd 0.0500 time 0.2586 (0.2651) data time 0.0011 (0.0033) model time 0.2575 (0.2601) loss 4.5273 (5.5919) grad_norm 4.4050 (2.8497) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:14:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][320/625] eta 0:01:20 lr 0.000181 wd 0.0500 time 0.4647 (0.2655) data time 0.0007 (0.0033) model time 0.4640 (0.2607) loss 6.4241 (5.5910) grad_norm 2.9579 (2.9030) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:14:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][330/625] eta 0:01:18 lr 0.000181 wd 0.0500 time 0.2560 (0.2652) data time 0.0009 (0.0032) model time 0.2551 (0.2605) loss 5.8189 (5.5935) grad_norm 3.0003 (2.8894) loss_scale 
256.0000 (256.0000) mem 9655MB [2024-08-04 09:14:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][340/625] eta 0:01:15 lr 0.000180 wd 0.0500 time 0.2696 (0.2654) data time 0.0010 (0.0031) model time 0.2686 (0.2609) loss 5.0445 (5.5990) grad_norm 3.3112 (2.9047) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:14:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][350/625] eta 0:01:13 lr 0.000180 wd 0.0500 time 0.2547 (0.2656) data time 0.0008 (0.0031) model time 0.2539 (0.2613) loss 4.7804 (5.5916) grad_norm 2.2685 (2.8861) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:14:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][360/625] eta 0:01:10 lr 0.000180 wd 0.0500 time 0.2555 (0.2658) data time 0.0007 (0.0030) model time 0.2548 (0.2616) loss 5.3285 (5.5888) grad_norm 1.4428 (2.8758) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:14:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][370/625] eta 0:01:07 lr 0.000180 wd 0.0500 time 0.2541 (0.2655) data time 0.0009 (0.0029) model time 0.2531 (0.2613) loss 5.5680 (5.5886) grad_norm 1.9164 (2.8558) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:14:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][380/625] eta 0:01:05 lr 0.000180 wd 0.0500 time 0.3993 (0.2656) data time 0.0007 (0.0029) model time 0.3986 (0.2616) loss 5.2283 (5.5887) grad_norm 1.9050 (2.8634) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:14:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][390/625] eta 0:01:02 lr 0.000180 wd 0.0500 time 0.2554 (0.2654) data time 0.0006 (0.0028) model time 0.2548 (0.2614) loss 6.3859 (5.5824) grad_norm 4.1406 (2.8694) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:14:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][400/625] eta 0:00:59 lr 0.000180 wd 
0.0500 time 0.2563 (0.2652) data time 0.0008 (0.0028) model time 0.2555 (0.2613) loss 4.4525 (5.5765) grad_norm 1.8345 (2.8582) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:14:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][410/625] eta 0:00:57 lr 0.000180 wd 0.0500 time 0.4511 (0.2655) data time 0.0010 (0.0027) model time 0.4500 (0.2617) loss 5.7269 (5.5764) grad_norm 1.9307 (2.8567) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:14:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][420/625] eta 0:00:54 lr 0.000180 wd 0.0500 time 0.2520 (0.2653) data time 0.0007 (0.0027) model time 0.2513 (0.2615) loss 6.0718 (5.5840) grad_norm 1.4737 (2.8502) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:14:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][430/625] eta 0:00:51 lr 0.000180 wd 0.0500 time 0.2544 (0.2651) data time 0.0008 (0.0027) model time 0.2535 (0.2614) loss 5.4701 (5.5916) grad_norm 2.9411 (2.8373) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:14:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][440/625] eta 0:00:49 lr 0.000179 wd 0.0500 time 0.2520 (0.2653) data time 0.0011 (0.0026) model time 0.2510 (0.2617) loss 5.1003 (5.5926) grad_norm 3.4249 (2.8244) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:14:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][450/625] eta 0:00:46 lr 0.000179 wd 0.0500 time 0.2584 (0.2651) data time 0.0007 (0.0026) model time 0.2576 (0.2616) loss 5.0979 (5.5948) grad_norm 3.5667 (2.8191) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:14:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][460/625] eta 0:00:43 lr 0.000179 wd 0.0500 time 0.2559 (0.2656) data time 0.0008 (0.0025) model time 0.2551 (0.2622) loss 6.3037 (5.5897) grad_norm 3.1052 (2.8139) loss_scale 256.0000 (256.0000) mem 9655MB 
[2024-08-04 09:14:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][470/625] eta 0:00:41 lr 0.000179 wd 0.0500 time 0.2530 (0.2654) data time 0.0009 (0.0025) model time 0.2522 (0.2620) loss 6.0738 (5.5998) grad_norm 2.2726 (2.8072) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:14:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][480/625] eta 0:00:38 lr 0.000179 wd 0.0500 time 0.2508 (0.2652) data time 0.0020 (0.0025) model time 0.2488 (0.2619) loss 6.8889 (5.5983) grad_norm 2.1352 (2.8020) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:14:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][490/625] eta 0:00:35 lr 0.000179 wd 0.0500 time 0.2536 (0.2650) data time 0.0007 (0.0024) model time 0.2529 (0.2617) loss 5.3514 (5.6106) grad_norm 3.8607 (2.8022) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:14:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][500/625] eta 0:00:33 lr 0.000179 wd 0.0500 time 0.2543 (0.2649) data time 0.0008 (0.0024) model time 0.2535 (0.2616) loss 5.2400 (5.6118) grad_norm 3.6691 (2.8486) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:14:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][510/625] eta 0:00:30 lr 0.000179 wd 0.0500 time 0.2612 (0.2651) data time 0.0006 (0.0024) model time 0.2606 (0.2618) loss 6.4706 (5.6132) grad_norm 2.4817 (2.8393) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:14:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][520/625] eta 0:00:27 lr 0.000179 wd 0.0500 time 0.2548 (0.2649) data time 0.0007 (0.0024) model time 0.2541 (0.2617) loss 6.2809 (5.6168) grad_norm 1.9836 (2.8377) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:14:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][530/625] eta 0:00:25 lr 0.000179 wd 0.0500 time 0.2569 (0.2651) data 
time 0.0008 (0.0023) model time 0.2561 (0.2619) loss 4.3905 (5.6180) grad_norm 1.9770 (2.8268) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:14:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][540/625] eta 0:00:22 lr 0.000179 wd 0.0500 time 0.2535 (0.2649) data time 0.0008 (0.0023) model time 0.2527 (0.2618) loss 7.0459 (5.6179) grad_norm 8.2244 (2.8382) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:15:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][550/625] eta 0:00:19 lr 0.000178 wd 0.0500 time 0.2542 (0.2647) data time 0.0009 (0.0023) model time 0.2533 (0.2617) loss 5.6054 (5.6150) grad_norm 1.7386 (2.8313) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:15:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][560/625] eta 0:00:17 lr 0.000178 wd 0.0500 time 0.2551 (0.2646) data time 0.0008 (0.0023) model time 0.2543 (0.2615) loss 4.6860 (5.6118) grad_norm 3.0294 (2.8324) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:15:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][570/625] eta 0:00:14 lr 0.000178 wd 0.0500 time 0.2535 (0.2644) data time 0.0010 (0.0022) model time 0.2526 (0.2614) loss 4.6887 (5.6116) grad_norm 2.5607 (2.8357) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:15:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][580/625] eta 0:00:11 lr 0.000178 wd 0.0500 time 0.2521 (0.2643) data time 0.0006 (0.0022) model time 0.2515 (0.2613) loss 6.7098 (5.6095) grad_norm 2.4401 (2.8374) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:15:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][590/625] eta 0:00:09 lr 0.000178 wd 0.0500 time 0.2522 (0.2641) data time 0.0007 (0.0022) model time 0.2515 (0.2612) loss 5.4659 (5.6076) grad_norm 1.8527 (2.8308) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:15:14 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][600/625] eta 0:00:06 lr 0.000178 wd 0.0500 time 0.2625 (0.2640) data time 0.0011 (0.0022) model time 0.2614 (0.2611) loss 5.9578 (5.6087) grad_norm 2.3832 (2.8273) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:15:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][610/625] eta 0:00:03 lr 0.000178 wd 0.0500 time 0.2575 (0.2639) data time 0.0004 (0.0022) model time 0.2571 (0.2610) loss 6.1611 (5.6118) grad_norm 2.7822 (2.8312) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:15:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][620/625] eta 0:00:01 lr 0.000178 wd 0.0500 time 0.2506 (0.2637) data time 0.0004 (0.0021) model time 0.2502 (0.2608) loss 5.3716 (5.6133) grad_norm 2.1470 (2.8342) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:15:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 248 training takes 0:02:44 [2024-08-04 09:15:20 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 09:15:20 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 09:15:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.498 (0.498) Loss 0.5991 (0.5991) Acc@1 89.746 (89.746) Acc@5 98.828 (98.828) Mem 9655MB [2024-08-04 09:15:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.097) Loss 0.9097 (0.7213) Acc@1 81.689 (86.776) Acc@5 96.240 (97.785) Mem 9655MB [2024-08-04 09:15:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0068 (0.8392) Acc@1 77.734 (83.708) Acc@5 95.508 (96.615) Mem 9655MB [2024-08-04 09:15:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.397 Acc@5 96.627 [2024-08-04 09:15:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.4% [2024-08-04 09:15:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.731 (0.731) Loss 0.5825 (0.5825) Acc@1 89.990 (89.990) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 09:15:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.053 (0.129) Loss 0.9072 (0.7090) Acc@1 81.006 (86.719) Acc@5 96.240 (97.741) Mem 9655MB [2024-08-04 09:15:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.094) Loss 1.0166 (0.8302) Acc@1 77.686 (83.531) Acc@5 95.312 (96.522) Mem 9655MB [2024-08-04 09:15:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.231 Acc@5 96.521 [2024-08-04 09:15:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.2% [2024-08-04 09:15:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.23% [2024-08-04 09:15:24 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 09:15:25 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 09:15:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][0/625] eta 0:08:02 lr 0.000178 wd 0.0500 time 0.7727 (0.7727) data time 0.5150 (0.5150) model time 0.0000 (0.0000) loss 6.1974 (6.1974) grad_norm 5.4556 (5.4556) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:15:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][10/625] eta 0:03:16 lr 0.000178 wd 0.0500 time 0.2554 (0.3203) data time 0.0007 (0.0476) model time 0.0000 (0.0000) loss 5.8591 (5.6686) grad_norm 1.9149 (2.6568) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:15:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][20/625] eta 0:02:55 lr 0.000178 wd 0.0500 time 0.2561 (0.2895) data time 0.0006 (0.0254) model time 0.0000 (0.0000) loss 5.0829 (5.6428) grad_norm 2.1886 (2.5895) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:15:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][30/625] eta 0:02:49 lr 0.000177 wd 0.0500 time 0.2581 (0.2854) data time 0.0006 (0.0175) model time 0.0000 (0.0000) loss 5.7479 (5.6482) grad_norm 1.9942 (2.7610) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:15:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][40/625] eta 0:02:42 lr 0.000177 wd 0.0500 time 0.2559 (0.2781) data time 0.0008 (0.0134) model time 0.0000 (0.0000) loss 6.2048 (5.6561) grad_norm 2.0990 (2.7345) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:15:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][50/625] eta 0:02:39 lr 0.000177 wd 0.0500 time 0.2570 (0.2781) data time 0.0008 (0.0110) model time 0.0000 (0.0000) loss 5.9725 (5.6837) grad_norm 2.7526 (2.8373) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 
09:15:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][60/625] eta 0:02:35 lr 0.000177 wd 0.0500 time 0.2547 (0.2744) data time 0.0010 (0.0093) model time 0.2536 (0.2549) loss 4.7792 (5.6337) grad_norm 3.8799 (3.0452) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:15:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][70/625] eta 0:02:30 lr 0.000177 wd 0.0500 time 0.2580 (0.2718) data time 0.0006 (0.0081) model time 0.2574 (0.2548) loss 6.0450 (5.6679) grad_norm 4.4866 (3.1314) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:15:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][80/625] eta 0:02:27 lr 0.000177 wd 0.0500 time 0.2590 (0.2699) data time 0.0007 (0.0073) model time 0.2583 (0.2549) loss 4.5069 (5.6721) grad_norm 2.3336 (3.1618) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:15:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][90/625] eta 0:02:23 lr 0.000177 wd 0.0500 time 0.2606 (0.2685) data time 0.0008 (0.0066) model time 0.2598 (0.2552) loss 6.7222 (5.6943) grad_norm 2.1287 (3.1232) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:15:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][100/625] eta 0:02:21 lr 0.000177 wd 0.0500 time 0.2555 (0.2694) data time 0.0007 (0.0060) model time 0.2548 (0.2595) loss 6.9777 (5.6881) grad_norm 1.8992 (3.0664) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:15:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][110/625] eta 0:02:18 lr 0.000177 wd 0.0500 time 0.2542 (0.2698) data time 0.0009 (0.0056) model time 0.2534 (0.2617) loss 5.8632 (5.6734) grad_norm 2.6729 (2.9923) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:15:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][120/625] eta 0:02:15 lr 0.000177 wd 0.0500 time 0.2541 (0.2686) data time 0.0007 
(0.0052) model time 0.2534 (0.2608) loss 5.0396 (5.6619) grad_norm 1.5175 (2.9083) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:16:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][130/625] eta 0:02:12 lr 0.000176 wd 0.0500 time 0.2577 (0.2677) data time 0.0007 (0.0048) model time 0.2570 (0.2602) loss 4.5916 (5.6336) grad_norm 1.8757 (2.8917) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:16:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][140/625] eta 0:02:10 lr 0.000176 wd 0.0500 time 0.2577 (0.2684) data time 0.0009 (0.0046) model time 0.2569 (0.2620) loss 5.7388 (5.6365) grad_norm 1.8440 (2.9370) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:16:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][150/625] eta 0:02:07 lr 0.000176 wd 0.0500 time 0.2540 (0.2677) data time 0.0009 (0.0043) model time 0.2531 (0.2615) loss 6.1945 (5.6578) grad_norm 1.9015 (2.9377) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:16:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][160/625] eta 0:02:04 lr 0.000176 wd 0.0500 time 0.2559 (0.2670) data time 0.0009 (0.0041) model time 0.2550 (0.2609) loss 6.6009 (5.6786) grad_norm 3.1741 (2.9691) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:16:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][170/625] eta 0:02:01 lr 0.000176 wd 0.0500 time 0.3871 (0.2671) data time 0.0009 (0.0039) model time 0.3862 (0.2615) loss 5.2350 (5.6674) grad_norm 2.0550 (2.9301) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:16:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][180/625] eta 0:01:58 lr 0.000176 wd 0.0500 time 0.2561 (0.2665) data time 0.0007 (0.0037) model time 0.2554 (0.2611) loss 6.2581 (5.6647) grad_norm 3.0860 (2.9326) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:16:16 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][190/625] eta 0:01:55 lr 0.000176 wd 0.0500 time 0.2509 (0.2659) data time 0.0007 (0.0036) model time 0.2502 (0.2606) loss 5.5541 (5.6685) grad_norm 2.0416 (2.8991) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:16:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][200/625] eta 0:01:52 lr 0.000176 wd 0.0500 time 0.2538 (0.2654) data time 0.0006 (0.0035) model time 0.2532 (0.2602) loss 5.7478 (5.6671) grad_norm 3.0656 (2.8612) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:16:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][210/625] eta 0:01:49 lr 0.000176 wd 0.0500 time 0.2541 (0.2650) data time 0.0010 (0.0034) model time 0.2531 (0.2599) loss 5.2472 (5.6664) grad_norm 2.1796 (2.8457) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:16:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][220/625] eta 0:01:47 lr 0.000176 wd 0.0500 time 0.2528 (0.2651) data time 0.0008 (0.0032) model time 0.2520 (0.2603) loss 6.7286 (5.6562) grad_norm 2.9804 (2.8294) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:16:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][230/625] eta 0:01:44 lr 0.000175 wd 0.0500 time 0.2564 (0.2648) data time 0.0007 (0.0031) model time 0.2556 (0.2601) loss 5.4608 (5.6596) grad_norm 4.2178 (2.8268) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:16:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][240/625] eta 0:01:42 lr 0.000175 wd 0.0500 time 0.2544 (0.2652) data time 0.0009 (0.0030) model time 0.2536 (0.2608) loss 5.6230 (5.6593) grad_norm 2.2675 (2.8239) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:16:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][250/625] eta 0:01:39 lr 0.000175 wd 0.0500 time 0.2553 (0.2648) data time 0.0008 (0.0030) 
model time 0.2545 (0.2605) loss 5.2692 (5.6609) grad_norm 5.0228 (2.8383) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:16:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][260/625] eta 0:01:36 lr 0.000175 wd 0.0500 time 0.2523 (0.2645) data time 0.0008 (0.0029) model time 0.2516 (0.2603) loss 4.7676 (5.6616) grad_norm 4.8615 (2.8886) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:16:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][270/625] eta 0:01:33 lr 0.000175 wd 0.0500 time 0.2535 (0.2642) data time 0.0011 (0.0028) model time 0.2524 (0.2601) loss 5.8847 (5.6575) grad_norm 2.5587 (2.9103) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:16:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][280/625] eta 0:01:31 lr 0.000175 wd 0.0500 time 0.2588 (0.2639) data time 0.0006 (0.0027) model time 0.2582 (0.2599) loss 5.5667 (5.6587) grad_norm 2.1208 (2.8944) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:16:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][290/625] eta 0:01:28 lr 0.000175 wd 0.0500 time 0.3794 (0.2641) data time 0.0009 (0.0027) model time 0.3785 (0.2602) loss 6.5358 (5.6672) grad_norm 4.9181 (2.8772) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:16:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][300/625] eta 0:01:25 lr 0.000175 wd 0.0500 time 0.2555 (0.2638) data time 0.0006 (0.0026) model time 0.2549 (0.2600) loss 4.6689 (5.6760) grad_norm 1.7874 (2.8453) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:16:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][310/625] eta 0:01:23 lr 0.000175 wd 0.0500 time 0.2499 (0.2636) data time 0.0009 (0.0026) model time 0.2489 (0.2598) loss 6.2852 (5.6714) grad_norm 2.9334 (2.8485) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:16:49 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [249/300][320/625] eta 0:01:20 lr 0.000175 wd 0.0500 time 0.2580 (0.2633) data time 0.0006 (0.0025) model time 0.2574 (0.2596) loss 5.9716 (5.6663) grad_norm 4.1064 (2.8470) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:16:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][330/625] eta 0:01:17 lr 0.000175 wd 0.0500 time 0.2553 (0.2632) data time 0.0009 (0.0025) model time 0.2544 (0.2595) loss 6.3790 (5.6694) grad_norm 2.6000 (2.8471) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:16:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][340/625] eta 0:01:14 lr 0.000174 wd 0.0500 time 0.2568 (0.2630) data time 0.0011 (0.0024) model time 0.2557 (0.2594) loss 6.2078 (5.6641) grad_norm 1.8744 (2.8468) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:16:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][350/625] eta 0:01:12 lr 0.000174 wd 0.0500 time 0.2593 (0.2638) data time 0.0007 (0.0024) model time 0.2586 (0.2605) loss 5.5977 (5.6645) grad_norm 2.3079 (2.8276) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:17:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][360/625] eta 0:01:09 lr 0.000174 wd 0.0500 time 0.2591 (0.2636) data time 0.0008 (0.0023) model time 0.2584 (0.2603) loss 5.9745 (5.6668) grad_norm 2.1154 (2.8387) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:17:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][370/625] eta 0:01:07 lr 0.000174 wd 0.0500 time 0.2565 (0.2634) data time 0.0008 (0.0023) model time 0.2556 (0.2601) loss 5.5222 (5.6708) grad_norm 4.1425 (2.9109) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:17:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][380/625] eta 0:01:04 lr 0.000174 wd 0.0500 time 0.2535 (0.2632) data time 0.0009 (0.0023) model time 0.2526 (0.2600) 
loss 6.2542 (5.6745) grad_norm 1.8121 (2.9127) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:17:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][390/625] eta 0:01:01 lr 0.000174 wd 0.0500 time 0.2666 (0.2631) data time 0.0009 (0.0022) model time 0.2657 (0.2599) loss 5.8166 (5.6775) grad_norm 2.8951 (2.9279) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:17:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][400/625] eta 0:00:59 lr 0.000174 wd 0.0500 time 0.2552 (0.2629) data time 0.0007 (0.0022) model time 0.2545 (0.2598) loss 4.7900 (5.6729) grad_norm 2.1809 (2.9148) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:17:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][410/625] eta 0:00:56 lr 0.000174 wd 0.0500 time 0.2538 (0.2627) data time 0.0009 (0.0022) model time 0.2529 (0.2596) loss 5.4231 (5.6761) grad_norm 2.4500 (2.9077) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:17:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][420/625] eta 0:00:53 lr 0.000174 wd 0.0500 time 0.2509 (0.2625) data time 0.0012 (0.0021) model time 0.2497 (0.2595) loss 5.2278 (5.6740) grad_norm 1.5535 (2.8914) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:17:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][430/625] eta 0:00:51 lr 0.000174 wd 0.0500 time 0.2557 (0.2624) data time 0.0010 (0.0021) model time 0.2547 (0.2594) loss 5.4995 (5.6738) grad_norm 3.6179 (2.8754) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:17:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][440/625] eta 0:00:48 lr 0.000173 wd 0.0500 time 0.2536 (0.2622) data time 0.0011 (0.0021) model time 0.2525 (0.2592) loss 4.8677 (5.6765) grad_norm 2.2986 (2.8580) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:17:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [249/300][450/625] eta 0:00:45 lr 0.000173 wd 0.0500 time 0.2620 (0.2625) data time 0.0007 (0.0021) model time 0.2613 (0.2596) loss 5.8375 (5.6747) grad_norm 2.4706 (2.8421) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:17:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][460/625] eta 0:00:43 lr 0.000173 wd 0.0500 time 0.2537 (0.2627) data time 0.0010 (0.0020) model time 0.2526 (0.2599) loss 6.1614 (5.6786) grad_norm 2.0262 (2.8262) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:17:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][470/625] eta 0:00:40 lr 0.000173 wd 0.0500 time 0.2563 (0.2626) data time 0.0007 (0.0020) model time 0.2556 (0.2598) loss 5.3528 (5.6823) grad_norm 2.6677 (2.8158) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:17:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][480/625] eta 0:00:38 lr 0.000173 wd 0.0500 time 0.2599 (0.2632) data time 0.0008 (0.0020) model time 0.2591 (0.2605) loss 5.6523 (5.6788) grad_norm 3.0381 (2.8040) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:17:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][490/625] eta 0:00:35 lr 0.000173 wd 0.0500 time 0.2589 (0.2630) data time 0.0007 (0.0020) model time 0.2582 (0.2604) loss 5.3107 (5.6752) grad_norm 3.5047 (2.8005) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:17:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][500/625] eta 0:00:32 lr 0.000173 wd 0.0500 time 0.2552 (0.2633) data time 0.0007 (0.0019) model time 0.2545 (0.2607) loss 5.9803 (5.6735) grad_norm 1.7313 (2.7917) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:17:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][510/625] eta 0:00:30 lr 0.000173 wd 0.0500 time 0.4322 (0.2635) data time 0.0007 (0.0019) model time 0.4314 (0.2610) loss 6.0296 (5.6730) grad_norm 
2.5436 (2.8079) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:17:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][520/625] eta 0:00:27 lr 0.000173 wd 0.0500 time 0.2551 (0.2634) data time 0.0009 (0.0019) model time 0.2542 (0.2609) loss 6.1275 (5.6728) grad_norm 2.4640 (2.8013) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:17:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][530/625] eta 0:00:25 lr 0.000173 wd 0.0500 time 0.4420 (0.2636) data time 0.0009 (0.0019) model time 0.4411 (0.2612) loss 6.0902 (5.6738) grad_norm 4.8771 (2.8098) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:17:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][540/625] eta 0:00:22 lr 0.000173 wd 0.0500 time 0.2591 (0.2635) data time 0.0009 (0.0019) model time 0.2582 (0.2611) loss 5.8326 (5.6734) grad_norm 2.0482 (2.8026) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:17:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][550/625] eta 0:00:19 lr 0.000172 wd 0.0500 time 0.2543 (0.2638) data time 0.0009 (0.0019) model time 0.2534 (0.2614) loss 5.1812 (5.6754) grad_norm 1.8108 (2.7921) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:17:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][560/625] eta 0:00:17 lr 0.000172 wd 0.0500 time 0.4346 (0.2640) data time 0.0007 (0.0018) model time 0.4339 (0.2617) loss 5.2795 (5.6748) grad_norm 1.7547 (2.7851) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:17:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][570/625] eta 0:00:14 lr 0.000172 wd 0.0500 time 0.2549 (0.2640) data time 0.0010 (0.0018) model time 0.2539 (0.2617) loss 6.1183 (5.6733) grad_norm 2.5240 (2.7861) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:17:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][580/625] eta 
0:00:11 lr 0.000172 wd 0.0500 time 0.2531 (0.2639) data time 0.0010 (0.0018) model time 0.2521 (0.2616) loss 5.7827 (5.6744) grad_norm 1.7169 (2.7775) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:18:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][590/625] eta 0:00:09 lr 0.000172 wd 0.0500 time 0.2565 (0.2638) data time 0.0007 (0.0018) model time 0.2558 (0.2615) loss 6.1181 (5.6738) grad_norm 1.8708 (2.7670) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:18:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][600/625] eta 0:00:06 lr 0.000172 wd 0.0500 time 0.2558 (0.2636) data time 0.0011 (0.0018) model time 0.2547 (0.2614) loss 5.1343 (5.6752) grad_norm 1.9012 (2.7674) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:18:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][610/625] eta 0:00:03 lr 0.000172 wd 0.0500 time 0.2545 (0.2638) data time 0.0004 (0.0018) model time 0.2541 (0.2616) loss 5.0329 (5.6696) grad_norm 2.4136 (2.7629) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:18:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][620/625] eta 0:00:01 lr 0.000172 wd 0.0500 time 0.2540 (0.2637) data time 0.0004 (0.0018) model time 0.2536 (0.2614) loss 7.0200 (5.6710) grad_norm 5.6860 (2.7741) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:18:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 249 training takes 0:02:44 [2024-08-04 09:18:10 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 09:18:10 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 09:18:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.481 (0.481) Loss 0.5864 (0.5864) Acc@1 90.381 (90.381) Acc@5 98.779 (98.779) Mem 9655MB
[2024-08-04 09:18:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 0.8979 (0.7098) Acc@1 81.689 (86.936) Acc@5 96.533 (97.807) Mem 9655MB
[2024-08-04 09:18:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0059 (0.8320) Acc@1 78.613 (83.731) Acc@5 95.459 (96.601) Mem 9655MB
[2024-08-04 09:18:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.413 Acc@5 96.603
[2024-08-04 09:18:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.4%
[2024-08-04 09:18:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.740 (0.740) Loss 0.5830 (0.5830) Acc@1 89.990 (89.990) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 09:18:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.124) Loss 0.9053 (0.7087) Acc@1 80.957 (86.723) Acc@5 96.240 (97.745) Mem 9655MB
[2024-08-04 09:18:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.091) Loss 1.0166 (0.8298) Acc@1 77.588 (83.531) Acc@5 95.361 (96.533) Mem 9655MB
[2024-08-04 09:18:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.229 Acc@5 96.531
[2024-08-04 09:18:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.2%
[2024-08-04 09:18:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][0/625] eta 0:12:06 lr 0.000172 wd 0.0500 time 1.1619 (1.1619) data time 0.5477 (0.5477) model time 0.0000 (0.0000) loss 5.3047 (5.3047) grad_norm 4.7436 (4.7436) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:18:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][10/625] eta 0:03:33 lr 0.000172 wd 0.0500 time 0.3652 (0.3478) data time 0.0007 (0.0506) model time 0.0000 (0.0000) loss 5.7253 (5.6781) grad_norm 2.2190 (2.9779) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:18:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][20/625] eta 0:03:03 lr 0.000172 wd 0.0500 time 0.2540 (0.3041) data time 0.0010 (0.0269) model time 0.0000 (0.0000) loss 5.3650 (5.6085) grad_norm 3.0009 (2.6551) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:18:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][30/625] eta 0:02:55 lr 0.000171 wd 0.0500 time 0.2537 (0.2954) data time 0.0009 (0.0186) model time 0.0000 (0.0000) loss 5.4946 (5.6899) grad_norm 2.4160 (2.4913) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:18:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][40/625] eta 0:02:47 lr 0.000171 wd 0.0500 time 0.2543 (0.2857) data time 0.0007 (0.0143) model time 0.0000 (0.0000) loss 5.8239 (5.6439) grad_norm 3.4873 (2.5096) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:18:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][50/625] eta 0:02:40 lr 0.000171 wd 0.0500 time 0.2543 (0.2797) data time 0.0006 (0.0116) model time 0.0000 (0.0000) loss 5.0952 (5.5759) grad_norm 2.0224 (2.4965) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:18:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][60/625] eta 0:02:35 lr 0.000171 wd 0.0500 time 0.2512 (0.2757) data time 0.0009 (0.0099) model time 0.2503 (0.2544) loss 5.8863 (5.6002) grad_norm 4.4331 (2.5026) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:18:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][70/625] eta 0:02:33 lr 0.000171 wd 0.0500 time 0.2563 (0.2758) data time 0.0006 (0.0086) model time 0.2557 (0.2648) loss 4.7609 (5.6303) grad_norm 1.9084 (2.4849) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:18:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][80/625] eta 0:02:28 lr 0.000171 wd 0.0500 time 0.2577 (0.2733) data time 0.0010 (0.0077) model time 0.2567 (0.2615) loss 6.2891 (5.6056) grad_norm 2.5584 (2.4549) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:18:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][90/625] eta 0:02:25 lr 0.000171 wd 0.0500 time 0.2512 (0.2715) data time 0.0008 (0.0069) model time 0.2503 (0.2601) loss 4.7602 (5.6011) grad_norm 2.4787 (2.4175) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:18:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][100/625] eta 0:02:22 lr 0.000171 wd 0.0500 time 0.2571 (0.2720) data time 0.0011 (0.0064) model time 0.2560 (0.2631) loss 5.9917 (5.5948) grad_norm 2.1158 (2.3926) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:18:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][110/625] eta 0:02:19 lr 0.000171 wd 0.0500 time 0.2541 (0.2707) data time 0.0006 (0.0059) model time 0.2535 (0.2620) loss 5.6350 (5.6180) grad_norm 2.5844 (2.4349) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:18:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][120/625] eta 0:02:16 lr 0.000171 wd 0.0500 time 0.2605 (0.2712) data time 0.0009 (0.0055) model time 0.2596 (0.2641) loss 5.4473 (5.6117) grad_norm 1.7452 (2.4338) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:18:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][130/625] eta 0:02:13 lr 0.000171 wd 0.0500 time 0.2542 (0.2702) data time 0.0008 (0.0051) model time 0.2533 (0.2631) loss 5.2477 (5.6079) grad_norm 2.1487 (2.4609) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:18:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][140/625] eta 0:02:11 lr 0.000170 wd 0.0500 time 0.2561 (0.2720) data time 0.0010 (0.0048) model time 0.2551 (0.2667) loss 5.4240 (5.6187) grad_norm 2.6242 (2.4691) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:18:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][150/625] eta 0:02:09 lr 0.000170 wd 0.0500 time 0.2576 (0.2725) data time 0.0009 (0.0046) model time 0.2567 (0.2679) loss 5.7988 (5.6325) grad_norm 2.6968 (2.5063) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:18:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][160/625] eta 0:02:06 lr 0.000170 wd 0.0500 time 0.2583 (0.2718) data time 0.0008 (0.0044) model time 0.2575 (0.2671) loss 4.8919 (5.6319) grad_norm 3.0402 (2.5945) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:19:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][170/625] eta 0:02:03 lr 0.000170 wd 0.0500 time 0.2576 (0.2709) data time 0.0012 (0.0042) model time 0.2564 (0.2662) loss 5.7436 (5.6420) grad_norm 2.4250 (2.7808) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:19:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][180/625] eta 0:02:00 lr 0.000170 wd 0.0500 time 0.2497 (0.2707) data time 0.0009 (0.0040) model time 0.2489 (0.2662) loss 6.0566 (5.6360) grad_norm 1.8387 (2.8801) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:19:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][190/625] eta 0:01:57 lr 0.000170 wd 0.0500 time 0.2553 (0.2700) data time 0.0009 (0.0038) model time 0.2545 (0.2655) loss 5.0440 (5.6365) grad_norm 3.4822 (2.8865) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:19:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][200/625] eta 0:01:54 lr 0.000170 wd 0.0500 time 0.2557 (0.2693) data time 0.0008 (0.0037) model time 0.2549 (0.2648) loss 6.4234 (5.6329) grad_norm 2.2670 (2.8855) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:19:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][210/625] eta 0:01:52 lr 0.000170 wd 0.0500 time 0.2568 (0.2702) data time 0.0009 (0.0036) model time 0.2559 (0.2662) loss 6.1449 (5.6328) grad_norm 2.1758 (2.8499) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:19:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][220/625] eta 0:01:49 lr 0.000170 wd 0.0500 time 0.2567 (0.2696) data time 0.0008 (0.0034) model time 0.2559 (0.2656) loss 5.7570 (5.6160) grad_norm 1.8050 (2.8097) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:19:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][230/625] eta 0:01:46 lr 0.000170 wd 0.0500 time 0.2594 (0.2690) data time 0.0010 (0.0033) model time 0.2584 (0.2650) loss 5.6622 (5.6120) grad_norm 2.1961 (2.7962) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:19:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][240/625] eta 0:01:43 lr 0.000169 wd 0.0500 time 0.2589 (0.2685) data time 0.0006 (0.0032) model time 0.2583 (0.2645) loss 5.1704 (5.6058) grad_norm 3.7372 (2.7967) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:19:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][250/625] eta 0:01:40 lr 0.000169 wd 0.0500 time 0.2541 (0.2680) data time 0.0011 (0.0031) model time 0.2530 (0.2641) loss 4.6927 (5.5992) grad_norm 2.1716 (2.7922) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:19:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][260/625] eta 0:01:37 lr 0.000169 wd 0.0500 time 0.2618 (0.2676) data time 0.0006 (0.0031) model time 0.2612 (0.2637) loss 6.5230 (5.5981) grad_norm 2.5847 (2.7760) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:19:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][270/625] eta 0:01:34 lr 0.000169 wd 0.0500 time 0.2486 (0.2672) data time 0.0007 (0.0030) model time 0.2478 (0.2633) loss 5.0490 (5.6004) grad_norm 2.8363 (2.7563) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:19:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][280/625] eta 0:01:32 lr 0.000169 wd 0.0500 time 0.2627 (0.2668) data time 0.0008 (0.0029) model time 0.2618 (0.2630) loss 6.2607 (5.5993) grad_norm 2.2684 (2.7426) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:19:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][290/625] eta 0:01:29 lr 0.000169 wd 0.0500 time 0.2531 (0.2664) data time 0.0008 (0.0028) model time 0.2522 (0.2627) loss 6.7183 (5.5945) grad_norm 1.9843 (2.7181) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:19:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][300/625] eta 0:01:26 lr 0.000169 wd 0.0500 time 0.2643 (0.2661) data time 0.0008 (0.0028) model time 0.2636 (0.2624) loss 4.6430 (5.5962) grad_norm 1.4623 (2.6916) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:19:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][310/625] eta 0:01:23 lr 0.000169 wd 0.0500 time 0.2590 (0.2658) data time 0.0008 (0.0027) model time 0.2582 (0.2621) loss 5.5078 (5.5967) grad_norm 3.8665 (2.7131) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:19:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][320/625] eta 0:01:20 lr 0.000169 wd 0.0500 time 0.2586 (0.2655) data time 0.0010 (0.0027) model time 0.2577 (0.2619) loss 5.3815 (5.5914) grad_norm 2.9320 (2.7053) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:19:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][330/625] eta 0:01:18 lr 0.000169 wd 0.0500 time 0.2513 (0.2653) data time 0.0010 (0.0026) model time 0.2503 (0.2617) loss 6.0611 (5.6059) grad_norm 5.0336 (2.7108) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:19:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][340/625] eta 0:01:15 lr 0.000169 wd 0.0500 time 0.2606 (0.2650) data time 0.0010 (0.0026) model time 0.2596 (0.2615) loss 4.7830 (5.6025) grad_norm 1.7409 (2.6966) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:19:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][350/625] eta 0:01:12 lr 0.000168 wd 0.0500 time 0.2540 (0.2647) data time 0.0010 (0.0025) model time 0.2530 (0.2612) loss 6.4179 (5.5980) grad_norm 1.6777 (2.6717) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:19:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][360/625] eta 0:01:10 lr 0.000168 wd 0.0500 time 0.2568 (0.2645) data time 0.0006 (0.0025) model time 0.2562 (0.2611) loss 5.4359 (5.6019) grad_norm 3.0373 (2.6729) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:19:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][370/625] eta 0:01:07 lr 0.000168 wd 0.0500 time 0.2496 (0.2643) data time 0.0008 (0.0024) model time 0.2487 (0.2609) loss 6.3576 (5.6044) grad_norm 1.7952 (2.6639) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:19:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][380/625] eta 0:01:04 lr 0.000168 wd 0.0500 time 0.2591 (0.2642) data time 0.0009 (0.0024) model time 0.2582 (0.2608) loss 5.4089 (5.6156) grad_norm 2.1850 (2.6526) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:19:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][390/625] eta 0:01:02 lr 0.000168 wd 0.0500 time 0.2547 (0.2640) data time 0.0009 (0.0023) model time 0.2537 (0.2607) loss 6.0114 (5.6105) grad_norm 2.8389 (2.6996) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:20:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][400/625] eta 0:00:59 lr 0.000168 wd 0.0500 time 0.2536 (0.2643) data time 0.0007 (0.0023) model time 0.2529 (0.2611) loss 6.5540 (5.6130) grad_norm 1.6367 (2.7021) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:20:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][410/625] eta 0:00:56 lr 0.000168 wd 0.0500 time 0.2568 (0.2641) data time 0.0010 (0.0023) model time 0.2558 (0.2610) loss 5.2269 (5.6111) grad_norm 1.9528 (2.6916) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:20:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][420/625] eta 0:00:54 lr 0.000168 wd 0.0500 time 0.2595 (0.2640) data time 0.0010 (0.0022) model time 0.2584 (0.2609) loss 6.0802 (5.6099) grad_norm 4.9094 (2.6912) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:20:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][430/625] eta 0:00:51 lr 0.000168 wd 0.0500 time 0.4529 (0.2643) data time 0.0011 (0.0022) model time 0.4518 (0.2612) loss 6.7988 (5.6143) grad_norm 1.5552 (2.6763) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:20:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][440/625] eta 0:00:48 lr 0.000168 wd 0.0500 time 0.2630 (0.2641) data time 0.0009 (0.0022) model time 0.2622 (0.2611) loss 5.8582 (5.6141) grad_norm 2.5147 (2.6659) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:20:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][450/625] eta 0:00:46 lr 0.000168 wd 0.0500 time 0.2569 (0.2640) data time 0.0010 (0.0022) model time 0.2559 (0.2610) loss 5.7347 (5.6199) grad_norm 6.0331 (2.6603) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:20:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][460/625] eta 0:00:43 lr 0.000167 wd 0.0500 time 0.2568 (0.2638) data time 0.0009 (0.0021) model time 0.2558 (0.2608) loss 5.5523 (5.6223) grad_norm 1.8437 (2.6580) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:20:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][470/625] eta 0:00:40 lr 0.000167 wd 0.0500 time 0.2545 (0.2640) data time 0.0010 (0.0021) model time 0.2535 (0.2612) loss 5.9967 (5.6123) grad_norm 5.5499 (2.6598) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:20:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][480/625] eta 0:00:38 lr 0.000167 wd 0.0500 time 0.2556 (0.2639) data time 0.0009 (0.0021) model time 0.2548 (0.2610) loss 5.4066 (5.6119) grad_norm 1.8837 (2.6602) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:20:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][490/625] eta 0:00:35 lr 0.000167 wd 0.0500 time 0.2545 (0.2637) data time 0.0009 (0.0021) model time 0.2536 (0.2609) loss 5.8271 (5.6058) grad_norm 2.2407 (2.6604) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:20:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][500/625] eta 0:00:32 lr 0.000167 wd 0.0500 time 0.2555 (0.2636) data time 0.0006 (0.0020) model time 0.2548 (0.2608) loss 5.1420 (5.6066) grad_norm 1.6441 (2.6520) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:20:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][510/625] eta 0:00:30 lr 0.000167 wd 0.0500 time 0.2579 (0.2638) data time 0.0010 (0.0020) model time 0.2569 (0.2611) loss 6.3124 (5.6079) grad_norm 1.9992 (2.6461) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:20:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][520/625] eta 0:00:27 lr 0.000167 wd 0.0500 time 0.2575 (0.2640) data time 0.0011 (0.0020) model time 0.2564 (0.2613) loss 6.5723 (5.6072) grad_norm 5.3865 (2.6544) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:20:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][530/625] eta 0:00:25 lr 0.000167 wd 0.0500 time 0.2576 (0.2638) data time 0.0009 (0.0020) model time 0.2566 (0.2612) loss 6.4680 (5.6100) grad_norm 3.0533 (2.6574) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:20:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][540/625] eta 0:00:22 lr 0.000167 wd 0.0500 time 0.2535 (0.2640) data time 0.0008 (0.0020) model time 0.2527 (0.2614) loss 5.1481 (5.6119) grad_norm 2.3266 (2.6566) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:20:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][550/625] eta 0:00:19 lr 0.000167 wd 0.0500 time 0.2539 (0.2639) data time 0.0009 (0.0019) model time 0.2531 (0.2613) loss 6.0955 (5.6127) grad_norm 2.4129 (2.6565) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:20:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][560/625] eta 0:00:17 lr 0.000166 wd 0.0500 time 0.2579 (0.2637) data time 0.0007 (0.0019) model time 0.2572 (0.2612) loss 4.8854 (5.6149) grad_norm 3.0972 (2.6568) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:20:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][570/625] eta 0:00:14 lr 0.000166 wd 0.0500 time 0.2609 (0.2636) data time 0.0006 (0.0019) model time 0.2603 (0.2611) loss 4.9750 (5.6191) grad_norm 3.2311 (2.6541) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:20:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][580/625] eta 0:00:11 lr 0.000166 wd 0.0500 time 0.2538 (0.2635) data time 0.0007 (0.0019) model time 0.2531 (0.2610) loss 6.0870 (5.6201) grad_norm 2.7099 (2.6500) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:20:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][590/625] eta 0:00:09 lr 0.000166 wd 0.0500 time 0.2549 (0.2634) data time 0.0011 (0.0019) model time 0.2538 (0.2609) loss 6.1576 (5.6179) grad_norm 1.5325 (2.6446) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:20:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][600/625] eta 0:00:06 lr 0.000166 wd 0.0500 time 0.2549 (0.2635) data time 0.0009 (0.0019) model time 0.2540 (0.2610) loss 5.1357 (5.6173) grad_norm 6.7504 (2.6475) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:20:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][610/625] eta 0:00:03 lr 0.000166 wd 0.0500 time 0.2525 (0.2633) data time 0.0004 (0.0019) model time 0.2521 (0.2609) loss 5.7341 (5.6175) grad_norm 3.0083 (2.6444) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:20:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][620/625] eta 0:00:01 lr 0.000166 wd 0.0500 time 0.2554 (0.2631) data time 0.0003 (0.0018) model time 0.2551 (0.2607) loss 5.0218 (5.6178) grad_norm 3.0446 (2.6394) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:20:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 250 training takes 0:02:44
[2024-08-04 09:20:59 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 09:20:59 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 09:21:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.466 (0.466) Loss 0.5913 (0.5913) Acc@1 90.430 (90.430) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 09:21:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.094) Loss 0.9111 (0.7159) Acc@1 81.445 (86.936) Acc@5 96.240 (97.767) Mem 9655MB
[2024-08-04 09:21:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.075) Loss 1.0078 (0.8324) Acc@1 78.271 (83.856) Acc@5 95.215 (96.622) Mem 9655MB
[2024-08-04 09:21:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.541 Acc@5 96.629
[2024-08-04 09:21:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.5%
[2024-08-04 09:21:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.54%
[2024-08-04 09:21:01 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-08-04 09:21:02 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-08-04 09:21:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.613 (0.613) Loss 0.5830 (0.5830) Acc@1 89.990 (89.990) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 09:21:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.106) Loss 0.9058 (0.7090) Acc@1 81.055 (86.750) Acc@5 96.338 (97.763) Mem 9655MB
[2024-08-04 09:21:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.082) Loss 1.0156 (0.8298) Acc@1 77.686 (83.568) Acc@5 95.361 (96.549) Mem 9655MB
[2024-08-04 09:21:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.261 Acc@5 96.549
[2024-08-04 09:21:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.3%
[2024-08-04 09:21:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.26%
[2024-08-04 09:21:04 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 09:21:04 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 09:21:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][0/625] eta 0:08:02 lr 0.000166 wd 0.0500 time 0.7728 (0.7728) data time 0.5345 (0.5345) model time 0.0000 (0.0000) loss 6.1113 (6.1113) grad_norm 1.9674 (1.9674) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:21:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][10/625] eta 0:03:06 lr 0.000166 wd 0.0500 time 0.2566 (0.3027) data time 0.0007 (0.0494) model time 0.0000 (0.0000) loss 6.0185 (5.6014) grad_norm 2.3256 (2.3013) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:21:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][20/625] eta 0:02:49 lr 0.000166 wd 0.0500 time 0.2517 (0.2803) data time 0.0009 (0.0264) model time 0.0000 (0.0000) loss 6.7157 (5.7828) grad_norm 3.6008 (2.6320) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:21:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][30/625] eta 0:02:49 lr 0.000166 wd 0.0500 time 0.4677 (0.2853) data time 0.0008 (0.0182) model time 0.0000 (0.0000) loss 6.1042 (5.8555) grad_norm 3.5636 (2.6748) loss_scale 512.0000 (280.7742) mem 9655MB
[2024-08-04 09:21:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][40/625] eta 0:02:42 lr 0.000166 wd 0.0500 time 0.2565 (0.2782) data time 0.0009 (0.0140) model time 0.0000 (0.0000) loss 5.0691 (5.8391) grad_norm 1.4526 (2.6052) loss_scale 512.0000 (337.1707) mem 9655MB
[2024-08-04 09:21:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][50/625] eta 0:02:37 lr 0.000165 wd 0.0500 time 0.2715 (0.2744) data time 0.0009 (0.0114) model time 0.0000 (0.0000) loss 5.6300 (5.7537) grad_norm 2.8027 (2.5186) loss_scale 512.0000 (371.4510) mem 9655MB
[2024-08-04 09:21:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][60/625] eta 0:02:33 lr 0.000165 wd 0.0500 time 0.2587 (0.2713) data time 0.0007 (0.0097) model time 0.2580 (0.2545) loss 4.9605 (5.6914) grad_norm 1.3761 (2.3964) loss_scale 512.0000 (394.4918) mem 9655MB
[2024-08-04 09:21:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][70/625] eta 0:02:29 lr 0.000165 wd 0.0500 time 0.2576 (0.2691) data time 0.0005 (0.0085) model time 0.2571 (0.2548) loss 5.5687 (5.7051) grad_norm 2.2466 (2.3739) loss_scale 512.0000 (411.0423) mem 9655MB
[2024-08-04 09:21:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][80/625] eta 0:02:25 lr 0.000165 wd 0.0500 time 0.2592 (0.2675) data time 0.0008 (0.0075) model time 0.2583 (0.2548) loss 5.2359 (5.6985) grad_norm 2.7685 (2.3933) loss_scale 512.0000 (423.5062) mem 9655MB
[2024-08-04 09:21:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][90/625] eta 0:02:22 lr 0.000165 wd 0.0500 time 0.2511 (0.2662) data time 0.0006 (0.0068) model time 0.2505 (0.2549) loss 5.8195 (5.6697) grad_norm 1.6850 (2.3973) loss_scale 512.0000 (433.2308) mem 9655MB
[2024-08-04 09:21:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][100/625] eta 0:02:20 lr 0.000165 wd 0.0500 time 0.2554 (0.2673) data time 0.0011 (0.0062) model time 0.2544 (0.2592) loss 5.3850 (5.6537) grad_norm 2.6919 (2.3798) loss_scale 512.0000 (441.0297) mem 9655MB
[2024-08-04 09:21:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][110/625] eta 0:02:17 lr 0.000165 wd 0.0500 time 0.2581 (0.2664) data time 0.0008 (0.0057) model time 0.2573 (0.2586) loss 5.8492 (5.6540) grad_norm 1.8886 (2.3605) loss_scale 512.0000 (447.4234) mem 9655MB
[2024-08-04 09:21:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][120/625] eta 0:02:14 lr 0.000165 wd 0.0500 time 0.2556 (0.2654) data time 0.0007 (0.0053) model time 0.2549 (0.2580) loss 6.2097 (5.6257) grad_norm 3.2708 (2.3301) loss_scale 512.0000 (452.7603) mem 9655MB
[2024-08-04 09:21:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][130/625] eta 0:02:11 lr 0.000165 wd 0.0500 time 0.2522 (0.2647) data time 0.0007 (0.0050) model time 0.2514 (0.2576) loss 6.0313 (5.6026) grad_norm 3.1233 (2.3187) loss_scale 512.0000 (457.2824) mem 9655MB
[2024-08-04 09:21:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][140/625] eta 0:02:08 lr 0.000165 wd 0.0500 time 0.2601 (0.2656) data time 0.0010 (0.0047) model time 0.2592 (0.2598) loss 4.9171 (5.5937) grad_norm 2.9322 (2.5450) loss_scale 512.0000 (461.1631) mem 9655MB
[2024-08-04 09:21:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][150/625] eta 0:02:06 lr 0.000164 wd 0.0500 time 0.2582 (0.2672) data time 0.0008 (0.0045) model time 0.2574 (0.2626) loss 6.6068 (5.5985) grad_norm 2.8892 (2.5238) loss_scale 512.0000 (464.5298) mem 9655MB
[2024-08-04 09:21:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][160/625] eta 0:02:03 lr 0.000164 wd 0.0500 time 0.2559 (0.2665) data time 0.0009 (0.0043) model time 0.2550 (0.2619) loss 5.2101 (5.5925) grad_norm 2.6651 (2.5459) loss_scale 512.0000 (467.4783) mem 9655MB
[2024-08-04 09:21:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][170/625] eta 0:02:01 lr 0.000164 wd 0.0500 time 0.2545 (0.2669) data time 0.0007 (0.0041) model time 0.2538 (0.2627) loss 6.7534 (5.5858) grad_norm 4.6981 (2.5461) loss_scale 512.0000 (470.0819) mem 9655MB
[2024-08-04 09:21:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][180/625] eta 0:01:58 lr 0.000164 wd 0.0500 time 0.2549 (0.2663) data time 0.0010 (0.0039) model time 0.2540 (0.2621) loss 6.3082 (5.6009) grad_norm 1.8125 (2.5363) loss_scale 512.0000 (472.3978) mem 9655MB
[2024-08-04 09:21:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][190/625] eta 0:01:55 lr 0.000164 wd 0.0500 time 0.2593 (0.2658) data time 0.0005 (0.0037) model time 0.2588 (0.2617) loss 4.3915 (5.6056) grad_norm 1.8611 (2.5326) loss_scale 512.0000 (474.4712) mem 9655MB
[2024-08-04 09:21:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][200/625] eta 0:01:53 lr 0.000164 wd 0.0500 time 0.3798 (0.2668) data time 0.0007 (0.0036) model time 0.3790 (0.2633) loss 5.8662 (5.5974) grad_norm 2.9207 (2.5546) loss_scale 512.0000 (476.3383) mem 9655MB
[2024-08-04 09:22:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][210/625] eta 0:01:51 lr 0.000164 wd 0.0500 time 0.2529 (0.2678) data time 0.0008 (0.0035) model time 0.2521 (0.2647) loss 5.3746 (5.6034) grad_norm 3.0768 (2.5678) loss_scale 512.0000 (478.0284) mem 9655MB
[2024-08-04 09:22:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][220/625] eta 0:01:48 lr 0.000164 wd 0.0500 time 0.2550 (0.2673) data time 0.0009 (0.0033) model time 0.2542 (0.2642) loss 5.8168 (5.6113) grad_norm 3.2444 (2.5582) loss_scale 512.0000 (479.5656) mem 9655MB
[2024-08-04 09:22:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][230/625] eta 0:01:45 lr 0.000164 wd 0.0500 time 0.2611 (0.2677) data time 0.0006 (0.0032) model time 0.2605 (0.2648) loss 5.8519 (5.6194) grad_norm 1.3916 (2.5555) loss_scale 512.0000 (480.9697) mem 9655MB
[2024-08-04 09:22:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][240/625] eta 0:01:42 lr 0.000164 wd 0.0500 time 0.2586 (0.2672) data time 0.0006 (0.0031) model time 0.2580 (0.2643) loss 6.6150 (5.6231) grad_norm 2.5865 (2.6484) loss_scale 512.0000 (482.2573) mem 9655MB
[2024-08-04 09:22:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][250/625] eta 0:01:40 lr 0.000164 wd 0.0500 time 0.2576 (0.2676) data time 0.0006 (0.0031) model time 0.2571 (0.2650) loss 5.5740 (5.6251) grad_norm 2.2313 (2.6522) loss_scale 512.0000 (483.4422) mem 9655MB
[2024-08-04 09:22:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][260/625] eta 0:01:37 lr 0.000163 wd 0.0500 time 0.2557 (0.2680) data time 0.0008 (0.0030) model time 0.2549 (0.2655) loss 5.7119 (5.6227) grad_norm 5.2835 (2.6889) loss_scale 512.0000 (484.5364) mem 9655MB
[2024-08-04 09:22:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][270/625] eta 0:01:35 lr 0.000163 wd 0.0500 time 0.2567 (0.2683) data time 0.0009 (0.0029) model time 0.2558 (0.2660) loss 5.5410 (5.6293) grad_norm 2.0797 (2.7340) loss_scale 512.0000 (485.5498) mem 9655MB
[2024-08-04 09:22:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][280/625] eta 0:01:32 lr 0.000163 wd 0.0500 time 0.2603 (0.2679) data time 0.0005 (0.0028) model time 0.2598 (0.2655) loss 5.6250 (5.6255) grad_norm 2.2789 (2.7267) loss_scale 512.0000 (486.4911) mem 9655MB
[2024-08-04 09:22:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][290/625] eta 0:01:29 lr 0.000163 wd 0.0500 time 0.2595 (0.2676) data time 0.0008 (0.0028) model time 0.2588 (0.2653) loss 5.2788 (5.6244) grad_norm 1.8302 (2.7158) loss_scale 512.0000 (487.3677) mem 9655MB
[2024-08-04 09:22:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][300/625] eta 0:01:26 lr 0.000163 wd 0.0500 time 0.2571 (0.2673) data time 0.0008 (0.0027) model time 0.2564 (0.2649) loss 6.2954 (5.6280) grad_norm 3.4231 (2.7229) loss_scale 512.0000 (488.1860) mem 9655MB
[2024-08-04 09:22:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][310/625] eta 0:01:24 lr 0.000163 wd 0.0500 time 0.2597 (0.2669) data time 0.0008 (0.0027) model time 0.2589 (0.2645) loss 5.5599 (5.6340) grad_norm 2.2832 (2.7499) loss_scale 512.0000 (488.9518) mem 9655MB
[2024-08-04 09:22:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][320/625] eta 0:01:21 lr 0.000163 wd 0.0500 time 0.2589 (0.2666) data time 0.0007 (0.0026) model time 0.2583 (0.2642) loss 5.8622 (5.6346) grad_norm 4.4292 (2.7789) loss_scale 512.0000 (489.6698) mem 9655MB
[2024-08-04 09:22:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][330/625] eta 0:01:18 lr 0.000163 wd 0.0500 time 0.4211 (0.2674) data time 0.0009 (0.0025) model time 0.4202 (0.2652) loss 5.8659 (5.6314) grad_norm 2.4766 (2.7865) loss_scale 512.0000 (490.3444) mem 9655MB
[2024-08-04 09:22:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][340/625] eta 0:01:16 lr 0.000163 wd 0.0500 time 0.2521 (0.2671) data time 0.0009 (0.0025) model time 0.2511 (0.2648) loss 5.7548 (5.6350) grad_norm 2.5751 (2.7737) loss_scale 512.0000 (490.9795) mem 9655MB
[2024-08-04 09:22:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][350/625] eta 0:01:13 lr 0.000163 wd 0.0500 time 0.2556 (0.2668) data time 0.0008 (0.0025) model time 0.2548 (0.2646) loss 6.4792 (5.6296) grad_norm 2.3875 (2.7650) loss_scale 512.0000 (491.5783) mem 9655MB
[2024-08-04 09:22:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][360/625] eta 0:01:10 lr 0.000163 wd 0.0500 time 0.2544 (0.2665) data time 0.0008 (0.0024) model time 0.2536 (0.2643) loss 5.7197 (5.6210) grad_norm 1.6904 (2.7585) loss_scale 512.0000 (492.1440) mem 9655MB
[2024-08-04 09:22:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][370/625] eta 0:01:07 lr 0.000162 wd 0.0500 time 0.2567 (0.2662) data time 0.0008 (0.0024) model time 0.2560 (0.2640) loss 6.1408 (5.6188) grad_norm 3.3895 (2.7542) loss_scale 512.0000 (492.6792) mem 9655MB
[2024-08-04 09:22:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][380/625] eta 0:01:05 lr 0.000162 wd 0.0500 time 0.2571 (0.2659) data time 0.0009 (0.0023) model time 0.2562 (0.2637) loss 6.2032 (5.6295) grad_norm 1.9312 (2.7694) loss_scale 512.0000 (493.1864) mem 9655MB
[2024-08-04 09:22:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][390/625] eta 0:01:02 lr 0.000162 wd 0.0500 time 0.2579 (0.2661) data time 0.0008 (0.0023) model time 0.2571 (0.2640) loss 5.7775 (5.6292) grad_norm 3.5870 (2.7655) loss_scale 512.0000 (493.6675) mem 9655MB
[2024-08-04 09:22:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][400/625] eta 0:00:59 lr 0.000162 wd 0.0500 time 0.2519 (0.2659) data time 0.0009 (0.0023) model time 0.2510 (0.2637) loss 4.4831 (5.6232) grad_norm 1.4220 (2.7486) loss_scale 512.0000 (494.1247) mem 9655MB
[2024-08-04 09:22:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][410/625] eta 0:00:57 lr 0.000162 wd 0.0500 time 0.4382 (0.2661) data time 0.0014 (0.0022) model time 0.4369 (0.2640) loss 6.2169 (5.6218) grad_norm 2.7629 (2.7381) loss_scale 512.0000 (494.5596) mem 9655MB
[2024-08-04 09:22:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][420/625] eta 0:00:54 lr 0.000162 wd 0.0500 time 0.2535 (0.2663) data time 0.0009 (0.0022) model time 0.2526 (0.2642) loss 5.2949 (5.6200) grad_norm 1.4761 (2.7335) loss_scale 512.0000 (494.9739) mem 9655MB
[2024-08-04 09:22:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][430/625] eta 0:00:51 lr 0.000162 wd 0.0500 time 0.2581 (0.2660) data time 0.0008 (0.0022) model time 0.2572 (0.2640) loss 6.1317 (5.6216) grad_norm 1.6546 (2.7274) loss_scale 512.0000 (495.3689) mem 9655MB
[2024-08-04 09:23:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][440/625] eta 0:00:49 lr 0.000162 wd 0.0500 time 0.2556 (0.2663) data time 0.0009 (0.0021) model time 0.2547 (0.2643) loss 6.2618 (5.6155) grad_norm 2.3676 (2.7157) loss_scale 512.0000 (495.7460) mem 9655MB
[2024-08-04 09:23:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][450/625] eta 0:00:46 lr 0.000162 wd 0.0500 time 0.2572 (0.2660) data time 0.0006 (0.0021) model time 0.2566 (0.2641) loss 4.6968 (5.6184) grad_norm
1.8826 (2.7123) loss_scale 512.0000 (496.1064) mem 9655MB [2024-08-04 09:23:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][460/625] eta 0:00:43 lr 0.000162 wd 0.0500 time 0.2566 (0.2662) data time 0.0010 (0.0021) model time 0.2556 (0.2643) loss 6.0216 (5.6143) grad_norm 2.0842 (2.7114) loss_scale 512.0000 (496.4512) mem 9655MB [2024-08-04 09:23:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][470/625] eta 0:00:41 lr 0.000162 wd 0.0500 time 0.2580 (0.2660) data time 0.0006 (0.0021) model time 0.2574 (0.2641) loss 5.1526 (5.6123) grad_norm 2.5313 (2.7004) loss_scale 512.0000 (496.7813) mem 9655MB [2024-08-04 09:23:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][480/625] eta 0:00:38 lr 0.000161 wd 0.0500 time 0.2579 (0.2658) data time 0.0008 (0.0020) model time 0.2570 (0.2639) loss 6.1292 (5.6255) grad_norm 2.4299 (2.6888) loss_scale 512.0000 (497.0977) mem 9655MB [2024-08-04 09:23:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][490/625] eta 0:00:35 lr 0.000161 wd 0.0500 time 0.2677 (0.2656) data time 0.0005 (0.0020) model time 0.2671 (0.2637) loss 4.8580 (5.6212) grad_norm 6.0131 (2.6890) loss_scale 512.0000 (497.4012) mem 9655MB [2024-08-04 09:23:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][500/625] eta 0:00:33 lr 0.000161 wd 0.0500 time 0.2590 (0.2654) data time 0.0007 (0.0020) model time 0.2583 (0.2635) loss 6.2357 (5.6196) grad_norm 2.9058 (2.6968) loss_scale 512.0000 (497.6926) mem 9655MB [2024-08-04 09:23:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][510/625] eta 0:00:30 lr 0.000161 wd 0.0500 time 0.2555 (0.2652) data time 0.0006 (0.0020) model time 0.2549 (0.2633) loss 6.0855 (5.6197) grad_norm 1.9340 (2.6975) loss_scale 512.0000 (497.9726) mem 9655MB [2024-08-04 09:23:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][520/625] eta 
0:00:27 lr 0.000161 wd 0.0500 time 0.2719 (0.2651) data time 0.0009 (0.0020) model time 0.2711 (0.2632) loss 6.3339 (5.6261) grad_norm 2.0229 (2.6848) loss_scale 512.0000 (498.2418) mem 9655MB [2024-08-04 09:23:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][530/625] eta 0:00:25 lr 0.000161 wd 0.0500 time 0.2582 (0.2649) data time 0.0006 (0.0019) model time 0.2576 (0.2630) loss 4.7066 (5.6193) grad_norm 1.7204 (2.6833) loss_scale 512.0000 (498.5009) mem 9655MB [2024-08-04 09:23:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][540/625] eta 0:00:22 lr 0.000161 wd 0.0500 time 0.2533 (0.2648) data time 0.0008 (0.0019) model time 0.2525 (0.2628) loss 6.1289 (5.6185) grad_norm 2.0814 (2.6826) loss_scale 512.0000 (498.7505) mem 9655MB [2024-08-04 09:23:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][550/625] eta 0:00:19 lr 0.000161 wd 0.0500 time 0.2592 (0.2646) data time 0.0007 (0.0019) model time 0.2585 (0.2627) loss 4.6320 (5.6278) grad_norm 1.8360 (2.6861) loss_scale 512.0000 (498.9909) mem 9655MB [2024-08-04 09:23:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][560/625] eta 0:00:17 lr 0.000161 wd 0.0500 time 0.2554 (0.2645) data time 0.0008 (0.0019) model time 0.2546 (0.2625) loss 5.6453 (5.6279) grad_norm 3.5305 (2.6774) loss_scale 512.0000 (499.2228) mem 9655MB [2024-08-04 09:23:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][570/625] eta 0:00:14 lr 0.000161 wd 0.0500 time 0.2578 (0.2643) data time 0.0005 (0.0019) model time 0.2573 (0.2624) loss 5.7414 (5.6283) grad_norm 1.6399 (2.6754) loss_scale 512.0000 (499.4466) mem 9655MB [2024-08-04 09:23:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][580/625] eta 0:00:11 lr 0.000161 wd 0.0500 time 0.2599 (0.2642) data time 0.0007 (0.0019) model time 0.2592 (0.2623) loss 6.2051 (5.6283) grad_norm 1.9286 (2.6667) loss_scale 
512.0000 (499.6627) mem 9655MB [2024-08-04 09:23:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][590/625] eta 0:00:09 lr 0.000160 wd 0.0500 time 0.2558 (0.2640) data time 0.0008 (0.0018) model time 0.2550 (0.2621) loss 6.8275 (5.6336) grad_norm 2.1988 (2.6691) loss_scale 512.0000 (499.8714) mem 9655MB [2024-08-04 09:23:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][600/625] eta 0:00:06 lr 0.000160 wd 0.0500 time 0.2553 (0.2639) data time 0.0007 (0.0018) model time 0.2546 (0.2620) loss 5.1017 (5.6356) grad_norm 3.8593 (2.6709) loss_scale 512.0000 (500.0732) mem 9655MB [2024-08-04 09:23:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][610/625] eta 0:00:03 lr 0.000160 wd 0.0500 time 0.2527 (0.2638) data time 0.0004 (0.0018) model time 0.2523 (0.2619) loss 4.6978 (5.6358) grad_norm 3.5208 (2.7013) loss_scale 512.0000 (500.2684) mem 9655MB [2024-08-04 09:23:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][620/625] eta 0:00:01 lr 0.000160 wd 0.0500 time 0.2534 (0.2637) data time 0.0006 (0.0018) model time 0.2528 (0.2618) loss 6.3182 (5.6370) grad_norm 1.7262 (2.7071) loss_scale 512.0000 (500.4573) mem 9655MB [2024-08-04 09:23:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 251 training takes 0:02:44 [2024-08-04 09:23:49 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 09:23:50 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 09:23:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.485 (0.485) Loss 0.5981 (0.5981) Acc@1 90.381 (90.381) Acc@5 98.877 (98.877) Mem 9655MB [2024-08-04 09:23:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 0.9004 (0.7261) Acc@1 81.982 (86.887) Acc@5 96.387 (97.807) Mem 9655MB [2024-08-04 09:23:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0254 (0.8451) Acc@1 77.783 (83.759) Acc@5 95.508 (96.670) Mem 9655MB [2024-08-04 09:23:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.439 Acc@5 96.675 [2024-08-04 09:23:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.4% [2024-08-04 09:23:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.705 (0.705) Loss 0.5830 (0.5830) Acc@1 90.088 (90.088) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 09:23:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.123) Loss 0.9058 (0.7092) Acc@1 81.104 (86.790) Acc@5 96.240 (97.763) Mem 9655MB [2024-08-04 09:23:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.091) Loss 1.0156 (0.8299) Acc@1 77.832 (83.584) Acc@5 95.361 (96.563) Mem 9655MB [2024-08-04 09:23:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.273 Acc@5 96.563 [2024-08-04 09:23:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-08-04 09:23:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.27% [2024-08-04 09:23:53 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 09:23:54 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 09:23:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][0/625] eta 0:07:32 lr 0.000160 wd 0.0500 time 0.7245 (0.7245) data time 0.4786 (0.4786) model time 0.0000 (0.0000) loss 6.3772 (6.3772) grad_norm 1.6508 (1.6508) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:23:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][10/625] eta 0:03:02 lr 0.000160 wd 0.0500 time 0.2532 (0.2974) data time 0.0009 (0.0444) model time 0.0000 (0.0000) loss 5.7707 (5.9091) grad_norm 1.9829 (2.4953) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:24:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][20/625] eta 0:02:51 lr 0.000160 wd 0.0500 time 0.2544 (0.2840) data time 0.0009 (0.0237) model time 0.0000 (0.0000) loss 5.4062 (5.7852) grad_norm 3.4206 (2.7498) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:24:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][30/625] eta 0:02:43 lr 0.000160 wd 0.0500 time 0.2585 (0.2751) data time 0.0007 (0.0163) model time 0.0000 (0.0000) loss 4.8058 (5.7324) grad_norm 3.2613 (2.7522) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:24:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][40/625] eta 0:02:38 lr 0.000160 wd 0.0500 time 0.2574 (0.2704) data time 0.0008 (0.0126) model time 0.0000 (0.0000) loss 5.1732 (5.7030) grad_norm 4.9527 (2.6962) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:24:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][50/625] eta 0:02:33 lr 0.000160 wd 0.0500 time 0.2579 (0.2677) data time 0.0008 (0.0103) model time 0.0000 (0.0000) loss 6.3340 (5.7168) grad_norm 1.9382 (2.6273) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 
09:24:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][60/625] eta 0:02:31 lr 0.000160 wd 0.0500 time 0.2565 (0.2674) data time 0.0009 (0.0088) model time 0.2556 (0.2646) loss 6.2412 (5.6648) grad_norm 1.9001 (2.5733) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:24:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][70/625] eta 0:02:27 lr 0.000159 wd 0.0500 time 0.2587 (0.2661) data time 0.0008 (0.0076) model time 0.2579 (0.2609) loss 6.2095 (5.6743) grad_norm 1.6464 (2.5292) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:24:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][80/625] eta 0:02:25 lr 0.000159 wd 0.0500 time 0.2594 (0.2673) data time 0.0006 (0.0068) model time 0.2588 (0.2657) loss 5.8381 (5.6673) grad_norm 1.9342 (2.7218) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:24:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][90/625] eta 0:02:22 lr 0.000159 wd 0.0500 time 0.2533 (0.2659) data time 0.0006 (0.0062) model time 0.2526 (0.2627) loss 5.5137 (5.6940) grad_norm 2.7716 (2.6798) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:24:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][100/625] eta 0:02:19 lr 0.000159 wd 0.0500 time 0.2597 (0.2663) data time 0.0006 (0.0057) model time 0.2590 (0.2638) loss 5.8701 (5.6854) grad_norm 2.1069 (2.7310) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:24:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][110/625] eta 0:02:17 lr 0.000159 wd 0.0500 time 0.2557 (0.2671) data time 0.0005 (0.0052) model time 0.2552 (0.2656) loss 5.4109 (5.6741) grad_norm 1.8499 (2.7526) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:24:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][120/625] eta 0:02:15 lr 0.000159 wd 0.0500 time 0.2537 (0.2677) data time 0.0008 
(0.0049) model time 0.2529 (0.2668) loss 4.9172 (5.6585) grad_norm 2.1080 (2.7484) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:24:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][130/625] eta 0:02:12 lr 0.000159 wd 0.0500 time 0.2561 (0.2668) data time 0.0011 (0.0046) model time 0.2550 (0.2653) loss 5.0319 (5.6662) grad_norm 2.3758 (2.7296) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:24:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][140/625] eta 0:02:09 lr 0.000159 wd 0.0500 time 0.2551 (0.2660) data time 0.0010 (0.0043) model time 0.2541 (0.2641) loss 6.6992 (5.6976) grad_norm 2.9200 (2.7426) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:24:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][150/625] eta 0:02:06 lr 0.000159 wd 0.0500 time 0.2529 (0.2654) data time 0.0007 (0.0041) model time 0.2522 (0.2633) loss 5.6647 (5.6889) grad_norm 1.7311 (2.6980) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:24:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][160/625] eta 0:02:03 lr 0.000159 wd 0.0500 time 0.2562 (0.2648) data time 0.0010 (0.0039) model time 0.2553 (0.2625) loss 6.1800 (5.6946) grad_norm 2.2365 (2.6850) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:24:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][170/625] eta 0:02:00 lr 0.000159 wd 0.0500 time 0.2554 (0.2653) data time 0.0010 (0.0037) model time 0.2544 (0.2633) loss 5.2868 (5.6805) grad_norm 1.5597 (2.7252) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:24:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][180/625] eta 0:01:58 lr 0.000158 wd 0.0500 time 0.2556 (0.2655) data time 0.0007 (0.0036) model time 0.2549 (0.2637) loss 5.5082 (5.6851) grad_norm 2.7474 (2.7951) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:24:45 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][190/625] eta 0:01:55 lr 0.000158 wd 0.0500 time 0.2517 (0.2650) data time 0.0009 (0.0034) model time 0.2508 (0.2630) loss 4.7828 (5.6778) grad_norm 2.5840 (2.8760) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:24:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][200/625] eta 0:01:52 lr 0.000158 wd 0.0500 time 0.2518 (0.2654) data time 0.0008 (0.0033) model time 0.2510 (0.2637) loss 5.2491 (5.6735) grad_norm 2.5237 (2.8757) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:24:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][210/625] eta 0:01:49 lr 0.000158 wd 0.0500 time 0.2547 (0.2650) data time 0.0009 (0.0032) model time 0.2538 (0.2632) loss 6.4237 (5.6781) grad_norm 2.3589 (2.8872) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:24:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][220/625] eta 0:01:47 lr 0.000158 wd 0.0500 time 0.2597 (0.2646) data time 0.0006 (0.0031) model time 0.2591 (0.2628) loss 5.5926 (5.6742) grad_norm 2.3270 (2.8523) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:24:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][230/625] eta 0:01:44 lr 0.000158 wd 0.0500 time 0.2563 (0.2643) data time 0.0006 (0.0030) model time 0.2557 (0.2624) loss 5.2768 (5.6708) grad_norm 2.7216 (2.8338) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:24:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][240/625] eta 0:01:41 lr 0.000158 wd 0.0500 time 0.2601 (0.2646) data time 0.0010 (0.0029) model time 0.2591 (0.2629) loss 6.0257 (5.6765) grad_norm 1.9186 (2.8134) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:25:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][250/625] eta 0:01:39 lr 0.000158 wd 0.0500 time 0.2529 (0.2643) data time 0.0009 (0.0028) 
model time 0.2520 (0.2624) loss 5.5380 (5.6616) grad_norm 2.6654 (2.8089) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:25:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][260/625] eta 0:01:36 lr 0.000158 wd 0.0500 time 0.2617 (0.2640) data time 0.0006 (0.0028) model time 0.2611 (0.2622) loss 5.6006 (5.6572) grad_norm 2.1947 (2.8244) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:25:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][270/625] eta 0:01:33 lr 0.000158 wd 0.0500 time 0.2532 (0.2637) data time 0.0006 (0.0027) model time 0.2526 (0.2619) loss 6.3229 (5.6687) grad_norm 1.8508 (2.8684) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:25:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][280/625] eta 0:01:30 lr 0.000158 wd 0.0500 time 0.2512 (0.2635) data time 0.0008 (0.0026) model time 0.2505 (0.2616) loss 5.7335 (5.6655) grad_norm 1.8033 (2.8495) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:25:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][290/625] eta 0:01:28 lr 0.000158 wd 0.0500 time 0.2543 (0.2636) data time 0.0009 (0.0026) model time 0.2534 (0.2618) loss 5.4869 (5.6568) grad_norm 2.0632 (2.8620) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:25:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][300/625] eta 0:01:25 lr 0.000157 wd 0.0500 time 0.2615 (0.2641) data time 0.0006 (0.0025) model time 0.2609 (0.2624) loss 5.5726 (5.6479) grad_norm 1.6730 (2.8525) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:25:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][310/625] eta 0:01:23 lr 0.000157 wd 0.0500 time 0.2568 (0.2638) data time 0.0007 (0.0025) model time 0.2561 (0.2621) loss 5.7211 (5.6476) grad_norm 1.6551 (2.8505) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:25:19 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [252/300][320/625] eta 0:01:20 lr 0.000157 wd 0.0500 time 0.2538 (0.2635) data time 0.0010 (0.0024) model time 0.2528 (0.2618) loss 6.0253 (5.6546) grad_norm 2.8989 (2.8646) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:25:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][330/625] eta 0:01:17 lr 0.000157 wd 0.0500 time 0.2559 (0.2637) data time 0.0010 (0.0024) model time 0.2549 (0.2620) loss 5.6242 (5.6555) grad_norm 7.5623 (2.8713) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:25:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][340/625] eta 0:01:15 lr 0.000157 wd 0.0500 time 0.2601 (0.2634) data time 0.0005 (0.0023) model time 0.2595 (0.2617) loss 6.0320 (5.6497) grad_norm 1.7880 (2.8562) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:25:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][350/625] eta 0:01:12 lr 0.000157 wd 0.0500 time 0.2558 (0.2637) data time 0.0007 (0.0023) model time 0.2550 (0.2621) loss 4.7146 (5.6429) grad_norm 2.5101 (2.8880) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:25:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][360/625] eta 0:01:09 lr 0.000157 wd 0.0500 time 0.2660 (0.2636) data time 0.0007 (0.0023) model time 0.2652 (0.2620) loss 5.0136 (5.6390) grad_norm 4.0770 (2.8955) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:25:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][370/625] eta 0:01:07 lr 0.000157 wd 0.0500 time 0.2564 (0.2635) data time 0.0011 (0.0022) model time 0.2553 (0.2618) loss 5.8709 (5.6485) grad_norm 1.9868 (2.8967) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:25:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][380/625] eta 0:01:04 lr 0.000157 wd 0.0500 time 0.2569 (0.2633) data time 0.0005 (0.0022) model time 0.2564 (0.2616) 
loss 5.8439 (5.6475) grad_norm 1.7159 (2.8788) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:25:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][390/625] eta 0:01:01 lr 0.000157 wd 0.0500 time 0.2540 (0.2631) data time 0.0008 (0.0022) model time 0.2532 (0.2615) loss 5.9859 (5.6535) grad_norm 2.7790 (2.8634) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:25:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][400/625] eta 0:00:59 lr 0.000157 wd 0.0500 time 0.2603 (0.2629) data time 0.0010 (0.0021) model time 0.2593 (0.2613) loss 4.0663 (5.6491) grad_norm 1.5952 (2.8479) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:25:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][410/625] eta 0:00:56 lr 0.000156 wd 0.0500 time 0.4469 (0.2632) data time 0.0009 (0.0021) model time 0.4460 (0.2616) loss 5.2267 (5.6477) grad_norm 2.3134 (2.8399) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:25:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][420/625] eta 0:00:54 lr 0.000156 wd 0.0500 time 0.2555 (0.2640) data time 0.0007 (0.0021) model time 0.2548 (0.2625) loss 5.3886 (5.6473) grad_norm 1.8572 (2.8355) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:25:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][430/625] eta 0:00:51 lr 0.000156 wd 0.0500 time 0.2601 (0.2638) data time 0.0006 (0.0020) model time 0.2595 (0.2624) loss 5.8772 (5.6381) grad_norm 2.8200 (2.8313) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:25:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][440/625] eta 0:00:48 lr 0.000156 wd 0.0500 time 0.2587 (0.2636) data time 0.0010 (0.0020) model time 0.2578 (0.2622) loss 6.7850 (5.6417) grad_norm 2.2251 (2.8207) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:25:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [252/300][450/625] eta 0:00:46 lr 0.000156 wd 0.0500 time 0.2524 (0.2643) data time 0.0008 (0.0020) model time 0.2515 (0.2629) loss 5.5611 (5.6408) grad_norm 2.1655 (2.8087) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:25:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][460/625] eta 0:00:43 lr 0.000156 wd 0.0500 time 0.2541 (0.2652) data time 0.0011 (0.0020) model time 0.2530 (0.2640) loss 6.3237 (5.6436) grad_norm 3.6880 (2.8012) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:25:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][470/625] eta 0:00:41 lr 0.000156 wd 0.0500 time 0.2534 (0.2650) data time 0.0010 (0.0019) model time 0.2524 (0.2638) loss 5.2206 (5.6432) grad_norm 2.0149 (2.7984) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:26:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][480/625] eta 0:00:38 lr 0.000156 wd 0.0500 time 0.2580 (0.2653) data time 0.0006 (0.0019) model time 0.2574 (0.2640) loss 4.3063 (5.6407) grad_norm 2.6379 (2.7837) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:26:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][490/625] eta 0:00:35 lr 0.000156 wd 0.0500 time 0.2520 (0.2651) data time 0.0007 (0.0019) model time 0.2513 (0.2638) loss 5.3913 (5.6429) grad_norm 10.8940 (2.7872) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:26:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][500/625] eta 0:00:33 lr 0.000156 wd 0.0500 time 0.2528 (0.2649) data time 0.0007 (0.0019) model time 0.2521 (0.2636) loss 5.5253 (5.6409) grad_norm 3.0173 (2.7954) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:26:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][510/625] eta 0:00:30 lr 0.000156 wd 0.0500 time 0.2562 (0.2647) data time 0.0010 (0.0019) model time 0.2552 (0.2635) loss 5.8081 (5.6433) 
grad_norm 3.1295 (2.8012) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:26:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][520/625] eta 0:00:27 lr 0.000155 wd 0.0500 time 0.2580 (0.2649) data time 0.0009 (0.0018) model time 0.2571 (0.2637) loss 6.6431 (5.6514) grad_norm 3.7243 (2.7994) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:26:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][530/625] eta 0:00:25 lr 0.000155 wd 0.0500 time 0.2539 (0.2647) data time 0.0010 (0.0018) model time 0.2529 (0.2635) loss 5.6379 (5.6506) grad_norm 2.8622 (2.7984) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:26:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][540/625] eta 0:00:22 lr 0.000155 wd 0.0500 time 0.2546 (0.2646) data time 0.0009 (0.0018) model time 0.2536 (0.2633) loss 5.0045 (5.6488) grad_norm 2.2721 (2.7916) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:26:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][550/625] eta 0:00:19 lr 0.000155 wd 0.0500 time 0.2531 (0.2645) data time 0.0008 (0.0018) model time 0.2523 (0.2632) loss 5.0118 (5.6436) grad_norm 1.7846 (2.7858) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:26:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][560/625] eta 0:00:17 lr 0.000155 wd 0.0500 time 0.2594 (0.2643) data time 0.0011 (0.0018) model time 0.2583 (0.2631) loss 6.2718 (5.6436) grad_norm 2.1575 (2.7790) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:26:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][570/625] eta 0:00:14 lr 0.000155 wd 0.0500 time 0.2569 (0.2642) data time 0.0009 (0.0018) model time 0.2560 (0.2629) loss 6.2368 (5.6406) grad_norm 11.6570 (2.7866) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:26:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[252/300][580/625] eta 0:00:11 lr 0.000155 wd 0.0500 time 0.2525 (0.2640) data time 0.0007 (0.0017) model time 0.2518 (0.2628) loss 6.3115 (5.6387) grad_norm 3.0911 (2.7810) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:26:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][590/625] eta 0:00:09 lr 0.000155 wd 0.0500 time 0.2561 (0.2640) data time 0.0010 (0.0017) model time 0.2551 (0.2627) loss 4.6405 (5.6422) grad_norm 2.5200 (2.7830) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:26:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][600/625] eta 0:00:06 lr 0.000155 wd 0.0500 time 0.2597 (0.2638) data time 0.0006 (0.0017) model time 0.2591 (0.2626) loss 6.0573 (5.6441) grad_norm 2.7280 (2.7790) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:26:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][610/625] eta 0:00:03 lr 0.000155 wd 0.0500 time 0.2515 (0.2637) data time 0.0006 (0.0017) model time 0.2509 (0.2624) loss 5.9603 (5.6471) grad_norm 4.5059 (2.7846) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:26:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][620/625] eta 0:00:01 lr 0.000155 wd 0.0500 time 0.2519 (0.2635) data time 0.0005 (0.0017) model time 0.2514 (0.2622) loss 6.0354 (5.6488) grad_norm 2.6578 (2.7789) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:26:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 252 training takes 0:02:44 [2024-08-04 09:26:39 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 09:26:39 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 09:26:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.482 (0.482) Loss 0.5884 (0.5884) Acc@1 90.332 (90.332) Acc@5 98.779 (98.779) Mem 9655MB [2024-08-04 09:26:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 0.9062 (0.7144) Acc@1 81.445 (86.825) Acc@5 96.436 (97.852) Mem 9655MB [2024-08-04 09:26:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0117 (0.8312) Acc@1 77.246 (83.694) Acc@5 95.508 (96.696) Mem 9655MB [2024-08-04 09:26:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.335 Acc@5 96.685 [2024-08-04 09:26:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-08-04 09:26:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.761 (0.761) Loss 0.5830 (0.5830) Acc@1 90.088 (90.088) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 09:26:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.127) Loss 0.9048 (0.7089) Acc@1 81.152 (86.803) Acc@5 96.240 (97.772) Mem 9655MB [2024-08-04 09:26:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 1.0156 (0.8295) Acc@1 77.881 (83.617) Acc@5 95.312 (96.566) Mem 9655MB [2024-08-04 09:26:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.307 Acc@5 96.559 [2024-08-04 09:26:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-08-04 09:26:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.31% [2024-08-04 09:26:43 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 09:26:44 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 09:26:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][0/625] eta 0:07:21 lr 0.000154 wd 0.0500 time 0.7061 (0.7061) data time 0.4645 (0.4645) model time 0.0000 (0.0000) loss 5.1665 (5.1665) grad_norm 2.8670 (2.8670) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:26:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][10/625] eta 0:03:23 lr 0.000154 wd 0.0500 time 0.2539 (0.3315) data time 0.0006 (0.0431) model time 0.0000 (0.0000) loss 5.1427 (5.2746) grad_norm 1.4706 (3.0476) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:26:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][20/625] eta 0:02:58 lr 0.000154 wd 0.0500 time 0.2587 (0.2956) data time 0.0007 (0.0230) model time 0.0000 (0.0000) loss 6.1218 (5.3575) grad_norm 1.8082 (2.6829) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:26:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][30/625] eta 0:02:48 lr 0.000154 wd 0.0500 time 0.2589 (0.2828) data time 0.0007 (0.0158) model time 0.0000 (0.0000) loss 4.9229 (5.3474) grad_norm 1.8663 (2.5972) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:26:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][40/625] eta 0:02:41 lr 0.000154 wd 0.0500 time 0.2688 (0.2764) data time 0.0006 (0.0122) model time 0.0000 (0.0000) loss 5.0700 (5.4133) grad_norm 1.5415 (2.5618) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:26:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][50/625] eta 0:02:36 lr 0.000154 wd 0.0500 time 0.2541 (0.2724) data time 0.0010 (0.0100) model time 0.0000 (0.0000) loss 5.5627 (5.3468) grad_norm 2.2954 (2.5155) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 
09:27:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][60/625] eta 0:02:32 lr 0.000154 wd 0.0500 time 0.2605 (0.2698) data time 0.0013 (0.0085) model time 0.2592 (0.2548) loss 5.8934 (5.4212) grad_norm 2.1069 (2.5517) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:27:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][70/625] eta 0:02:29 lr 0.000154 wd 0.0500 time 0.2571 (0.2700) data time 0.0006 (0.0075) model time 0.2565 (0.2629) loss 4.5538 (5.4798) grad_norm 1.9123 (2.5133) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:27:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][80/625] eta 0:02:26 lr 0.000154 wd 0.0500 time 0.2582 (0.2683) data time 0.0007 (0.0067) model time 0.2575 (0.2602) loss 5.8937 (5.5102) grad_norm 1.7545 (2.5558) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:27:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][90/625] eta 0:02:22 lr 0.000154 wd 0.0500 time 0.2533 (0.2669) data time 0.0008 (0.0060) model time 0.2525 (0.2588) loss 5.2853 (5.5161) grad_norm 2.5843 (2.6301) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:27:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][100/625] eta 0:02:19 lr 0.000154 wd 0.0500 time 0.2595 (0.2659) data time 0.0006 (0.0055) model time 0.2590 (0.2582) loss 5.1073 (5.5471) grad_norm 4.3591 (2.6930) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:27:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][110/625] eta 0:02:17 lr 0.000154 wd 0.0500 time 0.2571 (0.2668) data time 0.0007 (0.0051) model time 0.2564 (0.2612) loss 6.5297 (5.5483) grad_norm 1.9319 (2.7392) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:27:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][120/625] eta 0:02:15 lr 0.000153 wd 0.0500 time 0.2558 (0.2676) data time 0.0007 
(0.0047) model time 0.2551 (0.2632) loss 6.3461 (5.5863) grad_norm 2.2591 (2.7155) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:27:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][130/625] eta 0:02:12 lr 0.000153 wd 0.0500 time 0.2580 (0.2669) data time 0.0009 (0.0044) model time 0.2570 (0.2624) loss 5.0999 (5.6001) grad_norm 1.4084 (2.7249) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:27:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][140/625] eta 0:02:09 lr 0.000153 wd 0.0500 time 0.2545 (0.2661) data time 0.0006 (0.0042) model time 0.2539 (0.2616) loss 5.4159 (5.5814) grad_norm 1.9755 (2.6990) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:27:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][150/625] eta 0:02:06 lr 0.000153 wd 0.0500 time 0.2540 (0.2665) data time 0.0007 (0.0040) model time 0.2533 (0.2626) loss 6.2506 (5.5835) grad_norm 2.9344 (2.6967) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:27:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][160/625] eta 0:02:03 lr 0.000153 wd 0.0500 time 0.2556 (0.2660) data time 0.0007 (0.0038) model time 0.2549 (0.2621) loss 5.5581 (5.5899) grad_norm 1.9943 (2.6942) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:27:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][170/625] eta 0:02:00 lr 0.000153 wd 0.0500 time 0.2560 (0.2654) data time 0.0006 (0.0036) model time 0.2554 (0.2616) loss 6.1197 (5.5863) grad_norm 3.8962 (2.6949) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:27:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][180/625] eta 0:01:58 lr 0.000153 wd 0.0500 time 0.2555 (0.2657) data time 0.0008 (0.0035) model time 0.2548 (0.2621) loss 5.8920 (5.6060) grad_norm 2.4351 (2.6864) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:27:35 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][190/625] eta 0:01:55 lr 0.000153 wd 0.0500 time 0.2609 (0.2662) data time 0.0007 (0.0033) model time 0.2603 (0.2631) loss 4.7079 (5.6056) grad_norm 2.0170 (2.6435) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:27:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][200/625] eta 0:01:52 lr 0.000153 wd 0.0500 time 0.2558 (0.2657) data time 0.0005 (0.0032) model time 0.2553 (0.2625) loss 5.5845 (5.6053) grad_norm 2.3332 (2.6282) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:27:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][210/625] eta 0:01:50 lr 0.000153 wd 0.0500 time 0.2530 (0.2653) data time 0.0010 (0.0031) model time 0.2520 (0.2620) loss 4.9350 (5.6003) grad_norm 4.0274 (2.6613) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:27:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][220/625] eta 0:01:47 lr 0.000153 wd 0.0500 time 0.2566 (0.2648) data time 0.0008 (0.0030) model time 0.2558 (0.2616) loss 5.6196 (5.5938) grad_norm 2.2343 (2.6421) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:27:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][230/625] eta 0:01:44 lr 0.000152 wd 0.0500 time 0.2589 (0.2645) data time 0.0007 (0.0029) model time 0.2582 (0.2613) loss 4.9085 (5.5937) grad_norm 3.3246 (2.6921) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:27:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][240/625] eta 0:01:41 lr 0.000152 wd 0.0500 time 0.2685 (0.2642) data time 0.0008 (0.0028) model time 0.2677 (0.2610) loss 6.2325 (5.6029) grad_norm 2.9797 (2.6973) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:27:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][250/625] eta 0:01:38 lr 0.000152 wd 0.0500 time 0.2569 (0.2638) data time 0.0006 (0.0028) 
model time 0.2563 (0.2607) loss 5.6239 (5.6204) grad_norm 1.6534 (2.6872) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:27:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][260/625] eta 0:01:36 lr 0.000152 wd 0.0500 time 0.4508 (0.2643) data time 0.0008 (0.0027) model time 0.4500 (0.2614) loss 6.0743 (5.6187) grad_norm 2.7115 (2.6717) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:27:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][270/625] eta 0:01:33 lr 0.000152 wd 0.0500 time 0.2557 (0.2640) data time 0.0009 (0.0026) model time 0.2548 (0.2611) loss 5.8134 (5.6226) grad_norm 3.6600 (2.7401) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:27:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][280/625] eta 0:01:31 lr 0.000152 wd 0.0500 time 0.2577 (0.2638) data time 0.0009 (0.0026) model time 0.2567 (0.2609) loss 6.4939 (5.6258) grad_norm 2.7777 (2.7415) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:28:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][290/625] eta 0:01:28 lr 0.000152 wd 0.0500 time 0.2514 (0.2635) data time 0.0006 (0.0025) model time 0.2508 (0.2607) loss 5.5387 (5.6264) grad_norm 4.4193 (2.7349) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:28:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][300/625] eta 0:01:25 lr 0.000152 wd 0.0500 time 0.2544 (0.2633) data time 0.0010 (0.0025) model time 0.2534 (0.2605) loss 5.3741 (5.6274) grad_norm 1.5207 (2.7167) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:28:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][310/625] eta 0:01:22 lr 0.000152 wd 0.0500 time 0.2567 (0.2631) data time 0.0009 (0.0024) model time 0.2559 (0.2604) loss 5.4218 (5.6311) grad_norm 2.6107 (2.7227) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:28:08 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [253/300][320/625] eta 0:01:20 lr 0.000152 wd 0.0500 time 0.2523 (0.2629) data time 0.0010 (0.0024) model time 0.2514 (0.2602) loss 5.8363 (5.6315) grad_norm 2.6488 (2.7172) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:28:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][330/625] eta 0:01:17 lr 0.000152 wd 0.0500 time 0.2562 (0.2627) data time 0.0007 (0.0023) model time 0.2555 (0.2600) loss 6.3913 (5.6404) grad_norm 2.1118 (2.7181) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:28:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][340/625] eta 0:01:15 lr 0.000151 wd 0.0500 time 0.4506 (0.2637) data time 0.0007 (0.0023) model time 0.4499 (0.2612) loss 4.7746 (5.6402) grad_norm 4.1560 (2.7190) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:28:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][350/625] eta 0:01:12 lr 0.000151 wd 0.0500 time 0.2526 (0.2639) data time 0.0009 (0.0022) model time 0.2517 (0.2615) loss 5.1133 (5.6386) grad_norm 1.8653 (2.7019) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:28:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][360/625] eta 0:01:09 lr 0.000151 wd 0.0500 time 0.2528 (0.2637) data time 0.0007 (0.0022) model time 0.2521 (0.2613) loss 6.6944 (5.6449) grad_norm 1.7242 (2.6829) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:28:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][370/625] eta 0:01:07 lr 0.000151 wd 0.0500 time 0.2541 (0.2639) data time 0.0006 (0.0022) model time 0.2535 (0.2616) loss 5.8658 (5.6511) grad_norm 2.5604 (2.6839) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:28:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][380/625] eta 0:01:04 lr 0.000151 wd 0.0500 time 0.2582 (0.2640) data time 0.0009 (0.0021) model time 0.2573 (0.2618) 
loss 6.6099 (5.6513) grad_norm 1.8559 (2.6739) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:28:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][390/625] eta 0:01:02 lr 0.000151 wd 0.0500 time 0.2538 (0.2643) data time 0.0008 (0.0021) model time 0.2530 (0.2621) loss 5.7256 (5.6495) grad_norm 3.4736 (2.6718) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:28:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][400/625] eta 0:00:59 lr 0.000151 wd 0.0500 time 0.2562 (0.2643) data time 0.0009 (0.0021) model time 0.2553 (0.2622) loss 6.2287 (5.6545) grad_norm 2.0040 (2.6617) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:28:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][410/625] eta 0:00:56 lr 0.000151 wd 0.0500 time 0.2567 (0.2646) data time 0.0006 (0.0020) model time 0.2561 (0.2626) loss 5.2280 (5.6495) grad_norm 1.6912 (2.6467) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:28:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][420/625] eta 0:00:54 lr 0.000151 wd 0.0500 time 0.2637 (0.2645) data time 0.0010 (0.0020) model time 0.2627 (0.2624) loss 5.2275 (5.6478) grad_norm 2.3478 (2.6416) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:28:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][430/625] eta 0:00:51 lr 0.000151 wd 0.0500 time 0.2522 (0.2643) data time 0.0009 (0.0020) model time 0.2513 (0.2623) loss 5.5006 (5.6497) grad_norm 2.7033 (2.6412) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:28:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][440/625] eta 0:00:48 lr 0.000151 wd 0.0500 time 0.2574 (0.2645) data time 0.0009 (0.0020) model time 0.2565 (0.2625) loss 5.6342 (5.6495) grad_norm 2.5305 (2.6474) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:28:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [253/300][450/625] eta 0:00:46 lr 0.000150 wd 0.0500 time 0.2562 (0.2643) data time 0.0008 (0.0019) model time 0.2554 (0.2624) loss 5.0418 (5.6505) grad_norm 3.9503 (2.6531) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:28:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][460/625] eta 0:00:43 lr 0.000150 wd 0.0500 time 0.2624 (0.2642) data time 0.0007 (0.0019) model time 0.2616 (0.2622) loss 5.2230 (5.6558) grad_norm 2.5719 (2.6604) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:28:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][470/625] eta 0:00:41 lr 0.000150 wd 0.0500 time 0.4264 (0.2646) data time 0.0007 (0.0019) model time 0.4256 (0.2628) loss 6.5768 (5.6516) grad_norm 2.6092 (2.6837) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:28:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][480/625] eta 0:00:38 lr 0.000150 wd 0.0500 time 0.2548 (0.2644) data time 0.0010 (0.0019) model time 0.2538 (0.2626) loss 6.1297 (5.6460) grad_norm 2.6555 (2.6848) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:28:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][490/625] eta 0:00:35 lr 0.000150 wd 0.0500 time 0.2569 (0.2643) data time 0.0011 (0.0019) model time 0.2558 (0.2624) loss 5.4680 (5.6447) grad_norm 3.4721 (2.6812) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:28:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][500/625] eta 0:00:33 lr 0.000150 wd 0.0500 time 0.2585 (0.2645) data time 0.0005 (0.0018) model time 0.2579 (0.2627) loss 4.9723 (5.6437) grad_norm 3.9421 (2.6819) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:28:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][510/625] eta 0:00:30 lr 0.000150 wd 0.0500 time 0.2583 (0.2643) data time 0.0008 (0.0018) model time 0.2574 (0.2625) loss 5.0883 (5.6443) grad_norm 
3.9980 (2.6827) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:29:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][520/625] eta 0:00:27 lr 0.000150 wd 0.0500 time 0.2554 (0.2641) data time 0.0011 (0.0018) model time 0.2543 (0.2623) loss 5.1781 (5.6402) grad_norm 2.1025 (2.6788) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:29:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][530/625] eta 0:00:25 lr 0.000150 wd 0.0500 time 0.2515 (0.2640) data time 0.0010 (0.0018) model time 0.2504 (0.2621) loss 5.1845 (5.6372) grad_norm 3.9608 (2.6820) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:29:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][540/625] eta 0:00:22 lr 0.000150 wd 0.0500 time 0.2579 (0.2638) data time 0.0007 (0.0018) model time 0.2572 (0.2620) loss 5.0513 (5.6353) grad_norm 2.6444 (2.6766) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:29:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][550/625] eta 0:00:19 lr 0.000150 wd 0.0500 time 0.2578 (0.2639) data time 0.0011 (0.0018) model time 0.2567 (0.2622) loss 5.4149 (5.6304) grad_norm 2.2348 (2.6678) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:29:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][560/625] eta 0:00:17 lr 0.000150 wd 0.0500 time 0.2573 (0.2638) data time 0.0009 (0.0017) model time 0.2564 (0.2620) loss 5.3142 (5.6270) grad_norm 2.5159 (2.6628) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:29:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][570/625] eta 0:00:14 lr 0.000149 wd 0.0500 time 0.2512 (0.2637) data time 0.0009 (0.0017) model time 0.2503 (0.2619) loss 4.8826 (5.6253) grad_norm 2.7983 (2.6539) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:29:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][580/625] eta 
0:00:11 lr 0.000149 wd 0.0500 time 0.2554 (0.2636) data time 0.0011 (0.0017) model time 0.2543 (0.2618) loss 6.3968 (5.6274) grad_norm 2.5671 (2.6660) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:29:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][590/625] eta 0:00:09 lr 0.000149 wd 0.0500 time 0.2556 (0.2634) data time 0.0011 (0.0017) model time 0.2544 (0.2616) loss 6.0495 (5.6237) grad_norm 2.3815 (2.6618) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:29:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][600/625] eta 0:00:06 lr 0.000149 wd 0.0500 time 0.2550 (0.2633) data time 0.0008 (0.0017) model time 0.2542 (0.2615) loss 4.7931 (5.6223) grad_norm 1.6186 (2.6665) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:29:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][610/625] eta 0:00:03 lr 0.000149 wd 0.0500 time 0.2517 (0.2635) data time 0.0004 (0.0017) model time 0.2512 (0.2618) loss 5.2296 (5.6217) grad_norm 4.3385 (2.6732) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:29:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][620/625] eta 0:00:01 lr 0.000149 wd 0.0500 time 0.2533 (0.2633) data time 0.0003 (0.0017) model time 0.2529 (0.2616) loss 3.9656 (5.6205) grad_norm 1.7622 (2.6655) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:29:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 253 training takes 0:02:44 [2024-08-04 09:29:28 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 09:29:29 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 09:29:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.481 (0.481) Loss 0.5771 (0.5771) Acc@1 90.137 (90.137) Acc@5 98.779 (98.779) Mem 9655MB [2024-08-04 09:29:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 0.8965 (0.7019) Acc@1 81.543 (86.887) Acc@5 96.436 (97.754) Mem 9655MB [2024-08-04 09:29:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 0.9941 (0.8194) Acc@1 78.662 (83.817) Acc@5 95.508 (96.610) Mem 9655MB [2024-08-04 09:29:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.483 Acc@5 96.629 [2024-08-04 09:29:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-08-04 09:29:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.758 (0.758) Loss 0.5830 (0.5830) Acc@1 90.039 (90.039) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 09:29:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.129) Loss 0.9038 (0.7086) Acc@1 81.201 (86.812) Acc@5 96.240 (97.776) Mem 9655MB [2024-08-04 09:29:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.094) Loss 1.0156 (0.8293) Acc@1 77.979 (83.629) Acc@5 95.361 (96.582) Mem 9655MB [2024-08-04 09:29:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.325 Acc@5 96.575 [2024-08-04 09:29:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-08-04 09:29:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.33% [2024-08-04 09:29:33 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 09:29:33 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 09:29:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][0/625] eta 0:07:58 lr 0.000149 wd 0.0500 time 0.7656 (0.7656) data time 0.4997 (0.4997) model time 0.0000 (0.0000) loss 6.1243 (6.1243) grad_norm 2.5608 (2.5608) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:29:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][10/625] eta 0:03:06 lr 0.000149 wd 0.0500 time 0.2601 (0.3026) data time 0.0007 (0.0462) model time 0.0000 (0.0000) loss 4.9770 (5.9802) grad_norm 2.1544 (2.2254) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:29:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][20/625] eta 0:02:49 lr 0.000149 wd 0.0500 time 0.2526 (0.2806) data time 0.0006 (0.0247) model time 0.0000 (0.0000) loss 5.1357 (5.7376) grad_norm 8.6921 (2.4998) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:29:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][30/625] eta 0:02:42 lr 0.000149 wd 0.0500 time 0.2557 (0.2728) data time 0.0006 (0.0170) model time 0.0000 (0.0000) loss 5.9673 (5.6863) grad_norm 1.8553 (3.1353) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:29:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][40/625] eta 0:02:37 lr 0.000149 wd 0.0500 time 0.2574 (0.2688) data time 0.0011 (0.0131) model time 0.0000 (0.0000) loss 4.6881 (5.6087) grad_norm 1.7847 (3.1238) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:29:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][50/625] eta 0:02:33 lr 0.000149 wd 0.0500 time 0.2563 (0.2662) data time 0.0006 (0.0107) model time 0.0000 (0.0000) loss 6.1299 (5.5554) grad_norm 1.9335 (2.9503) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 
09:29:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][60/625] eta 0:02:29 lr 0.000148 wd 0.0500 time 0.2546 (0.2644) data time 0.0006 (0.0091) model time 0.2540 (0.2542) loss 4.7986 (5.5605) grad_norm 2.1551 (2.9281) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:29:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][70/625] eta 0:02:29 lr 0.000148 wd 0.0500 time 0.2559 (0.2691) data time 0.0006 (0.0079) model time 0.2553 (0.2758) loss 5.1742 (5.5538) grad_norm 2.6158 (2.8317) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:29:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][80/625] eta 0:02:25 lr 0.000148 wd 0.0500 time 0.2523 (0.2673) data time 0.0011 (0.0071) model time 0.2512 (0.2682) loss 5.4631 (5.5354) grad_norm 1.9162 (2.7575) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:29:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][90/625] eta 0:02:22 lr 0.000148 wd 0.0500 time 0.2593 (0.2660) data time 0.0009 (0.0064) model time 0.2584 (0.2649) loss 6.5100 (5.5207) grad_norm 3.1378 (inf) loss_scale 256.0000 (500.7473) mem 9655MB [2024-08-04 09:30:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][100/625] eta 0:02:20 lr 0.000148 wd 0.0500 time 0.2534 (0.2668) data time 0.0007 (0.0059) model time 0.2528 (0.2666) loss 5.0422 (5.5216) grad_norm 2.6963 (inf) loss_scale 256.0000 (476.5149) mem 9655MB [2024-08-04 09:30:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][110/625] eta 0:02:17 lr 0.000148 wd 0.0500 time 0.2590 (0.2677) data time 0.0011 (0.0054) model time 0.2578 (0.2681) loss 4.9711 (5.5369) grad_norm 4.2181 (inf) loss_scale 256.0000 (456.6486) mem 9655MB [2024-08-04 09:30:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][120/625] eta 0:02:14 lr 0.000148 wd 0.0500 time 0.2601 (0.2667) data time 0.0007 (0.0050) model 
time 0.2594 (0.2662) loss 5.2425 (5.5397) grad_norm 1.9978 (inf) loss_scale 256.0000 (440.0661) mem 9655MB [2024-08-04 09:30:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][130/625] eta 0:02:11 lr 0.000148 wd 0.0500 time 0.2520 (0.2659) data time 0.0007 (0.0047) model time 0.2513 (0.2648) loss 5.0658 (5.5398) grad_norm 3.5014 (inf) loss_scale 256.0000 (426.0153) mem 9655MB [2024-08-04 09:30:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][140/625] eta 0:02:10 lr 0.000148 wd 0.0500 time 0.4312 (0.2692) data time 0.0007 (0.0045) model time 0.4306 (0.2699) loss 4.8761 (5.5363) grad_norm 1.6520 (inf) loss_scale 256.0000 (413.9574) mem 9655MB [2024-08-04 09:30:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][150/625] eta 0:02:08 lr 0.000148 wd 0.0500 time 0.2542 (0.2696) data time 0.0007 (0.0042) model time 0.2534 (0.2704) loss 4.8352 (5.5422) grad_norm 3.4476 (inf) loss_scale 256.0000 (403.4967) mem 9655MB [2024-08-04 09:30:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][160/625] eta 0:02:05 lr 0.000148 wd 0.0500 time 0.2668 (0.2689) data time 0.0008 (0.0040) model time 0.2660 (0.2692) loss 5.3945 (5.5575) grad_norm 1.9198 (inf) loss_scale 256.0000 (394.3354) mem 9655MB [2024-08-04 09:30:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][170/625] eta 0:02:02 lr 0.000147 wd 0.0500 time 0.2561 (0.2692) data time 0.0008 (0.0038) model time 0.2553 (0.2696) loss 6.5616 (5.5798) grad_norm 5.5322 (inf) loss_scale 256.0000 (386.2456) mem 9655MB [2024-08-04 09:30:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][180/625] eta 0:01:59 lr 0.000147 wd 0.0500 time 0.2583 (0.2695) data time 0.0008 (0.0037) model time 0.2575 (0.2698) loss 5.3860 (5.5756) grad_norm 1.3916 (inf) loss_scale 256.0000 (379.0497) mem 9655MB [2024-08-04 09:30:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [254/300][190/625] eta 0:01:56 lr 0.000147 wd 0.0500 time 0.2537 (0.2687) data time 0.0008 (0.0035) model time 0.2530 (0.2687) loss 4.5395 (5.5611) grad_norm 2.9020 (inf) loss_scale 256.0000 (372.6073) mem 9655MB [2024-08-04 09:30:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][200/625] eta 0:01:53 lr 0.000147 wd 0.0500 time 0.2561 (0.2681) data time 0.0008 (0.0034) model time 0.2553 (0.2678) loss 5.5592 (5.5594) grad_norm 1.9665 (inf) loss_scale 256.0000 (366.8060) mem 9655MB [2024-08-04 09:30:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][210/625] eta 0:01:51 lr 0.000147 wd 0.0500 time 0.2550 (0.2675) data time 0.0007 (0.0033) model time 0.2543 (0.2670) loss 4.7754 (5.5607) grad_norm 1.8303 (inf) loss_scale 256.0000 (361.5545) mem 9655MB [2024-08-04 09:30:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][220/625] eta 0:01:48 lr 0.000147 wd 0.0500 time 0.2562 (0.2669) data time 0.0008 (0.0032) model time 0.2554 (0.2663) loss 5.7747 (5.5716) grad_norm 4.8454 (inf) loss_scale 256.0000 (356.7783) mem 9655MB [2024-08-04 09:30:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][230/625] eta 0:01:45 lr 0.000147 wd 0.0500 time 0.2540 (0.2665) data time 0.0007 (0.0031) model time 0.2533 (0.2656) loss 6.0309 (5.5738) grad_norm 2.7002 (inf) loss_scale 256.0000 (352.4156) mem 9655MB [2024-08-04 09:30:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][240/625] eta 0:01:42 lr 0.000147 wd 0.0500 time 0.2552 (0.2668) data time 0.0007 (0.0030) model time 0.2544 (0.2660) loss 5.9910 (5.5756) grad_norm 1.6748 (inf) loss_scale 256.0000 (348.4149) mem 9655MB [2024-08-04 09:30:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][250/625] eta 0:01:40 lr 0.000147 wd 0.0500 time 0.2527 (0.2669) data time 0.0007 (0.0029) model time 0.2519 (0.2662) loss 4.7369 (5.5677) grad_norm 4.0290 (inf) 
loss_scale 256.0000 (344.7331) mem 9655MB [2024-08-04 09:30:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][260/625] eta 0:01:37 lr 0.000147 wd 0.0500 time 0.2518 (0.2665) data time 0.0010 (0.0028) model time 0.2508 (0.2657) loss 5.7152 (5.5806) grad_norm 8.2874 (inf) loss_scale 256.0000 (341.3333) mem 9655MB [2024-08-04 09:30:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][270/625] eta 0:01:34 lr 0.000147 wd 0.0500 time 0.2545 (0.2668) data time 0.0009 (0.0028) model time 0.2535 (0.2660) loss 5.9461 (5.5930) grad_norm 2.3486 (inf) loss_scale 256.0000 (338.1845) mem 9655MB [2024-08-04 09:30:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][280/625] eta 0:01:31 lr 0.000147 wd 0.0500 time 0.2575 (0.2664) data time 0.0007 (0.0027) model time 0.2568 (0.2655) loss 4.8104 (5.5814) grad_norm 2.7935 (inf) loss_scale 256.0000 (335.2598) mem 9655MB [2024-08-04 09:30:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][290/625] eta 0:01:29 lr 0.000146 wd 0.0500 time 0.2543 (0.2661) data time 0.0008 (0.0026) model time 0.2536 (0.2652) loss 5.6834 (5.5865) grad_norm 1.9152 (inf) loss_scale 256.0000 (332.5361) mem 9655MB [2024-08-04 09:30:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][300/625] eta 0:01:26 lr 0.000146 wd 0.0500 time 0.2546 (0.2657) data time 0.0006 (0.0026) model time 0.2539 (0.2647) loss 5.3962 (5.5852) grad_norm 2.1880 (inf) loss_scale 256.0000 (329.9934) mem 9655MB [2024-08-04 09:30:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][310/625] eta 0:01:23 lr 0.000146 wd 0.0500 time 0.2519 (0.2654) data time 0.0010 (0.0025) model time 0.2509 (0.2643) loss 5.8786 (5.5849) grad_norm 1.4772 (inf) loss_scale 256.0000 (327.6141) mem 9655MB [2024-08-04 09:30:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][320/625] eta 0:01:20 lr 0.000146 wd 0.0500 
time 0.2607 (0.2652) data time 0.0009 (0.0025) model time 0.2598 (0.2641) loss 5.1071 (5.5880) grad_norm 1.7031 (inf) loss_scale 256.0000 (325.3832) mem 9655MB [2024-08-04 09:31:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][330/625] eta 0:01:18 lr 0.000146 wd 0.0500 time 0.2595 (0.2653) data time 0.0009 (0.0024) model time 0.2587 (0.2642) loss 4.5799 (5.5904) grad_norm 1.6591 (inf) loss_scale 256.0000 (323.2870) mem 9655MB [2024-08-04 09:31:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][340/625] eta 0:01:15 lr 0.000146 wd 0.0500 time 0.2540 (0.2650) data time 0.0010 (0.0024) model time 0.2530 (0.2639) loss 5.8340 (5.5946) grad_norm 3.5627 (inf) loss_scale 256.0000 (321.3138) mem 9655MB [2024-08-04 09:31:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][350/625] eta 0:01:12 lr 0.000146 wd 0.0500 time 0.2548 (0.2648) data time 0.0006 (0.0023) model time 0.2542 (0.2637) loss 6.0908 (5.5948) grad_norm 2.0782 (inf) loss_scale 256.0000 (319.4530) mem 9655MB [2024-08-04 09:31:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][360/625] eta 0:01:10 lr 0.000146 wd 0.0500 time 0.2592 (0.2646) data time 0.0008 (0.0023) model time 0.2584 (0.2634) loss 4.9750 (5.5896) grad_norm 1.6696 (inf) loss_scale 256.0000 (317.6953) mem 9655MB [2024-08-04 09:31:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][370/625] eta 0:01:07 lr 0.000146 wd 0.0500 time 0.2597 (0.2643) data time 0.0009 (0.0023) model time 0.2587 (0.2631) loss 5.0606 (5.5857) grad_norm 3.0066 (inf) loss_scale 256.0000 (316.0323) mem 9655MB [2024-08-04 09:31:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][380/625] eta 0:01:04 lr 0.000146 wd 0.0500 time 0.2541 (0.2641) data time 0.0008 (0.0022) model time 0.2534 (0.2629) loss 6.5492 (5.5917) grad_norm 2.0047 (inf) loss_scale 256.0000 (314.4567) mem 9655MB [2024-08-04 09:31:16 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][390/625] eta 0:01:02 lr 0.000146 wd 0.0500 time 0.2590 (0.2639) data time 0.0008 (0.0022) model time 0.2582 (0.2626) loss 6.3004 (5.5948) grad_norm 2.1996 (inf) loss_scale 256.0000 (312.9616) mem 9655MB [2024-08-04 09:31:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][400/625] eta 0:00:59 lr 0.000145 wd 0.0500 time 0.2593 (0.2644) data time 0.0005 (0.0022) model time 0.2587 (0.2632) loss 4.7910 (5.5915) grad_norm 1.8695 (inf) loss_scale 256.0000 (311.5411) mem 9655MB [2024-08-04 09:31:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][410/625] eta 0:00:56 lr 0.000145 wd 0.0500 time 0.2519 (0.2642) data time 0.0008 (0.0021) model time 0.2510 (0.2630) loss 5.6536 (5.5923) grad_norm 1.7266 (inf) loss_scale 256.0000 (310.1898) mem 9655MB [2024-08-04 09:31:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][420/625] eta 0:00:54 lr 0.000145 wd 0.0500 time 0.2576 (0.2644) data time 0.0009 (0.0021) model time 0.2567 (0.2633) loss 5.7237 (5.5865) grad_norm 2.2799 (inf) loss_scale 256.0000 (308.9026) mem 9655MB [2024-08-04 09:31:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][430/625] eta 0:00:51 lr 0.000145 wd 0.0500 time 0.2550 (0.2642) data time 0.0009 (0.0021) model time 0.2541 (0.2631) loss 6.4001 (5.5847) grad_norm 1.8850 (inf) loss_scale 256.0000 (307.6752) mem 9655MB [2024-08-04 09:31:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][440/625] eta 0:00:48 lr 0.000145 wd 0.0500 time 0.2505 (0.2644) data time 0.0008 (0.0020) model time 0.2496 (0.2633) loss 5.8830 (5.5923) grad_norm 23.8887 (inf) loss_scale 256.0000 (306.5034) mem 9655MB [2024-08-04 09:31:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][450/625] eta 0:00:46 lr 0.000145 wd 0.0500 time 0.2578 (0.2646) data time 0.0011 (0.0020) model time 0.2568 
(0.2634) loss 5.9706 (5.5970) grad_norm 1.7329 (inf) loss_scale 256.0000 (305.3836) mem 9655MB [2024-08-04 09:31:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][460/625] eta 0:00:43 lr 0.000145 wd 0.0500 time 0.2556 (0.2644) data time 0.0009 (0.0020) model time 0.2547 (0.2632) loss 6.2386 (5.5923) grad_norm 1.6804 (inf) loss_scale 256.0000 (304.3124) mem 9655MB [2024-08-04 09:31:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][470/625] eta 0:00:40 lr 0.000145 wd 0.0500 time 0.2606 (0.2642) data time 0.0006 (0.0020) model time 0.2600 (0.2630) loss 6.0112 (5.5977) grad_norm 1.4597 (inf) loss_scale 256.0000 (303.2866) mem 9655MB [2024-08-04 09:31:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][480/625] eta 0:00:38 lr 0.000145 wd 0.0500 time 0.2552 (0.2640) data time 0.0009 (0.0020) model time 0.2543 (0.2628) loss 4.9287 (5.5957) grad_norm 2.0153 (inf) loss_scale 256.0000 (302.3035) mem 9655MB [2024-08-04 09:31:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][490/625] eta 0:00:35 lr 0.000145 wd 0.0500 time 0.2524 (0.2639) data time 0.0007 (0.0019) model time 0.2517 (0.2627) loss 5.1071 (5.5997) grad_norm 3.7186 (inf) loss_scale 256.0000 (301.3605) mem 9655MB [2024-08-04 09:31:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][500/625] eta 0:00:32 lr 0.000145 wd 0.0500 time 0.2553 (0.2637) data time 0.0009 (0.0019) model time 0.2544 (0.2625) loss 5.1056 (5.5957) grad_norm 2.3278 (inf) loss_scale 256.0000 (300.4551) mem 9655MB [2024-08-04 09:31:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][510/625] eta 0:00:30 lr 0.000145 wd 0.0500 time 0.2574 (0.2636) data time 0.0008 (0.0019) model time 0.2566 (0.2624) loss 6.5763 (5.5977) grad_norm 7.4437 (inf) loss_scale 256.0000 (299.5851) mem 9655MB [2024-08-04 09:31:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[254/300][520/625] eta 0:00:27 lr 0.000144 wd 0.0500 time 0.2536 (0.2637) data time 0.0007 (0.0019) model time 0.2530 (0.2625) loss 5.5089 (5.5988) grad_norm 4.2864 (inf) loss_scale 256.0000 (298.7486) mem 9655MB [2024-08-04 09:31:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][530/625] eta 0:00:25 lr 0.000144 wd 0.0500 time 0.2601 (0.2635) data time 0.0008 (0.0019) model time 0.2593 (0.2623) loss 4.7508 (5.5984) grad_norm 2.1953 (inf) loss_scale 256.0000 (297.9435) mem 9655MB [2024-08-04 09:31:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][540/625] eta 0:00:22 lr 0.000144 wd 0.0500 time 0.2572 (0.2638) data time 0.0007 (0.0018) model time 0.2566 (0.2626) loss 5.5322 (5.6013) grad_norm 2.4458 (inf) loss_scale 256.0000 (297.1682) mem 9655MB [2024-08-04 09:31:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][550/625] eta 0:00:19 lr 0.000144 wd 0.0500 time 0.2543 (0.2636) data time 0.0007 (0.0018) model time 0.2536 (0.2625) loss 4.5140 (5.5992) grad_norm 2.5829 (inf) loss_scale 256.0000 (296.4211) mem 9655MB [2024-08-04 09:32:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][560/625] eta 0:00:17 lr 0.000144 wd 0.0500 time 0.2558 (0.2639) data time 0.0009 (0.0018) model time 0.2548 (0.2628) loss 6.0859 (5.5957) grad_norm 2.0394 (inf) loss_scale 256.0000 (295.7005) mem 9655MB [2024-08-04 09:32:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][570/625] eta 0:00:14 lr 0.000144 wd 0.0500 time 0.2556 (0.2641) data time 0.0007 (0.0018) model time 0.2549 (0.2630) loss 5.8968 (5.5963) grad_norm 2.8481 (inf) loss_scale 256.0000 (295.0053) mem 9655MB [2024-08-04 09:32:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][580/625] eta 0:00:11 lr 0.000144 wd 0.0500 time 0.2539 (0.2640) data time 0.0009 (0.0018) model time 0.2530 (0.2628) loss 5.4420 (5.5962) grad_norm 3.0970 (inf) loss_scale 
256.0000 (294.3339) mem 9655MB [2024-08-04 09:32:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][590/625] eta 0:00:09 lr 0.000144 wd 0.0500 time 0.2538 (0.2638) data time 0.0008 (0.0018) model time 0.2530 (0.2627) loss 5.8523 (5.5970) grad_norm 4.8091 (inf) loss_scale 256.0000 (293.6853) mem 9655MB [2024-08-04 09:32:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][600/625] eta 0:00:06 lr 0.000144 wd 0.0500 time 0.2575 (0.2637) data time 0.0006 (0.0017) model time 0.2569 (0.2626) loss 5.7266 (5.5938) grad_norm 2.3616 (inf) loss_scale 256.0000 (293.0582) mem 9655MB [2024-08-04 09:32:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][610/625] eta 0:00:03 lr 0.000144 wd 0.0500 time 0.2539 (0.2639) data time 0.0006 (0.0017) model time 0.2533 (0.2628) loss 5.8882 (5.5922) grad_norm 3.0148 (inf) loss_scale 256.0000 (292.4517) mem 9655MB [2024-08-04 09:32:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][620/625] eta 0:00:01 lr 0.000144 wd 0.0500 time 0.2526 (0.2637) data time 0.0006 (0.0017) model time 0.2520 (0.2626) loss 5.1252 (5.5962) grad_norm 1.8419 (inf) loss_scale 256.0000 (291.8647) mem 9655MB [2024-08-04 09:32:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 254 training takes 0:02:44 [2024-08-04 09:32:18 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 09:32:19 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 09:32:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.512 (0.512) Loss 0.5835 (0.5835) Acc@1 90.479 (90.479) Acc@5 98.730 (98.730) Mem 9655MB [2024-08-04 09:32:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 0.9160 (0.7170) Acc@1 82.129 (87.078) Acc@5 96.289 (97.825) Mem 9655MB [2024-08-04 09:32:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0205 (0.8348) Acc@1 78.369 (83.931) Acc@5 95.410 (96.694) Mem 9655MB [2024-08-04 09:32:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.585 Acc@5 96.713 [2024-08-04 09:32:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-08-04 09:32:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.59% [2024-08-04 09:32:20 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 09:32:21 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 09:32:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.499 (0.499) Loss 0.5835 (0.5835) Acc@1 90.039 (90.039) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 09:32:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 0.9038 (0.7086) Acc@1 81.299 (86.834) Acc@5 96.240 (97.776) Mem 9655MB [2024-08-04 09:32:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0156 (0.8292) Acc@1 77.979 (83.647) Acc@5 95.361 (96.584) Mem 9655MB [2024-08-04 09:32:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.345 Acc@5 96.591 [2024-08-04 09:32:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-08-04 09:32:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.35% [2024-08-04 09:32:23 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 09:32:23 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 09:32:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][0/625] eta 0:07:40 lr 0.000144 wd 0.0500 time 0.7362 (0.7362) data time 0.4934 (0.4934) model time 0.0000 (0.0000) loss 6.5554 (6.5554) grad_norm 7.2081 (7.2081) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:32:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][10/625] eta 0:03:03 lr 0.000143 wd 0.0500 time 0.2516 (0.2984) data time 0.0008 (0.0457) model time 0.0000 (0.0000) loss 4.7092 (5.6582) grad_norm 3.3808 (3.4409) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:32:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][20/625] eta 0:02:54 lr 0.000143 wd 0.0500 time 0.2569 (0.2879) data time 0.0007 (0.0244) model time 0.0000 (0.0000) loss 4.5980 (5.5576) grad_norm 2.5077 (3.1668) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:32:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][30/625] eta 0:02:49 lr 0.000143 wd 0.0500 time 0.2530 (0.2841) data time 0.0007 (0.0168) model time 0.0000 (0.0000) loss 5.8192 (5.6396) grad_norm 4.4372 (3.0257) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:32:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][40/625] eta 0:02:44 lr 0.000143 wd 0.0500 time 0.2551 (0.2804) data time 0.0007 (0.0130) model time 0.0000 (0.0000) loss 5.8891 (5.6844) grad_norm 1.9741 (2.8435) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:32:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][50/625] eta 0:02:40 lr 0.000143 wd 0.0500 time 0.2560 (0.2791) data time 0.0007 (0.0106) model time 0.0000 (0.0000) loss 4.6845 (5.6585) grad_norm 1.4869 (2.9081) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:32:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][60/625] eta 0:02:35 lr 0.000143 wd 0.0500 time 0.2539 (0.2752) data time 
0.0008 (0.0090) model time 0.2531 (0.2544) loss 5.4515 (5.6268) grad_norm 2.0564 (2.9178) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:32:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][70/625] eta 0:02:32 lr 0.000143 wd 0.0500 time 0.2514 (0.2748) data time 0.0007 (0.0079) model time 0.2507 (0.2627) loss 7.1830 (5.6676) grad_norm 3.1133 (2.9412) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:32:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][80/625] eta 0:02:28 lr 0.000143 wd 0.0500 time 0.2484 (0.2724) data time 0.0009 (0.0071) model time 0.2476 (0.2600) loss 5.7101 (5.6554) grad_norm 2.2144 (2.8762) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:32:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][90/625] eta 0:02:24 lr 0.000143 wd 0.0500 time 0.2583 (0.2706) data time 0.0008 (0.0064) model time 0.2575 (0.2587) loss 5.9656 (5.6705) grad_norm 2.1808 (2.7859) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:32:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][100/625] eta 0:02:21 lr 0.000143 wd 0.0500 time 0.2593 (0.2692) data time 0.0010 (0.0058) model time 0.2583 (0.2582) loss 5.8771 (5.6955) grad_norm 3.5280 (2.7526) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:32:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][110/625] eta 0:02:18 lr 0.000143 wd 0.0500 time 0.2533 (0.2680) data time 0.0008 (0.0054) model time 0.2525 (0.2576) loss 6.0080 (5.6747) grad_norm 1.8762 (2.7266) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:32:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][120/625] eta 0:02:15 lr 0.000142 wd 0.0500 time 0.2568 (0.2687) data time 0.0008 (0.0050) model time 0.2560 (0.2602) loss 5.3001 (5.6653) grad_norm 1.9505 (2.6850) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:32:58 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][130/625] eta 0:02:12 lr 0.000142 wd 0.0500 time 0.2550 (0.2678) data time 0.0007 (0.0047) model time 0.2543 (0.2596) loss 5.2941 (5.6606) grad_norm 3.2966 (2.6951) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:33:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][140/625] eta 0:02:09 lr 0.000142 wd 0.0500 time 0.2748 (0.2680) data time 0.0006 (0.0044) model time 0.2742 (0.2608) loss 6.3686 (5.6535) grad_norm 2.0412 (2.7047) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:33:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][150/625] eta 0:02:07 lr 0.000142 wd 0.0500 time 0.2556 (0.2680) data time 0.0009 (0.0042) model time 0.2547 (0.2613) loss 5.4786 (5.6448) grad_norm 3.0563 (2.7070) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:33:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][160/625] eta 0:02:04 lr 0.000142 wd 0.0500 time 0.2586 (0.2673) data time 0.0008 (0.0040) model time 0.2578 (0.2609) loss 5.9705 (5.6351) grad_norm 4.0048 (2.6989) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:33:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][170/625] eta 0:02:01 lr 0.000142 wd 0.0500 time 0.2521 (0.2667) data time 0.0010 (0.0038) model time 0.2511 (0.2604) loss 6.5538 (5.6352) grad_norm 2.9100 (2.7252) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:33:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][180/625] eta 0:01:58 lr 0.000142 wd 0.0500 time 0.2609 (0.2661) data time 0.0006 (0.0037) model time 0.2603 (0.2601) loss 5.5068 (5.6270) grad_norm 2.3184 (2.7260) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:33:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][190/625] eta 0:01:55 lr 0.000142 wd 0.0500 time 0.2557 (0.2656) data time 0.0008 (0.0035) 
model time 0.2548 (0.2597) loss 5.8030 (5.6195) grad_norm 4.5764 (2.7151) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:33:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][200/625] eta 0:01:53 lr 0.000142 wd 0.0500 time 0.2562 (0.2662) data time 0.0009 (0.0034) model time 0.2553 (0.2609) loss 6.0967 (5.6166) grad_norm 4.3402 (2.7359) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:33:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][210/625] eta 0:01:50 lr 0.000142 wd 0.0500 time 0.2599 (0.2658) data time 0.0007 (0.0033) model time 0.2593 (0.2606) loss 4.7403 (5.6196) grad_norm 1.9405 (2.7171) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:33:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][220/625] eta 0:01:47 lr 0.000142 wd 0.0500 time 0.2567 (0.2653) data time 0.0008 (0.0032) model time 0.2559 (0.2602) loss 6.8839 (5.6262) grad_norm 1.7387 (2.7126) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:33:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][230/625] eta 0:01:45 lr 0.000142 wd 0.0500 time 0.4601 (0.2666) data time 0.0010 (0.0031) model time 0.4591 (0.2621) loss 4.3334 (5.6219) grad_norm 3.1957 (2.7209) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:33:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][240/625] eta 0:01:42 lr 0.000141 wd 0.0500 time 0.2593 (0.2662) data time 0.0012 (0.0030) model time 0.2581 (0.2618) loss 5.3148 (5.6276) grad_norm 2.2314 (2.7471) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:33:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][250/625] eta 0:01:39 lr 0.000141 wd 0.0500 time 0.2569 (0.2659) data time 0.0009 (0.0029) model time 0.2560 (0.2616) loss 5.3244 (5.6333) grad_norm 2.4833 (2.7524) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:33:33 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [255/300][260/625] eta 0:01:36 lr 0.000141 wd 0.0500 time 0.2592 (0.2655) data time 0.0010 (0.0028) model time 0.2582 (0.2613) loss 5.8789 (5.6313) grad_norm 2.3314 (2.7877) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:33:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][270/625] eta 0:01:34 lr 0.000141 wd 0.0500 time 0.2542 (0.2651) data time 0.0007 (0.0027) model time 0.2535 (0.2610) loss 5.9503 (5.6266) grad_norm 1.9537 (2.7677) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:33:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][280/625] eta 0:01:31 lr 0.000141 wd 0.0500 time 0.2542 (0.2649) data time 0.0011 (0.0027) model time 0.2531 (0.2608) loss 5.8772 (5.6287) grad_norm 3.1754 (2.7510) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:33:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][290/625] eta 0:01:28 lr 0.000141 wd 0.0500 time 0.2532 (0.2646) data time 0.0007 (0.0026) model time 0.2525 (0.2606) loss 4.4103 (5.6335) grad_norm 1.5522 (2.7437) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:33:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][300/625] eta 0:01:26 lr 0.000141 wd 0.0500 time 0.4315 (0.2666) data time 0.0007 (0.0026) model time 0.4309 (0.2632) loss 5.0958 (5.6400) grad_norm 2.6249 (2.7880) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:33:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][310/625] eta 0:01:23 lr 0.000141 wd 0.0500 time 0.2510 (0.2663) data time 0.0008 (0.0025) model time 0.2501 (0.2628) loss 5.9547 (5.6409) grad_norm 1.9264 (2.7788) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:33:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][320/625] eta 0:01:21 lr 0.000141 wd 0.0500 time 0.2526 (0.2659) data time 0.0009 (0.0025) model time 0.2517 (0.2625) 
loss 6.5134 (5.6468) grad_norm 2.8309 (2.8711) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:33:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][330/625] eta 0:01:18 lr 0.000141 wd 0.0500 time 0.2560 (0.2660) data time 0.0007 (0.0024) model time 0.2554 (0.2627) loss 5.9982 (5.6508) grad_norm 2.0473 (2.8625) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:33:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][340/625] eta 0:01:15 lr 0.000141 wd 0.0500 time 0.2517 (0.2657) data time 0.0007 (0.0024) model time 0.2509 (0.2624) loss 6.3963 (5.6500) grad_norm 2.2706 (2.8512) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:33:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][350/625] eta 0:01:13 lr 0.000141 wd 0.0500 time 0.2596 (0.2660) data time 0.0006 (0.0023) model time 0.2590 (0.2629) loss 5.5476 (5.6499) grad_norm 2.8727 (2.8538) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:33:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][360/625] eta 0:01:10 lr 0.000140 wd 0.0500 time 0.2567 (0.2662) data time 0.0007 (0.0023) model time 0.2559 (0.2632) loss 6.3759 (5.6549) grad_norm 2.0359 (2.8665) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:34:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][370/625] eta 0:01:07 lr 0.000140 wd 0.0500 time 0.2532 (0.2660) data time 0.0007 (0.0023) model time 0.2525 (0.2630) loss 5.0403 (5.6551) grad_norm 4.2190 (2.8868) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:34:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][380/625] eta 0:01:05 lr 0.000140 wd 0.0500 time 0.2547 (0.2657) data time 0.0008 (0.0022) model time 0.2539 (0.2627) loss 5.1781 (5.6514) grad_norm 3.0593 (2.8921) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:34:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [255/300][390/625] eta 0:01:02 lr 0.000140 wd 0.0500 time 0.2531 (0.2655) data time 0.0007 (0.0022) model time 0.2525 (0.2625) loss 5.0521 (5.6410) grad_norm 3.3253 (2.8949) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:34:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][400/625] eta 0:00:59 lr 0.000140 wd 0.0500 time 0.2546 (0.2653) data time 0.0011 (0.0022) model time 0.2535 (0.2623) loss 5.0164 (5.6392) grad_norm 1.9360 (2.9083) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:34:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][410/625] eta 0:00:56 lr 0.000140 wd 0.0500 time 0.2574 (0.2651) data time 0.0007 (0.0021) model time 0.2567 (0.2621) loss 5.8549 (5.6340) grad_norm 2.2000 (2.9000) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:34:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][420/625] eta 0:00:54 lr 0.000140 wd 0.0500 time 0.2574 (0.2653) data time 0.0008 (0.0021) model time 0.2566 (0.2624) loss 4.6820 (5.6357) grad_norm 1.9754 (2.8852) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:34:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][430/625] eta 0:00:51 lr 0.000140 wd 0.0500 time 0.2557 (0.2651) data time 0.0007 (0.0021) model time 0.2550 (0.2622) loss 5.9133 (5.6365) grad_norm 2.9663 (2.8715) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:34:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][440/625] eta 0:00:48 lr 0.000140 wd 0.0500 time 0.2568 (0.2648) data time 0.0008 (0.0020) model time 0.2560 (0.2620) loss 6.0244 (5.6377) grad_norm 3.7584 (2.8613) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:34:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][450/625] eta 0:00:46 lr 0.000140 wd 0.0500 time 0.2570 (0.2649) data time 0.0007 (0.0020) model time 0.2563 (0.2621) loss 5.6399 (5.6319) grad_norm 
2.1802 (2.8473) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:34:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][460/625] eta 0:00:43 lr 0.000140 wd 0.0500 time 0.2522 (0.2647) data time 0.0009 (0.0020) model time 0.2512 (0.2619) loss 5.3238 (5.6344) grad_norm 2.6736 (2.8495) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:34:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][470/625] eta 0:00:40 lr 0.000140 wd 0.0500 time 0.2518 (0.2645) data time 0.0009 (0.0020) model time 0.2509 (0.2617) loss 5.6008 (5.6307) grad_norm 1.9193 (2.8400) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:34:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][480/625] eta 0:00:38 lr 0.000139 wd 0.0500 time 0.2566 (0.2647) data time 0.0007 (0.0020) model time 0.2558 (0.2621) loss 5.7704 (5.6358) grad_norm 4.3253 (2.8406) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:34:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][490/625] eta 0:00:35 lr 0.000139 wd 0.0500 time 0.2554 (0.2645) data time 0.0006 (0.0019) model time 0.2548 (0.2619) loss 6.0086 (5.6368) grad_norm 2.3907 (2.8374) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:34:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][500/625] eta 0:00:33 lr 0.000139 wd 0.0500 time 0.2543 (0.2643) data time 0.0011 (0.0019) model time 0.2533 (0.2617) loss 6.3337 (5.6349) grad_norm 1.8210 (2.8250) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:34:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][510/625] eta 0:00:30 lr 0.000139 wd 0.0500 time 0.2530 (0.2642) data time 0.0009 (0.0019) model time 0.2520 (0.2616) loss 6.0804 (5.6376) grad_norm 2.6052 (2.8138) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:34:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][520/625] eta 
0:00:27 lr 0.000139 wd 0.0500 time 0.2559 (0.2640) data time 0.0006 (0.0019) model time 0.2553 (0.2615) loss 5.9801 (5.6362) grad_norm 4.6603 (2.8104) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:34:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][530/625] eta 0:00:25 lr 0.000139 wd 0.0500 time 0.2522 (0.2643) data time 0.0008 (0.0019) model time 0.2514 (0.2618) loss 5.5137 (5.6337) grad_norm 2.8759 (2.8075) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:34:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][540/625] eta 0:00:22 lr 0.000139 wd 0.0500 time 0.2503 (0.2641) data time 0.0009 (0.0018) model time 0.2493 (0.2616) loss 5.6192 (5.6281) grad_norm 2.4796 (2.7965) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:34:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][550/625] eta 0:00:19 lr 0.000139 wd 0.0500 time 0.2528 (0.2640) data time 0.0007 (0.0018) model time 0.2521 (0.2615) loss 5.7535 (5.6276) grad_norm 1.7156 (2.7839) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:34:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][560/625] eta 0:00:17 lr 0.000139 wd 0.0500 time 0.2547 (0.2638) data time 0.0008 (0.0018) model time 0.2540 (0.2614) loss 4.5305 (5.6287) grad_norm 2.8944 (2.7871) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:34:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][570/625] eta 0:00:14 lr 0.000139 wd 0.0500 time 0.2525 (0.2637) data time 0.0009 (0.0018) model time 0.2516 (0.2612) loss 6.0335 (5.6319) grad_norm 3.1428 (2.7833) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:34:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][580/625] eta 0:00:11 lr 0.000139 wd 0.0500 time 0.2554 (0.2636) data time 0.0006 (0.0018) model time 0.2547 (0.2611) loss 5.2511 (5.6356) grad_norm 2.4644 (2.7794) loss_scale 
256.0000 (256.0000) mem 9655MB [2024-08-04 09:34:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][590/625] eta 0:00:09 lr 0.000139 wd 0.0500 time 0.2534 (0.2634) data time 0.0009 (0.0018) model time 0.2526 (0.2610) loss 6.4687 (5.6386) grad_norm 3.5039 (2.7892) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:35:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][600/625] eta 0:00:06 lr 0.000138 wd 0.0500 time 0.4376 (0.2640) data time 0.0009 (0.0017) model time 0.4367 (0.2617) loss 4.9560 (5.6369) grad_norm 3.4545 (2.8267) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:35:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][610/625] eta 0:00:03 lr 0.000138 wd 0.0500 time 0.2522 (0.2638) data time 0.0003 (0.0017) model time 0.2519 (0.2615) loss 6.2948 (5.6374) grad_norm 3.5199 (2.8487) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:35:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][620/625] eta 0:00:01 lr 0.000138 wd 0.0500 time 0.2529 (0.2637) data time 0.0003 (0.0017) model time 0.2526 (0.2614) loss 6.1693 (5.6433) grad_norm 3.0636 (2.8514) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:35:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 255 training takes 0:02:44 [2024-08-04 09:35:08 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 09:35:09 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 09:35:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.481 (0.481) Loss 0.5889 (0.5889) Acc@1 90.088 (90.088) Acc@5 98.877 (98.877) Mem 9655MB [2024-08-04 09:35:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.096) Loss 0.9111 (0.7205) Acc@1 81.348 (86.799) Acc@5 96.338 (97.798) Mem 9655MB [2024-08-04 09:35:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0234 (0.8369) Acc@1 77.930 (83.708) Acc@5 95.508 (96.659) Mem 9655MB [2024-08-04 09:35:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.397 Acc@5 96.689 [2024-08-04 09:35:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.4% [2024-08-04 09:35:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.716 (0.716) Loss 0.5840 (0.5840) Acc@1 90.137 (90.137) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 09:35:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.124) Loss 0.9038 (0.7087) Acc@1 81.396 (86.901) Acc@5 96.240 (97.758) Mem 9655MB [2024-08-04 09:35:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0156 (0.8291) Acc@1 77.979 (83.684) Acc@5 95.410 (96.577) Mem 9655MB [2024-08-04 09:35:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.377 Acc@5 96.583 [2024-08-04 09:35:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.4% [2024-08-04 09:35:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.38% [2024-08-04 09:35:13 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 09:35:13 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 09:35:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][0/625] eta 0:07:33 lr 0.000138 wd 0.0500 time 0.7248 (0.7248) data time 0.4863 (0.4863) model time 0.0000 (0.0000) loss 6.1855 (6.1855) grad_norm 2.1771 (2.1771) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:35:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][10/625] eta 0:03:08 lr 0.000138 wd 0.0500 time 0.3588 (0.3073) data time 0.0007 (0.0450) model time 0.0000 (0.0000) loss 6.3214 (5.4898) grad_norm 2.1031 (2.3934) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:35:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][20/625] eta 0:02:50 lr 0.000138 wd 0.0500 time 0.2486 (0.2822) data time 0.0009 (0.0240) model time 0.0000 (0.0000) loss 4.9528 (5.5547) grad_norm 2.2192 (2.2915) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:35:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][30/625] eta 0:02:42 lr 0.000138 wd 0.0500 time 0.2541 (0.2738) data time 0.0007 (0.0166) model time 0.0000 (0.0000) loss 6.2870 (5.6427) grad_norm 2.1963 (2.2312) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:35:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][40/625] eta 0:02:37 lr 0.000138 wd 0.0500 time 0.2568 (0.2695) data time 0.0007 (0.0127) model time 0.0000 (0.0000) loss 5.8858 (5.5921) grad_norm 2.6592 (2.3132) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:35:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][50/625] eta 0:02:33 lr 0.000138 wd 0.0500 time 0.2568 (0.2667) data time 0.0006 (0.0104) model time 0.0000 (0.0000) loss 5.3976 (5.5896) grad_norm 4.0348 (2.3520) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:35:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][60/625] eta 0:02:29 lr 0.000138 wd 0.0500 time 0.2509 (0.2648) data time 0.0009 (0.0088) model time 0.2500 (0.2542) loss 6.2357 (5.5486) grad_norm 1.6395 (2.4895) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:35:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][70/625] eta 0:02:27 lr 0.000138 wd 0.0500 time 0.2547 (0.2652) data time 0.0012 (0.0077) model time 0.2535 (0.2607) loss 5.8800 (5.5495) grad_norm 5.3541 (2.9168) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:35:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][80/625] eta 0:02:23 lr 0.000138 wd 0.0500 time 0.2595 (0.2640) data time 0.0007 (0.0069) model time 0.2588 (0.2585) loss 4.4268 (5.5577) grad_norm 1.8023 (2.8914) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:35:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][90/625] eta 0:02:20 lr 0.000137 wd 0.0500 time 0.2542 (0.2630) data time 0.0008 (0.0063) model time 0.2534 (0.2574) loss 5.0445 (5.5930) grad_norm 2.1616 (2.8203) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:35:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][100/625] eta 0:02:18 lr 0.000137 wd 0.0500 time 0.2583 (0.2637) data time 0.0011 (0.0058) model time 0.2573 (0.2596) loss 6.7152 (5.6106) grad_norm 1.7925 (2.7506) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:35:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][110/625] eta 0:02:16 lr 0.000137 wd 0.0500 time 0.2522 (0.2647) data time 0.0010 (0.0054) model time 0.2511 (0.2619) loss 5.9987 (5.6259) grad_norm 1.8776 (2.7329) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:35:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][120/625] eta 0:02:14 lr 0.000137 wd 0.0500 time 0.2538 (0.2655) data time 0.0007 (0.0050) model time 0.2531 (0.2635) loss 6.1410 (5.6266) grad_norm 2.4066 (2.7120) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:35:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][130/625] eta 0:02:11 lr 0.000137 wd 0.0500 time 0.2588 (0.2662) data time 0.0012 (0.0047) model time 0.2576 (0.2648) loss 6.3120 (5.6378) grad_norm 4.9229 (2.7075) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:35:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][140/625] eta 0:02:08 lr 0.000137 wd 0.0500 time 0.2530 (0.2654) data time 0.0013 (0.0044) model time 0.2518 (0.2636) loss 5.7918 (5.6589) grad_norm 1.6452 (2.7058) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:35:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][150/625] eta 0:02:05 lr 0.000137 wd 0.0500 time 0.2539 (0.2647) data time 0.0008 (0.0042) model time 0.2532 (0.2627) loss 4.7576 (5.6453) grad_norm 4.4821 (2.7212) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:35:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][160/625] eta 0:02:02 lr 0.000137 wd 0.0500 time 0.2557 (0.2642) data time 0.0007 (0.0040) model time 0.2550 (0.2620) loss 4.6607 (5.6426) grad_norm 2.3758 (2.7257) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:35:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][170/625] eta 0:01:59 lr 0.000137 wd 0.0500 time 0.2551 (0.2637) data time 0.0008 (0.0038) model time 0.2543 (0.2614) loss 5.4838 (5.6342) grad_norm 1.6474 (2.7049) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:36:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][180/625] eta 0:01:57 lr 0.000137 wd 0.0500 time 0.2568 (0.2641) data time 0.0008 (0.0037) model time 0.2559 (0.2622) loss 4.8343 (5.6413) grad_norm 3.2310 (2.6958) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:36:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][190/625] eta 0:01:54 lr 0.000137 wd 0.0500 time 0.2572 (0.2637) data time 0.0008 (0.0035) model time 0.2563 (0.2617) loss 6.3678 (5.6581) grad_norm 2.7375 (2.6777) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:36:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][200/625] eta 0:01:51 lr 0.000137 wd 0.0500 time 0.2620 (0.2634) data time 0.0008 (0.0034) model time 0.2613 (0.2613) loss 5.3857 (5.6597) grad_norm 2.4897 (2.6529) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:36:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][210/625] eta 0:01:49 lr 0.000136 wd 0.0500 time 0.2561 (0.2632) data time 0.0009 (0.0033) model time 0.2552 (0.2611) loss 6.5547 (5.6523) grad_norm 2.6547 (2.6665) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:36:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][220/625] eta 0:01:46 lr 0.000136 wd 0.0500 time 0.2535 (0.2629) data time 0.0012 (0.0032) model time 0.2523 (0.2607) loss 6.3097 (5.6389) grad_norm 2.4347 (2.6697) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:36:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][230/625] eta 0:01:43 lr 0.000136 wd 0.0500 time 0.2521 (0.2626) data time 0.0008 (0.0031) model time 0.2514 (0.2605) loss 4.7213 (5.6293) grad_norm 3.0703 (2.8682) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:36:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][240/625] eta 0:01:41 lr 0.000136 wd 0.0500 time 0.2566 (0.2632) data time 0.0006 (0.0030) model time 0.2559 (0.2613) loss 5.6558 (5.6259) grad_norm 3.9165 (2.8769) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:36:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][250/625] eta 0:01:38 lr 0.000136 wd 0.0500 time 0.2537 (0.2629) data time 0.0007 (0.0029) model time 0.2529 (0.2610) loss 5.1590 (5.6210) grad_norm 2.6732 (2.8803) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:36:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][260/625] eta 0:01:36 lr 0.000136 wd 0.0500 time 0.2573 (0.2635) data time 0.0007 (0.0028) model time 0.2566 (0.2618) loss 6.2445 (5.6174) grad_norm 2.9773 (2.8649) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:36:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][270/625] eta 0:01:33 lr 0.000136 wd 0.0500 time 0.2551 (0.2647) data time 0.0008 (0.0027) model time 0.2543 (0.2632) loss 4.5767 (5.6105) grad_norm 2.8040 (2.8708) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:36:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][280/625] eta 0:01:31 lr 0.000136 wd 0.0500 time 0.2601 (0.2644) data time 0.0008 (0.0027) model time 0.2593 (0.2629) loss 6.5580 (5.6104) grad_norm 2.2138 (2.8769) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:36:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][290/625] eta 0:01:28 lr 0.000136 wd 0.0500 time 0.2561 (0.2646) data time 0.0008 (0.0026) model time 0.2553 (0.2632) loss 5.0741 (5.6090) grad_norm 2.6210 (2.8615) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:36:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][300/625] eta 0:01:26 lr 0.000136 wd 0.0500 time 0.2617 (0.2648) data time 0.0009 (0.0026) model time 0.2608 (0.2634) loss 5.9028 (5.6067) grad_norm 2.2690 (2.8467) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:36:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][310/625] eta 0:01:23 lr 0.000136 wd 0.0500 time 0.2553 (0.2645) data time 0.0011 (0.0025) model time 0.2542 (0.2631) loss 4.6044 (5.6047) grad_norm 2.5259 (2.8508) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:36:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][320/625] eta 0:01:20 lr 0.000136 wd 0.0500 time 0.2579 (0.2642) data time 0.0008 (0.0025) model time 0.2571 (0.2628) loss 5.2435 (5.5968) grad_norm 1.8483 (3.1135) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:36:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][330/625] eta 0:01:17 lr 0.000135 wd 0.0500 time 0.2570 (0.2639) data time 0.0006 (0.0024) model time 0.2564 (0.2625) loss 5.5639 (5.5937) grad_norm 2.6065 (3.0861) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:36:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][340/625] eta 0:01:15 lr 0.000135 wd 0.0500 time 0.2567 (0.2642) data time 0.0006 (0.0024) model time 0.2560 (0.2629) loss 5.2216 (5.5916) grad_norm 2.6730 (3.0668) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:36:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][350/625] eta 0:01:12 lr 0.000135 wd 0.0500 time 0.2537 (0.2650) data time 0.0007 (0.0023) model time 0.2530 (0.2638) loss 6.0511 (5.5921) grad_norm 2.1239 (3.0733) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:36:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][360/625] eta 0:01:10 lr 0.000135 wd 0.0500 time 0.2566 (0.2651) data time 0.0010 (0.0023) model time 0.2556 (0.2639) loss 6.1780 (5.5920) grad_norm 1.9956 (3.0675) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:36:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][370/625] eta 0:01:07 lr 0.000135 wd 0.0500 time 0.2527 (0.2649) data time 0.0008 (0.0022) model time 0.2519 (0.2636) loss 6.2788 (5.5958) grad_norm 2.2915 (3.0452) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:36:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][380/625] eta 0:01:04 lr 0.000135 wd 0.0500 time 0.2524 (0.2649) data time 0.0010 (0.0022) model time 0.2514 (0.2637) loss 5.4244 (5.5941) grad_norm 2.3603 (3.0243) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:36:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][390/625] eta 0:01:02 lr 0.000135 wd 0.0500 time 0.2536 (0.2647) data time 0.0008 (0.0022) model time 0.2528 (0.2634) loss 6.1742 (5.5939) grad_norm 2.2214 (3.0116) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:36:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][400/625] eta 0:00:59 lr 0.000135 wd 0.0500 time 0.2526 (0.2649) data time 0.0008 (0.0021) model time 0.2517 (0.2637) loss 6.3533 (5.5974) grad_norm 2.0488 (3.0018) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:37:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][410/625] eta 0:00:57 lr 0.000135 wd 0.0500 time 0.4417 (0.2652) data time 0.0007 (0.0021) model time 0.4411 (0.2640) loss 5.5718 (5.5934) grad_norm 1.8013 (2.9875) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:37:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][420/625] eta 0:00:54 lr 0.000135 wd 0.0500 time 0.2624 (0.2655) data time 0.0009 (0.0021) model time 0.2615 (0.2644) loss 5.3351 (5.5967) grad_norm 1.7807 (2.9787) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:37:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][430/625] eta 0:00:51 lr 0.000135 wd 0.0500 time 0.2586 (0.2652) data time 0.0008 (0.0021) model time 0.2578 (0.2641) loss 6.1016 (5.5919) grad_norm 1.9926 (2.9687) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:37:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][440/625] eta 0:00:49 lr 0.000135 wd 0.0500 time 0.2554 (0.2650) data time 0.0009 (0.0020) model time 0.2544 (0.2639) loss 6.2055 (5.5913) grad_norm 2.3385 (2.9670) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:37:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][450/625] eta 0:00:46 lr 0.000134 wd 0.0500 time 0.2560 (0.2652) data time 0.0011 (0.0020) model time 0.2550 (0.2641) loss 6.0904 (5.5928) grad_norm 3.4604 (2.9618) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:37:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][460/625] eta 0:00:43 lr 0.000134 wd 0.0500 time 0.2549 (0.2650) data time 0.0007 (0.0020) model time 0.2543 (0.2639) loss 5.2104 (5.5903) grad_norm 3.2569 (2.9629) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:37:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][470/625] eta 0:00:41 lr 0.000134 wd 0.0500 time 0.2559 (0.2648) data time 0.0010 (0.0020) model time 0.2549 (0.2637) loss 4.7683 (5.5870) grad_norm 13.1349 (3.0057) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:37:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][480/625] eta 0:00:38 lr 0.000134 wd 0.0500 time 0.2583 (0.2647) data time 0.0009 (0.0019) model time 0.2574 (0.2635) loss 5.3958 (5.5780) grad_norm 2.3758 (2.9968) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:37:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][490/625] eta 0:00:35 lr 0.000134 wd 0.0500 time 0.2531 (0.2645) data time 0.0009 (0.0019) model time 0.2522 (0.2633) loss 4.2644 (5.5730) grad_norm 1.5157 (2.9864) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:37:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][500/625] eta 0:00:33 lr 0.000134 wd 0.0500 time 0.2560 (0.2644) data time 0.0011 (0.0019) model time 0.2550 (0.2632) loss 4.5603 (5.5691) grad_norm 3.7654 (2.9871) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:37:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][510/625] eta 0:00:30 lr 0.000134 wd 0.0500 time 0.2557 (0.2646) data time 0.0008 (0.0019) model time 0.2548 (0.2634) loss 4.6187 (5.5673) grad_norm 3.1278 (2.9865) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:37:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][520/625] eta 0:00:27 lr 0.000134 wd 0.0500 time 0.2537 (0.2644) data time 0.0008 (0.0019) model time 0.2529 (0.2632) loss 5.7508 (5.5641) grad_norm 2.2777 (2.9994) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:37:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][530/625] eta 0:00:25 lr 0.000134 wd 0.0500 time 0.2503 (0.2642) data time 0.0009 (0.0018) model time 0.2495 (0.2630) loss 4.6434 (5.5682) grad_norm 2.5080 (2.9921) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:37:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][540/625] eta 0:00:22 lr 0.000134 wd 0.0500 time 0.2549 (0.2643) data time 0.0011 (0.0018) model time 0.2539 (0.2631) loss 4.9462 (5.5630) grad_norm 5.6947 (3.0056) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:37:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][550/625] eta 0:00:19 lr 0.000134 wd 0.0500 time 0.2533 (0.2641) data time 0.0008 (0.0018) model time 0.2525 (0.2629) loss 6.1180 (5.5679) grad_norm 2.3818 (2.9974) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:37:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][560/625] eta 0:00:17 lr 0.000134 wd 0.0500 time 0.2540 (0.2640) data time 0.0010 (0.0018) model time 0.2530 (0.2628) loss 6.1530 (5.5734) grad_norm 2.0340 (2.9813) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:37:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][570/625] eta 0:00:14 lr 0.000133 wd 0.0500 time 0.2575 (0.2639) data time 0.0009 (0.0018) model time 0.2566 (0.2627) loss 4.7668 (5.5654) grad_norm 2.1697 (2.9596) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:37:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][580/625] eta 0:00:11 lr 0.000133 wd 0.0500 time 0.2507 (0.2637) data time 0.0010 (0.0018) model time 0.2497 (0.2625) loss 6.5354 (5.5694) grad_norm 1.5588 (2.9436) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:37:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][590/625] eta 0:00:09 lr 0.000133 wd 0.0500 time 0.2539 (0.2636) data time 0.0007 (0.0018) model time 0.2532 (0.2623) loss 5.8104 (5.5721) grad_norm 3.4223 (2.9396) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:37:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][600/625] eta 0:00:06 lr 0.000133 wd 0.0500 time 0.2555 (0.2637) data time 0.0008 (0.0017) model time 0.2547 (0.2625) loss 5.9607 (5.5742) grad_norm 2.9141 (2.9346) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:37:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][610/625] eta 0:00:03 lr 0.000133 wd 0.0500 time 0.2525 (0.2636) data time 0.0004 (0.0017) model time 0.2522 (0.2624) loss 5.5014 (5.5691) grad_norm 1.8656 (2.9268) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:37:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][620/625] eta 0:00:01 lr 0.000133 wd 0.0500 time 0.2519 (0.2634) data time 0.0006 (0.0017) model time 0.2513 (0.2622) loss 6.1347 (5.5690) grad_norm 3.0913 (2.9170) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:37:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 256 training takes 0:02:44
[2024-08-04 09:37:58 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 09:37:58 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 09:37:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.505 (0.505) Loss 0.5918 (0.5918) Acc@1 89.990 (89.990) Acc@5 98.828 (98.828) Mem 9655MB
[2024-08-04 09:37:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.100) Loss 0.8994 (0.7178) Acc@1 81.445 (86.830) Acc@5 96.289 (97.798) Mem 9655MB
[2024-08-04 09:38:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.0244 (0.8344) Acc@1 78.076 (83.840) Acc@5 95.752 (96.701) Mem 9655MB
[2024-08-04 09:38:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.543 Acc@5 96.701
[2024-08-04 09:38:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.5%
[2024-08-04 09:38:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.764 (0.764) Loss 0.5840 (0.5840) Acc@1 90.088 (90.088) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 09:38:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.059 (0.127) Loss 0.9028 (0.7087) Acc@1 81.348 (86.896) Acc@5 96.191 (97.763) Mem 9655MB
[2024-08-04 09:38:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.054 (0.093) Loss 1.0156 (0.8291) Acc@1 78.076 (83.680) Acc@5 95.410 (96.577) Mem 9655MB
[2024-08-04 09:38:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.377 Acc@5 96.589
[2024-08-04 09:38:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.4%
[2024-08-04 09:38:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][0/625] eta 0:11:07 lr 0.000133 wd 0.0500 time 1.0687 (1.0687) data time 0.4449 (0.4449) model time 0.0000 (0.0000) loss 5.8509 (5.8509) grad_norm 1.7127 (1.7127) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:38:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][10/625] eta 0:03:22 lr 0.000133 wd 0.0500 time 0.2536 (0.3293) data time 0.0009 (0.0412) model time 0.0000 (0.0000) loss 6.1480 (5.6807) grad_norm 1.9844 (2.8840) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:38:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][20/625] eta 0:02:57 lr 0.000133 wd 0.0500 time 0.2545 (0.2942) data time 0.0006 (0.0220) model time 0.0000 (0.0000) loss 6.3099 (5.7263) grad_norm 2.1664 (2.8983) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:38:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][30/625] eta 0:02:47 lr 0.000133 wd 0.0500 time 0.2517 (0.2817) data time 0.0015 (0.0152) model time 0.0000 (0.0000) loss 5.2760 (5.7023) grad_norm 2.4950 (2.6952) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:38:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][40/625] eta 0:02:43 lr 0.000133 wd 0.0500 time 0.2561 (0.2798) data time 0.0009 (0.0118) model time 0.0000 (0.0000) loss 4.9497 (5.6675) grad_norm 3.3178 (2.6331) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:38:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][50/625] eta 0:02:38 lr 0.000133 wd 0.0500 time 0.2521 (0.2752) data time 0.0008 (0.0097) model time 0.0000 (0.0000) loss 5.5777 (5.6179) grad_norm 2.9897 (2.8707) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:38:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][60/625] eta 0:02:33 lr 0.000133 wd 0.0500 time 0.2573 (0.2719) data time 0.0008 (0.0083) model time 0.2565 (0.2544) loss 5.7349 (5.5701) grad_norm 3.8560 (2.9235) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:38:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][70/625] eta 0:02:29 lr 0.000132 wd 0.0500 time 0.2605 (0.2696) data time 0.0008 (0.0072) model time 0.2598 (0.2544) loss 5.9566 (5.5848) grad_norm 1.8393 (2.8423) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:38:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][80/625] eta 0:02:25 lr 0.000132 wd 0.0500 time 0.2539 (0.2677) data time 0.0009 (0.0064) model time 0.2530 (0.2542) loss 5.0225 (5.6255) grad_norm 3.1183 (2.7578) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:38:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][90/625] eta 0:02:22 lr 0.000132 wd 0.0500 time 0.2595 (0.2664) data time 0.0006 (0.0058) model time 0.2590 (0.2543) loss 5.1538 (5.6735) grad_norm 2.1152 (2.8311) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:38:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][100/625] eta 0:02:19 lr 0.000132 wd 0.0500 time 0.2547 (0.2653) data time 0.0009 (0.0053) model time 0.2538 (0.2544) loss 6.6966 (5.6740) grad_norm 1.9925 (2.8173) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:38:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][110/625] eta 0:02:17 lr 0.000132 wd 0.0500 time 0.2570 (0.2661) data time 0.0010 (0.0049) model time 0.2560 (0.2575) loss 5.6757 (5.6735) grad_norm 2.4023 (2.8557) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:38:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][120/625] eta 0:02:13 lr 0.000132 wd 0.0500 time 0.2574 (0.2653) data time 0.0020 (0.0046) model time 0.2554 (0.2572) loss 4.6069 (5.6588) grad_norm 1.5941 (2.8768) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:38:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][130/625] eta 0:02:11 lr 0.000132 wd 0.0500 time 0.2567 (0.2647) data time 0.0009 (0.0043) model time 0.2559 (0.2571) loss 4.9865 (5.6559) grad_norm 1.9461 (2.8960) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:38:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][140/625] eta 0:02:08 lr 0.000132 wd 0.0500 time 0.2538 (0.2641) data time 0.0008 (0.0041) model time 0.2530 (0.2569) loss 5.4464 (5.6765) grad_norm 3.0573 (2.8669) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:38:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][150/625] eta 0:02:05 lr 0.000132 wd 0.0500 time 0.2524 (0.2636) data time 0.0010 (0.0039) model time 0.2514 (0.2567) loss 4.8469 (5.6645) grad_norm 2.2900 (2.8355) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:38:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][160/625] eta 0:02:02 lr 0.000132 wd 0.0500 time 0.2609 (0.2632) data time 0.0006 (0.0037) model time 0.2603 (0.2567) loss 5.1257 (5.6505) grad_norm 2.7423 (2.8270) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:38:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][170/625] eta 0:02:00 lr 0.000132 wd 0.0500 time 0.2569 (0.2638) data time 0.0008 (0.0035) model time 0.2561 (0.2581) loss 6.5118 (5.6429) grad_norm 2.2954 (2.8071) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:38:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][180/625] eta 0:01:57 lr 0.000132 wd 0.0500 time 0.2561 (0.2635) data time 0.0007 (0.0034) model time 0.2554 (0.2580) loss 4.7988 (5.6365) grad_norm 1.9516 (2.7608) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:38:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][190/625] eta 0:01:54 lr 0.000131 wd 0.0500 time 0.2526 (0.2631) data time 0.0008 (0.0033) model time 0.2518 (0.2578) loss 5.7965 (5.6296) grad_norm 1.6878 (2.7343) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:38:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][200/625] eta 0:01:51 lr 0.000131 wd 0.0500 time 0.2551 (0.2633) data time 0.0008 (0.0031) model time 0.2543 (0.2584) loss 4.5392 (5.6333) grad_norm 2.3377 (2.7326) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:38:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][210/625] eta 0:01:49 lr 0.000131 wd 0.0500 time 0.2536 (0.2631) data time 0.0007 (0.0030) model time 0.2529 (0.2583) loss 5.0805 (5.6380) grad_norm 2.0503 (2.7010) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 09:39:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][220/625] eta 0:01:46 lr 0.000131 wd 0.0500 time 0.2590 (0.2636) data time 0.0018 (0.0029) model time 0.2572 (0.2593) loss 5.3476 (5.6461) grad_norm 2.5159 (2.6976) loss_scale 512.0000 (266.4253) mem 9655MB
[2024-08-04 09:39:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][230/625] eta 0:01:44 lr 0.000131 wd 0.0500 time 0.4518 (0.2641) data time 0.0008 (0.0029) model time 0.4510 (0.2601) loss 5.2769 (5.6398) grad_norm 2.2413 (2.6944) loss_scale 512.0000 (277.0563) mem 9655MB
[2024-08-04 09:39:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][240/625] eta 0:01:41 lr 0.000131 wd 0.0500 time 0.2541 (0.2638) data time 0.0010 (0.0028) model time 0.2531 (0.2599) loss 5.0464 (5.6287) grad_norm 1.6788 (2.6713) loss_scale 512.0000 (286.8050) mem 9655MB
[2024-08-04 09:39:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][250/625] eta 0:01:38 lr 0.000131 wd 0.0500 time 0.2585 (0.2635) data time 0.0007 (0.0027) model time 0.2578 (0.2596) loss 5.9333 (5.6285) grad_norm 4.2329 (2.6848) loss_scale 512.0000 (295.7769) mem 9655MB
[2024-08-04 09:39:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][260/625] eta 0:01:36 lr 0.000131 wd 0.0500 time 0.2538 (0.2632) data time 0.0010 (0.0026) model time 0.2528 (0.2594) loss 5.7497 (5.6399) grad_norm 3.1443 (2.6873) loss_scale 512.0000 (304.0613) mem 9655MB
[2024-08-04 09:39:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][270/625] eta 0:01:33 lr 0.000131 wd 0.0500 time 0.2610 (0.2637) data time 0.0008 (0.0026) model time 0.2602 (0.2601) loss 4.4995 (5.6375) grad_norm 1.6239 (2.6925) loss_scale 512.0000 (311.7343) mem 9655MB
[2024-08-04 09:39:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][280/625] eta 0:01:30 lr 0.000131 wd 0.0500 time 0.2630 (0.2634) data time 0.0010 (0.0025) model time 0.2620 (0.2599) loss 6.2280 (5.6362) grad_norm 2.9942 (2.6936) loss_scale 512.0000 (318.8612) mem 9655MB
[2024-08-04 09:39:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][290/625] eta 0:01:28 lr 0.000131 wd 0.0500 time 0.2607 (0.2632) data time 0.0005 (0.0025) model time 0.2602 (0.2597) loss 6.1172 (5.6380) grad_norm 2.0423 (2.6771) loss_scale 512.0000 (325.4983) mem 9655MB
[2024-08-04 09:39:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][300/625] eta 0:01:25 lr 0.000131 wd 0.0500 time 0.2579 (0.2630) data time 0.0010 (0.0024) model time 0.2569 (0.2596) loss 6.3926 (5.6400) grad_norm 4.8275 (2.7146) loss_scale 512.0000 (331.6944) mem 9655MB
[2024-08-04 09:39:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][310/625] eta 0:01:22 lr 0.000130 wd 0.0500 time 0.2607 (0.2634) data time 0.0008 (0.0024) model time 0.2599 (0.2601) loss 6.4721 (5.6400) grad_norm 2.4683 (2.7211) loss_scale 512.0000 (337.4920) mem 9655MB
[2024-08-04 09:39:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][320/625] eta 0:01:20 lr 0.000130 wd 0.0500 time 0.2608 (0.2631) data time 0.0006 (0.0023) model time 0.2602 (0.2599) loss 5.7387 (5.6468) grad_norm 1.9502 (2.7208) loss_scale 512.0000 (342.9283) mem 9655MB
[2024-08-04 09:39:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][330/625] eta 0:01:17 lr 0.000130 wd 0.0500 time 0.4356 (0.2638) data time 0.0007 (0.0023) model time 0.4350 (0.2609) loss 5.4708 (5.6491) grad_norm 2.2940 (2.7190) loss_scale 512.0000 (348.0363) mem 9655MB
[2024-08-04 09:39:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][340/625] eta 0:01:15 lr 0.000130 wd 0.0500 time 0.2583 (0.2636) data time 0.0011 (0.0022) model time 0.2572 (0.2607) loss 4.9855 (5.6424) grad_norm 3.3081 (2.7198) loss_scale 512.0000 (352.8446) mem 9655MB
[2024-08-04 09:39:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][350/625] eta 0:01:12 lr 0.000130 wd 0.0500 time 0.2518 (0.2634) data time 0.0009 (0.0022) model time 0.2508 (0.2605) loss 5.3449 (5.6415) grad_norm 2.7149 (2.7128) loss_scale 512.0000 (357.3789) mem 9655MB
[2024-08-04 09:39:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][360/625] eta 0:01:09 lr 0.000130 wd 0.0500 time 0.2523 (0.2637) data time 0.0010 (0.0022) model time 0.2513 (0.2609) loss 5.2445 (5.6344) grad_norm 2.5163 (2.7049) loss_scale 512.0000 (361.6620) mem 9655MB
[2024-08-04 09:39:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][370/625] eta 0:01:07 lr 0.000130 wd 0.0500 time 0.2534 (0.2641) data time 0.0010 (0.0021) model time 0.2524 (0.2614) loss 5.6644 (5.6247) grad_norm 4.3828 (2.7139) loss_scale 512.0000 (365.7143) mem 9655MB
[2024-08-04 09:39:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][380/625] eta 0:01:04 lr 0.000130 wd 0.0500 time 0.2537 (0.2639) data time 0.0010 (0.0021) model time 0.2527 (0.2612) loss 6.4701 (5.6317) grad_norm 2.7379 (2.7169) loss_scale 512.0000 (369.5538) mem 9655MB
[2024-08-04 09:39:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][390/625] eta 0:01:02 lr 0.000130 wd 0.0500 time 0.2516 (0.2641) data time 0.0006 (0.0021) model time 0.2509 (0.2616) loss 4.7772 (5.6350) grad_norm 2.1926 (2.7343) loss_scale 512.0000 (373.1969) mem 9655MB
[2024-08-04 09:39:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][400/625] eta 0:00:59 lr 0.000130 wd 0.0500 time 0.2510 (0.2639) data time 0.0010 (0.0020) model time 0.2500 (0.2613) loss 5.4426 (5.6322) grad_norm 1.8076 (2.7282) loss_scale 512.0000 (376.6584) mem 9655MB
[2024-08-04 09:39:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][410/625] eta 0:00:56 lr 0.000130 wd 0.0500 time 0.2604 (0.2637) data time 0.0008 (0.0020) model time 0.2597 (0.2612) loss 5.2033 (5.6234) grad_norm 2.6540 (2.7259) loss_scale 512.0000 (379.9513) mem 9655MB
[2024-08-04 09:39:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][420/625] eta 0:00:54 lr 0.000130 wd 0.0500 time 0.2559 (0.2636) data time 0.0007 (0.0020) model time 0.2551 (0.2611) loss 4.9327 (5.6250) grad_norm 2.3193 (2.7138) loss_scale 512.0000 (383.0879) mem 9655MB
[2024-08-04 09:39:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][430/625] eta 0:00:51 lr 0.000129 wd 0.0500 time 0.2529 (0.2634) data time 0.0008 (0.0019) model time 0.2522 (0.2609) loss 5.5855 (5.6206) grad_norm 2.4727 (2.7018) loss_scale 512.0000 (386.0789) mem 9655MB
[2024-08-04 09:39:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][440/625] eta 0:00:48 lr 0.000129 wd 0.0500 time 0.2551 (0.2633) data time 0.0005 (0.0019) model time 0.2546 (0.2608) loss 5.5652 (5.6120) grad_norm 2.9266 (2.6982) loss_scale 512.0000 (388.9342) mem 9655MB
[2024-08-04 09:40:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][450/625] eta 0:00:46 lr 0.000129 wd 0.0500 time 0.2643 (0.2631) data time 0.0005 (0.0019) model time 0.2638 (0.2607) loss 5.1313 (5.6118) grad_norm 2.5510 (2.6927) loss_scale 512.0000 (391.6630) mem 9655MB
[2024-08-04 09:40:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][460/625] eta 0:00:43 lr 0.000129 wd 0.0500 time 0.2539 (0.2630) data time 0.0012 (0.0019) model time 0.2527 (0.2606) loss 6.7028 (5.6195) grad_norm 1.8156 (2.6830) loss_scale 512.0000 (394.2733) mem 9655MB
[2024-08-04 09:40:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][470/625] eta 0:00:40 lr 0.000129 wd 0.0500 time 0.2559 (0.2629) data time 0.0010 (0.0019) model time 0.2549 (0.2604) loss 5.4140 (5.6184) grad_norm 2.2023 (2.6749) loss_scale 512.0000 (396.7728) mem 9655MB
[2024-08-04 09:40:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][480/625] eta 0:00:38 lr 0.000129 wd 0.0500 time 0.2576 (0.2632) data time 0.0006 (0.0018) model time 0.2570 (0.2608) loss 5.0423 (5.6184) grad_norm 2.2906 (2.6621) loss_scale 512.0000 (399.1684) mem 9655MB
[2024-08-04 09:40:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][490/625] eta 0:00:35 lr 0.000129 wd 0.0500 time 0.2602 (0.2630) data time 0.0008 (0.0018) model time 0.2594 (0.2607) loss 5.6870 (5.6197) grad_norm 1.8990 (2.6601) loss_scale 512.0000 (401.4664) mem 9655MB
[2024-08-04 09:40:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][500/625] eta 0:00:32 lr 0.000129 wd 0.0500 time 0.2538 (0.2633) data time 0.0011 (0.0018) model time 0.2527 (0.2610) loss 5.5457 (5.6144) grad_norm 2.9736 (2.7289) loss_scale 512.0000 (403.6727) mem 9655MB
[2024-08-04 09:40:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][510/625] eta 0:00:30 lr 0.000129 wd 0.0500 time 0.2542 (0.2639) data time 0.0011 (0.0018) model time 0.2532 (0.2617) loss 5.8608 (5.6131) grad_norm 1.9856 (2.7163) loss_scale 512.0000 (405.7926) mem 9655MB
[2024-08-04 09:40:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][520/625] eta 0:00:27 lr 0.000129 wd 0.0500 time 0.2622 (0.2637) data time 0.0005 (0.0018) model time 0.2616 (0.2616) loss 6.2329 (5.6189) grad_norm 2.8292 (2.8471) loss_scale 512.0000 (407.8311) mem 9655MB
[2024-08-04 09:40:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][530/625] eta 0:00:25 lr 0.000129 wd 0.0500 time 0.2597 (0.2643) data time 0.0007 (0.0018) model time 0.2590 (0.2623) loss 5.1923 (5.6237) grad_norm 2.1845 (2.8373) loss_scale 512.0000 (409.7928) mem 9655MB
[2024-08-04 09:40:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][540/625] eta 0:00:22 lr 0.000129 wd 0.0500 time 0.2550 (0.2645) data time 0.0009 (0.0017) model time 0.2541 (0.2625) loss 5.8139 (5.6244) grad_norm 2.1078 (2.8249) loss_scale 512.0000 (411.6821) mem 9655MB
[2024-08-04 09:40:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][550/625] eta 0:00:19 lr 0.000129 wd 0.0500 time 0.2535 (0.2651) data time 0.0012 (0.0017) model time 0.2523 (0.2631) loss 6.0484 (5.6200) grad_norm 4.0166 (2.8199) loss_scale 512.0000 (413.5027) mem 9655MB
[2024-08-04 09:40:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][560/625] eta 0:00:17 lr 0.000128 wd 0.0500 time 0.2581 (0.2653) data time 0.0008 (0.0017) model time 0.2574 (0.2633) loss 5.6984 (5.6138) grad_norm 3.8895 (2.8287) loss_scale 512.0000 (415.2585) mem 9655MB
[2024-08-04 09:40:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][570/625] eta 0:00:14 lr 0.000128 wd 0.0500 time 0.2599 (0.2654) data time 0.0005 (0.0017) model time 0.2594 (0.2636) loss 5.7368 (5.6166) grad_norm 1.8787 (2.8232) loss_scale 512.0000 (416.9527) mem 9655MB
[2024-08-04 09:40:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][580/625] eta 0:00:11 lr 0.000128 wd 0.0500 time 0.2568 (0.2653) data time 0.0010 (0.0017) model time 0.2558 (0.2634) loss 6.1579 (5.6147) grad_norm 2.7263 (2.8293) loss_scale 512.0000 (418.5886) mem 9655MB
[2024-08-04 09:40:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][590/625] eta 0:00:09 lr 0.000128 wd 0.0500 time 0.2554 (0.2651) data time 0.0008 (0.0017) model time 0.2545 (0.2632) loss 4.5720 (5.6085) grad_norm 1.8238 (2.8282) loss_scale 512.0000 (420.1692) mem 9655MB
[2024-08-04 09:40:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][600/625] eta 0:00:06 lr 0.000128 wd 0.0500 time 0.2592 (0.2649) data time 0.0006 (0.0017) model time 0.2586 (0.2631) loss 5.5948 (5.6021) grad_norm 2.4955 (2.8195) loss_scale 512.0000 (421.6972) mem 9655MB
[2024-08-04 09:40:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][610/625] eta 0:00:03 lr 0.000128 wd 0.0500 time 0.2514 (0.2648) data time 0.0004 (0.0017) model time 0.2511 (0.2629) loss 4.8884 (5.6033) grad_norm 2.4385 (2.8173) loss_scale 512.0000 (423.1751) mem 9655MB
[2024-08-04 09:40:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][620/625] eta 0:00:01 lr 0.000128 wd 0.0500 time 0.2540 (0.2646) data time 0.0003 (0.0016) model time 0.2537 (0.2627) loss 6.0662 (5.6016) grad_norm 2.4835 (2.8129) loss_scale 512.0000 (424.6055) mem 9655MB
[2024-08-04 09:40:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 257 training takes 0:02:45
[2024-08-04 09:40:47 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 09:40:48 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 09:40:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.587 (0.587) Loss 0.6089 (0.6089) Acc@1 90.039 (90.039) Acc@5 98.828 (98.828) Mem 9655MB [2024-08-04 09:40:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.105) Loss 0.9087 (0.7291) Acc@1 81.201 (86.847) Acc@5 96.533 (97.781) Mem 9655MB [2024-08-04 09:40:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.081) Loss 1.0137 (0.8419) Acc@1 78.955 (83.889) Acc@5 95.410 (96.659) Mem 9655MB [2024-08-04 09:40:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.557 Acc@5 96.679 [2024-08-04 09:40:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-08-04 09:40:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.717 (0.717) Loss 0.5835 (0.5835) Acc@1 90.088 (90.088) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 09:40:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.127) Loss 0.9023 (0.7085) Acc@1 81.396 (86.901) Acc@5 96.289 (97.772) Mem 9655MB [2024-08-04 09:40:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 1.0146 (0.8287) Acc@1 78.271 (83.698) Acc@5 95.410 (96.582) Mem 9655MB [2024-08-04 09:40:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.395 Acc@5 96.593 [2024-08-04 09:40:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.4% [2024-08-04 09:40:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.40% [2024-08-04 09:40:52 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 09:40:53 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 09:40:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][0/625] eta 0:07:42 lr 0.000128 wd 0.0500 time 0.7392 (0.7392) data time 0.4660 (0.4660) model time 0.0000 (0.0000) loss 5.9266 (5.9266) grad_norm 1.7677 (1.7677) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:40:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][10/625] eta 0:03:04 lr 0.000128 wd 0.0500 time 0.2529 (0.2994) data time 0.0007 (0.0432) model time 0.0000 (0.0000) loss 5.1437 (5.4414) grad_norm 3.4753 (2.4884) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:40:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][20/625] eta 0:02:48 lr 0.000128 wd 0.0500 time 0.2584 (0.2785) data time 0.0009 (0.0231) model time 0.0000 (0.0000) loss 4.5804 (5.5374) grad_norm 1.6689 (2.3700) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:41:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][30/625] eta 0:02:45 lr 0.000128 wd 0.0500 time 0.2568 (0.2776) data time 0.0009 (0.0159) model time 0.0000 (0.0000) loss 4.7682 (5.5217) grad_norm 2.2988 (2.4759) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:41:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][40/625] eta 0:02:39 lr 0.000128 wd 0.0500 time 0.2589 (0.2721) data time 0.0008 (0.0122) model time 0.0000 (0.0000) loss 5.3555 (5.5652) grad_norm 1.6995 (2.5985) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:41:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][50/625] eta 0:02:34 lr 0.000128 wd 0.0500 time 0.2562 (0.2688) data time 0.0011 (0.0100) model time 0.0000 (0.0000) loss 5.6724 (5.5840) grad_norm 2.0023 (2.6131) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 
09:41:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][60/625] eta 0:02:31 lr 0.000127 wd 0.0500 time 0.3813 (0.2687) data time 0.0010 (0.0085) model time 0.3803 (0.2673) loss 5.6339 (5.5551) grad_norm 2.1927 (2.6225) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:41:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][70/625] eta 0:02:28 lr 0.000127 wd 0.0500 time 0.2550 (0.2669) data time 0.0006 (0.0075) model time 0.2544 (0.2611) loss 6.1362 (5.6042) grad_norm 2.1484 (2.6148) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:41:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][80/625] eta 0:02:24 lr 0.000127 wd 0.0500 time 0.2558 (0.2657) data time 0.0007 (0.0067) model time 0.2551 (0.2596) loss 5.6087 (5.6343) grad_norm 2.8183 (2.5488) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:41:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][90/625] eta 0:02:21 lr 0.000127 wd 0.0500 time 0.2583 (0.2648) data time 0.0007 (0.0060) model time 0.2576 (0.2588) loss 5.5174 (5.6320) grad_norm 2.2611 (2.5312) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:41:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][100/625] eta 0:02:20 lr 0.000127 wd 0.0500 time 0.2532 (0.2675) data time 0.0011 (0.0055) model time 0.2521 (0.2653) loss 6.1612 (5.6196) grad_norm 4.2917 (2.5512) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:41:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][110/625] eta 0:02:18 lr 0.000127 wd 0.0500 time 0.2535 (0.2681) data time 0.0008 (0.0051) model time 0.2527 (0.2665) loss 5.3951 (5.6163) grad_norm 1.9072 (2.5403) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:41:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][120/625] eta 0:02:14 lr 0.000127 wd 0.0500 time 0.2553 (0.2670) data time 0.0009 
(0.0048) model time 0.2544 (0.2649) loss 6.4297 (5.6420) grad_norm 4.5550 (2.6721) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:41:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][130/625] eta 0:02:11 lr 0.000127 wd 0.0500 time 0.2555 (0.2662) data time 0.0008 (0.0045) model time 0.2547 (0.2637) loss 5.7320 (5.6519) grad_norm 2.5645 (2.7654) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:41:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][140/625] eta 0:02:09 lr 0.000127 wd 0.0500 time 0.2578 (0.2669) data time 0.0009 (0.0042) model time 0.2568 (0.2648) loss 5.5192 (5.6268) grad_norm 3.4837 (2.7529) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:41:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][150/625] eta 0:02:06 lr 0.000127 wd 0.0500 time 0.2560 (0.2662) data time 0.0008 (0.0040) model time 0.2553 (0.2640) loss 5.7411 (5.6104) grad_norm 2.1188 (2.7279) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:41:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][160/625] eta 0:02:03 lr 0.000127 wd 0.0500 time 0.2545 (0.2656) data time 0.0007 (0.0038) model time 0.2538 (0.2632) loss 6.0464 (5.6021) grad_norm 3.3631 (2.7047) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:41:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][170/625] eta 0:02:00 lr 0.000127 wd 0.0500 time 0.2562 (0.2650) data time 0.0009 (0.0036) model time 0.2553 (0.2625) loss 5.5240 (5.6048) grad_norm 3.2699 (2.6789) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:41:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][180/625] eta 0:01:57 lr 0.000126 wd 0.0500 time 0.2570 (0.2646) data time 0.0008 (0.0035) model time 0.2563 (0.2621) loss 4.9304 (5.5945) grad_norm 5.3429 (2.6901) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:41:43 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][190/625] eta 0:01:54 lr 0.000126 wd 0.0500 time 0.2564 (0.2642) data time 0.0007 (0.0033) model time 0.2557 (0.2616) loss 5.2102 (5.6094) grad_norm 1.6728 (2.6671) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:41:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][200/625] eta 0:01:52 lr 0.000126 wd 0.0500 time 0.2564 (0.2638) data time 0.0007 (0.0033) model time 0.2557 (0.2611) loss 7.0950 (5.6303) grad_norm 2.2374 (2.6648) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:41:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][210/625] eta 0:01:50 lr 0.000126 wd 0.0500 time 0.2553 (0.2653) data time 0.0008 (0.0031) model time 0.2545 (0.2632) loss 6.2558 (5.6195) grad_norm 2.7088 (2.6600) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:41:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][220/625] eta 0:01:47 lr 0.000126 wd 0.0500 time 0.2577 (0.2649) data time 0.0007 (0.0031) model time 0.2570 (0.2628) loss 5.8176 (5.6236) grad_norm 2.5460 (2.7710) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:41:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][230/625] eta 0:01:44 lr 0.000126 wd 0.0500 time 0.2565 (0.2646) data time 0.0007 (0.0030) model time 0.2559 (0.2624) loss 6.0190 (5.6151) grad_norm 1.9914 (2.7679) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:41:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][240/625] eta 0:01:42 lr 0.000126 wd 0.0500 time 0.2590 (0.2650) data time 0.0007 (0.0029) model time 0.2583 (0.2630) loss 5.2976 (5.6193) grad_norm 2.6831 (2.7875) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:41:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][250/625] eta 0:01:39 lr 0.000126 wd 0.0500 time 0.2575 (0.2647) data time 0.0006 (0.0028) 
model time 0.2569 (0.2627) loss 6.1107 (5.6147) grad_norm 2.2496 (2.7623) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:42:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][260/625] eta 0:01:36 lr 0.000126 wd 0.0500 time 0.2510 (0.2644) data time 0.0009 (0.0027) model time 0.2501 (0.2624) loss 6.3912 (5.6173) grad_norm 2.6059 (2.7562) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:42:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][270/625] eta 0:01:33 lr 0.000126 wd 0.0500 time 0.2504 (0.2641) data time 0.0008 (0.0027) model time 0.2496 (0.2621) loss 6.6450 (5.6226) grad_norm 1.7343 (2.7886) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:42:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][280/625] eta 0:01:31 lr 0.000126 wd 0.0500 time 0.2573 (0.2638) data time 0.0009 (0.0026) model time 0.2564 (0.2618) loss 5.6871 (5.6194) grad_norm 3.2913 (2.7723) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:42:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][290/625] eta 0:01:28 lr 0.000126 wd 0.0500 time 0.2532 (0.2635) data time 0.0010 (0.0025) model time 0.2522 (0.2615) loss 5.8229 (5.6220) grad_norm 2.0560 (2.7642) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:42:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][300/625] eta 0:01:25 lr 0.000125 wd 0.0500 time 0.2552 (0.2638) data time 0.0007 (0.0025) model time 0.2545 (0.2618) loss 5.1646 (5.6225) grad_norm 4.1593 (2.7697) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:42:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][310/625] eta 0:01:23 lr 0.000125 wd 0.0500 time 0.2600 (0.2635) data time 0.0007 (0.0024) model time 0.2593 (0.2615) loss 5.3137 (5.6235) grad_norm 3.1829 (2.7707) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:42:17 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [258/300][320/625] eta 0:01:20 lr 0.000125 wd 0.0500 time 0.2523 (0.2632) data time 0.0008 (0.0024) model time 0.2515 (0.2613) loss 5.0377 (5.6197) grad_norm 2.1128 (2.7511) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:42:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][330/625] eta 0:01:17 lr 0.000125 wd 0.0500 time 0.2538 (0.2633) data time 0.0009 (0.0023) model time 0.2529 (0.2614) loss 4.8808 (5.6141) grad_norm 1.8116 (2.7318) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:42:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][340/625] eta 0:01:14 lr 0.000125 wd 0.0500 time 0.2536 (0.2631) data time 0.0009 (0.0023) model time 0.2527 (0.2612) loss 5.9186 (5.6072) grad_norm 2.7087 (2.7308) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:42:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][350/625] eta 0:01:12 lr 0.000125 wd 0.0500 time 0.2555 (0.2634) data time 0.0008 (0.0023) model time 0.2547 (0.2615) loss 5.8527 (5.6136) grad_norm 3.4570 (2.7384) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:42:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][360/625] eta 0:01:09 lr 0.000125 wd 0.0500 time 0.2557 (0.2632) data time 0.0008 (0.0022) model time 0.2550 (0.2613) loss 5.3711 (5.6083) grad_norm 2.1506 (2.7347) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:42:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][370/625] eta 0:01:07 lr 0.000125 wd 0.0500 time 0.2504 (0.2630) data time 0.0010 (0.0022) model time 0.2494 (0.2611) loss 5.5369 (5.6022) grad_norm 2.4696 (2.7585) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:42:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][380/625] eta 0:01:04 lr 0.000125 wd 0.0500 time 0.2502 (0.2631) data time 0.0010 (0.0022) model time 0.2491 (0.2612) 
loss 5.3505 (5.6133) grad_norm 1.5260 (2.7560) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:42:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][390/625] eta 0:01:01 lr 0.000125 wd 0.0500 time 0.2587 (0.2629) data time 0.0006 (0.0021) model time 0.2581 (0.2611) loss 5.3505 (5.6094) grad_norm 2.4728 (2.7647) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:42:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][400/625] eta 0:00:59 lr 0.000125 wd 0.0500 time 0.2539 (0.2627) data time 0.0009 (0.0021) model time 0.2530 (0.2609) loss 5.5904 (5.6122) grad_norm 3.6016 (2.7552) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:42:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][410/625] eta 0:00:56 lr 0.000125 wd 0.0500 time 0.2595 (0.2629) data time 0.0009 (0.0021) model time 0.2586 (0.2611) loss 5.5662 (5.6103) grad_norm 1.6366 (2.7423) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:42:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][420/625] eta 0:00:53 lr 0.000125 wd 0.0500 time 0.2565 (0.2627) data time 0.0007 (0.0020) model time 0.2558 (0.2609) loss 4.3194 (5.6031) grad_norm 1.7182 (2.7287) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:42:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][430/625] eta 0:00:51 lr 0.000124 wd 0.0500 time 0.2592 (0.2625) data time 0.0007 (0.0020) model time 0.2585 (0.2608) loss 6.0913 (5.5996) grad_norm 3.6377 (2.7216) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:42:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][440/625] eta 0:00:48 lr 0.000124 wd 0.0500 time 0.2553 (0.2628) data time 0.0007 (0.0020) model time 0.2546 (0.2611) loss 5.4039 (5.5981) grad_norm 2.1964 (2.7317) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:42:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [258/300][450/625] eta 0:00:46 lr 0.000124 wd 0.0500 time 0.2548 (0.2630) data time 0.0006 (0.0020) model time 0.2542 (0.2613) loss 6.0022 (5.5983) grad_norm 1.7773 (2.7251) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:42:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][460/625] eta 0:00:43 lr 0.000124 wd 0.0500 time 0.2583 (0.2628) data time 0.0008 (0.0019) model time 0.2575 (0.2612) loss 5.5967 (5.6038) grad_norm 3.4068 (2.7280) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:42:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][470/625] eta 0:00:40 lr 0.000124 wd 0.0500 time 0.2509 (0.2627) data time 0.0008 (0.0019) model time 0.2501 (0.2610) loss 5.3482 (5.5991) grad_norm 4.7447 (2.7314) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:42:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][480/625] eta 0:00:38 lr 0.000124 wd 0.0500 time 0.2587 (0.2626) data time 0.0006 (0.0019) model time 0.2581 (0.2609) loss 4.8182 (5.5933) grad_norm 2.4097 (2.7395) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:43:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][490/625] eta 0:00:35 lr 0.000124 wd 0.0500 time 0.2656 (0.2625) data time 0.0009 (0.0019) model time 0.2647 (0.2608) loss 4.8877 (5.5940) grad_norm 1.5658 (2.7374) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:43:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][500/625] eta 0:00:32 lr 0.000124 wd 0.0500 time 0.2556 (0.2623) data time 0.0010 (0.0019) model time 0.2546 (0.2606) loss 6.0772 (5.5888) grad_norm 2.9167 (2.7339) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:43:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][510/625] eta 0:00:30 lr 0.000124 wd 0.0500 time 0.2534 (0.2622) data time 0.0007 (0.0018) model time 0.2527 (0.2605) loss 4.6389 (5.5810) grad_norm 
2.5119 (2.7392) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:43:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][520/625] eta 0:00:27 lr 0.000124 wd 0.0500 time 0.2531 (0.2623) data time 0.0006 (0.0018) model time 0.2525 (0.2607) loss 4.8424 (5.5800) grad_norm 1.6159 (2.7254) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:43:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][530/625] eta 0:00:24 lr 0.000124 wd 0.0500 time 0.2573 (0.2622) data time 0.0008 (0.0018) model time 0.2565 (0.2606) loss 4.6207 (5.5810) grad_norm 2.7959 (2.7360) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:43:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][540/625] eta 0:00:22 lr 0.000124 wd 0.0500 time 0.2582 (0.2621) data time 0.0008 (0.0018) model time 0.2574 (0.2605) loss 5.4169 (5.5811) grad_norm 3.8333 (2.7253) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:43:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][550/625] eta 0:00:19 lr 0.000124 wd 0.0500 time 0.2544 (0.2622) data time 0.0006 (0.0018) model time 0.2538 (0.2606) loss 4.6572 (5.5742) grad_norm 1.6565 (2.7205) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:43:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][560/625] eta 0:00:17 lr 0.000123 wd 0.0500 time 0.2565 (0.2621) data time 0.0017 (0.0018) model time 0.2548 (0.2605) loss 6.0660 (5.5751) grad_norm 2.5678 (2.7213) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:43:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][570/625] eta 0:00:14 lr 0.000123 wd 0.0500 time 0.2562 (0.2622) data time 0.0009 (0.0017) model time 0.2552 (0.2606) loss 5.6279 (5.5741) grad_norm 1.9798 (2.7178) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:43:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][580/625] eta 
0:00:11 lr 0.000123 wd 0.0500 time 0.2547 (0.2621) data time 0.0009 (0.0017) model time 0.2538 (0.2605) loss 6.5316 (5.5777) grad_norm 3.8774 (2.7306) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:43:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][590/625] eta 0:00:09 lr 0.000123 wd 0.0500 time 0.2543 (0.2623) data time 0.0009 (0.0017) model time 0.2535 (0.2608) loss 4.8269 (5.5736) grad_norm 2.1021 (2.7308) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:43:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][600/625] eta 0:00:06 lr 0.000123 wd 0.0500 time 0.2533 (0.2622) data time 0.0010 (0.0017) model time 0.2523 (0.2607) loss 6.7449 (5.5749) grad_norm 2.5299 (2.7303) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:43:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][610/625] eta 0:00:03 lr 0.000123 wd 0.0500 time 0.2534 (0.2621) data time 0.0006 (0.0017) model time 0.2528 (0.2606) loss 5.0839 (5.5719) grad_norm 5.8232 (2.7397) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:43:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][620/625] eta 0:00:01 lr 0.000123 wd 0.0500 time 0.2547 (0.2620) data time 0.0005 (0.0017) model time 0.2542 (0.2604) loss 6.1275 (5.5718) grad_norm 2.4843 (2.7640) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:43:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 258 training takes 0:02:43 [2024-08-04 09:43:36 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 09:43:37 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 09:43:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.503 (0.503) Loss 0.6055 (0.6055) Acc@1 89.844 (89.844) Acc@5 98.779 (98.779) Mem 9655MB [2024-08-04 09:43:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.100) Loss 0.9092 (0.7172) Acc@1 81.396 (86.958) Acc@5 96.582 (97.829) Mem 9655MB [2024-08-04 09:43:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.0264 (0.8352) Acc@1 78.857 (83.877) Acc@5 95.459 (96.654) Mem 9655MB [2024-08-04 09:43:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.575 Acc@5 96.669 [2024-08-04 09:43:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-08-04 09:43:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.744 (0.744) Loss 0.5845 (0.5845) Acc@1 90.137 (90.137) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 09:43:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.124) Loss 0.9023 (0.7087) Acc@1 81.494 (86.941) Acc@5 96.289 (97.772) Mem 9655MB [2024-08-04 09:43:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0146 (0.8286) Acc@1 78.223 (83.731) Acc@5 95.459 (96.582) Mem 9655MB [2024-08-04 09:43:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.421 Acc@5 96.595 [2024-08-04 09:43:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.4% [2024-08-04 09:43:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.42% [2024-08-04 09:43:41 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 09:43:41 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 09:43:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][0/625] eta 0:08:02 lr 0.000123 wd 0.0500 time 0.7715 (0.7715) data time 0.5322 (0.5322) model time 0.0000 (0.0000) loss 5.0330 (5.0330) grad_norm 3.1335 (3.1335) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:43:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][10/625] eta 0:03:15 lr 0.000123 wd 0.0500 time 0.2569 (0.3185) data time 0.0006 (0.0491) model time 0.0000 (0.0000) loss 5.7252 (5.3960) grad_norm 2.4400 (2.7579) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:43:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][20/625] eta 0:02:55 lr 0.000123 wd 0.0500 time 0.2615 (0.2898) data time 0.0008 (0.0262) model time 0.0000 (0.0000) loss 5.4549 (5.4400) grad_norm 3.4586 (2.7673) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:43:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][30/625] eta 0:02:45 lr 0.000123 wd 0.0500 time 0.2571 (0.2786) data time 0.0008 (0.0180) model time 0.0000 (0.0000) loss 5.9781 (5.4487) grad_norm 2.0713 (2.7302) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:43:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][40/625] eta 0:02:39 lr 0.000123 wd 0.0500 time 0.2534 (0.2728) data time 0.0011 (0.0139) model time 0.0000 (0.0000) loss 5.5288 (5.4775) grad_norm 3.0808 (2.9300) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:43:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][50/625] eta 0:02:36 lr 0.000123 wd 0.0500 time 0.2551 (0.2730) data time 0.0008 (0.0113) model time 0.0000 (0.0000) loss 4.3770 (5.4276) grad_norm 2.6995 (2.8554) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 
09:43:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][60/625] eta 0:02:32 lr 0.000122 wd 0.0500 time 0.2564 (0.2702) data time 0.0009 (0.0096) model time 0.2555 (0.2553) loss 6.2173 (5.4812) grad_norm 3.0246 (2.8168) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:44:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][70/625] eta 0:02:30 lr 0.000122 wd 0.0500 time 0.2525 (0.2706) data time 0.0008 (0.0084) model time 0.2517 (0.2636) loss 6.1682 (5.4906) grad_norm 2.7447 (2.8040) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:44:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][80/625] eta 0:02:26 lr 0.000122 wd 0.0500 time 0.2543 (0.2689) data time 0.0010 (0.0075) model time 0.2533 (0.2611) loss 5.4771 (5.5365) grad_norm 2.1060 (2.7494) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:44:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][90/625] eta 0:02:23 lr 0.000122 wd 0.0500 time 0.2537 (0.2675) data time 0.0007 (0.0068) model time 0.2530 (0.2596) loss 5.1181 (5.5216) grad_norm 2.5782 (2.6936) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:44:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][100/625] eta 0:02:20 lr 0.000122 wd 0.0500 time 0.2530 (0.2682) data time 0.0008 (0.0062) model time 0.2521 (0.2623) loss 6.2346 (5.5271) grad_norm 2.2695 (2.6567) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:44:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][110/625] eta 0:02:17 lr 0.000122 wd 0.0500 time 0.2589 (0.2671) data time 0.0008 (0.0057) model time 0.2581 (0.2611) loss 6.1773 (5.5398) grad_norm 2.4580 (2.6448) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:44:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][120/625] eta 0:02:14 lr 0.000122 wd 0.0500 time 0.2636 (0.2662) data time 0.0009 (0.0053) model time 0.2627 (0.2604) loss 6.0096 (5.5340) grad_norm 4.2524 (2.6683) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:44:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][130/625] eta 0:02:11 lr 0.000122 wd 0.0500 time 0.2565 (0.2655) data time 0.0007 (0.0050) model time 0.2558 (0.2597) loss 6.8499 (5.5602) grad_norm 1.9124 (2.6904) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:44:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][140/625] eta 0:02:08 lr 0.000122 wd 0.0500 time 0.2553 (0.2648) data time 0.0007 (0.0047) model time 0.2545 (0.2591) loss 6.1647 (5.5794) grad_norm 6.4644 (2.7187) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:44:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][150/625] eta 0:02:05 lr 0.000122 wd 0.0500 time 0.2577 (0.2642) data time 0.0009 (0.0044) model time 0.2569 (0.2587) loss 5.8231 (5.6061) grad_norm 3.4562 (2.7373) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:44:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][160/625] eta 0:02:03 lr 0.000122 wd 0.0500 time 0.2522 (0.2649) data time 0.0007 (0.0042) model time 0.2514 (0.2602) loss 5.6930 (5.6258) grad_norm 1.9012 (2.7845) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:44:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][170/625] eta 0:02:00 lr 0.000122 wd 0.0500 time 0.2566 (0.2644) data time 0.0006 (0.0040) model time 0.2560 (0.2598) loss 4.7642 (5.6177) grad_norm 4.6718 (2.8004) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:44:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][180/625] eta 0:01:57 lr 0.000122 wd 0.0500 time 0.2579 (0.2646) data time 0.0007 (0.0039) model time 0.2572 (0.2604) loss 6.4824 (5.6292) grad_norm 2.4131 (2.7888) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:44:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][190/625] eta 0:01:54 lr 0.000121 wd 0.0500 time 0.2592 (0.2641) data time 0.0008 (0.0037) model time 0.2584 (0.2600) loss 4.7687 (5.6116) grad_norm 2.3029 (2.7672) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:44:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][200/625] eta 0:01:52 lr 0.000121 wd 0.0500 time 0.2591 (0.2647) data time 0.0007 (0.0036) model time 0.2584 (0.2610) loss 5.2123 (5.6087) grad_norm 3.1961 (2.7812) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:44:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][210/625] eta 0:01:49 lr 0.000121 wd 0.0500 time 0.2537 (0.2644) data time 0.0008 (0.0035) model time 0.2529 (0.2607) loss 5.6307 (5.5937) grad_norm 2.0905 (2.7710) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:44:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][220/625] eta 0:01:46 lr 0.000121 wd 0.0500 time 0.2568 (0.2640) data time 0.0008 (0.0034) model time 0.2560 (0.2604) loss 5.2594 (5.5966) grad_norm 21.0830 (2.8272) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:44:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][230/625] eta 0:01:44 lr 0.000121 wd 0.0500 time 0.2595 (0.2655) data time 0.0006 (0.0033) model time 0.2589 (0.2624) loss 5.3019 (5.6059) grad_norm 2.1577 (2.8059) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:44:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][240/625] eta 0:01:42 lr 0.000121 wd 0.0500 time 0.4369 (0.2666) data time 0.0006 (0.0032) model time 0.4363 (0.2639) loss 5.5829 (5.6124) grad_norm 2.2729 (2.8038) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:44:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][250/625] eta 0:01:40 lr 0.000121 wd 0.0500 time 0.2591 (0.2670) data time 0.0011 (0.0031) model time 0.2580 (0.2645) loss 5.8638 (5.6101) grad_norm 3.0040 (2.8002) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:44:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][260/625] eta 0:01:37 lr 0.000121 wd 0.0500 time 0.2564 (0.2673) data time 0.0008 (0.0030) model time 0.2556 (0.2649) loss 6.4094 (5.6149) grad_norm 3.5935 (2.8050) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:44:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][270/625] eta 0:01:35 lr 0.000121 wd 0.0500 time 0.2596 (0.2677) data time 0.0006 (0.0029) model time 0.2590 (0.2655) loss 5.2927 (5.6251) grad_norm 1.9629 (2.7990) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:44:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][280/625] eta 0:01:32 lr 0.000121 wd 0.0500 time 0.2549 (0.2672) data time 0.0009 (0.0028) model time 0.2540 (0.2650) loss 6.5186 (5.6400) grad_norm 3.3648 (2.7996) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:44:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][290/625] eta 0:01:29 lr 0.000121 wd 0.0500 time 0.2569 (0.2668) data time 0.0008 (0.0028) model time 0.2561 (0.2646) loss 4.5946 (5.6356) grad_norm 1.8536 (2.8320) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:45:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][300/625] eta 0:01:26 lr 0.000121 wd 0.0500 time 0.2568 (0.2676) data time 0.0008 (0.0027) model time 0.2560 (0.2655) loss 6.1284 (5.6297) grad_norm 2.3910 (2.8476) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:45:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][310/625] eta 0:01:24 lr 0.000120 wd 0.0500 time 0.2567 (0.2672) data time 0.0010 (0.0027) model time 0.2557 (0.2651) loss 6.2580 (5.6265) grad_norm 2.0558 (2.8385) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:45:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][320/625] eta 0:01:21 lr 0.000120 wd 0.0500 time 0.2524 (0.2668) data time 0.0007 (0.0026) model time 0.2517 (0.2647) loss 5.0388 (5.6158) grad_norm 1.8275 (2.8331) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:45:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][330/625] eta 0:01:18 lr 0.000120 wd 0.0500 time 0.2590 (0.2665) data time 0.0007 (0.0026) model time 0.2583 (0.2644) loss 5.9364 (5.6147) grad_norm 2.2332 (2.8301) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:45:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][340/625] eta 0:01:15 lr 0.000120 wd 0.0500 time 0.2603 (0.2662) data time 0.0007 (0.0025) model time 0.2596 (0.2641) loss 5.5533 (5.6217) grad_norm 2.5409 (2.8135) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:45:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][350/625] eta 0:01:13 lr 0.000120 wd 0.0500 time 0.2559 (0.2659) data time 0.0006 (0.0025) model time 0.2553 (0.2638) loss 5.9825 (5.6201) grad_norm 2.2161 (2.8346) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:45:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][360/625] eta 0:01:10 lr 0.000120 wd 0.0500 time 0.2593 (0.2657) data time 0.0009 (0.0024) model time 0.2584 (0.2635) loss 6.5633 (5.6215) grad_norm 1.7818 (2.8208) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:45:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][370/625] eta 0:01:07 lr 0.000120 wd 0.0500 time 0.2546 (0.2659) data time 0.0007 (0.0024) model time 0.2539 (0.2638) loss 5.9405 (5.6225) grad_norm 1.9757 (2.8043) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:45:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][380/625] eta 0:01:05 lr 0.000120 wd 0.0500 time 0.2556 (0.2660) data time 0.0008 (0.0023) model time 0.2548 (0.2640) loss 5.2724 (5.6191) grad_norm 3.1613 (2.7965) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:45:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][390/625] eta 0:01:02 lr 0.000120 wd 0.0500 time 0.2575 (0.2662) data time 0.0007 (0.0023) model time 0.2568 (0.2642) loss 5.4059 (5.6127) grad_norm 3.8182 (2.8426) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:45:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][400/625] eta 0:00:59 lr 0.000120 wd 0.0500 time 0.2579 (0.2663) data time 0.0008 (0.0023) model time 0.2571 (0.2643) loss 4.6206 (5.6121) grad_norm 4.3379 (2.8387) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:45:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][410/625] eta 0:00:57 lr 0.000120 wd 0.0500 time 0.4526 (0.2665) data time 0.0007 (0.0022) model time 0.4519 (0.2646) loss 6.1375 (5.6081) grad_norm 2.5137 (2.8237) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:45:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][420/625] eta 0:00:54 lr 0.000120 wd 0.0500 time 0.2569 (0.2662) data time 0.0007 (0.0022) model time 0.2562 (0.2644) loss 6.3187 (5.6049) grad_norm 1.6774 (2.8270) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:45:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][430/625] eta 0:00:51 lr 0.000120 wd 0.0500 time 0.2535 (0.2660) data time 0.0008 (0.0022) model time 0.2527 (0.2641) loss 5.9923 (5.6122) grad_norm 2.2274 (2.8153) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:45:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][440/625] eta 0:00:49 lr 0.000119 wd 0.0500 time 0.2525 (0.2658) data time 0.0008 (0.0021) model time 0.2517 (0.2639) loss 4.7579 (5.6064) grad_norm 3.1948 (2.8102) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:45:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][450/625] eta 0:00:46 lr 0.000119 wd 0.0500 time 0.2523 (0.2655) data time 0.0009 (0.0021) model time 0.2513 (0.2636) loss 5.4325 (5.5983) grad_norm 2.3416 (2.7920) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:45:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][460/625] eta 0:00:43 lr 0.000119 wd 0.0500 time 0.2576 (0.2653) data time 0.0011 (0.0021) model time 0.2565 (0.2634) loss 5.8039 (5.5981) grad_norm 2.9691 (2.7850) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:45:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][470/625] eta 0:00:41 lr 0.000119 wd 0.0500 time 0.2513 (0.2652) data time 0.0008 (0.0021) model time 0.2505 (0.2633) loss 5.5878 (5.5892) grad_norm 1.8890 (2.7723) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:45:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][480/625] eta 0:00:38 lr 0.000119 wd 0.0500 time 0.2575 (0.2650) data time 0.0006 (0.0020) model time 0.2569 (0.2631) loss 5.8732 (5.5910) grad_norm 2.5808 (2.7577) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:45:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][490/625] eta 0:00:35 lr 0.000119 wd 0.0500 time 0.2532 (0.2650) data time 0.0010 (0.0020) model time 0.2522 (0.2631) loss 6.1257 (5.5872) grad_norm 2.8062 (2.7514) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:45:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][500/625] eta 0:00:33 lr 0.000119 wd 0.0500 time 0.2544 (0.2648) data time 0.0008 (0.0020) model time 0.2536 (0.2630) loss 4.9843 (5.5887) grad_norm 1.9154 (2.7585) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:45:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][510/625] eta 0:00:30 lr 0.000119 wd 0.0500 time 0.2552 (0.2651) data time 0.0008 (0.0020) model time 0.2543 (0.2633) loss 5.8405 (5.5862) grad_norm 1.8434 (2.7433) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:45:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][520/625] eta 0:00:27 lr 0.000119 wd 0.0500 time 0.2537 (0.2649) data time 0.0008 (0.0020) model time 0.2530 (0.2631) loss 5.1767 (5.5821) grad_norm 3.2668 (2.7404) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:46:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][530/625] eta 0:00:25 lr 0.000119 wd 0.0500 time 0.2551 (0.2648) data time 0.0009 (0.0019) model time 0.2542 (0.2629) loss 5.3558 (5.5788) grad_norm 2.2995 (2.7365) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:46:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][540/625] eta 0:00:22 lr 0.000119 wd 0.0500 time 0.2603 (0.2646) data time 0.0008 (0.0019) model time 0.2595 (0.2628) loss 5.3972 (5.5784) grad_norm 2.4465 (2.7321) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:46:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][550/625] eta 0:00:19 lr 0.000119 wd 0.0500 time 0.2525 (0.2648) data time 0.0007 (0.0019) model time 0.2518 (0.2630) loss 4.7358 (5.5727) grad_norm 2.3471 (2.7286) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:46:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][560/625] eta 0:00:17 lr 0.000119 wd 0.0500 time 0.2518 (0.2650) data time 0.0009 (0.0019) model time 0.2509 (0.2633) loss 5.7702 (5.5731) grad_norm 4.1246 (2.7413) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:46:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][570/625] eta 0:00:14 lr 0.000118 wd 0.0500 time 0.2520 (0.2648) data time 0.0010 (0.0019) model time 0.2510 (0.2631) loss 6.1578 (5.5713) grad_norm 2.9918 (2.7464) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:46:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][580/625] eta 0:00:11 lr 0.000118 wd 0.0500 time 0.2575 (0.2650) data time 0.0008 (0.0019) model time 0.2567 (0.2633) loss 5.0313 (5.5684) grad_norm 2.2168 (2.7660) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:46:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][590/625] eta 0:00:09 lr 0.000118 wd 0.0500 time 0.2616 (0.2649) data time 0.0009 (0.0018) model time 0.2607 (0.2632) loss 5.3657 (5.5654) grad_norm 2.3190 (2.7614) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:46:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][600/625] eta 0:00:06 lr 0.000118 wd 0.0500 time 0.2563 (0.2649) data time 0.0007 (0.0018) model time 0.2556 (0.2632) loss 6.1498 (5.5664) grad_norm 1.6572 (2.7484) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:46:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][610/625] eta 0:00:03 lr 0.000118 wd 0.0500 time 0.2536 (0.2648) data time 0.0006 (0.0018) model time 0.2530 (0.2631) loss 5.6311 (5.5603) grad_norm 3.7105 (2.7546) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:46:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][620/625] eta 0:00:01 lr 0.000118 wd 0.0500 time 0.2539 (0.2648) data time 0.0005 (0.0018) model time 0.2534 (0.2631) loss 6.1277 (5.5600) grad_norm 2.2370 (2.7554) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:46:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 259 training takes 0:02:45
[2024-08-04 09:46:27 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 09:46:27 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
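Each Train record above prints `value (running mean)` pairs, and the `eta` field can be recomputed from them: for `[259/300][60/625]`, 0.2702 s × (625 - 60) ≈ 152.7 s, printed as `eta 0:02:32`. The sketch below reproduces that arithmetic, assuming a Swin-Transformer-style meter that keeps a running sum and count; the names `AverageMeter` and `eta` are illustrative assumptions, not taken from `main_hfai_mnodes.py`.

```python
import datetime


class AverageMeter:
    """Keeps the latest value and the running mean, like the `val (avg)` log pairs."""

    def __init__(self):
        self.val = 0.0
        self.sum = 0.0
        self.count = 0
        self.avg = 0.0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


def eta(avg_step_time, num_steps, idx):
    """Estimated time left in the epoch: mean step time x steps still to run."""
    seconds = avg_step_time * (num_steps - idx)
    return str(datetime.timedelta(seconds=int(seconds)))


# Averages copied from the [259/300][60/625] and [260/300][0/625] records.
print(eta(0.2702, 625, 60))  # 0:02:32
print(eta(0.7015, 625, 0))   # 0:07:18

meter = AverageMeter()
meter.update(2.0)
meter.update(4.0)
print(meter.val, meter.avg)  # 4.0 3.0
```

Because the printed averages are rounded to four decimals, an ETA recomputed from the log text can occasionally differ from the logged one by a second.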
[2024-08-04 09:46:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.522 (0.522) Loss 0.6035 (0.6035) Acc@1 90.088 (90.088) Acc@5 98.779 (98.779) Mem 9655MB
[2024-08-04 09:46:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.057 (0.100) Loss 0.9238 (0.7216) Acc@1 81.787 (87.021) Acc@5 96.436 (97.798) Mem 9655MB
[2024-08-04 09:46:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0215 (0.8393) Acc@1 78.174 (83.980) Acc@5 95.459 (96.673) Mem 9655MB
[2024-08-04 09:46:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.629 Acc@5 96.689
[2024-08-04 09:46:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.6%
[2024-08-04 09:46:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.63%
[2024-08-04 09:46:29 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-08-04 09:46:29 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
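Each Test record reports Acc@1 and Acc@5 for the current evaluation step, with the running mean in parentheses; the `* Acc@1 83.629` summary is presumably that mean after the last of the 25 steps, and `83.6%` and `83.63%` are the same figure rounded. The sketch below shows only the top-k rule in plain Python; `topk_accuracy` is a hypothetical helper, not the script's own (likely tensor-based) implementation.

```python
def topk_accuracy(scores, targets, ks=(1, 5)):
    """Percent of samples whose true label is among the k highest-scoring classes."""
    results = []
    for k in ks:
        hits = 0
        for row, target in zip(scores, targets):
            # Class indices ordered by descending score; keep the first k.
            top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
            hits += target in top
        results.append(100.0 * hits / len(targets))
    return results


scores = [
    [0.1, 0.7, 0.2],  # best guess: class 1
    [0.5, 0.3, 0.2],  # best guess: class 0, class 1 is second
    [0.2, 0.3, 0.5],  # best guess: class 2
]
targets = [1, 1, 2]
print(topk_accuracy(scores, targets, ks=(1, 2)))  # [66.66666666666667, 100.0]
```

The second sample is a top-2 hit but a top-1 miss, which is why Acc@5 is always at least as high as Acc@1 in the records above.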
[2024-08-04 09:46:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.502 (0.502) Loss 0.5845 (0.5845) Acc@1 90.088 (90.088) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 09:46:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 0.9004 (0.7084) Acc@1 81.494 (86.936) Acc@5 96.338 (97.776) Mem 9655MB
[2024-08-04 09:46:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0146 (0.8283) Acc@1 78.320 (83.750) Acc@5 95.459 (96.584) Mem 9655MB
[2024-08-04 09:46:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.439 Acc@5 96.595
[2024-08-04 09:46:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.4%
[2024-08-04 09:46:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.44%
[2024-08-04 09:46:31 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 09:46:32 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
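In the epoch-260 records that follow, `loss_scale` moves from 512 to 1024 between iterations 330 and 340 and is back at 512 by iteration 400; from that record on, the running mean of `grad_norm` reads `(inf)` while every printed per-step value stays finite. Assuming AMP (`AMP_ENABLE: true`) is driven by a dynamic loss scaler in the style of PyTorch's `torch.cuda.amp.GradScaler`, this is the expected pattern: the scale doubles after a run of overflow-free steps, an overflow halves it and skips that update, and a single inf folded into a running sum keeps the mean at inf for the rest of the epoch. The toy below is a hypothetical illustration of that mechanism, not the script's code.

```python
import math


class DynamicLossScaler:
    """Toy dynamic loss scaler (hypothetical; not the training script's class).

    Growth/backoff factors and growth interval follow GradScaler's documented
    defaults; the start scale is set to the 512 seen in this log. For
    simplicity the gradient norm stands in for the overflow check.
    """

    def __init__(self, init_scale=512.0, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._finite_streak = 0

    def update(self, grad_norm):
        """Return True if the optimizer step would be applied."""
        if not math.isfinite(grad_norm):
            # Overflow: shrink the scale, reset the streak, skip this step.
            self.scale *= self.backoff_factor
            self._finite_streak = 0
            return False
        self._finite_streak += 1
        if self._finite_streak == self.growth_interval:
            # Long enough run without overflow: try a larger scale.
            self.scale *= self.growth_factor
            self._finite_streak = 0
        return True


def running_mean(values):
    """Running mean after each value, like the parenthesised grad_norm field."""
    total, means = 0.0, []
    for i, v in enumerate(values, start=1):
        total += v
        means.append(total / i)
    return means


scaler = DynamicLossScaler(growth_interval=3)  # short interval for the demo
for g in (2.3, 1.9, 2.1):
    scaler.update(g)
print(scaler.scale)                            # 1024.0
print(scaler.update(math.inf), scaler.scale)   # False 512.0
print(running_mean([2.0, math.inf, 3.0]))      # [2.0, inf, inf]
```

Read this way, `(inf)` in the logged mean only records that one step overflowed and was skipped; it does not by itself indicate that training diverged.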
[2024-08-04 09:46:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][0/625] eta 0:07:18 lr 0.000118 wd 0.0500 time 0.7015 (0.7015) data time 0.4619 (0.4619) model time 0.0000 (0.0000) loss 5.0586 (5.0586) grad_norm 2.2808 (2.2808) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:46:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][10/625] eta 0:03:03 lr 0.000118 wd 0.0500 time 0.2558 (0.2980) data time 0.0009 (0.0427) model time 0.0000 (0.0000) loss 5.4076 (5.3179) grad_norm 2.2171 (2.1599) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:46:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][20/625] eta 0:02:48 lr 0.000118 wd 0.0500 time 0.2529 (0.2783) data time 0.0009 (0.0228) model time 0.0000 (0.0000) loss 5.1001 (5.3679) grad_norm 11.9329 (2.7704) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:46:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][30/625] eta 0:02:41 lr 0.000118 wd 0.0500 time 0.2582 (0.2711) data time 0.0007 (0.0157) model time 0.0000 (0.0000) loss 5.8277 (5.4809) grad_norm 3.5360 (2.8461) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:46:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][40/625] eta 0:02:36 lr 0.000118 wd 0.0500 time 0.2570 (0.2675) data time 0.0007 (0.0121) model time 0.0000 (0.0000) loss 5.5337 (5.4689) grad_norm 5.2859 (3.2405) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:46:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][50/625] eta 0:02:32 lr 0.000118 wd 0.0500 time 0.2562 (0.2653) data time 0.0010 (0.0099) model time 0.0000 (0.0000) loss 4.9675 (5.4360) grad_norm 4.2191 (3.2642) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:46:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][60/625] eta 0:02:29 lr 0.000118 wd 0.0500 time 0.2576 (0.2638) data time 0.0009 (0.0085) model time 0.2567 (0.2556) loss 5.0441 (5.4274) grad_norm 2.6837 (3.1563) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:46:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][70/625] eta 0:02:27 lr 0.000118 wd 0.0500 time 0.2529 (0.2653) data time 0.0009 (0.0074) model time 0.2520 (0.2645) loss 5.1730 (5.4181) grad_norm 2.4572 (3.0689) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:46:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][80/625] eta 0:02:25 lr 0.000117 wd 0.0500 time 0.2559 (0.2665) data time 0.0007 (0.0066) model time 0.2553 (0.2677) loss 4.5701 (5.4029) grad_norm 1.9765 (2.9718) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:46:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][90/625] eta 0:02:21 lr 0.000117 wd 0.0500 time 0.2552 (0.2654) data time 0.0009 (0.0060) model time 0.2543 (0.2646) loss 6.1681 (5.4623) grad_norm 3.0722 (3.0626) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:46:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][100/625] eta 0:02:18 lr 0.000117 wd 0.0500 time 0.2573 (0.2645) data time 0.0010 (0.0055) model time 0.2564 (0.2628) loss 6.4806 (5.5106) grad_norm 2.2430 (3.0815) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:47:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][110/625] eta 0:02:15 lr 0.000117 wd 0.0500 time 0.2561 (0.2637) data time 0.0009 (0.0051) model time 0.2552 (0.2615) loss 5.8482 (5.4801) grad_norm 1.9656 (3.0655) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:47:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][120/625] eta 0:02:12 lr 0.000117 wd 0.0500 time 0.2552 (0.2631) data time 0.0007 (0.0047) model time 0.2544 (0.2606) loss 4.7472 (5.4757) grad_norm 2.2113 (3.0451) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:47:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][130/625] eta 0:02:10 lr 0.000117 wd 0.0500 time 0.2529 (0.2637) data time 0.0007 (0.0044) model time 0.2522 (0.2618) loss 5.7077 (5.4845) grad_norm 2.4444 (3.0138) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:47:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][140/625] eta 0:02:08 lr 0.000117 wd 0.0500 time 0.2512 (0.2646) data time 0.0010 (0.0042) model time 0.2501 (0.2632) loss 6.6025 (5.4957) grad_norm 3.0707 (2.9623) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:47:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][150/625] eta 0:02:05 lr 0.000117 wd 0.0500 time 0.2503 (0.2639) data time 0.0009 (0.0040) model time 0.2494 (0.2623) loss 5.1769 (5.4828) grad_norm 2.4909 (2.9372) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:47:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][160/625] eta 0:02:02 lr 0.000117 wd 0.0500 time 0.2616 (0.2634) data time 0.0007 (0.0038) model time 0.2609 (0.2616) loss 5.5432 (5.4672) grad_norm 3.7828 (2.9452) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:47:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][170/625] eta 0:01:59 lr 0.000117 wd 0.0500 time 0.2563 (0.2630) data time 0.0008 (0.0036) model time 0.2554 (0.2611) loss 5.3298 (5.4677) grad_norm 2.5183 (2.9733) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:47:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][180/625] eta 0:01:57 lr 0.000117 wd 0.0500 time 0.2551 (0.2643) data time 0.0008 (0.0035) model time 0.2542 (0.2629) loss 5.6306 (5.4666) grad_norm 3.3426 (2.9558) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:47:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][190/625] eta 0:01:54 lr 0.000117 wd 0.0500 time 0.2577 (0.2638) data time 0.0009 (0.0033) model time 0.2568 (0.2624) loss 4.3579 (5.4686) grad_norm 1.7083 (2.9503) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:47:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][200/625] eta 0:01:51 lr 0.000117 wd 0.0500 time 0.2558 (0.2634) data time 0.0008 (0.0032) model time 0.2550 (0.2618) loss 5.3311 (5.4701) grad_norm 2.6208 (2.9635) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:47:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][210/625] eta 0:01:49 lr 0.000116 wd 0.0500 time 0.2573 (0.2631) data time 0.0009 (0.0031) model time 0.2564 (0.2614) loss 5.6820 (5.4711) grad_norm 1.6934 (2.9288) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:47:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][220/625] eta 0:01:46 lr 0.000116 wd 0.0500 time 0.2560 (0.2627) data time 0.0007 (0.0030) model time 0.2553 (0.2610) loss 6.2673 (5.4778) grad_norm 2.2788 (2.9059) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:47:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][230/625] eta 0:01:43 lr 0.000116 wd 0.0500 time 0.2561 (0.2625) data time 0.0009 (0.0029) model time 0.2552 (0.2607) loss 5.9057 (5.4875) grad_norm 2.2820 (2.8747) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:47:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][240/625] eta 0:01:40 lr 0.000116 wd 0.0500 time 0.2551 (0.2622) data time 0.0007 (0.0028) model time 0.2544 (0.2604) loss 5.2526 (5.4997) grad_norm 2.3910 (2.8508) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:47:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][250/625] eta 0:01:38 lr 0.000116 wd 0.0500 time 0.2574 (0.2619) data time 0.0009 (0.0028) model time 0.2566 (0.2601) loss 4.8846 (5.5109) grad_norm 1.9397 (2.8341) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:47:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][260/625] eta 0:01:35 lr 0.000116 wd 0.0500 time 0.2579 (0.2617) data time 0.0006 (0.0027) model time 0.2572 (0.2599) loss 5.1932 (5.5191) grad_norm 4.6683 (2.8320) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:47:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][270/625] eta 0:01:32 lr 0.000116 wd 0.0500 time 0.2539 (0.2615) data time 0.0007 (0.0026) model time 0.2532 (0.2596) loss 5.4772 (5.5256) grad_norm 1.9174 (2.8383) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:47:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][280/625] eta 0:01:30 lr 0.000116 wd 0.0500 time 0.2572 (0.2613) data time 0.0009 (0.0026) model time 0.2563 (0.2594) loss 5.3659 (5.5309) grad_norm 2.4560 (2.8299) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:47:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][290/625] eta 0:01:27 lr 0.000116 wd 0.0500 time 0.2610 (0.2617) data time 0.0008 (0.0025) model time 0.2602 (0.2600) loss 6.1427 (5.5404) grad_norm 5.3225 (2.8576) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:47:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][300/625] eta 0:01:25 lr 0.000116 wd 0.0500 time 0.2555 (0.2616) data time 0.0008 (0.0025) model time 0.2548 (0.2599) loss 5.2774 (5.5421) grad_norm 2.5003 (2.8926) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:47:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][310/625] eta 0:01:22 lr 0.000116 wd 0.0500 time 0.2560 (0.2614) data time 0.0006 (0.0024) model time 0.2554 (0.2597) loss 5.7652 (5.5462) grad_norm 1.7967 (2.8864) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:47:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][320/625] eta 0:01:19 lr 0.000116 wd 0.0500 time 0.2614 (0.2613) data time 0.0010 (0.0024) model time 0.2604 (0.2596) loss 5.2134 (5.5355) grad_norm 2.8110 (2.8861) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:47:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][330/625] eta 0:01:17 lr 0.000116 wd 0.0500 time 0.4358 (0.2617) data time 0.0012 (0.0023) model time 0.4346 (0.2601) loss 4.9107 (5.5341) grad_norm 1.4776 (2.8955) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 09:48:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][340/625] eta 0:01:14 lr 0.000115 wd 0.0500 time 0.2567 (0.2615) data time 0.0009 (0.0023) model time 0.2558 (0.2599) loss 6.2118 (5.5417) grad_norm 1.8527 (2.8951) loss_scale 1024.0000 (518.0059) mem 9655MB
[2024-08-04 09:48:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][350/625] eta 0:01:11 lr 0.000115 wd 0.0500 time 0.2530 (0.2617) data time 0.0009 (0.0022) model time 0.2521 (0.2602) loss 5.5759 (5.5290) grad_norm 2.3959 (2.8835) loss_scale 1024.0000 (532.4217) mem 9655MB
[2024-08-04 09:48:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][360/625] eta 0:01:09 lr 0.000115 wd 0.0500 time 0.2564 (0.2616) data time 0.0012 (0.0022) model time 0.2552 (0.2600) loss 6.0698 (5.5377) grad_norm 1.8724 (2.8738) loss_scale 1024.0000 (546.0388) mem 9655MB
[2024-08-04 09:48:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][370/625] eta 0:01:06 lr 0.000115 wd 0.0500 time 0.2729 (0.2615) data time 0.0009 (0.0022) model time 0.2720 (0.2599) loss 5.6937 (5.5329) grad_norm 2.0093 (2.8652) loss_scale 1024.0000 (558.9218) mem 9655MB
[2024-08-04 09:48:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][380/625] eta 0:01:04 lr 0.000115 wd 0.0500 time 0.2549 (0.2613) data time 0.0008 (0.0021) model time 0.2541 (0.2598) loss 5.7773 (5.5324) grad_norm 2.1146 (2.8559) loss_scale 1024.0000 (571.1286) mem 9655MB
[2024-08-04 09:48:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][390/625] eta 0:01:01 lr 0.000115 wd 0.0500 time 0.2527 (0.2612) data time 0.0007 (0.0021) model time 0.2520 (0.2597) loss 6.2160 (5.5264) grad_norm 3.5645 (2.8396) loss_scale 1024.0000 (582.7110) mem 9655MB
[2024-08-04 09:48:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][400/625] eta 0:00:58 lr 0.000115 wd 0.0500 time 0.2603 (0.2611) data time 0.0008 (0.0021) model time 0.2595 (0.2595) loss 5.4033 (5.5345) grad_norm 3.1479 (inf) loss_scale 512.0000 (589.8853) mem 9655MB
[2024-08-04 09:48:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][410/625] eta 0:00:56 lr 0.000115 wd 0.0500 time 0.2525 (0.2609) data time 0.0009 (0.0020) model time 0.2515 (0.2594) loss 4.8448 (5.5320) grad_norm 2.2141 (inf) loss_scale 512.0000 (587.9903) mem 9655MB
[2024-08-04 09:48:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][420/625] eta 0:00:53 lr 0.000115 wd 0.0500 time 0.2570 (0.2608) data time 0.0010 (0.0020) model time 0.2561 (0.2593) loss 5.2644 (5.5329) grad_norm 4.6895 (inf) loss_scale 512.0000 (586.1853) mem 9655MB
[2024-08-04 09:48:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][430/625] eta 0:00:50 lr 0.000115 wd 0.0500 time 0.2519 (0.2607) data time 0.0009 (0.0020) model time 0.2510 (0.2592) loss 5.2200 (5.5325) grad_norm 2.6181 (inf) loss_scale 512.0000 (584.4640) mem 9655MB
[2024-08-04 09:48:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][440/625] eta 0:00:48 lr 0.000115 wd 0.0500 time 0.2583 (0.2610) data time 0.0009 (0.0020) model time 0.2574 (0.2596) loss 5.9695 (5.5275) grad_norm 2.5788 (inf) loss_scale 512.0000 (582.8209) mem 9655MB
[2024-08-04 09:48:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][450/625] eta 0:00:45 lr 0.000115 wd 0.0500 time 0.2521 (0.2614) data time 0.0010 (0.0020) model time 0.2511 (0.2600) loss 5.2924 (5.5332) grad_norm 1.9805 (inf) loss_scale 512.0000 (581.2506) mem 9655MB
[2024-08-04 09:48:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][460/625] eta 0:00:43 lr 0.000115 wd 0.0500 time 0.2597 (0.2620) data time 0.0008 (0.0019) model time 0.2589 (0.2607) loss 5.7691 (5.5321) grad_norm 2.3635 (inf) loss_scale 512.0000 (579.7484) mem 9655MB
[2024-08-04 09:48:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][470/625] eta 0:00:40 lr 0.000114 wd 0.0500 time 0.2551 (0.2623) data time 0.0009 (0.0019) model time 0.2542 (0.2610) loss 5.3021 (5.5308) grad_norm 3.6901 (inf) loss_scale 512.0000 (578.3100) mem 9655MB
[2024-08-04 09:48:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][480/625] eta 0:00:38 lr 0.000114 wd 0.0500 time 0.2525 (0.2631) data time 0.0010 (0.0019) model time 0.2516 (0.2618) loss 5.8063 (5.5352) grad_norm 3.8631 (inf) loss_scale 512.0000 (576.9314) mem 9655MB
[2024-08-04 09:48:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][490/625] eta 0:00:35 lr 0.000114 wd 0.0500 time 0.2570 (0.2629) data time 0.0009 (0.0019) model time 0.2561 (0.2617) loss 5.9655 (5.5360) grad_norm 3.7243 (inf) loss_scale 512.0000 (575.6090) mem 9655MB
[2024-08-04 09:48:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][500/625] eta 0:00:32 lr 0.000114 wd 0.0500 time 0.2549 (0.2628) data time 0.0010 (0.0019) model time 0.2539 (0.2615) loss 5.5924 (5.5361) grad_norm 7.3766 (inf) loss_scale 512.0000 (574.3393) mem 9655MB
[2024-08-04 09:48:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][510/625] eta 0:00:30 lr 0.000114 wd 0.0500 time 0.2581 (0.2634) data time 0.0006 (0.0018) model time 0.2574 (0.2622) loss 4.4108 (5.5381) grad_norm 2.9007 (inf) loss_scale 512.0000 (573.1194) mem 9655MB
[2024-08-04 09:48:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][520/625] eta 0:00:27 lr 0.000114 wd 0.0500 time 0.2567 (0.2632) data time 0.0008 (0.0018) model time 0.2559 (0.2621) loss 5.4234 (5.5374) grad_norm 3.1941 (inf) loss_scale 512.0000 (571.9463) mem 9655MB
[2024-08-04 09:48:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][530/625] eta 0:00:25 lr 0.000114 wd 0.0500 time 0.2539 (0.2635) data time 0.0009 (0.0018) model time 0.2529 (0.2623) loss 5.6175 (5.5386) grad_norm 3.8598 (inf) loss_scale 512.0000 (570.8173) mem 9655MB
[2024-08-04 09:48:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][540/625] eta 0:00:22 lr 0.000114 wd 0.0500 time 0.2547 (0.2633) data time 0.0007 (0.0018) model time 0.2541 (0.2622) loss 5.3493 (5.5393) grad_norm 2.2082 (inf) loss_scale 512.0000 (569.7301) mem 9655MB
[2024-08-04 09:48:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][550/625] eta 0:00:19 lr 0.000114 wd 0.0500 time 0.2563 (0.2632) data time 0.0006 (0.0018) model time 0.2557 (0.2620) loss 5.6401 (5.5427) grad_norm 1.8835 (inf) loss_scale 512.0000 (568.6824) mem 9655MB
[2024-08-04 09:49:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][560/625] eta 0:00:17 lr 0.000114 wd 0.0500 time 0.2557 (0.2634) data time 0.0008 (0.0018) model time 0.2549 (0.2623) loss 4.7125 (5.5374) grad_norm 2.4919 (inf) loss_scale 512.0000 (567.6720) mem 9655MB
[2024-08-04 09:49:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][570/625] eta 0:00:14 lr 0.000114 wd 0.0500 time 0.2565 (0.2637) data time 0.0009 (0.0017) model time 0.2557 (0.2626) loss 6.0032 (5.5410) grad_norm 1.6868 (inf) loss_scale 512.0000 (566.6970) mem 9655MB
[2024-08-04 09:49:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][580/625] eta 0:00:11 lr 0.000114 wd 0.0500 time 0.2548 (0.2635) data time 0.0011 (0.0017) model time 0.2537 (0.2624) loss 5.7422 (5.5400) grad_norm 2.2652 (inf) loss_scale 512.0000 (565.7556) mem 9655MB
[2024-08-04 09:49:07
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][590/625] eta 0:00:09 lr 0.000114 wd 0.0500 time 0.2589 (0.2634) data time 0.0006 (0.0017) model time 0.2583 (0.2623) loss 5.7415 (5.5452) grad_norm 2.7912 (inf) loss_scale 512.0000 (564.8460) mem 9655MB [2024-08-04 09:49:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][600/625] eta 0:00:06 lr 0.000113 wd 0.0500 time 0.2561 (0.2633) data time 0.0009 (0.0017) model time 0.2552 (0.2622) loss 5.1921 (5.5425) grad_norm 1.8392 (inf) loss_scale 512.0000 (563.9667) mem 9655MB [2024-08-04 09:49:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][610/625] eta 0:00:03 lr 0.000113 wd 0.0500 time 0.2521 (0.2632) data time 0.0006 (0.0017) model time 0.2515 (0.2620) loss 5.8563 (5.5479) grad_norm 2.0454 (inf) loss_scale 512.0000 (563.1162) mem 9655MB [2024-08-04 09:49:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][620/625] eta 0:00:01 lr 0.000113 wd 0.0500 time 0.2526 (0.2630) data time 0.0006 (0.0017) model time 0.2520 (0.2619) loss 5.9242 (5.5543) grad_norm 2.9066 (inf) loss_scale 512.0000 (562.2931) mem 9655MB [2024-08-04 09:49:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 260 training takes 0:02:44 [2024-08-04 09:49:16 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 09:49:17 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 09:49:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.461 (0.461) Loss 0.5898 (0.5898) Acc@1 90.430 (90.430) Acc@5 98.926 (98.926) Mem 9655MB [2024-08-04 09:49:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 0.8940 (0.7145) Acc@1 82.568 (87.167) Acc@5 96.240 (97.807) Mem 9655MB [2024-08-04 09:49:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0020 (0.8306) Acc@1 79.248 (84.166) Acc@5 95.654 (96.675) Mem 9655MB [2024-08-04 09:49:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.803 Acc@5 96.705 [2024-08-04 09:49:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-04 09:49:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.80% [2024-08-04 09:49:19 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 09:49:19 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 09:49:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.521 (0.521) Loss 0.5850 (0.5850) Acc@1 90.137 (90.137) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 09:49:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.099) Loss 0.8989 (0.7082) Acc@1 81.494 (86.967) Acc@5 96.338 (97.776) Mem 9655MB [2024-08-04 09:49:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0137 (0.8280) Acc@1 78.467 (83.789) Acc@5 95.410 (96.584) Mem 9655MB [2024-08-04 09:49:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.477 Acc@5 96.599 [2024-08-04 09:49:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-08-04 09:49:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.48% [2024-08-04 09:49:21 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 09:49:22 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 09:49:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][0/625] eta 0:07:28 lr 0.000113 wd 0.0500 time 0.7177 (0.7177) data time 0.4793 (0.4793) model time 0.0000 (0.0000) loss 5.8718 (5.8718) grad_norm 1.9824 (1.9824) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:49:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][10/625] eta 0:03:10 lr 0.000113 wd 0.0500 time 0.3832 (0.3090) data time 0.0007 (0.0443) model time 0.0000 (0.0000) loss 4.3225 (5.3413) grad_norm 1.8041 (2.2406) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:49:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][20/625] eta 0:02:51 lr 0.000113 wd 0.0500 time 0.2575 (0.2838) data time 0.0007 (0.0236) model time 0.0000 (0.0000) loss 4.7933 (5.4745) grad_norm 4.9593 (2.5899) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:49:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][30/625] eta 0:02:43 lr 0.000113 wd 0.0500 time 0.2591 (0.2750) data time 0.0010 (0.0163) model time 0.0000 (0.0000) loss 4.7628 (5.4719) grad_norm 2.9193 (2.5526) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:49:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][40/625] eta 0:02:40 lr 0.000113 wd 0.0500 time 0.2554 (0.2739) data time 0.0007 (0.0126) model time 0.0000 (0.0000) loss 5.1468 (5.5092) grad_norm 3.3781 (2.5583) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:49:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][50/625] eta 0:02:39 lr 0.000113 wd 0.0500 time 0.2606 (0.2767) data time 0.0006 (0.0104) model time 0.0000 (0.0000) loss 5.6505 (5.5372) grad_norm 4.1705 (2.8956) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:49:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][60/625] eta 0:02:35 lr 0.000113 wd 0.0500 time 0.2533 (0.2750) data time 
0.0007 (0.0089) model time 0.2526 (0.2653) loss 5.3953 (5.5331) grad_norm 4.1621 (3.2224) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:49:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][70/625] eta 0:02:31 lr 0.000113 wd 0.0500 time 0.2602 (0.2724) data time 0.0006 (0.0078) model time 0.2596 (0.2604) loss 5.5912 (5.5693) grad_norm 2.4096 (3.1645) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:49:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][80/625] eta 0:02:28 lr 0.000113 wd 0.0500 time 0.2542 (0.2728) data time 0.0009 (0.0070) model time 0.2533 (0.2650) loss 6.1498 (5.5865) grad_norm 4.7353 (3.2211) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:49:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][90/625] eta 0:02:24 lr 0.000113 wd 0.0500 time 0.2572 (0.2709) data time 0.0008 (0.0063) model time 0.2564 (0.2625) loss 5.6577 (5.5772) grad_norm 1.7617 (3.1135) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:49:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][100/625] eta 0:02:21 lr 0.000113 wd 0.0500 time 0.2526 (0.2694) data time 0.0007 (0.0058) model time 0.2519 (0.2610) loss 5.8608 (5.5625) grad_norm 1.6999 (3.0407) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:49:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][110/625] eta 0:02:18 lr 0.000112 wd 0.0500 time 0.2591 (0.2683) data time 0.0006 (0.0053) model time 0.2585 (0.2601) loss 5.0449 (5.5581) grad_norm 3.5819 (2.9917) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:49:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][120/625] eta 0:02:14 lr 0.000112 wd 0.0500 time 0.2586 (0.2673) data time 0.0008 (0.0050) model time 0.2578 (0.2594) loss 6.4785 (5.5615) grad_norm 1.8115 (2.9334) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:49:57 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][130/625] eta 0:02:11 lr 0.000112 wd 0.0500 time 0.2564 (0.2664) data time 0.0009 (0.0047) model time 0.2555 (0.2589) loss 5.7022 (5.5638) grad_norm 2.4147 (2.8762) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:49:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][140/625] eta 0:02:08 lr 0.000112 wd 0.0500 time 0.2572 (0.2657) data time 0.0007 (0.0044) model time 0.2565 (0.2585) loss 4.7400 (5.5362) grad_norm 3.5265 (2.8231) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:50:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][150/625] eta 0:02:06 lr 0.000112 wd 0.0500 time 0.2549 (0.2658) data time 0.0006 (0.0042) model time 0.2543 (0.2593) loss 5.6709 (5.5286) grad_norm 2.2207 (2.8418) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:50:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][160/625] eta 0:02:03 lr 0.000112 wd 0.0500 time 0.2566 (0.2653) data time 0.0008 (0.0040) model time 0.2559 (0.2591) loss 5.3175 (5.5293) grad_norm 1.9014 (2.8171) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:50:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][170/625] eta 0:02:00 lr 0.000112 wd 0.0500 time 0.2528 (0.2648) data time 0.0008 (0.0038) model time 0.2520 (0.2587) loss 6.0372 (5.5313) grad_norm 2.1395 (2.8100) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:50:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][180/625] eta 0:01:57 lr 0.000112 wd 0.0500 time 0.2549 (0.2643) data time 0.0007 (0.0036) model time 0.2542 (0.2585) loss 5.2047 (5.5356) grad_norm 2.9408 (2.7821) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:50:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][190/625] eta 0:01:54 lr 0.000112 wd 0.0500 time 0.2544 (0.2639) data time 0.0011 (0.0035) 
model time 0.2533 (0.2583) loss 5.3075 (5.5337) grad_norm 2.3917 (2.7536) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:50:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][200/625] eta 0:01:52 lr 0.000112 wd 0.0500 time 0.2598 (0.2636) data time 0.0008 (0.0034) model time 0.2590 (0.2581) loss 5.3032 (5.5201) grad_norm 2.4076 (2.7262) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:50:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][210/625] eta 0:01:49 lr 0.000112 wd 0.0500 time 0.2561 (0.2638) data time 0.0007 (0.0033) model time 0.2554 (0.2587) loss 4.6753 (5.5232) grad_norm 2.4791 (2.7043) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:50:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][220/625] eta 0:01:46 lr 0.000112 wd 0.0500 time 0.2530 (0.2634) data time 0.0007 (0.0032) model time 0.2523 (0.2585) loss 6.1332 (5.5223) grad_norm 3.9655 (2.6993) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:50:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][230/625] eta 0:01:43 lr 0.000112 wd 0.0500 time 0.2553 (0.2631) data time 0.0011 (0.0031) model time 0.2543 (0.2583) loss 6.0511 (5.5261) grad_norm 3.1372 (2.6909) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:50:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][240/625] eta 0:01:41 lr 0.000111 wd 0.0500 time 0.2591 (0.2629) data time 0.0007 (0.0030) model time 0.2584 (0.2582) loss 5.9332 (5.5322) grad_norm 1.8771 (2.6711) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:50:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][250/625] eta 0:01:38 lr 0.000111 wd 0.0500 time 0.2531 (0.2626) data time 0.0008 (0.0029) model time 0.2524 (0.2581) loss 4.8269 (5.5382) grad_norm 1.8426 (2.6613) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:50:30 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [261/300][260/625] eta 0:01:36 lr 0.000111 wd 0.0500 time 0.2592 (0.2631) data time 0.0009 (0.0028) model time 0.2583 (0.2588) loss 5.1725 (5.5427) grad_norm 2.9379 (2.6578) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:50:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][270/625] eta 0:01:33 lr 0.000111 wd 0.0500 time 0.2557 (0.2629) data time 0.0009 (0.0028) model time 0.2548 (0.2587) loss 6.2563 (5.5430) grad_norm 3.5790 (2.6855) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:50:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][280/625] eta 0:01:30 lr 0.000111 wd 0.0500 time 0.2529 (0.2633) data time 0.0010 (0.0027) model time 0.2520 (0.2594) loss 6.8014 (5.5591) grad_norm 2.2530 (2.7367) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:50:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][290/625] eta 0:01:28 lr 0.000111 wd 0.0500 time 0.2581 (0.2638) data time 0.0007 (0.0026) model time 0.2574 (0.2600) loss 5.6706 (5.5640) grad_norm 3.5931 (2.7530) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:50:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][300/625] eta 0:01:25 lr 0.000111 wd 0.0500 time 0.2568 (0.2635) data time 0.0008 (0.0026) model time 0.2560 (0.2598) loss 5.3820 (5.5636) grad_norm 2.1195 (2.7548) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:50:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][310/625] eta 0:01:22 lr 0.000111 wd 0.0500 time 0.2605 (0.2632) data time 0.0006 (0.0025) model time 0.2599 (0.2596) loss 6.2321 (5.5722) grad_norm 5.2230 (2.7523) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:50:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][320/625] eta 0:01:20 lr 0.000111 wd 0.0500 time 0.4614 (0.2636) data time 0.0008 (0.0025) model time 0.4607 (0.2602) 
loss 5.1434 (5.5748) grad_norm 1.9576 (2.7468) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:50:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][330/625] eta 0:01:17 lr 0.000111 wd 0.0500 time 0.2565 (0.2634) data time 0.0007 (0.0024) model time 0.2558 (0.2600) loss 6.4041 (5.5697) grad_norm 2.7621 (2.7402) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:50:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][340/625] eta 0:01:15 lr 0.000111 wd 0.0500 time 0.2562 (0.2632) data time 0.0008 (0.0024) model time 0.2553 (0.2599) loss 4.8984 (5.5657) grad_norm 1.7050 (2.7383) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:50:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][350/625] eta 0:01:12 lr 0.000111 wd 0.0500 time 0.2519 (0.2630) data time 0.0010 (0.0023) model time 0.2510 (0.2597) loss 6.2900 (5.5605) grad_norm 1.7930 (2.7326) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:50:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][360/625] eta 0:01:09 lr 0.000111 wd 0.0500 time 0.2587 (0.2633) data time 0.0006 (0.0023) model time 0.2580 (0.2601) loss 5.1274 (5.5555) grad_norm 1.8095 (2.7151) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:50:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][370/625] eta 0:01:07 lr 0.000111 wd 0.0500 time 0.2504 (0.2631) data time 0.0009 (0.0022) model time 0.2495 (0.2600) loss 5.6985 (5.5541) grad_norm 2.0016 (2.7053) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:51:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][380/625] eta 0:01:04 lr 0.000110 wd 0.0500 time 0.2708 (0.2630) data time 0.0009 (0.0022) model time 0.2699 (0.2600) loss 4.6931 (5.5479) grad_norm 2.2140 (2.6953) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:51:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [261/300][390/625] eta 0:01:01 lr 0.000110 wd 0.0500 time 0.2552 (0.2628) data time 0.0006 (0.0022) model time 0.2546 (0.2598) loss 5.7801 (5.5493) grad_norm 2.6412 (2.7150) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:51:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][400/625] eta 0:00:59 lr 0.000110 wd 0.0500 time 0.2566 (0.2627) data time 0.0010 (0.0022) model time 0.2557 (0.2597) loss 5.5907 (5.5514) grad_norm 2.2584 (2.7382) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:51:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][410/625] eta 0:00:56 lr 0.000110 wd 0.0500 time 0.2519 (0.2625) data time 0.0011 (0.0021) model time 0.2508 (0.2596) loss 6.3498 (5.5584) grad_norm 1.7133 (2.7292) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:51:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][420/625] eta 0:00:53 lr 0.000110 wd 0.0500 time 0.2621 (0.2624) data time 0.0009 (0.0021) model time 0.2612 (0.2595) loss 4.8232 (5.5537) grad_norm 1.6268 (2.7359) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:51:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][430/625] eta 0:00:51 lr 0.000110 wd 0.0500 time 0.2555 (0.2623) data time 0.0010 (0.0021) model time 0.2545 (0.2594) loss 5.0204 (5.5523) grad_norm 2.4572 (2.7328) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:51:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][440/625] eta 0:00:48 lr 0.000110 wd 0.0500 time 0.2579 (0.2626) data time 0.0008 (0.0020) model time 0.2571 (0.2598) loss 5.9308 (5.5470) grad_norm 2.0636 (2.7282) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:51:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][450/625] eta 0:00:45 lr 0.000110 wd 0.0500 time 0.2531 (0.2624) data time 0.0012 (0.0020) model time 0.2519 (0.2596) loss 4.4778 (5.5432) grad_norm 
2.4101 (2.7255) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:51:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][460/625] eta 0:00:43 lr 0.000110 wd 0.0500 time 0.2583 (0.2623) data time 0.0007 (0.0020) model time 0.2576 (0.2596) loss 5.1292 (5.5446) grad_norm 1.9804 (2.7110) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:51:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][470/625] eta 0:00:40 lr 0.000110 wd 0.0500 time 0.2565 (0.2621) data time 0.0006 (0.0020) model time 0.2559 (0.2594) loss 5.9681 (5.5504) grad_norm 3.8539 (2.7122) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:51:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][480/625] eta 0:00:37 lr 0.000110 wd 0.0500 time 0.2519 (0.2620) data time 0.0007 (0.0019) model time 0.2511 (0.2593) loss 5.6899 (5.5487) grad_norm 2.0029 (2.7073) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:51:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][490/625] eta 0:00:35 lr 0.000110 wd 0.0500 time 0.2568 (0.2619) data time 0.0007 (0.0019) model time 0.2561 (0.2593) loss 5.3089 (5.5492) grad_norm 1.8262 (2.6952) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:51:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][500/625] eta 0:00:32 lr 0.000110 wd 0.0500 time 0.2609 (0.2622) data time 0.0009 (0.0019) model time 0.2600 (0.2597) loss 5.0841 (5.5495) grad_norm 1.7110 (2.6844) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:51:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][510/625] eta 0:00:30 lr 0.000109 wd 0.0500 time 0.2521 (0.2626) data time 0.0008 (0.0019) model time 0.2513 (0.2601) loss 5.2351 (5.5509) grad_norm 1.8321 (2.6843) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:51:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][520/625] eta 
0:00:27 lr 0.000109 wd 0.0500 time 0.2546 (0.2628) data time 0.0006 (0.0019) model time 0.2539 (0.2603) loss 6.1242 (5.5570) grad_norm 5.7357 (2.6882) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:51:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][530/625] eta 0:00:24 lr 0.000109 wd 0.0500 time 0.2543 (0.2627) data time 0.0009 (0.0018) model time 0.2535 (0.2602) loss 4.4156 (5.5587) grad_norm 3.0082 (2.6901) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:51:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][540/625] eta 0:00:22 lr 0.000109 wd 0.0500 time 0.2557 (0.2629) data time 0.0006 (0.0018) model time 0.2551 (0.2605) loss 5.6886 (5.5578) grad_norm 5.0320 (2.6928) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:51:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][550/625] eta 0:00:19 lr 0.000109 wd 0.0500 time 0.2599 (0.2629) data time 0.0006 (0.0018) model time 0.2593 (0.2606) loss 5.8864 (5.5633) grad_norm 2.5272 (2.6911) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:51:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][560/625] eta 0:00:17 lr 0.000109 wd 0.0500 time 0.2552 (0.2628) data time 0.0007 (0.0018) model time 0.2545 (0.2605) loss 5.2410 (5.5597) grad_norm 1.9517 (2.6856) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:51:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][570/625] eta 0:00:14 lr 0.000109 wd 0.0500 time 0.2563 (0.2630) data time 0.0009 (0.0018) model time 0.2554 (0.2607) loss 4.9879 (5.5625) grad_norm 2.3506 (2.6888) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:51:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][580/625] eta 0:00:11 lr 0.000109 wd 0.0500 time 0.2560 (0.2629) data time 0.0011 (0.0018) model time 0.2549 (0.2606) loss 6.2829 (5.5679) grad_norm 1.6731 (2.6782) loss_scale 
512.0000 (512.0000) mem 9655MB [2024-08-04 09:51:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][590/625] eta 0:00:09 lr 0.000109 wd 0.0500 time 0.2590 (0.2628) data time 0.0008 (0.0017) model time 0.2582 (0.2605) loss 5.7453 (5.5703) grad_norm 1.7292 (2.6823) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:52:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][600/625] eta 0:00:06 lr 0.000109 wd 0.0500 time 0.2623 (0.2630) data time 0.0011 (0.0017) model time 0.2613 (0.2608) loss 6.2849 (5.5705) grad_norm 1.7446 (2.6759) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:52:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][610/625] eta 0:00:03 lr 0.000109 wd 0.0500 time 0.2514 (0.2632) data time 0.0006 (0.0017) model time 0.2507 (0.2610) loss 6.0649 (5.5702) grad_norm 4.6954 (2.6854) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:52:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][620/625] eta 0:00:01 lr 0.000109 wd 0.0500 time 0.2542 (0.2630) data time 0.0004 (0.0017) model time 0.2538 (0.2609) loss 6.2769 (5.5724) grad_norm 3.9184 (2.7043) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:52:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 261 training takes 0:02:44 [2024-08-04 09:52:06 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 09:52:07 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 09:52:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.425 (0.425) Loss 0.5874 (0.5874) Acc@1 90.186 (90.186) Acc@5 98.730 (98.730) Mem 9655MB [2024-08-04 09:52:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.090) Loss 0.8979 (0.7060) Acc@1 81.104 (86.998) Acc@5 96.484 (97.781) Mem 9655MB [2024-08-04 09:52:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.073) Loss 0.9990 (0.8213) Acc@1 79.053 (83.961) Acc@5 95.410 (96.654) Mem 9655MB [2024-08-04 09:52:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.663 Acc@5 96.657 [2024-08-04 09:52:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-04 09:52:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.792 (0.792) Loss 0.5845 (0.5845) Acc@1 90.137 (90.137) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 09:52:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.126) Loss 0.8994 (0.7080) Acc@1 81.494 (86.954) Acc@5 96.338 (97.785) Mem 9655MB [2024-08-04 09:52:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0137 (0.8277) Acc@1 78.516 (83.784) Acc@5 95.557 (96.596) Mem 9655MB [2024-08-04 09:52:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.463 Acc@5 96.607 [2024-08-04 09:52:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-08-04 09:52:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][0/625] eta 0:10:54 lr 0.000109 wd 0.0500 time 1.0466 (1.0466) data time 0.7102 (0.7102) model time 0.0000 (0.0000) loss 5.9150 (5.9150) grad_norm 3.7216 (3.7216) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:52:15 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [262/300][10/625] eta 0:03:32 lr 0.000109 wd 0.0500 time 0.2542 (0.3448) data time 0.0009 (0.0655) model time 0.0000 (0.0000) loss 5.6325 (5.6571) grad_norm 2.6979 (2.7911) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 09:52:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][20/625] eta 0:03:02 lr 0.000108 wd 0.0500 time 0.2578 (0.3023) data time 0.0007 (0.0348) model time 0.0000 (0.0000) loss 5.5942 (5.7162) grad_norm 2.6499 (inf) loss_scale 256.0000 (475.4286) mem 9655MB [2024-08-04 09:52:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][30/625] eta 0:02:50 lr 0.000108 wd 0.0500 time 0.2544 (0.2873) data time 0.0007 (0.0238) model time 0.0000 (0.0000) loss 4.7440 (5.6268) grad_norm 2.0798 (inf) loss_scale 256.0000 (404.6452) mem 9655MB [2024-08-04 09:52:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][40/625] eta 0:02:43 lr 0.000108 wd 0.0500 time 0.2549 (0.2797) data time 0.0010 (0.0183) model time 0.0000 (0.0000) loss 5.0673 (5.6392) grad_norm 1.7868 (inf) loss_scale 256.0000 (368.3902) mem 9655MB [2024-08-04 09:52:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][50/625] eta 0:02:38 lr 0.000108 wd 0.0500 time 0.2544 (0.2750) data time 0.0008 (0.0149) model time 0.0000 (0.0000) loss 4.8106 (5.6571) grad_norm 2.5640 (inf) loss_scale 256.0000 (346.3529) mem 9655MB [2024-08-04 09:52:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][60/625] eta 0:02:33 lr 0.000108 wd 0.0500 time 0.2523 (0.2720) data time 0.0010 (0.0126) model time 0.2512 (0.2558) loss 5.1172 (5.5790) grad_norm 2.9999 (inf) loss_scale 256.0000 (331.5410) mem 9655MB [2024-08-04 09:52:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][70/625] eta 0:02:29 lr 0.000108 wd 0.0500 time 0.2609 (0.2697) data time 0.0011 (0.0110) model time 0.2599 (0.2553) loss 6.0180 (5.5915) 
grad_norm 1.4880 (inf) loss_scale 256.0000 (320.9014) mem 9655MB [2024-08-04 09:52:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][80/625] eta 0:02:26 lr 0.000108 wd 0.0500 time 0.2548 (0.2694) data time 0.0010 (0.0097) model time 0.2539 (0.2590) loss 5.8023 (5.6107) grad_norm 3.4006 (inf) loss_scale 256.0000 (312.8889) mem 9655MB [2024-08-04 09:52:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][90/625] eta 0:02:23 lr 0.000108 wd 0.0500 time 0.2538 (0.2679) data time 0.0009 (0.0088) model time 0.2529 (0.2579) loss 4.9933 (5.5918) grad_norm 1.6525 (inf) loss_scale 256.0000 (306.6374) mem 9655MB [2024-08-04 09:52:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][100/625] eta 0:02:20 lr 0.000108 wd 0.0500 time 0.2528 (0.2667) data time 0.0008 (0.0080) model time 0.2520 (0.2572) loss 5.6372 (5.5915) grad_norm 4.0382 (inf) loss_scale 256.0000 (301.6238) mem 9655MB [2024-08-04 09:52:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][110/625] eta 0:02:16 lr 0.000108 wd 0.0500 time 0.2681 (0.2658) data time 0.0010 (0.0074) model time 0.2672 (0.2570) loss 6.0304 (5.6080) grad_norm 1.9519 (inf) loss_scale 256.0000 (297.5135) mem 9655MB [2024-08-04 09:52:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][120/625] eta 0:02:13 lr 0.000108 wd 0.0500 time 0.2526 (0.2650) data time 0.0008 (0.0068) model time 0.2518 (0.2567) loss 5.4016 (5.5979) grad_norm 2.5996 (inf) loss_scale 256.0000 (294.0826) mem 9655MB [2024-08-04 09:52:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][130/625] eta 0:02:10 lr 0.000108 wd 0.0500 time 0.2597 (0.2643) data time 0.0006 (0.0064) model time 0.2591 (0.2565) loss 4.7912 (5.5708) grad_norm 3.3460 (inf) loss_scale 256.0000 (291.1756) mem 9655MB [2024-08-04 09:52:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][140/625] eta 0:02:09 lr 
0.000108 wd 0.0500 time 0.2541 (0.2663) data time 0.0008 (0.0060) model time 0.2533 (0.2605) loss 6.3240 (5.5747) grad_norm 1.5114 (inf) loss_scale 256.0000 (288.6809) mem 9655MB [2024-08-04 09:52:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][150/625] eta 0:02:06 lr 0.000108 wd 0.0500 time 0.2588 (0.2657) data time 0.0008 (0.0057) model time 0.2579 (0.2600) loss 6.2818 (5.5605) grad_norm 3.2181 (inf) loss_scale 256.0000 (286.5166) mem 9655MB [2024-08-04 09:52:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][160/625] eta 0:02:03 lr 0.000107 wd 0.0500 time 0.2562 (0.2651) data time 0.0011 (0.0054) model time 0.2551 (0.2595) loss 5.7009 (5.5739) grad_norm 2.6004 (inf) loss_scale 256.0000 (284.6211) mem 9655MB [2024-08-04 09:52:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][170/625] eta 0:02:00 lr 0.000107 wd 0.0500 time 0.2513 (0.2657) data time 0.0007 (0.0051) model time 0.2506 (0.2607) loss 6.3143 (5.5776) grad_norm 2.2021 (inf) loss_scale 256.0000 (282.9474) mem 9655MB [2024-08-04 09:52:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][180/625] eta 0:01:58 lr 0.000107 wd 0.0500 time 0.2599 (0.2663) data time 0.0011 (0.0049) model time 0.2588 (0.2619) loss 6.0904 (5.5724) grad_norm 2.9250 (inf) loss_scale 256.0000 (281.4586) mem 9655MB [2024-08-04 09:53:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][190/625] eta 0:01:55 lr 0.000107 wd 0.0500 time 0.2549 (0.2658) data time 0.0008 (0.0047) model time 0.2541 (0.2614) loss 5.3055 (5.5864) grad_norm 2.2390 (inf) loss_scale 256.0000 (280.1257) mem 9655MB [2024-08-04 09:53:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][200/625] eta 0:01:52 lr 0.000107 wd 0.0500 time 0.2508 (0.2652) data time 0.0010 (0.0045) model time 0.2498 (0.2610) loss 5.0210 (5.5723) grad_norm 3.5737 (inf) loss_scale 256.0000 (278.9254) mem 9655MB 
[2024-08-04 09:53:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][210/625] eta 0:01:49 lr 0.000107 wd 0.0500 time 0.2551 (0.2647) data time 0.0009 (0.0043) model time 0.2542 (0.2605) loss 5.9491 (5.5971) grad_norm 2.3083 (inf) loss_scale 256.0000 (277.8389) mem 9655MB
[2024-08-04 09:53:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][220/625] eta 0:01:47 lr 0.000107 wd 0.0500 time 0.2586 (0.2643) data time 0.0007 (0.0042) model time 0.2580 (0.2602) loss 5.4562 (5.5978) grad_norm 5.2896 (inf) loss_scale 256.0000 (276.8507) mem 9655MB
[2024-08-04 09:53:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][230/625] eta 0:01:44 lr 0.000107 wd 0.0500 time 0.2553 (0.2640) data time 0.0008 (0.0040) model time 0.2544 (0.2599) loss 5.6699 (5.5879) grad_norm 3.8123 (inf) loss_scale 256.0000 (275.9481) mem 9655MB
[2024-08-04 09:53:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][240/625] eta 0:01:41 lr 0.000107 wd 0.0500 time 0.2559 (0.2636) data time 0.0008 (0.0039) model time 0.2551 (0.2596) loss 5.4467 (5.5873) grad_norm 2.8512 (inf) loss_scale 256.0000 (275.1203) mem 9655MB
[2024-08-04 09:53:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][250/625] eta 0:01:38 lr 0.000107 wd 0.0500 time 0.2550 (0.2633) data time 0.0006 (0.0038) model time 0.2544 (0.2594) loss 4.4207 (5.5827) grad_norm 2.2091 (inf) loss_scale 256.0000 (274.3586) mem 9655MB
[2024-08-04 09:53:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][260/625] eta 0:01:36 lr 0.000107 wd 0.0500 time 0.2600 (0.2638) data time 0.0009 (0.0037) model time 0.2592 (0.2602) loss 5.8965 (5.5716) grad_norm 1.9089 (inf) loss_scale 256.0000 (273.6552) mem 9655MB
[2024-08-04 09:53:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][270/625] eta 0:01:33 lr 0.000107 wd 0.0500 time 0.2566 (0.2643) data time 0.0011 (0.0036) model time 0.2555 (0.2609) loss 5.6423 (5.5726) grad_norm 3.0997 (inf) loss_scale 256.0000 (273.0037) mem 9655MB
[2024-08-04 09:53:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][280/625] eta 0:01:31 lr 0.000107 wd 0.0500 time 0.2570 (0.2640) data time 0.0008 (0.0035) model time 0.2563 (0.2606) loss 5.0273 (5.5582) grad_norm 2.8251 (inf) loss_scale 256.0000 (272.3986) mem 9655MB
[2024-08-04 09:53:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][290/625] eta 0:01:28 lr 0.000107 wd 0.0500 time 0.2569 (0.2638) data time 0.0007 (0.0034) model time 0.2561 (0.2605) loss 6.4822 (5.5626) grad_norm 3.3700 (inf) loss_scale 256.0000 (271.8351) mem 9655MB
[2024-08-04 09:53:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][300/625] eta 0:01:25 lr 0.000106 wd 0.0500 time 0.2516 (0.2635) data time 0.0011 (0.0033) model time 0.2506 (0.2602) loss 5.4933 (5.5663) grad_norm 2.9275 (inf) loss_scale 256.0000 (271.3090) mem 9655MB
[2024-08-04 09:53:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][310/625] eta 0:01:22 lr 0.000106 wd 0.0500 time 0.2527 (0.2634) data time 0.0006 (0.0032) model time 0.2521 (0.2601) loss 4.7343 (5.5470) grad_norm 1.6223 (inf) loss_scale 256.0000 (270.8167) mem 9655MB
[2024-08-04 09:53:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][320/625] eta 0:01:20 lr 0.000106 wd 0.0500 time 0.2587 (0.2632) data time 0.0009 (0.0032) model time 0.2579 (0.2600) loss 5.8715 (5.5545) grad_norm 2.0723 (inf) loss_scale 256.0000 (270.3551) mem 9655MB
[2024-08-04 09:53:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][330/625] eta 0:01:17 lr 0.000106 wd 0.0500 time 0.4294 (0.2635) data time 0.0008 (0.0031) model time 0.4286 (0.2605) loss 5.3692 (5.5591) grad_norm 2.9490 (inf) loss_scale 256.0000 (269.9215) mem 9655MB
[2024-08-04 09:53:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][340/625] eta 0:01:15 lr 0.000106 wd 0.0500 time 0.2562 (0.2636) data time 0.0006 (0.0030) model time 0.2556 (0.2606) loss 4.7723 (5.5506) grad_norm 4.9108 (inf) loss_scale 256.0000 (269.5132) mem 9655MB
[2024-08-04 09:53:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][350/625] eta 0:01:12 lr 0.000106 wd 0.0500 time 0.2587 (0.2640) data time 0.0010 (0.0030) model time 0.2577 (0.2612) loss 4.8634 (5.5490) grad_norm 4.2510 (inf) loss_scale 256.0000 (269.1282) mem 9655MB
[2024-08-04 09:53:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][360/625] eta 0:01:10 lr 0.000106 wd 0.0500 time 0.2538 (0.2643) data time 0.0008 (0.0029) model time 0.2530 (0.2616) loss 6.5080 (5.5540) grad_norm 3.0990 (inf) loss_scale 256.0000 (268.7645) mem 9655MB
[2024-08-04 09:53:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][370/625] eta 0:01:07 lr 0.000106 wd 0.0500 time 0.2530 (0.2645) data time 0.0009 (0.0028) model time 0.2521 (0.2619) loss 6.5242 (5.5552) grad_norm 2.9968 (inf) loss_scale 256.0000 (268.4205) mem 9655MB
[2024-08-04 09:53:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][380/625] eta 0:01:04 lr 0.000106 wd 0.0500 time 0.2587 (0.2643) data time 0.0014 (0.0028) model time 0.2573 (0.2617) loss 5.3162 (5.5580) grad_norm 3.3072 (inf) loss_scale 256.0000 (268.0945) mem 9655MB
[2024-08-04 09:53:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][390/625] eta 0:01:02 lr 0.000106 wd 0.0500 time 0.2575 (0.2641) data time 0.0009 (0.0028) model time 0.2566 (0.2616) loss 6.9238 (5.5586) grad_norm 2.2483 (inf) loss_scale 256.0000 (267.7852) mem 9655MB
[2024-08-04 09:53:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][400/625] eta 0:00:59 lr 0.000106 wd 0.0500 time 0.2543 (0.2640) data time 0.0008 (0.0027) model time 0.2535 (0.2614) loss 5.2891 (5.5628) grad_norm 2.2397 (inf) loss_scale 256.0000 (267.4913) mem 9655MB
[2024-08-04 09:53:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][410/625] eta 0:00:56 lr 0.000106 wd 0.0500 time 0.2559 (0.2641) data time 0.0008 (0.0027) model time 0.2551 (0.2616) loss 6.1699 (5.5660) grad_norm 2.2380 (inf) loss_scale 256.0000 (267.2117) mem 9655MB
[2024-08-04 09:54:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][420/625] eta 0:00:54 lr 0.000106 wd 0.0500 time 0.2516 (0.2644) data time 0.0009 (0.0026) model time 0.2507 (0.2620) loss 5.9212 (5.5666) grad_norm 1.9146 (inf) loss_scale 256.0000 (266.9454) mem 9655MB
[2024-08-04 09:54:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][430/625] eta 0:00:51 lr 0.000105 wd 0.0500 time 0.2579 (0.2642) data time 0.0006 (0.0026) model time 0.2572 (0.2618) loss 4.8005 (5.5624) grad_norm 2.9991 (inf) loss_scale 256.0000 (266.6914) mem 9655MB
[2024-08-04 09:54:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][440/625] eta 0:00:48 lr 0.000105 wd 0.0500 time 0.2579 (0.2640) data time 0.0009 (0.0025) model time 0.2570 (0.2616) loss 4.6732 (5.5563) grad_norm 2.1721 (inf) loss_scale 256.0000 (266.4490) mem 9655MB
[2024-08-04 09:54:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][450/625] eta 0:00:46 lr 0.000105 wd 0.0500 time 0.2545 (0.2643) data time 0.0007 (0.0025) model time 0.2538 (0.2620) loss 6.4864 (5.5550) grad_norm 2.8569 (inf) loss_scale 256.0000 (266.2173) mem 9655MB
[2024-08-04 09:54:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][460/625] eta 0:00:43 lr 0.000105 wd 0.0500 time 0.2563 (0.2641) data time 0.0007 (0.0025) model time 0.2556 (0.2618) loss 5.1425 (5.5569) grad_norm 3.8950 (inf) loss_scale 256.0000 (265.9957) mem 9655MB
[2024-08-04 09:54:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][470/625] eta 0:00:40 lr 0.000105 wd 0.0500 time 0.2503 (0.2639) data time 0.0007 (0.0024) model time 0.2496 (0.2616) loss 4.9626 (5.5558) grad_norm 2.2784 (inf) loss_scale 256.0000 (265.7834) mem 9655MB
[2024-08-04 09:54:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][480/625] eta 0:00:38 lr 0.000105 wd 0.0500 time 0.2544 (0.2637) data time 0.0010 (0.0024) model time 0.2534 (0.2614) loss 5.3877 (5.5557) grad_norm 4.2186 (inf) loss_scale 256.0000 (265.5800) mem 9655MB
[2024-08-04 09:54:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][490/625] eta 0:00:35 lr 0.000105 wd 0.0500 time 0.2542 (0.2635) data time 0.0008 (0.0024) model time 0.2534 (0.2613) loss 5.0819 (5.5631) grad_norm 4.4125 (inf) loss_scale 256.0000 (265.3849) mem 9655MB
[2024-08-04 09:54:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][500/625] eta 0:00:32 lr 0.000105 wd 0.0500 time 0.2600 (0.2638) data time 0.0008 (0.0024) model time 0.2591 (0.2616) loss 5.9851 (5.5653) grad_norm 1.7588 (inf) loss_scale 256.0000 (265.1976) mem 9655MB
[2024-08-04 09:54:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][510/625] eta 0:00:30 lr 0.000105 wd 0.0500 time 0.2553 (0.2640) data time 0.0009 (0.0023) model time 0.2545 (0.2618) loss 5.7017 (5.5656) grad_norm 2.3815 (inf) loss_scale 256.0000 (265.0176) mem 9655MB
[2024-08-04 09:54:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][520/625] eta 0:00:27 lr 0.000105 wd 0.0500 time 0.2596 (0.2638) data time 0.0009 (0.0023) model time 0.2587 (0.2617) loss 6.3700 (5.5657) grad_norm 3.9977 (inf) loss_scale 256.0000 (264.8445) mem 9655MB
[2024-08-04 09:54:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][530/625] eta 0:00:25 lr 0.000105 wd 0.0500 time 0.2544 (0.2637) data time 0.0006 (0.0023) model time 0.2538 (0.2616) loss 6.4102 (5.5622) grad_norm 3.1309 (inf) loss_scale 256.0000 (264.6780) mem 9655MB
[2024-08-04 09:54:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][540/625] eta 0:00:22 lr 0.000105 wd 0.0500 time 0.2533 (0.2636) data time 0.0010 (0.0022) model time 0.2523 (0.2615) loss 6.0155 (5.5584) grad_norm 2.0956 (inf) loss_scale 256.0000 (264.5176) mem 9655MB
[2024-08-04 09:54:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][550/625] eta 0:00:19 lr 0.000105 wd 0.0500 time 0.2538 (0.2634) data time 0.0009 (0.0022) model time 0.2529 (0.2613) loss 5.4849 (5.5579) grad_norm 2.2965 (inf) loss_scale 256.0000 (264.3630) mem 9655MB
[2024-08-04 09:54:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][560/625] eta 0:00:17 lr 0.000105 wd 0.0500 time 0.2721 (0.2633) data time 0.0006 (0.0022) model time 0.2715 (0.2612) loss 6.4556 (5.5551) grad_norm 2.6254 (inf) loss_scale 256.0000 (264.2139) mem 9655MB
[2024-08-04 09:54:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][570/625] eta 0:00:14 lr 0.000104 wd 0.0500 time 0.2571 (0.2632) data time 0.0012 (0.0022) model time 0.2559 (0.2611) loss 4.9054 (5.5536) grad_norm 1.8801 (inf) loss_scale 256.0000 (264.0701) mem 9655MB
[2024-08-04 09:54:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][580/625] eta 0:00:11 lr 0.000104 wd 0.0500 time 0.2526 (0.2633) data time 0.0007 (0.0022) model time 0.2519 (0.2612) loss 6.2895 (5.5495) grad_norm 1.8592 (inf) loss_scale 256.0000 (263.9312) mem 9655MB
[2024-08-04 09:54:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][590/625] eta 0:00:09 lr 0.000104 wd 0.0500 time 0.2516 (0.2632) data time 0.0009 (0.0022) model time 0.2507 (0.2611) loss 5.8255 (5.5482) grad_norm 1.4016 (inf) loss_scale 256.0000 (263.7970) mem 9655MB
[2024-08-04 09:54:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][600/625] eta 0:00:06 lr 0.000104 wd 0.0500 time 0.2545 (0.2633) data time 0.0006 (0.0021) model time 0.2540 (0.2613) loss 6.1332 (5.5500) grad_norm 3.1738 (inf) loss_scale 256.0000 (263.6672) mem 9655MB
[2024-08-04 09:54:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][610/625] eta 0:00:03 lr 0.000104 wd 0.0500 time 0.2528 (0.2632) data time 0.0004 (0.0021) model time 0.2524 (0.2612) loss 4.6747 (5.5509) grad_norm 2.9922 (inf) loss_scale 256.0000 (263.5417) mem 9655MB
[2024-08-04 09:54:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][620/625] eta 0:00:01 lr 0.000104 wd 0.0500 time 0.4357 (0.2633) data time 0.0006 (0.0021) model time 0.4351 (0.2613) loss 5.9875 (5.5505) grad_norm 4.8323 (inf) loss_scale 256.0000 (263.4203) mem 9655MB
[2024-08-04 09:54:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 262 training takes 0:02:44
[2024-08-04 09:54:55 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 09:54:56 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 09:54:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.477 (0.477) Loss 0.5972 (0.5972) Acc@1 90.430 (90.430) Acc@5 98.779 (98.779) Mem 9655MB
[2024-08-04 09:54:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.096) Loss 0.9131 (0.7177) Acc@1 81.885 (87.109) Acc@5 96.484 (97.838) Mem 9655MB
[2024-08-04 09:54:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0088 (0.8361) Acc@1 78.418 (83.982) Acc@5 95.557 (96.708) Mem 9655MB
[2024-08-04 09:54:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 83.689 Acc@5 96.715
[2024-08-04 09:54:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7%
[2024-08-04 09:54:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.748 (0.748) Loss 0.5845 (0.5845) Acc@1 90.186 (90.186) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 09:54:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.127) Loss 0.8994 (0.7080) Acc@1 81.787 (87.007) Acc@5 96.338 (97.785) Mem 9655MB
[2024-08-04 09:55:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 1.0137 (0.8277) Acc@1 78.564 (83.817) Acc@5 95.557 (96.603) Mem 9655MB
[2024-08-04 09:55:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO  * Acc@1 83.489 Acc@5 96.613
[2024-08-04 09:55:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.5%
[2024-08-04 09:55:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.49%
[2024-08-04 09:55:00 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 09:55:00 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 09:55:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][0/625] eta 0:07:01 lr 0.000104 wd 0.0500 time 0.6749 (0.6749) data time 0.4351 (0.4351) model time 0.0000 (0.0000) loss 5.1866 (5.1866) grad_norm 2.0759 (2.0759) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:55:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][10/625] eta 0:03:07 lr 0.000104 wd 0.0500 time 0.2563 (0.3048) data time 0.0007 (0.0405) model time 0.0000 (0.0000) loss 5.1388 (5.6259) grad_norm 1.8608 (2.2225) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:55:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][20/625] eta 0:02:57 lr 0.000104 wd 0.0500 time 0.2553 (0.2935) data time 0.0008 (0.0217) model time 0.0000 (0.0000) loss 5.9097 (5.6522) grad_norm 4.2959 (2.2549) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:55:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][30/625] eta 0:02:54 lr 0.000104 wd 0.0500 time 0.2565 (0.2937) data time 0.0008 (0.0150) model time 0.0000 (0.0000) loss 5.0135 (5.5610) grad_norm 2.7416 (2.3710) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:55:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][40/625] eta 0:02:46 lr 0.000104 wd 0.0500 time 0.2513 (0.2844) data time 0.0010 (0.0116) model time 0.0000 (0.0000) loss 6.1182 (5.6061) grad_norm 5.4416 (2.5592) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:55:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][50/625] eta 0:02:40 lr 0.000104 wd 0.0500 time 0.2544 (0.2789) data time 0.0009 (0.0098) model time 0.0000 (0.0000) loss 5.9760 (5.5934) grad_norm 3.3480 (2.6057) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 
09:55:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][60/625] eta 0:02:36 lr 0.000104 wd 0.0500 time 0.3957 (0.2774) data time 0.0006 (0.0083) model time 0.3951 (0.2692) loss 5.8414 (5.5355) grad_norm 4.0611 (2.6182) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:55:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][70/625] eta 0:02:33 lr 0.000104 wd 0.0500 time 0.2551 (0.2769) data time 0.0009 (0.0073) model time 0.2542 (0.2709) loss 6.3311 (5.5691) grad_norm 4.3364 (2.7258) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:55:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][80/625] eta 0:02:29 lr 0.000104 wd 0.0500 time 0.2575 (0.2743) data time 0.0009 (0.0065) model time 0.2565 (0.2656) loss 4.9782 (5.5855) grad_norm 2.4064 (2.7849) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:55:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][90/625] eta 0:02:25 lr 0.000103 wd 0.0500 time 0.2553 (0.2723) data time 0.0009 (0.0059) model time 0.2544 (0.2630) loss 5.7824 (5.6079) grad_norm 3.4242 (3.0095) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:55:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][100/625] eta 0:02:22 lr 0.000103 wd 0.0500 time 0.2591 (0.2706) data time 0.0009 (0.0054) model time 0.2583 (0.2613) loss 5.8761 (5.5933) grad_norm 3.0299 (3.0784) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:55:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][110/625] eta 0:02:18 lr 0.000103 wd 0.0500 time 0.2552 (0.2692) data time 0.0012 (0.0050) model time 0.2540 (0.2600) loss 5.9849 (5.6006) grad_norm 5.0616 (3.0905) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:55:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][120/625] eta 0:02:16 lr 0.000103 wd 0.0500 time 0.2543 (0.2696) data time 0.0008 
(0.0047) model time 0.2535 (0.2619) loss 6.0109 (5.5979) grad_norm 2.8669 (3.0418) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:55:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][130/625] eta 0:02:12 lr 0.000103 wd 0.0500 time 0.2553 (0.2685) data time 0.0010 (0.0044) model time 0.2543 (0.2610) loss 5.5535 (5.5994) grad_norm 2.6144 (3.0316) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:55:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][140/625] eta 0:02:09 lr 0.000103 wd 0.0500 time 0.2558 (0.2676) data time 0.0009 (0.0041) model time 0.2549 (0.2603) loss 4.9988 (5.5871) grad_norm 2.2803 (3.0233) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:55:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][150/625] eta 0:02:06 lr 0.000103 wd 0.0500 time 0.2583 (0.2670) data time 0.0008 (0.0039) model time 0.2576 (0.2599) loss 4.5354 (5.5950) grad_norm 2.0285 (2.9974) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:55:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][160/625] eta 0:02:03 lr 0.000103 wd 0.0500 time 0.2543 (0.2662) data time 0.0008 (0.0037) model time 0.2535 (0.2594) loss 4.2240 (5.5899) grad_norm 2.5226 (3.0203) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:55:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][170/625] eta 0:02:00 lr 0.000103 wd 0.0500 time 0.2568 (0.2656) data time 0.0007 (0.0036) model time 0.2562 (0.2590) loss 4.6408 (5.5797) grad_norm 3.6347 (3.0153) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:55:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][180/625] eta 0:01:57 lr 0.000103 wd 0.0500 time 0.2539 (0.2651) data time 0.0010 (0.0034) model time 0.2529 (0.2587) loss 5.4680 (5.5720) grad_norm 2.6009 (3.0103) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:55:51 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][190/625] eta 0:01:55 lr 0.000103 wd 0.0500 time 0.2603 (0.2646) data time 0.0008 (0.0033) model time 0.2595 (0.2584) loss 5.2921 (5.5817) grad_norm 1.9098 (2.9932) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:55:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][200/625] eta 0:01:52 lr 0.000103 wd 0.0500 time 0.2563 (0.2651) data time 0.0008 (0.0032) model time 0.2555 (0.2595) loss 5.3666 (5.5763) grad_norm 2.4924 (2.9612) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:55:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][210/625] eta 0:01:50 lr 0.000103 wd 0.0500 time 0.2544 (0.2652) data time 0.0008 (0.0031) model time 0.2536 (0.2600) loss 6.0756 (5.5807) grad_norm 1.7484 (2.9127) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:55:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][220/625] eta 0:01:47 lr 0.000103 wd 0.0500 time 0.2560 (0.2648) data time 0.0007 (0.0030) model time 0.2553 (0.2597) loss 6.1643 (5.5785) grad_norm 1.5677 (2.8925) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:56:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][230/625] eta 0:01:44 lr 0.000102 wd 0.0500 time 0.2561 (0.2645) data time 0.0009 (0.0029) model time 0.2552 (0.2595) loss 4.8153 (5.5768) grad_norm 2.5080 (2.8727) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:56:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][240/625] eta 0:01:42 lr 0.000102 wd 0.0500 time 0.2517 (0.2650) data time 0.0008 (0.0028) model time 0.2508 (0.2603) loss 4.5740 (5.5755) grad_norm 2.8351 (2.8735) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:56:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][250/625] eta 0:01:39 lr 0.000102 wd 0.0500 time 0.2570 (0.2651) data time 0.0009 (0.0027) 
model time 0.2561 (0.2606) loss 5.1153 (5.5752) grad_norm 5.5736 (2.8857) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:56:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][260/625] eta 0:01:36 lr 0.000102 wd 0.0500 time 0.2537 (0.2647) data time 0.0007 (0.0027) model time 0.2530 (0.2603) loss 5.0074 (5.5669) grad_norm 2.3493 (2.9141) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:56:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][270/625] eta 0:01:33 lr 0.000102 wd 0.0500 time 0.2529 (0.2644) data time 0.0008 (0.0026) model time 0.2520 (0.2601) loss 6.3991 (5.5678) grad_norm 3.2536 (2.9229) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:56:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][280/625] eta 0:01:31 lr 0.000102 wd 0.0500 time 0.2555 (0.2642) data time 0.0009 (0.0026) model time 0.2546 (0.2599) loss 5.2320 (5.5704) grad_norm 2.2242 (2.9235) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:56:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][290/625] eta 0:01:28 lr 0.000102 wd 0.0500 time 0.2545 (0.2639) data time 0.0007 (0.0025) model time 0.2538 (0.2598) loss 6.0046 (5.5637) grad_norm 2.0222 (2.9120) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:56:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][300/625] eta 0:01:25 lr 0.000102 wd 0.0500 time 0.2561 (0.2643) data time 0.0015 (0.0025) model time 0.2546 (0.2604) loss 6.2190 (5.5527) grad_norm 2.1776 (2.8965) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:56:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][310/625] eta 0:01:23 lr 0.000102 wd 0.0500 time 0.2556 (0.2640) data time 0.0010 (0.0024) model time 0.2546 (0.2602) loss 5.6545 (5.5510) grad_norm 2.6517 (2.9112) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:56:25 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [263/300][320/625] eta 0:01:20 lr 0.000102 wd 0.0500 time 0.2582 (0.2638) data time 0.0008 (0.0024) model time 0.2574 (0.2600) loss 5.4071 (5.5534) grad_norm 2.3855 (2.9080) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:56:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][330/625] eta 0:01:17 lr 0.000102 wd 0.0500 time 0.4615 (0.2641) data time 0.0007 (0.0023) model time 0.4608 (0.2605) loss 4.5085 (5.5492) grad_norm 2.6496 (2.8930) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:56:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][340/625] eta 0:01:15 lr 0.000102 wd 0.0500 time 0.2540 (0.2639) data time 0.0011 (0.0023) model time 0.2529 (0.2603) loss 5.4581 (5.5479) grad_norm 2.0151 (2.8764) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:56:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][350/625] eta 0:01:12 lr 0.000102 wd 0.0500 time 0.2551 (0.2642) data time 0.0011 (0.0022) model time 0.2541 (0.2607) loss 6.2878 (5.5542) grad_norm 2.7474 (2.8592) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:56:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][360/625] eta 0:01:09 lr 0.000102 wd 0.0500 time 0.2591 (0.2640) data time 0.0008 (0.0022) model time 0.2583 (0.2606) loss 6.0325 (5.5526) grad_norm 2.0309 (2.8368) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:56:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][370/625] eta 0:01:07 lr 0.000101 wd 0.0500 time 0.2555 (0.2637) data time 0.0008 (0.0022) model time 0.2547 (0.2603) loss 5.4977 (5.5581) grad_norm 1.9617 (2.8306) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:56:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][380/625] eta 0:01:04 lr 0.000101 wd 0.0500 time 0.2559 (0.2640) data time 0.0010 (0.0021) model time 0.2549 (0.2608) 
loss 6.0258 (5.5601) grad_norm 1.8062 (2.9185) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:56:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][390/625] eta 0:01:02 lr 0.000101 wd 0.0500 time 0.2576 (0.2639) data time 0.0010 (0.0021) model time 0.2566 (0.2606) loss 5.9413 (5.5673) grad_norm 2.9331 (2.9155) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:56:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][400/625] eta 0:00:59 lr 0.000101 wd 0.0500 time 0.2555 (0.2636) data time 0.0006 (0.0021) model time 0.2549 (0.2605) loss 5.1459 (5.5646) grad_norm 2.1069 (2.9139) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:56:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][410/625] eta 0:00:56 lr 0.000101 wd 0.0500 time 0.2537 (0.2634) data time 0.0007 (0.0021) model time 0.2530 (0.2603) loss 5.8606 (5.5726) grad_norm 2.5919 (2.8991) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:56:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][420/625] eta 0:00:53 lr 0.000101 wd 0.0500 time 0.2504 (0.2632) data time 0.0010 (0.0020) model time 0.2494 (0.2601) loss 5.9083 (5.5759) grad_norm 2.8820 (2.8860) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:56:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][430/625] eta 0:00:51 lr 0.000101 wd 0.0500 time 0.2511 (0.2631) data time 0.0009 (0.0020) model time 0.2502 (0.2600) loss 5.8704 (5.5774) grad_norm 1.4925 (2.8773) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:56:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][440/625] eta 0:00:48 lr 0.000101 wd 0.0500 time 0.2537 (0.2629) data time 0.0011 (0.0020) model time 0.2526 (0.2598) loss 5.5242 (5.5828) grad_norm 3.4502 (2.8645) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:56:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [263/300][450/625] eta 0:00:45 lr 0.000101 wd 0.0500 time 0.2589 (0.2627) data time 0.0006 (0.0020) model time 0.2582 (0.2597) loss 4.8204 (5.5818) grad_norm 2.0184 (2.9371) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:57:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][460/625] eta 0:00:43 lr 0.000101 wd 0.0500 time 0.2598 (0.2626) data time 0.0006 (0.0019) model time 0.2592 (0.2596) loss 6.3685 (5.5855) grad_norm 2.1254 (2.9485) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:57:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][470/625] eta 0:00:40 lr 0.000101 wd 0.0500 time 0.2575 (0.2625) data time 0.0007 (0.0019) model time 0.2568 (0.2595) loss 5.5567 (5.5862) grad_norm 2.8098 (2.9475) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:57:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][480/625] eta 0:00:38 lr 0.000101 wd 0.0500 time 0.2574 (0.2623) data time 0.0007 (0.0019) model time 0.2566 (0.2594) loss 6.0766 (5.5867) grad_norm 5.3895 (2.9618) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:57:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][490/625] eta 0:00:35 lr 0.000101 wd 0.0500 time 0.2550 (0.2622) data time 0.0010 (0.0019) model time 0.2541 (0.2593) loss 5.7977 (5.5896) grad_norm 2.7549 (2.9560) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:57:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][500/625] eta 0:00:32 lr 0.000101 wd 0.0500 time 0.2716 (0.2621) data time 0.0006 (0.0019) model time 0.2709 (0.2592) loss 5.7117 (5.5940) grad_norm 2.1405 (2.9397) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:57:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][510/625] eta 0:00:30 lr 0.000100 wd 0.0500 time 0.2528 (0.2624) data time 0.0007 (0.0019) model time 0.2520 (0.2596) loss 5.6407 (5.5893) grad_norm 
3.9441 (2.9385) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:57:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][520/625] eta 0:00:27 lr 0.000100 wd 0.0500 time 0.2537 (0.2623) data time 0.0008 (0.0018) model time 0.2529 (0.2595) loss 6.0803 (5.5876) grad_norm 2.6213 (2.9284) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:57:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][530/625] eta 0:00:24 lr 0.000100 wd 0.0500 time 0.4363 (0.2629) data time 0.0008 (0.0018) model time 0.4356 (0.2602) loss 5.7624 (5.5898) grad_norm 3.6983 (2.9303) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:57:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][540/625] eta 0:00:22 lr 0.000100 wd 0.0500 time 0.2557 (0.2628) data time 0.0009 (0.0018) model time 0.2547 (0.2601) loss 5.2650 (5.5934) grad_norm 1.5999 (2.9173) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:57:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][550/625] eta 0:00:19 lr 0.000100 wd 0.0500 time 0.2556 (0.2629) data time 0.0008 (0.0018) model time 0.2548 (0.2603) loss 6.2214 (5.5955) grad_norm 2.9895 (2.9141) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:57:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][560/625] eta 0:00:17 lr 0.000100 wd 0.0500 time 0.2591 (0.2628) data time 0.0008 (0.0018) model time 0.2582 (0.2602) loss 6.3862 (5.5935) grad_norm 2.3810 (2.9147) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:57:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][570/625] eta 0:00:14 lr 0.000100 wd 0.0500 time 0.2560 (0.2627) data time 0.0008 (0.0018) model time 0.2552 (0.2601) loss 5.3880 (5.5872) grad_norm 1.9273 (2.9001) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:57:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][580/625] eta 
0:00:11 lr 0.000100 wd 0.0500 time 0.2572 (0.2629) data time 0.0006 (0.0017) model time 0.2566 (0.2604) loss 6.5737 (5.5926) grad_norm 1.6185 (2.8860) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:57:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][590/625] eta 0:00:09 lr 0.000100 wd 0.0500 time 0.2511 (0.2628) data time 0.0010 (0.0017) model time 0.2501 (0.2603) loss 5.9325 (5.5902) grad_norm 2.8464 (2.8817) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:57:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][600/625] eta 0:00:06 lr 0.000100 wd 0.0500 time 0.2555 (0.2627) data time 0.0011 (0.0017) model time 0.2545 (0.2603) loss 5.3812 (5.5915) grad_norm 2.2237 (2.8759) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:57:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][610/625] eta 0:00:03 lr 0.000100 wd 0.0500 time 0.2532 (0.2626) data time 0.0004 (0.0017) model time 0.2527 (0.2602) loss 6.3930 (5.5934) grad_norm 1.9407 (2.8726) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:57:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][620/625] eta 0:00:01 lr 0.000100 wd 0.0500 time 0.2554 (0.2625) data time 0.0006 (0.0017) model time 0.2548 (0.2601) loss 5.3549 (5.5901) grad_norm 1.4130 (2.8684) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:57:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 263 training takes 0:02:44 [2024-08-04 09:57:44 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 09:57:45 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
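The lr column in the records above fits a cosine schedule that is stepped once per iteration after a 20-epoch linear warmup. The sketch uses the values from the config dump (BASE_LR 0.002, MIN_LR 2.0e-05, WARMUP_LR 2.0e-06, WARMUP_EPOCHS 20, EPOCHS 300, WARMUP_PREFIX true) and 625 iterations per epoch. It is a reconstruction under those assumptions, not the scheduler code used by main_hfai_mnodes.py, and `cosine_lr` is a hypothetical name.

```python
import math


def cosine_lr(epoch, it, iters_per_epoch=625, epochs=300, warmup_epochs=20,
              base_lr=2e-3, min_lr=2e-5, warmup_lr=2e-6):
    """Hypothetical helper: learning rate at (epoch, iteration) under a linear
    warmup followed by a per-iteration cosine decay (warmup-prefix variant)."""
    step = epoch * iters_per_epoch + it
    warmup_steps = warmup_epochs * iters_per_epoch
    if step < warmup_steps:
        # Linear ramp from warmup_lr up to base_lr during warmup.
        return warmup_lr + (base_lr - warmup_lr) * step / warmup_steps
    # With a warmup prefix the cosine phase covers only the post-warmup steps.
    t = step - warmup_steps
    total = (epochs - warmup_epochs) * iters_per_epoch
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t / total))


for epoch, it in [(263, 520), (264, 30), (265, 270), (265, 280)]:
    print(f"[{epoch}/300][{it}/625] lr {cosine_lr(epoch, it):.6f}")
```

Under these assumptions it gives about 1.004e-04 at [263/300][520/625]. At [265/300][270/625] and [265/300][280/625] it gives about 9.353e-05 and 9.347e-05, which fall on either side of the rounding boundary between 0.000094 and 0.000093. The logged value also changes at exactly that point.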
[2024-08-04 09:57:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.449 (0.449) Loss 0.5962 (0.5962) Acc@1 90.430 (90.430) Acc@5 98.828 (98.828) Mem 9655MB [2024-08-04 09:57:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.092) Loss 0.9009 (0.7157) Acc@1 82.373 (87.211) Acc@5 96.387 (97.829) Mem 9655MB [2024-08-04 09:57:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.074) Loss 1.0166 (0.8299) Acc@1 77.832 (84.124) Acc@5 95.557 (96.708) Mem 9655MB [2024-08-04 09:57:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.791 Acc@5 96.723 [2024-08-04 09:57:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-04 09:57:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.807 (0.807) Loss 0.5845 (0.5845) Acc@1 90.137 (90.137) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 09:57:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.126) Loss 0.8994 (0.7081) Acc@1 81.787 (87.047) Acc@5 96.289 (97.776) Mem 9655MB [2024-08-04 09:57:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0137 (0.8276) Acc@1 78.564 (83.868) Acc@5 95.557 (96.612) Mem 9655MB [2024-08-04 09:57:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.537 Acc@5 96.619 [2024-08-04 09:57:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-08-04 09:57:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.54% [2024-08-04 09:57:49 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
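Each evaluation step runs the test set twice. The second pass is reported as `New max accuracy ema`, and it scores an exponential moving average (EMA) of the weights, which the run arguments enable with model_ema true and model_ema_decay 0.9999. Below is a minimal sketch of the update rule. It assumes one EMA update per optimizer step and uses a plain dict of scalar weights in place of a real state dict. `ema_update` is a hypothetical name, not the script's implementation.

```python
def ema_update(ema_params, model_params, decay=0.9999):
    """Hypothetical helper: blend the current weights into the EMA copy in place.

    ema <- decay * ema + (1 - decay) * current, applied to every entry.
    """
    for name, current in model_params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * current
    return ema_params


# Toy model: one scalar weight that jumps from 0.0 to 1.0 and then stays there.
ema = {"w": 0.0}
for _ in range(625):  # one epoch of iterations, as in this run
    ema_update(ema, {"w": 1.0})
print(f"fraction of the jump absorbed after one epoch: {ema['w']:.4f}")
print(f"averaging horizon: about {1 / (1 - 0.9999):.0f} steps")
```

After 625 updates the EMA has absorbed only about 6% (1 - 0.9999^625 ≈ 0.0606) of a sudden change, and its effective horizon is about 10,000 steps, roughly 16 epochs at 625 iterations per epoch. Under that assumption, the slow epoch-to-epoch drift of the EMA accuracy (83.54% after epoch 263, 83.55% after epoch 264) is what one would expect.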
[2024-08-04 09:57:49 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 09:57:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][0/625] eta 0:07:10 lr 0.000100 wd 0.0500 time 0.6885 (0.6885) data time 0.4414 (0.4414) model time 0.0000 (0.0000) loss 4.6932 (4.6932) grad_norm 2.0311 (2.0311) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:57:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][10/625] eta 0:03:18 lr 0.000100 wd 0.0500 time 0.2543 (0.3229) data time 0.0006 (0.0410) model time 0.0000 (0.0000) loss 5.9950 (5.4313) grad_norm 2.9365 (2.2837) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:57:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][20/625] eta 0:02:56 lr 0.000100 wd 0.0500 time 0.2555 (0.2917) data time 0.0008 (0.0219) model time 0.0000 (0.0000) loss 4.7265 (5.4873) grad_norm 1.6038 (2.4526) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:57:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][30/625] eta 0:02:50 lr 0.000099 wd 0.0500 time 0.2576 (0.2861) data time 0.0007 (0.0151) model time 0.0000 (0.0000) loss 5.4811 (5.4272) grad_norm 2.1809 (2.4129) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:58:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][40/625] eta 0:02:45 lr 0.000099 wd 0.0500 time 0.2544 (0.2837) data time 0.0009 (0.0116) model time 0.0000 (0.0000) loss 5.3318 (5.3966) grad_norm 1.6898 (2.5785) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:58:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][50/625] eta 0:02:41 lr 0.000099 wd 0.0500 time 0.2550 (0.2813) data time 0.0016 (0.0095) model time 0.0000 (0.0000) loss 6.5573 (5.4795) grad_norm 1.7761 (2.6790) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 
09:58:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][60/625] eta 0:02:38 lr 0.000099 wd 0.0500 time 0.2562 (0.2799) data time 0.0006 (0.0081) model time 0.2556 (0.2716) loss 4.4713 (5.4946) grad_norm 1.6498 (2.6141) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:58:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][70/625] eta 0:02:33 lr 0.000099 wd 0.0500 time 0.2595 (0.2764) data time 0.0008 (0.0071) model time 0.2588 (0.2631) loss 5.3561 (5.4974) grad_norm 1.9980 (2.5960) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:58:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][80/625] eta 0:02:29 lr 0.000099 wd 0.0500 time 0.2557 (0.2739) data time 0.0008 (0.0063) model time 0.2549 (0.2604) loss 6.3609 (5.5206) grad_norm 1.9341 (2.5667) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:58:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][90/625] eta 0:02:25 lr 0.000099 wd 0.0500 time 0.2590 (0.2721) data time 0.0009 (0.0057) model time 0.2581 (0.2595) loss 5.5580 (5.5545) grad_norm 1.7576 (2.6551) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:58:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][100/625] eta 0:02:22 lr 0.000099 wd 0.0500 time 0.2553 (0.2720) data time 0.0011 (0.0052) model time 0.2542 (0.2617) loss 5.2072 (5.5480) grad_norm 4.0425 (2.6346) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:58:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][110/625] eta 0:02:20 lr 0.000099 wd 0.0500 time 0.2554 (0.2724) data time 0.0007 (0.0048) model time 0.2547 (0.2639) loss 4.5426 (5.5533) grad_norm 1.6511 (2.6452) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:58:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][120/625] eta 0:02:16 lr 0.000099 wd 0.0500 time 0.2559 (0.2710) data time 0.0010 
(0.0045) model time 0.2549 (0.2626) loss 6.1988 (5.5613) grad_norm 2.1266 (2.6343) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:58:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][130/625] eta 0:02:13 lr 0.000099 wd 0.0500 time 0.2582 (0.2699) data time 0.0007 (0.0042) model time 0.2575 (0.2617) loss 6.2944 (5.5612) grad_norm 1.7967 (2.6540) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:58:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][140/625] eta 0:02:11 lr 0.000099 wd 0.0500 time 0.4519 (0.2703) data time 0.0009 (0.0040) model time 0.4510 (0.2631) loss 6.5460 (5.5490) grad_norm 2.1609 (2.6362) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:58:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][150/625] eta 0:02:08 lr 0.000099 wd 0.0500 time 0.2543 (0.2702) data time 0.0008 (0.0038) model time 0.2535 (0.2637) loss 5.8504 (5.5557) grad_norm 1.8469 (2.6359) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:58:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][160/625] eta 0:02:05 lr 0.000099 wd 0.0500 time 0.2542 (0.2694) data time 0.0011 (0.0036) model time 0.2531 (0.2630) loss 4.6182 (5.5449) grad_norm 3.2647 (2.6923) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:58:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][170/625] eta 0:02:02 lr 0.000098 wd 0.0500 time 0.2580 (0.2687) data time 0.0006 (0.0035) model time 0.2574 (0.2625) loss 6.2978 (5.5482) grad_norm 2.2811 (2.6995) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:58:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][180/625] eta 0:01:59 lr 0.000098 wd 0.0500 time 0.2531 (0.2680) data time 0.0007 (0.0033) model time 0.2524 (0.2618) loss 4.5480 (5.5289) grad_norm 2.7635 (2.7190) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:58:41 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][190/625] eta 0:01:56 lr 0.000098 wd 0.0500 time 0.2566 (0.2674) data time 0.0011 (0.0032) model time 0.2555 (0.2614) loss 5.4350 (5.5284) grad_norm 2.5452 (2.7108) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:58:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][200/625] eta 0:01:53 lr 0.000098 wd 0.0500 time 0.2588 (0.2669) data time 0.0010 (0.0031) model time 0.2579 (0.2611) loss 5.2627 (5.5464) grad_norm 2.8930 (2.7025) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:58:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][210/625] eta 0:01:50 lr 0.000098 wd 0.0500 time 0.2542 (0.2663) data time 0.0008 (0.0030) model time 0.2534 (0.2606) loss 4.1548 (5.5420) grad_norm 2.9342 (2.7033) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:58:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][220/625] eta 0:01:47 lr 0.000098 wd 0.0500 time 0.2566 (0.2658) data time 0.0008 (0.0029) model time 0.2558 (0.2603) loss 5.7410 (5.5483) grad_norm 2.0193 (2.6868) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:58:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][230/625] eta 0:01:44 lr 0.000098 wd 0.0500 time 0.2590 (0.2654) data time 0.0008 (0.0028) model time 0.2582 (0.2600) loss 6.4580 (5.5501) grad_norm 3.9239 (2.7764) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:58:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][240/625] eta 0:01:42 lr 0.000098 wd 0.0500 time 0.2547 (0.2650) data time 0.0007 (0.0027) model time 0.2540 (0.2597) loss 4.9765 (5.5600) grad_norm 2.3535 (2.7686) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:58:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][250/625] eta 0:01:39 lr 0.000098 wd 0.0500 time 0.2607 (0.2647) data time 0.0006 (0.0027) 
model time 0.2601 (0.2595) loss 4.7425 (5.5582) grad_norm 2.2027 (2.7499) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:58:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][260/625] eta 0:01:36 lr 0.000098 wd 0.0500 time 0.2656 (0.2644) data time 0.0007 (0.0026) model time 0.2648 (0.2593) loss 5.5842 (5.5586) grad_norm 2.1296 (2.7427) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:59:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][270/625] eta 0:01:33 lr 0.000098 wd 0.0500 time 0.2550 (0.2647) data time 0.0009 (0.0025) model time 0.2540 (0.2600) loss 4.7890 (5.5503) grad_norm 1.6834 (2.7214) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:59:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][280/625] eta 0:01:31 lr 0.000098 wd 0.0500 time 0.2542 (0.2644) data time 0.0009 (0.0025) model time 0.2533 (0.2598) loss 5.3255 (5.5541) grad_norm 2.5098 (2.7217) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:59:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][290/625] eta 0:01:28 lr 0.000098 wd 0.0500 time 0.2599 (0.2642) data time 0.0008 (0.0024) model time 0.2591 (0.2597) loss 6.0878 (5.5530) grad_norm 3.0519 (2.7248) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:59:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][300/625] eta 0:01:25 lr 0.000098 wd 0.0500 time 0.2605 (0.2639) data time 0.0005 (0.0024) model time 0.2600 (0.2595) loss 6.1678 (5.5579) grad_norm 1.9171 (2.7238) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:59:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][310/625] eta 0:01:23 lr 0.000098 wd 0.0500 time 0.2537 (0.2637) data time 0.0014 (0.0023) model time 0.2523 (0.2593) loss 6.3545 (5.5493) grad_norm 2.4081 (2.7504) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:59:14 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [264/300][320/625] eta 0:01:20 lr 0.000097 wd 0.0500 time 0.2559 (0.2634) data time 0.0008 (0.0023) model time 0.2551 (0.2592) loss 5.7595 (5.5523) grad_norm 1.8956 (2.7510) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:59:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][330/625] eta 0:01:18 lr 0.000097 wd 0.0500 time 0.2590 (0.2644) data time 0.0007 (0.0022) model time 0.2583 (0.2604) loss 5.6560 (5.5513) grad_norm 1.7805 (2.7389) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:59:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][340/625] eta 0:01:15 lr 0.000097 wd 0.0500 time 0.2536 (0.2653) data time 0.0006 (0.0022) model time 0.2530 (0.2615) loss 5.7329 (5.5574) grad_norm 3.5075 (2.7335) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:59:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][350/625] eta 0:01:12 lr 0.000097 wd 0.0500 time 0.2548 (0.2654) data time 0.0007 (0.0022) model time 0.2541 (0.2618) loss 5.5136 (5.5550) grad_norm 2.8594 (2.7264) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:59:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][360/625] eta 0:01:10 lr 0.000097 wd 0.0500 time 0.2549 (0.2651) data time 0.0006 (0.0021) model time 0.2542 (0.2616) loss 5.8611 (5.5581) grad_norm 3.4749 (2.7130) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:59:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][370/625] eta 0:01:07 lr 0.000097 wd 0.0500 time 0.2583 (0.2653) data time 0.0008 (0.0021) model time 0.2575 (0.2619) loss 6.5370 (5.5584) grad_norm 2.4088 (2.7027) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:59:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][380/625] eta 0:01:04 lr 0.000097 wd 0.0500 time 0.2573 (0.2651) data time 0.0008 (0.0021) model time 0.2565 (0.2617) 
loss 6.5365 (5.5585) grad_norm 2.4914 (2.7020) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:59:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][390/625] eta 0:01:02 lr 0.000097 wd 0.0500 time 0.2555 (0.2649) data time 0.0008 (0.0020) model time 0.2547 (0.2615) loss 6.1508 (5.5605) grad_norm 1.7563 (2.7874) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:59:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][400/625] eta 0:00:59 lr 0.000097 wd 0.0500 time 0.2571 (0.2652) data time 0.0009 (0.0020) model time 0.2562 (0.2620) loss 5.8338 (5.5683) grad_norm 2.4186 (2.7889) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:59:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][410/625] eta 0:00:56 lr 0.000097 wd 0.0500 time 0.2552 (0.2650) data time 0.0009 (0.0020) model time 0.2543 (0.2618) loss 6.0868 (5.5690) grad_norm 2.3223 (2.7805) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:59:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][420/625] eta 0:00:54 lr 0.000097 wd 0.0500 time 0.2581 (0.2648) data time 0.0008 (0.0020) model time 0.2572 (0.2616) loss 5.6612 (5.5679) grad_norm 3.4894 (2.8456) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:59:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][430/625] eta 0:00:51 lr 0.000097 wd 0.0500 time 0.2589 (0.2646) data time 0.0009 (0.0019) model time 0.2580 (0.2614) loss 5.8718 (5.5712) grad_norm 2.0777 (2.8430) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:59:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][440/625] eta 0:00:48 lr 0.000097 wd 0.0500 time 0.2586 (0.2644) data time 0.0010 (0.0019) model time 0.2576 (0.2613) loss 5.7816 (5.5727) grad_norm 2.1774 (2.8369) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:59:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [264/300][450/625] eta 0:00:46 lr 0.000097 wd 0.0500 time 0.2564 (0.2642) data time 0.0010 (0.0019) model time 0.2553 (0.2611) loss 6.4013 (5.5692) grad_norm 2.3340 (2.8477) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:59:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][460/625] eta 0:00:43 lr 0.000096 wd 0.0500 time 0.2536 (0.2645) data time 0.0009 (0.0019) model time 0.2527 (0.2615) loss 5.6537 (5.5691) grad_norm 2.1244 (2.8662) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:59:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][470/625] eta 0:00:40 lr 0.000096 wd 0.0500 time 0.2551 (0.2643) data time 0.0009 (0.0019) model time 0.2542 (0.2613) loss 6.2754 (5.5738) grad_norm 2.2251 (2.8619) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:59:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][480/625] eta 0:00:38 lr 0.000096 wd 0.0500 time 0.2555 (0.2649) data time 0.0014 (0.0018) model time 0.2542 (0.2620) loss 5.8667 (5.5737) grad_norm 2.4302 (2.8663) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 09:59:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][490/625] eta 0:00:35 lr 0.000096 wd 0.0500 time 0.2524 (0.2647) data time 0.0007 (0.0018) model time 0.2517 (0.2619) loss 5.9304 (5.5751) grad_norm 1.7274 (2.9232) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:00:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][500/625] eta 0:00:33 lr 0.000096 wd 0.0500 time 0.2551 (0.2646) data time 0.0007 (0.0018) model time 0.2543 (0.2617) loss 5.5238 (5.5738) grad_norm 2.1770 (2.9111) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:00:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][510/625] eta 0:00:30 lr 0.000096 wd 0.0500 time 0.2564 (0.2644) data time 0.0009 (0.0018) model time 0.2555 (0.2616) loss 5.3562 (5.5729) grad_norm 
2.5925 (2.9075) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:00:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][520/625] eta 0:00:27 lr 0.000096 wd 0.0500 time 0.2604 (0.2643) data time 0.0007 (0.0018) model time 0.2597 (0.2615) loss 6.4772 (5.5675) grad_norm 3.0360 (2.9089) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:00:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][530/625] eta 0:00:25 lr 0.000096 wd 0.0500 time 0.2564 (0.2641) data time 0.0009 (0.0017) model time 0.2556 (0.2614) loss 5.4309 (5.5639) grad_norm 1.9613 (2.9078) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:00:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][540/625] eta 0:00:22 lr 0.000096 wd 0.0500 time 0.2571 (0.2640) data time 0.0008 (0.0017) model time 0.2563 (0.2613) loss 5.3876 (5.5635) grad_norm 3.0395 (2.9089) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:00:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][550/625] eta 0:00:19 lr 0.000096 wd 0.0500 time 0.2565 (0.2642) data time 0.0006 (0.0017) model time 0.2558 (0.2615) loss 5.3514 (5.5668) grad_norm 3.6664 (2.9078) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:00:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][560/625] eta 0:00:17 lr 0.000096 wd 0.0500 time 0.2558 (0.2643) data time 0.0007 (0.0017) model time 0.2551 (0.2617) loss 6.7499 (5.5643) grad_norm 4.1866 (2.9058) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:00:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][570/625] eta 0:00:14 lr 0.000096 wd 0.0500 time 0.2522 (0.2644) data time 0.0010 (0.0017) model time 0.2512 (0.2619) loss 5.8147 (5.5674) grad_norm 3.6999 (2.9163) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:00:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][580/625] eta 
0:00:11 lr 0.000096 wd 0.0500 time 0.2611 (0.2645) data time 0.0007 (0.0017) model time 0.2604 (0.2620) loss 6.1072 (5.5687) grad_norm 1.7421 (2.9171) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:00:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][590/625] eta 0:00:09 lr 0.000096 wd 0.0500 time 0.2556 (0.2644) data time 0.0010 (0.0017) model time 0.2546 (0.2618) loss 6.5666 (5.5641) grad_norm 2.6316 (2.9088) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:00:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][600/625] eta 0:00:06 lr 0.000096 wd 0.0500 time 0.2540 (0.2642) data time 0.0007 (0.0016) model time 0.2534 (0.2617) loss 4.8808 (5.5586) grad_norm 1.8970 (2.9023) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:00:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][610/625] eta 0:00:03 lr 0.000095 wd 0.0500 time 0.2527 (0.2641) data time 0.0004 (0.0016) model time 0.2523 (0.2616) loss 4.7764 (5.5584) grad_norm 1.8365 (2.8952) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:00:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][620/625] eta 0:00:01 lr 0.000095 wd 0.0500 time 0.2532 (0.2639) data time 0.0003 (0.0016) model time 0.2528 (0.2615) loss 5.2351 (5.5585) grad_norm 2.0875 (2.8928) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:00:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 264 training takes 0:02:44 [2024-08-04 10:00:34 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 10:00:35 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
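Each train record prints a metric as `current (running average)`. The eta field equals the average batch time multiplied by the iterations left in the epoch. For example, at [264/300][10/625] the value is 0.3229 s × 615 ≈ 198 s, which is logged as 0:03:18. The sketch below shows that bookkeeping and assumes a Swin-style average meter updated once per iteration. The class and function are illustrative, not taken from the training script.

The loss_scale averages in epoch 265 (269.5629 at iteration 150, 284.6211 at iteration 160) are reproduced if the AMP loss scale was 256 for the first 143 iterations and 512 afterwards. That switch point is inferred from the averages; the log does not state it.

```python
import datetime


class AverageMeter:
    """Keeps the latest value and the running mean over all updates so far."""

    def __init__(self):
        self.val = 0.0
        self.sum = 0.0
        self.count = 0
        self.avg = 0.0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


def eta_string(avg_batch_time, num_steps, idx):
    """Seconds left = mean batch time x remaining iterations, shown as H:MM:SS."""
    remaining = avg_batch_time * (num_steps - idx)
    return str(datetime.timedelta(seconds=int(remaining)))


# Replay the loss_scale column of epoch 265 under the assumed 256 -> 512 switch.
loss_scale = AverageMeter()
for it in range(161):
    loss_scale.update(256.0 if it < 143 else 512.0)
    if it in (150, 160):
        print(f"[265/300][{it}/625] loss_scale {loss_scale.val:.4f} ({loss_scale.avg:.4f})")

print("eta", eta_string(0.3229, 625, 10))
```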
[2024-08-04 10:00:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.473 (0.473) Loss 0.5967 (0.5967) Acc@1 90.039 (90.039) Acc@5 98.730 (98.730) Mem 9655MB [2024-08-04 10:00:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 0.8916 (0.7085) Acc@1 82.080 (87.061) Acc@5 96.533 (97.825) Mem 9655MB [2024-08-04 10:00:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0039 (0.8243) Acc@1 77.588 (83.938) Acc@5 95.605 (96.710) Mem 9655MB [2024-08-04 10:00:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.643 Acc@5 96.733 [2024-08-04 10:00:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-08-04 10:00:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.703 (0.703) Loss 0.5850 (0.5850) Acc@1 90.234 (90.234) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 10:00:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.123) Loss 0.8984 (0.7082) Acc@1 81.787 (87.038) Acc@5 96.289 (97.767) Mem 9655MB [2024-08-04 10:00:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.091) Loss 1.0137 (0.8277) Acc@1 78.662 (83.877) Acc@5 95.557 (96.608) Mem 9655MB [2024-08-04 10:00:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.547 Acc@5 96.617 [2024-08-04 10:00:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-08-04 10:00:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.55% [2024-08-04 10:00:39 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
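The records in this log are wrapped across physical lines at arbitrary spaces, so a parser has to rejoin the text before matching. The sketch below assumes the record layout seen here. The regular expressions and field names are our own and are not part of the training code.

Each epoch prints two `* Acc@1 ... Acc@5 ...` summaries. Judging by the `New max accuracy ema` line that follows the second one, the first appears to be the raw model and the second the EMA copy, but the summary lines themselves do not say so.

```python
import re

# Regexes written against the record layout in this log (field names are ours).
TRAIN_RE = re.compile(
    r"Train: \[(?P<epoch>\d+)/\d+\]\[(?P<it>\d+)/(?P<iters>\d+)\] "
    r"eta (?P<eta>\S+) lr (?P<lr>[\d.]+) .*?"
    r"loss (?P<loss>[\d.]+) \((?P<loss_avg>[\d.]+)\) "
    r"grad_norm (?P<grad_norm>[\d.]+) \((?P<grad_norm_avg>[\d.]+)\) "
    r"loss_scale (?P<loss_scale>[\d.]+) \((?P<loss_scale_avg>[\d.]+)\)"
)
EVAL_RE = re.compile(r"\* Acc@1 (?P<top1>[\d.]+) Acc@5 (?P<top5>[\d.]+)")

INT_FIELDS = {"epoch", "it", "iters"}
STR_FIELDS = {"eta"}


def _convert(key, value):
    """Cast a captured field to int, str or float depending on its name."""
    if key in INT_FIELDS:
        return int(value)
    if key in STR_FIELDS:
        return value
    return float(value)


def parse_log(text):
    """Return (train_rows, eval_rows) parsed from raw log text.

    Records may be wrapped across physical lines, so every whitespace run
    is collapsed to a single space before matching.
    """
    flat = " ".join(text.split())
    train_rows = [
        {k: _convert(k, v) for k, v in m.groupdict().items()}
        for m in TRAIN_RE.finditer(flat)
    ]
    eval_rows = [
        {k: float(v) for k, v in m.groupdict().items()}
        for m in EVAL_RE.finditer(flat)
    ]
    return train_rows, eval_rows
```

To parse a whole file, one could call `parse_log(open(path).read())`, where `path` is a placeholder, and then, for example, plot `loss_avg` against `epoch`.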
[2024-08-04 10:00:39 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 10:00:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][0/625] eta 0:07:00 lr 0.000095 wd 0.0500 time 0.6728 (0.6728) data time 0.4277 (0.4277) model time 0.0000 (0.0000) loss 6.1009 (6.1009) grad_norm 4.2207 (4.2207) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:00:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][10/625] eta 0:03:11 lr 0.000095 wd 0.0500 time 0.2554 (0.3110) data time 0.0010 (0.0398) model time 0.0000 (0.0000) loss 5.9837 (5.8074) grad_norm 1.9342 (2.3363) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:00:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][20/625] eta 0:02:52 lr 0.000095 wd 0.0500 time 0.2599 (0.2852) data time 0.0008 (0.0214) model time 0.0000 (0.0000) loss 5.3628 (5.6874) grad_norm 3.8776 (2.4213) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:00:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][30/625] eta 0:02:43 lr 0.000095 wd 0.0500 time 0.2526 (0.2755) data time 0.0009 (0.0148) model time 0.0000 (0.0000) loss 4.9829 (5.5666) grad_norm 2.5155 (2.3980) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:00:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][40/625] eta 0:02:38 lr 0.000095 wd 0.0500 time 0.2537 (0.2709) data time 0.0007 (0.0114) model time 0.0000 (0.0000) loss 5.4333 (5.5802) grad_norm 3.2830 (2.5111) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:00:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][50/625] eta 0:02:34 lr 0.000095 wd 0.0500 time 0.2552 (0.2681) data time 0.0009 (0.0093) model time 0.0000 (0.0000) loss 4.3786 (5.5167) grad_norm 2.9171 (2.6048) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 
10:00:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][60/625] eta 0:02:30 lr 0.000095 wd 0.0500 time 0.2565 (0.2666) data time 0.0007 (0.0080) model time 0.2558 (0.2579) loss 5.0474 (5.5360) grad_norm 2.1820 (3.1722) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:00:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][70/625] eta 0:02:28 lr 0.000095 wd 0.0500 time 0.2540 (0.2679) data time 0.0008 (0.0070) model time 0.2531 (0.2663) loss 4.6883 (5.5524) grad_norm 1.8735 (3.0913) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:01:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][80/625] eta 0:02:25 lr 0.000095 wd 0.0500 time 0.2559 (0.2663) data time 0.0006 (0.0062) model time 0.2554 (0.2624) loss 5.5900 (5.5016) grad_norm 3.0886 (3.0085) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:01:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][90/625] eta 0:02:21 lr 0.000095 wd 0.0500 time 0.2569 (0.2653) data time 0.0009 (0.0057) model time 0.2560 (0.2607) loss 5.8162 (5.5514) grad_norm 2.0186 (2.9652) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:01:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][100/625] eta 0:02:19 lr 0.000095 wd 0.0500 time 0.2560 (0.2663) data time 0.0006 (0.0052) model time 0.2553 (0.2634) loss 5.8277 (5.5515) grad_norm 3.2229 (2.9926) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:01:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][110/625] eta 0:02:16 lr 0.000095 wd 0.0500 time 0.2548 (0.2655) data time 0.0008 (0.0048) model time 0.2540 (0.2623) loss 6.0306 (5.5601) grad_norm 2.6826 (2.9623) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:01:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][120/625] eta 0:02:13 lr 0.000095 wd 0.0500 time 0.2573 (0.2648) data time 0.0010 
(0.0045) model time 0.2563 (0.2614) loss 6.3364 (5.5532) grad_norm 1.7100 (2.9952) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:01:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][130/625] eta 0:02:10 lr 0.000094 wd 0.0500 time 0.2564 (0.2643) data time 0.0007 (0.0042) model time 0.2556 (0.2609) loss 4.5069 (5.5592) grad_norm 2.0315 (3.0166) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:01:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][140/625] eta 0:02:08 lr 0.000094 wd 0.0500 time 0.2537 (0.2650) data time 0.0013 (0.0040) model time 0.2524 (0.2623) loss 5.5541 (5.5442) grad_norm 2.1521 (2.9907) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:01:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][150/625] eta 0:02:05 lr 0.000094 wd 0.0500 time 0.2492 (0.2644) data time 0.0011 (0.0038) model time 0.2481 (0.2616) loss 5.9351 (5.5406) grad_norm 1.6953 (2.9517) loss_scale 512.0000 (269.5629) mem 9655MB
[2024-08-04 10:01:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][160/625] eta 0:02:02 lr 0.000094 wd 0.0500 time 0.2565 (0.2640) data time 0.0009 (0.0036) model time 0.2555 (0.2611) loss 5.6320 (5.5359) grad_norm 2.2056 (2.9464) loss_scale 512.0000 (284.6211) mem 9655MB
[2024-08-04 10:01:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][170/625] eta 0:01:59 lr 0.000094 wd 0.0500 time 0.2560 (0.2636) data time 0.0009 (0.0035) model time 0.2552 (0.2606) loss 5.3187 (5.5145) grad_norm 2.1126 (3.0528) loss_scale 512.0000 (297.9181) mem 9655MB
[2024-08-04 10:01:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][180/625] eta 0:01:57 lr 0.000094 wd 0.0500 time 0.2489 (0.2631) data time 0.0009 (0.0033) model time 0.2479 (0.2602) loss 5.9740 (5.5292) grad_norm 2.4577 (3.0344) loss_scale 512.0000 (309.7459) mem 9655MB
[2024-08-04 10:01:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][190/625] eta 0:01:54 lr 0.000094 wd 0.0500 time 0.2561 (0.2628) data time 0.0007 (0.0032) model time 0.2554 (0.2599) loss 5.3002 (5.5227) grad_norm 2.9900 (3.0495) loss_scale 512.0000 (320.3351) mem 9655MB
[2024-08-04 10:01:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][200/625] eta 0:01:51 lr 0.000094 wd 0.0500 time 0.2563 (0.2625) data time 0.0007 (0.0031) model time 0.2555 (0.2596) loss 5.7533 (5.5175) grad_norm 1.8370 (3.0731) loss_scale 512.0000 (329.8706) mem 9655MB
[2024-08-04 10:01:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][210/625] eta 0:01:48 lr 0.000094 wd 0.0500 time 0.2574 (0.2622) data time 0.0007 (0.0030) model time 0.2567 (0.2593) loss 5.4536 (5.5270) grad_norm 2.3982 (3.0698) loss_scale 512.0000 (338.5024) mem 9655MB
[2024-08-04 10:01:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][220/625] eta 0:01:46 lr 0.000094 wd 0.0500 time 0.2606 (0.2620) data time 0.0009 (0.0029) model time 0.2597 (0.2592) loss 5.7905 (5.5082) grad_norm 1.9265 (3.0422) loss_scale 512.0000 (346.3529) mem 9655MB
[2024-08-04 10:01:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][230/625] eta 0:01:43 lr 0.000094 wd 0.0500 time 0.2532 (0.2626) data time 0.0007 (0.0028) model time 0.2525 (0.2601) loss 5.4061 (5.5100) grad_norm 3.2637 (3.0328) loss_scale 512.0000 (353.5238) mem 9655MB
[2024-08-04 10:01:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][240/625] eta 0:01:41 lr 0.000094 wd 0.0500 time 0.2539 (0.2632) data time 0.0018 (0.0027) model time 0.2521 (0.2609) loss 5.9969 (5.5060) grad_norm 2.6469 (3.0161) loss_scale 512.0000 (360.0996) mem 9655MB
[2024-08-04 10:01:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][250/625] eta 0:01:38 lr 0.000094 wd 0.0500 time 0.2572 (0.2629) data time 0.0007 (0.0027) model time 0.2566 (0.2606) loss 5.3104 (5.5134) grad_norm 1.3782 (2.9897) loss_scale 512.0000 (366.1514) mem 9655MB
[2024-08-04 10:01:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][260/625] eta 0:01:35 lr 0.000094 wd 0.0500 time 0.2579 (0.2627) data time 0.0010 (0.0026) model time 0.2568 (0.2604) loss 5.8397 (5.5130) grad_norm 3.2459 (2.9703) loss_scale 512.0000 (371.7395) mem 9655MB
[2024-08-04 10:01:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][270/625] eta 0:01:33 lr 0.000094 wd 0.0500 time 0.2581 (0.2625) data time 0.0007 (0.0025) model time 0.2574 (0.2602) loss 5.7331 (5.5127) grad_norm 3.2186 (2.9787) loss_scale 512.0000 (376.9151) mem 9655MB
[2024-08-04 10:01:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][280/625] eta 0:01:30 lr 0.000093 wd 0.0500 time 0.2556 (0.2630) data time 0.0011 (0.0025) model time 0.2546 (0.2609) loss 6.2153 (5.5187) grad_norm 2.1547 (2.9585) loss_scale 512.0000 (381.7224) mem 9655MB
[2024-08-04 10:01:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][290/625] eta 0:01:28 lr 0.000093 wd 0.0500 time 0.2557 (0.2628) data time 0.0007 (0.0024) model time 0.2550 (0.2607) loss 7.0357 (5.5263) grad_norm 3.7856 (2.9537) loss_scale 512.0000 (386.1993) mem 9655MB
[2024-08-04 10:01:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][300/625] eta 0:01:25 lr 0.000093 wd 0.0500 time 0.4320 (0.2632) data time 0.0010 (0.0024) model time 0.4310 (0.2612) loss 6.4521 (5.5316) grad_norm 3.0629 (2.9344) loss_scale 512.0000 (390.3787) mem 9655MB
[2024-08-04 10:02:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][310/625] eta 0:01:22 lr 0.000093 wd 0.0500 time 0.2553 (0.2629) data time 0.0006 (0.0023) model time 0.2546 (0.2610) loss 5.4918 (5.5383) grad_norm 1.5467 (2.9090) loss_scale 512.0000 (394.2894) mem 9655MB
[2024-08-04 10:02:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][320/625] eta 0:01:20 lr 0.000093 wd 0.0500 time 0.2514 (0.2627) data time 0.0008 (0.0023) model time 0.2506 (0.2608) loss 6.3640 (5.5365) grad_norm 1.8567 (2.9160) loss_scale 512.0000 (397.9564) mem 9655MB
[2024-08-04 10:02:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][330/625] eta 0:01:17 lr 0.000093 wd 0.0500 time 0.4621 (0.2642) data time 0.0010 (0.0022) model time 0.4612 (0.2625) loss 5.8216 (5.5404) grad_norm 2.7517 (2.9183) loss_scale 512.0000 (401.4018) mem 9655MB
[2024-08-04 10:02:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][340/625] eta 0:01:15 lr 0.000093 wd 0.0500 time 0.2545 (0.2639) data time 0.0011 (0.0022) model time 0.2534 (0.2622) loss 5.9478 (5.5350) grad_norm 5.9589 (2.9519) loss_scale 512.0000 (404.6452) mem 9655MB
[2024-08-04 10:02:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][350/625] eta 0:01:12 lr 0.000093 wd 0.0500 time 0.2545 (0.2642) data time 0.0009 (0.0022) model time 0.2536 (0.2626) loss 4.9311 (5.5368) grad_norm 19.0493 (2.9870) loss_scale 512.0000 (407.7037) mem 9655MB
[2024-08-04 10:02:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][360/625] eta 0:01:10 lr 0.000093 wd 0.0500 time 0.2497 (0.2645) data time 0.0009 (0.0021) model time 0.2488 (0.2629) loss 6.4312 (5.5398) grad_norm 1.9483 (3.0054) loss_scale 512.0000 (410.5928) mem 9655MB
[2024-08-04 10:02:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][370/625] eta 0:01:07 lr 0.000093 wd 0.0500 time 0.2578 (0.2648) data time 0.0008 (0.0021) model time 0.2570 (0.2633) loss 6.1579 (5.5455) grad_norm 4.7976 (3.0160) loss_scale 512.0000 (413.3261) mem 9655MB
[2024-08-04 10:02:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][380/625] eta 0:01:04 lr 0.000093 wd 0.0500 time 0.2604 (0.2646) data time 0.0010 (0.0021) model time 0.2594 (0.2631) loss 6.0982 (5.5475) grad_norm 7.6060 (3.0382) loss_scale 512.0000 (415.9160) mem 9655MB
[2024-08-04 10:02:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][390/625] eta 0:01:02 lr 0.000093 wd 0.0500 time 0.2602 (0.2649) data time 0.0008 (0.0020) model time 0.2594 (0.2635) loss 5.7405 (5.5481) grad_norm 3.8976 (3.0564) loss_scale 512.0000 (418.3734) mem 9655MB
[2024-08-04 10:02:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][400/625] eta 0:00:59 lr 0.000093 wd 0.0500 time 0.2604 (0.2647) data time 0.0008 (0.0020) model time 0.2596 (0.2632) loss 5.5485 (5.5532) grad_norm 1.5782 (3.0470) loss_scale 512.0000 (420.7082) mem 9655MB
[2024-08-04 10:02:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][410/625] eta 0:00:56 lr 0.000093 wd 0.0500 time 0.2583 (0.2644) data time 0.0009 (0.0020) model time 0.2575 (0.2630) loss 4.8392 (5.5536) grad_norm 2.1468 (3.0212) loss_scale 512.0000 (422.9294) mem 9655MB
[2024-08-04 10:02:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][420/625] eta 0:00:54 lr 0.000093 wd 0.0500 time 0.2561 (0.2647) data time 0.0006 (0.0020) model time 0.2555 (0.2633) loss 6.2181 (5.5540) grad_norm 3.7436 (3.0101) loss_scale 512.0000 (425.0451) mem 9655MB
[2024-08-04 10:02:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][430/625] eta 0:00:51 lr 0.000092 wd 0.0500 time 0.2524 (0.2645) data time 0.0009 (0.0019) model time 0.2515 (0.2631) loss 5.6373 (5.5616) grad_norm 1.7250 (3.0013) loss_scale 512.0000 (427.0626) mem 9655MB
[2024-08-04 10:02:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][440/625] eta 0:00:48 lr 0.000092 wd 0.0500 time 0.2521 (0.2643) data time 0.0010 (0.0019) model time 0.2511 (0.2628) loss 5.9285 (5.5633) grad_norm 1.9619 (3.0326) loss_scale 512.0000 (428.9887) mem 9655MB
[2024-08-04 10:02:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][450/625] eta 0:00:46 lr 0.000092 wd 0.0500 time 0.2544 (0.2644) data time 0.0011 (0.0019) model time 0.2533 (0.2630) loss 5.8260 (5.5662) grad_norm 3.0185 (3.0209) loss_scale 512.0000 (430.8293) mem 9655MB
[2024-08-04 10:02:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][460/625] eta 0:00:43 lr 0.000092 wd 0.0500 time 0.2650 (0.2642) data time 0.0007 (0.0019) model time 0.2643 (0.2628) loss 5.9948 (5.5645) grad_norm 2.0497 (3.0124) loss_scale 512.0000 (432.5900) mem 9655MB
[2024-08-04 10:02:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][470/625] eta 0:00:40 lr 0.000092 wd 0.0500 time 0.2554 (0.2641) data time 0.0010 (0.0019) model time 0.2544 (0.2626) loss 5.7471 (5.5623) grad_norm 3.3226 (3.0101) loss_scale 512.0000 (434.2760) mem 9655MB
[2024-08-04 10:02:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][480/625] eta 0:00:38 lr 0.000092 wd 0.0500 time 0.2569 (0.2639) data time 0.0007 (0.0018) model time 0.2562 (0.2625) loss 4.8313 (5.5580) grad_norm 3.2284 (3.0649) loss_scale 512.0000 (435.8919) mem 9655MB
[2024-08-04 10:02:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][490/625] eta 0:00:35 lr 0.000092 wd 0.0500 time 0.2569 (0.2638) data time 0.0007 (0.0018) model time 0.2562 (0.2623) loss 4.4980 (5.5522) grad_norm 1.8986 (3.0574) loss_scale 512.0000 (437.4420) mem 9655MB
[2024-08-04 10:02:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][500/625] eta 0:00:32 lr 0.000092 wd 0.0500 time 0.2591 (0.2636) data time 0.0006 (0.0018) model time 0.2585 (0.2622) loss 4.4995 (5.5519) grad_norm 2.4923 (3.0429) loss_scale 512.0000 (438.9301) mem 9655MB
[2024-08-04 10:02:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][510/625] eta 0:00:30 lr 0.000092 wd 0.0500 time 0.2587 (0.2635) data time 0.0010 (0.0018) model time 0.2577 (0.2620) loss 5.0245 (5.5543) grad_norm 2.6338 (3.0360) loss_scale 512.0000 (440.3601) mem 9655MB
[2024-08-04 10:02:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][520/625] eta 0:00:27 lr 0.000092 wd 0.0500 time 0.2474 (0.2636) data time 0.0011 (0.0018) model time 0.2463 (0.2621) loss 5.3876 (5.5492) grad_norm 5.7023 (3.0357) loss_scale 512.0000 (441.7351) mem 9655MB
[2024-08-04 10:02:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][530/625] eta 0:00:25 lr 0.000092 wd 0.0500 time 0.2549 (0.2638) data time 0.0010 (0.0018) model time 0.2539 (0.2624) loss 6.4844 (5.5483) grad_norm 3.6951 (3.0385) loss_scale 512.0000 (443.0584) mem 9655MB
[2024-08-04 10:03:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][540/625] eta 0:00:22 lr 0.000092 wd 0.0500 time 0.2583 (0.2640) data time 0.0008 (0.0017) model time 0.2575 (0.2627) loss 5.4495 (5.5474) grad_norm 3.3905 (3.0495) loss_scale 512.0000 (444.3327) mem 9655MB
[2024-08-04 10:03:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][550/625] eta 0:00:19 lr 0.000092 wd 0.0500 time 0.2547 (0.2639) data time 0.0008 (0.0017) model time 0.2539 (0.2625) loss 5.5738 (5.5450) grad_norm 2.7548 (3.0735) loss_scale 512.0000 (445.5608) mem 9655MB
[2024-08-04 10:03:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][560/625] eta 0:00:17 lr 0.000092 wd 0.0500 time 0.2586 (0.2638) data time 0.0008 (0.0017) model time 0.2579 (0.2624) loss 5.5988 (5.5452) grad_norm 2.2787 (3.0644) loss_scale 512.0000 (446.7451) mem 9655MB
[2024-08-04 10:03:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][570/625] eta 0:00:14 lr 0.000092 wd 0.0500 time 0.2548 (0.2640) data time 0.0007 (0.0017) model time 0.2541 (0.2626) loss 5.0303 (5.5480) grad_norm 3.4212 (3.0607) loss_scale 512.0000 (447.8879) mem 9655MB
[2024-08-04 10:03:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][580/625] eta 0:00:11 lr 0.000091 wd 0.0500 time 0.2559 (0.2641) data time 0.0010 (0.0017) model time 0.2549 (0.2627) loss 6.1332 (5.5491) grad_norm 3.9911 (3.0604) loss_scale 512.0000 (448.9914) mem 9655MB
[2024-08-04 10:03:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][590/625] eta 0:00:09 lr 0.000091 wd 0.0500 time 0.2538 (0.2639) data time 0.0011 (0.0017) model time 0.2527 (0.2626) loss 4.4785 (5.5485) grad_norm 2.0954 (3.0529) loss_scale 512.0000 (450.0575) mem 9655MB
[2024-08-04 10:03:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][600/625] eta 0:00:06 lr 0.000091 wd 0.0500 time 0.2564 (0.2640) data time 0.0011 (0.0017) model time 0.2554 (0.2627) loss 6.3216 (5.5494) grad_norm 1.8659 (3.0381) loss_scale 512.0000 (451.0882) mem 9655MB
[2024-08-04 10:03:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][610/625] eta 0:00:03 lr 0.000091 wd 0.0500 time 0.2532 (0.2639) data time 0.0004 (0.0017) model time 0.2528 (0.2626) loss 6.2881 (5.5480) grad_norm 4.0607 (3.0316) loss_scale 512.0000 (452.0851) mem 9655MB
[2024-08-04 10:03:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][620/625] eta 0:00:01 lr 0.000091 wd 0.0500 time 0.2520 (0.2640) data time 0.0006 (0.0016) model time 0.2514 (0.2627) loss 5.9121 (5.5532) grad_norm 1.7756 (3.0203) loss_scale 512.0000 (453.0499) mem 9655MB
[2024-08-04 10:03:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 265 training takes 0:02:44
[2024-08-04 10:03:24 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 10:03:25 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 10:03:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.603 (0.603) Loss 0.5952 (0.5952) Acc@1 89.941 (89.941) Acc@5 98.828 (98.828) Mem 9655MB
[2024-08-04 10:03:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.107) Loss 0.8931 (0.7061) Acc@1 82.227 (87.176) Acc@5 96.338 (97.843) Mem 9655MB
[2024-08-04 10:03:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.082) Loss 1.0127 (0.8258) Acc@1 78.076 (84.059) Acc@5 95.605 (96.729) Mem 9655MB
[2024-08-04 10:03:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.743 Acc@5 96.745
[2024-08-04 10:03:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7%
[2024-08-04 10:03:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.848 (0.848) Loss 0.5850 (0.5850) Acc@1 90.186 (90.186) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 10:03:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.134) Loss 0.8975 (0.7077) Acc@1 81.787 (87.038) Acc@5 96.338 (97.763) Mem 9655MB
[2024-08-04 10:03:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.096) Loss 1.0117 (0.8272) Acc@1 78.711 (83.898) Acc@5 95.557 (96.612) Mem 9655MB
[2024-08-04 10:03:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.557 Acc@5 96.623
[2024-08-04 10:03:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.6%
[2024-08-04 10:03:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.56%
[2024-08-04 10:03:29 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 10:03:30 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 10:03:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][0/625] eta 0:07:48 lr 0.000091 wd 0.0500 time 0.7500 (0.7500) data time 0.5124 (0.5124) model time 0.0000 (0.0000) loss 5.4319 (5.4319) grad_norm 2.3170 (2.3170) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:03:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][10/625] eta 0:03:15 lr 0.000091 wd 0.0500 time 0.2546 (0.3176) data time 0.0006 (0.0473) model time 0.0000 (0.0000) loss 5.0485 (5.4832) grad_norm 3.0087 (2.7905) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:03:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][20/625] eta 0:03:06 lr 0.000091 wd 0.0500 time 0.2569 (0.3080) data time 0.0007 (0.0252) model time 0.0000 (0.0000) loss 5.9440 (5.5889) grad_norm 2.7915 (2.8225) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:03:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][30/625] eta 0:02:53 lr 0.000091 wd 0.0500 time 0.2556 (0.2913) data time 0.0010 (0.0174) model time 0.0000 (0.0000) loss 6.7020 (5.5228) grad_norm 1.8335 (2.9665) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:03:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][40/625] eta 0:02:47 lr 0.000091 wd 0.0500 time 0.2562 (0.2856) data time 0.0008 (0.0134) model time 0.0000 (0.0000) loss 5.3565 (5.5633) grad_norm 2.2030 (2.8664) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:03:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][50/625] eta 0:02:44 lr 0.000091 wd 0.0500 time 0.2541 (0.2862) data time 0.0009 (0.0109) model time 0.0000 (0.0000) loss 6.2843 (5.5282) grad_norm 2.1773 (2.7963) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:03:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][60/625] eta 0:02:38 lr 0.000091 wd 0.0500 time 0.2572 (0.2811) data time 0.0006 (0.0093) model time 0.2566 (0.2542) loss 5.9501 (5.5070) grad_norm 2.7062 (2.8482) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:03:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][70/625] eta 0:02:35 lr 0.000091 wd 0.0500 time 0.2561 (0.2802) data time 0.0009 (0.0081) model time 0.2552 (0.2641) loss 5.7819 (5.5440) grad_norm 2.5225 (2.7563) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:03:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][80/625] eta 0:02:31 lr 0.000091 wd 0.0500 time 0.2576 (0.2773) data time 0.0007 (0.0072) model time 0.2569 (0.2612) loss 6.0874 (5.5301) grad_norm 2.5892 (2.7462) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:03:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][90/625] eta 0:02:27 lr 0.000091 wd 0.0500 time 0.2582 (0.2749) data time 0.0017 (0.0065) model time 0.2566 (0.2595) loss 5.3660 (5.5293) grad_norm 1.8101 (2.6694) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:03:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][100/625] eta 0:02:23 lr 0.000091 wd 0.0500 time 0.2536 (0.2730) data time 0.0010 (0.0060) model time 0.2526 (0.2586) loss 6.2326 (5.5546) grad_norm 3.0807 (2.6313) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:04:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][110/625] eta 0:02:19 lr 0.000090 wd 0.0500 time 0.2542 (0.2714) data time 0.0008 (0.0055) model time 0.2534 (0.2579) loss 5.7758 (5.5634) grad_norm 2.3217 (2.8382) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:04:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][120/625] eta 0:02:16 lr 0.000090 wd 0.0500 time 0.2656 (0.2702) data time 0.0009 (0.0052) model time 0.2648 (0.2576) loss 5.6348 (5.5698) grad_norm 2.2978 (2.8063) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:04:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][130/625] eta 0:02:13 lr 0.000090 wd 0.0500 time 0.2603 (0.2692) data time 0.0010 (0.0048) model time 0.2592 (0.2574) loss 5.7604 (5.5949) grad_norm 2.5629 (2.8003) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:04:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][140/625] eta 0:02:10 lr 0.000090 wd 0.0500 time 0.2558 (0.2682) data time 0.0008 (0.0046) model time 0.2549 (0.2571) loss 6.9744 (5.5935) grad_norm 1.8274 (2.7620) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:04:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][150/625] eta 0:02:08 lr 0.000090 wd 0.0500 time 0.2570 (0.2695) data time 0.0009 (0.0043) model time 0.2561 (0.2601) loss 6.0346 (5.6075) grad_norm 3.4177 (2.7423) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:04:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][160/625] eta 0:02:04 lr 0.000090 wd 0.0500 time 0.2522 (0.2686) data time 0.0011 (0.0041) model time 0.2511 (0.2595) loss 5.2461 (5.6100) grad_norm 5.9242 (2.8072) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:04:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][170/625] eta 0:02:01 lr 0.000090 wd 0.0500 time 0.2568 (0.2678) data time 0.0006 (0.0039) model time 0.2562 (0.2591) loss 5.7915 (5.6057) grad_norm 2.3488 (2.8036) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:04:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][180/625] eta 0:01:59 lr 0.000090 wd 0.0500 time 0.2533 (0.2680) data time 0.0011 (0.0038) model time 0.2522 (0.2600) loss 5.4522 (5.5905) grad_norm 1.8991 (2.8170) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:04:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][190/625] eta 0:01:56 lr 0.000090 wd 0.0500 time 0.2587 (0.2674) data time 0.0008 (0.0036) model time 0.2579 (0.2596) loss 4.4778 (5.5845) grad_norm 2.1630 (2.7886) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:04:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][200/625] eta 0:01:53 lr 0.000090 wd 0.0500 time 0.2561 (0.2668) data time 0.0008 (0.0035) model time 0.2552 (0.2592) loss 5.8741 (5.5873) grad_norm 2.1533 (2.8552) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:04:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][210/625] eta 0:01:50 lr 0.000090 wd 0.0500 time 0.2511 (0.2663) data time 0.0011 (0.0034) model time 0.2500 (0.2590) loss 6.2201 (5.5973) grad_norm 2.1787 (2.8192) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:04:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][220/625] eta 0:01:47 lr 0.000090 wd 0.0500 time 0.2511 (0.2659) data time 0.0010 (0.0033) model time 0.2501 (0.2588) loss 4.9533 (5.5951) grad_norm 2.7256 (2.8097) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:04:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][230/625] eta 0:01:44 lr 0.000090 wd 0.0500 time 0.2561 (0.2655) data time 0.0009 (0.0032) model time 0.2552 (0.2586) loss 6.2697 (5.5941) grad_norm 2.8210 (2.7967) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:04:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][240/625] eta 0:01:42 lr 0.000090 wd 0.0500 time 0.2579 (0.2650) data time 0.0008 (0.0031) model time 0.2571 (0.2584) loss 5.6982 (5.5931) grad_norm 3.8463 (2.7802) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:04:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][250/625] eta 0:01:39 lr 0.000090 wd 0.0500 time 0.2563 (0.2647) data time 0.0007 (0.0030) model time 0.2556 (0.2582) loss 4.9123 (5.5836) grad_norm 1.9436 (2.7772) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:04:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][260/625] eta 0:01:36 lr 0.000089 wd 0.0500 time 0.2571 (0.2643) data time 0.0008 (0.0029) model time 0.2562 (0.2581) loss 4.5923 (5.5764) grad_norm 3.3017 (2.7633) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:04:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][270/625] eta 0:01:33 lr 0.000089 wd 0.0500 time 0.2604 (0.2641) data time 0.0009 (0.0028) model time 0.2596 (0.2581) loss 5.0721 (5.5664) grad_norm 2.7245 (2.7577) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:04:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][280/625] eta 0:01:31 lr 0.000089 wd 0.0500 time 0.2563 (0.2639) data time 0.0007 (0.0028) model time 0.2556 (0.2580) loss 5.5023 (5.5652) grad_norm 2.2812 (2.7746) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:04:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][290/625] eta 0:01:28 lr 0.000089 wd 0.0500 time 0.2516 (0.2642) data time 0.0010 (0.0027) model time 0.2505 (0.2586) loss 5.7554 (5.5720) grad_norm 2.4662 (2.7652) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:04:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][300/625] eta 0:01:25 lr 0.000089 wd 0.0500 time 0.2538 (0.2639) data time 0.0010 (0.0026) model time 0.2528 (0.2584) loss 6.0343 (5.5671) grad_norm 2.5853 (2.8488) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:04:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][310/625] eta 0:01:23 lr 0.000089 wd 0.0500 time 0.2602 (0.2638) data time 0.0008 (0.0026) model time 0.2594 (0.2584) loss 6.4530 (5.5689) grad_norm 3.0378 (2.8429) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:04:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][320/625] eta 0:01:20 lr 0.000089 wd 0.0500 time 0.2555 (0.2635) data time 0.0006 (0.0025) model time 0.2549 (0.2583) loss 6.1244 (5.5614) grad_norm 1.9462 (2.8310) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:04:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][330/625] eta 0:01:17 lr 0.000089 wd 0.0500 time 0.2533 (0.2633) data time 0.0009 (0.0025) model time 0.2523 (0.2581) loss 5.4931 (5.5591) grad_norm 2.4763 (2.8268) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:04:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][340/625] eta 0:01:14 lr 0.000089 wd 0.0500 time 0.2491 (0.2631) data time 0.0011 (0.0024) model time 0.2480 (0.2580) loss 5.9636 (5.5652) grad_norm 2.0083 (2.8199) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:05:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][350/625] eta 0:01:12 lr 0.000089 wd 0.0500 time 0.2571 (0.2629) data time 0.0010 (0.0024) model time 0.2561 (0.2580) loss 5.3003 (5.5692) grad_norm 3.3018 (2.8086) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:05:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][360/625] eta 0:01:09 lr 0.000089 wd 0.0500 time 0.2585 (0.2630) data time 0.0007 (0.0024) model time 0.2578 (0.2583) loss 6.3207 (5.5796) grad_norm 4.0781 (2.8399) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:05:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][370/625] eta 0:01:07 lr 0.000089 wd 0.0500 time 0.2546 (0.2629) data time 0.0008 (0.0023) model time 0.2538 (0.2582) loss 5.9656 (5.5714) grad_norm 4.6524 (inf) loss_scale 256.0000 (509.2399) mem 9655MB
[2024-08-04 10:05:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][380/625] eta 0:01:04 lr 0.000089 wd 0.0500 time 0.2576 (0.2631) data time 0.0008 (0.0023) model time 0.2568 (0.2585) loss 5.4829 (5.5752) grad_norm 1.9670 (inf) loss_scale 256.0000 (502.5932) mem 9655MB
[2024-08-04 10:05:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][390/625] eta 0:01:01 lr 0.000089 wd 0.0500 time 0.2567 (0.2629) data time 0.0009 (0.0023) model time 0.2559 (0.2585) loss 5.4212 (5.5736) grad_norm 2.0621 (inf) loss_scale 256.0000 (496.2864) mem 9655MB
[2024-08-04 10:05:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][400/625] eta 0:00:59 lr 0.000089 wd 0.0500 time 0.2522 (0.2627) data time 0.0011 (0.0022) model time 0.2511 (0.2584) loss 6.3004 (5.5713) grad_norm 3.1135 (inf) loss_scale 256.0000 (490.2943) mem 9655MB
[2024-08-04 10:05:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][410/625] eta 0:00:56 lr 0.000088 wd 0.0500 time 0.2539 (0.2626) data time 0.0009 (0.0022) model time 0.2530 (0.2583) loss 6.2487 (5.5739) grad_norm 1.7068 (inf) loss_scale 256.0000 (484.5937) mem 9655MB
[2024-08-04 10:05:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][420/625] eta 0:00:53 lr 0.000088 wd 0.0500 time 0.2611 (0.2625) data time 0.0006 (0.0022) model time 0.2605 (0.2582) loss 4.9514 (5.5807) grad_norm 3.7665 (inf) loss_scale 256.0000 (479.1639) mem 9655MB
[2024-08-04 10:05:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][430/625] eta 0:00:51 lr 0.000088 wd 0.0500 time 0.2545 (0.2623) data time 0.0008 (0.0021) model time 0.2537 (0.2582) loss 5.2961 (5.5819) grad_norm 2.3376 (inf) loss_scale 256.0000 (473.9861) mem 9655MB
[2024-08-04 10:05:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][440/625] eta 0:00:48 lr 0.000088 wd 0.0500 time 0.2556 (0.2622) data time 0.0007 (0.0021) model time 0.2549 (0.2581) loss 4.5848 (5.5783) grad_norm 2.7786 (inf) loss_scale 256.0000 (469.0431) mem 9655MB
[2024-08-04 10:05:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][450/625] eta 0:00:45 lr 0.000088 wd 0.0500 time 0.2548 (0.2620) data time 0.0008 (0.0021) model time 0.2539 (0.2580) loss 4.6067 (5.5757) grad_norm 3.4277 (inf) loss_scale 256.0000 (464.3193) mem 9655MB
[2024-08-04 10:05:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][460/625] eta 0:00:43 lr 0.000088 wd 0.0500 time 0.2525 (0.2623) data time 0.0009 (0.0021) model time 0.2516 (0.2584) loss 6.2802 (5.5730) grad_norm 2.0727 (inf) loss_scale 256.0000 (459.8004) mem 9655MB
[2024-08-04 10:05:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][470/625] eta 0:00:40 lr 0.000088 wd 0.0500 time 0.2557 (0.2626) data time 0.0008 (0.0020) model time 0.2549 (0.2587) loss 5.3438 (5.5735) grad_norm 3.2498 (inf) loss_scale 256.0000 (455.4735) mem 9655MB
[2024-08-04 10:05:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][480/625] eta 0:00:38 lr 0.000088 wd 0.0500 time 0.2589 (0.2624) data time 0.0008 (0.0020) model time 0.2582 (0.2587) loss 4.8466 (5.5718) grad_norm 1.5370 (inf) loss_scale 256.0000 (451.3264) mem 9655MB
[2024-08-04 10:05:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][490/625] eta 0:00:35 lr 0.000088 wd 0.0500 time 0.2538 (0.2623) data time 0.0008 (0.0020) model time 0.2531 (0.2586) loss 4.9930 (5.5658) grad_norm 2.5293 (inf) loss_scale 256.0000 (447.3483) mem 9655MB
[2024-08-04 10:05:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][500/625] eta 0:00:32 lr 0.000088 wd 0.0500 time 0.2531 (0.2626) data time 0.0009 (0.0020) model time 0.2522 (0.2590) loss 5.9827 (5.5706) grad_norm 2.8848 (inf) loss_scale 256.0000 (443.5289) mem 9655MB
[2024-08-04 10:05:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][510/625] eta 0:00:30 lr 0.000088 wd 0.0500 time 0.2543 (0.2625) data time 0.0007 (0.0019) model time 0.2536 (0.2589) loss 5.5260 (5.5644) grad_norm 1.7575 (inf) loss_scale 256.0000 (439.8591) mem 9655MB
[2024-08-04 10:05:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][520/625] eta 0:00:27 lr 0.000088 wd 0.0500 time 0.2544 (0.2627) data time 0.0009 (0.0019) model time 0.2535 (0.2592) loss 5.6447 (5.5612) grad_norm 1.7867 (inf) loss_scale 256.0000 (436.3301) mem 9655MB
[2024-08-04 10:05:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][530/625] eta 0:00:24 lr 0.000088 wd 0.0500 time 0.2533 (0.2626) data time 0.0008 (0.0019) model time 0.2525 (0.2591) loss 4.4556 (5.5643) grad_norm 1.5863 (inf) loss_scale 256.0000 (432.9341) mem 9655MB
[2024-08-04 10:05:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][540/625] eta 0:00:22 lr 0.000088 wd 0.0500 time 0.2476 (0.2625) data time 0.0008 (0.0019) model time 0.2468 (0.2590) loss 5.7877 (5.5612) grad_norm 3.6165 (inf) loss_scale 256.0000 (429.6636) mem 9655MB
[2024-08-04 10:05:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][550/625] eta 0:00:19 lr 0.000088 wd 0.0500 time 0.2589 (0.2626) data time 0.0008 (0.0019) model time 0.2581 (0.2592) loss 6.3271 (5.5699) grad_norm 3.1933 (inf) loss_scale 256.0000 (426.5118) mem 9655MB
[2024-08-04 10:05:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][560/625] eta 0:00:17 lr 0.000088 wd 0.0500 time 0.2557 (0.2624) data time 0.0008 (0.0019) model time 0.2549 (0.2591) loss 5.0175 (5.5679) grad_norm 2.5008 (inf) loss_scale 256.0000 (423.4724) mem 9655MB
[2024-08-04 10:05:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][570/625] eta 0:00:14 lr 0.000087 wd 0.0500 time 0.2577 (0.2624) data time 0.0010 (0.0018) model time 0.2567 (0.2591) loss 5.5662 (5.5651) grad_norm 4.7840 (inf) loss_scale 256.0000 (420.5394) mem 9655MB
[2024-08-04 10:06:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][580/625] eta 0:00:11 lr 0.000087 wd 0.0500 time 0.2492 (0.2623) data time 0.0007 (0.0018) model time 0.2485 (0.2590) loss 6.2465 (5.5644) grad_norm 2.5729 (inf) loss_scale 256.0000 (417.7074) mem 9655MB
[2024-08-04 10:06:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][590/625] eta 0:00:09 lr 0.000087 wd 0.0500 time 0.2580 (0.2622) data time 0.0011 (0.0018) model time 0.2569 (0.2589) loss 5.7576 (5.5633) grad_norm 2.7248 (inf) loss_scale 256.0000 (414.9712) mem 9655MB
[2024-08-04 10:06:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][600/625] eta 0:00:06 lr 0.000087 wd 0.0500 time 0.2547 (0.2624) data time 0.0009 (0.0018) model time 0.2538 (0.2592) loss 5.5953 (5.5575) grad_norm 2.9091 (inf) loss_scale 256.0000 (412.3261) mem 9655MB
[2024-08-04 10:06:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][610/625] eta 0:00:03 lr 0.000087 wd 0.0500 time 0.2532 (0.2623) data time 0.0006 (0.0018) model time 0.2526 (0.2592) loss 6.3901 (5.5611) grad_norm 3.2198 (inf) loss_scale 256.0000 (409.7676) mem 9655MB
[2024-08-04 10:06:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][620/625] eta 0:00:01 lr 0.000087 wd 0.0500 time 0.2520 (0.2622) data time 0.0006 (0.0018) model time 0.2515 (0.2591) loss 4.6881 (5.5610) grad_norm 2.3300 (inf) loss_scale 256.0000 (407.2915) mem 9655MB
[2024-08-04 10:06:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 266 training takes 0:02:43
[2024-08-04 10:06:13 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 10:06:14 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 10:06:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.501 (0.501) Loss 0.6006 (0.6006) Acc@1 90.381 (90.381) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 10:06:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 0.8931 (0.7178) Acc@1 81.885 (87.105) Acc@5 96.533 (97.865) Mem 9655MB [2024-08-04 10:06:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0137 (0.8379) Acc@1 77.783 (83.917) Acc@5 95.752 (96.738) Mem 9655MB [2024-08-04 10:06:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.613 Acc@5 96.753 [2024-08-04 10:06:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-08-04 10:06:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.769 (0.769) Loss 0.5859 (0.5859) Acc@1 90.186 (90.186) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 10:06:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.128) Loss 0.8965 (0.7077) Acc@1 81.787 (87.047) Acc@5 96.338 (97.772) Mem 9655MB [2024-08-04 10:06:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 1.0117 (0.8269) Acc@1 78.613 (83.917) Acc@5 95.605 (96.617) Mem 9655MB [2024-08-04 10:06:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.571 Acc@5 96.625 [2024-08-04 10:06:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-08-04 10:06:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.57% [2024-08-04 10:06:18 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 10:06:18 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 10:06:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][0/625] eta 0:06:51 lr 0.000087 wd 0.0500 time 0.6584 (0.6584) data time 0.4170 (0.4170) model time 0.0000 (0.0000) loss 5.4489 (5.4489) grad_norm 8.3501 (8.3501) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:06:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][10/625] eta 0:03:11 lr 0.000087 wd 0.0500 time 0.2556 (0.3119) data time 0.0008 (0.0387) model time 0.0000 (0.0000) loss 5.4993 (5.7786) grad_norm 4.5937 (3.4513) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:06:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][20/625] eta 0:02:52 lr 0.000087 wd 0.0500 time 0.2531 (0.2855) data time 0.0010 (0.0208) model time 0.0000 (0.0000) loss 4.9972 (5.5240) grad_norm 3.6508 (3.0576) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:06:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][30/625] eta 0:02:46 lr 0.000087 wd 0.0500 time 0.2566 (0.2803) data time 0.0007 (0.0144) model time 0.0000 (0.0000) loss 4.9572 (5.5397) grad_norm 1.9611 (2.9416) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:06:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][40/625] eta 0:02:48 lr 0.000087 wd 0.0500 time 0.2553 (0.2877) data time 0.0008 (0.0111) model time 0.0000 (0.0000) loss 5.8308 (5.5547) grad_norm 2.4662 (2.7459) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:06:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][50/625] eta 0:02:44 lr 0.000087 wd 0.0500 time 0.2582 (0.2861) data time 0.0007 (0.0091) model time 0.0000 (0.0000) loss 5.7187 (5.5176) grad_norm 6.7070 (2.8605) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 
10:06:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][60/625] eta 0:02:40 lr 0.000087 wd 0.0500 time 0.2509 (0.2835) data time 0.0009 (0.0078) model time 0.2500 (0.2692) loss 5.3344 (5.5243) grad_norm 2.6235 (2.9362) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:06:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][70/625] eta 0:02:38 lr 0.000087 wd 0.0500 time 0.2550 (0.2854) data time 0.0007 (0.0068) model time 0.2543 (0.2826) loss 6.1740 (5.5443) grad_norm 5.0869 (3.0414) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:06:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][80/625] eta 0:02:36 lr 0.000087 wd 0.0500 time 0.2561 (0.2865) data time 0.0011 (0.0061) model time 0.2550 (0.2862) loss 6.3676 (5.5341) grad_norm 2.5181 (3.0423) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:06:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][90/625] eta 0:02:31 lr 0.000087 wd 0.0500 time 0.2551 (0.2832) data time 0.0007 (0.0055) model time 0.2544 (0.2785) loss 5.2436 (5.5537) grad_norm 2.5474 (3.0772) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:06:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][100/625] eta 0:02:27 lr 0.000086 wd 0.0500 time 0.2577 (0.2805) data time 0.0007 (0.0050) model time 0.2570 (0.2739) loss 5.7588 (5.5498) grad_norm 2.8906 (3.1197) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:06:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][110/625] eta 0:02:24 lr 0.000086 wd 0.0500 time 0.2566 (0.2801) data time 0.0007 (0.0047) model time 0.2558 (0.2741) loss 6.4703 (5.5409) grad_norm 4.0910 (3.2569) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:06:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][120/625] eta 0:02:20 lr 0.000086 wd 0.0500 time 0.2579 (0.2782) data time 0.0007 
(0.0044) model time 0.2572 (0.2715) loss 5.8725 (5.5413) grad_norm 2.6331 (3.2561) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:06:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][130/625] eta 0:02:16 lr 0.000086 wd 0.0500 time 0.2544 (0.2764) data time 0.0013 (0.0041) model time 0.2531 (0.2693) loss 5.4082 (5.5265) grad_norm 2.6678 (3.2882) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:06:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][140/625] eta 0:02:13 lr 0.000086 wd 0.0500 time 0.2509 (0.2760) data time 0.0010 (0.0039) model time 0.2499 (0.2694) loss 6.2125 (5.5480) grad_norm 5.8982 (3.3503) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:07:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][150/625] eta 0:02:10 lr 0.000086 wd 0.0500 time 0.2571 (0.2757) data time 0.0010 (0.0037) model time 0.2560 (0.2694) loss 5.2074 (5.5438) grad_norm 3.4097 (3.3703) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:07:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][160/625] eta 0:02:07 lr 0.000086 wd 0.0500 time 0.2537 (0.2744) data time 0.0006 (0.0035) model time 0.2531 (0.2680) loss 6.0590 (5.5394) grad_norm 3.4618 (3.4025) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:07:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][170/625] eta 0:02:04 lr 0.000086 wd 0.0500 time 0.2583 (0.2734) data time 0.0009 (0.0034) model time 0.2574 (0.2671) loss 6.0870 (5.5299) grad_norm 1.9052 (3.3772) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:07:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][180/625] eta 0:02:01 lr 0.000086 wd 0.0500 time 0.2559 (0.2724) data time 0.0007 (0.0032) model time 0.2552 (0.2661) loss 5.0869 (5.5327) grad_norm 2.3881 (3.3178) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:07:10 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][190/625] eta 0:01:58 lr 0.000086 wd 0.0500 time 0.2543 (0.2716) data time 0.0009 (0.0031) model time 0.2534 (0.2653) loss 5.6698 (5.5207) grad_norm 2.1900 (3.2769) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:07:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][200/625] eta 0:01:55 lr 0.000086 wd 0.0500 time 0.2656 (0.2710) data time 0.0007 (0.0030) model time 0.2649 (0.2649) loss 5.7385 (5.5195) grad_norm 4.1727 (3.3467) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:07:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][210/625] eta 0:01:52 lr 0.000086 wd 0.0500 time 0.2594 (0.2703) data time 0.0011 (0.0029) model time 0.2583 (0.2642) loss 5.0777 (5.5194) grad_norm 1.6213 (3.3098) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:07:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][220/625] eta 0:01:49 lr 0.000086 wd 0.0500 time 0.2591 (0.2696) data time 0.0007 (0.0028) model time 0.2584 (0.2637) loss 5.7894 (5.5281) grad_norm 2.9882 (3.2898) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:07:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][230/625] eta 0:01:46 lr 0.000086 wd 0.0500 time 0.2564 (0.2690) data time 0.0011 (0.0027) model time 0.2553 (0.2632) loss 5.2993 (5.5292) grad_norm 2.6505 (3.2698) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:07:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][240/625] eta 0:01:43 lr 0.000086 wd 0.0500 time 0.2581 (0.2692) data time 0.0007 (0.0027) model time 0.2575 (0.2638) loss 5.1569 (5.5469) grad_norm 3.3345 (3.2626) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:07:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][250/625] eta 0:01:40 lr 0.000085 wd 0.0500 time 0.2545 (0.2687) data time 0.0009 (0.0026) 
model time 0.2536 (0.2633) loss 5.8785 (5.5462) grad_norm 2.2919 (3.2639) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:07:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][260/625] eta 0:01:37 lr 0.000085 wd 0.0500 time 0.2610 (0.2682) data time 0.0010 (0.0025) model time 0.2601 (0.2629) loss 4.7671 (5.5409) grad_norm 3.9313 (3.2636) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:07:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][270/625] eta 0:01:35 lr 0.000085 wd 0.0500 time 0.2561 (0.2685) data time 0.0010 (0.0025) model time 0.2551 (0.2634) loss 6.1804 (5.5480) grad_norm 3.7603 (3.2686) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:07:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][280/625] eta 0:01:32 lr 0.000085 wd 0.0500 time 0.2554 (0.2680) data time 0.0007 (0.0024) model time 0.2547 (0.2630) loss 5.0023 (5.5522) grad_norm 2.3566 (3.2409) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:07:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][290/625] eta 0:01:29 lr 0.000085 wd 0.0500 time 0.2575 (0.2676) data time 0.0007 (0.0024) model time 0.2568 (0.2627) loss 5.1237 (5.5461) grad_norm 2.2451 (3.2064) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:07:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][300/625] eta 0:01:26 lr 0.000085 wd 0.0500 time 0.2565 (0.2672) data time 0.0006 (0.0023) model time 0.2559 (0.2624) loss 6.3673 (5.5452) grad_norm 2.1422 (3.1760) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:07:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][310/625] eta 0:01:24 lr 0.000085 wd 0.0500 time 0.2543 (0.2669) data time 0.0009 (0.0023) model time 0.2534 (0.2622) loss 4.7864 (5.5435) grad_norm 2.5129 (3.1559) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:07:44 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [267/300][320/625] eta 0:01:21 lr 0.000085 wd 0.0500 time 0.2553 (0.2666) data time 0.0007 (0.0023) model time 0.2546 (0.2619) loss 5.5805 (5.5431) grad_norm 1.6117 (3.1242) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:07:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][330/625] eta 0:01:18 lr 0.000085 wd 0.0500 time 0.2564 (0.2663) data time 0.0009 (0.0022) model time 0.2555 (0.2617) loss 5.7141 (5.5462) grad_norm 2.8294 (3.1143) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:07:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][340/625] eta 0:01:15 lr 0.000085 wd 0.0500 time 0.2552 (0.2663) data time 0.0006 (0.0022) model time 0.2545 (0.2619) loss 5.2234 (5.5361) grad_norm 2.3268 (3.1031) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:07:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][350/625] eta 0:01:13 lr 0.000085 wd 0.0500 time 0.2536 (0.2665) data time 0.0010 (0.0021) model time 0.2526 (0.2622) loss 5.8527 (5.5321) grad_norm 2.9503 (3.1079) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:07:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][360/625] eta 0:01:10 lr 0.000085 wd 0.0500 time 0.2544 (0.2668) data time 0.0012 (0.0021) model time 0.2531 (0.2627) loss 6.0930 (5.5332) grad_norm 3.0439 (3.1082) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:07:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][370/625] eta 0:01:07 lr 0.000085 wd 0.0500 time 0.2587 (0.2665) data time 0.0008 (0.0021) model time 0.2579 (0.2624) loss 5.0487 (5.5344) grad_norm 2.2576 (3.0967) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:08:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][380/625] eta 0:01:05 lr 0.000085 wd 0.0500 time 0.2533 (0.2662) data time 0.0008 (0.0020) model time 0.2525 (0.2622) 
loss 6.0919 (5.5372) grad_norm 2.5928 (3.0826) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:08:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][390/625] eta 0:01:02 lr 0.000085 wd 0.0500 time 0.2522 (0.2660) data time 0.0007 (0.0020) model time 0.2515 (0.2620) loss 5.9169 (5.5430) grad_norm 2.3362 (3.0669) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:08:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][400/625] eta 0:00:59 lr 0.000085 wd 0.0500 time 0.2543 (0.2658) data time 0.0008 (0.0020) model time 0.2536 (0.2618) loss 4.2361 (5.5468) grad_norm 2.9644 (3.0696) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:08:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][410/625] eta 0:00:57 lr 0.000084 wd 0.0500 time 0.2550 (0.2655) data time 0.0009 (0.0020) model time 0.2541 (0.2616) loss 6.3309 (5.5563) grad_norm 3.1902 (3.0683) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:08:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][420/625] eta 0:00:54 lr 0.000084 wd 0.0500 time 0.2574 (0.2663) data time 0.0008 (0.0019) model time 0.2566 (0.2626) loss 6.0775 (5.5593) grad_norm 2.6982 (3.0504) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:08:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][430/625] eta 0:00:51 lr 0.000084 wd 0.0500 time 0.2574 (0.2660) data time 0.0007 (0.0019) model time 0.2567 (0.2624) loss 5.1197 (5.5483) grad_norm 2.5565 (3.0444) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:08:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][440/625] eta 0:00:49 lr 0.000084 wd 0.0500 time 0.2538 (0.2663) data time 0.0010 (0.0019) model time 0.2528 (0.2627) loss 5.5642 (5.5506) grad_norm 2.2333 (3.0436) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:08:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [267/300][450/625] eta 0:00:46 lr 0.000084 wd 0.0500 time 0.2568 (0.2660) data time 0.0008 (0.0019) model time 0.2560 (0.2625) loss 6.3468 (5.5476) grad_norm 2.9616 (3.0312) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:08:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][460/625] eta 0:00:43 lr 0.000084 wd 0.0500 time 0.2581 (0.2659) data time 0.0008 (0.0019) model time 0.2573 (0.2624) loss 4.4355 (5.5509) grad_norm 1.7729 (3.0109) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:08:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][470/625] eta 0:00:41 lr 0.000084 wd 0.0500 time 0.2542 (0.2656) data time 0.0010 (0.0018) model time 0.2532 (0.2622) loss 4.8333 (5.5473) grad_norm 3.2445 (3.0114) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:08:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][480/625] eta 0:00:38 lr 0.000084 wd 0.0500 time 0.2560 (0.2654) data time 0.0008 (0.0018) model time 0.2552 (0.2620) loss 5.8040 (5.5500) grad_norm 44.9268 (3.0931) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:08:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][490/625] eta 0:00:35 lr 0.000084 wd 0.0500 time 0.2583 (0.2653) data time 0.0010 (0.0018) model time 0.2573 (0.2619) loss 4.6216 (5.5494) grad_norm 1.8823 (3.0753) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:08:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][500/625] eta 0:00:33 lr 0.000084 wd 0.0500 time 0.2544 (0.2651) data time 0.0011 (0.0018) model time 0.2534 (0.2617) loss 5.6079 (5.5527) grad_norm 1.8764 (3.0674) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:08:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][510/625] eta 0:00:30 lr 0.000084 wd 0.0500 time 0.2587 (0.2652) data time 0.0009 (0.0018) model time 0.2578 (0.2619) loss 5.4821 (5.5510) 
grad_norm 1.9105 (3.0741) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:08:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][520/625] eta 0:00:27 lr 0.000084 wd 0.0500 time 0.2534 (0.2650) data time 0.0008 (0.0017) model time 0.2526 (0.2617) loss 5.6366 (5.5519) grad_norm 2.7214 (3.1048) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:08:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][530/625] eta 0:00:25 lr 0.000084 wd 0.0500 time 0.2539 (0.2649) data time 0.0010 (0.0017) model time 0.2529 (0.2616) loss 6.4314 (5.5505) grad_norm 1.9632 (3.0906) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:08:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][540/625] eta 0:00:22 lr 0.000084 wd 0.0500 time 0.2538 (0.2647) data time 0.0009 (0.0017) model time 0.2529 (0.2615) loss 6.2494 (5.5515) grad_norm 1.9493 (3.0734) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:08:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][550/625] eta 0:00:19 lr 0.000084 wd 0.0500 time 0.2564 (0.2649) data time 0.0008 (0.0017) model time 0.2556 (0.2618) loss 4.1588 (5.5546) grad_norm 2.7323 (3.0711) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:08:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][560/625] eta 0:00:17 lr 0.000084 wd 0.0500 time 0.2531 (0.2651) data time 0.0007 (0.0017) model time 0.2524 (0.2620) loss 5.4197 (5.5523) grad_norm 2.2095 (3.0764) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:08:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][570/625] eta 0:00:14 lr 0.000083 wd 0.0500 time 0.2537 (0.2649) data time 0.0006 (0.0017) model time 0.2531 (0.2619) loss 4.8869 (5.5481) grad_norm 2.3941 (3.0657) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:08:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[267/300][580/625] eta 0:00:11 lr 0.000083 wd 0.0500 time 0.2575 (0.2648) data time 0.0009 (0.0017) model time 0.2566 (0.2618) loss 4.9827 (5.5476) grad_norm 2.0802 (3.0639) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:08:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][590/625] eta 0:00:09 lr 0.000083 wd 0.0500 time 0.2517 (0.2646) data time 0.0007 (0.0016) model time 0.2509 (0.2617) loss 5.6781 (5.5529) grad_norm 4.1207 (3.0648) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:08:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][600/625] eta 0:00:06 lr 0.000083 wd 0.0500 time 0.2590 (0.2645) data time 0.0006 (0.0016) model time 0.2584 (0.2616) loss 5.2337 (5.5569) grad_norm 3.8799 (3.0933) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:09:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][610/625] eta 0:00:03 lr 0.000083 wd 0.0500 time 0.2531 (0.2644) data time 0.0004 (0.0016) model time 0.2527 (0.2614) loss 4.9111 (5.5535) grad_norm 1.9436 (3.0935) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:09:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][620/625] eta 0:00:01 lr 0.000083 wd 0.0500 time 0.2525 (0.2642) data time 0.0005 (0.0016) model time 0.2519 (0.2613) loss 6.2050 (5.5557) grad_norm 2.3329 (3.0965) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:09:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 267 training takes 0:02:45 [2024-08-04 10:09:03 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 10:09:04 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 10:09:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.515 (0.515) Loss 0.6196 (0.6196) Acc@1 90.039 (90.039) Acc@5 98.828 (98.828) Mem 9655MB [2024-08-04 10:09:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.103) Loss 0.9282 (0.7339) Acc@1 82.178 (87.158) Acc@5 96.387 (97.812) Mem 9655MB [2024-08-04 10:09:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 1.0186 (0.8514) Acc@1 78.564 (84.047) Acc@5 95.703 (96.687) Mem 9655MB [2024-08-04 10:09:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.701 Acc@5 96.695 [2024-08-04 10:09:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-04 10:09:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.753 (0.753) Loss 0.5854 (0.5854) Acc@1 90.234 (90.234) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 10:09:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.127) Loss 0.8955 (0.7073) Acc@1 81.885 (87.061) Acc@5 96.387 (97.772) Mem 9655MB [2024-08-04 10:09:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0117 (0.8266) Acc@1 78.613 (83.952) Acc@5 95.654 (96.626) Mem 9655MB [2024-08-04 10:09:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.595 Acc@5 96.627 [2024-08-04 10:09:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-08-04 10:09:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.60% [2024-08-04 10:09:08 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 10:09:09 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 10:09:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][0/625] eta 0:07:14 lr 0.000083 wd 0.0500 time 0.6945 (0.6945) data time 0.4511 (0.4511) model time 0.0000 (0.0000) loss 5.6911 (5.6911) grad_norm 2.5198 (2.5198) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:09:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][10/625] eta 0:03:08 lr 0.000083 wd 0.0500 time 0.2544 (0.3068) data time 0.0008 (0.0418) model time 0.0000 (0.0000) loss 6.1315 (5.5819) grad_norm 2.1138 (2.5266) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:09:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][20/625] eta 0:02:51 lr 0.000083 wd 0.0500 time 0.2504 (0.2830) data time 0.0007 (0.0223) model time 0.0000 (0.0000) loss 6.6044 (5.4574) grad_norm inf (inf) loss_scale 128.0000 (249.9048) mem 9655MB [2024-08-04 10:09:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][30/625] eta 0:02:43 lr 0.000083 wd 0.0500 time 0.2597 (0.2742) data time 0.0006 (0.0154) model time 0.0000 (0.0000) loss 5.2117 (5.4715) grad_norm 2.9089 (inf) loss_scale 128.0000 (210.5806) mem 9655MB [2024-08-04 10:09:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][40/625] eta 0:02:39 lr 0.000083 wd 0.0500 time 0.2620 (0.2726) data time 0.0009 (0.0118) model time 0.0000 (0.0000) loss 5.5134 (5.4469) grad_norm 2.3009 (inf) loss_scale 128.0000 (190.4390) mem 9655MB [2024-08-04 10:09:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][50/625] eta 0:02:34 lr 0.000083 wd 0.0500 time 0.2553 (0.2692) data time 0.0006 (0.0097) model time 0.0000 (0.0000) loss 4.4282 (5.4179) grad_norm 2.0297 (inf) loss_scale 128.0000 (178.1961) mem 9655MB [2024-08-04 10:09:25 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][60/625] eta 0:02:32 lr 0.000083 wd 0.0500 time 0.2558 (0.2702) data time 0.0007 (0.0082) model time 0.2551 (0.2744) loss 4.5921 (5.4447) grad_norm 3.5500 (inf) loss_scale 128.0000 (169.9672) mem 9655MB [2024-08-04 10:09:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][70/625] eta 0:02:30 lr 0.000083 wd 0.0500 time 0.2512 (0.2706) data time 0.0009 (0.0072) model time 0.2503 (0.2732) loss 5.7294 (5.4195) grad_norm 2.9282 (inf) loss_scale 128.0000 (164.0563) mem 9655MB [2024-08-04 10:09:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][80/625] eta 0:02:26 lr 0.000083 wd 0.0500 time 0.2583 (0.2689) data time 0.0006 (0.0064) model time 0.2577 (0.2676) loss 4.5185 (5.3995) grad_norm 3.8191 (inf) loss_scale 128.0000 (159.6049) mem 9655MB [2024-08-04 10:09:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][90/625] eta 0:02:23 lr 0.000083 wd 0.0500 time 0.2536 (0.2676) data time 0.0010 (0.0058) model time 0.2526 (0.2646) loss 6.0403 (5.4154) grad_norm 2.2920 (inf) loss_scale 128.0000 (156.1319) mem 9655MB [2024-08-04 10:09:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][100/625] eta 0:02:19 lr 0.000083 wd 0.0500 time 0.2605 (0.2665) data time 0.0008 (0.0053) model time 0.2597 (0.2629) loss 6.2614 (5.4476) grad_norm 3.3217 (inf) loss_scale 128.0000 (153.3465) mem 9655MB [2024-08-04 10:09:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][110/625] eta 0:02:17 lr 0.000082 wd 0.0500 time 0.2585 (0.2673) data time 0.0008 (0.0049) model time 0.2578 (0.2649) loss 6.0742 (5.4486) grad_norm 2.2335 (inf) loss_scale 128.0000 (151.0631) mem 9655MB [2024-08-04 10:09:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][120/625] eta 0:02:15 lr 0.000082 wd 0.0500 time 0.2541 (0.2674) data time 0.0007 (0.0046) model time 0.2534 
(0.2652) loss 6.2592 (5.4648) grad_norm 2.7484 (inf) loss_scale 128.0000 (149.1570) mem 9655MB [2024-08-04 10:09:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][130/625] eta 0:02:12 lr 0.000082 wd 0.0500 time 0.2561 (0.2684) data time 0.0007 (0.0043) model time 0.2554 (0.2669) loss 6.3972 (5.4916) grad_norm 3.6975 (inf) loss_scale 128.0000 (147.5420) mem 9655MB [2024-08-04 10:09:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][140/625] eta 0:02:10 lr 0.000082 wd 0.0500 time 0.4523 (0.2688) data time 0.0007 (0.0041) model time 0.4517 (0.2677) loss 6.7165 (5.4965) grad_norm 2.3726 (inf) loss_scale 128.0000 (146.1560) mem 9655MB [2024-08-04 10:09:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][150/625] eta 0:02:07 lr 0.000082 wd 0.0500 time 0.2523 (0.2688) data time 0.0010 (0.0039) model time 0.2512 (0.2677) loss 5.0021 (5.5112) grad_norm 2.3562 (inf) loss_scale 128.0000 (144.9536) mem 9655MB [2024-08-04 10:09:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][160/625] eta 0:02:05 lr 0.000082 wd 0.0500 time 0.2561 (0.2693) data time 0.0007 (0.0037) model time 0.2554 (0.2684) loss 5.4489 (5.5038) grad_norm 1.6533 (inf) loss_scale 128.0000 (143.9006) mem 9655MB [2024-08-04 10:09:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][170/625] eta 0:02:02 lr 0.000082 wd 0.0500 time 0.2532 (0.2686) data time 0.0009 (0.0035) model time 0.2522 (0.2675) loss 5.5555 (5.5097) grad_norm 1.7508 (inf) loss_scale 128.0000 (142.9708) mem 9655MB [2024-08-04 10:09:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][180/625] eta 0:01:59 lr 0.000082 wd 0.0500 time 0.2565 (0.2691) data time 0.0007 (0.0034) model time 0.2559 (0.2682) loss 5.2066 (5.5122) grad_norm 2.5138 (inf) loss_scale 128.0000 (142.1436) mem 9655MB [2024-08-04 10:10:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[268/300][190/625] eta 0:01:56 lr 0.000082 wd 0.0500 time 0.2534 (0.2684) data time 0.0008 (0.0032) model time 0.2526 (0.2672) loss 5.3262 (5.5063) grad_norm 2.2869 (inf) loss_scale 128.0000 (141.4031) mem 9655MB [2024-08-04 10:10:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][200/625] eta 0:01:53 lr 0.000082 wd 0.0500 time 0.2606 (0.2678) data time 0.0008 (0.0031) model time 0.2598 (0.2665) loss 6.7898 (5.5389) grad_norm 4.3073 (inf) loss_scale 128.0000 (140.7363) mem 9655MB [2024-08-04 10:10:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][210/625] eta 0:01:51 lr 0.000082 wd 0.0500 time 0.2557 (0.2678) data time 0.0012 (0.0030) model time 0.2545 (0.2665) loss 5.2546 (5.5518) grad_norm 2.7445 (inf) loss_scale 128.0000 (140.1327) mem 9655MB [2024-08-04 10:10:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][220/625] eta 0:01:48 lr 0.000082 wd 0.0500 time 0.2503 (0.2673) data time 0.0008 (0.0029) model time 0.2494 (0.2659) loss 4.7516 (5.5471) grad_norm 2.8560 (inf) loss_scale 128.0000 (139.5837) mem 9655MB [2024-08-04 10:10:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][230/625] eta 0:01:45 lr 0.000082 wd 0.0500 time 0.2523 (0.2668) data time 0.0008 (0.0028) model time 0.2515 (0.2652) loss 6.6991 (5.5501) grad_norm 2.9514 (inf) loss_scale 128.0000 (139.0823) mem 9655MB [2024-08-04 10:10:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][240/625] eta 0:01:42 lr 0.000082 wd 0.0500 time 0.2569 (0.2671) data time 0.0012 (0.0028) model time 0.2557 (0.2657) loss 5.5770 (5.5518) grad_norm 1.9389 (inf) loss_scale 128.0000 (138.6224) mem 9655MB [2024-08-04 10:10:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][250/625] eta 0:01:39 lr 0.000082 wd 0.0500 time 0.2578 (0.2667) data time 0.0006 (0.0027) model time 0.2571 (0.2651) loss 4.6857 (5.5391) grad_norm 5.1700 (inf) loss_scale 
128.0000 (138.1992) mem 9655MB [2024-08-04 10:10:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][260/625] eta 0:01:37 lr 0.000082 wd 0.0500 time 0.2575 (0.2663) data time 0.0010 (0.0026) model time 0.2565 (0.2647) loss 6.1408 (5.5442) grad_norm 2.7662 (inf) loss_scale 128.0000 (137.8084) mem 9655MB [2024-08-04 10:10:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][270/625] eta 0:01:34 lr 0.000081 wd 0.0500 time 0.2564 (0.2659) data time 0.0010 (0.0026) model time 0.2554 (0.2643) loss 5.2121 (5.5449) grad_norm 2.2128 (inf) loss_scale 128.0000 (137.4465) mem 9655MB [2024-08-04 10:10:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][280/625] eta 0:01:31 lr 0.000081 wd 0.0500 time 0.2572 (0.2656) data time 0.0006 (0.0025) model time 0.2566 (0.2639) loss 6.1282 (5.5395) grad_norm 2.9674 (inf) loss_scale 128.0000 (137.1103) mem 9655MB [2024-08-04 10:10:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][290/625] eta 0:01:28 lr 0.000081 wd 0.0500 time 0.2541 (0.2653) data time 0.0007 (0.0024) model time 0.2534 (0.2636) loss 5.6366 (5.5397) grad_norm 3.3094 (inf) loss_scale 128.0000 (136.7973) mem 9655MB [2024-08-04 10:10:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][300/625] eta 0:01:26 lr 0.000081 wd 0.0500 time 0.2562 (0.2650) data time 0.0009 (0.0024) model time 0.2552 (0.2632) loss 5.8812 (5.5275) grad_norm 1.5756 (inf) loss_scale 128.0000 (136.5050) mem 9655MB [2024-08-04 10:10:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][310/625] eta 0:01:23 lr 0.000081 wd 0.0500 time 0.2536 (0.2653) data time 0.0008 (0.0023) model time 0.2528 (0.2637) loss 5.2724 (5.5306) grad_norm 3.5510 (inf) loss_scale 128.0000 (136.2315) mem 9655MB [2024-08-04 10:10:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][320/625] eta 0:01:20 lr 0.000081 wd 0.0500 time 0.2564 
(0.2650) data time 0.0009 (0.0023) model time 0.2555 (0.2634) loss 5.3500 (5.5310) grad_norm 1.8924 (inf) loss_scale 128.0000 (135.9751) mem 9655MB [2024-08-04 10:10:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][330/625] eta 0:01:18 lr 0.000081 wd 0.0500 time 0.2550 (0.2648) data time 0.0008 (0.0023) model time 0.2542 (0.2631) loss 5.5323 (5.5311) grad_norm 1.7828 (inf) loss_scale 128.0000 (135.7341) mem 9655MB [2024-08-04 10:10:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][340/625] eta 0:01:15 lr 0.000081 wd 0.0500 time 0.2551 (0.2651) data time 0.0006 (0.0022) model time 0.2544 (0.2635) loss 5.6281 (5.5272) grad_norm 2.6145 (inf) loss_scale 128.0000 (135.5073) mem 9655MB [2024-08-04 10:10:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][350/625] eta 0:01:12 lr 0.000081 wd 0.0500 time 0.2526 (0.2654) data time 0.0007 (0.0022) model time 0.2520 (0.2639) loss 4.7683 (5.5194) grad_norm 2.3700 (inf) loss_scale 128.0000 (135.2934) mem 9655MB [2024-08-04 10:10:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][360/625] eta 0:01:10 lr 0.000081 wd 0.0500 time 0.2582 (0.2658) data time 0.0009 (0.0021) model time 0.2573 (0.2643) loss 5.6595 (5.5288) grad_norm 2.3602 (inf) loss_scale 128.0000 (135.0914) mem 9655MB [2024-08-04 10:10:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][370/625] eta 0:01:07 lr 0.000081 wd 0.0500 time 0.2581 (0.2660) data time 0.0006 (0.0021) model time 0.2575 (0.2646) loss 5.1097 (5.5238) grad_norm 3.1634 (inf) loss_scale 128.0000 (134.9003) mem 9655MB [2024-08-04 10:10:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][380/625] eta 0:01:05 lr 0.000081 wd 0.0500 time 0.2597 (0.2658) data time 0.0009 (0.0021) model time 0.2588 (0.2644) loss 5.8140 (5.5209) grad_norm 1.9515 (inf) loss_scale 128.0000 (134.7192) mem 9655MB [2024-08-04 10:10:52 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][390/625] eta 0:01:02 lr 0.000081 wd 0.0500 time 0.2547 (0.2656) data time 0.0008 (0.0021) model time 0.2539 (0.2641) loss 6.4951 (5.5292) grad_norm 2.8102 (inf) loss_scale 128.0000 (134.5473) mem 9655MB [2024-08-04 10:10:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][400/625] eta 0:00:59 lr 0.000081 wd 0.0500 time 0.2573 (0.2653) data time 0.0008 (0.0020) model time 0.2565 (0.2638) loss 5.1296 (5.5305) grad_norm 2.1296 (inf) loss_scale 128.0000 (134.3840) mem 9655MB [2024-08-04 10:10:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][410/625] eta 0:00:57 lr 0.000081 wd 0.0500 time 0.4344 (0.2655) data time 0.0009 (0.0020) model time 0.4336 (0.2641) loss 5.2693 (5.5263) grad_norm 2.4720 (inf) loss_scale 128.0000 (134.2287) mem 9655MB [2024-08-04 10:11:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][420/625] eta 0:00:54 lr 0.000081 wd 0.0500 time 0.2566 (0.2653) data time 0.0010 (0.0020) model time 0.2556 (0.2638) loss 6.0628 (5.5341) grad_norm 2.7332 (inf) loss_scale 128.0000 (134.0808) mem 9655MB [2024-08-04 10:11:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][430/625] eta 0:00:51 lr 0.000080 wd 0.0500 time 0.2588 (0.2651) data time 0.0006 (0.0020) model time 0.2583 (0.2636) loss 5.0230 (5.5353) grad_norm 1.7680 (inf) loss_scale 128.0000 (133.9397) mem 9655MB [2024-08-04 10:11:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][440/625] eta 0:00:49 lr 0.000080 wd 0.0500 time 0.2620 (0.2649) data time 0.0008 (0.0019) model time 0.2611 (0.2634) loss 5.8162 (5.5325) grad_norm 2.2982 (inf) loss_scale 128.0000 (133.8050) mem 9655MB [2024-08-04 10:11:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][450/625] eta 0:00:46 lr 0.000080 wd 0.0500 time 0.2529 (0.2647) data time 0.0007 (0.0019) model time 0.2522 
(0.2632) loss 6.2075 (5.5420) grad_norm 2.6823 (inf) loss_scale 128.0000 (133.6763) mem 9655MB [2024-08-04 10:11:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][460/625] eta 0:00:43 lr 0.000080 wd 0.0500 time 0.2551 (0.2645) data time 0.0008 (0.0019) model time 0.2543 (0.2630) loss 6.5486 (5.5382) grad_norm 1.7393 (inf) loss_scale 128.0000 (133.5531) mem 9655MB [2024-08-04 10:11:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][470/625] eta 0:00:40 lr 0.000080 wd 0.0500 time 0.2565 (0.2643) data time 0.0006 (0.0019) model time 0.2559 (0.2628) loss 5.3676 (5.5359) grad_norm 5.0730 (inf) loss_scale 128.0000 (133.4352) mem 9655MB [2024-08-04 10:11:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][480/625] eta 0:00:38 lr 0.000080 wd 0.0500 time 0.2542 (0.2645) data time 0.0008 (0.0019) model time 0.2534 (0.2631) loss 5.1058 (5.5324) grad_norm 1.9704 (inf) loss_scale 128.0000 (133.3222) mem 9655MB [2024-08-04 10:11:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][490/625] eta 0:00:35 lr 0.000080 wd 0.0500 time 0.2559 (0.2644) data time 0.0011 (0.0018) model time 0.2548 (0.2629) loss 5.0780 (5.5306) grad_norm 2.8625 (inf) loss_scale 128.0000 (133.2138) mem 9655MB [2024-08-04 10:11:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][500/625] eta 0:00:33 lr 0.000080 wd 0.0500 time 0.2576 (0.2642) data time 0.0008 (0.0018) model time 0.2568 (0.2627) loss 6.0041 (5.5320) grad_norm 3.2205 (inf) loss_scale 128.0000 (133.1098) mem 9655MB [2024-08-04 10:11:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][510/625] eta 0:00:30 lr 0.000080 wd 0.0500 time 0.2556 (0.2641) data time 0.0007 (0.0018) model time 0.2549 (0.2626) loss 5.6522 (5.5313) grad_norm 4.7901 (inf) loss_scale 128.0000 (133.0098) mem 9655MB [2024-08-04 10:11:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[268/300][520/625] eta 0:00:27 lr 0.000080 wd 0.0500 time 0.2571 (0.2639) data time 0.0017 (0.0018) model time 0.2554 (0.2624) loss 4.9537 (5.5299) grad_norm 2.8571 (inf) loss_scale 128.0000 (132.9136) mem 9655MB [2024-08-04 10:11:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][530/625] eta 0:00:25 lr 0.000080 wd 0.0500 time 0.4207 (0.2645) data time 0.0009 (0.0018) model time 0.4198 (0.2631) loss 4.7896 (5.5299) grad_norm 3.6977 (inf) loss_scale 128.0000 (132.8211) mem 9655MB [2024-08-04 10:11:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][540/625] eta 0:00:22 lr 0.000080 wd 0.0500 time 0.2595 (0.2644) data time 0.0009 (0.0018) model time 0.2586 (0.2629) loss 5.7957 (5.5235) grad_norm 1.9512 (inf) loss_scale 128.0000 (132.7320) mem 9655MB [2024-08-04 10:11:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][550/625] eta 0:00:19 lr 0.000080 wd 0.0500 time 0.2573 (0.2642) data time 0.0008 (0.0017) model time 0.2565 (0.2628) loss 5.0529 (5.5183) grad_norm 3.5106 (inf) loss_scale 128.0000 (132.6461) mem 9655MB [2024-08-04 10:11:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][560/625] eta 0:00:17 lr 0.000080 wd 0.0500 time 0.2606 (0.2641) data time 0.0008 (0.0017) model time 0.2598 (0.2626) loss 5.4580 (5.5136) grad_norm 1.8033 (inf) loss_scale 128.0000 (132.5633) mem 9655MB [2024-08-04 10:11:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][570/625] eta 0:00:14 lr 0.000080 wd 0.0500 time 0.2571 (0.2639) data time 0.0009 (0.0017) model time 0.2562 (0.2625) loss 4.1968 (5.5144) grad_norm 3.1648 (inf) loss_scale 128.0000 (132.4834) mem 9655MB [2024-08-04 10:11:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][580/625] eta 0:00:11 lr 0.000080 wd 0.0500 time 0.2542 (0.2638) data time 0.0009 (0.0017) model time 0.2533 (0.2623) loss 5.8178 (5.5151) grad_norm 1.7750 (inf) loss_scale 
128.0000 (132.4062) mem 9655MB [2024-08-04 10:11:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][590/625] eta 0:00:09 lr 0.000079 wd 0.0500 time 0.2546 (0.2637) data time 0.0008 (0.0017) model time 0.2539 (0.2622) loss 4.7776 (5.5065) grad_norm 1.6661 (inf) loss_scale 128.0000 (132.3316) mem 9655MB [2024-08-04 10:11:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][600/625] eta 0:00:06 lr 0.000079 wd 0.0500 time 0.2588 (0.2636) data time 0.0008 (0.0017) model time 0.2580 (0.2621) loss 5.6815 (5.5101) grad_norm 1.6461 (inf) loss_scale 128.0000 (132.2596) mem 9655MB [2024-08-04 10:11:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][610/625] eta 0:00:03 lr 0.000079 wd 0.0500 time 0.2523 (0.2634) data time 0.0004 (0.0017) model time 0.2518 (0.2620) loss 4.7562 (5.5134) grad_norm 3.4393 (inf) loss_scale 128.0000 (132.1899) mem 9655MB [2024-08-04 10:11:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][620/625] eta 0:00:01 lr 0.000079 wd 0.0500 time 0.2537 (0.2633) data time 0.0003 (0.0017) model time 0.2533 (0.2618) loss 4.9842 (5.5130) grad_norm 2.4804 (inf) loss_scale 128.0000 (132.1224) mem 9655MB [2024-08-04 10:11:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 268 training takes 0:02:44 [2024-08-04 10:11:53 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 10:11:54 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 10:11:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.458 (0.458) Loss 0.5991 (0.5991) Acc@1 90.576 (90.576) Acc@5 98.828 (98.828) Mem 9655MB
[2024-08-04 10:11:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.094) Loss 0.8994 (0.7132) Acc@1 81.738 (87.189) Acc@5 96.777 (97.896) Mem 9655MB
[2024-08-04 10:11:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.075) Loss 1.0039 (0.8325) Acc@1 78.564 (84.124) Acc@5 95.605 (96.749) Mem 9655MB
[2024-08-04 10:11:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.757 Acc@5 96.769
[2024-08-04 10:11:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8%
[2024-08-04 10:11:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.716 (0.716) Loss 0.5859 (0.5859) Acc@1 90.283 (90.283) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 10:11:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.122) Loss 0.8960 (0.7073) Acc@1 81.934 (87.074) Acc@5 96.436 (97.785) Mem 9655MB
[2024-08-04 10:11:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.090) Loss 1.0107 (0.8264) Acc@1 78.760 (83.991) Acc@5 95.605 (96.624) Mem 9655MB
[2024-08-04 10:11:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.625 Acc@5 96.623
[2024-08-04 10:11:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.6%
[2024-08-04 10:11:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.63%
[2024-08-04 10:11:58 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 10:11:58 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 10:11:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][0/625] eta 0:07:49 lr 0.000079 wd 0.0500 time 0.7508 (0.7508) data time 0.5129 (0.5129) model time 0.0000 (0.0000) loss 5.3753 (5.3753) grad_norm 2.8512 (2.8512) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:12:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][10/625] eta 0:03:11 lr 0.000079 wd 0.0500 time 0.2521 (0.3109) data time 0.0009 (0.0474) model time 0.0000 (0.0000) loss 4.7482 (5.3721) grad_norm 1.8166 (2.3342) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:12:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][20/625] eta 0:02:52 lr 0.000079 wd 0.0500 time 0.2566 (0.2849) data time 0.0007 (0.0253) model time 0.0000 (0.0000) loss 6.2639 (5.6805) grad_norm 1.8850 (2.2823) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:12:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][30/625] eta 0:02:47 lr 0.000079 wd 0.0500 time 0.2574 (0.2809) data time 0.0008 (0.0174) model time 0.0000 (0.0000) loss 4.6997 (5.5877) grad_norm 1.7336 (2.3830) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:12:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][40/625] eta 0:02:43 lr 0.000079 wd 0.0500 time 0.2539 (0.2792) data time 0.0010 (0.0134) model time 0.0000 (0.0000) loss 5.9466 (5.5388) grad_norm 2.3304 (2.7738) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:12:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][50/625] eta 0:02:37 lr 0.000079 wd 0.0500 time 0.2560 (0.2747) data time 0.0010 (0.0110) model time 0.0000 (0.0000) loss 6.3964 (5.5976) grad_norm 2.7123 (2.7505) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 
10:12:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][60/625] eta 0:02:33 lr 0.000079 wd 0.0500 time 0.2559 (0.2718) data time 0.0006 (0.0093) model time 0.2553 (0.2561) loss 5.8419 (5.5918) grad_norm 2.5559 (2.6845) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:12:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][70/625] eta 0:02:29 lr 0.000079 wd 0.0500 time 0.2583 (0.2695) data time 0.0008 (0.0081) model time 0.2575 (0.2555) loss 5.6459 (5.5878) grad_norm 4.1246 (2.6613) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:12:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][80/625] eta 0:02:26 lr 0.000079 wd 0.0500 time 0.2549 (0.2680) data time 0.0012 (0.0073) model time 0.2537 (0.2557) loss 5.1839 (5.5404) grad_norm 1.9726 (2.6741) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:12:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][90/625] eta 0:02:23 lr 0.000079 wd 0.0500 time 0.2579 (0.2689) data time 0.0007 (0.0066) model time 0.2572 (0.2606) loss 6.8331 (5.5628) grad_norm 3.3203 (2.6713) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:12:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][100/625] eta 0:02:20 lr 0.000079 wd 0.0500 time 0.2616 (0.2677) data time 0.0006 (0.0060) model time 0.2611 (0.2597) loss 6.1570 (5.5888) grad_norm 2.7161 (2.6411) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:12:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][110/625] eta 0:02:19 lr 0.000079 wd 0.0500 time 0.2569 (0.2699) data time 0.0009 (0.0055) model time 0.2559 (0.2650) loss 5.1775 (5.5875) grad_norm 2.6991 (2.6216) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:12:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][120/625] eta 0:02:16 lr 0.000079 wd 0.0500 time 0.2583 (0.2704) data time 0.0008 
(0.0051) model time 0.2575 (0.2664) loss 4.7401 (5.5731) grad_norm 3.1019 (2.6748) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:12:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][130/625] eta 0:02:13 lr 0.000078 wd 0.0500 time 0.2502 (0.2693) data time 0.0008 (0.0048) model time 0.2494 (0.2650) loss 5.7662 (5.5698) grad_norm 1.8252 (2.6910) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:12:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][140/625] eta 0:02:10 lr 0.000078 wd 0.0500 time 0.2540 (0.2684) data time 0.0013 (0.0046) model time 0.2527 (0.2639) loss 4.8527 (5.5508) grad_norm 2.0037 (2.7501) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:12:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][150/625] eta 0:02:07 lr 0.000078 wd 0.0500 time 0.2577 (0.2677) data time 0.0009 (0.0043) model time 0.2568 (0.2631) loss 4.6068 (5.5472) grad_norm 2.3382 (2.7213) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:12:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][160/625] eta 0:02:04 lr 0.000078 wd 0.0500 time 0.2597 (0.2680) data time 0.0008 (0.0042) model time 0.2588 (0.2638) loss 5.4861 (5.5409) grad_norm 2.2916 (2.7060) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:12:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][170/625] eta 0:02:01 lr 0.000078 wd 0.0500 time 0.2541 (0.2673) data time 0.0008 (0.0040) model time 0.2533 (0.2631) loss 6.0068 (5.5299) grad_norm 2.4446 (2.6974) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:12:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][180/625] eta 0:01:58 lr 0.000078 wd 0.0500 time 0.2559 (0.2667) data time 0.0009 (0.0038) model time 0.2550 (0.2625) loss 5.8294 (5.5351) grad_norm 2.2061 (2.7039) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:12:49 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][190/625] eta 0:01:56 lr 0.000078 wd 0.0500 time 0.2562 (0.2667) data time 0.0010 (0.0037) model time 0.2552 (0.2627) loss 5.2645 (5.5337) grad_norm 2.5181 (2.7483) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:12:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][200/625] eta 0:01:53 lr 0.000078 wd 0.0500 time 0.2578 (0.2670) data time 0.0006 (0.0036) model time 0.2572 (0.2633) loss 6.2224 (5.5352) grad_norm 2.6404 (2.7451) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:12:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][210/625] eta 0:01:50 lr 0.000078 wd 0.0500 time 0.2552 (0.2665) data time 0.0010 (0.0034) model time 0.2542 (0.2628) loss 5.5860 (5.5217) grad_norm 2.5287 (2.7409) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:12:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][220/625] eta 0:01:47 lr 0.000078 wd 0.0500 time 0.2550 (0.2660) data time 0.0009 (0.0033) model time 0.2541 (0.2624) loss 5.4361 (5.5361) grad_norm 4.6664 (2.8163) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:12:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][230/625] eta 0:01:45 lr 0.000078 wd 0.0500 time 0.2583 (0.2661) data time 0.0006 (0.0032) model time 0.2577 (0.2626) loss 4.7229 (5.5450) grad_norm 3.1478 (2.8138) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:13:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][240/625] eta 0:01:42 lr 0.000078 wd 0.0500 time 0.4578 (0.2665) data time 0.0011 (0.0031) model time 0.4568 (0.2633) loss 6.3156 (5.5509) grad_norm 2.3356 (2.8039) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:13:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][250/625] eta 0:01:39 lr 0.000078 wd 0.0500 time 0.2537 (0.2661) data time 0.0008 (0.0030) 
model time 0.2529 (0.2629) loss 6.4591 (5.5516) grad_norm 2.3097 (2.7870) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:13:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][260/625] eta 0:01:37 lr 0.000078 wd 0.0500 time 0.2545 (0.2664) data time 0.0007 (0.0029) model time 0.2537 (0.2634) loss 6.1604 (5.5528) grad_norm 2.7189 (2.8648) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:13:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][270/625] eta 0:01:34 lr 0.000078 wd 0.0500 time 0.2563 (0.2660) data time 0.0008 (0.0029) model time 0.2556 (0.2630) loss 4.6123 (5.5397) grad_norm 1.8395 (2.8511) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:13:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][280/625] eta 0:01:31 lr 0.000078 wd 0.0500 time 0.2655 (0.2657) data time 0.0007 (0.0028) model time 0.2648 (0.2627) loss 5.3486 (5.5335) grad_norm 2.1649 (2.8579) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:13:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][290/625] eta 0:01:28 lr 0.000078 wd 0.0500 time 0.2528 (0.2654) data time 0.0009 (0.0027) model time 0.2519 (0.2624) loss 6.0503 (5.5360) grad_norm 3.0898 (2.8484) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:13:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][300/625] eta 0:01:26 lr 0.000077 wd 0.0500 time 0.4289 (0.2656) data time 0.0011 (0.0027) model time 0.4278 (0.2628) loss 5.4367 (5.5283) grad_norm 3.0410 (2.8795) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:13:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][310/625] eta 0:01:23 lr 0.000077 wd 0.0500 time 0.2543 (0.2653) data time 0.0009 (0.0026) model time 0.2534 (0.2625) loss 5.9692 (5.5380) grad_norm 2.7882 (2.8664) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:13:23 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [269/300][320/625] eta 0:01:20 lr 0.000077 wd 0.0500 time 0.2560 (0.2651) data time 0.0007 (0.0026) model time 0.2553 (0.2623) loss 5.3938 (5.5431) grad_norm 3.3456 (2.8612) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:13:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][330/625] eta 0:01:18 lr 0.000077 wd 0.0500 time 0.2543 (0.2654) data time 0.0012 (0.0025) model time 0.2531 (0.2627) loss 5.2400 (5.5368) grad_norm 1.9616 (2.8350) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:13:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][340/625] eta 0:01:15 lr 0.000077 wd 0.0500 time 0.2549 (0.2651) data time 0.0008 (0.0025) model time 0.2541 (0.2625) loss 5.8931 (5.5319) grad_norm 4.1663 (2.8162) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:13:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][350/625] eta 0:01:12 lr 0.000077 wd 0.0500 time 0.2555 (0.2654) data time 0.0008 (0.0024) model time 0.2547 (0.2629) loss 6.1572 (5.5326) grad_norm 3.1102 (2.8228) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:13:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][360/625] eta 0:01:10 lr 0.000077 wd 0.0500 time 0.2553 (0.2657) data time 0.0006 (0.0024) model time 0.2547 (0.2633) loss 6.5277 (5.5389) grad_norm 3.9282 (2.8191) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:13:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][370/625] eta 0:01:07 lr 0.000077 wd 0.0500 time 0.2529 (0.2654) data time 0.0008 (0.0023) model time 0.2522 (0.2630) loss 4.7978 (5.5359) grad_norm 2.4645 (2.8111) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:13:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][380/625] eta 0:01:04 lr 0.000077 wd 0.0500 time 0.2572 (0.2652) data time 0.0006 (0.0023) model time 0.2565 (0.2627) 
loss 6.1594 (5.5396) grad_norm 2.6685 (2.8193) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:13:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][390/625] eta 0:01:02 lr 0.000077 wd 0.0500 time 0.2542 (0.2649) data time 0.0009 (0.0023) model time 0.2533 (0.2625) loss 5.3595 (5.5387) grad_norm 2.1987 (2.8132) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:13:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][400/625] eta 0:00:59 lr 0.000077 wd 0.0500 time 0.2568 (0.2652) data time 0.0008 (0.0022) model time 0.2560 (0.2628) loss 6.2485 (5.5388) grad_norm 2.4425 (2.8108) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:13:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][410/625] eta 0:00:56 lr 0.000077 wd 0.0500 time 0.2523 (0.2649) data time 0.0010 (0.0022) model time 0.2513 (0.2626) loss 5.1752 (5.5432) grad_norm 2.0863 (2.8071) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:13:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][420/625] eta 0:00:54 lr 0.000077 wd 0.0500 time 0.2571 (0.2652) data time 0.0006 (0.0022) model time 0.2564 (0.2630) loss 6.0430 (5.5456) grad_norm 3.0857 (2.8112) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:13:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][430/625] eta 0:00:51 lr 0.000077 wd 0.0500 time 0.2563 (0.2650) data time 0.0006 (0.0021) model time 0.2556 (0.2628) loss 5.8656 (5.5409) grad_norm 3.1770 (2.8077) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:13:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][440/625] eta 0:00:49 lr 0.000077 wd 0.0500 time 0.2538 (0.2649) data time 0.0009 (0.0021) model time 0.2529 (0.2627) loss 6.0762 (5.5372) grad_norm 2.4757 (2.8281) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:13:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [269/300][450/625] eta 0:00:46 lr 0.000077 wd 0.0500 time 0.2583 (0.2647) data time 0.0008 (0.0021) model time 0.2575 (0.2625) loss 5.3153 (5.5368) grad_norm 2.5785 (2.8396) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:14:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][460/625] eta 0:00:43 lr 0.000077 wd 0.0500 time 0.2573 (0.2649) data time 0.0007 (0.0021) model time 0.2566 (0.2627) loss 5.1667 (5.5320) grad_norm 3.8892 (2.8642) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:14:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][470/625] eta 0:00:41 lr 0.000076 wd 0.0500 time 0.2553 (0.2651) data time 0.0007 (0.0020) model time 0.2546 (0.2630) loss 5.5670 (5.5269) grad_norm 5.0948 (2.8748) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:14:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][480/625] eta 0:00:38 lr 0.000076 wd 0.0500 time 0.2555 (0.2649) data time 0.0007 (0.0020) model time 0.2547 (0.2628) loss 4.8969 (5.5251) grad_norm 2.6019 (2.8633) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:14:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][490/625] eta 0:00:35 lr 0.000076 wd 0.0500 time 0.2541 (0.2647) data time 0.0006 (0.0020) model time 0.2534 (0.2626) loss 5.1639 (5.5252) grad_norm 3.2340 (2.8595) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:14:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][500/625] eta 0:00:33 lr 0.000076 wd 0.0500 time 0.2570 (0.2646) data time 0.0007 (0.0020) model time 0.2563 (0.2625) loss 5.7055 (5.5203) grad_norm 2.6402 (2.8746) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:14:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][510/625] eta 0:00:30 lr 0.000076 wd 0.0500 time 0.2607 (0.2645) data time 0.0006 (0.0019) model time 0.2601 (0.2624) loss 6.3802 (5.5261) grad_norm 
2.8186 (2.8831) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:14:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][520/625] eta 0:00:27 lr 0.000076 wd 0.0500 time 0.2576 (0.2646) data time 0.0009 (0.0019) model time 0.2567 (0.2625) loss 5.1593 (5.5292) grad_norm 2.1617 (2.8896) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:14:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][530/625] eta 0:00:25 lr 0.000076 wd 0.0500 time 0.2527 (0.2644) data time 0.0008 (0.0019) model time 0.2519 (0.2624) loss 5.7477 (5.5289) grad_norm 4.5648 (2.8802) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:14:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][540/625] eta 0:00:22 lr 0.000076 wd 0.0500 time 0.2539 (0.2643) data time 0.0008 (0.0019) model time 0.2531 (0.2623) loss 5.9137 (5.5267) grad_norm 3.2383 (2.8804) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:14:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][550/625] eta 0:00:19 lr 0.000076 wd 0.0500 time 0.2523 (0.2641) data time 0.0006 (0.0019) model time 0.2516 (0.2621) loss 5.6029 (5.5234) grad_norm 3.5184 (2.8861) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:14:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][560/625] eta 0:00:17 lr 0.000076 wd 0.0500 time 0.2534 (0.2640) data time 0.0008 (0.0018) model time 0.2526 (0.2620) loss 4.3169 (5.5214) grad_norm 1.8602 (2.8838) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:14:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][570/625] eta 0:00:14 lr 0.000076 wd 0.0500 time 0.2638 (0.2641) data time 0.0006 (0.0018) model time 0.2632 (0.2622) loss 5.6005 (5.5218) grad_norm 2.9442 (2.9041) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:14:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][580/625] eta 
0:00:11 lr 0.000076 wd 0.0500 time 0.2542 (0.2640) data time 0.0008 (0.0018) model time 0.2534 (0.2620) loss 5.8151 (5.5187) grad_norm 2.1070 (2.8927) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:14:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][590/625] eta 0:00:09 lr 0.000076 wd 0.0500 time 0.4723 (0.2642) data time 0.0007 (0.0018) model time 0.4716 (0.2623) loss 5.1903 (5.5146) grad_norm 2.6628 (2.8957) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:14:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][600/625] eta 0:00:06 lr 0.000076 wd 0.0500 time 0.2543 (0.2641) data time 0.0009 (0.0018) model time 0.2534 (0.2622) loss 5.4048 (5.5143) grad_norm 2.5397 (2.8947) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:14:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][610/625] eta 0:00:03 lr 0.000076 wd 0.0500 time 0.2527 (0.2640) data time 0.0003 (0.0018) model time 0.2524 (0.2621) loss 5.8773 (5.5143) grad_norm 2.9002 (2.8891) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:14:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][620/625] eta 0:00:01 lr 0.000076 wd 0.0500 time 0.2515 (0.2638) data time 0.0005 (0.0018) model time 0.2510 (0.2619) loss 5.0993 (5.5172) grad_norm 3.6435 (2.8841) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:14:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 269 training takes 0:02:44 [2024-08-04 10:14:43 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 10:14:43 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 10:14:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.475 (0.475) Loss 0.5957 (0.5957) Acc@1 90.723 (90.723) Acc@5 98.730 (98.730) Mem 9655MB
[2024-08-04 10:14:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 0.9019 (0.7117) Acc@1 81.592 (87.336) Acc@5 96.777 (97.843) Mem 9655MB
[2024-08-04 10:14:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0020 (0.8309) Acc@1 79.248 (84.224) Acc@5 95.898 (96.722) Mem 9655MB
[2024-08-04 10:14:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.879 Acc@5 96.743
[2024-08-04 10:14:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.9%
[2024-08-04 10:14:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.88%
[2024-08-04 10:14:45 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-08-04 10:14:46 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-08-04 10:14:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.519 (0.519) Loss 0.5859 (0.5859) Acc@1 90.332 (90.332) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 10:14:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 0.8955 (0.7066) Acc@1 81.787 (87.052) Acc@5 96.436 (97.789) Mem 9655MB
[2024-08-04 10:14:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0088 (0.8258) Acc@1 78.711 (83.977) Acc@5 95.605 (96.629) Mem 9655MB
[2024-08-04 10:14:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.617 Acc@5 96.635
[2024-08-04 10:14:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.6%
[2024-08-04 10:14:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][0/625] eta 0:10:45 lr 0.000076 wd 0.0500 time 1.0327 (1.0327) data time 0.5498 (0.5498) model time 0.0000 (0.0000) loss 4.5218 (4.5218) grad_norm 2.8879 (2.8879) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:14:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][10/625] eta 0:03:20 lr 0.000075 wd 0.0500 time 0.2533 (0.3258) data time 0.0006 (0.0508) model time 0.0000 (0.0000) loss 5.1587 (5.2466) grad_norm 2.4191 (3.3162) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:14:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][20/625] eta 0:03:02 lr 0.000075 wd 0.0500 time 0.2552 (0.3019) data time 0.0008 (0.0270) model time 0.0000 (0.0000) loss 5.9733 (5.4689) grad_norm 1.9401 (3.6460) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:14:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][30/625] eta 0:02:54 lr 0.000075 wd 0.0500 time 0.2581 (0.2929) data time 0.0005 (0.0186) model time 0.0000 (0.0000) loss 4.9329 (5.4880) grad_norm 2.3805 (3.1903) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:14:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][40/625] eta 0:02:48 lr 0.000075 wd 0.0500 time 0.2545 (0.2887) data time 0.0010 (0.0143) model time 0.0000 (0.0000) loss 4.7874 (5.5332) grad_norm 2.8530 (3.0529) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:15:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][50/625] eta 0:02:44 lr 0.000075 wd 0.0500 time 0.2572 (0.2866) data time 0.0006 (0.0117) model time 0.0000 (0.0000) loss 5.4612 (5.4770) grad_norm 3.5669 (3.1548) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:15:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][60/625] eta 0:02:39 lr 0.000075 wd 0.0500 time 0.2528 (0.2819) data time 0.0007 (0.0099) model time 0.2521 (0.2574) loss 4.6638 (5.4490) grad_norm 5.6728 (3.1058) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:15:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][70/625] eta 0:02:34 lr 0.000075 wd 0.0500 time 0.2551 (0.2782) data time 0.0008 (0.0086) model time 0.2543 (0.2560) loss 5.2894 (5.4422) grad_norm 2.1345 (2.9650) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:15:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][80/625] eta 0:02:30 lr 0.000075 wd 0.0500 time 0.2538 (0.2753) data time 0.0007 (0.0077) model time 0.2531 (0.2552) loss 5.1450 (5.4274) grad_norm 2.7358 (2.9682) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:15:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][90/625] eta 0:02:26 lr 0.000075 wd 0.0500 time 0.2557 (0.2730) data time 0.0007 (0.0070) model time 0.2550 (0.2548) loss 5.9815 (5.4501) grad_norm 2.5843 (2.9585) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:15:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][100/625] eta 0:02:24 lr 0.000075 wd 0.0500 time 0.2556 (0.2751) data time 0.0008 (0.0064) model time 0.2549 (0.2625) loss 5.3204 (5.4539) grad_norm 2.5022 (3.0594) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:15:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][110/625] eta 0:02:20 lr 0.000075 wd 0.0500 time 0.2521 (0.2734) data time 0.0007 (0.0059) model time 0.2514 (0.2612) loss 5.3141 (5.4590) grad_norm 2.4857 (3.0241) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:15:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][120/625] eta 0:02:18 lr 0.000075 wd 0.0500 time 0.2592 (0.2735) data time 0.0008 (0.0054) model time 0.2584 (0.2631) loss 6.2996 (5.4518) grad_norm 3.0583 (2.9679) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:15:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][130/625] eta 0:02:14 lr 0.000075 wd 0.0500 time 0.2585 (0.2722) data time 0.0006 (0.0051) model time 0.2579 (0.2620) loss 5.3251 (5.4602) grad_norm 2.1629 (2.9146) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:15:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][140/625] eta 0:02:11 lr 0.000075 wd 0.0500 time 0.2562 (0.2712) data time 0.0011 (0.0048) model time 0.2551 (0.2616) loss 5.6109 (5.4484) grad_norm 2.3914 (2.8836) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:15:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][150/625] eta 0:02:08 lr 0.000075 wd 0.0500 time 0.2499 (0.2702) data time 0.0007 (0.0046) model time 0.2492 (0.2609) loss 5.5281 (5.4416) grad_norm 1.6700 (2.8714) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:15:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][160/625] eta 0:02:05 lr 0.000075 wd 0.0500 time 0.2609 (0.2693) data time 0.0007 (0.0043) model time 0.2601 (0.2604) loss 5.5863 (5.4511) grad_norm 2.6330 (2.8751) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:15:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][170/625] eta 0:02:02 lr 0.000075 wd 0.0500 time 0.2514 (0.2685) data time 0.0011 (0.0041) model time 0.2503 (0.2599) loss 4.6546 (5.4553) grad_norm 6.3275 (2.8649) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:15:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][180/625] eta 0:01:59 lr 0.000075 wd 0.0500 time 0.2538 (0.2678) data time 0.0008 (0.0040) model time 0.2530 (0.2595) loss 6.5895 (5.4533) grad_norm 4.3082 (2.8922) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:15:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][190/625] eta 0:01:56 lr 0.000074 wd 0.0500 time 0.2527 (0.2671) data time 0.0006 (0.0038) model time 0.2521 (0.2591) loss 4.5080 (5.4561) grad_norm 3.0739 (2.8785) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:15:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][200/625] eta 0:01:53 lr 0.000074 wd 0.0500 time 0.2560 (0.2665) data time 0.0008 (0.0036) model time 0.2552 (0.2587) loss 5.3093 (5.4735) grad_norm 1.8806 (3.1433) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:15:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][210/625] eta 0:01:50 lr 0.000074 wd 0.0500 time 0.2552 (0.2660) data time 0.0006 (0.0035) model time 0.2546 (0.2585) loss 5.3043 (5.4599) grad_norm 2.4320 (3.1292) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:15:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][220/625] eta 0:01:47 lr 0.000074 wd 0.0500 time 0.2534 (0.2655) data time 0.0008 (0.0034) model time 0.2526 (0.2583) loss 4.9405 (5.4646) grad_norm 2.2236 (3.1099) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:15:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][230/625] eta 0:01:45 lr 0.000074 wd 0.0500 time 0.2578 (0.2663) data time 0.0008 (0.0033) model time 0.2569 (0.2597) loss 5.3297 (5.4675) grad_norm 3.2169 (3.0993) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:15:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][240/625] eta 0:01:42 lr 0.000074 wd 0.0500 time 0.2556 (0.2667) data time 0.0010 (0.0032) model time 0.2547 (0.2605) loss 5.4685 (5.4752) grad_norm 1.9258 (3.0787) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:15:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][250/625] eta 0:01:39 lr 0.000074 wd 0.0500 time 0.2557 (0.2663) data time 0.0009 (0.0031) model time 0.2549 (0.2602) loss 6.1535 (5.4846) grad_norm 2.8267 (3.0695) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:15:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][260/625] eta 0:01:37 lr 0.000074 wd 0.0500 time 0.2534 (0.2659) data time 0.0010 (0.0030) model time 0.2524 (0.2599) loss 6.0259 (5.4734) grad_norm 3.8267 (3.1127) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:16:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][270/625] eta 0:01:34 lr 0.000074 wd 0.0500 time 0.2578 (0.2662) data time 0.0007 (0.0029) model time 0.2570 (0.2605) loss 4.9176 (5.4784) grad_norm 2.9136 (3.1202) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:16:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][280/625] eta 0:01:31 lr 0.000074 wd 0.0500 time 0.2562 (0.2658) data time 0.0008 (0.0029) model time 0.2554 (0.2603) loss 4.5478 (5.4780) grad_norm 3.4653 (3.1039) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:16:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][290/625] eta 0:01:28 lr 0.000074 wd 0.0500 time 0.2532 (0.2654) data time 0.0009 (0.0028) model time 0.2524 (0.2600) loss 5.7158 (5.4776) grad_norm 2.1918 (3.0816) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:16:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][300/625] eta 0:01:26 lr 0.000074 wd 0.0500 time 0.2536 (0.2651) data time 0.0008 (0.0027) model time 0.2528 (0.2598) loss 6.4083 (5.4842) grad_norm 2.5060 (3.1548) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:16:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][310/625] eta 0:01:23 lr 0.000074 wd 0.0500 time 0.2590 (0.2648) data time 0.0008 (0.0027) model time 0.2582 (0.2596) loss 5.5169 (5.4741) grad_norm 1.6295 (3.1868) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:16:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][320/625] eta 0:01:20 lr 0.000074 wd 0.0500 time 0.2598 (0.2646) data time 0.0006 (0.0026) model time 0.2592 (0.2595) loss 4.6276 (5.4683) grad_norm 2.0141 (3.1832) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:16:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][330/625] eta 0:01:18 lr 0.000074 wd 0.0500 time 0.2550 (0.2648) data time 0.0009 (0.0026) model time 0.2541 (0.2600) loss 5.1680 (5.4721) grad_norm 2.7556 (3.1762) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:16:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][340/625] eta 0:01:15 lr 0.000074 wd 0.0500 time 0.2574 (0.2645) data time 0.0010 (0.0025) model time 0.2563 (0.2597) loss 5.5972 (5.4720) grad_norm 2.4198 (3.1725) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:16:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][350/625] eta 0:01:12 lr 0.000074 wd 0.0500 time 0.2533 (0.2649) data time 0.0009 (0.0025) model time 0.2524 (0.2603) loss 5.2497 (5.4838) grad_norm 1.4777 (3.1649) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:16:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][360/625] eta 0:01:10 lr 0.000073 wd 0.0500 time 0.2578 (0.2650) data time 0.0008 (0.0024) model time 0.2570 (0.2605) loss 5.9698 (5.4908) grad_norm 1.6351 (3.1628) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:16:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][370/625] eta 0:01:07 lr 0.000073 wd 0.0500 time 0.2536 (0.2647) data time 0.0007 (0.0024) model time 0.2528 (0.2603) loss 4.3051 (5.4901) grad_norm 2.3390 (3.1429) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:16:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][380/625] eta 0:01:04 lr 0.000073 wd 0.0500 time 0.2558 (0.2645) data time 0.0006 (0.0024) model time 0.2551 (0.2602) loss 6.6845 (5.5004) grad_norm 6.1412 (3.1506) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:16:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][390/625] eta 0:01:02 lr 0.000073 wd 0.0500 time 0.2543 (0.2643) data time 0.0007 (0.0023) model time 0.2536 (0.2600) loss 5.1863 (5.4979) grad_norm 1.9239 (3.1408) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:16:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][400/625] eta 0:00:59 lr 0.000073 wd 0.0500 time 0.2551 (0.2641) data time 0.0018 (0.0023) model time 0.2533 (0.2599) loss 5.8997 (5.4976) grad_norm 3.4835 (3.1829) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:16:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][410/625] eta 0:00:56 lr 0.000073 wd 0.0500 time 0.2542 (0.2639) data time 0.0008 (0.0023) model time 0.2534 (0.2597) loss 5.3640 (5.5017) grad_norm 3.6981 (3.1710) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:16:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][420/625] eta 0:00:54 lr 0.000073 wd 0.0500 time 0.2579 (0.2637) data time 0.0006 (0.0022) model time 0.2573 (0.2596) loss 6.2843 (5.5042) grad_norm 1.9935 (3.1627) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:16:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][430/625] eta 0:00:51 lr 0.000073 wd 0.0500 time 0.2522 (0.2635) data time 0.0009 (0.0022) model time 0.2513 (0.2595) loss 5.2515 (5.5020) grad_norm 3.6912 (3.1492) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:16:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][440/625] eta 0:00:48 lr 0.000073 wd 0.0500 time 0.2564 (0.2637) data time 0.0007 (0.0022) model time 0.2557 (0.2598) loss 5.4355 (5.4977) grad_norm 3.1275 (3.1320) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:16:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][450/625] eta 0:00:46 lr 0.000073 wd 0.0500 time 0.2562 (0.2639) data time 0.0009 (0.0021) model time 0.2553 (0.2601) loss 6.0597 (5.4961) grad_norm 2.8025 (3.1310) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:16:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][460/625] eta 0:00:43 lr 0.000073 wd 0.0500 time 0.2550 (0.2638) data time 0.0007 (0.0021) model time 0.2542 (0.2600) loss 4.4488 (5.4987) grad_norm 7.3961 (3.1377) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:16:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][470/625] eta 0:00:40 lr 0.000073 wd 0.0500 time 0.2506 (0.2640) data time 0.0010 (0.0021) model time 0.2495 (0.2604) loss 5.5054 (5.4979) grad_norm 1.8070 (3.1309) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:16:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][480/625] eta 0:00:38 lr 0.000073 wd 0.0500 time 0.2540 (0.2643) data time 0.0013 (0.0021) model time 0.2527 (0.2608) loss 5.7813 (5.5062) grad_norm 3.3703 (3.1205) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:16:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][490/625] eta 0:00:35 lr 0.000073 wd 0.0500 time 0.2530 (0.2642) data time 0.0012 (0.0020) model time 0.2518 (0.2606) loss 4.9106 (5.4988) grad_norm 1.8569 (3.1025) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:17:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][500/625] eta 0:00:33 lr 0.000073 wd 0.0500 time 0.2566 (0.2640) data time 0.0008 (0.0020) model time 0.2558 (0.2605) loss 5.6328 (5.5013) grad_norm 1.8851 (3.0907) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:17:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][510/625] eta 0:00:30 lr 0.000073 wd 0.0500 time 0.2576 (0.2645) data time 0.0008 (0.0020) model time 0.2567 (0.2611) loss 4.8301 (5.5004) grad_norm 2.0398 (3.0779) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:17:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][520/625] eta 0:00:27 lr 0.000073 wd 0.0500 time 0.2528 (0.2643) data time 0.0007 (0.0020) model time 0.2521 (0.2610) loss 4.4124 (5.4973) grad_norm 1.8186 (3.0696) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:17:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][530/625] eta 0:00:25 lr 0.000072 wd 0.0500 time 0.2558 (0.2642) data time 0.0009 (0.0020) model time 0.2549 (0.2609) loss 5.4883 (5.4956) grad_norm 3.0535 (3.0741) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:17:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][540/625] eta 0:00:22 lr 0.000072 wd 0.0500 time 0.2520 (0.2640) data time 0.0007 (0.0019) model time 0.2513 (0.2607) loss 5.4467 (5.4904) grad_norm 1.8518 (3.0772) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:17:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][550/625] eta 0:00:19 lr 0.000072 wd 0.0500 time 0.2682 (0.2641) data time 0.0008 (0.0019) model time 0.2674 (0.2609) loss 5.8449 (5.4903) grad_norm 1.7924 (3.0685) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:17:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][560/625] eta 0:00:17 lr 0.000072 wd 0.0500 time 0.2522 (0.2640) data time 0.0010 (0.0019) model time 0.2512 (0.2608) loss 4.7740 (5.4899) grad_norm 3.5417 (3.0569) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:17:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][570/625] eta 0:00:14 lr 0.000072 wd 0.0500 time 0.2525 (0.2638) data time 0.0010 (0.0019) model time 0.2515 (0.2607) loss 4.1122 (5.4831) grad_norm 1.4499 (3.0440) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:17:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][580/625] eta 0:00:11 lr 0.000072 wd 0.0500 time 0.2513 (0.2637) data time 0.0011 (0.0019) model time 0.2502 (0.2605) loss 6.2336 (5.4909) grad_norm 4.7463 (3.0516) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:17:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][590/625] eta 0:00:09 lr 0.000072 wd 0.0500 time 0.2561 (0.2635) data time 0.0009 (0.0019) model time 0.2552 (0.2604) loss 6.3997 (5.4903) grad_norm 6.8066 (3.1140) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:17:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][600/625] eta 0:00:06 lr 0.000072 wd 0.0500 time 0.2574 (0.2636) data time 0.0011 (0.0018) model time 0.2563 (0.2605) loss 5.4381 (5.4860) grad_norm 2.1990 (3.1433) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:17:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][610/625] eta 0:00:03 lr 0.000072 wd 0.0500 time 0.2530 (0.2635) data time 0.0004 (0.0018) model time 0.2526 (0.2604) loss 5.4876 (5.4871) grad_norm 2.1743 (3.1334) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:17:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][620/625] eta 0:00:01 lr 0.000072 wd 0.0500 time 0.2518 (0.2635) data time 0.0004 (0.0018) model time 0.2514 (0.2605) loss 5.0640 (5.4873) grad_norm 3.2231 (3.1260) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:17:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 270 training takes 0:02:44
[2024-08-04 10:17:32 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 10:17:33 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 10:17:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.662 (0.662) Loss 0.6025 (0.6025) Acc@1 89.990 (89.990) Acc@5 98.828 (98.828) Mem 9655MB
[2024-08-04 10:17:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.113) Loss 0.9082 (0.7180) Acc@1 82.031 (87.220) Acc@5 96.680 (97.829) Mem 9655MB
[2024-08-04 10:17:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.085) Loss 1.0020 (0.8342) Acc@1 78.906 (84.131) Acc@5 95.752 (96.722) Mem 9655MB
[2024-08-04 10:17:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.757 Acc@5 96.743
[2024-08-04 10:17:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8%
[2024-08-04 10:17:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.760 (0.760) Loss 0.5859 (0.5859) Acc@1 90.381 (90.381) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 10:17:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.125) Loss 0.8965 (0.7070) Acc@1 81.836 (87.092) Acc@5 96.387 (97.785) Mem 9655MB
[2024-08-04 10:17:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0098 (0.8260) Acc@1 78.613 (84.015) Acc@5 95.605 (96.629) Mem 9655MB
[2024-08-04 10:17:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.651 Acc@5 96.641
[2024-08-04 10:17:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7%
[2024-08-04 10:17:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.65%
[2024-08-04 10:17:37 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 10:17:37 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 10:17:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][0/625] eta 0:07:07 lr 0.000072 wd 0.0500 time 0.6841 (0.6841) data time 0.4388 (0.4388) model time 0.0000 (0.0000) loss 4.4483 (4.4483) grad_norm 2.3032 (2.3032) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:17:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][10/625] eta 0:03:11 lr 0.000072 wd 0.0500 time 0.2598 (0.3111) data time 0.0007 (0.0408) model time 0.0000 (0.0000) loss 5.0247 (5.4358) grad_norm 2.2980 (2.5455) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:17:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][20/625] eta 0:02:52 lr 0.000072 wd 0.0500 time 0.2527 (0.2847) data time 0.0010 (0.0218) model time 0.0000 (0.0000) loss 5.3063 (5.5688) grad_norm 2.1349 (2.5888) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:17:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][30/625] eta 0:02:47 lr 0.000072 wd 0.0500 time 0.2535 (0.2815) data time 0.0008 (0.0151) model time 0.0000 (0.0000) loss 4.6049 (5.5122) grad_norm 3.3284 (2.7565) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:17:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][40/625] eta 0:02:43 lr 0.000072 wd 0.0500 time 0.2553 (0.2795) data time 0.0008 (0.0116) model time 0.0000 (0.0000) loss 4.9565 (5.5301) grad_norm 2.5405 (2.7744) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:17:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][50/625] eta 0:02:38 lr 0.000072 wd 0.0500 time 0.2553 (0.2749) data time 0.0010 (0.0095) model time 0.0000 (0.0000) loss 5.4353 (5.5585) grad_norm 3.5402 (2.7313) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:17:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][60/625] eta 0:02:33 lr 0.000072 wd 0.0500 time 0.2534 (0.2716) data time 0.0008 (0.0081) model time 0.2527 (0.2536) loss 6.3659 (5.5292) grad_norm 2.9936 (2.6700) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:17:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][70/625] eta 0:02:29 lr 0.000072 wd 0.0500 time 0.2568 (0.2695) data time 0.0006 (0.0071) model time 0.2562 (0.2549) loss 5.5075 (5.5508) grad_norm 1.8572 (2.6078) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:17:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][80/625] eta 0:02:27 lr 0.000071 wd 0.0500 time 0.2554 (0.2704) data time 0.0010 (0.0064) model time 0.2544 (0.2617) loss 4.9441 (5.5273) grad_norm 2.5725 (2.6033) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:18:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][90/625] eta 0:02:25 lr 0.000071 wd 0.0500 time 0.2578 (0.2714) data time 0.0008 (0.0058) model time 0.2571 (0.2660) loss 5.9237 (5.5376) grad_norm 2.8565 (2.6028) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:18:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][100/625] eta 0:02:22 lr 0.000071 wd 0.0500 time 0.4710 (0.2720) data time 0.0008 (0.0053) model time 0.4702 (0.2681) loss 6.1422 (5.5091) grad_norm 10.7013 (2.7117) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:18:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][110/625] eta 0:02:20 lr 0.000071 wd 0.0500 time 0.2576 (0.2726) data time 0.0008 (0.0049) model time 0.2568 (0.2696) loss 6.4104 (5.5050) grad_norm 2.8317 (2.6956) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:18:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][120/625] eta 0:02:16 lr 0.000071 wd 0.0500 time 0.2592 (0.2712) data time 0.0007 (0.0046) model time 0.2586 (0.2674) loss 6.3885 (5.5182) grad_norm 2.8746 (2.6833) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:18:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][130/625] eta 0:02:14 lr 0.000071 wd 0.0500 time 0.2543 (0.2713) data time 0.0009 (0.0043) model time 0.2534 (0.2681) loss 6.0904 (5.5017) grad_norm 3.9894 (2.6853) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:18:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][140/625] eta 0:02:11 lr 0.000071 wd 0.0500 time 0.2614 (0.2716) data time 0.0008 (0.0041) model time 0.2606 (0.2688) loss 5.0970 (5.4953) grad_norm 3.1427 (2.6730) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:18:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][150/625] eta 0:02:09 lr 0.000071 wd 0.0500 time 0.2539 (0.2718) data time 0.0011 (0.0039) model time 0.2528 (0.2693) loss 5.4898 (5.4717) grad_norm 1.8502 (2.6716) loss_scale 256.0000 (133.0861) mem 9655MB
[2024-08-04 10:18:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][160/625] eta 0:02:05 lr 0.000071 wd 0.0500 time 0.2565 (0.2709) data time 0.0009 (0.0037) model time 0.2556 (0.2680) loss 5.8660 (5.4914) grad_norm 2.6338 (2.7071) loss_scale 256.0000 (140.7205) mem 9655MB
[2024-08-04 10:18:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][170/625] eta 0:02:02 lr 0.000071 wd 0.0500 time 0.2576 (0.2700) data time 0.0008 (0.0035) model time 0.2568 (0.2670) loss 4.5487 (5.4857) grad_norm 1.8006 (2.6928) loss_scale 256.0000 (147.4620) mem 9655MB
[2024-08-04 10:18:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][180/625] eta 0:02:00 lr 0.000071 wd 0.0500 time 0.2570 (0.2700) data time 0.0008 (0.0034) model time 0.2562 (0.2671) loss 4.6357 (5.4727) grad_norm 1.6783 (2.7154) loss_scale 256.0000 (153.4586) mem 9655MB
[2024-08-04 10:18:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][190/625] eta 0:01:57 lr 0.000071 wd 0.0500 time 0.2547 (0.2692) data time 0.0011 (0.0032) model time 0.2536 (0.2661) loss 5.4031 (5.4693) grad_norm 1.9637 (2.6923) loss_scale 256.0000 (158.8272) mem 9655MB
[2024-08-04 10:18:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][200/625] eta 0:01:54 lr 0.000071 wd 0.0500 time 0.2613 (0.2694) data time 0.0007 (0.0031) model time 0.2605 (0.2665) loss 6.0770 (5.4802) grad_norm 1.7360 (2.7350) loss_scale 256.0000 (163.6617) mem 9655MB
[2024-08-04 10:18:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][210/625] eta 0:01:51 lr 0.000071 wd 0.0500 time 0.2550 (0.2687) data time 0.0008 (0.0030) model time 0.2542 (0.2658) loss 5.1781 (5.4767) grad_norm 1.8939 (2.7771) loss_scale 256.0000 (168.0379) mem 9655MB
[2024-08-04 10:18:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][220/625] eta 0:01:48 lr 0.000071 wd 0.0500 time 0.2601 (0.2682) data time 0.0007 (0.0029) model time 0.2594 (0.2653) loss 5.7698 (5.4704) grad_norm 2.0950 (2.7633) loss_scale 256.0000 (172.0181) mem 9655MB
[2024-08-04 10:18:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][230/625] eta 0:01:46 lr 0.000071 wd 0.0500 time 0.2553 (0.2685) data time 0.0008 (0.0028) model time 0.2545 (0.2657) loss 6.1368 (5.4738) grad_norm 3.6712 (2.7454) loss_scale 256.0000 (175.6537) mem 9655MB
[2024-08-04 10:18:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][240/625] eta 0:01:43 lr 0.000071 wd 0.0500 time 0.2544 (0.2680) data time 0.0008 (0.0028) model time 0.2536 (0.2652) loss 5.9583 (5.4865) grad_norm 3.4478 (2.7465) loss_scale 256.0000 (178.9876) mem 9655MB
[2024-08-04 10:18:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][250/625] eta 0:01:40 lr 0.000071 wd 0.0500 time 0.2515 (0.2675) data time 0.0010 (0.0027) model time 0.2505 (0.2647) loss 6.6714 (5.4945) grad_norm 1.8883 (2.7308) loss_scale 256.0000 (182.0558) mem 9655MB
[2024-08-04 10:18:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][260/625] eta 0:01:37 lr 0.000070 wd 0.0500 time 0.2539 (0.2671) data time 0.0006 (0.0026) model time 0.2532 (0.2643) loss 4.8075 (5.4929) grad_norm 3.8373 (2.7426) loss_scale 256.0000 (184.8889) mem 9655MB
[2024-08-04 10:18:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][270/625] eta 0:01:34 lr 0.000070 wd 0.0500 time 0.2571 (0.2667) data time 0.0006 (0.0026) model time 0.2565 (0.2638) loss 4.4463 (5.4889) grad_norm 2.4766 (2.7501) loss_scale 256.0000 (187.5129) mem 9655MB
[2024-08-04 10:18:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][280/625] eta 0:01:31 lr 0.000070 wd 0.0500 time 0.2560 (0.2663) data time 0.0007 (0.0025) model time 0.2553 (0.2635) loss 6.0254 (5.4955) grad_norm 3.1660 (2.7539) loss_scale 256.0000 (189.9502) mem 9655MB
[2024-08-04 10:18:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][290/625] eta 0:01:29 lr 0.000070 wd 0.0500 time 0.3821 (0.2664) data time 0.0009 (0.0024) model time 0.3811 (0.2637) loss 5.9068 (5.4908) grad_norm 7.1342 (2.7704) loss_scale 256.0000 (192.2199) mem 9655MB
[2024-08-04 10:18:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][300/625] eta 0:01:26 lr 0.000070 wd 0.0500 time 0.2605 (0.2665) data time 0.0009 (0.0024) model time 0.2596 (0.2639) loss 6.0952 (5.4971) grad_norm 2.0731 (2.7644) loss_scale 256.0000 (194.3389) mem 9655MB
[2024-08-04 10:19:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][310/625] eta 0:01:23 lr 0.000070 wd 0.0500 time 0.2580 (0.2662) data time 0.0008 (0.0023) model time 0.2571 (0.2636) loss 6.2346 (5.4985) grad_norm 2.1585 (2.7560) loss_scale 256.0000 (196.3215) mem 9655MB
[2024-08-04 10:19:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][320/625] eta 0:01:21 lr 0.000070 wd 0.0500 time 0.2592 (0.2660) data time 0.0008 (0.0023) model time 0.2584 (0.2634) loss 5.9432 (5.5164) grad_norm 5.1273 (2.7540) loss_scale 256.0000 (198.1807) mem 9655MB
[2024-08-04 10:19:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][330/625] eta 0:01:18 lr 0.000070 wd 0.0500 time 0.2568 (0.2657) data time 0.0009 (0.0023) model time 0.2559 (0.2631) loss 5.9879 (5.5206) grad_norm 2.7347 (2.7683) loss_scale 256.0000 (199.9275) mem 9655MB
[2024-08-04 10:19:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][340/625] eta 0:01:15 lr 0.000070 wd 0.0500 time 0.2634 (0.2654) data time 0.0006 (0.0022) model time 0.2628 (0.2628) loss 5.0260 (5.5217) grad_norm 2.1608 (2.7702) loss_scale 256.0000 (201.5718) mem 9655MB
[2024-08-04 10:19:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][350/625] eta 0:01:13 lr 0.000070 wd 0.0500 time 0.2510 (0.2657) data time 0.0008 (0.0022) model time 0.2502 (0.2632) loss 4.6144 (5.5258) grad_norm 2.3550 (2.7726) loss_scale 256.0000 (203.1225) mem 9655MB
[2024-08-04 10:19:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][360/625] eta 0:01:10 lr 0.000070 wd 0.0500 time 0.2540 (0.2654) data time 0.0010 (0.0022) model time 0.2530 (0.2629) loss 4.9799 (5.5250) grad_norm 3.3218 (2.7727) loss_scale 256.0000 (204.5873) mem 9655MB
[2024-08-04 10:19:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][370/625] eta 0:01:07 lr 0.000070 wd 0.0500 time 0.2561 (0.2651) data time 0.0008 (0.0021) model time 0.2553 (0.2626) loss 5.2836 (5.5201) grad_norm 2.4856 (2.7800) loss_scale 256.0000 (205.9730) mem 9655MB
[2024-08-04 10:19:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][380/625] eta 0:01:04 lr 0.000070 wd 0.0500 time 0.2584 (0.2649) data time 0.0010 (0.0021) model time 0.2574 (0.2624) loss 6.0138 (5.5107) grad_norm 1.8288 (2.7804) loss_scale 256.0000 (207.2861) mem 9655MB
[2024-08-04 10:19:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][390/625] eta 0:01:02 lr 0.000070 wd 0.0500 time 0.2585 (0.2647) data time 0.0009 (0.0021) model time 0.2576 (0.2622) loss 5.5736 (5.5198) grad_norm 4.2240 (2.7926) loss_scale 256.0000 (208.5320) mem 9655MB
[2024-08-04 10:19:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][400/625] eta 0:00:59 lr 0.000070 wd 0.0500 time 0.2516 (0.2644) data time 0.0008 (0.0020) model time 0.2508 (0.2620) loss 4.7318 (5.5209) grad_norm 2.8648 (2.7850) loss_scale 256.0000 (209.7157) mem 9655MB
[2024-08-04 10:19:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][410/625] eta 0:00:56 lr 0.000070 wd 0.0500 time 0.2526 (0.2642) data time 0.0008 (0.0020) model time 0.2517 (0.2618) loss 5.6935 (5.5213) grad_norm 1.9363 (2.8579) loss_scale 256.0000 (210.8418) mem 9655MB
[2024-08-04 10:19:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][420/625] eta 0:00:54 lr 0.000070 wd 0.0500 time 0.2547 (0.2640) data time 0.0009 (0.0020) model time 0.2538 (0.2616) loss 5.3081 (5.5120) grad_norm 1.5519 (2.9071) loss_scale 256.0000 (211.9145) mem 9655MB
[2024-08-04 10:19:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][430/625] eta 0:00:51 lr 0.000070 wd 0.0500 time 0.2521 (0.2639) data time 0.0008 (0.0020) model time 0.2513 (0.2615) loss 5.6393 (5.5059) grad_norm 1.6530 (2.9129) loss_scale 256.0000 (212.9374) mem 9655MB
[2024-08-04 10:19:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][440/625] eta 0:00:48 lr 0.000069 wd 0.0500 time 0.2537 (0.2637) data time 0.0010 (0.0019) model time 0.2526 (0.2613) loss 5.7807 (5.5132) grad_norm 3.0189 (2.9045) loss_scale 256.0000 (213.9138) mem 9655MB
[2024-08-04 10:19:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][450/625] eta 0:00:46 lr 0.000069 wd 0.0500 time 0.2538 (0.2635) data time 0.0008 (0.0019) model time 0.2530 (0.2611) loss 6.4724 (5.5154) grad_norm 2.7831 (2.9066) loss_scale 256.0000 (214.8470) mem 9655MB
[2024-08-04 10:19:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][460/625] eta 0:00:43 lr 0.000069 wd 0.0500 time 0.2563 (0.2634) data time 0.0006 (0.0019) model time 0.2556 (0.2610) loss 4.7139 (5.5154) grad_norm 2.2045 (2.8986) loss_scale 256.0000 (215.7397) mem 9655MB
[2024-08-04 10:19:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][470/625] eta 0:00:40 lr 0.000069 wd 0.0500 time 0.2580 (0.2640) data time 0.0009 (0.0019) model time 0.2571 (0.2617) loss 5.8984 (5.5106) grad_norm 1.4543 (2.8960) loss_scale 256.0000 (216.5945) mem 9655MB
[2024-08-04 10:19:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][480/625] eta 0:00:38 lr 0.000069 wd 0.0500 time 0.2561 (0.2639) data time 0.0007 (0.0019) model time 0.2554 (0.2616) loss 6.6917 (5.5133) grad_norm 2.8791 (2.8827) loss_scale 256.0000 (217.4137) mem 9655MB
[2024-08-04 10:19:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][490/625] eta 0:00:35 lr 0.000069 wd 0.0500 time 0.2537 (0.2637) data time 0.0008 (0.0018) model time 0.2529 (0.2615) loss 4.5070 (5.5071) grad_norm 3.2891 (2.8723) loss_scale 256.0000 (218.1996) mem 9655MB
[2024-08-04 10:19:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][500/625] eta 0:00:32 lr 0.000069 wd
0.0500 time 0.2552 (0.2635) data time 0.0007 (0.0018) model time 0.2545 (0.2613) loss 5.3136 (5.5008) grad_norm 1.7525 (2.8617) loss_scale 256.0000 (218.9541) mem 9655MB [2024-08-04 10:19:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][510/625] eta 0:00:30 lr 0.000069 wd 0.0500 time 0.2556 (0.2635) data time 0.0010 (0.0018) model time 0.2547 (0.2613) loss 5.7009 (5.5006) grad_norm 2.7609 (2.8490) loss_scale 256.0000 (219.6791) mem 9655MB [2024-08-04 10:19:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][520/625] eta 0:00:27 lr 0.000069 wd 0.0500 time 0.2561 (0.2640) data time 0.0007 (0.0018) model time 0.2553 (0.2619) loss 6.0749 (5.4993) grad_norm 6.5202 (2.8610) loss_scale 256.0000 (220.3762) mem 9655MB [2024-08-04 10:19:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][530/625] eta 0:00:25 lr 0.000069 wd 0.0500 time 0.2554 (0.2639) data time 0.0007 (0.0018) model time 0.2547 (0.2617) loss 4.1419 (5.5017) grad_norm 2.6718 (2.8703) loss_scale 256.0000 (221.0471) mem 9655MB [2024-08-04 10:20:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][540/625] eta 0:00:22 lr 0.000069 wd 0.0500 time 0.2552 (0.2637) data time 0.0008 (0.0018) model time 0.2543 (0.2616) loss 5.6380 (5.5034) grad_norm 5.4905 (2.8827) loss_scale 256.0000 (221.6932) mem 9655MB [2024-08-04 10:20:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][550/625] eta 0:00:19 lr 0.000069 wd 0.0500 time 0.2567 (0.2641) data time 0.0008 (0.0017) model time 0.2559 (0.2621) loss 4.9773 (5.5005) grad_norm 2.6526 (2.8773) loss_scale 256.0000 (222.3158) mem 9655MB [2024-08-04 10:20:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][560/625] eta 0:00:17 lr 0.000069 wd 0.0500 time 0.2578 (0.2640) data time 0.0005 (0.0017) model time 0.2573 (0.2620) loss 5.2845 (5.4982) grad_norm 2.8808 (2.8764) loss_scale 256.0000 (222.9162) mem 9655MB 
[2024-08-04 10:20:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][570/625] eta 0:00:14 lr 0.000069 wd 0.0500 time 0.2586 (0.2638) data time 0.0008 (0.0017) model time 0.2578 (0.2618) loss 6.1505 (5.5011) grad_norm 2.4031 (2.8679) loss_scale 256.0000 (223.4956) mem 9655MB [2024-08-04 10:20:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][580/625] eta 0:00:11 lr 0.000069 wd 0.0500 time 0.2587 (0.2637) data time 0.0009 (0.0017) model time 0.2578 (0.2617) loss 5.7658 (5.4957) grad_norm 3.2607 (2.8728) loss_scale 256.0000 (224.0551) mem 9655MB [2024-08-04 10:20:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][590/625] eta 0:00:09 lr 0.000069 wd 0.0500 time 0.2532 (0.2636) data time 0.0009 (0.0017) model time 0.2523 (0.2616) loss 4.9891 (5.5001) grad_norm 1.9034 (2.8624) loss_scale 256.0000 (224.5956) mem 9655MB [2024-08-04 10:20:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][600/625] eta 0:00:06 lr 0.000069 wd 0.0500 time 0.2556 (0.2638) data time 0.0010 (0.0017) model time 0.2546 (0.2618) loss 5.0873 (5.5042) grad_norm 3.0683 (2.8659) loss_scale 256.0000 (225.1181) mem 9655MB [2024-08-04 10:20:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][610/625] eta 0:00:03 lr 0.000069 wd 0.0500 time 0.2523 (0.2637) data time 0.0006 (0.0017) model time 0.2517 (0.2617) loss 5.8681 (5.5040) grad_norm 1.7980 (2.8542) loss_scale 256.0000 (225.6236) mem 9655MB [2024-08-04 10:20:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][620/625] eta 0:00:01 lr 0.000068 wd 0.0500 time 0.2537 (0.2635) data time 0.0005 (0.0016) model time 0.2531 (0.2615) loss 5.2927 (5.4986) grad_norm 1.9513 (2.8477) loss_scale 256.0000 (226.1127) mem 9655MB [2024-08-04 10:20:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 271 training takes 0:02:44 [2024-08-04 10:20:22 vssd_mesa_retrain_tiny_e300] 
(utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 10:20:22 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-08-04 10:20:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.553 (0.553) Loss 0.5947 (0.5947) Acc@1 90.625 (90.625) Acc@5 98.877 (98.877) Mem 9655MB [2024-08-04 10:20:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.101) Loss 0.9058 (0.7105) Acc@1 82.080 (87.238) Acc@5 96.729 (97.856) Mem 9655MB [2024-08-04 10:20:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.0117 (0.8282) Acc@1 79.150 (84.226) Acc@5 95.654 (96.761) Mem 9655MB [2024-08-04 10:20:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.871 Acc@5 96.795 [2024-08-04 10:20:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-04 10:20:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.885 (0.885) Loss 0.5859 (0.5859) Acc@1 90.283 (90.283) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 10:20:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.132) Loss 0.8960 (0.7070) Acc@1 81.934 (87.105) Acc@5 96.387 (97.789) Mem 9655MB [2024-08-04 10:20:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.096) Loss 1.0098 (0.8260) Acc@1 78.662 (84.049) Acc@5 95.654 (96.640) Mem 9655MB [2024-08-04 10:20:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.681 Acc@5 96.651 [2024-08-04 10:20:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-04 10:20:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO 
New max accuracy ema: 83.68% [2024-08-04 10:20:26 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 10:20:27 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 10:20:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][0/625] eta 0:06:58 lr 0.000068 wd 0.0500 time 0.6703 (0.6703) data time 0.4223 (0.4223) model time 0.0000 (0.0000) loss 5.1230 (5.1230) grad_norm 9.3474 (9.3474) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:20:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][10/625] eta 0:03:08 lr 0.000068 wd 0.0500 time 0.2530 (0.3069) data time 0.0010 (0.0393) model time 0.0000 (0.0000) loss 5.1605 (5.3336) grad_norm 3.1740 (3.5375) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:20:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][20/625] eta 0:02:51 lr 0.000068 wd 0.0500 time 0.2554 (0.2829) data time 0.0006 (0.0210) model time 0.0000 (0.0000) loss 5.7566 (5.4813) grad_norm 2.7071 (3.3457) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:20:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][30/625] eta 0:02:43 lr 0.000068 wd 0.0500 time 0.2554 (0.2741) data time 0.0011 (0.0145) model time 0.0000 (0.0000) loss 5.0384 (5.5714) grad_norm 2.4744 (3.1317) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:20:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][40/625] eta 0:02:40 lr 0.000068 wd 0.0500 time 0.2544 (0.2739) data time 0.0007 (0.0112) model time 0.0000 (0.0000) loss 6.3693 (5.5461) grad_norm 3.9420 (2.9724) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:20:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][50/625] eta 0:02:35 
lr 0.000068 wd 0.0500 time 0.2542 (0.2705) data time 0.0008 (0.0092) model time 0.0000 (0.0000) loss 5.8947 (5.6105) grad_norm 3.3508 (2.9843) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:20:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][60/625] eta 0:02:32 lr 0.000068 wd 0.0500 time 0.2553 (0.2703) data time 0.0011 (0.0078) model time 0.2542 (0.2679) loss 4.6144 (5.5709) grad_norm 3.4269 (3.1018) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:20:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][70/625] eta 0:02:30 lr 0.000068 wd 0.0500 time 0.2492 (0.2709) data time 0.0008 (0.0069) model time 0.2484 (0.2710) loss 6.0254 (5.5655) grad_norm 4.4493 (3.1721) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:20:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][80/625] eta 0:02:26 lr 0.000068 wd 0.0500 time 0.2568 (0.2691) data time 0.0009 (0.0061) model time 0.2558 (0.2657) loss 5.4240 (5.5622) grad_norm 1.8786 (3.1654) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:20:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][90/625] eta 0:02:23 lr 0.000068 wd 0.0500 time 0.2668 (0.2678) data time 0.0009 (0.0056) model time 0.2659 (0.2633) loss 6.2189 (5.5759) grad_norm 3.1598 (3.1117) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:20:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][100/625] eta 0:02:20 lr 0.000068 wd 0.0500 time 0.2563 (0.2667) data time 0.0010 (0.0051) model time 0.2553 (0.2619) loss 6.1245 (5.5387) grad_norm 2.6114 (3.0660) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:20:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][110/625] eta 0:02:17 lr 0.000068 wd 0.0500 time 0.2558 (0.2676) data time 0.0006 (0.0047) model time 0.2552 (0.2641) loss 4.4118 (5.5461) grad_norm 1.8471 (3.0134) loss_scale 256.0000 (256.0000) 
mem 9655MB [2024-08-04 10:20:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][120/625] eta 0:02:14 lr 0.000068 wd 0.0500 time 0.2579 (0.2667) data time 0.0009 (0.0044) model time 0.2570 (0.2629) loss 5.8264 (5.5552) grad_norm 1.6507 (2.9682) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:21:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][130/625] eta 0:02:11 lr 0.000068 wd 0.0500 time 0.2655 (0.2659) data time 0.0006 (0.0041) model time 0.2649 (0.2620) loss 5.1954 (5.5441) grad_norm 2.0818 (2.9412) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:21:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][140/625] eta 0:02:09 lr 0.000068 wd 0.0500 time 0.2575 (0.2667) data time 0.0008 (0.0039) model time 0.2568 (0.2637) loss 6.1928 (5.5553) grad_norm 3.4153 (3.0347) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:21:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][150/625] eta 0:02:06 lr 0.000068 wd 0.0500 time 0.2567 (0.2660) data time 0.0008 (0.0037) model time 0.2559 (0.2628) loss 5.5877 (5.5536) grad_norm 1.8727 (3.0222) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:21:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][160/625] eta 0:02:04 lr 0.000068 wd 0.0500 time 0.2559 (0.2676) data time 0.0007 (0.0035) model time 0.2552 (0.2653) loss 5.4137 (5.5380) grad_norm 3.1492 (3.1826) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:21:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][170/625] eta 0:02:01 lr 0.000068 wd 0.0500 time 0.2598 (0.2669) data time 0.0006 (0.0034) model time 0.2592 (0.2644) loss 5.6192 (5.5367) grad_norm 3.1635 (3.2003) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:21:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][180/625] eta 0:01:58 lr 0.000067 wd 0.0500 time 0.2564 
(0.2669) data time 0.0009 (0.0033) model time 0.2555 (0.2646) loss 4.9137 (5.5231) grad_norm 1.9200 (3.1719) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:21:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][190/625] eta 0:01:55 lr 0.000067 wd 0.0500 time 0.2549 (0.2664) data time 0.0008 (0.0031) model time 0.2541 (0.2639) loss 4.6290 (5.5143) grad_norm 1.8238 (3.1366) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:21:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][200/625] eta 0:01:53 lr 0.000067 wd 0.0500 time 0.2561 (0.2675) data time 0.0014 (0.0030) model time 0.2547 (0.2655) loss 5.3128 (5.5139) grad_norm 1.9917 (3.0887) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:21:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][210/625] eta 0:01:50 lr 0.000067 wd 0.0500 time 0.2597 (0.2669) data time 0.0006 (0.0029) model time 0.2591 (0.2648) loss 5.9756 (5.5215) grad_norm 2.0411 (3.0881) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:21:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][220/625] eta 0:01:47 lr 0.000067 wd 0.0500 time 0.2542 (0.2665) data time 0.0009 (0.0028) model time 0.2533 (0.2643) loss 5.0847 (5.5128) grad_norm 2.4916 (3.0629) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:21:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][230/625] eta 0:01:45 lr 0.000067 wd 0.0500 time 0.2524 (0.2660) data time 0.0010 (0.0027) model time 0.2514 (0.2638) loss 6.0502 (5.5248) grad_norm 2.9467 (3.0428) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:21:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][240/625] eta 0:01:42 lr 0.000067 wd 0.0500 time 0.2567 (0.2656) data time 0.0008 (0.0027) model time 0.2560 (0.2633) loss 5.1139 (5.5197) grad_norm 3.9540 (3.0078) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 
10:21:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][250/625] eta 0:01:39 lr 0.000067 wd 0.0500 time 0.2586 (0.2652) data time 0.0006 (0.0026) model time 0.2580 (0.2629) loss 5.9084 (5.5269) grad_norm 1.7615 (2.9831) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:21:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][260/625] eta 0:01:36 lr 0.000067 wd 0.0500 time 0.2585 (0.2648) data time 0.0007 (0.0025) model time 0.2578 (0.2625) loss 4.6692 (5.5079) grad_norm 2.0495 (2.9879) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:21:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][270/625] eta 0:01:33 lr 0.000067 wd 0.0500 time 0.2557 (0.2645) data time 0.0007 (0.0025) model time 0.2550 (0.2622) loss 6.1077 (5.5017) grad_norm 2.2399 (2.9659) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:21:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][280/625] eta 0:01:31 lr 0.000067 wd 0.0500 time 0.2560 (0.2642) data time 0.0009 (0.0024) model time 0.2551 (0.2619) loss 4.9254 (5.5163) grad_norm 4.0818 (2.9850) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:21:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][290/625] eta 0:01:28 lr 0.000067 wd 0.0500 time 0.2499 (0.2650) data time 0.0008 (0.0024) model time 0.2491 (0.2629) loss 4.4722 (5.5089) grad_norm 3.6308 (2.9779) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:21:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][300/625] eta 0:01:26 lr 0.000067 wd 0.0500 time 0.2539 (0.2653) data time 0.0009 (0.0023) model time 0.2530 (0.2633) loss 5.0162 (5.5123) grad_norm 2.0048 (2.9537) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:21:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][310/625] eta 0:01:23 lr 0.000067 wd 0.0500 time 0.2531 (0.2650) data time 0.0009 
(0.0023) model time 0.2522 (0.2630) loss 5.4391 (5.5033) grad_norm 3.7817 (2.9593) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:21:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][320/625] eta 0:01:20 lr 0.000067 wd 0.0500 time 0.2583 (0.2647) data time 0.0008 (0.0022) model time 0.2575 (0.2627) loss 5.5301 (5.5116) grad_norm 2.6477 (2.9834) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:21:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][330/625] eta 0:01:18 lr 0.000067 wd 0.0500 time 0.2570 (0.2650) data time 0.0008 (0.0022) model time 0.2562 (0.2631) loss 5.3710 (5.5150) grad_norm 2.2211 (2.9818) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:21:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][340/625] eta 0:01:15 lr 0.000067 wd 0.0500 time 0.2543 (0.2648) data time 0.0009 (0.0022) model time 0.2534 (0.2628) loss 5.7112 (5.5185) grad_norm 1.9150 (2.9662) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:22:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][350/625] eta 0:01:12 lr 0.000067 wd 0.0500 time 0.2590 (0.2650) data time 0.0008 (0.0021) model time 0.2582 (0.2631) loss 5.7644 (5.5265) grad_norm 2.4705 (2.9737) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:22:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][360/625] eta 0:01:10 lr 0.000066 wd 0.0500 time 0.2554 (0.2647) data time 0.0006 (0.0021) model time 0.2548 (0.2628) loss 6.6431 (5.5339) grad_norm 3.3702 (2.9789) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:22:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][370/625] eta 0:01:07 lr 0.000066 wd 0.0500 time 0.2572 (0.2645) data time 0.0008 (0.0021) model time 0.2564 (0.2626) loss 5.6747 (5.5315) grad_norm 3.3411 (2.9739) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:22:08 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][380/625] eta 0:01:04 lr 0.000066 wd 0.0500 time 0.2546 (0.2642) data time 0.0009 (0.0020) model time 0.2537 (0.2623) loss 5.2417 (5.5271) grad_norm 3.1965 (2.9589) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:22:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][390/625] eta 0:01:02 lr 0.000066 wd 0.0500 time 0.2610 (0.2641) data time 0.0007 (0.0020) model time 0.2602 (0.2622) loss 5.8587 (5.5296) grad_norm 3.2700 (2.9703) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:22:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][400/625] eta 0:00:59 lr 0.000066 wd 0.0500 time 0.2589 (0.2643) data time 0.0005 (0.0020) model time 0.2583 (0.2625) loss 5.8617 (5.5329) grad_norm 2.0323 (2.9671) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:22:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][410/625] eta 0:00:56 lr 0.000066 wd 0.0500 time 0.2555 (0.2641) data time 0.0006 (0.0019) model time 0.2549 (0.2623) loss 6.3099 (5.5288) grad_norm 2.4979 (2.9588) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:22:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][420/625] eta 0:00:54 lr 0.000066 wd 0.0500 time 0.2533 (0.2645) data time 0.0009 (0.0019) model time 0.2524 (0.2627) loss 5.0985 (5.5293) grad_norm 3.3729 (2.9483) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:22:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][430/625] eta 0:00:51 lr 0.000066 wd 0.0500 time 0.2545 (0.2643) data time 0.0012 (0.0019) model time 0.2534 (0.2626) loss 4.9149 (5.5225) grad_norm 2.6711 (2.9340) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:22:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][440/625] eta 0:00:48 lr 0.000066 wd 0.0500 time 0.2572 (0.2645) data time 0.0010 (0.0019) 
model time 0.2562 (0.2628) loss 5.5402 (5.5249) grad_norm 3.3851 (2.9238) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:22:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][450/625] eta 0:00:46 lr 0.000066 wd 0.0500 time 0.2569 (0.2644) data time 0.0008 (0.0019) model time 0.2561 (0.2627) loss 5.0654 (5.5236) grad_norm 2.5929 (2.9537) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:22:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][460/625] eta 0:00:43 lr 0.000066 wd 0.0500 time 0.2557 (0.2642) data time 0.0007 (0.0018) model time 0.2550 (0.2625) loss 5.6421 (5.5172) grad_norm 1.6495 (2.9497) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:22:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][470/625] eta 0:00:40 lr 0.000066 wd 0.0500 time 0.2570 (0.2640) data time 0.0008 (0.0018) model time 0.2562 (0.2623) loss 4.9855 (5.5139) grad_norm 2.4659 (2.9512) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:22:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][480/625] eta 0:00:38 lr 0.000066 wd 0.0500 time 0.2546 (0.2638) data time 0.0008 (0.0018) model time 0.2538 (0.2621) loss 5.5896 (5.5155) grad_norm 2.0442 (2.9484) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:22:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][490/625] eta 0:00:35 lr 0.000066 wd 0.0500 time 0.2580 (0.2637) data time 0.0008 (0.0018) model time 0.2572 (0.2620) loss 5.3906 (5.5161) grad_norm 5.3733 (2.9521) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:22:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][500/625] eta 0:00:32 lr 0.000066 wd 0.0500 time 0.2580 (0.2636) data time 0.0006 (0.0018) model time 0.2574 (0.2619) loss 6.3462 (5.5173) grad_norm 3.4120 (2.9399) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:22:42 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [272/300][510/625] eta 0:00:30 lr 0.000066 wd 0.0500 time 0.2543 (0.2642) data time 0.0008 (0.0018) model time 0.2535 (0.2625) loss 5.3371 (5.5151) grad_norm 1.6205 (2.9271) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:22:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][520/625] eta 0:00:27 lr 0.000066 wd 0.0500 time 0.2547 (0.2640) data time 0.0008 (0.0017) model time 0.2538 (0.2624) loss 5.1933 (5.5138) grad_norm 2.7960 (2.9166) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:22:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][530/625] eta 0:00:25 lr 0.000066 wd 0.0500 time 0.2560 (0.2638) data time 0.0010 (0.0017) model time 0.2550 (0.2622) loss 5.0963 (5.5106) grad_norm 2.5621 (2.9053) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:22:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][540/625] eta 0:00:22 lr 0.000066 wd 0.0500 time 0.2567 (0.2637) data time 0.0012 (0.0017) model time 0.2555 (0.2621) loss 5.5913 (5.5117) grad_norm 2.6402 (2.9053) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:22:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][550/625] eta 0:00:19 lr 0.000065 wd 0.0500 time 0.2587 (0.2636) data time 0.0007 (0.0017) model time 0.2580 (0.2619) loss 5.8633 (5.5138) grad_norm 3.6658 (2.8967) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:22:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][560/625] eta 0:00:17 lr 0.000065 wd 0.0500 time 0.2570 (0.2637) data time 0.0009 (0.0017) model time 0.2562 (0.2621) loss 4.7163 (5.5152) grad_norm 1.8057 (2.9382) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:22:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][570/625] eta 0:00:14 lr 0.000065 wd 0.0500 time 0.2571 (0.2636) data time 0.0007 (0.0017) model time 0.2565 (0.2620) 
loss 4.6672 (5.5122) grad_norm 3.1483 (2.9282) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:23:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][580/625] eta 0:00:11 lr 0.000065 wd 0.0500 time 0.2639 (0.2637) data time 0.0009 (0.0017) model time 0.2630 (0.2621) loss 5.9981 (5.5141) grad_norm 4.3005 (2.9252) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:23:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][590/625] eta 0:00:09 lr 0.000065 wd 0.0500 time 0.2537 (0.2636) data time 0.0008 (0.0016) model time 0.2530 (0.2620) loss 4.6116 (5.5075) grad_norm 3.2305 (2.9327) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:23:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][600/625] eta 0:00:06 lr 0.000065 wd 0.0500 time 0.2583 (0.2635) data time 0.0009 (0.0016) model time 0.2573 (0.2619) loss 5.4954 (5.5112) grad_norm 1.9964 (2.9410) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:23:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][610/625] eta 0:00:03 lr 0.000065 wd 0.0500 time 0.2519 (0.2633) data time 0.0006 (0.0016) model time 0.2513 (0.2617) loss 5.6458 (5.5082) grad_norm 2.0915 (2.9430) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:23:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][620/625] eta 0:00:01 lr 0.000065 wd 0.0500 time 0.2522 (0.2632) data time 0.0006 (0.0016) model time 0.2516 (0.2616) loss 5.3186 (5.5103) grad_norm 2.4279 (2.9546) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:23:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 272 training takes 0:02:44 [2024-08-04 10:23:11 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... 
[2024-08-04 10:23:12 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-08-04 10:23:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.571 (0.571) Loss 0.5938 (0.5938) Acc@1 90.576 (90.576) Acc@5 98.828 (98.828) Mem 9655MB [2024-08-04 10:23:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.104) Loss 0.8921 (0.7062) Acc@1 82.129 (87.314) Acc@5 96.533 (97.887) Mem 9655MB [2024-08-04 10:23:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 0.9961 (0.8243) Acc@1 78.760 (84.203) Acc@5 95.459 (96.794) Mem 9655MB [2024-08-04 10:23:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.881 Acc@5 96.823 [2024-08-04 10:23:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-04 10:23:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.88% [2024-08-04 10:23:14 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 10:23:14 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 10:23:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.512 (0.512) Loss 0.5864 (0.5864) Acc@1 90.332 (90.332) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 10:23:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.099) Loss 0.8965 (0.7071) Acc@1 81.885 (87.087) Acc@5 96.436 (97.794) Mem 9655MB [2024-08-04 10:23:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0088 (0.8260) Acc@1 78.809 (84.047) Acc@5 95.605 (96.647) Mem 9655MB [2024-08-04 10:23:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.677 Acc@5 96.659 [2024-08-04 10:23:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-04 10:23:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][0/625] eta 0:11:12 lr 0.000065 wd 0.0500 time 1.0758 (1.0758) data time 0.5818 (0.5818) model time 0.0000 (0.0000) loss 4.2236 (4.2236) grad_norm 3.0514 (3.0514) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:23:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][10/625] eta 0:03:23 lr 0.000065 wd 0.0500 time 0.2563 (0.3316) data time 0.0010 (0.0537) model time 0.0000 (0.0000) loss 5.5237 (5.2741) grad_norm 2.6511 (5.0783) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:23:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][20/625] eta 0:03:02 lr 0.000065 wd 0.0500 time 0.2658 (0.3021) data time 0.0009 (0.0286) model time 0.0000 (0.0000) loss 5.2725 (5.3481) grad_norm 2.5323 (3.8197) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:23:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][30/625] eta 0:02:55 lr 0.000065 wd 0.0500 time 0.2576 (0.2943) data time 0.0008 (0.0197) model time 0.0000 (0.0000) loss 6.2098 (5.3977) grad_norm 1.9194 (3.3792) 
loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:23:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][40/625] eta 0:02:46 lr 0.000065 wd 0.0500 time 0.2579 (0.2851) data time 0.0009 (0.0151) model time 0.0000 (0.0000) loss 5.9010 (5.4769) grad_norm 2.4931 (3.1478) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:23:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][50/625] eta 0:02:40 lr 0.000065 wd 0.0500 time 0.2590 (0.2797) data time 0.0008 (0.0123) model time 0.0000 (0.0000) loss 5.3911 (5.4810) grad_norm 2.0758 (3.0142) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:23:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][60/625] eta 0:02:37 lr 0.000065 wd 0.0500 time 0.3975 (0.2782) data time 0.0007 (0.0104) model time 0.3968 (0.2696) loss 5.7831 (5.4512) grad_norm 1.7619 (3.1004) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:23:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][70/625] eta 0:02:34 lr 0.000065 wd 0.0500 time 0.2522 (0.2776) data time 0.0008 (0.0091) model time 0.2513 (0.2712) loss 5.0173 (5.4069) grad_norm 3.5785 (3.0519) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:23:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][80/625] eta 0:02:29 lr 0.000065 wd 0.0500 time 0.2545 (0.2749) data time 0.0009 (0.0081) model time 0.2536 (0.2657) loss 6.3741 (5.4203) grad_norm 2.3383 (3.0027) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:23:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][90/625] eta 0:02:25 lr 0.000065 wd 0.0500 time 0.2537 (0.2729) data time 0.0008 (0.0073) model time 0.2529 (0.2632) loss 5.7598 (5.4544) grad_norm 3.1329 (3.0198) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:23:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][100/625] eta 0:02:22 lr 0.000065 
wd 0.0500 time 0.2534 (0.2713) data time 0.0008 (0.0067) model time 0.2526 (0.2617) loss 4.7833 (5.4511) grad_norm 1.9256 (2.9653) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:23:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][110/625] eta 0:02:19 lr 0.000064 wd 0.0500 time 0.2537 (0.2718) data time 0.0010 (0.0062) model time 0.2526 (0.2642) loss 5.7577 (5.4659) grad_norm 3.4119 (3.0222) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:23:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][120/625] eta 0:02:16 lr 0.000064 wd 0.0500 time 0.2574 (0.2706) data time 0.0008 (0.0057) model time 0.2566 (0.2629) loss 6.0520 (5.4893) grad_norm 2.2044 (3.0595) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:23:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][130/625] eta 0:02:13 lr 0.000064 wd 0.0500 time 0.2528 (0.2695) data time 0.0009 (0.0054) model time 0.2519 (0.2619) loss 5.8828 (5.4821) grad_norm 2.2081 (3.0244) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:23:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][140/625] eta 0:02:10 lr 0.000064 wd 0.0500 time 0.2589 (0.2685) data time 0.0009 (0.0051) model time 0.2580 (0.2612) loss 6.3014 (5.4955) grad_norm 1.7006 (3.0960) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:23:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][150/625] eta 0:02:07 lr 0.000064 wd 0.0500 time 0.2558 (0.2677) data time 0.0013 (0.0048) model time 0.2545 (0.2606) loss 5.8036 (5.5050) grad_norm 2.6453 (3.0448) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:23:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][160/625] eta 0:02:04 lr 0.000064 wd 0.0500 time 0.2547 (0.2682) data time 0.0011 (0.0046) model time 0.2536 (0.2619) loss 5.5752 (5.5018) grad_norm 2.4654 (3.0541) loss_scale 256.0000 (256.0000) mem 
9655MB [2024-08-04 10:24:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][170/625] eta 0:02:01 lr 0.000064 wd 0.0500 time 0.2540 (0.2675) data time 0.0006 (0.0044) model time 0.2534 (0.2613) loss 5.5422 (5.4910) grad_norm 2.0828 (3.0796) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:24:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][180/625] eta 0:01:59 lr 0.000064 wd 0.0500 time 0.2569 (0.2676) data time 0.0006 (0.0042) model time 0.2563 (0.2618) loss 4.5257 (5.4824) grad_norm 1.8245 (3.0370) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:24:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][190/625] eta 0:01:56 lr 0.000064 wd 0.0500 time 0.2544 (0.2669) data time 0.0007 (0.0040) model time 0.2537 (0.2613) loss 4.9886 (5.4863) grad_norm 3.3270 (3.1266) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:24:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][200/625] eta 0:01:53 lr 0.000064 wd 0.0500 time 0.2591 (0.2665) data time 0.0007 (0.0038) model time 0.2584 (0.2610) loss 6.0127 (5.4906) grad_norm 1.8832 (3.1201) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:24:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][210/625] eta 0:01:50 lr 0.000064 wd 0.0500 time 0.2535 (0.2659) data time 0.0009 (0.0037) model time 0.2526 (0.2606) loss 5.6865 (5.4841) grad_norm 4.1451 (3.1146) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:24:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][220/625] eta 0:01:47 lr 0.000064 wd 0.0500 time 0.2564 (0.2655) data time 0.0007 (0.0036) model time 0.2557 (0.2603) loss 5.0260 (5.4764) grad_norm 2.6466 (3.0816) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:24:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][230/625] eta 0:01:44 lr 0.000064 wd 0.0500 time 0.2515 (0.2651) 
data time 0.0009 (0.0035) model time 0.2506 (0.2599) loss 5.4563 (5.4822) grad_norm 2.0939 (3.0588) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:24:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][240/625] eta 0:01:42 lr 0.000064 wd 0.0500 time 0.4393 (0.2655) data time 0.0007 (0.0033) model time 0.4386 (0.2607) loss 4.9650 (5.4842) grad_norm 4.6586 (3.0664) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:24:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][250/625] eta 0:01:39 lr 0.000064 wd 0.0500 time 0.2580 (0.2651) data time 0.0007 (0.0032) model time 0.2573 (0.2604) loss 6.2741 (5.4893) grad_norm 2.1189 (3.0476) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:24:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][260/625] eta 0:01:36 lr 0.000064 wd 0.0500 time 0.2535 (0.2652) data time 0.0008 (0.0032) model time 0.2527 (0.2607) loss 4.8301 (5.4820) grad_norm 1.8245 (3.0441) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:24:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][270/625] eta 0:01:34 lr 0.000064 wd 0.0500 time 0.2613 (0.2648) data time 0.0006 (0.0031) model time 0.2607 (0.2604) loss 5.0191 (5.4912) grad_norm 3.2643 (3.0279) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:24:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][280/625] eta 0:01:31 lr 0.000064 wd 0.0500 time 0.2552 (0.2646) data time 0.0009 (0.0030) model time 0.2543 (0.2603) loss 4.7487 (5.4831) grad_norm 3.2062 (2.9954) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:24:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][290/625] eta 0:01:28 lr 0.000064 wd 0.0500 time 0.2589 (0.2643) data time 0.0009 (0.0029) model time 0.2580 (0.2600) loss 6.3756 (5.4783) grad_norm 2.3373 (2.9822) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:24:36 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][300/625] eta 0:01:25 lr 0.000064 wd 0.0500 time 0.2544 (0.2640) data time 0.0006 (0.0029) model time 0.2538 (0.2598) loss 5.7000 (5.4660) grad_norm 24.1563 (3.1021) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:24:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][310/625] eta 0:01:23 lr 0.000063 wd 0.0500 time 0.2522 (0.2637) data time 0.0007 (0.0028) model time 0.2515 (0.2597) loss 5.8911 (5.4794) grad_norm 2.2771 (3.0931) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:24:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][320/625] eta 0:01:20 lr 0.000063 wd 0.0500 time 0.2568 (0.2635) data time 0.0017 (0.0028) model time 0.2551 (0.2595) loss 5.7924 (5.4823) grad_norm 2.8625 (3.0817) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:24:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][330/625] eta 0:01:18 lr 0.000063 wd 0.0500 time 0.2574 (0.2645) data time 0.0009 (0.0027) model time 0.2564 (0.2608) loss 5.7030 (5.4831) grad_norm 2.8646 (3.0568) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:24:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][340/625] eta 0:01:15 lr 0.000063 wd 0.0500 time 0.2563 (0.2647) data time 0.0008 (0.0027) model time 0.2554 (0.2611) loss 5.8224 (5.4893) grad_norm 3.1428 (3.0421) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:24:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][350/625] eta 0:01:12 lr 0.000063 wd 0.0500 time 0.2544 (0.2650) data time 0.0011 (0.0026) model time 0.2533 (0.2616) loss 4.9150 (5.4959) grad_norm 2.1601 (3.0302) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:24:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][360/625] eta 0:01:10 lr 0.000063 wd 0.0500 time 0.2557 (0.2648) data time 0.0008 (0.0026) 
model time 0.2549 (0.2614) loss 5.3833 (5.4976) grad_norm 1.7315 (3.0562) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:24:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][370/625] eta 0:01:07 lr 0.000063 wd 0.0500 time 0.2628 (0.2651) data time 0.0006 (0.0025) model time 0.2622 (0.2619) loss 4.8402 (5.4928) grad_norm 1.8447 (3.0384) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:24:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][380/625] eta 0:01:04 lr 0.000063 wd 0.0500 time 0.2582 (0.2649) data time 0.0009 (0.0025) model time 0.2572 (0.2617) loss 5.8102 (5.5003) grad_norm 2.9632 (3.0336) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:25:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][390/625] eta 0:01:02 lr 0.000063 wd 0.0500 time 0.2562 (0.2647) data time 0.0007 (0.0024) model time 0.2555 (0.2615) loss 5.1967 (5.4874) grad_norm 3.1805 (3.0283) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:25:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][400/625] eta 0:00:59 lr 0.000063 wd 0.0500 time 0.2608 (0.2645) data time 0.0012 (0.0024) model time 0.2596 (0.2614) loss 6.1534 (5.4894) grad_norm 3.1580 (3.0290) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:25:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][410/625] eta 0:00:56 lr 0.000063 wd 0.0500 time 0.2552 (0.2648) data time 0.0007 (0.0024) model time 0.2545 (0.2617) loss 5.5106 (5.4870) grad_norm 3.8973 (3.0153) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:25:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][420/625] eta 0:00:54 lr 0.000063 wd 0.0500 time 0.2698 (0.2651) data time 0.0008 (0.0023) model time 0.2690 (0.2621) loss 5.1633 (5.4823) grad_norm 1.8939 (3.0405) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:25:10 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [273/300][430/625] eta 0:00:51 lr 0.000063 wd 0.0500 time 0.2575 (0.2649) data time 0.0006 (0.0023) model time 0.2569 (0.2619) loss 5.6900 (5.4912) grad_norm 2.7504 (3.0292) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:25:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][440/625] eta 0:00:48 lr 0.000063 wd 0.0500 time 0.2538 (0.2647) data time 0.0006 (0.0023) model time 0.2532 (0.2618) loss 5.3678 (5.4917) grad_norm 4.0922 (3.0213) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:25:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][450/625] eta 0:00:46 lr 0.000063 wd 0.0500 time 0.2573 (0.2649) data time 0.0008 (0.0022) model time 0.2565 (0.2620) loss 4.9108 (5.4933) grad_norm 2.6908 (3.0185) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:25:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][460/625] eta 0:00:43 lr 0.000063 wd 0.0500 time 0.2581 (0.2647) data time 0.0010 (0.0022) model time 0.2571 (0.2619) loss 4.5203 (5.4890) grad_norm 2.4682 (3.0107) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:25:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][470/625] eta 0:00:40 lr 0.000063 wd 0.0500 time 0.2539 (0.2645) data time 0.0007 (0.0022) model time 0.2531 (0.2617) loss 5.0652 (5.4872) grad_norm 2.1146 (3.0029) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:25:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][480/625] eta 0:00:38 lr 0.000063 wd 0.0500 time 0.2563 (0.2647) data time 0.0005 (0.0022) model time 0.2558 (0.2620) loss 5.9363 (5.4888) grad_norm 2.2948 (2.9920) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:25:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][490/625] eta 0:00:35 lr 0.000063 wd 0.0500 time 0.2567 (0.2645) data time 0.0006 (0.0021) model time 0.2561 (0.2618) 
loss 5.1874 (5.4856) grad_norm 1.8020 (2.9773) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:25:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][500/625] eta 0:00:33 lr 0.000062 wd 0.0500 time 0.2577 (0.2648) data time 0.0008 (0.0021) model time 0.2569 (0.2621) loss 6.2147 (5.4853) grad_norm 1.5096 (2.9794) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:25:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][510/625] eta 0:00:30 lr 0.000062 wd 0.0500 time 0.2580 (0.2646) data time 0.0005 (0.0021) model time 0.2575 (0.2620) loss 5.6005 (5.4851) grad_norm 4.2175 (2.9730) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:25:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][520/625] eta 0:00:27 lr 0.000062 wd 0.0500 time 0.2570 (0.2645) data time 0.0011 (0.0021) model time 0.2559 (0.2619) loss 6.0763 (5.4923) grad_norm 1.8390 (2.9613) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:25:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][530/625] eta 0:00:25 lr 0.000062 wd 0.0500 time 0.2575 (0.2643) data time 0.0006 (0.0020) model time 0.2569 (0.2617) loss 5.9357 (5.4926) grad_norm 11.2193 (2.9851) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:25:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][540/625] eta 0:00:22 lr 0.000062 wd 0.0500 time 0.2554 (0.2645) data time 0.0009 (0.0020) model time 0.2545 (0.2619) loss 5.8578 (5.4963) grad_norm 1.6586 (2.9751) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:25:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][550/625] eta 0:00:19 lr 0.000062 wd 0.0500 time 0.2550 (0.2643) data time 0.0012 (0.0020) model time 0.2538 (0.2618) loss 5.7300 (5.4979) grad_norm 3.7872 (2.9852) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:25:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [273/300][560/625] eta 0:00:17 lr 0.000062 wd 0.0500 time 0.2556 (0.2642) data time 0.0010 (0.0020) model time 0.2546 (0.2617) loss 6.3398 (5.4982) grad_norm 3.1834 (2.9789) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:25:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][570/625] eta 0:00:14 lr 0.000062 wd 0.0500 time 0.2549 (0.2640) data time 0.0013 (0.0020) model time 0.2536 (0.2615) loss 5.9019 (5.5008) grad_norm 3.8303 (2.9745) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:25:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][580/625] eta 0:00:11 lr 0.000062 wd 0.0500 time 0.2558 (0.2639) data time 0.0009 (0.0020) model time 0.2549 (0.2614) loss 4.8643 (5.4991) grad_norm 2.3571 (2.9854) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:25:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][590/625] eta 0:00:09 lr 0.000062 wd 0.0500 time 0.2543 (0.2638) data time 0.0011 (0.0019) model time 0.2532 (0.2613) loss 6.0256 (5.5040) grad_norm 1.7680 (2.9781) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:25:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][600/625] eta 0:00:06 lr 0.000062 wd 0.0500 time 0.2531 (0.2636) data time 0.0009 (0.0019) model time 0.2522 (0.2612) loss 4.5701 (5.5020) grad_norm 5.4225 (2.9786) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:25:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][610/625] eta 0:00:03 lr 0.000062 wd 0.0500 time 0.2521 (0.2635) data time 0.0004 (0.0019) model time 0.2517 (0.2611) loss 6.4673 (5.5059) grad_norm 3.3623 (2.9757) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:26:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][620/625] eta 0:00:01 lr 0.000062 wd 0.0500 time 0.2525 (0.2634) data time 0.0006 (0.0019) model time 0.2519 (0.2609) loss 5.4906 (5.5071) grad_norm 
1.9242 (2.9765) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:26:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 273 training takes 0:02:44 [2024-08-04 10:26:01 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 10:26:01 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! [2024-08-04 10:26:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.506 (0.506) Loss 0.6001 (0.6001) Acc@1 90.381 (90.381) Acc@5 98.828 (98.828) Mem 9655MB [2024-08-04 10:26:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 0.9048 (0.7112) Acc@1 81.592 (87.185) Acc@5 96.484 (97.874) Mem 9655MB [2024-08-04 10:26:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 0.9946 (0.8280) Acc@1 79.346 (84.198) Acc@5 95.850 (96.777) Mem 9655MB [2024-08-04 10:26:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.895 Acc@5 96.797 [2024-08-04 10:26:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-04 10:26:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.90% [2024-08-04 10:26:03 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 10:26:03 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 10:26:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.522 (0.522) Loss 0.5869 (0.5869) Acc@1 90.332 (90.332) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 10:26:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.099) Loss 0.8945 (0.7063) Acc@1 81.982 (87.118) Acc@5 96.436 (97.789) Mem 9655MB [2024-08-04 10:26:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0088 (0.8252) Acc@1 78.760 (84.063) Acc@5 95.654 (96.656) Mem 9655MB [2024-08-04 10:26:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.695 Acc@5 96.673 [2024-08-04 10:26:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-04 10:26:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.70% [2024-08-04 10:26:05 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 10:26:06 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 10:26:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][0/625] eta 0:06:51 lr 0.000062 wd 0.0500 time 0.6580 (0.6580) data time 0.4150 (0.4150) model time 0.0000 (0.0000) loss 5.3398 (5.3398) grad_norm 2.7037 (2.7037) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:26:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][10/625] eta 0:03:06 lr 0.000062 wd 0.0500 time 0.2574 (0.3030) data time 0.0008 (0.0386) model time 0.0000 (0.0000) loss 5.7522 (5.5386) grad_norm 2.0317 (2.8716) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:26:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][20/625] eta 0:02:49 lr 0.000062 wd 0.0500 time 0.2555 (0.2799) data time 0.0007 (0.0206) model time 0.0000 (0.0000) loss 5.5927 (5.5532) grad_norm 2.2892 (2.6806) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:26:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][30/625] eta 0:02:41 lr 0.000062 wd 0.0500 time 0.2582 (0.2721) data time 0.0008 (0.0143) model time 0.0000 (0.0000) loss 5.3388 (5.4516) grad_norm 2.3909 (2.5797) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:26:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][40/625] eta 0:02:38 lr 0.000062 wd 0.0500 time 0.2573 (0.2715) data time 0.0008 (0.0110) model time 0.0000 (0.0000) loss 5.8258 (5.4648) grad_norm 2.3136 (2.6245) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:26:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][50/625] eta 0:02:35 lr 0.000062 wd 0.0500 time 0.3920 (0.2711) data time 0.0007 (0.0090) model time 0.0000 (0.0000) loss 5.6687 (5.4897) grad_norm 1.9415 (2.6217) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:26:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][60/625] eta 0:02:31 lr 0.000062 wd 0.0500 time 0.2565 (0.2685) data time 
0.0008 (0.0077) model time 0.2557 (0.2542) loss 5.8141 (5.4806) grad_norm 2.1572 (2.5701) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:26:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][70/625] eta 0:02:27 lr 0.000061 wd 0.0500 time 0.2555 (0.2666) data time 0.0014 (0.0067) model time 0.2540 (0.2542) loss 5.6029 (5.4954) grad_norm 2.0224 (2.5297) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:26:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][80/625] eta 0:02:25 lr 0.000061 wd 0.0500 time 0.2569 (0.2677) data time 0.0006 (0.0060) model time 0.2563 (0.2611) loss 6.4487 (5.5190) grad_norm 2.2498 (2.6233) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:26:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][90/625] eta 0:02:22 lr 0.000061 wd 0.0500 time 0.2526 (0.2663) data time 0.0009 (0.0055) model time 0.2517 (0.2593) loss 5.7999 (5.5155) grad_norm 2.4998 (2.7736) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:26:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][100/625] eta 0:02:20 lr 0.000061 wd 0.0500 time 0.4651 (0.2674) data time 0.0012 (0.0050) model time 0.4639 (0.2627) loss 6.0426 (5.4985) grad_norm 2.0537 (2.7809) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:26:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][110/625] eta 0:02:18 lr 0.000061 wd 0.0500 time 0.2581 (0.2683) data time 0.0011 (0.0047) model time 0.2570 (0.2649) loss 6.2981 (5.5116) grad_norm 1.6540 (2.7412) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:26:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][120/625] eta 0:02:15 lr 0.000061 wd 0.0500 time 0.2569 (0.2689) data time 0.0006 (0.0044) model time 0.2564 (0.2663) loss 5.7147 (5.5032) grad_norm 2.2020 (2.7310) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:26:41 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][130/625] eta 0:02:12 lr 0.000061 wd 0.0500 time 0.2571 (0.2678) data time 0.0009 (0.0041) model time 0.2562 (0.2648) loss 6.0850 (5.5311) grad_norm 2.7833 (2.7194) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:26:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][140/625] eta 0:02:09 lr 0.000061 wd 0.0500 time 0.2557 (0.2669) data time 0.0008 (0.0039) model time 0.2549 (0.2637) loss 4.5140 (5.5142) grad_norm 2.4903 (2.7096) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:26:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][150/625] eta 0:02:06 lr 0.000061 wd 0.0500 time 0.2539 (0.2669) data time 0.0008 (0.0037) model time 0.2530 (0.2639) loss 5.4425 (5.5210) grad_norm 2.1409 (2.7064) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:26:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][160/625] eta 0:02:03 lr 0.000061 wd 0.0500 time 0.2554 (0.2663) data time 0.0008 (0.0035) model time 0.2546 (0.2631) loss 5.9888 (5.5386) grad_norm 2.3729 (2.8303) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:26:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][170/625] eta 0:02:00 lr 0.000061 wd 0.0500 time 0.2551 (0.2657) data time 0.0009 (0.0033) model time 0.2541 (0.2625) loss 5.4564 (5.5423) grad_norm 1.8191 (2.8229) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:26:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][180/625] eta 0:01:57 lr 0.000061 wd 0.0500 time 0.2550 (0.2651) data time 0.0006 (0.0032) model time 0.2544 (0.2619) loss 6.5264 (5.5499) grad_norm 2.5329 (2.8321) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:26:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][190/625] eta 0:01:55 lr 0.000061 wd 0.0500 time 0.2550 (0.2655) data time 0.0010 (0.0031) 
model time 0.2540 (0.2625) loss 5.3579 (5.5563) grad_norm 1.7628 (2.8050) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:26:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][200/625] eta 0:01:53 lr 0.000061 wd 0.0500 time 0.2510 (0.2660) data time 0.0006 (0.0030) model time 0.2504 (0.2634) loss 5.7742 (5.5577) grad_norm 3.0485 (2.7830) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:27:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][210/625] eta 0:01:50 lr 0.000061 wd 0.0500 time 0.2613 (0.2664) data time 0.0007 (0.0029) model time 0.2606 (0.2640) loss 6.3455 (5.5637) grad_norm 1.9571 (2.7985) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:27:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][220/625] eta 0:01:47 lr 0.000061 wd 0.0500 time 0.2535 (0.2662) data time 0.0006 (0.0028) model time 0.2529 (0.2638) loss 5.1738 (5.5508) grad_norm 2.9044 (2.7868) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:27:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][230/625] eta 0:01:45 lr 0.000061 wd 0.0500 time 0.2571 (0.2663) data time 0.0009 (0.0027) model time 0.2562 (0.2640) loss 5.0624 (5.5572) grad_norm 4.6875 (2.7864) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:27:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][240/625] eta 0:01:42 lr 0.000061 wd 0.0500 time 0.2768 (0.2660) data time 0.0008 (0.0026) model time 0.2760 (0.2637) loss 6.1208 (5.5577) grad_norm 1.8807 (2.7683) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:27:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][250/625] eta 0:01:39 lr 0.000061 wd 0.0500 time 0.2532 (0.2656) data time 0.0008 (0.0026) model time 0.2524 (0.2633) loss 4.8964 (5.5501) grad_norm 2.5493 (2.7658) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:27:15 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [274/300][260/625] eta 0:01:36 lr 0.000061 wd 0.0500 time 0.2532 (0.2653) data time 0.0010 (0.0025) model time 0.2521 (0.2630) loss 6.3885 (5.5536) grad_norm 2.2471 (2.7521) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:27:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][270/625] eta 0:01:34 lr 0.000060 wd 0.0500 time 0.2537 (0.2649) data time 0.0007 (0.0024) model time 0.2530 (0.2626) loss 4.7961 (5.5426) grad_norm 2.1874 (2.7436) loss_scale 512.0000 (256.9446) mem 9655MB [2024-08-04 10:27:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][280/625] eta 0:01:31 lr 0.000060 wd 0.0500 time 0.2564 (0.2646) data time 0.0006 (0.0024) model time 0.2558 (0.2623) loss 5.0016 (5.5375) grad_norm 2.6739 (2.7369) loss_scale 512.0000 (266.0214) mem 9655MB [2024-08-04 10:27:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][290/625] eta 0:01:28 lr 0.000060 wd 0.0500 time 0.2489 (0.2644) data time 0.0010 (0.0023) model time 0.2479 (0.2620) loss 5.5813 (5.5343) grad_norm 3.8323 (2.7429) loss_scale 512.0000 (274.4742) mem 9655MB [2024-08-04 10:27:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][300/625] eta 0:01:25 lr 0.000060 wd 0.0500 time 0.2574 (0.2641) data time 0.0010 (0.0023) model time 0.2564 (0.2617) loss 4.8461 (5.5338) grad_norm 2.4101 (2.7595) loss_scale 512.0000 (282.3654) mem 9655MB [2024-08-04 10:27:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][310/625] eta 0:01:23 lr 0.000060 wd 0.0500 time 0.2583 (0.2638) data time 0.0008 (0.0022) model time 0.2575 (0.2615) loss 5.2836 (5.5235) grad_norm 2.0969 (2.7644) loss_scale 512.0000 (289.7492) mem 9655MB [2024-08-04 10:27:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][320/625] eta 0:01:20 lr 0.000060 wd 0.0500 time 0.2574 (0.2636) data time 0.0008 (0.0022) model time 0.2566 (0.2613) 
loss 5.8247 (5.5192) grad_norm 2.6507 (2.7544) loss_scale 512.0000 (296.6729) mem 9655MB [2024-08-04 10:27:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][330/625] eta 0:01:17 lr 0.000060 wd 0.0500 time 0.2565 (0.2639) data time 0.0009 (0.0022) model time 0.2556 (0.2617) loss 5.5726 (5.5129) grad_norm 1.9879 (2.7431) loss_scale 512.0000 (303.1782) mem 9655MB [2024-08-04 10:27:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][340/625] eta 0:01:15 lr 0.000060 wd 0.0500 time 0.2582 (0.2642) data time 0.0008 (0.0021) model time 0.2574 (0.2621) loss 5.0352 (5.5137) grad_norm 1.8522 (2.7471) loss_scale 512.0000 (309.3021) mem 9655MB [2024-08-04 10:27:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][350/625] eta 0:01:12 lr 0.000060 wd 0.0500 time 0.2574 (0.2639) data time 0.0008 (0.0021) model time 0.2566 (0.2618) loss 5.6659 (5.5057) grad_norm 2.3792 (2.7398) loss_scale 512.0000 (315.0769) mem 9655MB [2024-08-04 10:27:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][360/625] eta 0:01:09 lr 0.000060 wd 0.0500 time 0.2523 (0.2637) data time 0.0008 (0.0021) model time 0.2515 (0.2616) loss 4.7538 (5.4925) grad_norm 3.0482 (2.7229) loss_scale 512.0000 (320.5319) mem 9655MB [2024-08-04 10:27:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][370/625] eta 0:01:07 lr 0.000060 wd 0.0500 time 0.2493 (0.2641) data time 0.0007 (0.0020) model time 0.2486 (0.2620) loss 4.6904 (5.5000) grad_norm 2.3863 (2.7453) loss_scale 512.0000 (325.6927) mem 9655MB [2024-08-04 10:27:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][380/625] eta 0:01:04 lr 0.000060 wd 0.0500 time 0.3961 (0.2643) data time 0.0006 (0.0020) model time 0.3955 (0.2623) loss 5.1657 (5.4968) grad_norm 5.2616 (2.7716) loss_scale 512.0000 (330.5827) mem 9655MB [2024-08-04 10:27:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [274/300][390/625] eta 0:01:02 lr 0.000060 wd 0.0500 time 0.2581 (0.2641) data time 0.0006 (0.0020) model time 0.2575 (0.2622) loss 5.8297 (5.4921) grad_norm 2.3447 (2.7803) loss_scale 512.0000 (335.2225) mem 9655MB [2024-08-04 10:27:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][400/625] eta 0:00:59 lr 0.000060 wd 0.0500 time 0.2581 (0.2640) data time 0.0008 (0.0019) model time 0.2573 (0.2620) loss 5.2043 (5.4916) grad_norm 3.1649 (2.7928) loss_scale 512.0000 (339.6309) mem 9655MB [2024-08-04 10:27:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][410/625] eta 0:00:56 lr 0.000060 wd 0.0500 time 0.2512 (0.2638) data time 0.0011 (0.0019) model time 0.2501 (0.2618) loss 4.9991 (5.4976) grad_norm 2.2672 (2.7871) loss_scale 512.0000 (343.8248) mem 9655MB [2024-08-04 10:27:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][420/625] eta 0:00:54 lr 0.000060 wd 0.0500 time 0.2554 (0.2636) data time 0.0009 (0.0019) model time 0.2546 (0.2617) loss 5.3020 (5.5019) grad_norm 2.6813 (2.7876) loss_scale 512.0000 (347.8195) mem 9655MB [2024-08-04 10:27:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][430/625] eta 0:00:51 lr 0.000060 wd 0.0500 time 0.2565 (0.2634) data time 0.0005 (0.0019) model time 0.2560 (0.2615) loss 4.6821 (5.5021) grad_norm 4.5138 (2.7927) loss_scale 512.0000 (351.6288) mem 9655MB [2024-08-04 10:28:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][440/625] eta 0:00:48 lr 0.000060 wd 0.0500 time 0.2601 (0.2638) data time 0.0007 (0.0018) model time 0.2595 (0.2619) loss 5.0363 (5.4985) grad_norm 3.7156 (2.7955) loss_scale 512.0000 (355.2653) mem 9655MB [2024-08-04 10:28:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][450/625] eta 0:00:46 lr 0.000060 wd 0.0500 time 0.2536 (0.2636) data time 0.0013 (0.0018) model time 0.2524 (0.2617) loss 5.9325 (5.5028) grad_norm 
19.5255 (2.8200) loss_scale 512.0000 (358.7406) mem 9655MB [2024-08-04 10:28:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][460/625] eta 0:00:43 lr 0.000060 wd 0.0500 time 0.2599 (0.2635) data time 0.0008 (0.0018) model time 0.2591 (0.2616) loss 4.4874 (5.5013) grad_norm 2.2867 (2.8093) loss_scale 512.0000 (362.0651) mem 9655MB [2024-08-04 10:28:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][470/625] eta 0:00:40 lr 0.000059 wd 0.0500 time 0.2637 (0.2633) data time 0.0007 (0.0018) model time 0.2629 (0.2615) loss 5.6363 (5.5014) grad_norm 2.5106 (2.8375) loss_scale 512.0000 (365.2484) mem 9655MB [2024-08-04 10:28:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][480/625] eta 0:00:38 lr 0.000059 wd 0.0500 time 0.2568 (0.2639) data time 0.0011 (0.0018) model time 0.2556 (0.2622) loss 6.6924 (5.5037) grad_norm 2.3827 (2.8305) loss_scale 512.0000 (368.2994) mem 9655MB [2024-08-04 10:28:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][490/625] eta 0:00:35 lr 0.000059 wd 0.0500 time 0.2633 (0.2642) data time 0.0009 (0.0018) model time 0.2624 (0.2624) loss 5.3425 (5.4982) grad_norm 3.2030 (2.8404) loss_scale 512.0000 (371.2261) mem 9655MB [2024-08-04 10:28:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][500/625] eta 0:00:32 lr 0.000059 wd 0.0500 time 0.2526 (0.2640) data time 0.0007 (0.0017) model time 0.2518 (0.2623) loss 5.7491 (5.4991) grad_norm 3.1702 (2.8312) loss_scale 512.0000 (374.0359) mem 9655MB [2024-08-04 10:28:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][510/625] eta 0:00:30 lr 0.000059 wd 0.0500 time 0.2586 (0.2639) data time 0.0009 (0.0017) model time 0.2577 (0.2621) loss 4.4784 (5.4987) grad_norm 1.8836 (2.8279) loss_scale 512.0000 (376.7358) mem 9655MB [2024-08-04 10:28:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][520/625] 
eta 0:00:27 lr 0.000059 wd 0.0500 time 0.2529 (0.2637) data time 0.0010 (0.0017) model time 0.2519 (0.2620) loss 6.1770 (5.5052) grad_norm 2.8311 (2.8667) loss_scale 512.0000 (379.3321) mem 9655MB [2024-08-04 10:28:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][530/625] eta 0:00:25 lr 0.000059 wd 0.0500 time 0.4651 (0.2640) data time 0.0008 (0.0017) model time 0.4643 (0.2623) loss 6.3441 (5.5072) grad_norm 2.0454 (2.8608) loss_scale 512.0000 (381.8305) mem 9655MB [2024-08-04 10:28:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][540/625] eta 0:00:22 lr 0.000059 wd 0.0500 time 0.2586 (0.2638) data time 0.0006 (0.0017) model time 0.2580 (0.2622) loss 5.6977 (5.5145) grad_norm 2.1101 (2.8586) loss_scale 512.0000 (384.2366) mem 9655MB [2024-08-04 10:28:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][550/625] eta 0:00:19 lr 0.000059 wd 0.0500 time 0.2636 (0.2637) data time 0.0008 (0.0017) model time 0.2628 (0.2620) loss 4.6232 (5.5041) grad_norm 2.9304 (2.8645) loss_scale 512.0000 (386.5554) mem 9655MB [2024-08-04 10:28:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][560/625] eta 0:00:17 lr 0.000059 wd 0.0500 time 0.2568 (0.2636) data time 0.0008 (0.0017) model time 0.2560 (0.2619) loss 5.3513 (5.4987) grad_norm 2.4396 (2.8735) loss_scale 512.0000 (388.7914) mem 9655MB [2024-08-04 10:28:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][570/625] eta 0:00:14 lr 0.000059 wd 0.0500 time 0.2575 (0.2635) data time 0.0009 (0.0016) model time 0.2567 (0.2618) loss 6.1301 (5.5054) grad_norm 3.2029 (2.8853) loss_scale 512.0000 (390.9492) mem 9655MB [2024-08-04 10:28:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][580/625] eta 0:00:11 lr 0.000059 wd 0.0500 time 0.2556 (0.2633) data time 0.0010 (0.0016) model time 0.2545 (0.2617) loss 5.5502 (5.5020) grad_norm 1.8040 (2.8824) loss_scale 
512.0000 (393.0327) mem 9655MB [2024-08-04 10:28:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][590/625] eta 0:00:09 lr 0.000059 wd 0.0500 time 0.2514 (0.2632) data time 0.0009 (0.0016) model time 0.2504 (0.2615) loss 6.0911 (5.5029) grad_norm 2.3355 (2.8934) loss_scale 512.0000 (395.0457) mem 9655MB [2024-08-04 10:28:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][600/625] eta 0:00:06 lr 0.000059 wd 0.0500 time 0.2498 (0.2631) data time 0.0008 (0.0016) model time 0.2489 (0.2614) loss 6.2808 (5.5014) grad_norm 2.8446 (2.8962) loss_scale 512.0000 (396.9917) mem 9655MB [2024-08-04 10:28:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][610/625] eta 0:00:03 lr 0.000059 wd 0.0500 time 0.2523 (0.2629) data time 0.0004 (0.0016) model time 0.2519 (0.2613) loss 6.5801 (5.4978) grad_norm 4.6640 (2.9002) loss_scale 512.0000 (398.8740) mem 9655MB [2024-08-04 10:28:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][620/625] eta 0:00:01 lr 0.000059 wd 0.0500 time 0.2501 (0.2628) data time 0.0003 (0.0016) model time 0.2498 (0.2611) loss 6.5599 (5.4947) grad_norm 3.1315 (2.8988) loss_scale 512.0000 (400.6957) mem 9655MB [2024-08-04 10:28:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 274 training takes 0:02:44 [2024-08-04 10:28:50 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 10:28:51 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 10:28:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.439 (0.439) Loss 0.6050 (0.6050) Acc@1 90.137 (90.137) Acc@5 98.877 (98.877) Mem 9655MB [2024-08-04 10:28:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.090) Loss 0.9131 (0.7212) Acc@1 81.934 (87.180) Acc@5 96.289 (97.838) Mem 9655MB [2024-08-04 10:28:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.074) Loss 0.9995 (0.8367) Acc@1 79.053 (84.173) Acc@5 95.654 (96.710) Mem 9655MB [2024-08-04 10:28:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.857 Acc@5 96.735 [2024-08-04 10:28:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-04 10:28:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.707 (0.707) Loss 0.5874 (0.5874) Acc@1 90.332 (90.332) Acc@5 98.633 (98.633) Mem 9655MB [2024-08-04 10:28:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.126) Loss 0.8940 (0.7064) Acc@1 82.031 (87.123) Acc@5 96.387 (97.785) Mem 9655MB [2024-08-04 10:28:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0078 (0.8252) Acc@1 78.760 (84.073) Acc@5 95.654 (96.654) Mem 9655MB [2024-08-04 10:28:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.703 Acc@5 96.671 [2024-08-04 10:28:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-04 10:28:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.70% [2024-08-04 10:28:55 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 10:28:55 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 10:28:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][0/625] eta 0:07:07 lr 0.000059 wd 0.0500 time 0.6836 (0.6836) data time 0.4396 (0.4396) model time 0.0000 (0.0000) loss 6.4052 (6.4052) grad_norm 2.5473 (2.5473) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:28:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][10/625] eta 0:03:13 lr 0.000059 wd 0.0500 time 0.2591 (0.3150) data time 0.0008 (0.0408) model time 0.0000 (0.0000) loss 4.8768 (5.3578) grad_norm 3.7679 (2.4085) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:29:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][20/625] eta 0:02:53 lr 0.000059 wd 0.0500 time 0.2524 (0.2872) data time 0.0007 (0.0218) model time 0.0000 (0.0000) loss 5.6424 (5.4159) grad_norm 1.6992 (2.3493) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:29:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][30/625] eta 0:02:48 lr 0.000059 wd 0.0500 time 0.2564 (0.2829) data time 0.0006 (0.0151) model time 0.0000 (0.0000) loss 5.0987 (5.3632) grad_norm 3.1358 (2.5236) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:29:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][40/625] eta 0:02:44 lr 0.000058 wd 0.0500 time 0.4648 (0.2814) data time 0.0009 (0.0116) model time 0.0000 (0.0000) loss 4.7209 (5.4560) grad_norm 6.0079 (2.6385) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:29:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][50/625] eta 0:02:41 lr 0.000058 wd 0.0500 time 0.2578 (0.2800) data time 0.0010 (0.0095) model time 0.0000 (0.0000) loss 6.4062 (5.4786) grad_norm 3.7054 (2.6562) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 
10:29:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][60/625] eta 0:02:36 lr 0.000058 wd 0.0500 time 0.2586 (0.2761) data time 0.0009 (0.0081) model time 0.2578 (0.2552) loss 5.0051 (5.4818) grad_norm 1.5737 (2.6045) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:29:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][70/625] eta 0:02:31 lr 0.000058 wd 0.0500 time 0.2538 (0.2733) data time 0.0009 (0.0071) model time 0.2529 (0.2553) loss 5.8136 (5.4699) grad_norm 6.3873 (2.6447) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:29:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][80/625] eta 0:02:27 lr 0.000058 wd 0.0500 time 0.2563 (0.2712) data time 0.0010 (0.0063) model time 0.2552 (0.2552) loss 4.9784 (5.4037) grad_norm 2.3682 (2.5800) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:29:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][90/625] eta 0:02:24 lr 0.000058 wd 0.0500 time 0.2539 (0.2696) data time 0.0019 (0.0058) model time 0.2519 (0.2553) loss 5.5116 (5.3949) grad_norm 2.3164 (2.5399) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:29:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][100/625] eta 0:02:21 lr 0.000058 wd 0.0500 time 0.2520 (0.2703) data time 0.0007 (0.0053) model time 0.2514 (0.2595) loss 5.9936 (5.4287) grad_norm 1.9688 (2.5571) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:29:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][110/625] eta 0:02:19 lr 0.000058 wd 0.0500 time 0.2548 (0.2711) data time 0.0010 (0.0049) model time 0.2538 (0.2626) loss 6.0003 (5.4242) grad_norm 1.8194 (2.5742) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:29:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][120/625] eta 0:02:16 lr 0.000058 wd 0.0500 time 0.2492 (0.2699) data time 0.0008 
(0.0046) model time 0.2484 (0.2615) loss 6.0344 (5.4427) grad_norm 2.4067 (2.5430) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:29:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][130/625] eta 0:02:13 lr 0.000058 wd 0.0500 time 0.2558 (0.2689) data time 0.0006 (0.0043) model time 0.2552 (0.2608) loss 5.4200 (5.4295) grad_norm 1.7343 (2.5103) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:29:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][140/625] eta 0:02:10 lr 0.000058 wd 0.0500 time 0.2508 (0.2693) data time 0.0007 (0.0041) model time 0.2500 (0.2623) loss 5.4137 (5.4279) grad_norm 2.2191 (2.6597) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:29:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][150/625] eta 0:02:07 lr 0.000058 wd 0.0500 time 0.2553 (0.2684) data time 0.0007 (0.0038) model time 0.2546 (0.2616) loss 6.0135 (5.4200) grad_norm 4.9223 (2.8270) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:29:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][160/625] eta 0:02:04 lr 0.000058 wd 0.0500 time 0.2507 (0.2677) data time 0.0010 (0.0037) model time 0.2497 (0.2610) loss 6.3089 (5.4103) grad_norm 1.7402 (2.8270) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:29:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][170/625] eta 0:02:01 lr 0.000058 wd 0.0500 time 0.2524 (0.2670) data time 0.0006 (0.0035) model time 0.2518 (0.2605) loss 4.8413 (5.4087) grad_norm 2.4850 (2.9884) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:29:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][180/625] eta 0:01:58 lr 0.000058 wd 0.0500 time 0.2536 (0.2671) data time 0.0007 (0.0034) model time 0.2529 (0.2610) loss 4.7701 (5.4184) grad_norm 2.1345 (2.9512) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:29:46 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][190/625] eta 0:01:56 lr 0.000058 wd 0.0500 time 0.2548 (0.2677) data time 0.0009 (0.0032) model time 0.2539 (0.2623) loss 5.9718 (5.4308) grad_norm 1.8947 (2.9355) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:29:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][200/625] eta 0:01:53 lr 0.000058 wd 0.0500 time 0.2551 (0.2681) data time 0.0006 (0.0031) model time 0.2545 (0.2632) loss 5.3861 (5.4431) grad_norm 2.1602 (2.9052) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:29:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][210/625] eta 0:01:51 lr 0.000058 wd 0.0500 time 0.2566 (0.2676) data time 0.0008 (0.0030) model time 0.2559 (0.2627) loss 5.3571 (5.4529) grad_norm 4.1021 (2.8983) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:29:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][220/625] eta 0:01:48 lr 0.000058 wd 0.0500 time 0.2539 (0.2671) data time 0.0009 (0.0029) model time 0.2530 (0.2622) loss 6.6349 (5.4570) grad_norm 2.8083 (2.9438) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:29:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][230/625] eta 0:01:45 lr 0.000058 wd 0.0500 time 0.2510 (0.2666) data time 0.0009 (0.0028) model time 0.2501 (0.2619) loss 5.1776 (5.4509) grad_norm 3.1280 (2.9417) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:30:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][240/625] eta 0:01:43 lr 0.000058 wd 0.0500 time 0.2583 (0.2677) data time 0.0007 (0.0027) model time 0.2576 (0.2634) loss 6.1206 (5.4513) grad_norm 1.9887 (2.9268) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:30:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][250/625] eta 0:01:40 lr 0.000057 wd 0.0500 time 0.2524 (0.2673) data time 0.0010 (0.0027) 
model time 0.2514 (0.2631) loss 6.3781 (5.4531) grad_norm 2.3927 (2.9196) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:30:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][260/625] eta 0:01:37 lr 0.000057 wd 0.0500 time 0.2580 (0.2670) data time 0.0007 (0.0026) model time 0.2573 (0.2629) loss 4.8298 (5.4480) grad_norm 1.8617 (3.0211) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:30:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][270/625] eta 0:01:34 lr 0.000057 wd 0.0500 time 0.2526 (0.2672) data time 0.0006 (0.0026) model time 0.2520 (0.2633) loss 5.1519 (5.4476) grad_norm 2.4261 (3.0033) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:30:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][280/625] eta 0:01:32 lr 0.000057 wd 0.0500 time 0.2565 (0.2668) data time 0.0010 (0.0025) model time 0.2555 (0.2629) loss 4.6095 (5.4415) grad_norm 2.2919 (2.9914) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:30:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][290/625] eta 0:01:29 lr 0.000057 wd 0.0500 time 0.2537 (0.2665) data time 0.0009 (0.0024) model time 0.2528 (0.2626) loss 5.5133 (5.4380) grad_norm 1.8199 (2.9753) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:30:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][300/625] eta 0:01:26 lr 0.000057 wd 0.0500 time 0.2576 (0.2662) data time 0.0012 (0.0024) model time 0.2564 (0.2624) loss 4.9770 (5.4413) grad_norm 3.6746 (2.9970) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:30:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][310/625] eta 0:01:23 lr 0.000057 wd 0.0500 time 0.2572 (0.2660) data time 0.0007 (0.0023) model time 0.2565 (0.2623) loss 4.2957 (5.4328) grad_norm 2.2934 (2.9944) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:30:20 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [275/300][320/625] eta 0:01:21 lr 0.000057 wd 0.0500 time 0.2557 (0.2657) data time 0.0011 (0.0023) model time 0.2547 (0.2621) loss 5.9577 (5.4392) grad_norm 2.3354 (3.0178) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:30:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][330/625] eta 0:01:18 lr 0.000057 wd 0.0500 time 0.2573 (0.2660) data time 0.0008 (0.0023) model time 0.2565 (0.2625) loss 5.6632 (5.4443) grad_norm 2.0335 (3.0012) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:30:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][340/625] eta 0:01:15 lr 0.000057 wd 0.0500 time 0.2555 (0.2656) data time 0.0007 (0.0022) model time 0.2548 (0.2622) loss 5.4893 (5.4539) grad_norm 2.4613 (inf) loss_scale 256.0000 (505.9941) mem 9655MB [2024-08-04 10:30:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][350/625] eta 0:01:13 lr 0.000057 wd 0.0500 time 0.2572 (0.2661) data time 0.0011 (0.0022) model time 0.2562 (0.2627) loss 4.4321 (5.4536) grad_norm 1.7879 (inf) loss_scale 256.0000 (498.8718) mem 9655MB [2024-08-04 10:30:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][360/625] eta 0:01:10 lr 0.000057 wd 0.0500 time 0.2563 (0.2663) data time 0.0007 (0.0021) model time 0.2556 (0.2631) loss 5.2619 (5.4623) grad_norm 2.8742 (inf) loss_scale 256.0000 (492.1440) mem 9655MB [2024-08-04 10:30:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][370/625] eta 0:01:07 lr 0.000057 wd 0.0500 time 0.2548 (0.2660) data time 0.0009 (0.0021) model time 0.2539 (0.2629) loss 5.1431 (5.4614) grad_norm 11.1834 (inf) loss_scale 256.0000 (485.7790) mem 9655MB [2024-08-04 10:30:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][380/625] eta 0:01:05 lr 0.000057 wd 0.0500 time 0.3826 (0.2661) data time 0.0007 (0.0021) model time 0.3819 (0.2630) loss 6.1691 
(5.4656) grad_norm 5.8657 (inf) loss_scale 256.0000 (479.7480) mem 9655MB [2024-08-04 10:30:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][390/625] eta 0:01:02 lr 0.000057 wd 0.0500 time 0.2543 (0.2659) data time 0.0006 (0.0021) model time 0.2537 (0.2629) loss 4.9244 (5.4669) grad_norm 1.5262 (inf) loss_scale 256.0000 (474.0256) mem 9655MB [2024-08-04 10:30:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][400/625] eta 0:00:59 lr 0.000057 wd 0.0500 time 0.2560 (0.2657) data time 0.0010 (0.0020) model time 0.2550 (0.2626) loss 5.1993 (5.4664) grad_norm 2.5557 (inf) loss_scale 256.0000 (468.5885) mem 9655MB [2024-08-04 10:30:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][410/625] eta 0:00:57 lr 0.000057 wd 0.0500 time 0.4202 (0.2668) data time 0.0009 (0.0020) model time 0.4193 (0.2640) loss 5.3664 (5.4712) grad_norm 2.3546 (inf) loss_scale 256.0000 (463.4161) mem 9655MB [2024-08-04 10:30:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][420/625] eta 0:00:54 lr 0.000057 wd 0.0500 time 0.2578 (0.2665) data time 0.0008 (0.0020) model time 0.2569 (0.2637) loss 5.3704 (5.4747) grad_norm 2.2556 (inf) loss_scale 256.0000 (458.4893) mem 9655MB [2024-08-04 10:30:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][430/625] eta 0:00:51 lr 0.000057 wd 0.0500 time 0.2573 (0.2663) data time 0.0008 (0.0019) model time 0.2566 (0.2635) loss 5.6767 (5.4773) grad_norm 2.3967 (inf) loss_scale 256.0000 (453.7912) mem 9655MB [2024-08-04 10:30:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][440/625] eta 0:00:49 lr 0.000057 wd 0.0500 time 0.2582 (0.2665) data time 0.0008 (0.0019) model time 0.2573 (0.2638) loss 4.7476 (5.4748) grad_norm 3.3865 (inf) loss_scale 256.0000 (449.3061) mem 9655MB [2024-08-04 10:30:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][450/625] eta 
0:00:46 lr 0.000057 wd 0.0500 time 0.2568 (0.2662) data time 0.0008 (0.0019) model time 0.2560 (0.2635) loss 5.2612 (5.4760) grad_norm 2.2135 (inf) loss_scale 256.0000 (445.0200) mem 9655MB [2024-08-04 10:30:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][460/625] eta 0:00:43 lr 0.000056 wd 0.0500 time 0.2568 (0.2660) data time 0.0011 (0.0019) model time 0.2557 (0.2633) loss 5.6742 (5.4757) grad_norm 1.9270 (inf) loss_scale 256.0000 (440.9197) mem 9655MB [2024-08-04 10:31:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][470/625] eta 0:00:41 lr 0.000056 wd 0.0500 time 0.2574 (0.2662) data time 0.0009 (0.0019) model time 0.2564 (0.2635) loss 4.7654 (5.4779) grad_norm 2.4855 (inf) loss_scale 256.0000 (436.9936) mem 9655MB [2024-08-04 10:31:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][480/625] eta 0:00:38 lr 0.000056 wd 0.0500 time 0.2570 (0.2659) data time 0.0010 (0.0018) model time 0.2560 (0.2633) loss 4.5760 (5.4685) grad_norm 1.6833 (inf) loss_scale 256.0000 (433.2308) mem 9655MB [2024-08-04 10:31:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][490/625] eta 0:00:35 lr 0.000056 wd 0.0500 time 0.2581 (0.2658) data time 0.0017 (0.0018) model time 0.2564 (0.2632) loss 5.3939 (5.4737) grad_norm 2.5087 (inf) loss_scale 256.0000 (429.6212) mem 9655MB [2024-08-04 10:31:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][500/625] eta 0:00:33 lr 0.000056 wd 0.0500 time 0.2547 (0.2656) data time 0.0009 (0.0018) model time 0.2538 (0.2630) loss 6.0691 (5.4801) grad_norm 3.2695 (inf) loss_scale 256.0000 (426.1557) mem 9655MB [2024-08-04 10:31:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][510/625] eta 0:00:30 lr 0.000056 wd 0.0500 time 0.2572 (0.2654) data time 0.0008 (0.0018) model time 0.2564 (0.2629) loss 4.6070 (5.4791) grad_norm 2.2473 (inf) loss_scale 256.0000 (422.8258) mem 
9655MB [2024-08-04 10:31:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][520/625] eta 0:00:27 lr 0.000056 wd 0.0500 time 0.2520 (0.2654) data time 0.0009 (0.0018) model time 0.2510 (0.2629) loss 6.2207 (5.4803) grad_norm 2.6186 (inf) loss_scale 256.0000 (419.6238) mem 9655MB [2024-08-04 10:31:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][530/625] eta 0:00:25 lr 0.000056 wd 0.0500 time 0.2572 (0.2652) data time 0.0010 (0.0018) model time 0.2562 (0.2627) loss 6.8182 (5.4878) grad_norm 1.8105 (inf) loss_scale 256.0000 (416.5424) mem 9655MB [2024-08-04 10:31:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][540/625] eta 0:00:22 lr 0.000056 wd 0.0500 time 0.2539 (0.2651) data time 0.0008 (0.0017) model time 0.2531 (0.2626) loss 5.3203 (5.4860) grad_norm 4.0528 (inf) loss_scale 256.0000 (413.5749) mem 9655MB [2024-08-04 10:31:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][550/625] eta 0:00:19 lr 0.000056 wd 0.0500 time 0.2558 (0.2649) data time 0.0009 (0.0017) model time 0.2549 (0.2625) loss 4.5959 (5.4818) grad_norm 30.6413 (inf) loss_scale 256.0000 (410.7151) mem 9655MB [2024-08-04 10:31:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][560/625] eta 0:00:17 lr 0.000056 wd 0.0500 time 0.4695 (0.2652) data time 0.0010 (0.0017) model time 0.4685 (0.2627) loss 6.5907 (5.4893) grad_norm 2.8773 (inf) loss_scale 256.0000 (407.9572) mem 9655MB [2024-08-04 10:31:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][570/625] eta 0:00:14 lr 0.000056 wd 0.0500 time 0.2589 (0.2650) data time 0.0009 (0.0017) model time 0.2579 (0.2626) loss 5.8468 (5.4942) grad_norm 1.8376 (inf) loss_scale 256.0000 (405.2960) mem 9655MB [2024-08-04 10:31:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][580/625] eta 0:00:11 lr 0.000056 wd 0.0500 time 0.2525 (0.2648) data time 0.0009 
(0.0017) model time 0.2516 (0.2624) loss 4.9467 (5.4965) grad_norm 2.1788 (inf) loss_scale 256.0000 (402.7263) mem 9655MB [2024-08-04 10:31:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][590/625] eta 0:00:09 lr 0.000056 wd 0.0500 time 0.2528 (0.2647) data time 0.0011 (0.0017) model time 0.2517 (0.2623) loss 5.7971 (5.4953) grad_norm 1.7668 (inf) loss_scale 256.0000 (400.2437) mem 9655MB [2024-08-04 10:31:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][600/625] eta 0:00:06 lr 0.000056 wd 0.0500 time 0.2594 (0.2649) data time 0.0006 (0.0017) model time 0.2589 (0.2626) loss 4.8365 (5.4921) grad_norm 4.3570 (inf) loss_scale 256.0000 (397.8436) mem 9655MB [2024-08-04 10:31:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][610/625] eta 0:00:03 lr 0.000056 wd 0.0500 time 0.2534 (0.2648) data time 0.0005 (0.0017) model time 0.2530 (0.2624) loss 4.3791 (5.4946) grad_norm 3.7308 (inf) loss_scale 256.0000 (395.5221) mem 9655MB [2024-08-04 10:31:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][620/625] eta 0:00:01 lr 0.000056 wd 0.0500 time 0.2527 (0.2649) data time 0.0005 (0.0016) model time 0.2522 (0.2626) loss 6.1079 (5.4956) grad_norm 2.9199 (inf) loss_scale 256.0000 (393.2754) mem 9655MB [2024-08-04 10:31:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 275 training takes 0:02:45 [2024-08-04 10:31:41 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 10:31:41 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 10:31:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.475 (0.475) Loss 0.6001 (0.6001) Acc@1 90.430 (90.430) Acc@5 98.926 (98.926) Mem 9655MB
[2024-08-04 10:31:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.100) Loss 0.9062 (0.7156) Acc@1 81.689 (87.291) Acc@5 96.582 (97.869) Mem 9655MB
[2024-08-04 10:31:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0039 (0.8331) Acc@1 78.662 (84.245) Acc@5 95.557 (96.717) Mem 9655MB
[2024-08-04 10:31:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.861 Acc@5 96.745
[2024-08-04 10:31:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.9%
[2024-08-04 10:31:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.789 (0.789) Loss 0.5869 (0.5869) Acc@1 90.283 (90.283) Acc@5 98.633 (98.633) Mem 9655MB
[2024-08-04 10:31:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.128) Loss 0.8955 (0.7064) Acc@1 81.934 (87.096) Acc@5 96.484 (97.794) Mem 9655MB
[2024-08-04 10:31:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 1.0049 (0.8250) Acc@1 78.857 (84.061) Acc@5 95.654 (96.661) Mem 9655MB
[2024-08-04 10:31:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.697 Acc@5 96.677
[2024-08-04 10:31:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7%
[2024-08-04 10:31:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][0/625] eta 0:11:12 lr 0.000056 wd 0.0500 time 1.0764 (1.0764) data time 0.4405 (0.4405) model time 0.0000 (0.0000) loss 6.1983 (6.1983) grad_norm 3.0391 (3.0391) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:31:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][10/625] eta 0:03:24 lr 0.000056 wd 0.0500 time 0.2595 (0.3323) data time 0.0009 (0.0409) model time 0.0000 (0.0000) loss 5.7337 (5.7789) grad_norm 3.2093 (3.0617) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:31:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][20/625] eta 0:03:05 lr 0.000056 wd 0.0500 time 0.2652 (0.3063) data time 0.0008 (0.0219) model time 0.0000 (0.0000) loss 4.8895 (5.4063) grad_norm 3.0689 (3.6931) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:31:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][30/625] eta 0:02:52 lr 0.000056 wd 0.0500 time 0.2593 (0.2905) data time 0.0006 (0.0151) model time 0.0000 (0.0000) loss 4.8403 (5.4819) grad_norm 2.1715 (3.4110) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:31:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][40/625] eta 0:02:45 lr 0.000055 wd 0.0500 time 0.2612 (0.2824) data time 0.0007 (0.0116) model time 0.0000 (0.0000) loss 5.8706 (5.4716) grad_norm 2.5299 (3.1088) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:32:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][50/625] eta 0:02:41 lr 0.000055 wd 0.0500 time 0.2580 (0.2810) data time 0.0007 (0.0096) model time 0.0000 (0.0000) loss 6.3345 (5.5032) grad_norm 2.6379 (3.0103) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:32:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][60/625] eta 0:02:36 lr 0.000055 wd 0.0500 time 0.2611 (0.2772) data time 0.0009 (0.0081) model time 0.2602 (0.2566) loss 5.8539 (5.4781) grad_norm 1.6151 (2.9228) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:32:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][70/625] eta 0:02:33 lr 0.000055 wd 0.0500 time 0.2538 (0.2767) data time 0.0008 (0.0071) model time 0.2530 (0.2647) loss 4.5506 (5.4956) grad_norm 2.2535 (2.8937) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:32:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][80/625] eta 0:02:29 lr 0.000055 wd 0.0500 time 0.2493 (0.2740) data time 0.0007 (0.0064) model time 0.2486 (0.2612) loss 4.8190 (5.5045) grad_norm 2.6171 (2.9499) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:32:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][90/625] eta 0:02:25 lr 0.000055 wd 0.0500 time 0.2548 (0.2722) data time 0.0007 (0.0058) model time 0.2542 (0.2601) loss 6.3624 (5.5324) grad_norm 2.0266 (2.9572) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:32:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][100/625] eta 0:02:22 lr 0.000055 wd 0.0500 time 0.2503 (0.2706) data time 0.0007 (0.0053) model time 0.2496 (0.2590) loss 5.8024 (5.5531) grad_norm 2.8389 (2.9894) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:32:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][110/625] eta 0:02:18 lr 0.000055 wd 0.0500 time 0.2590 (0.2693) data time 0.0007 (0.0049) model time 0.2583 (0.2585) loss 5.5881 (5.5288) grad_norm 1.6912 (2.9722) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:32:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][120/625] eta 0:02:15 lr 0.000055 wd 0.0500 time 0.2577 (0.2682) data time 0.0011 (0.0046) model time 0.2566 (0.2580) loss 5.5510 (5.5324) grad_norm 1.6037 (2.9251) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:32:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][130/625] eta 0:02:12 lr 0.000055 wd 0.0500 time 0.2495 (0.2684) data time 0.0010 (0.0043) model time 0.2485 (0.2595) loss 5.2799 (5.5249) grad_norm 2.2388 (2.9147) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:32:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][140/625] eta 0:02:10 lr 0.000055 wd 0.0500 time 0.2546 (0.2689) data time 0.0009 (0.0040) model time 0.2537 (0.2611) loss 5.1685 (5.5153) grad_norm 3.0441 (2.8839) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:32:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][150/625] eta 0:02:07 lr 0.000055 wd 0.0500 time 0.2597 (0.2681) data time 0.0007 (0.0038) model time 0.2589 (0.2606) loss 4.3945 (5.4963) grad_norm 2.1508 (2.8494) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:32:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][160/625] eta 0:02:04 lr 0.000055 wd 0.0500 time 0.2540 (0.2673) data time 0.0008 (0.0037) model time 0.2532 (0.2601) loss 4.9508 (5.4898) grad_norm 1.7459 (2.8093) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:32:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][170/625] eta 0:02:01 lr 0.000055 wd 0.0500 time 0.2477 (0.2667) data time 0.0008 (0.0035) model time 0.2469 (0.2596) loss 4.4117 (5.4785) grad_norm 7.2471 (2.8976) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:32:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][180/625] eta 0:01:58 lr 0.000055 wd 0.0500 time 0.2640 (0.2671) data time 0.0007 (0.0034) model time 0.2633 (0.2608) loss 4.3592 (5.4825) grad_norm 1.9245 (2.8700) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:32:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][190/625] eta 0:01:55 lr 0.000055 wd 0.0500 time 0.2538 (0.2665) data time 0.0009 (0.0032) model time 0.2529 (0.2603) loss 6.3574 (5.4887) grad_norm 1.8441 (2.8399) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:32:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][200/625] eta 0:01:53 lr 0.000055 wd 0.0500 time 0.2557 (0.2660) data time 0.0008 (0.0031) model time 0.2549 (0.2600) loss 5.4797 (5.4956) grad_norm 1.8355 (2.8762) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:32:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][210/625] eta 0:01:50 lr 0.000055 wd 0.0500 time 0.2533 (0.2655) data time 0.0013 (0.0030) model time 0.2520 (0.2597) loss 6.1473 (5.4955) grad_norm 2.6804 (2.8731) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:32:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][220/625] eta 0:01:48 lr 0.000055 wd 0.0500 time 0.4600 (0.2669) data time 0.0007 (0.0029) model time 0.4592 (0.2618) loss 6.2107 (5.4829) grad_norm 2.4063 (2.8624) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:32:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][230/625] eta 0:01:45 lr 0.000055 wd 0.0500 time 0.2584 (0.2673) data time 0.0012 (0.0028) model time 0.2572 (0.2624) loss 6.1179 (5.4844) grad_norm 2.0886 (3.1311) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:32:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][240/625] eta 0:01:43 lr 0.000055 wd 0.0500 time 0.2556 (0.2676) data time 0.0009 (0.0028) model time 0.2546 (0.2631) loss 5.5371 (5.4884) grad_norm 1.8586 (3.1284) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:32:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][250/625] eta 0:01:40 lr 0.000055 wd 0.0500 time 0.2556 (0.2682) data time 0.0007 (0.0027) model time 0.2549 (0.2640) loss 5.7828 (5.4955) grad_norm 2.3521 (3.1253) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:32:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][260/625] eta 0:01:37 lr 0.000054 wd 0.0500 time 0.2610 (0.2678) data time 0.0009 (0.0026) model time 0.2601 (0.2636) loss 6.1129 (5.5072) grad_norm 2.6103 (3.1091) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:32:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][270/625] eta 0:01:34 lr 0.000054 wd 0.0500 time 0.2562 (0.2674) data time 0.0008 (0.0026) model time 0.2554 (0.2633) loss 4.7495 (5.4946) grad_norm 2.7769 (3.1012) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:33:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][280/625] eta 0:01:32 lr 0.000054 wd 0.0500 time 0.2528 (0.2670) data time 0.0009 (0.0025) model time 0.2519 (0.2629) loss 4.6379 (5.4883) grad_norm 2.1286 (3.0839) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:33:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][290/625] eta 0:01:29 lr 0.000054 wd 0.0500 time 0.2519 (0.2665) data time 0.0009 (0.0025) model time 0.2511 (0.2625) loss 4.9867 (5.4807) grad_norm 2.1201 (3.0943) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:33:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][300/625] eta 0:01:26 lr 0.000054 wd 0.0500 time 0.2545 (0.2671) data time 0.0007 (0.0024) model time 0.2538 (0.2633) loss 5.2148 (5.4751) grad_norm 5.8809 (3.0923) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:33:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][310/625] eta 0:01:24 lr 0.000054 wd 0.0500 time 0.2568 (0.2667) data time 0.0007 (0.0024) model time 0.2562 (0.2630) loss 4.4632 (5.4837) grad_norm 3.3596 (3.0779) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:33:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][320/625] eta 0:01:21 lr 0.000054 wd 0.0500 time 0.2536 (0.2664) data time 0.0011 (0.0023) model time 0.2525 (0.2627) loss 5.6633 (5.4769) grad_norm 2.8593 (3.0615) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:33:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][330/625] eta 0:01:18 lr 0.000054 wd 0.0500 time 0.2533 (0.2661) data time 0.0008 (0.0023) model time 0.2525 (0.2624) loss 5.1084 (5.4753) grad_norm 1.6875 (3.0497) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:33:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][340/625] eta 0:01:15 lr 0.000054 wd 0.0500 time 0.2547 (0.2657) data time 0.0009 (0.0022) model time 0.2537 (0.2621) loss 5.4700 (5.4759) grad_norm 2.8519 (3.0279) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:33:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][350/625] eta 0:01:13 lr 0.000054 wd 0.0500 time 0.2528 (0.2660) data time 0.0011 (0.0022) model time 0.2518 (0.2625) loss 6.4924 (5.4818) grad_norm 2.2177 (3.0139) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:33:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][360/625] eta 0:01:10 lr 0.000054 wd 0.0500 time 0.2564 (0.2658) data time 0.0010 (0.0022) model time 0.2554 (0.2623) loss 5.8807 (5.4801) grad_norm 3.6576 (3.0115) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:33:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][370/625] eta 0:01:07 lr 0.000054 wd 0.0500 time 0.2553 (0.2655) data time 0.0012 (0.0021) model time 0.2542 (0.2621) loss 4.6926 (5.4817) grad_norm 3.2888 (3.0008) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:33:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][380/625] eta 0:01:04 lr 0.000054 wd 0.0500 time 0.2551 (0.2652) data time 0.0008 (0.0021) model time 0.2543 (0.2618) loss 4.1150 (5.4761) grad_norm 2.0751 (2.9930) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:33:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][390/625] eta 0:01:02 lr 0.000054 wd 0.0500 time 0.2531 (0.2650) data time 0.0011 (0.0021) model time 0.2521 (0.2616) loss 5.8780 (5.4784) grad_norm 9.0097 (2.9897) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:33:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][400/625] eta 0:00:59 lr 0.000054 wd 0.0500 time 0.2583 (0.2648) data time 0.0008 (0.0020) model time 0.2575 (0.2614) loss 5.8479 (5.4756) grad_norm 3.0256 (2.9740) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:33:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][410/625] eta 0:00:56 lr 0.000054 wd 0.0500 time 0.2535 (0.2646) data time 0.0009 (0.0020) model time 0.2526 (0.2613) loss 6.1938 (5.4804) grad_norm 2.8622 (2.9651) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:33:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][420/625] eta 0:00:54 lr 0.000054 wd 0.0500 time 0.2540 (0.2644) data time 0.0009 (0.0020) model time 0.2531 (0.2611) loss 4.4866 (5.4822) grad_norm 2.3012 (2.9549) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:33:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][430/625] eta 0:00:51 lr 0.000054 wd 0.0500 time 0.2520 (0.2642) data time 0.0010 (0.0020) model time 0.2510 (0.2609) loss 5.7871 (5.4850) grad_norm 3.2076 (2.9668) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:33:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][440/625] eta 0:00:48 lr 0.000054 wd 0.0500 time 0.2539 (0.2640) data time 0.0007 (0.0020) model time 0.2532 (0.2608) loss 5.4566 (5.4826) grad_norm 2.1336 (2.9829) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:33:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][450/625] eta 0:00:46 lr 0.000054 wd 0.0500 time 0.3940 (0.2646) data time 0.0006 (0.0019) model time 0.3933 (0.2616) loss 5.8686 (5.4736) grad_norm 2.0910 (2.9706) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:33:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][460/625] eta 0:00:43 lr 0.000054 wd 0.0500 time 0.2577 (0.2647) data time 0.0006 (0.0019) model time 0.2571 (0.2617) loss 6.2132 (5.4794) grad_norm 2.9286 (2.9701) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:33:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][470/625] eta 0:00:40 lr 0.000053 wd 0.0500 time 0.2573 (0.2645) data time 0.0008 (0.0019) model time 0.2565 (0.2615) loss 5.4179 (5.4804) grad_norm 2.3185 (2.9720) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:33:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][480/625] eta 0:00:38 lr 0.000053 wd 0.0500 time 0.2541 (0.2647) data time 0.0014 (0.0019) model time 0.2528 (0.2618) loss 5.6298 (5.4807) grad_norm 2.8968 (2.9668) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:33:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][490/625] eta 0:00:35 lr 0.000053 wd 0.0500 time 0.2567 (0.2646) data time 0.0008 (0.0018) model time 0.2559 (0.2617) loss 5.1806 (5.4741) grad_norm 1.9461 (2.9537) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:33:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][500/625] eta 0:00:33 lr 0.000053 wd 0.0500 time 0.2613 (0.2644) data time 0.0007 (0.0018) model time 0.2606 (0.2616) loss 5.0048 (5.4742) grad_norm 5.2656 (2.9543) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:34:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][510/625] eta 0:00:30 lr 0.000053 wd 0.0500 time 0.2588 (0.2642) data time 0.0009 (0.0018) model time 0.2579 (0.2614) loss 4.9003 (5.4713) grad_norm 3.5237 (2.9499) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:34:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][520/625] eta 0:00:27 lr 0.000053 wd 0.0500 time 0.2652 (0.2645) data time 0.0006 (0.0018) model time 0.2647 (0.2617) loss 4.9752 (5.4663) grad_norm 1.7686 (2.9368) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:34:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][530/625] eta 0:00:25 lr 0.000053 wd 0.0500 time 0.2625 (0.2643) data time 0.0006 (0.0018) model time 0.2619 (0.2616) loss 5.7781 (5.4626) grad_norm 2.4349 (2.9365) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:34:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][540/625] eta 0:00:22 lr 0.000053 wd 0.0500 time 0.2560 (0.2641) data time 0.0007 (0.0018) model time 0.2554 (0.2614) loss 6.7041 (5.4636) grad_norm 4.4285 (2.9424) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:34:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][550/625] eta 0:00:19 lr 0.000053 wd 0.0500 time 0.2553 (0.2640) data time 0.0009 (0.0017) model time 0.2544 (0.2613) loss 4.6574 (5.4642) grad_norm 3.3548 (2.9536) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:34:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][560/625] eta 0:00:17 lr 0.000053 wd 0.0500 time 0.4279 (0.2641) data time 0.0007 (0.0017) model time 0.4272 (0.2615) loss 6.1439 (5.4693) grad_norm 3.2485 (2.9624) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:34:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][570/625] eta 0:00:14 lr 0.000053 wd 0.0500 time 0.2507 (0.2640) data time 0.0011 (0.0017) model time 0.2496 (0.2613) loss 6.0570 (5.4705) grad_norm 3.2001 (2.9621) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:34:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][580/625] eta 0:00:11 lr 0.000053 wd 0.0500 time 0.2556 (0.2638) data time 0.0008 (0.0017) model time 0.2549 (0.2612) loss 5.1539 (5.4672) grad_norm 2.3999 (2.9608) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:34:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][590/625] eta 0:00:09 lr 0.000053 wd 0.0500 time 0.2572 (0.2637) data time 0.0007 (0.0017) model time 0.2564 (0.2611) loss 5.8575 (5.4645) grad_norm 2.3362 (2.9544) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:34:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][600/625] eta 0:00:06 lr 0.000053 wd 0.0500 time 0.2565 (0.2639) data time 0.0008 (0.0017) model time 0.2557 (0.2614) loss 6.3284 (5.4631) grad_norm 2.8880 (2.9453) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:34:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][610/625] eta 0:00:03 lr 0.000053 wd 0.0500 time 0.2519 (0.2638) data time 0.0005 (0.0017) model time 0.2514 (0.2613) loss 5.5022 (5.4598) grad_norm 2.1627 (2.9441) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:34:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][620/625] eta 0:00:01 lr 0.000053 wd 0.0500 time 0.2535 (0.2636) data time 0.0005 (0.0017) model time 0.2530 (0.2611) loss 4.6773 (5.4553) grad_norm 9.1458 (2.9615) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:34:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 276 training takes 0:02:44
[2024-08-04 10:34:30 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 10:34:30 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 10:34:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.486 (0.486) Loss 0.5977 (0.5977) Acc@1 90.479 (90.479) Acc@5 98.975 (98.975) Mem 9655MB
[2024-08-04 10:34:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 0.9004 (0.7145) Acc@1 82.080 (87.229) Acc@5 96.631 (97.905) Mem 9655MB
[2024-08-04 10:34:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0039 (0.8305) Acc@1 79.150 (84.215) Acc@5 95.459 (96.770) Mem 9655MB
[2024-08-04 10:34:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.865 Acc@5 96.789
[2024-08-04 10:34:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.9%
[2024-08-04 10:34:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.909 (0.909) Loss 0.5874 (0.5874) Acc@1 90.283 (90.283) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 10:34:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.134) Loss 0.8950 (0.7065) Acc@1 82.080 (87.123) Acc@5 96.436 (97.789) Mem 9655MB
[2024-08-04 10:34:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.096) Loss 1.0049 (0.8248) Acc@1 78.760 (84.080) Acc@5 95.654 (96.675) Mem 9655MB
[2024-08-04 10:34:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.721 Acc@5 96.691
[2024-08-04 10:34:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7%
[2024-08-04 10:34:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.72%
[2024-08-04 10:34:34 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 10:34:35 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 10:34:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][0/625] eta 0:07:38 lr 0.000053 wd 0.0500 time 0.7330 (0.7330) data time 0.4953 (0.4953) model time 0.0000 (0.0000) loss 5.7688 (5.7688) grad_norm 2.0544 (2.0544) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:34:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][10/625] eta 0:03:14 lr 0.000053 wd 0.0500 time 0.2559 (0.3156) data time 0.0007 (0.0458) model time 0.0000 (0.0000) loss 5.1603 (5.5626) grad_norm 1.9667 (2.8084) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:34:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][20/625] eta 0:02:57 lr 0.000053 wd 0.0500 time 0.2536 (0.2931) data time 0.0007 (0.0244) model time 0.0000 (0.0000) loss 4.7190 (5.4160) grad_norm 2.1763 (2.7854) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:34:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][30/625] eta 0:02:50 lr 0.000053 wd 0.0500 time 0.2577 (0.2872) data time 0.0006 (0.0168) model time 0.0000 (0.0000) loss 5.0989 (5.4565) grad_norm 4.6798 (2.8577) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:34:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][40/625] eta 0:02:49 lr 0.000053 wd 0.0500 time 0.2544 (0.2893) data time 0.0006 (0.0129) model time 0.0000 (0.0000) loss 4.4350 (5.4391) grad_norm 18.2420 (3.2276) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:34:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][50/625] eta 0:02:42 lr 0.000053 wd 0.0500 time 0.2574 (0.2830) data time 0.0007 (0.0106) model time 0.0000 (0.0000) loss 5.0416 (5.4272) grad_norm 1.7409 (3.0963) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:34:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][60/625] eta 0:02:38 lr 0.000053 wd 0.0500 time 0.3771 (0.2805) data time 0.0009 (0.0091) model time 0.3762 (0.2669) loss 5.8642 (5.3826) grad_norm 3.7116 (3.0107) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:34:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][70/625] eta 0:02:33 lr 0.000052 wd 0.0500 time 0.2514 (0.2772) data time 0.0007 (0.0079) model time 0.2507 (0.2613) loss 5.7572 (5.4410) grad_norm 2.2283 (2.9430) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:34:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][80/625] eta 0:02:30 lr 0.000052 wd 0.0500 time 0.2536 (0.2770) data time 0.0012 (0.0070) model time 0.2524 (0.2658) loss 4.6397 (5.4736) grad_norm 31.6755 (3.2805) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:35:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][90/625] eta 0:02:26 lr 0.000052 wd 0.0500 time 0.2579 (0.2747) data time 0.0008 (0.0064) model time 0.2571 (0.2630) loss 4.9415 (5.4993) grad_norm 1.6486 (3.1857) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:35:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][100/625] eta 0:02:24 lr 0.000052 wd 0.0500 time 0.2693 (0.2748) data time 0.0008 (0.0058) model time 0.2685 (0.2655) loss 5.1416 (5.5118) grad_norm 1.7538 (3.0937) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:35:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][110/625] eta 0:02:20 lr 0.000052 wd 0.0500 time 0.2517 (0.2733) data time 0.0008 (0.0054) model time 0.2509 (0.2640) loss 5.1200 (5.5289) grad_norm 2.1776 (3.0565) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:35:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][120/625] eta 0:02:17 lr 0.000052 wd 0.0500 time 0.2561 (0.2719) data time 0.0009 (0.0050) model time 0.2552 (0.2629) loss 4.5741 (5.4949) grad_norm 2.5578 (3.1097) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:35:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][130/625] eta 0:02:13 lr 0.000052 wd 0.0500 time 0.2575 (0.2707) data time 0.0006 (0.0047) model time 0.2569 (0.2619) loss 5.2108 (5.4944) grad_norm 1.9023 (3.0543) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:35:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][140/625] eta 0:02:11 lr 0.000052 wd 0.0500 time 0.2549 (0.2711) data time 0.0009 (0.0044) model time 0.2541 (0.2634) loss 6.5177 (5.5116) grad_norm 2.2307 (3.0200) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:35:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][150/625] eta 0:02:08 lr 0.000052 wd 0.0500 time 0.2567 (0.2710) data time 0.0006 (0.0042) model time 0.2561 (0.2640) loss 6.3287 (5.5190) grad_norm 2.2177 (2.9947) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:35:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][160/625] eta 0:02:06 lr 0.000052 wd 0.0500 time 0.2605 (0.2712) data time 0.0008 (0.0040) model time 0.2597 (0.2648) loss 6.5506 (5.5171) grad_norm 2.5511 (2.9561) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:35:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][170/625] eta 0:02:02 lr 0.000052 wd 0.0500 time 0.2515 (0.2703) data time 0.0009 (0.0038) model time 0.2506 (0.2639) loss 4.9388 (5.5068) grad_norm 3.3050 (2.9968) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:35:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][180/625] eta 0:01:59 lr 0.000052 wd 0.0500 time 0.2543 (0.2695) data time 0.0007 (0.0036) model time 0.2536 (0.2633) loss 5.8951 (5.4969) grad_norm 2.6174 (3.0159) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:35:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][190/625] eta 0:01:56 lr 0.000052 wd 0.0500 time 0.2578 (0.2688) data time 0.0010 (0.0035) model time 0.2568 (0.2627) loss 5.0994 (5.4990) grad_norm 2.0363 (2.9856) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:35:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][200/625] eta 0:01:53 lr 0.000052 wd 0.0500 time 0.2592 (0.2682) data time 0.0008 (0.0034) model time 0.2584 (0.2623) loss 5.1385 (5.4975) grad_norm 5.2468 (2.9801) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:35:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][210/625] eta 0:01:51 lr 0.000052 wd 0.0500 time 0.2548 (0.2676) data time 0.0010 (0.0033) model time 0.2538 (0.2618) loss 5.3255 (5.5005) grad_norm 3.9269 (2.9988) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:35:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][220/625] eta 0:01:48 lr 0.000052 wd 0.0500 time 0.2541 (0.2671) data time 0.0009 (0.0032) model time 0.2532 (0.2615) loss 6.2017 (5.5007) grad_norm 3.3611 (2.9948) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:35:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][230/625] eta 0:01:45 lr 0.000052 wd 0.0500 time 0.2519 (0.2667) data time 0.0007 (0.0031) model time 0.2511 (0.2611) loss 6.1315 (5.4907) grad_norm 4.5827 (2.9948) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:35:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][240/625] eta 0:01:42 lr 0.000052 wd 0.0500 time 0.2558 (0.2663) data time 0.0006 (0.0030) model time 0.2552 (0.2609) loss 5.5676 (5.4719) grad_norm 3.0398 (3.0135) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:35:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][250/625] eta 0:01:39 lr 0.000052 wd 0.0500 time 0.2518 (0.2659) data time 0.0008 (0.0029) model time 0.2510 (0.2606) loss 5.3836 (5.4649) grad_norm 2.1839 (3.0066) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:35:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][260/625] eta 0:01:37 lr 0.000052 wd 0.0500 time 0.2534 (0.2663) data time 0.0007 (0.0028) model time 0.2527 (0.2613) loss 6.3841 (5.4613) grad_norm 2.4214 (2.9949) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:35:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][270/625] eta 0:01:34 lr 0.000052 wd 0.0500 time 0.2551 (0.2659) data time 0.0018 (0.0028) model time 0.2533 (0.2610) loss 5.7971 (5.4603) grad_norm 2.2446 (2.9751) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:35:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][280/625] eta 0:01:31 lr 0.000052 wd 0.0500 time 0.2563 (0.2655) data time 0.0010 (0.0027) model time 0.2553 (0.2607) loss 6.0434 (5.4570) grad_norm 4.4540 (3.0082) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:35:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][290/625] eta 0:01:29 lr 0.000051 wd 0.0500 time 0.2590 (0.2659) data time 0.0005 (0.0026) model time 0.2584 (0.2614) loss 5.4144 (5.4514) grad_norm 1.8160 (2.9945) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:35:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][300/625] eta 0:01:26 lr 0.000051 wd 0.0500 time 0.2592 (0.2660) data time 0.0008 (0.0026) model time 0.2583 (0.2616) loss 4.3780 (5.4417) grad_norm 2.0200 (3.0140) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:35:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][310/625] eta 0:01:23 lr 0.000051 wd 0.0500 time 0.2563 (0.2657) data time 0.0009 (0.0025) model time 0.2554 (0.2613) loss 5.0755 (5.4409) grad_norm 2.0493 (3.0171) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:36:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][320/625] eta 0:01:20 lr 0.000051 wd 0.0500 time 0.2525 (0.2654) data time 0.0009 (0.0025) model time 0.2516 (0.2611) loss 4.1214 (5.4372) grad_norm 2.0227 (3.0023) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:36:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][330/625] eta 0:01:18 lr 0.000051 wd 0.0500 time 0.2500 (0.2651) data time 0.0007 (0.0024) model time 0.2492 (0.2609) loss 4.6334 (5.4220) grad_norm 2.0081 (2.9883) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:36:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][340/625] eta 0:01:15 lr 0.000051 wd 0.0500 time 0.2527 (0.2652) data time 0.0009 (0.0024) model time 0.2518 (0.2612) loss 5.4001 (5.4175) grad_norm 2.2634 (3.0024) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:36:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][350/625] eta 0:01:12 lr 0.000051 wd 0.0500 time 0.2556 (0.2650) data time 0.0008 (0.0023) model time 0.2548 (0.2610) loss 5.1639 (5.4214) grad_norm 2.4977 (2.9907) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:36:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][360/625] eta 0:01:10 lr 0.000051 wd 0.0500 time 0.2555 (0.2648) data time 0.0009 (0.0023) model time 0.2546 (0.2608) loss 5.8818 (5.4315) grad_norm 2.0837 (2.9797) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:36:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][370/625] eta 0:01:07 lr 0.000051 wd 0.0500 time 0.2557 (0.2650) data time 0.0009 (0.0023) model time 0.2548 (0.2612) loss 5.8119 (5.4393) grad_norm 3.0589 (2.9672) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:36:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][380/625] eta 0:01:04 lr 0.000051 wd 0.0500 time 0.2542 (0.2649) data time 0.0010 (0.0022) model time 0.2533 (0.2611) loss 5.5347 (5.4404) grad_norm 7.2550 (2.9704) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:36:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][390/625] eta 0:01:02 lr 0.000051 wd 0.0500 time 0.2545 (0.2646) data time 0.0009 (0.0022) model time 0.2536 (0.2609) loss 6.2073 (5.4430) grad_norm 1.9673 (2.9574) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:36:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][400/625] eta 0:00:59 lr 0.000051 wd 0.0500 time 0.2575 (0.2650) data time 0.0008 (0.0022) model time 0.2567 (0.2614) loss 5.8156 (5.4381) grad_norm 2.4833 (3.0997) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:36:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][410/625] eta 0:00:56 lr 0.000051 wd 0.0500 time 0.2593 (0.2647) data time 0.0009 (0.0021) model time 0.2584 (0.2612) loss 5.3394 (5.4410) grad_norm 4.3805 (3.1145) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:36:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][420/625] eta 0:00:54 lr 0.000051 wd 0.0500 time 0.2562 (0.2645) data time 0.0010 (0.0021) model time 0.2552 (0.2611) loss 4.1831 (5.4375) grad_norm 4.2880 (3.1195) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:36:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][430/625] eta 0:00:51 lr 0.000051 wd 0.0500 time 0.2572 (0.2643) data time 0.0006 (0.0021) model time 0.2565 (0.2609) loss 5.2598 (5.4339) grad_norm 3.0144 (3.1143) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:36:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][440/625] eta 0:00:48 lr 0.000051 wd 0.0500 time 0.2602 (0.2642) data time 0.0009 (0.0021) model time 0.2593 (0.2607) loss 6.3959 (5.4403) grad_norm 3.3269 (3.1207) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:36:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][450/625] eta 0:00:46 lr 0.000051 wd 0.0500 time 0.2508 (0.2639) data time 0.0008 (0.0020) model time 0.2500 (0.2606) loss 4.6451 (5.4480) grad_norm 1.9766 (3.1200) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:36:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][460/625] eta 0:00:43 lr 0.000051 wd 0.0500 time 0.2567 (0.2642) data time 0.0009 (0.0020) model time 0.2558 (0.2609) loss 6.3985 (5.4430) grad_norm 2.3738 (3.1196) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:36:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][470/625] eta 0:00:40 lr 0.000051 wd 0.0500 time 0.2546 (0.2640) data time 0.0007 (0.0020) model time 0.2539 (0.2608) loss 5.6400 (5.4423) grad_norm 2.1296 (3.1013) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:36:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][480/625] eta 0:00:38 lr 0.000051 wd 0.0500 time 0.2520 (0.2646) data time 0.0008 (0.0020) model time 0.2512 (0.2615) loss 5.4501 (5.4419) grad_norm 2.2425 (3.0880) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:36:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][490/625] eta 0:00:35 lr 0.000051 wd 0.0500 time 0.2540 (0.2644) data time 0.0011 (0.0019) model time 0.2529 (0.2613) loss 4.8075 (5.4449) grad_norm 1.8498 (3.0812) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:36:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][500/625] eta 0:00:33 lr 0.000051 wd 0.0500 time 0.2562 (0.2644) data time 0.0012 (0.0019) model time 0.2550 (0.2613) loss 5.2126 (5.4434) grad_norm 2.3127 (3.0754) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:36:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][510/625] eta 0:00:30 lr 0.000051 wd 0.0500 time 0.2596 (0.2645) data time 0.0008 (0.0019) model time 0.2588 (0.2615) loss 5.8856 (5.4460) grad_norm
1.8369 (3.0686) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:36:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][520/625] eta 0:00:27 lr 0.000050 wd 0.0500 time 0.2531 (0.2646) data time 0.0008 (0.0019) model time 0.2523 (0.2617) loss 5.5798 (5.4497) grad_norm 3.0897 (3.0707) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:36:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][530/625] eta 0:00:25 lr 0.000050 wd 0.0500 time 0.2518 (0.2645) data time 0.0007 (0.0019) model time 0.2511 (0.2615) loss 6.1259 (5.4520) grad_norm 3.1258 (3.0690) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:36:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][540/625] eta 0:00:22 lr 0.000050 wd 0.0500 time 0.2702 (0.2647) data time 0.0008 (0.0019) model time 0.2694 (0.2618) loss 6.2149 (5.4546) grad_norm 3.7040 (3.0997) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:37:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][550/625] eta 0:00:19 lr 0.000050 wd 0.0500 time 0.2577 (0.2645) data time 0.0007 (0.0018) model time 0.2570 (0.2617) loss 5.3231 (5.4571) grad_norm 2.3181 (3.0931) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:37:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][560/625] eta 0:00:17 lr 0.000050 wd 0.0500 time 0.2616 (0.2646) data time 0.0009 (0.0018) model time 0.2607 (0.2619) loss 4.5384 (5.4545) grad_norm 1.7125 (3.0807) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:37:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][570/625] eta 0:00:14 lr 0.000050 wd 0.0500 time 0.2542 (0.2648) data time 0.0007 (0.0018) model time 0.2535 (0.2621) loss 4.6631 (5.4501) grad_norm 3.9135 (3.0782) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:37:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][580/625] eta 
0:00:11 lr 0.000050 wd 0.0500 time 0.2550 (0.2647) data time 0.0009 (0.0018) model time 0.2542 (0.2620) loss 4.9056 (5.4553) grad_norm 2.1742 (3.0698) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:37:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][590/625] eta 0:00:09 lr 0.000050 wd 0.0500 time 0.2563 (0.2645) data time 0.0006 (0.0018) model time 0.2556 (0.2619) loss 5.3429 (5.4564) grad_norm 2.2310 (3.0801) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:37:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][600/625] eta 0:00:06 lr 0.000050 wd 0.0500 time 0.2548 (0.2644) data time 0.0009 (0.0018) model time 0.2540 (0.2617) loss 5.2496 (5.4549) grad_norm 1.8213 (3.0856) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:37:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][610/625] eta 0:00:03 lr 0.000050 wd 0.0500 time 0.2532 (0.2649) data time 0.0004 (0.0018) model time 0.2528 (0.2623) loss 5.5596 (5.4559) grad_norm 9.3825 (3.0899) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:37:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][620/625] eta 0:00:01 lr 0.000050 wd 0.0500 time 0.2525 (0.2647) data time 0.0005 (0.0017) model time 0.2520 (0.2622) loss 5.3918 (5.4572) grad_norm 2.3577 (3.0976) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:37:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 277 training takes 0:02:45 [2024-08-04 10:37:20 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 10:37:21 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 10:37:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.461 (0.461) Loss 0.6021 (0.6021) Acc@1 90.234 (90.234) Acc@5 98.877 (98.877) Mem 9655MB
[2024-08-04 10:37:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 0.9004 (0.7161) Acc@1 81.934 (87.243) Acc@5 96.631 (97.843) Mem 9655MB
[2024-08-04 10:37:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0039 (0.8305) Acc@1 78.857 (84.212) Acc@5 95.459 (96.752) Mem 9655MB
[2024-08-04 10:37:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.865 Acc@5 96.765
[2024-08-04 10:37:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.9%
[2024-08-04 10:37:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.761 (0.761) Loss 0.5874 (0.5874) Acc@1 90.283 (90.283) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 10:37:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.126) Loss 0.8950 (0.7062) Acc@1 82.031 (87.132) Acc@5 96.436 (97.785) Mem 9655MB
[2024-08-04 10:37:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0029 (0.8245) Acc@1 78.809 (84.091) Acc@5 95.654 (96.677) Mem 9655MB
[2024-08-04 10:37:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.733 Acc@5 96.699
[2024-08-04 10:37:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7%
[2024-08-04 10:37:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.73%
[2024-08-04 10:37:25 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 10:37:26 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 10:37:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][0/625] eta 0:07:31 lr 0.000050 wd 0.0500 time 0.7231 (0.7231) data time 0.4810 (0.4810) model time 0.0000 (0.0000) loss 6.1520 (6.1520) grad_norm 24.9724 (24.9724) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:37:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][10/625] eta 0:03:03 lr 0.000050 wd 0.0500 time 0.2555 (0.2980) data time 0.0009 (0.0447) model time 0.0000 (0.0000) loss 4.8558 (5.5882) grad_norm 1.7743 (4.9166) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:37:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][20/625] eta 0:02:48 lr 0.000050 wd 0.0500 time 0.2576 (0.2782) data time 0.0008 (0.0239) model time 0.0000 (0.0000) loss 5.4775 (5.4765) grad_norm 1.9624 (3.6326) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:37:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][30/625] eta 0:02:41 lr 0.000050 wd 0.0500 time 0.2558 (0.2709) data time 0.0010 (0.0166) model time 0.0000 (0.0000) loss 5.6818 (5.4647) grad_norm 1.6067 (3.7761) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:37:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][40/625] eta 0:02:36 lr 0.000050 wd 0.0500 time 0.2521 (0.2674) data time 0.0011 (0.0127) model time 0.0000 (0.0000) loss 5.5507 (5.4597) grad_norm 2.9889 (3.7394) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:37:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][50/625] eta 0:02:32 lr 0.000050 wd 0.0500 time 0.2577 (0.2652) data time 0.0010 (0.0104) model time 0.0000 (0.0000) loss 4.4024 (5.3420) grad_norm 2.0467 (3.5648) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:37:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][60/625] eta 0:02:30 lr 0.000050 wd 0.0500 time 0.3774 (0.2660) data time 0.0005 (0.0088) model time 0.3769 (0.2690) loss 5.1595 (5.3549) grad_norm 2.9634 (3.8507) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:37:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][70/625] eta 0:02:28 lr 0.000050 wd 0.0500 time 0.2544 (0.2669) data time 0.0010 (0.0077) model time 0.2534 (0.2704) loss 5.3181 (5.3599) grad_norm 2.4202 (3.6354) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:37:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][80/625] eta 0:02:24 lr 0.000050 wd 0.0500 time 0.2627 (0.2656) data time 0.0008 (0.0069) model time 0.2619 (0.2654) loss 5.1811 (5.3788) grad_norm 1.6109 (3.5542) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:37:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][90/625] eta 0:02:21 lr 0.000050 wd 0.0500 time 0.2514 (0.2645) data time 0.0008 (0.0062) model time 0.2506 (0.2627) loss 5.0639 (5.3890) grad_norm 1.7762 (3.4001) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:37:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][100/625] eta 0:02:19 lr 0.000050 wd 0.0500 time 0.2530 (0.2654) data time 0.0010 (0.0057) model time 0.2520 (0.2648) loss 5.4477 (5.4037) grad_norm 3.1745 (3.4053) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:37:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][110/625] eta 0:02:16 lr 0.000050 wd 0.0500 time 0.2574 (0.2646) data time 0.0008 (0.0053) model time 0.2566 (0.2631) loss 5.6034 (5.4125) grad_norm 2.9670 (3.3678) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:37:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][120/625] eta 0:02:14 lr 0.000049 wd 0.0500 time 0.2588 (0.2664) data time 0.0006 (0.0049) model time 0.2582 (0.2663) loss 5.7738 (5.4256) grad_norm 3.3991 (3.3214) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:38:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][130/625] eta 0:02:11 lr 0.000049 wd 0.0500 time 0.2584 (0.2656) data time 0.0010 (0.0046) model time 0.2573 (0.2649) loss 5.7837 (5.4164) grad_norm 2.2753 (3.2727) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:38:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][140/625] eta 0:02:09 lr 0.000049 wd 0.0500 time 0.2550 (0.2661) data time 0.0008 (0.0044) model time 0.2542 (0.2658) loss 6.0768 (5.4156) grad_norm 3.3772 (3.2411) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:38:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][150/625] eta 0:02:06 lr 0.000049 wd 0.0500 time 0.2567 (0.2663) data time 0.0010 (0.0041) model time 0.2556 (0.2659) loss 5.6442 (5.4139) grad_norm 3.0866 (3.2161) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:38:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][160/625] eta 0:02:03 lr 0.000049 wd 0.0500 time 0.2594 (0.2657) data time 0.0006 (0.0039) model time 0.2588 (0.2649) loss 6.1198 (5.4122) grad_norm 5.0401 (3.1919) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:38:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][170/625] eta 0:02:00 lr 0.000049 wd 0.0500 time 0.2587 (0.2652) data time 0.0010 (0.0038) model time 0.2577 (0.2642) loss 6.2545 (5.4199) grad_norm 3.5041 (3.1992) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:38:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][180/625] eta 0:01:57 lr 0.000049 wd 0.0500 time 0.2604 (0.2647) data time 0.0007 (0.0036) model time 0.2597 (0.2636) loss 5.4408 (5.4414) grad_norm 2.4311 (3.1673) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:38:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][190/625] eta 0:01:54 lr 0.000049 wd 0.0500 time 0.2552 (0.2643) data time 0.0009 (0.0035) model time 0.2543 (0.2630) loss 4.9419 (5.4365) grad_norm 2.3510 (3.1405) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:38:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][200/625] eta 0:01:52 lr 0.000049 wd 0.0500 time 0.2566 (0.2639) data time 0.0010 (0.0033) model time 0.2556 (0.2625) loss 6.1867 (5.4387) grad_norm 2.5188 (3.1292) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:38:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][210/625] eta 0:01:49 lr 0.000049 wd 0.0500 time 0.2566 (0.2642) data time 0.0007 (0.0032) model time 0.2559 (0.2629) loss 6.4206 (5.4574) grad_norm 3.3625 (3.0910) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:38:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][220/625] eta 0:01:47 lr 0.000049 wd 0.0500 time 0.4374 (0.2646) data time 0.0007 (0.0031) model time 0.4367 (0.2635) loss 4.2194 (5.4533) grad_norm 2.9865 (3.0828) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:38:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][230/625] eta 0:01:44 lr 0.000049 wd 0.0500 time 0.2546 (0.2642) data time 0.0007 (0.0030) model time 0.2539 (0.2630) loss 5.3212 (5.4428) grad_norm 1.9778 (3.0537) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:38:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][240/625] eta 0:01:41 lr 0.000049 wd 0.0500 time 0.2604 (0.2639) data time 0.0008 (0.0029) model time 0.2596 (0.2627) loss 5.3642 (5.4438) grad_norm 1.7220 (3.0256) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:38:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][250/625] eta 0:01:38 lr 0.000049 wd 0.0500 time 0.2575 (0.2637) data time 0.0008 (0.0029) model time 0.2567 (0.2624) loss 4.8524 (5.4441) grad_norm 3.2983 (3.0154) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:38:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][260/625] eta 0:01:36 lr 0.000049 wd 0.0500 time 0.2540 (0.2641) data time 0.0006 (0.0028) model time 0.2534 (0.2629) loss 5.2685 (5.4365) grad_norm 3.9772 (3.0089) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:38:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][270/625] eta 0:01:33 lr 0.000049 wd 0.0500 time 0.2521 (0.2638) data time 0.0010 (0.0027) model time 0.2511 (0.2625) loss 4.0670 (5.4321) grad_norm 2.0016 (3.0032) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:38:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][280/625] eta 0:01:30 lr 0.000049 wd 0.0500 time 0.2614 (0.2635) data time 0.0008 (0.0027) model time 0.2606 (0.2622) loss 4.4507 (5.4370) grad_norm 2.7823 (3.0011) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:38:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][290/625] eta 0:01:28 lr 0.000049 wd 0.0500 time 0.2560 (0.2633) data time 0.0008 (0.0026) model time 0.2552 (0.2620) loss 4.8070 (5.4436) grad_norm 2.5875 (2.9797) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:38:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][300/625] eta 0:01:25 lr 0.000049 wd 0.0500 time 0.2542 (0.2631) data time 0.0007 (0.0025) model time 0.2536 (0.2617) loss 5.0603 (5.4487) grad_norm 3.5055 (2.9672) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:38:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][310/625] eta 0:01:22 lr 0.000049 wd 0.0500 time 0.2511 (0.2628) data time 0.0008 (0.0025) model time 0.2503 (0.2614) loss 6.8490 (5.4584) grad_norm 2.8489 (2.9573) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:38:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][320/625] eta 0:01:20 lr 0.000049 wd 0.0500 time 0.2588 (0.2626) data time 0.0006 (0.0024) model time 0.2582 (0.2612) loss 5.3048 (5.4512) grad_norm 2.6390 (2.9461) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:38:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][330/625] eta 0:01:17 lr 0.000049 wd 0.0500 time 0.2555 (0.2629) data time 0.0010 (0.0024) model time 0.2545 (0.2616) loss 4.5423 (5.4453) grad_norm 2.3812 (2.9937) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:38:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][340/625] eta 0:01:15 lr 0.000049 wd 0.0500 time 0.2535 (0.2632) data time 0.0010 (0.0024) model time 0.2525 (0.2619) loss 6.4062 (5.4595) grad_norm 2.7608 (2.9916) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:38:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][350/625] eta 0:01:12 lr 0.000049 wd 0.0500 time 0.2536 (0.2634) data time 0.0010 (0.0023) model time 0.2526 (0.2621) loss 5.3479 (5.4567) grad_norm 3.2141 (3.0447) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:39:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][360/625] eta 0:01:09 lr 0.000048 wd 0.0500 time 0.2596 (0.2632) data time 0.0009 (0.0023) model time 0.2588 (0.2619) loss 4.5959 (5.4578) grad_norm 2.1482 (3.0302) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:39:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][370/625] eta 0:01:07 lr 0.000048 wd 0.0500 time 0.2552 (0.2636) data time 0.0011 (0.0022) model time 0.2541 (0.2624) loss 5.4612 (5.4494) grad_norm 2.8984 (3.0536) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:39:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][380/625] eta 0:01:04 lr 0.000048 wd 0.0500 time 0.2524 (0.2634) data time 0.0007 (0.0022) model time 0.2516 (0.2622) loss 6.0197 (5.4531) grad_norm 1.7583 (3.0580) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:39:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][390/625] eta 0:01:01 lr 0.000048 wd 0.0500 time 0.2558 (0.2635) data time 0.0009 (0.0022) model time 0.2549 (0.2623) loss 5.8330 (5.4483) grad_norm 2.6600 (3.0393) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:39:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][400/625] eta 0:00:59 lr 0.000048 wd 0.0500 time 0.2601 (0.2634) data time 0.0008 (0.0022) model time 0.2593 (0.2622) loss 5.8757 (5.4410) grad_norm 6.3879 (3.0659) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:39:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][410/625] eta 0:00:56 lr 0.000048 wd 0.0500 time 0.2552 (0.2637) data time 0.0009 (0.0021) model time 0.2543 (0.2626) loss 4.9679 (5.4404) grad_norm 10.9254 (3.0707) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:39:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][420/625] eta 0:00:54 lr 0.000048 wd 0.0500 time 0.2534 (0.2635) data time 0.0009 (0.0021) model time 0.2525 (0.2623) loss 6.3064 (5.4344) grad_norm 3.4303 (3.0503) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:39:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][430/625] eta 0:00:51 lr 0.000048 wd 0.0500 time 0.2596 (0.2633) data time 0.0007 (0.0021) model time 0.2589 (0.2622) loss 4.7054 (5.4268) grad_norm 2.1985 (3.1253) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:39:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][440/625] eta 0:00:48 lr 0.000048 wd 0.0500 time 0.2610 (0.2632) data time 0.0010 (0.0020) model time 0.2600 (0.2620) loss 6.1364 (5.4321) grad_norm 1.8359 (3.1150) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:39:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][450/625] eta 0:00:46 lr 0.000048 wd 0.0500 time 0.2518 (0.2630) data time 0.0010 (0.0020) model time 0.2508 (0.2618) loss 5.2399 (5.4308) grad_norm 3.9352 (3.0993) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 10:39:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][460/625] eta 0:00:43 lr 0.000048 wd 0.0500 time 0.2557 (0.2629) data time 0.0009 (0.0020) model time 0.2547 (0.2616) loss 4.6485 (5.4277) grad_norm 2.4172 (3.1086) loss_scale 512.0000 (257.6659) mem 9655MB
[2024-08-04 10:39:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][470/625] eta 0:00:40 lr 0.000048 wd 0.0500 time 0.2549 (0.2628) data time 0.0008 (0.0020) model time 0.2541 (0.2615) loss 5.9328 (5.4321) grad_norm 2.2999 (3.1010) loss_scale 512.0000 (263.0658) mem 9655MB
[2024-08-04 10:39:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][480/625] eta 0:00:38 lr 0.000048 wd 0.0500 time 0.2574 (0.2626) data time 0.0007 (0.0019) model time 0.2567 (0.2614) loss 5.1610 (5.4320) grad_norm 4.0815 (3.1009) loss_scale 512.0000 (268.2412) mem 9655MB
[2024-08-04 10:39:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][490/625] eta 0:00:35 lr 0.000048 wd 0.0500 time 0.2595 (0.2625) data time 0.0008 (0.0019) model time 0.2586 (0.2612) loss 5.8291 (5.4369) grad_norm 2.1613 (3.0874) loss_scale 512.0000 (273.2057) mem 9655MB
[2024-08-04 10:39:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][500/625] eta 0:00:32 lr 0.000048 wd 0.0500 time 0.2768 (0.2628) data time 0.0006 (0.0019) model time 0.2761 (0.2616) loss 5.5206 (5.4391) grad_norm 2.3097 (3.1026) loss_scale 512.0000 (277.9721) mem 9655MB
[2024-08-04 10:39:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][510/625] eta 0:00:30 lr 0.000048 wd 0.0500 time 0.2569 (0.2631) data time 0.0006 (0.0019) model time 0.2562 (0.2619) loss 5.3764 (5.4422) grad_norm 5.9991 (3.1114) loss_scale 512.0000 (282.5519) mem 9655MB
[2024-08-04 10:39:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][520/625] eta 0:00:27 lr 0.000048 wd 0.0500 time 0.2555 (0.2634) data time 0.0008 (0.0019) model time 0.2547 (0.2622) loss 5.1475 (5.4409) grad_norm 3.7476 (3.1066) loss_scale 512.0000 (286.9559) mem 9655MB
[2024-08-04 10:39:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][530/625] eta 0:00:25 lr 0.000048 wd 0.0500 time 0.2537 (0.2632) data time 0.0009 (0.0019) model time 0.2528 (0.2621) loss 6.3014 (5.4438) grad_norm 2.9640 (3.0933) loss_scale 512.0000 (291.1940) mem 9655MB
[2024-08-04 10:39:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][540/625] eta 0:00:22 lr 0.000048 wd 0.0500 time 0.2536 (0.2631) data time 0.0009 (0.0018) model time 0.2527 (0.2619) loss 5.2360 (5.4452) grad_norm 8.3848 (3.0979) loss_scale 512.0000 (295.2754) mem 9655MB
[2024-08-04 10:39:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][550/625] eta 0:00:19 lr 0.000048 wd 0.0500 time 0.2579 (0.2630) data time 0.0006 (0.0018) model time 0.2573 (0.2618) loss 6.7200 (5.4442) grad_norm 4.2172 (3.0952) loss_scale 512.0000 (299.2087) mem 9655MB
[2024-08-04 10:39:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][560/625] eta 0:00:17 lr 0.000048 wd 0.0500 time 0.2552 (0.2628) data time 0.0008 (0.0018) model time 0.2544 (0.2616) loss 5.2740 (5.4500) grad_norm 2.1339 (3.0864) loss_scale 512.0000 (303.0018) mem 9655MB
[2024-08-04 10:39:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][570/625] eta 0:00:14 lr 0.000048 wd 0.0500 time 0.2550 (0.2627) data time 0.0008 (0.0018) model time 0.2542 (0.2615) loss 4.4149 (5.4517) grad_norm 2.4435 (3.0965) loss_scale 512.0000 (306.6620) mem 9655MB
[2024-08-04 10:39:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][580/625] eta 0:00:11 lr 0.000048 wd 0.0500 time 0.2565 (0.2626) data time 0.0008 (0.0018) model time 0.2557 (0.2614) loss 5.0475 (5.4547) grad_norm 2.3069 (3.0850) loss_scale 512.0000 (310.1962) mem 9655MB
[2024-08-04 10:40:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][590/625] eta 0:00:09 lr 0.000047 wd 0.0500 time 0.2651 (0.2625) data time 0.0007 (0.0018) model time 0.2644 (0.2613) loss 5.2401 (5.4518) grad_norm 2.6303 (3.0766) loss_scale 512.0000 (313.6108) mem 9655MB
[2024-08-04 10:40:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][600/625] eta 0:00:06 lr 0.000047 wd 0.0500 time 0.2587 (0.2624) data time 0.0009 (0.0018) model time 0.2578 (0.2612) loss 5.9858 (5.4554) grad_norm 2.5705 (3.0777) loss_scale 512.0000 (316.9118) mem 9655MB
[2024-08-04 10:40:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][610/625] eta 0:00:03 lr 0.000047 wd 0.0500 time 0.2531 (0.2625) data time 0.0006 (0.0017) model time 0.2525 (0.2613) loss 5.4818 (5.4599) grad_norm 2.0819 (3.0775) loss_scale 512.0000 (320.1047) mem 9655MB
[2024-08-04 10:40:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][620/625] eta 0:00:01 lr 0.000047 wd 0.0500 time 0.2533 (0.2626) data time 0.0004 (0.0017) model time 0.2529 (0.2615) loss 5.8060 (5.4592) grad_norm 2.4936 (3.0863) loss_scale 512.0000 (323.1948) mem 9655MB
[2024-08-04 10:40:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 278 training takes 0:02:44
[2024-08-04 10:40:10 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 10:40:10 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 10:40:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.511 (0.511) Loss 0.5977 (0.5977) Acc@1 90.088 (90.088) Acc@5 98.828 (98.828) Mem 9655MB
[2024-08-04 10:40:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 0.9019 (0.7087) Acc@1 81.787 (87.296) Acc@5 96.729 (97.878) Mem 9655MB
[2024-08-04 10:40:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 0.9878 (0.8234) Acc@1 78.857 (84.210) Acc@5 95.654 (96.782) Mem 9655MB
[2024-08-04 10:40:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.905 Acc@5 96.787
[2024-08-04 10:40:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.9%
[2024-08-04 10:40:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.91%
[2024-08-04 10:40:12 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-08-04 10:40:13 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-08-04 10:40:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.517 (0.517) Loss 0.5879 (0.5879) Acc@1 90.234 (90.234) Acc@5 98.682 (98.682) Mem 9655MB
[2024-08-04 10:40:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 0.8950 (0.7065) Acc@1 82.031 (87.140) Acc@5 96.436 (97.794) Mem 9655MB
[2024-08-04 10:40:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0029 (0.8244) Acc@1 78.760 (84.103) Acc@5 95.557 (96.677) Mem 9655MB
[2024-08-04 10:40:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.749 Acc@5 96.701
[2024-08-04 10:40:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7%
[2024-08-04 10:40:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.75%
[2024-08-04 10:40:14 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 10:40:15 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 10:40:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][0/625] eta 0:08:15 lr 0.000047 wd 0.0500 time 0.7928 (0.7928) data time 0.5536 (0.5536) model time 0.0000 (0.0000) loss 5.3120 (5.3120) grad_norm 2.4057 (2.4057) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:40:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][10/625] eta 0:03:07 lr 0.000047 wd 0.0500 time 0.2558 (0.3049) data time 0.0008 (0.0511) model time 0.0000 (0.0000) loss 6.0360 (5.6471) grad_norm 2.6130 (2.9218) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:40:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][20/625] eta 0:02:53 lr 0.000047 wd 0.0500 time 0.2548 (0.2873) data time 0.0007 (0.0272) model time 0.0000 (0.0000) loss 5.6626 (5.4791) grad_norm 3.8754 (2.7150) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:40:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][30/625] eta 0:02:48 lr 0.000047 wd 0.0500 time 0.2558 (0.2826) data time 0.0008 (0.0187) model time 0.0000 (0.0000) loss 5.0166 (5.4409) grad_norm 1.7584 (2.7773) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:40:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][40/625] eta 0:02:43 lr 0.000047 wd 0.0500 time 0.2565 (0.2790) data time 0.0005 (0.0144) model time 0.0000 (0.0000) loss 5.3900 (5.4658) grad_norm 2.3045 (2.6937) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:40:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][50/625] eta 0:02:40 lr 0.000047 wd 0.0500 time 0.2626 (0.2788) data time 0.0006 (0.0118) model time 0.0000 (0.0000) loss 5.9909 (5.3925) grad_norm 2.3969 (2.7564) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:40:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][60/625] eta 0:02:35 lr 0.000047 wd 0.0500 time 0.2550 (0.2750) data time 0.0009 (0.0100) model time 0.2542 (0.2548) loss 5.5539 (5.4320) grad_norm 2.1460 (2.8380) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:40:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][70/625] eta 0:02:32 lr 0.000047 wd 0.0500 time 0.2572 (0.2742) data time 0.0008 (0.0087) model time 0.2564 (0.2616) loss 5.6276 (5.4571) grad_norm 3.4807 (2.8386) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:40:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][80/625] eta 0:02:28 lr 0.000047 wd 0.0500 time 0.2565 (0.2721) data time 0.0007 (0.0077) model time 0.2559 (0.2597) loss 5.0902 (5.4417) grad_norm 2.3191 (2.8169) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:40:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][90/625] eta 0:02:25 lr 0.000047 wd 0.0500 time 0.2536 (0.2727) data time 0.0011 (0.0070) model time 0.2526 (0.2639) loss 4.3018 (5.4137) grad_norm 2.8282 (2.8217) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:40:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][100/625] eta 0:02:22 lr 0.000047 wd 0.0500 time 0.2599 (0.2711) data time 0.0007 (0.0064) model time 0.2592 (0.2622) loss 5.0951 (5.4421) grad_norm 3.2732 (2.8250) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:40:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][110/625] eta 0:02:18 lr 0.000047 wd 0.0500 time 0.2578 (0.2698) data time 0.0010 (0.0059) model time 0.2569 (0.2611) loss 6.2265 (5.4359) grad_norm 1.8906 (2.7883) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:40:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][120/625] eta 0:02:15 lr 0.000047 wd 0.0500 time 0.2558 (0.2687) data time 0.0006 (0.0055) model time 0.2552 (0.2604) loss 4.6959 (5.4297) grad_norm 2.9771 (2.8315) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:40:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][130/625] eta 0:02:12 lr 0.000047 wd 0.0500 time 0.2542 (0.2678) data time 0.0007 (0.0052) model time 0.2535 (0.2598) loss 4.6685 (5.4397) grad_norm 2.1463 (2.8338) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:40:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][140/625] eta 0:02:10 lr 0.000047 wd 0.0500 time 0.4716 (0.2684) data time 0.0007 (0.0049) model time 0.4709 (0.2616) loss 5.8390 (5.4237) grad_norm 3.6440 (2.8618) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:40:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][150/625] eta 0:02:07 lr 0.000047 wd 0.0500 time 0.2612 (0.2677) data time 0.0008 (0.0046) model time 0.2604 (0.2611) loss 5.3394 (5.4218) grad_norm 2.1379 (2.8816) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:40:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][160/625] eta 0:02:04 lr 0.000047 wd 0.0500 time 0.2595 (0.2671) data time 0.0008 (0.0044) model time 0.2587 (0.2608) loss 6.4984 (5.4124) grad_norm 1.9713 (2.8683) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:41:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][170/625] eta 0:02:01 lr 0.000047 wd 0.0500 time 0.2681 (0.2677) data time 0.0008 (0.0042) model time 0.2673 (0.2621) loss 4.8297 (5.4147) grad_norm 2.1320 (2.8662) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:41:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][180/625] eta 0:01:59 lr 0.000047 wd 0.0500 time 0.2590 (0.2676) data time 0.0006 (0.0040) model time 0.2585 (0.2623) loss 4.5909 (5.3952) grad_norm 2.5782 (2.8241) loss_scale 512.0000 (512.0000) mem 9655MB
[2024-08-04 10:41:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][190/625] eta 0:01:56 lr 0.000047 wd 0.0500 time 0.2575 (0.2677) data time 0.0008 (0.0038)
model time 0.2566 (0.2627) loss 5.4998 (5.4052) grad_norm 1.8972 (2.8161) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:41:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][200/625] eta 0:01:53 lr 0.000047 wd 0.0500 time 0.2575 (0.2672) data time 0.0006 (0.0037) model time 0.2569 (0.2623) loss 6.1215 (5.4123) grad_norm 2.1335 (2.8014) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:41:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][210/625] eta 0:01:51 lr 0.000046 wd 0.0500 time 0.2543 (0.2676) data time 0.0010 (0.0036) model time 0.2534 (0.2630) loss 6.1556 (5.4104) grad_norm 2.7252 (2.8066) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:41:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][220/625] eta 0:01:48 lr 0.000046 wd 0.0500 time 0.2576 (0.2671) data time 0.0006 (0.0034) model time 0.2570 (0.2626) loss 5.7818 (5.4132) grad_norm 2.0909 (2.8091) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:41:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][230/625] eta 0:01:45 lr 0.000046 wd 0.0500 time 0.2594 (0.2666) data time 0.0007 (0.0033) model time 0.2587 (0.2622) loss 6.1947 (5.4109) grad_norm 2.5152 (2.7926) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:41:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][240/625] eta 0:01:42 lr 0.000046 wd 0.0500 time 0.2465 (0.2669) data time 0.0009 (0.0032) model time 0.2456 (0.2628) loss 4.6213 (5.4079) grad_norm 2.1304 (2.9132) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:41:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][250/625] eta 0:01:39 lr 0.000046 wd 0.0500 time 0.2604 (0.2665) data time 0.0007 (0.0031) model time 0.2598 (0.2625) loss 4.2163 (5.4004) grad_norm 2.7973 (2.9182) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:41:24 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [279/300][260/625] eta 0:01:37 lr 0.000046 wd 0.0500 time 0.2520 (0.2661) data time 0.0011 (0.0031) model time 0.2509 (0.2621) loss 5.1331 (5.3956) grad_norm 2.1330 (2.9045) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:41:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][270/625] eta 0:01:34 lr 0.000046 wd 0.0500 time 0.2543 (0.2658) data time 0.0009 (0.0030) model time 0.2534 (0.2618) loss 5.3904 (5.3893) grad_norm 5.7995 (2.9203) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:41:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][280/625] eta 0:01:31 lr 0.000046 wd 0.0500 time 0.2568 (0.2654) data time 0.0008 (0.0029) model time 0.2560 (0.2615) loss 5.2286 (5.3978) grad_norm 2.4325 (2.9175) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:41:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][290/625] eta 0:01:28 lr 0.000046 wd 0.0500 time 0.2616 (0.2655) data time 0.0009 (0.0028) model time 0.2608 (0.2617) loss 5.5722 (5.3956) grad_norm 2.5029 (2.9125) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:41:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][300/625] eta 0:01:26 lr 0.000046 wd 0.0500 time 0.2565 (0.2657) data time 0.0009 (0.0028) model time 0.2556 (0.2621) loss 4.3971 (5.4016) grad_norm 2.3915 (2.8954) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:41:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][310/625] eta 0:01:23 lr 0.000046 wd 0.0500 time 0.2572 (0.2655) data time 0.0007 (0.0027) model time 0.2565 (0.2619) loss 6.2229 (5.4072) grad_norm 2.4455 (2.8898) loss_scale 512.0000 (512.0000) mem 9655MB [2024-08-04 10:41:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][320/625] eta 0:01:20 lr 0.000046 wd 0.0500 time 0.2640 (0.2652) data time 0.0010 (0.0027) model time 0.2630 (0.2616) 
loss 5.2103 (5.4046) grad_norm 1.5948 (inf) loss_scale 256.0000 (508.8100) mem 9655MB [2024-08-04 10:41:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][330/625] eta 0:01:18 lr 0.000046 wd 0.0500 time 0.2543 (0.2649) data time 0.0008 (0.0026) model time 0.2535 (0.2614) loss 4.7896 (5.3994) grad_norm 1.7964 (inf) loss_scale 256.0000 (501.1722) mem 9655MB [2024-08-04 10:41:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][340/625] eta 0:01:15 lr 0.000046 wd 0.0500 time 0.2544 (0.2646) data time 0.0009 (0.0026) model time 0.2536 (0.2612) loss 4.1256 (5.3901) grad_norm 2.6596 (inf) loss_scale 256.0000 (493.9824) mem 9655MB [2024-08-04 10:41:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][350/625] eta 0:01:12 lr 0.000046 wd 0.0500 time 0.2578 (0.2644) data time 0.0009 (0.0025) model time 0.2569 (0.2610) loss 5.7942 (5.3973) grad_norm 2.3672 (inf) loss_scale 256.0000 (487.2023) mem 9655MB [2024-08-04 10:41:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][360/625] eta 0:01:10 lr 0.000046 wd 0.0500 time 0.2580 (0.2642) data time 0.0007 (0.0025) model time 0.2573 (0.2608) loss 4.4326 (5.3944) grad_norm 2.4380 (inf) loss_scale 256.0000 (480.7978) mem 9655MB [2024-08-04 10:41:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][370/625] eta 0:01:07 lr 0.000046 wd 0.0500 time 0.2533 (0.2645) data time 0.0010 (0.0024) model time 0.2523 (0.2613) loss 6.0638 (5.3998) grad_norm 3.4267 (inf) loss_scale 256.0000 (474.7385) mem 9655MB [2024-08-04 10:41:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][380/625] eta 0:01:04 lr 0.000046 wd 0.0500 time 0.4050 (0.2647) data time 0.0008 (0.0024) model time 0.4041 (0.2616) loss 5.3979 (5.4041) grad_norm 2.1116 (inf) loss_scale 256.0000 (468.9974) mem 9655MB [2024-08-04 10:41:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[279/300][390/625] eta 0:01:02 lr 0.000046 wd 0.0500 time 0.2588 (0.2645) data time 0.0006 (0.0023) model time 0.2582 (0.2614) loss 5.4550 (5.4059) grad_norm 1.6251 (inf) loss_scale 256.0000 (463.5499) mem 9655MB [2024-08-04 10:42:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][400/625] eta 0:00:59 lr 0.000046 wd 0.0500 time 0.2591 (0.2648) data time 0.0010 (0.0023) model time 0.2581 (0.2619) loss 5.3892 (5.4112) grad_norm 2.3203 (inf) loss_scale 256.0000 (458.3741) mem 9655MB [2024-08-04 10:42:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][410/625] eta 0:00:56 lr 0.000046 wd 0.0500 time 0.2536 (0.2647) data time 0.0007 (0.0023) model time 0.2529 (0.2617) loss 4.9730 (5.4197) grad_norm 2.0450 (inf) loss_scale 256.0000 (453.4501) mem 9655MB [2024-08-04 10:42:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][420/625] eta 0:00:54 lr 0.000046 wd 0.0500 time 0.2575 (0.2649) data time 0.0009 (0.0022) model time 0.2566 (0.2621) loss 5.6370 (5.4239) grad_norm 1.9817 (inf) loss_scale 256.0000 (448.7601) mem 9655MB [2024-08-04 10:42:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][430/625] eta 0:00:51 lr 0.000046 wd 0.0500 time 0.2582 (0.2647) data time 0.0007 (0.0022) model time 0.2576 (0.2619) loss 4.6118 (5.4248) grad_norm 3.0332 (inf) loss_scale 256.0000 (444.2877) mem 9655MB [2024-08-04 10:42:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][440/625] eta 0:00:48 lr 0.000046 wd 0.0500 time 0.2567 (0.2645) data time 0.0009 (0.0022) model time 0.2558 (0.2618) loss 5.4696 (5.4177) grad_norm 2.7029 (inf) loss_scale 256.0000 (440.0181) mem 9655MB [2024-08-04 10:42:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][450/625] eta 0:00:46 lr 0.000046 wd 0.0500 time 0.2516 (0.2643) data time 0.0011 (0.0022) model time 0.2505 (0.2616) loss 5.7000 (5.4184) grad_norm 2.1908 (inf) loss_scale 
256.0000 (435.9379) mem 9655MB [2024-08-04 10:42:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][460/625] eta 0:00:43 lr 0.000045 wd 0.0500 time 0.2570 (0.2646) data time 0.0010 (0.0021) model time 0.2561 (0.2619) loss 4.3732 (5.4178) grad_norm 2.8225 (inf) loss_scale 256.0000 (432.0347) mem 9655MB [2024-08-04 10:42:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][470/625] eta 0:00:40 lr 0.000045 wd 0.0500 time 0.2583 (0.2644) data time 0.0008 (0.0021) model time 0.2575 (0.2617) loss 5.8138 (5.4184) grad_norm 2.8544 (inf) loss_scale 256.0000 (428.2972) mem 9655MB [2024-08-04 10:42:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][480/625] eta 0:00:38 lr 0.000045 wd 0.0500 time 0.2558 (0.2642) data time 0.0009 (0.0021) model time 0.2548 (0.2615) loss 6.1514 (5.4207) grad_norm 2.6770 (inf) loss_scale 256.0000 (424.7152) mem 9655MB [2024-08-04 10:42:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][490/625] eta 0:00:35 lr 0.000045 wd 0.0500 time 0.2537 (0.2640) data time 0.0011 (0.0021) model time 0.2526 (0.2614) loss 4.9945 (5.4219) grad_norm 2.6406 (inf) loss_scale 256.0000 (421.2790) mem 9655MB [2024-08-04 10:42:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][500/625] eta 0:00:32 lr 0.000045 wd 0.0500 time 0.2548 (0.2639) data time 0.0012 (0.0020) model time 0.2537 (0.2612) loss 4.6631 (5.4230) grad_norm 2.6589 (inf) loss_scale 256.0000 (417.9800) mem 9655MB [2024-08-04 10:42:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][510/625] eta 0:00:30 lr 0.000045 wd 0.0500 time 0.2502 (0.2637) data time 0.0009 (0.0020) model time 0.2493 (0.2611) loss 6.2334 (5.4266) grad_norm 1.8213 (inf) loss_scale 256.0000 (414.8102) mem 9655MB [2024-08-04 10:42:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][520/625] eta 0:00:27 lr 0.000045 wd 0.0500 time 0.2566 
(0.2639) data time 0.0009 (0.0020) model time 0.2557 (0.2613) loss 4.7846 (5.4236) grad_norm 1.8456 (inf) loss_scale 256.0000 (411.7620) mem 9655MB [2024-08-04 10:42:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][530/625] eta 0:00:25 lr 0.000045 wd 0.0500 time 0.2533 (0.2638) data time 0.0010 (0.0020) model time 0.2523 (0.2612) loss 6.3823 (5.4235) grad_norm 2.7878 (inf) loss_scale 256.0000 (408.8286) mem 9655MB [2024-08-04 10:42:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][540/625] eta 0:00:22 lr 0.000045 wd 0.0500 time 0.2583 (0.2640) data time 0.0008 (0.0020) model time 0.2576 (0.2615) loss 5.7611 (5.4225) grad_norm 3.0915 (inf) loss_scale 256.0000 (406.0037) mem 9655MB [2024-08-04 10:42:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][550/625] eta 0:00:19 lr 0.000045 wd 0.0500 time 0.2527 (0.2642) data time 0.0009 (0.0019) model time 0.2518 (0.2618) loss 5.0951 (5.4226) grad_norm 2.0060 (inf) loss_scale 256.0000 (403.2813) mem 9655MB [2024-08-04 10:42:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][560/625] eta 0:00:17 lr 0.000045 wd 0.0500 time 0.2507 (0.2640) data time 0.0008 (0.0019) model time 0.2500 (0.2616) loss 5.7652 (5.4243) grad_norm 2.1883 (inf) loss_scale 256.0000 (400.6560) mem 9655MB [2024-08-04 10:42:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][570/625] eta 0:00:14 lr 0.000045 wd 0.0500 time 0.2515 (0.2639) data time 0.0007 (0.0019) model time 0.2508 (0.2615) loss 4.6904 (5.4263) grad_norm 2.4798 (inf) loss_scale 256.0000 (398.1226) mem 9655MB [2024-08-04 10:42:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][580/625] eta 0:00:11 lr 0.000045 wd 0.0500 time 0.2563 (0.2638) data time 0.0009 (0.0019) model time 0.2554 (0.2614) loss 4.9035 (5.4304) grad_norm 2.6897 (inf) loss_scale 256.0000 (395.6764) mem 9655MB [2024-08-04 10:42:51 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][590/625] eta 0:00:09 lr 0.000045 wd 0.0500 time 0.2515 (0.2636) data time 0.0007 (0.0019) model time 0.2508 (0.2613) loss 4.9040 (5.4295) grad_norm 1.7845 (inf) loss_scale 256.0000 (393.3130) mem 9655MB [2024-08-04 10:42:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][600/625] eta 0:00:06 lr 0.000045 wd 0.0500 time 0.2552 (0.2635) data time 0.0010 (0.0019) model time 0.2541 (0.2612) loss 5.0783 (5.4287) grad_norm 2.6925 (inf) loss_scale 256.0000 (391.0283) mem 9655MB [2024-08-04 10:42:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][610/625] eta 0:00:03 lr 0.000045 wd 0.0500 time 0.2523 (0.2634) data time 0.0006 (0.0018) model time 0.2517 (0.2610) loss 4.8794 (5.4310) grad_norm 2.7572 (inf) loss_scale 256.0000 (388.8183) mem 9655MB [2024-08-04 10:42:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][620/625] eta 0:00:01 lr 0.000045 wd 0.0500 time 0.2535 (0.2632) data time 0.0005 (0.0018) model time 0.2530 (0.2609) loss 5.0521 (5.4323) grad_norm 2.6737 (inf) loss_scale 256.0000 (386.6795) mem 9655MB [2024-08-04 10:42:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 279 training takes 0:02:44 [2024-08-04 10:42:59 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 10:43:00 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 10:43:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.475 (0.475) Loss 0.5952 (0.5952) Acc@1 90.283 (90.283) Acc@5 98.828 (98.828) Mem 9655MB [2024-08-04 10:43:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.094) Loss 0.9097 (0.7132) Acc@1 81.299 (87.140) Acc@5 96.680 (97.856) Mem 9655MB [2024-08-04 10:43:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 0.9976 (0.8277) Acc@1 79.150 (84.138) Acc@5 95.605 (96.766) Mem 9655MB [2024-08-04 10:43:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.841 Acc@5 96.777 [2024-08-04 10:43:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-04 10:43:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.744 (0.744) Loss 0.5879 (0.5879) Acc@1 90.234 (90.234) Acc@5 98.682 (98.682) Mem 9655MB [2024-08-04 10:43:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.124) Loss 0.8950 (0.7063) Acc@1 81.885 (87.123) Acc@5 96.436 (97.789) Mem 9655MB [2024-08-04 10:43:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.091) Loss 1.0020 (0.8241) Acc@1 78.955 (84.101) Acc@5 95.557 (96.682) Mem 9655MB [2024-08-04 10:43:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.747 Acc@5 96.703 [2024-08-04 10:43:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-04 10:43:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][0/625] eta 0:11:27 lr 0.000045 wd 0.0500 time 1.0999 (1.0999) data time 0.5129 (0.5129) model time 0.0000 (0.0000) loss 5.5177 (5.5177) grad_norm 1.5312 (1.5312) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:43:07 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [280/300][10/625] eta 0:03:24 lr 0.000045 wd 0.0500 time 0.2528 (0.3326) data time 0.0011 (0.0474) model time 0.0000 (0.0000) loss 5.2484 (5.3166) grad_norm 2.3540 (3.0240) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:43:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][20/625] eta 0:03:09 lr 0.000045 wd 0.0500 time 0.2601 (0.3130) data time 0.0007 (0.0253) model time 0.0000 (0.0000) loss 4.8488 (5.3617) grad_norm 2.6540 (2.8029) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:43:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][30/625] eta 0:02:55 lr 0.000045 wd 0.0500 time 0.2498 (0.2942) data time 0.0009 (0.0174) model time 0.0000 (0.0000) loss 5.5295 (5.3460) grad_norm 4.0424 (2.7851) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:43:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][40/625] eta 0:02:49 lr 0.000045 wd 0.0500 time 0.2573 (0.2894) data time 0.0008 (0.0134) model time 0.0000 (0.0000) loss 5.0695 (5.3280) grad_norm 27.9172 (3.4386) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:43:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][50/625] eta 0:02:44 lr 0.000045 wd 0.0500 time 0.2525 (0.2864) data time 0.0008 (0.0109) model time 0.0000 (0.0000) loss 6.4126 (5.3889) grad_norm 3.6760 (3.5920) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:43:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][60/625] eta 0:02:39 lr 0.000045 wd 0.0500 time 0.2610 (0.2817) data time 0.0007 (0.0093) model time 0.2603 (0.2564) loss 5.6770 (5.4045) grad_norm 2.6882 (3.4505) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:43:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][70/625] eta 0:02:34 lr 0.000045 wd 0.0500 time 0.2562 (0.2781) data time 0.0011 (0.0081) model time 0.2551 (0.2558) loss 
4.4773 (5.3505) grad_norm 2.6591 (3.4594) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:43:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][80/625] eta 0:02:30 lr 0.000045 wd 0.0500 time 0.2625 (0.2754) data time 0.0009 (0.0072) model time 0.2616 (0.2557) loss 5.1340 (5.3398) grad_norm 2.2053 (3.3823) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:43:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][90/625] eta 0:02:26 lr 0.000044 wd 0.0500 time 0.2538 (0.2733) data time 0.0010 (0.0065) model time 0.2527 (0.2556) loss 5.9356 (5.3486) grad_norm 3.0128 (3.3990) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:43:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][100/625] eta 0:02:23 lr 0.000044 wd 0.0500 time 0.2598 (0.2735) data time 0.0009 (0.0060) model time 0.2589 (0.2593) loss 5.2135 (5.3380) grad_norm 2.2762 (3.3850) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:43:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][110/625] eta 0:02:20 lr 0.000044 wd 0.0500 time 0.2585 (0.2728) data time 0.0009 (0.0055) model time 0.2576 (0.2603) loss 4.1458 (5.3688) grad_norm 2.8773 (3.3332) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:43:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][120/625] eta 0:02:17 lr 0.000044 wd 0.0500 time 0.2554 (0.2716) data time 0.0009 (0.0052) model time 0.2545 (0.2599) loss 5.1516 (5.3760) grad_norm 2.4511 (3.4077) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:43:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][130/625] eta 0:02:14 lr 0.000044 wd 0.0500 time 0.2579 (0.2717) data time 0.0007 (0.0048) model time 0.2571 (0.2614) loss 5.2214 (5.4064) grad_norm 2.3742 (3.3701) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:43:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO 
Train: [280/300][140/625] eta 0:02:11 lr 0.000044 wd 0.0500 time 0.2550 (0.2706) data time 0.0009 (0.0046) model time 0.2541 (0.2607) loss 4.8351 (5.4191) grad_norm 2.9431 (3.4463) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:43:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][150/625] eta 0:02:08 lr 0.000044 wd 0.0500 time 0.2580 (0.2697) data time 0.0008 (0.0043) model time 0.2572 (0.2601) loss 5.5648 (5.4241) grad_norm 2.5072 (3.4602) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:43:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][160/625] eta 0:02:05 lr 0.000044 wd 0.0500 time 0.2547 (0.2689) data time 0.0011 (0.0041) model time 0.2536 (0.2597) loss 5.4096 (5.4193) grad_norm 2.8122 (3.4581) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:43:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][170/625] eta 0:02:01 lr 0.000044 wd 0.0500 time 0.2553 (0.2681) data time 0.0008 (0.0039) model time 0.2545 (0.2594) loss 5.2492 (5.4208) grad_norm 2.1570 (3.5030) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:43:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][180/625] eta 0:01:59 lr 0.000044 wd 0.0500 time 0.2601 (0.2674) data time 0.0008 (0.0038) model time 0.2593 (0.2590) loss 5.6754 (5.4160) grad_norm 2.4563 (3.5511) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:43:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][190/625] eta 0:01:56 lr 0.000044 wd 0.0500 time 0.2594 (0.2669) data time 0.0008 (0.0036) model time 0.2586 (0.2587) loss 5.1066 (5.4167) grad_norm 1.6938 (3.4957) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:43:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][200/625] eta 0:01:54 lr 0.000044 wd 0.0500 time 0.4652 (0.2685) data time 0.0009 (0.0035) model time 0.4643 (0.2614) loss 4.5295 (5.4127) grad_norm 
2.4846 (3.4405) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:44:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][210/625] eta 0:01:51 lr 0.000044 wd 0.0500 time 0.2586 (0.2687) data time 0.0008 (0.0034) model time 0.2578 (0.2621) loss 5.6119 (5.3996) grad_norm 1.8698 (3.4191) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:44:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][220/625] eta 0:01:48 lr 0.000044 wd 0.0500 time 0.2551 (0.2682) data time 0.0010 (0.0033) model time 0.2541 (0.2618) loss 5.4507 (5.4032) grad_norm 1.7868 (3.3804) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:44:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][230/625] eta 0:01:45 lr 0.000044 wd 0.0500 time 0.2589 (0.2677) data time 0.0010 (0.0032) model time 0.2579 (0.2614) loss 5.3824 (5.4073) grad_norm 4.4997 (3.3444) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:44:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][240/625] eta 0:01:42 lr 0.000044 wd 0.0500 time 0.2488 (0.2672) data time 0.0008 (0.0031) model time 0.2480 (0.2611) loss 5.6375 (5.4154) grad_norm 4.5843 (3.3251) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:44:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][250/625] eta 0:01:40 lr 0.000044 wd 0.0500 time 0.2571 (0.2668) data time 0.0007 (0.0030) model time 0.2565 (0.2608) loss 5.1933 (5.4181) grad_norm 2.2946 (3.2975) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:44:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][260/625] eta 0:01:37 lr 0.000044 wd 0.0500 time 0.2587 (0.2664) data time 0.0006 (0.0029) model time 0.2581 (0.2605) loss 5.3726 (5.4199) grad_norm 2.3272 (3.3393) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:44:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][270/625] eta 
0:01:34 lr 0.000044 wd 0.0500 time 0.2591 (0.2660) data time 0.0010 (0.0028) model time 0.2580 (0.2603) loss 6.1623 (5.4205) grad_norm 2.6426 (3.3177) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:44:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][280/625] eta 0:01:31 lr 0.000044 wd 0.0500 time 0.2522 (0.2657) data time 0.0008 (0.0028) model time 0.2514 (0.2601) loss 4.5985 (5.4104) grad_norm 1.9752 (3.3607) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:44:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][290/625] eta 0:01:28 lr 0.000044 wd 0.0500 time 0.2590 (0.2653) data time 0.0008 (0.0027) model time 0.2582 (0.2599) loss 4.9140 (5.4156) grad_norm 2.3184 (3.3539) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:44:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][300/625] eta 0:01:26 lr 0.000044 wd 0.0500 time 0.4647 (0.2657) data time 0.0012 (0.0027) model time 0.4635 (0.2606) loss 5.9140 (5.4105) grad_norm 1.9723 (3.3345) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:44:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][310/625] eta 0:01:23 lr 0.000044 wd 0.0500 time 0.2554 (0.2654) data time 0.0008 (0.0026) model time 0.2546 (0.2604) loss 5.3835 (5.4073) grad_norm 2.8212 (3.3125) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:44:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][320/625] eta 0:01:20 lr 0.000044 wd 0.0500 time 0.2559 (0.2651) data time 0.0009 (0.0025) model time 0.2550 (0.2601) loss 5.1322 (5.4022) grad_norm 2.4540 (3.3018) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:44:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][330/625] eta 0:01:18 lr 0.000044 wd 0.0500 time 0.2569 (0.2649) data time 0.0010 (0.0025) model time 0.2559 (0.2600) loss 4.9600 (5.3983) grad_norm 3.8620 (3.3099) loss_scale 
256.0000 (256.0000) mem 9655MB [2024-08-04 10:44:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][340/625] eta 0:01:15 lr 0.000043 wd 0.0500 time 0.2565 (0.2652) data time 0.0011 (0.0024) model time 0.2554 (0.2605) loss 5.0842 (5.4019) grad_norm 2.2751 (3.2947) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:44:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][350/625] eta 0:01:13 lr 0.000043 wd 0.0500 time 0.2562 (0.2665) data time 0.0009 (0.0024) model time 0.2553 (0.2621) loss 5.2711 (5.4056) grad_norm 2.1774 (3.2671) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:44:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][360/625] eta 0:01:10 lr 0.000043 wd 0.0500 time 0.2526 (0.2662) data time 0.0011 (0.0024) model time 0.2515 (0.2619) loss 5.6913 (5.4031) grad_norm 3.2135 (3.2468) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:44:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][370/625] eta 0:01:07 lr 0.000043 wd 0.0500 time 0.2598 (0.2664) data time 0.0008 (0.0023) model time 0.2590 (0.2623) loss 4.9618 (5.3985) grad_norm 3.4002 (3.2454) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:44:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][380/625] eta 0:01:05 lr 0.000043 wd 0.0500 time 0.2595 (0.2661) data time 0.0007 (0.0023) model time 0.2588 (0.2620) loss 4.6019 (5.3906) grad_norm 1.4997 (3.2229) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:44:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][390/625] eta 0:01:02 lr 0.000043 wd 0.0500 time 0.2571 (0.2659) data time 0.0011 (0.0023) model time 0.2559 (0.2619) loss 5.5536 (5.3825) grad_norm 2.4047 (3.2036) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:44:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][400/625] eta 0:00:59 lr 0.000043 wd 
0.0500 time 0.2552 (0.2657) data time 0.0007 (0.0022) model time 0.2545 (0.2617) loss 4.8123 (5.3825) grad_norm 2.0327 (3.1828) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:44:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][410/625] eta 0:00:57 lr 0.000043 wd 0.0500 time 0.4468 (0.2663) data time 0.0010 (0.0022) model time 0.4458 (0.2625) loss 5.8243 (5.3832) grad_norm 4.5409 (3.1952) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:44:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][420/625] eta 0:00:54 lr 0.000043 wd 0.0500 time 0.2559 (0.2661) data time 0.0009 (0.0022) model time 0.2551 (0.2623) loss 6.2491 (5.3875) grad_norm 4.1105 (3.2015) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:44:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][430/625] eta 0:00:51 lr 0.000043 wd 0.0500 time 0.2544 (0.2659) data time 0.0008 (0.0021) model time 0.2536 (0.2621) loss 4.4359 (5.3876) grad_norm 3.8107 (3.1865) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:45:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][440/625] eta 0:00:49 lr 0.000043 wd 0.0500 time 0.2551 (0.2665) data time 0.0010 (0.0021) model time 0.2542 (0.2630) loss 4.5458 (5.3895) grad_norm 2.9621 (3.1769) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:45:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][450/625] eta 0:00:46 lr 0.000043 wd 0.0500 time 0.2569 (0.2663) data time 0.0006 (0.0021) model time 0.2563 (0.2628) loss 6.0244 (5.3978) grad_norm 3.6039 (3.1732) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:45:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][460/625] eta 0:00:43 lr 0.000043 wd 0.0500 time 0.2565 (0.2661) data time 0.0006 (0.0020) model time 0.2559 (0.2627) loss 4.0867 (5.3986) grad_norm 17.6920 (3.2057) loss_scale 256.0000 (256.0000) mem 
9655MB [2024-08-04 10:45:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][470/625] eta 0:00:41 lr 0.000043 wd 0.0500 time 0.2559 (0.2659) data time 0.0012 (0.0020) model time 0.2547 (0.2625) loss 5.8367 (5.3935) grad_norm 1.5947 (3.1926) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:45:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][480/625] eta 0:00:38 lr 0.000043 wd 0.0500 time 0.2556 (0.2661) data time 0.0008 (0.0020) model time 0.2548 (0.2627) loss 5.2207 (5.3959) grad_norm 2.1558 (3.1809) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:45:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][490/625] eta 0:00:35 lr 0.000043 wd 0.0500 time 0.2573 (0.2659) data time 0.0015 (0.0020) model time 0.2558 (0.2626) loss 6.1198 (5.3957) grad_norm 1.9227 (3.1816) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:45:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][500/625] eta 0:00:33 lr 0.000043 wd 0.0500 time 0.2519 (0.2660) data time 0.0010 (0.0020) model time 0.2509 (0.2628) loss 6.2228 (5.4001) grad_norm 1.7817 (3.1844) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:45:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][510/625] eta 0:00:30 lr 0.000043 wd 0.0500 time 0.2570 (0.2658) data time 0.0008 (0.0019) model time 0.2562 (0.2626) loss 6.2705 (5.4066) grad_norm 3.8098 (3.1803) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 10:45:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][520/625] eta 0:00:27 lr 0.000043 wd 0.0500 time 0.2564 (0.2656) data time 0.0009 (0.0019) model time 0.2555 (0.2624) loss 4.2956 (5.4079) grad_norm 2.2342 (inf) loss_scale 128.0000 (254.2802) mem 9655MB [2024-08-04 10:45:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][530/625] eta 0:00:25 lr 0.000043 wd 0.0500 time 0.2547 (0.2655) 
data time 0.0010 (0.0019) model time 0.2538 (0.2623) loss 4.6906 (5.4093) grad_norm 2.7059 (inf) loss_scale 128.0000 (251.9021) mem 9655MB [2024-08-04 10:45:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][540/625] eta 0:00:22 lr 0.000043 wd 0.0500 time 0.2574 (0.2653) data time 0.0009 (0.0019) model time 0.2565 (0.2622) loss 5.4255 (5.4091) grad_norm 2.0833 (inf) loss_scale 128.0000 (249.6118) mem 9655MB [2024-08-04 10:45:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][550/625] eta 0:00:19 lr 0.000043 wd 0.0500 time 0.2566 (0.2654) data time 0.0008 (0.0019) model time 0.2559 (0.2623) loss 5.2092 (5.4072) grad_norm 4.2311 (inf) loss_scale 128.0000 (247.4047) mem 9655MB [2024-08-04 10:45:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][560/625] eta 0:00:17 lr 0.000043 wd 0.0500 time 0.4541 (0.2656) data time 0.0007 (0.0019) model time 0.4534 (0.2626) loss 5.8943 (5.4092) grad_norm 2.3844 (inf) loss_scale 128.0000 (245.2763) mem 9655MB [2024-08-04 10:45:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][570/625] eta 0:00:14 lr 0.000043 wd 0.0500 time 0.2539 (0.2654) data time 0.0010 (0.0018) model time 0.2529 (0.2624) loss 5.4388 (5.4049) grad_norm 2.4279 (inf) loss_scale 128.0000 (243.2224) mem 9655MB [2024-08-04 10:45:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][580/625] eta 0:00:11 lr 0.000043 wd 0.0500 time 0.2546 (0.2655) data time 0.0012 (0.0018) model time 0.2534 (0.2626) loss 5.6521 (5.4051) grad_norm 2.6161 (inf) loss_scale 128.0000 (241.2392) mem 9655MB [2024-08-04 10:45:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][590/625] eta 0:00:09 lr 0.000043 wd 0.0500 time 0.2604 (0.2654) data time 0.0009 (0.0018) model time 0.2595 (0.2625) loss 6.1979 (5.4093) grad_norm 3.0607 (inf) loss_scale 128.0000 (239.3232) mem 9655MB [2024-08-04 10:45:43 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][600/625] eta 0:00:06 lr 0.000043 wd 0.0500 time 0.2617 (0.2652) data time 0.0006 (0.0018) model time 0.2611 (0.2623) loss 4.5672 (5.4052) grad_norm 2.9376 (inf) loss_scale 128.0000 (237.4709) mem 9655MB [2024-08-04 10:45:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][610/625] eta 0:00:03 lr 0.000042 wd 0.0500 time 0.2534 (0.2651) data time 0.0004 (0.0018) model time 0.2530 (0.2622) loss 6.0531 (5.4035) grad_norm 2.7578 (inf) loss_scale 128.0000 (235.6792) mem 9655MB [2024-08-04 10:45:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][620/625] eta 0:00:01 lr 0.000042 wd 0.0500 time 0.2533 (0.2649) data time 0.0004 (0.0018) model time 0.2529 (0.2620) loss 4.4210 (5.4022) grad_norm 2.2339 (inf) loss_scale 128.0000 (233.9452) mem 9655MB [2024-08-04 10:45:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 280 training takes 0:02:45 [2024-08-04 10:45:49 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 10:45:50 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 10:45:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.494 (0.494) Loss 0.6001 (0.6001) Acc@1 90.479 (90.479) Acc@5 98.828 (98.828) Mem 9655MB [2024-08-04 10:45:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 0.9038 (0.7139) Acc@1 82.080 (87.358) Acc@5 96.582 (97.874) Mem 9655MB [2024-08-04 10:45:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 1.0078 (0.8317) Acc@1 78.760 (84.240) Acc@5 95.605 (96.733) Mem 9655MB [2024-08-04 10:45:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.899 Acc@5 96.753 [2024-08-04 10:45:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-04 10:45:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.732 (0.732) Loss 0.5884 (0.5884) Acc@1 90.234 (90.234) Acc@5 98.730 (98.730) Mem 9655MB [2024-08-04 10:45:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.124) Loss 0.8945 (0.7065) Acc@1 81.836 (87.136) Acc@5 96.436 (97.807) Mem 9655MB [2024-08-04 10:45:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.091) Loss 1.0020 (0.8241) Acc@1 79.102 (84.129) Acc@5 95.508 (96.689) Mem 9655MB [2024-08-04 10:45:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.775 Acc@5 96.711 [2024-08-04 10:45:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-04 10:45:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.78% [2024-08-04 10:45:54 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 10:45:54 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 10:45:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][0/625] eta 0:07:57 lr 0.000042 wd 0.0500 time 0.7633 (0.7633) data time 0.5218 (0.5218) model time 0.0000 (0.0000) loss 5.1826 (5.1826) grad_norm 3.2044 (3.2044) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:45:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][10/625] eta 0:03:05 lr 0.000042 wd 0.0500 time 0.2534 (0.3017) data time 0.0008 (0.0483) model time 0.0000 (0.0000) loss 5.6256 (5.4208) grad_norm 2.1190 (2.9099) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:46:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][20/625] eta 0:02:49 lr 0.000042 wd 0.0500 time 0.2545 (0.2798) data time 0.0010 (0.0258) model time 0.0000 (0.0000) loss 5.0044 (5.4286) grad_norm 3.3201 (2.8900) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:46:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][30/625] eta 0:02:41 lr 0.000042 wd 0.0500 time 0.2488 (0.2722) data time 0.0008 (0.0178) model time 0.0000 (0.0000) loss 4.3237 (5.3422) grad_norm 4.1413 (3.7791) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:46:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][40/625] eta 0:02:38 lr 0.000042 wd 0.0500 time 0.2537 (0.2715) data time 0.0010 (0.0137) model time 0.0000 (0.0000) loss 5.5119 (5.3221) grad_norm 1.5284 (3.8057) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:46:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][50/625] eta 0:02:36 lr 0.000042 wd 0.0500 time 0.2536 (0.2727) data time 0.0009 (0.0112) model time 0.0000 (0.0000) loss 4.7855 (5.3504) grad_norm 3.0482 (3.6281) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 
10:46:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][60/625] eta 0:02:32 lr 0.000042 wd 0.0500 time 0.2581 (0.2700) data time 0.0006 (0.0095) model time 0.2575 (0.2550) loss 6.1444 (5.4201) grad_norm 2.6593 (3.4593) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:46:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][70/625] eta 0:02:28 lr 0.000042 wd 0.0500 time 0.2502 (0.2679) data time 0.0009 (0.0083) model time 0.2494 (0.2545) loss 6.0207 (5.3881) grad_norm 2.0193 (3.9246) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:46:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][80/625] eta 0:02:27 lr 0.000042 wd 0.0500 time 0.2575 (0.2707) data time 0.0008 (0.0074) model time 0.2567 (0.2663) loss 5.3511 (5.4187) grad_norm 8.7228 (3.8921) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:46:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][90/625] eta 0:02:24 lr 0.000042 wd 0.0500 time 0.2518 (0.2692) data time 0.0006 (0.0067) model time 0.2511 (0.2638) loss 4.3522 (5.4154) grad_norm 2.5028 (3.8857) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:46:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][100/625] eta 0:02:21 lr 0.000042 wd 0.0500 time 0.2527 (0.2691) data time 0.0009 (0.0061) model time 0.2518 (0.2645) loss 6.0900 (5.4452) grad_norm 1.9678 (3.7446) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:46:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][110/625] eta 0:02:18 lr 0.000042 wd 0.0500 time 0.2564 (0.2696) data time 0.0009 (0.0056) model time 0.2556 (0.2661) loss 4.5666 (5.4573) grad_norm 3.3497 (3.6438) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:46:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][120/625] eta 0:02:16 lr 0.000042 wd 0.0500 time 0.2497 (0.2696) data time 0.0008 
(0.0052) model time 0.2490 (0.2664) loss 5.8997 (5.4891) grad_norm 2.6357 (3.6072) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:46:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][130/625] eta 0:02:12 lr 0.000042 wd 0.0500 time 0.2562 (0.2685) data time 0.0008 (0.0049) model time 0.2554 (0.2649) loss 4.9591 (5.5048) grad_norm 1.9154 (3.5654) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:46:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][140/625] eta 0:02:10 lr 0.000042 wd 0.0500 time 0.2587 (0.2689) data time 0.0008 (0.0046) model time 0.2578 (0.2658) loss 4.5966 (5.5074) grad_norm 2.4479 (3.5388) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:46:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][150/625] eta 0:02:07 lr 0.000042 wd 0.0500 time 0.2529 (0.2693) data time 0.0009 (0.0044) model time 0.2520 (0.2666) loss 5.2547 (5.5017) grad_norm 2.5148 (3.4892) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:46:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][160/625] eta 0:02:04 lr 0.000042 wd 0.0500 time 0.2554 (0.2684) data time 0.0011 (0.0042) model time 0.2542 (0.2655) loss 5.4739 (5.4985) grad_norm 2.1897 (inf) loss_scale 64.0000 (124.0248) mem 9655MB [2024-08-04 10:46:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][170/625] eta 0:02:01 lr 0.000042 wd 0.0500 time 0.2572 (0.2677) data time 0.0006 (0.0040) model time 0.2566 (0.2647) loss 6.1787 (5.5132) grad_norm 3.2748 (inf) loss_scale 64.0000 (120.5146) mem 9655MB [2024-08-04 10:46:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][180/625] eta 0:01:58 lr 0.000042 wd 0.0500 time 0.2542 (0.2671) data time 0.0008 (0.0038) model time 0.2533 (0.2639) loss 6.1782 (5.5213) grad_norm 1.9029 (inf) loss_scale 64.0000 (117.3923) mem 9655MB [2024-08-04 10:46:45 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [281/300][190/625] eta 0:01:55 lr 0.000042 wd 0.0500 time 0.2581 (0.2665) data time 0.0008 (0.0037) model time 0.2573 (0.2634) loss 6.1568 (5.5302) grad_norm 2.1630 (inf) loss_scale 64.0000 (114.5969) mem 9655MB [2024-08-04 10:46:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][200/625] eta 0:01:53 lr 0.000042 wd 0.0500 time 0.2539 (0.2667) data time 0.0009 (0.0035) model time 0.2530 (0.2637) loss 5.9836 (5.5189) grad_norm 4.4701 (inf) loss_scale 64.0000 (112.0796) mem 9655MB [2024-08-04 10:46:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][210/625] eta 0:01:50 lr 0.000042 wd 0.0500 time 0.2591 (0.2662) data time 0.0012 (0.0034) model time 0.2580 (0.2632) loss 5.0675 (5.5085) grad_norm 2.3845 (inf) loss_scale 64.0000 (109.8009) mem 9655MB [2024-08-04 10:46:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][220/625] eta 0:01:47 lr 0.000042 wd 0.0500 time 0.4597 (0.2666) data time 0.0008 (0.0033) model time 0.4589 (0.2639) loss 5.3806 (5.4975) grad_norm 3.1241 (inf) loss_scale 64.0000 (107.7285) mem 9655MB [2024-08-04 10:46:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][230/625] eta 0:01:45 lr 0.000042 wd 0.0500 time 0.2546 (0.2671) data time 0.0007 (0.0032) model time 0.2539 (0.2645) loss 6.6800 (5.4899) grad_norm 2.0054 (inf) loss_scale 64.0000 (105.8355) mem 9655MB [2024-08-04 10:46:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][240/625] eta 0:01:42 lr 0.000042 wd 0.0500 time 0.2533 (0.2666) data time 0.0007 (0.0031) model time 0.2526 (0.2640) loss 5.4850 (5.4872) grad_norm 1.9418 (inf) loss_scale 64.0000 (104.0996) mem 9655MB [2024-08-04 10:47:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][250/625] eta 0:01:39 lr 0.000041 wd 0.0500 time 0.2560 (0.2662) data time 0.0011 (0.0030) model time 0.2549 (0.2636) loss 4.4134 (5.4784) 
grad_norm 2.9740 (inf) loss_scale 64.0000 (102.5020) mem 9655MB [2024-08-04 10:47:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][260/625] eta 0:01:37 lr 0.000041 wd 0.0500 time 0.2592 (0.2658) data time 0.0008 (0.0029) model time 0.2585 (0.2632) loss 6.0263 (5.4831) grad_norm 2.5689 (inf) loss_scale 64.0000 (101.0268) mem 9655MB [2024-08-04 10:47:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][270/625] eta 0:01:34 lr 0.000041 wd 0.0500 time 0.2574 (0.2659) data time 0.0006 (0.0029) model time 0.2568 (0.2634) loss 5.9495 (5.4928) grad_norm 2.6828 (inf) loss_scale 64.0000 (99.6605) mem 9655MB [2024-08-04 10:47:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][280/625] eta 0:01:31 lr 0.000041 wd 0.0500 time 0.2574 (0.2656) data time 0.0009 (0.0028) model time 0.2565 (0.2631) loss 5.8838 (5.4941) grad_norm 3.9478 (inf) loss_scale 64.0000 (98.3915) mem 9655MB [2024-08-04 10:47:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][290/625] eta 0:01:28 lr 0.000041 wd 0.0500 time 0.2547 (0.2653) data time 0.0009 (0.0027) model time 0.2538 (0.2627) loss 4.6795 (5.4886) grad_norm 3.2037 (inf) loss_scale 64.0000 (97.2096) mem 9655MB [2024-08-04 10:47:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][300/625] eta 0:01:26 lr 0.000041 wd 0.0500 time 0.2566 (0.2650) data time 0.0006 (0.0027) model time 0.2559 (0.2624) loss 4.7116 (5.4959) grad_norm 1.8216 (inf) loss_scale 64.0000 (96.1063) mem 9655MB [2024-08-04 10:47:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][310/625] eta 0:01:23 lr 0.000041 wd 0.0500 time 0.2578 (0.2647) data time 0.0006 (0.0026) model time 0.2572 (0.2621) loss 5.0410 (5.4907) grad_norm 2.3292 (inf) loss_scale 64.0000 (95.0740) mem 9655MB [2024-08-04 10:47:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][320/625] eta 0:01:20 lr 0.000041 wd 
0.0500 time 0.2566 (0.2644) data time 0.0008 (0.0026) model time 0.2557 (0.2619) loss 4.4219 (5.4845) grad_norm 3.3860 (inf) loss_scale 64.0000 (94.1059) mem 9655MB [2024-08-04 10:47:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][330/625] eta 0:01:17 lr 0.000041 wd 0.0500 time 0.2582 (0.2642) data time 0.0008 (0.0025) model time 0.2574 (0.2617) loss 5.6835 (5.4860) grad_norm 1.7510 (inf) loss_scale 64.0000 (93.1964) mem 9655MB [2024-08-04 10:47:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][340/625] eta 0:01:15 lr 0.000041 wd 0.0500 time 0.2546 (0.2639) data time 0.0008 (0.0025) model time 0.2538 (0.2614) loss 6.0918 (5.4761) grad_norm 2.3001 (inf) loss_scale 64.0000 (92.3402) mem 9655MB [2024-08-04 10:47:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][350/625] eta 0:01:12 lr 0.000041 wd 0.0500 time 0.2531 (0.2637) data time 0.0009 (0.0024) model time 0.2522 (0.2612) loss 5.4348 (5.4695) grad_norm 6.1309 (inf) loss_scale 64.0000 (91.5328) mem 9655MB [2024-08-04 10:47:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][360/625] eta 0:01:09 lr 0.000041 wd 0.0500 time 0.2541 (0.2638) data time 0.0010 (0.0024) model time 0.2531 (0.2614) loss 5.9217 (5.4603) grad_norm 6.4766 (inf) loss_scale 64.0000 (90.7701) mem 9655MB [2024-08-04 10:47:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][370/625] eta 0:01:07 lr 0.000041 wd 0.0500 time 0.2525 (0.2636) data time 0.0008 (0.0024) model time 0.2517 (0.2612) loss 5.2224 (5.4610) grad_norm 4.6385 (inf) loss_scale 64.0000 (90.0485) mem 9655MB [2024-08-04 10:47:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][380/625] eta 0:01:04 lr 0.000041 wd 0.0500 time 0.2546 (0.2634) data time 0.0010 (0.0023) model time 0.2536 (0.2610) loss 5.1754 (5.4614) grad_norm 2.4535 (inf) loss_scale 64.0000 (89.3648) mem 9655MB [2024-08-04 10:47:37 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][390/625] eta 0:01:01 lr 0.000041 wd 0.0500 time 0.2545 (0.2637) data time 0.0007 (0.0023) model time 0.2537 (0.2614) loss 5.7593 (5.4645) grad_norm 3.1293 (inf) loss_scale 64.0000 (88.7161) mem 9655MB [2024-08-04 10:47:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][400/625] eta 0:00:59 lr 0.000041 wd 0.0500 time 0.2547 (0.2645) data time 0.0007 (0.0023) model time 0.2541 (0.2623) loss 5.3644 (5.4582) grad_norm 2.3471 (inf) loss_scale 64.0000 (88.0998) mem 9655MB [2024-08-04 10:47:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][410/625] eta 0:00:56 lr 0.000041 wd 0.0500 time 0.2528 (0.2648) data time 0.0010 (0.0022) model time 0.2518 (0.2627) loss 5.6116 (5.4498) grad_norm 3.8652 (inf) loss_scale 64.0000 (87.5134) mem 9655MB [2024-08-04 10:47:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][420/625] eta 0:00:54 lr 0.000041 wd 0.0500 time 0.2544 (0.2646) data time 0.0010 (0.0022) model time 0.2534 (0.2625) loss 6.1384 (5.4529) grad_norm 3.3256 (inf) loss_scale 64.0000 (86.9549) mem 9655MB [2024-08-04 10:47:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][430/625] eta 0:00:51 lr 0.000041 wd 0.0500 time 0.2529 (0.2644) data time 0.0010 (0.0022) model time 0.2519 (0.2623) loss 5.3926 (5.4556) grad_norm 2.9293 (inf) loss_scale 64.0000 (86.4223) mem 9655MB [2024-08-04 10:47:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][440/625] eta 0:00:48 lr 0.000041 wd 0.0500 time 0.2570 (0.2643) data time 0.0010 (0.0021) model time 0.2560 (0.2622) loss 5.9355 (5.4552) grad_norm 1.9568 (inf) loss_scale 64.0000 (85.9138) mem 9655MB [2024-08-04 10:47:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][450/625] eta 0:00:46 lr 0.000041 wd 0.0500 time 0.2541 (0.2641) data time 0.0009 (0.0021) model time 0.2532 (0.2620) loss 
5.3611 (5.4586) grad_norm 2.1076 (inf) loss_scale 64.0000 (85.4279) mem 9655MB [2024-08-04 10:47:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][460/625] eta 0:00:43 lr 0.000041 wd 0.0500 time 0.2547 (0.2642) data time 0.0007 (0.0021) model time 0.2540 (0.2622) loss 5.6869 (5.4515) grad_norm 2.0606 (inf) loss_scale 64.0000 (84.9631) mem 9655MB [2024-08-04 10:47:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][470/625] eta 0:00:40 lr 0.000041 wd 0.0500 time 0.2533 (0.2640) data time 0.0008 (0.0021) model time 0.2524 (0.2620) loss 5.7744 (5.4471) grad_norm 2.6539 (inf) loss_scale 64.0000 (84.5180) mem 9655MB [2024-08-04 10:48:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][480/625] eta 0:00:38 lr 0.000041 wd 0.0500 time 0.2651 (0.2639) data time 0.0007 (0.0020) model time 0.2644 (0.2619) loss 4.5498 (5.4453) grad_norm 2.5821 (inf) loss_scale 64.0000 (84.0915) mem 9655MB [2024-08-04 10:48:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][490/625] eta 0:00:35 lr 0.000041 wd 0.0500 time 0.2561 (0.2637) data time 0.0010 (0.0020) model time 0.2551 (0.2617) loss 6.0075 (5.4479) grad_norm 4.8644 (inf) loss_scale 64.0000 (83.6823) mem 9655MB [2024-08-04 10:48:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][500/625] eta 0:00:32 lr 0.000041 wd 0.0500 time 0.2586 (0.2638) data time 0.0006 (0.0020) model time 0.2579 (0.2618) loss 5.9451 (5.4517) grad_norm 2.0419 (inf) loss_scale 64.0000 (83.2894) mem 9655MB [2024-08-04 10:48:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][510/625] eta 0:00:30 lr 0.000041 wd 0.0500 time 0.2633 (0.2637) data time 0.0012 (0.0020) model time 0.2621 (0.2617) loss 5.5923 (5.4507) grad_norm 2.1718 (inf) loss_scale 64.0000 (82.9119) mem 9655MB [2024-08-04 10:48:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][520/625] eta 0:00:27 
lr 0.000040 wd 0.0500 time 0.2578 (0.2638) data time 0.0010 (0.0020) model time 0.2567 (0.2619) loss 5.1869 (5.4505) grad_norm 2.0701 (inf) loss_scale 64.0000 (82.5489) mem 9655MB [2024-08-04 10:48:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][530/625] eta 0:00:25 lr 0.000040 wd 0.0500 time 0.2543 (0.2637) data time 0.0009 (0.0019) model time 0.2534 (0.2618) loss 5.2563 (5.4458) grad_norm 3.0341 (inf) loss_scale 64.0000 (82.1996) mem 9655MB [2024-08-04 10:48:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][540/625] eta 0:00:22 lr 0.000040 wd 0.0500 time 0.2542 (0.2639) data time 0.0009 (0.0019) model time 0.2533 (0.2620) loss 5.1076 (5.4449) grad_norm 1.8743 (inf) loss_scale 64.0000 (81.8632) mem 9655MB [2024-08-04 10:48:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][550/625] eta 0:00:19 lr 0.000040 wd 0.0500 time 0.2557 (0.2638) data time 0.0009 (0.0019) model time 0.2548 (0.2619) loss 5.4445 (5.4423) grad_norm 2.1782 (inf) loss_scale 64.0000 (81.5390) mem 9655MB [2024-08-04 10:48:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][560/625] eta 0:00:17 lr 0.000040 wd 0.0500 time 0.2578 (0.2636) data time 0.0009 (0.0019) model time 0.2569 (0.2618) loss 5.8965 (5.4382) grad_norm 15.3570 (inf) loss_scale 64.0000 (81.2264) mem 9655MB [2024-08-04 10:48:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][570/625] eta 0:00:14 lr 0.000040 wd 0.0500 time 0.2527 (0.2635) data time 0.0007 (0.0019) model time 0.2519 (0.2616) loss 5.6393 (5.4373) grad_norm 2.9733 (inf) loss_scale 64.0000 (80.9247) mem 9655MB [2024-08-04 10:48:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][580/625] eta 0:00:11 lr 0.000040 wd 0.0500 time 0.2546 (0.2634) data time 0.0010 (0.0019) model time 0.2536 (0.2615) loss 5.4187 (5.4359) grad_norm 1.8825 (inf) loss_scale 64.0000 (80.6334) mem 9655MB [2024-08-04 
10:48:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][590/625] eta 0:00:09 lr 0.000040 wd 0.0500 time 0.2532 (0.2633) data time 0.0011 (0.0018) model time 0.2521 (0.2614) loss 5.9244 (5.4351) grad_norm 2.5846 (inf) loss_scale 64.0000 (80.3519) mem 9655MB [2024-08-04 10:48:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][600/625] eta 0:00:06 lr 0.000040 wd 0.0500 time 0.2577 (0.2631) data time 0.0009 (0.0018) model time 0.2568 (0.2613) loss 4.7613 (5.4380) grad_norm 2.2598 (inf) loss_scale 64.0000 (80.0799) mem 9655MB [2024-08-04 10:48:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][610/625] eta 0:00:03 lr 0.000040 wd 0.0500 time 0.2524 (0.2630) data time 0.0004 (0.0018) model time 0.2520 (0.2612) loss 5.7153 (5.4435) grad_norm 5.4123 (inf) loss_scale 64.0000 (79.8167) mem 9655MB [2024-08-04 10:48:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][620/625] eta 0:00:01 lr 0.000040 wd 0.0500 time 0.2536 (0.2628) data time 0.0005 (0.0018) model time 0.2531 (0.2610) loss 5.1446 (5.4511) grad_norm 1.6578 (inf) loss_scale 64.0000 (79.5620) mem 9655MB [2024-08-04 10:48:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 281 training takes 0:02:44 [2024-08-04 10:48:39 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 10:48:39 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 10:48:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.468 (0.468) Loss 0.6021 (0.6021) Acc@1 90.234 (90.234) Acc@5 98.877 (98.877) Mem 9655MB [2024-08-04 10:48:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.094) Loss 0.9062 (0.7170) Acc@1 81.982 (87.163) Acc@5 96.826 (97.905) Mem 9655MB [2024-08-04 10:48:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0068 (0.8317) Acc@1 78.662 (84.233) Acc@5 95.508 (96.777) Mem 9655MB [2024-08-04 10:48:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.921 Acc@5 96.781 [2024-08-04 10:48:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-04 10:48:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.92% [2024-08-04 10:48:41 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 10:48:42 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 10:48:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.516 (0.516) Loss 0.5884 (0.5884) Acc@1 90.234 (90.234) Acc@5 98.730 (98.730) Mem 9655MB [2024-08-04 10:48:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 0.8940 (0.7063) Acc@1 81.885 (87.154) Acc@5 96.484 (97.807) Mem 9655MB [2024-08-04 10:48:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 1.0020 (0.8241) Acc@1 79.102 (84.129) Acc@5 95.508 (96.696) Mem 9655MB [2024-08-04 10:48:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.777 Acc@5 96.721 [2024-08-04 10:48:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-04 10:48:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.78% [2024-08-04 10:48:43 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 10:48:44 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 10:48:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][0/625] eta 0:06:58 lr 0.000040 wd 0.0500 time 0.6695 (0.6695) data time 0.4278 (0.4278) model time 0.0000 (0.0000) loss 5.4247 (5.4247) grad_norm 4.7284 (4.7284) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:48:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][10/625] eta 0:03:00 lr 0.000040 wd 0.0500 time 0.2532 (0.2939) data time 0.0008 (0.0398) model time 0.0000 (0.0000) loss 5.9225 (5.5596) grad_norm 2.1185 (2.7588) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:48:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][20/625] eta 0:02:52 lr 0.000040 wd 0.0500 time 0.2579 (0.2844) data time 0.0007 (0.0213) model time 0.0000 (0.0000) loss 5.2485 (5.6672) grad_norm 1.8350 (2.8052) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:48:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][30/625] eta 0:02:47 lr 0.000040 wd 0.0500 time 0.2606 (0.2816) data time 0.0006 (0.0147) model time 0.0000 (0.0000) loss 4.4057 (5.6253) grad_norm 2.7028 (2.7957) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:48:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][40/625] eta 0:02:42 lr 0.000040 wd 0.0500 time 0.2564 (0.2777) data time 0.0009 (0.0114) model time 0.0000 (0.0000) loss 6.0781 (5.5937) grad_norm 3.6939 (3.0016) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:48:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][50/625] eta 0:02:39 lr 0.000040 wd 0.0500 time 0.2525 (0.2770) data time 0.0008 (0.0093) model time 0.0000 (0.0000) loss 4.4568 (5.5240) grad_norm 3.9565 (3.2109) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:49:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][60/625] eta 0:02:34 lr 0.000040 wd 0.0500 time 0.2518 (0.2734) data time 0.0010 (0.0079) 
model time 0.2509 (0.2543) loss 5.3613 (5.5126) grad_norm 2.0880 (3.4770) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:49:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][70/625] eta 0:02:30 lr 0.000040 wd 0.0500 time 0.2531 (0.2708) data time 0.0007 (0.0070) model time 0.2524 (0.2540) loss 6.1279 (5.5333) grad_norm 2.0724 (3.4271) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:49:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][80/625] eta 0:02:26 lr 0.000040 wd 0.0500 time 0.2552 (0.2690) data time 0.0008 (0.0062) model time 0.2544 (0.2544) loss 5.1271 (5.4959) grad_norm 3.9146 (3.3892) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:49:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][90/625] eta 0:02:23 lr 0.000040 wd 0.0500 time 0.2546 (0.2674) data time 0.0010 (0.0056) model time 0.2536 (0.2543) loss 5.3859 (5.4471) grad_norm 2.0192 (3.4018) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:49:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][100/625] eta 0:02:20 lr 0.000040 wd 0.0500 time 0.2554 (0.2681) data time 0.0008 (0.0052) model time 0.2547 (0.2581) loss 6.2192 (5.4491) grad_norm 2.4800 (3.3503) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:49:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][110/625] eta 0:02:17 lr 0.000040 wd 0.0500 time 0.2555 (0.2669) data time 0.0008 (0.0048) model time 0.2547 (0.2574) loss 5.0024 (5.4446) grad_norm 1.9649 (3.2705) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:49:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][120/625] eta 0:02:15 lr 0.000040 wd 0.0500 time 0.2543 (0.2675) data time 0.0009 (0.0045) model time 0.2534 (0.2596) loss 5.8692 (5.4398) grad_norm 2.0578 (3.1975) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:49:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 
367): INFO Train: [282/300][130/625] eta 0:02:12 lr 0.000040 wd 0.0500 time 0.2593 (0.2681) data time 0.0007 (0.0042) model time 0.2586 (0.2615) loss 4.4795 (5.4380) grad_norm 2.6628 (3.2633) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:49:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][140/625] eta 0:02:10 lr 0.000040 wd 0.0500 time 0.2534 (0.2686) data time 0.0008 (0.0039) model time 0.2526 (0.2630) loss 4.9451 (5.4103) grad_norm 2.9676 (3.2594) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:49:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][150/625] eta 0:02:07 lr 0.000040 wd 0.0500 time 0.2544 (0.2678) data time 0.0011 (0.0037) model time 0.2533 (0.2623) loss 5.6262 (5.4139) grad_norm 1.9386 (3.4062) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:49:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][160/625] eta 0:02:04 lr 0.000040 wd 0.0500 time 0.2555 (0.2677) data time 0.0007 (0.0036) model time 0.2548 (0.2625) loss 5.6908 (5.4307) grad_norm 2.2344 (3.3573) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:49:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][170/625] eta 0:02:01 lr 0.000040 wd 0.0500 time 0.2504 (0.2670) data time 0.0010 (0.0034) model time 0.2494 (0.2618) loss 4.6975 (5.4235) grad_norm 2.1985 (3.3284) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:49:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][180/625] eta 0:01:58 lr 0.000039 wd 0.0500 time 0.2564 (0.2663) data time 0.0006 (0.0033) model time 0.2558 (0.2613) loss 4.7248 (5.4370) grad_norm 2.0598 (3.2897) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:49:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][190/625] eta 0:01:55 lr 0.000039 wd 0.0500 time 0.2524 (0.2658) data time 0.0008 (0.0032) model time 0.2515 (0.2607) loss 5.1683 (5.4471) grad_norm 
3.6471 (3.2941) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:49:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][200/625] eta 0:01:53 lr 0.000039 wd 0.0500 time 0.2558 (0.2672) data time 0.0007 (0.0031) model time 0.2552 (0.2629) loss 5.8170 (5.4485) grad_norm 2.6399 (3.2956) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:49:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][210/625] eta 0:01:50 lr 0.000039 wd 0.0500 time 0.2542 (0.2673) data time 0.0009 (0.0030) model time 0.2533 (0.2632) loss 5.7856 (5.4537) grad_norm 3.7295 (3.3110) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:49:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][220/625] eta 0:01:48 lr 0.000039 wd 0.0500 time 0.2559 (0.2667) data time 0.0007 (0.0029) model time 0.2553 (0.2627) loss 5.0635 (5.4621) grad_norm 2.9481 (3.2761) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:49:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][230/625] eta 0:01:45 lr 0.000039 wd 0.0500 time 0.2522 (0.2662) data time 0.0008 (0.0028) model time 0.2514 (0.2622) loss 5.0370 (5.4535) grad_norm 1.9156 (3.2516) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:49:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][240/625] eta 0:01:42 lr 0.000039 wd 0.0500 time 0.2589 (0.2658) data time 0.0008 (0.0027) model time 0.2581 (0.2618) loss 5.4368 (5.4575) grad_norm 1.9715 (3.2120) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:49:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][250/625] eta 0:01:39 lr 0.000039 wd 0.0500 time 0.2589 (0.2654) data time 0.0007 (0.0026) model time 0.2582 (0.2615) loss 6.1963 (5.4691) grad_norm 4.0093 (3.1981) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:49:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][260/625] eta 0:01:36 lr 
0.000039 wd 0.0500 time 0.2626 (0.2657) data time 0.0008 (0.0026) model time 0.2618 (0.2620) loss 4.7050 (5.4613) grad_norm 2.2195 (3.1601) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:49:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][270/625] eta 0:01:34 lr 0.000039 wd 0.0500 time 0.2523 (0.2653) data time 0.0007 (0.0025) model time 0.2516 (0.2617) loss 4.6743 (5.4581) grad_norm 1.9529 (3.1488) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:49:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][280/625] eta 0:01:31 lr 0.000039 wd 0.0500 time 0.2564 (0.2650) data time 0.0009 (0.0025) model time 0.2556 (0.2614) loss 5.3052 (5.4662) grad_norm 2.4571 (3.1466) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:50:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][290/625] eta 0:01:28 lr 0.000039 wd 0.0500 time 0.2529 (0.2651) data time 0.0011 (0.0024) model time 0.2518 (0.2617) loss 5.5189 (5.4680) grad_norm 1.9254 (3.1337) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:50:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][300/625] eta 0:01:26 lr 0.000039 wd 0.0500 time 0.2557 (0.2648) data time 0.0008 (0.0024) model time 0.2549 (0.2614) loss 5.4305 (5.4622) grad_norm 1.8489 (3.1263) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:50:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][310/625] eta 0:01:23 lr 0.000039 wd 0.0500 time 0.2515 (0.2645) data time 0.0007 (0.0023) model time 0.2508 (0.2611) loss 4.3764 (5.4534) grad_norm 2.3300 (3.1083) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:50:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][320/625] eta 0:01:20 lr 0.000039 wd 0.0500 time 0.2566 (0.2643) data time 0.0011 (0.0023) model time 0.2555 (0.2609) loss 5.6475 (5.4583) grad_norm 2.3909 (3.0899) loss_scale 64.0000 (64.0000) mem 9655MB 
[2024-08-04 10:50:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][330/625] eta 0:01:17 lr 0.000039 wd 0.0500 time 0.2574 (0.2641) data time 0.0009 (0.0022) model time 0.2565 (0.2608) loss 4.8549 (5.4578) grad_norm 1.7927 (3.1006) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:50:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][340/625] eta 0:01:15 lr 0.000039 wd 0.0500 time 0.2562 (0.2639) data time 0.0007 (0.0022) model time 0.2555 (0.2606) loss 6.1153 (5.4604) grad_norm 2.3292 (3.1207) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:50:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][350/625] eta 0:01:12 lr 0.000039 wd 0.0500 time 0.2542 (0.2642) data time 0.0008 (0.0022) model time 0.2535 (0.2611) loss 5.5767 (5.4630) grad_norm 3.6123 (3.1131) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:50:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][360/625] eta 0:01:09 lr 0.000039 wd 0.0500 time 0.2548 (0.2640) data time 0.0010 (0.0021) model time 0.2538 (0.2609) loss 6.4332 (5.4641) grad_norm 6.4536 (3.1403) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:50:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][370/625] eta 0:01:07 lr 0.000039 wd 0.0500 time 0.2553 (0.2638) data time 0.0007 (0.0021) model time 0.2546 (0.2608) loss 6.1835 (5.4639) grad_norm 6.1603 (3.1392) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:50:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][380/625] eta 0:01:04 lr 0.000039 wd 0.0500 time 0.2534 (0.2636) data time 0.0006 (0.0021) model time 0.2527 (0.2606) loss 5.4175 (5.4569) grad_norm 2.0838 (3.1592) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:50:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][390/625] eta 0:01:02 lr 0.000039 wd 0.0500 time 0.2549 (0.2639) data time 0.0009 
(0.0020) model time 0.2540 (0.2610) loss 4.8948 (5.4585) grad_norm 6.7178 (3.1637) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:50:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][400/625] eta 0:00:59 lr 0.000039 wd 0.0500 time 0.2583 (0.2647) data time 0.0010 (0.0020) model time 0.2573 (0.2620) loss 5.5235 (5.4651) grad_norm 2.5512 (3.1505) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:50:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][410/625] eta 0:00:56 lr 0.000039 wd 0.0500 time 0.2524 (0.2645) data time 0.0010 (0.0020) model time 0.2514 (0.2618) loss 5.4382 (5.4686) grad_norm 24.9106 (3.1891) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:50:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][420/625] eta 0:00:54 lr 0.000039 wd 0.0500 time 0.2608 (0.2648) data time 0.0010 (0.0020) model time 0.2598 (0.2622) loss 5.6334 (5.4691) grad_norm 2.8731 (3.1748) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:50:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][430/625] eta 0:00:51 lr 0.000039 wd 0.0500 time 0.2538 (0.2647) data time 0.0009 (0.0019) model time 0.2530 (0.2621) loss 4.8004 (5.4596) grad_norm 1.6444 (3.1543) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:50:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][440/625] eta 0:00:48 lr 0.000039 wd 0.0500 time 0.2549 (0.2645) data time 0.0010 (0.0019) model time 0.2538 (0.2619) loss 5.4283 (5.4517) grad_norm 2.6340 (3.1480) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:50:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][450/625] eta 0:00:46 lr 0.000039 wd 0.0500 time 0.2594 (0.2654) data time 0.0006 (0.0019) model time 0.2588 (0.2630) loss 5.5699 (5.4532) grad_norm 2.3854 (3.1375) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:50:46 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [282/300][460/625] eta 0:00:43 lr 0.000039 wd 0.0500 time 0.2524 (0.2655) data time 0.0012 (0.0019) model time 0.2513 (0.2632) loss 5.4752 (5.4501) grad_norm 2.4707 (3.1203) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:50:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][470/625] eta 0:00:41 lr 0.000038 wd 0.0500 time 0.2551 (0.2658) data time 0.0008 (0.0018) model time 0.2543 (0.2635) loss 5.7908 (5.4488) grad_norm 3.8862 (3.1121) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:50:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][480/625] eta 0:00:38 lr 0.000038 wd 0.0500 time 0.2592 (0.2656) data time 0.0010 (0.0018) model time 0.2582 (0.2633) loss 5.8512 (5.4504) grad_norm 2.0769 (3.1132) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:50:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][490/625] eta 0:00:35 lr 0.000038 wd 0.0500 time 0.2514 (0.2659) data time 0.0007 (0.0018) model time 0.2507 (0.2637) loss 5.2614 (5.4480) grad_norm 1.9073 (3.1020) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:50:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][500/625] eta 0:00:33 lr 0.000038 wd 0.0500 time 0.2599 (0.2657) data time 0.0008 (0.0018) model time 0.2591 (0.2635) loss 5.8586 (5.4515) grad_norm 2.8670 (3.0956) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:51:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][510/625] eta 0:00:30 lr 0.000038 wd 0.0500 time 0.2590 (0.2656) data time 0.0008 (0.0018) model time 0.2581 (0.2634) loss 4.7152 (5.4500) grad_norm 2.8575 (3.0863) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:51:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][520/625] eta 0:00:27 lr 0.000038 wd 0.0500 time 0.2546 (0.2654) data time 0.0010 (0.0018) model time 0.2536 (0.2632) loss 6.8963 
(5.4514) grad_norm 2.1985 (3.0772) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:51:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][530/625] eta 0:00:25 lr 0.000038 wd 0.0500 time 0.2530 (0.2652) data time 0.0011 (0.0017) model time 0.2519 (0.2630) loss 6.3141 (5.4526) grad_norm 2.0589 (3.0729) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:51:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][540/625] eta 0:00:22 lr 0.000038 wd 0.0500 time 0.2589 (0.2651) data time 0.0006 (0.0017) model time 0.2583 (0.2629) loss 4.5324 (5.4493) grad_norm 1.9421 (3.0621) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:51:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][550/625] eta 0:00:19 lr 0.000038 wd 0.0500 time 0.2583 (0.2653) data time 0.0006 (0.0017) model time 0.2577 (0.2632) loss 6.2255 (5.4486) grad_norm 1.9998 (3.0470) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:51:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][560/625] eta 0:00:17 lr 0.000038 wd 0.0500 time 0.2569 (0.2651) data time 0.0009 (0.0017) model time 0.2560 (0.2630) loss 5.7270 (5.4474) grad_norm 2.1123 (3.0423) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:51:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][570/625] eta 0:00:14 lr 0.000038 wd 0.0500 time 0.2555 (0.2653) data time 0.0008 (0.0017) model time 0.2547 (0.2632) loss 5.0431 (5.4448) grad_norm 1.7759 (3.0309) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:51:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][580/625] eta 0:00:11 lr 0.000038 wd 0.0500 time 0.2553 (0.2655) data time 0.0008 (0.0017) model time 0.2545 (0.2634) loss 4.1316 (5.4441) grad_norm 2.1123 (3.0774) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:51:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: 
[282/300][590/625] eta 0:00:09 lr 0.000038 wd 0.0500 time 0.2627 (0.2653) data time 0.0007 (0.0017) model time 0.2620 (0.2633) loss 4.2370 (5.4390) grad_norm 1.6703 (3.0895) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:51:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][600/625] eta 0:00:06 lr 0.000038 wd 0.0500 time 0.2533 (0.2652) data time 0.0006 (0.0017) model time 0.2527 (0.2631) loss 6.1642 (5.4408) grad_norm 2.3762 (3.0839) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:51:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][610/625] eta 0:00:03 lr 0.000038 wd 0.0500 time 0.2527 (0.2653) data time 0.0004 (0.0016) model time 0.2522 (0.2633) loss 4.9627 (5.4368) grad_norm 2.7834 (3.0762) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:51:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][620/625] eta 0:00:01 lr 0.000038 wd 0.0500 time 0.2520 (0.2651) data time 0.0003 (0.0016) model time 0.2517 (0.2631) loss 4.6709 (5.4341) grad_norm 2.2252 (3.0719) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:51:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 282 training takes 0:02:45 [2024-08-04 10:51:30 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 10:51:30 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 10:51:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.525 (0.525) Loss 0.5991 (0.5991) Acc@1 90.674 (90.674) Acc@5 98.877 (98.877) Mem 9655MB [2024-08-04 10:51:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.100) Loss 0.9102 (0.7141) Acc@1 81.836 (87.380) Acc@5 96.484 (97.847) Mem 9655MB [2024-08-04 10:51:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.0059 (0.8296) Acc@1 78.955 (84.347) Acc@5 95.459 (96.738) Mem 9655MB [2024-08-04 10:51:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.031 Acc@5 96.773 [2024-08-04 10:51:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-08-04 10:51:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 84.03% [2024-08-04 10:51:32 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving...... [2024-08-04 10:51:33 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!! 
[2024-08-04 10:51:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.553 (0.553) Loss 0.5884 (0.5884) Acc@1 90.234 (90.234) Acc@5 98.730 (98.730) Mem 9655MB [2024-08-04 10:51:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.101) Loss 0.8931 (0.7061) Acc@1 82.031 (87.163) Acc@5 96.582 (97.812) Mem 9655MB [2024-08-04 10:51:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 1.0020 (0.8238) Acc@1 79.053 (84.126) Acc@5 95.459 (96.703) Mem 9655MB [2024-08-04 10:51:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.785 Acc@5 96.727 [2024-08-04 10:51:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-04 10:51:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.79% [2024-08-04 10:51:34 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... [2024-08-04 10:51:35 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! 
[2024-08-04 10:51:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][0/625] eta 0:07:31 lr 0.000038 wd 0.0500 time 0.7229 (0.7229) data time 0.4818 (0.4818) model time 0.0000 (0.0000) loss 5.1801 (5.1801) grad_norm 5.8077 (5.8077) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:51:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][10/625] eta 0:03:03 lr 0.000038 wd 0.0500 time 0.2534 (0.2983) data time 0.0010 (0.0447) model time 0.0000 (0.0000) loss 6.4388 (5.5410) grad_norm 2.3715 (5.4110) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:51:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][20/625] eta 0:02:51 lr 0.000038 wd 0.0500 time 0.2571 (0.2842) data time 0.0006 (0.0238) model time 0.0000 (0.0000) loss 6.0911 (5.6371) grad_norm 2.3477 (4.1633) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:51:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][30/625] eta 0:02:43 lr 0.000038 wd 0.0500 time 0.2540 (0.2747) data time 0.0008 (0.0164) model time 0.0000 (0.0000) loss 5.4590 (5.5869) grad_norm 3.8478 (3.7971) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:51:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][40/625] eta 0:02:37 lr 0.000038 wd 0.0500 time 0.2552 (0.2699) data time 0.0007 (0.0126) model time 0.0000 (0.0000) loss 4.6550 (5.5196) grad_norm 2.0995 (3.5436) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:51:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][50/625] eta 0:02:33 lr 0.000038 wd 0.0500 time 0.2538 (0.2671) data time 0.0010 (0.0103) model time 0.0000 (0.0000) loss 5.1799 (5.4745) grad_norm 1.9405 (3.4341) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:51:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][60/625] eta 0:02:29 lr 0.000038 wd 0.0500 time 0.2572 (0.2652) data time 0.0007 (0.0088) 
model time 0.2564 (0.2542) loss 5.1669 (5.4460) grad_norm 2.3478 (3.4214) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:51:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][70/625] eta 0:02:27 lr 0.000038 wd 0.0500 time 0.2531 (0.2664) data time 0.0009 (0.0077) model time 0.2522 (0.2636) loss 6.1711 (5.4900) grad_norm 3.3301 (3.2989) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:51:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][80/625] eta 0:02:24 lr 0.000038 wd 0.0500 time 0.2547 (0.2652) data time 0.0010 (0.0068) model time 0.2537 (0.2611) loss 5.0214 (5.5038) grad_norm 2.4552 (3.2619) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:51:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][90/625] eta 0:02:21 lr 0.000038 wd 0.0500 time 0.2542 (0.2641) data time 0.0009 (0.0062) model time 0.2533 (0.2593) loss 6.2368 (5.5292) grad_norm 2.6533 (3.1676) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:52:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][100/625] eta 0:02:19 lr 0.000038 wd 0.0500 time 0.2527 (0.2651) data time 0.0007 (0.0057) model time 0.2520 (0.2622) loss 6.1146 (5.5406) grad_norm 3.7386 (3.1315) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:52:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][110/625] eta 0:02:16 lr 0.000038 wd 0.0500 time 0.2566 (0.2643) data time 0.0011 (0.0053) model time 0.2555 (0.2610) loss 5.0535 (5.4818) grad_norm 2.0129 (3.1038) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:52:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][120/625] eta 0:02:13 lr 0.000038 wd 0.0500 time 0.2519 (0.2637) data time 0.0010 (0.0049) model time 0.2510 (0.2602) loss 5.6994 (5.4745) grad_norm 3.3077 (3.0948) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:52:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 
367): INFO Train: [283/300][130/625] eta 0:02:10 lr 0.000038 wd 0.0500 time 0.2577 (0.2644) data time 0.0008 (0.0046) model time 0.2570 (0.2617) loss 6.0164 (5.4724) grad_norm 2.7132 (3.0437) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:52:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][140/625] eta 0:02:07 lr 0.000037 wd 0.0500 time 0.2583 (0.2637) data time 0.0009 (0.0043) model time 0.2574 (0.2609) loss 5.6047 (5.4484) grad_norm 2.2064 (3.0628) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:52:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][150/625] eta 0:02:05 lr 0.000037 wd 0.0500 time 0.2503 (0.2632) data time 0.0010 (0.0041) model time 0.2494 (0.2602) loss 5.5303 (5.4514) grad_norm 2.9284 (3.0636) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:52:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][160/625] eta 0:02:02 lr 0.000037 wd 0.0500 time 0.2590 (0.2628) data time 0.0008 (0.0039) model time 0.2582 (0.2599) loss 5.0895 (5.4537) grad_norm 3.0766 (3.0391) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:52:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][170/625] eta 0:01:59 lr 0.000037 wd 0.0500 time 0.2497 (0.2625) data time 0.0010 (0.0037) model time 0.2487 (0.2596) loss 5.0961 (5.4476) grad_norm 2.5562 (3.0092) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:52:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][180/625] eta 0:01:56 lr 0.000037 wd 0.0500 time 0.2592 (0.2623) data time 0.0006 (0.0036) model time 0.2585 (0.2594) loss 5.8627 (5.4609) grad_norm 2.4393 (3.0525) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:52:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][190/625] eta 0:01:53 lr 0.000037 wd 0.0500 time 0.2575 (0.2619) data time 0.0008 (0.0034) model time 0.2567 (0.2591) loss 5.9716 (5.4586) grad_norm 
2.1445 (3.0462) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:52:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][200/625] eta 0:01:51 lr 0.000037 wd 0.0500 time 0.2602 (0.2617) data time 0.0009 (0.0033) model time 0.2593 (0.2589) loss 5.0280 (5.4632) grad_norm 2.4412 (3.0010) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:52:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][210/625] eta 0:01:48 lr 0.000037 wd 0.0500 time 0.2541 (0.2621) data time 0.0011 (0.0032) model time 0.2530 (0.2595) loss 6.3155 (5.4821) grad_norm 2.5705 (3.0090) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:52:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][220/625] eta 0:01:46 lr 0.000037 wd 0.0500 time 0.4353 (0.2626) data time 0.0009 (0.0031) model time 0.4345 (0.2603) loss 6.2440 (5.4684) grad_norm 1.9976 (3.0090) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:52:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][230/625] eta 0:01:43 lr 0.000037 wd 0.0500 time 0.2577 (0.2623) data time 0.0006 (0.0030) model time 0.2571 (0.2600) loss 5.1017 (5.4664) grad_norm 2.0515 (3.0593) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:52:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][240/625] eta 0:01:41 lr 0.000037 wd 0.0500 time 0.2572 (0.2629) data time 0.0008 (0.0029) model time 0.2564 (0.2608) loss 5.6582 (5.4596) grad_norm 2.4459 (3.0597) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:52:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][250/625] eta 0:01:38 lr 0.000037 wd 0.0500 time 0.2576 (0.2626) data time 0.0009 (0.0028) model time 0.2567 (0.2605) loss 4.8296 (5.4689) grad_norm 2.8279 (3.0555) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:52:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][260/625] eta 0:01:35 lr 
0.000037 wd 0.0500 time 0.2640 (0.2628) data time 0.0008 (0.0028) model time 0.2631 (0.2609) loss 5.4571 (5.4600) grad_norm 2.0589 (3.0396) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:52:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][270/625] eta 0:01:33 lr 0.000037 wd 0.0500 time 0.2553 (0.2633) data time 0.0007 (0.0027) model time 0.2545 (0.2614) loss 6.1161 (5.4615) grad_norm 3.1665 (3.0581) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:52:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][280/625] eta 0:01:31 lr 0.000037 wd 0.0500 time 0.2566 (0.2638) data time 0.0006 (0.0026) model time 0.2560 (0.2621) loss 5.5525 (5.4606) grad_norm 2.7284 (3.0591) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:52:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][290/625] eta 0:01:28 lr 0.000037 wd 0.0500 time 0.2530 (0.2635) data time 0.0008 (0.0026) model time 0.2522 (0.2618) loss 6.2257 (5.4543) grad_norm 1.9980 (3.0478) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:52:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][300/625] eta 0:01:25 lr 0.000037 wd 0.0500 time 0.2566 (0.2633) data time 0.0010 (0.0025) model time 0.2556 (0.2615) loss 5.9512 (5.4571) grad_norm 3.0636 (3.0703) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:52:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][310/625] eta 0:01:22 lr 0.000037 wd 0.0500 time 0.2650 (0.2630) data time 0.0010 (0.0025) model time 0.2641 (0.2613) loss 4.9884 (5.4689) grad_norm 2.0941 (3.1277) loss_scale 64.0000 (64.0000) mem 9655MB [2024-08-04 10:52:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][320/625] eta 0:01:20 lr 0.000037 wd 0.0500 time 0.2557 (0.2628) data time 0.0010 (0.0024) model time 0.2547 (0.2611) loss 6.0100 (5.4748) grad_norm 4.2957 (3.1271) loss_scale 64.0000 (64.0000) mem 9655MB 
[2024-08-04 10:53:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][330/625] eta 0:01:17 lr 0.000037 wd 0.0500 time 0.2618 (0.2626) data time 0.0007 (0.0024) model time 0.2611 (0.2609) loss 4.7300 (5.4751) grad_norm 4.4038 (3.1378) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:53:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][340/625] eta 0:01:14 lr 0.000037 wd 0.0500 time 0.2562 (0.2629) data time 0.0006 (0.0023) model time 0.2556 (0.2613) loss 6.2084 (5.4767) grad_norm 1.7259 (3.1281) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:53:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][350/625] eta 0:01:12 lr 0.000037 wd 0.0500 time 0.2554 (0.2633) data time 0.0007 (0.0023) model time 0.2546 (0.2617) loss 5.1340 (5.4803) grad_norm 1.9649 (3.1129) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:53:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][360/625] eta 0:01:09 lr 0.000037 wd 0.0500 time 0.2608 (0.2631) data time 0.0011 (0.0022) model time 0.2597 (0.2615) loss 6.5042 (5.4847) grad_norm 2.3657 (3.1063) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:53:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][370/625] eta 0:01:07 lr 0.000037 wd 0.0500 time 0.2508 (0.2629) data time 0.0011 (0.0022) model time 0.2497 (0.2613) loss 5.3227 (5.4871) grad_norm 1.7969 (3.0837) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:53:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][380/625] eta 0:01:04 lr 0.000037 wd 0.0500 time 0.2580 (0.2627) data time 0.0007 (0.0022) model time 0.2573 (0.2611) loss 6.3867 (5.4894) grad_norm 3.5311 (3.0932) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:53:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][390/625] eta 0:01:01 lr 0.000037 wd 0.0500 time 0.2564 (0.2628) data time 0.0008 (0.0021) model time 0.2557 (0.2613) loss 6.4017 (5.4885) grad_norm 2.4134 (3.0936) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:53:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][400/625] eta 0:00:59 lr 0.000037 wd 0.0500 time 0.2575 (0.2631) data time 0.0009 (0.0021) model time 0.2566 (0.2616) loss 5.6210 (5.4860) grad_norm 3.3172 (3.0865) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:53:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][410/625] eta 0:00:56 lr 0.000037 wd 0.0500 time 0.2510 (0.2629) data time 0.0009 (0.0021) model time 0.2501 (0.2614) loss 4.7185 (5.4844) grad_norm 1.6510 (3.0851) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:53:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][420/625] eta 0:00:53 lr 0.000037 wd 0.0500 time 0.4416 (0.2632) data time 0.0006 (0.0021) model time 0.4409 (0.2617) loss 6.1757 (5.4854) grad_norm 1.9421 (3.1048) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:53:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][430/625] eta 0:00:51 lr 0.000037 wd 0.0500 time 0.2506 (0.2630) data time 0.0007 (0.0020) model time 0.2499 (0.2615) loss 6.0082 (5.4879) grad_norm 2.4085 (3.0909) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:53:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][440/625] eta 0:00:48 lr 0.000037 wd 0.0500 time 0.2553 (0.2628) data time 0.0009 (0.0020) model time 0.2544 (0.2613) loss 5.6721 (5.4855) grad_norm 2.3839 (3.0762) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:53:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][450/625] eta 0:00:45 lr 0.000036 wd 0.0500 time 0.2538 (0.2627) data time 0.0006 (0.0020) model time 0.2532 (0.2612) loss 6.3135 (5.4810) grad_norm 3.7639 (3.0955) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:53:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][460/625] eta 0:00:43 lr 0.000036 wd 0.0500 time 0.2589 (0.2629) data time 0.0009 (0.0020) model time 0.2580 (0.2615) loss 5.5485 (5.4759) grad_norm 1.8954 (3.0845) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:53:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][470/625] eta 0:00:40 lr 0.000036 wd 0.0500 time 0.2526 (0.2628) data time 0.0008 (0.0019) model time 0.2518 (0.2613) loss 4.7257 (5.4712) grad_norm 2.2819 (3.0750) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:53:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][480/625] eta 0:00:38 lr 0.000036 wd 0.0500 time 0.2588 (0.2627) data time 0.0009 (0.0019) model time 0.2578 (0.2612) loss 6.0680 (5.4629) grad_norm 2.0450 (3.0594) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:53:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][490/625] eta 0:00:35 lr 0.000036 wd 0.0500 time 0.2567 (0.2626) data time 0.0006 (0.0019) model time 0.2561 (0.2611) loss 4.5504 (5.4596) grad_norm 2.2904 (3.0780) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:53:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][500/625] eta 0:00:32 lr 0.000036 wd 0.0500 time 0.2631 (0.2628) data time 0.0008 (0.0019) model time 0.2623 (0.2614) loss 5.3918 (5.4574) grad_norm 2.1118 (3.0732) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:53:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][510/625] eta 0:00:30 lr 0.000036 wd 0.0500 time 0.2581 (0.2629) data time 0.0006 (0.0018) model time 0.2575 (0.2616) loss 5.7803 (5.4587) grad_norm 2.6419 (3.0761) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:53:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][520/625] eta 0:00:27 lr 0.000036 wd 0.0500 time 0.2529 (0.2631) data time 0.0006 (0.0018) model time 0.2522 (0.2618) loss 5.2043 (5.4646) grad_norm 6.3368 (3.0726) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:53:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][530/625] eta 0:00:24 lr 0.000036 wd 0.0500 time 0.2579 (0.2630) data time 0.0007 (0.0018) model time 0.2572 (0.2617) loss 5.2095 (5.4672) grad_norm 1.5878 (3.0857) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:53:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][540/625] eta 0:00:22 lr 0.000036 wd 0.0500 time 0.2573 (0.2629) data time 0.0011 (0.0018) model time 0.2562 (0.2615) loss 5.2100 (5.4654) grad_norm 3.2878 (3.0842) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:54:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][550/625] eta 0:00:19 lr 0.000036 wd 0.0500 time 0.2553 (0.2627) data time 0.0009 (0.0018) model time 0.2545 (0.2614) loss 5.4974 (5.4635) grad_norm 2.1499 (3.0778) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:54:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][560/625] eta 0:00:17 lr 0.000036 wd 0.0500 time 0.2577 (0.2626) data time 0.0007 (0.0018) model time 0.2570 (0.2613) loss 5.3962 (5.4614) grad_norm 3.6467 (3.0777) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:54:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][570/625] eta 0:00:14 lr 0.000036 wd 0.0500 time 0.2592 (0.2625) data time 0.0011 (0.0018) model time 0.2582 (0.2612) loss 5.3832 (5.4616) grad_norm 2.1999 (3.0732) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:54:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][580/625] eta 0:00:11 lr 0.000036 wd 0.0500 time 0.2553 (0.2624) data time 0.0008 (0.0017) model time 0.2545 (0.2611) loss 5.3518 (5.4619) grad_norm 2.6482 (3.0693) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:54:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][590/625] eta 0:00:09 lr 0.000036 wd 0.0500 time 0.2520 (0.2623) data time 0.0007 (0.0017) model time 0.2514 (0.2610) loss 4.9536 (5.4554) grad_norm 3.9716 (3.0708) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:54:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][600/625] eta 0:00:06 lr 0.000036 wd 0.0500 time 0.2482 (0.2625) data time 0.0009 (0.0017) model time 0.2473 (0.2612) loss 6.1890 (5.4517) grad_norm 3.0957 (3.0654) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:54:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][610/625] eta 0:00:03 lr 0.000036 wd 0.0500 time 0.2531 (0.2628) data time 0.0004 (0.0017) model time 0.2527 (0.2614) loss 5.3091 (5.4541) grad_norm 3.4568 (3.0588) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:54:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][620/625] eta 0:00:01 lr 0.000036 wd 0.0500 time 0.2541 (0.2626) data time 0.0003 (0.0017) model time 0.2538 (0.2613) loss 6.4161 (5.4577) grad_norm 3.7094 (3.0867) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:54:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 283 training takes 0:02:44
[2024-08-04 10:54:19 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 10:54:19 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 10:54:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.476 (0.476) Loss 0.5942 (0.5942) Acc@1 90.625 (90.625) Acc@5 98.877 (98.877) Mem 9655MB
[2024-08-04 10:54:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.095) Loss 0.9028 (0.7090) Acc@1 81.982 (87.282) Acc@5 96.826 (97.905) Mem 9655MB
[2024-08-04 10:54:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0000 (0.8259) Acc@1 78.662 (84.217) Acc@5 95.459 (96.754) Mem 9655MB
[2024-08-04 10:54:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.905 Acc@5 96.781
[2024-08-04 10:54:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.9%
[2024-08-04 10:54:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.757 (0.757) Loss 0.5898 (0.5898) Acc@1 90.234 (90.234) Acc@5 98.779 (98.779) Mem 9655MB
[2024-08-04 10:54:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.125) Loss 0.8936 (0.7064) Acc@1 82.080 (87.185) Acc@5 96.533 (97.816) Mem 9655MB
[2024-08-04 10:54:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0020 (0.8239) Acc@1 79.150 (84.163) Acc@5 95.459 (96.708) Mem 9655MB
[2024-08-04 10:54:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.821 Acc@5 96.731
[2024-08-04 10:54:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.8%
[2024-08-04 10:54:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.82%
[2024-08-04 10:54:23 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 10:54:24 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 10:54:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][0/625] eta 0:07:07 lr 0.000036 wd 0.0500 time 0.6837 (0.6837) data time 0.4452 (0.4452) model time 0.0000 (0.0000) loss 5.8584 (5.8584) grad_norm 2.0036 (2.0036) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:54:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][10/625] eta 0:03:01 lr 0.000036 wd 0.0500 time 0.2568 (0.2949) data time 0.0008 (0.0413) model time 0.0000 (0.0000) loss 5.0000 (5.1507) grad_norm 2.0647 (3.2622) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:54:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][20/625] eta 0:02:47 lr 0.000036 wd 0.0500 time 0.2549 (0.2765) data time 0.0008 (0.0221) model time 0.0000 (0.0000) loss 4.9777 (5.1797) grad_norm 2.2880 (2.9046) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:54:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][30/625] eta 0:02:48 lr 0.000036 wd 0.0500 time 0.2554 (0.2829) data time 0.0007 (0.0152) model time 0.0000 (0.0000) loss 6.1262 (5.2505) grad_norm 5.1242 (3.4480) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:54:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][40/625] eta 0:02:41 lr 0.000036 wd 0.0500 time 0.2529 (0.2763) data time 0.0010 (0.0118) model time 0.0000 (0.0000) loss 5.2090 (5.2892) grad_norm 2.9218 (3.2370) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:54:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][50/625] eta 0:02:36 lr 0.000036 wd 0.0500 time 0.2504 (0.2723) data time 0.0008 (0.0096) model time 0.0000 (0.0000) loss 6.1298 (5.2970) grad_norm 2.4157 (3.0962) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:54:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][60/625] eta 0:02:32 lr 0.000036 wd 0.0500 time 0.2552 (0.2697) data time 0.0008 (0.0082) model time 0.2544 (0.2558) loss 5.9030 (5.3548) grad_norm 4.7515 (3.0684) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:54:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][70/625] eta 0:02:28 lr 0.000036 wd 0.0500 time 0.2569 (0.2678) data time 0.0005 (0.0072) model time 0.2564 (0.2555) loss 5.8926 (5.3800) grad_norm 2.6145 (3.5754) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:54:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][80/625] eta 0:02:27 lr 0.000036 wd 0.0500 time 0.2524 (0.2710) data time 0.0009 (0.0064) model time 0.2515 (0.2679) loss 4.9244 (5.3797) grad_norm 4.4426 (3.4966) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:54:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][90/625] eta 0:02:25 lr 0.000036 wd 0.0500 time 0.2563 (0.2715) data time 0.0008 (0.0058) model time 0.2555 (0.2695) loss 5.5168 (5.3728) grad_norm 2.0392 (3.4598) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:54:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][100/625] eta 0:02:21 lr 0.000036 wd 0.0500 time 0.2576 (0.2703) data time 0.0007 (0.0053) model time 0.2568 (0.2672) loss 4.4148 (5.3504) grad_norm 2.7576 (3.3764) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:54:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][110/625] eta 0:02:19 lr 0.000036 wd 0.0500 time 0.2544 (0.2709) data time 0.0006 (0.0049) model time 0.2538 (0.2687) loss 6.3906 (5.3866) grad_norm 1.9729 (3.3652) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:54:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][120/625] eta 0:02:16 lr 0.000036 wd 0.0500 time 0.2641 (0.2697) data time 0.0008 (0.0046) model time 0.2632 (0.2669) loss 5.7226 (5.3906) grad_norm 2.8991 (3.8344) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:54:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][130/625] eta 0:02:12 lr 0.000035 wd 0.0500 time 0.2560 (0.2687) data time 0.0009 (0.0043) model time 0.2551 (0.2655) loss 5.9599 (5.4129) grad_norm 2.8881 (3.8020) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:55:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][140/625] eta 0:02:10 lr 0.000035 wd 0.0500 time 0.2557 (0.2691) data time 0.0010 (0.0041) model time 0.2546 (0.2663) loss 5.9454 (5.4238) grad_norm 2.6536 (3.8119) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:55:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][150/625] eta 0:02:07 lr 0.000035 wd 0.0500 time 0.2563 (0.2682) data time 0.0010 (0.0039) model time 0.2553 (0.2651) loss 4.5546 (5.4183) grad_norm 2.4543 (3.7453) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:55:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][160/625] eta 0:02:04 lr 0.000035 wd 0.0500 time 0.2553 (0.2674) data time 0.0008 (0.0037) model time 0.2546 (0.2642) loss 6.0848 (5.4326) grad_norm 2.8913 (3.6756) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:55:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][170/625] eta 0:02:01 lr 0.000035 wd 0.0500 time 0.2571 (0.2667) data time 0.0008 (0.0035) model time 0.2563 (0.2635) loss 5.9585 (5.4308) grad_norm 3.0868 (3.6274) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:55:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][180/625] eta 0:01:59 lr 0.000035 wd 0.0500 time 0.4716 (0.2691) data time 0.0009 (0.0034) model time 0.4707 (0.2669) loss 6.4685 (5.4449) grad_norm 2.1619 (3.5649) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:55:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][190/625] eta 0:01:56 lr 0.000035 wd 0.0500 time 0.2574 (0.2684) data time 0.0010 (0.0032) model time 0.2564 (0.2661) loss 5.5639 (5.4325) grad_norm 2.1038 (3.4986) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:55:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][200/625] eta 0:01:53 lr 0.000035 wd 0.0500 time 0.2537 (0.2678) data time 0.0011 (0.0031) model time 0.2527 (0.2653) loss 5.1426 (5.4149) grad_norm 8.4692 (3.4930) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:55:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][210/625] eta 0:01:51 lr 0.000035 wd 0.0500 time 0.2560 (0.2678) data time 0.0009 (0.0030) model time 0.2551 (0.2654) loss 5.5943 (5.4211) grad_norm 2.8784 (3.4822) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:55:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][220/625] eta 0:01:48 lr 0.000035 wd 0.0500 time 0.2547 (0.2672) data time 0.0007 (0.0029) model time 0.2541 (0.2647) loss 5.5625 (5.4239) grad_norm 2.8567 (3.4934) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:55:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][230/625] eta 0:01:45 lr 0.000035 wd 0.0500 time 0.2578 (0.2667) data time 0.0007 (0.0028) model time 0.2571 (0.2642) loss 5.8263 (5.4198) grad_norm 2.2962 (3.4519) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:55:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][240/625] eta 0:01:42 lr 0.000035 wd 0.0500 time 0.4634 (0.2671) data time 0.0008 (0.0028) model time 0.4626 (0.2647) loss 5.3277 (5.4240) grad_norm 2.5858 (3.4269) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:55:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][250/625] eta 0:01:39 lr 0.000035 wd 0.0500 time 0.2536 (0.2666) data time 0.0008 (0.0027) model time 0.2528 (0.2643) loss 4.9095 (5.4195) grad_norm 2.0762 (3.4125) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:55:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][260/625] eta 0:01:37 lr 0.000035 wd 0.0500 time 0.2568 (0.2663) data time 0.0007 (0.0026) model time 0.2562 (0.2639) loss 5.4313 (5.4186) grad_norm 3.4068 (3.3925) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:55:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][270/625] eta 0:01:34 lr 0.000035 wd 0.0500 time 0.4285 (0.2673) data time 0.0008 (0.0026) model time 0.4277 (0.2652) loss 6.0498 (5.4249) grad_norm 3.1135 (3.3635) loss_scale 64.0000 (64.0000) mem 9655MB
[2024-08-04 10:55:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][280/625] eta 0:01:32 lr 0.000035 wd 0.0500 time 0.2535 (0.2670) data time 0.0011 (0.0025) model time 0.2524 (0.2649) loss 6.1261 (5.4225) grad_norm 8.2922 (3.3653) loss_scale 128.0000 (65.1388) mem 9655MB
[2024-08-04 10:55:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][290/625] eta 0:01:29 lr 0.000035 wd 0.0500 time 0.2539 (0.2669) data time 0.0008 (0.0024) model time 0.2531 (0.2648) loss 5.2907 (5.4243) grad_norm 2.0888 (3.3537) loss_scale 128.0000 (67.2990) mem 9655MB
[2024-08-04 10:55:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][300/625] eta 0:01:26 lr 0.000035 wd 0.0500 time 0.2575 (0.2665) data time 0.0007 (0.0024) model time 0.2568 (0.2645) loss 4.8102 (5.4157) grad_norm 2.8788 (3.3468) loss_scale 128.0000 (69.3156) mem 9655MB
[2024-08-04 10:55:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][310/625] eta 0:01:23 lr 0.000035 wd 0.0500 time 0.2607 (0.2662) data time 0.0006 (0.0023) model time 0.2601 (0.2641) loss 5.7783 (5.4111) grad_norm 3.4470 (3.3246) loss_scale 128.0000 (71.2026) mem 9655MB
[2024-08-04 10:55:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][320/625] eta 0:01:21 lr 0.000035 wd 0.0500 time 0.2581 (0.2659) data time 0.0006 (0.0023) model time 0.2575 (0.2638) loss 5.7938 (5.4049) grad_norm 1.8452 (3.2938) loss_scale 128.0000 (72.9720) mem 9655MB
[2024-08-04 10:55:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][330/625] eta 0:01:18 lr 0.000035 wd 0.0500 time 0.2576 (0.2660) data time 0.0008 (0.0023) model time 0.2568 (0.2639) loss 5.9937 (5.4047) grad_norm 8.1720 (3.3073) loss_scale 128.0000 (74.6344) mem 9655MB
[2024-08-04 10:55:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][340/625] eta 0:01:15 lr 0.000035 wd 0.0500 time 0.2529 (0.2663) data time 0.0009 (0.0022) model time 0.2520 (0.2643) loss 5.4198 (5.4080) grad_norm 4.6899 (3.2871) loss_scale 128.0000 (76.1994) mem 9655MB
[2024-08-04 10:55:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][350/625] eta 0:01:13 lr 0.000035 wd 0.0500 time 0.2566 (0.2661) data time 0.0009 (0.0022) model time 0.2557 (0.2641) loss 5.9638 (5.4059) grad_norm 2.2529 (3.2836) loss_scale 128.0000 (77.6752) mem 9655MB
[2024-08-04 10:56:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][360/625] eta 0:01:10 lr 0.000035 wd 0.0500 time 0.2561 (0.2658) data time 0.0007 (0.0021) model time 0.2554 (0.2638) loss 5.9815 (5.3983) grad_norm 1.8841 (3.2596) loss_scale 128.0000 (79.0693) mem 9655MB
[2024-08-04 10:56:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][370/625] eta 0:01:07 lr 0.000035 wd 0.0500 time 0.2565 (0.2655) data time 0.0008 (0.0021) model time 0.2558 (0.2636) loss 5.7747 (5.4014) grad_norm 1.7738 (3.2466) loss_scale 128.0000 (80.3881) mem 9655MB
[2024-08-04 10:56:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][380/625] eta 0:01:05 lr 0.000035 wd 0.0500 time 0.2584 (0.2653) data time 0.0006 (0.0021) model time 0.2578 (0.2633) loss 4.7578 (5.3975) grad_norm 2.3165 (3.2380) loss_scale 128.0000 (81.6378) mem 9655MB
[2024-08-04 10:56:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][390/625] eta 0:01:02 lr 0.000035 wd 0.0500 time 0.2530 (0.2651) data time 0.0008 (0.0020) model time 0.2522 (0.2631) loss 6.6499 (5.4103) grad_norm 2.1299 (3.2439) loss_scale 128.0000 (82.8235) mem 9655MB
[2024-08-04 10:56:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][400/625] eta 0:00:59 lr 0.000035 wd 0.0500 time 0.2546 (0.2649) data time 0.0007 (0.0020) model time 0.2539 (0.2629) loss 4.4076 (5.4142) grad_norm 4.1965 (3.2301) loss_scale 128.0000 (83.9501) mem 9655MB
[2024-08-04 10:56:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][410/625] eta 0:00:56 lr 0.000035 wd 0.0500 time 0.2537 (0.2647) data time 0.0008 (0.0020) model time 0.2528 (0.2627) loss 5.6319 (5.4157) grad_norm 3.1714 (3.2192) loss_scale 128.0000 (85.0219) mem 9655MB
[2024-08-04 10:56:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][420/625] eta 0:00:54 lr 0.000035 wd 0.0500 time 0.2573 (0.2645) data time 0.0010 (0.0020) model time 0.2563 (0.2625) loss 5.6454 (5.4187) grad_norm 2.7938 (3.2258) loss_scale 128.0000 (86.0428) mem 9655MB
[2024-08-04 10:56:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][430/625] eta 0:00:51 lr 0.000035 wd 0.0500 time 0.2529 (0.2646) data time 0.0010 (0.0019) model time 0.2520 (0.2627) loss 5.2967 (5.4192) grad_norm 2.2948 (3.2172) loss_scale 128.0000 (87.0162) mem 9655MB
[2024-08-04 10:56:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][440/625] eta 0:00:48 lr 0.000035 wd 0.0500 time 0.2609 (0.2644) data time 0.0008 (0.0019) model time 0.2601 (0.2625) loss 4.7778 (5.4188) grad_norm 3.9621 (3.2212) loss_scale 128.0000 (87.9456) mem 9655MB
[2024-08-04 10:56:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][450/625] eta 0:00:46 lr 0.000035 wd 0.0500 time 0.2530 (0.2645) data time 0.0010 (0.0019) model time 0.2520 (0.2626) loss 5.6765 (5.4230) grad_norm 5.2473 (3.2249) loss_scale 128.0000 (88.8337) mem 9655MB
[2024-08-04 10:56:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][460/625] eta 0:00:43 lr 0.000034 wd 0.0500 time 0.2534 (0.2648) data time 0.0008 (0.0019) model time 0.2526 (0.2629) loss 5.5402 (5.4240) grad_norm 1.8569 (3.2174) loss_scale 128.0000 (89.6833) mem 9655MB
[2024-08-04 10:56:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][470/625] eta 0:00:41 lr 0.000034 wd 0.0500 time 0.2560 (0.2646) data time 0.0012 (0.0019) model time 0.2548 (0.2627) loss 5.5860 (5.4291) grad_norm 2.6133 (3.2072) loss_scale 128.0000 (90.4968) mem 9655MB
[2024-08-04 10:56:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][480/625] eta 0:00:38 lr 0.000034 wd 0.0500 time 0.2631 (0.2648) data time 0.0007 (0.0018) model time 0.2624 (0.2630) loss 5.5892 (5.4261) grad_norm 1.9125 (3.1908) loss_scale 128.0000 (91.2765) mem 9655MB
[2024-08-04 10:56:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][490/625] eta 0:00:35 lr 0.000034 wd 0.0500 time 0.2516 (0.2649) data time 0.0008 (0.0018) model time 0.2508 (0.2632) loss 5.3511 (5.4272) grad_norm 1.6678 (3.1791) loss_scale 128.0000 (92.0244) mem 9655MB
[2024-08-04 10:56:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][500/625] eta 0:00:33 lr 0.000034 wd 0.0500 time 0.2528 (0.2648) data time 0.0011 (0.0018) model time 0.2517 (0.2630) loss 5.3269 (5.4311) grad_norm 1.7899 (3.2353) loss_scale 128.0000 (92.7425) mem 9655MB
[2024-08-04 10:56:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][510/625] eta 0:00:30 lr 0.000034 wd 0.0500 time 0.2588 (0.2646) data time 0.0006 (0.0018) model time 0.2582 (0.2628) loss 5.3578 (5.4267) grad_norm 2.3777 (3.3819) loss_scale 128.0000 (93.4325) mem 9655MB
[2024-08-04 10:56:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][520/625] eta 0:00:27 lr 0.000034 wd 0.0500 time 0.2551 (0.2644) data time 0.0007 (0.0018) model time 0.2544 (0.2627) loss 5.4005 (5.4294) grad_norm 1.8897 (3.3673) loss_scale 128.0000 (94.0960) mem 9655MB
[2024-08-04 10:56:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][530/625] eta 0:00:25 lr 0.000034 wd 0.0500 time 0.2562 (0.2646) data time 0.0007 (0.0017) model time 0.2555 (0.2629) loss 5.3593 (5.4288) grad_norm 7.4648 (3.3593) loss_scale 128.0000 (94.7345) mem 9655MB
[2024-08-04 10:56:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][540/625] eta 0:00:22 lr 0.000034 wd 0.0500 time 0.2525 (0.2648) data time 0.0010 (0.0017) model time 0.2515 (0.2632) loss 5.3511 (5.4227) grad_norm 2.7291 (3.3482) loss_scale 128.0000 (95.3494) mem 9655MB
[2024-08-04 10:56:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][550/625] eta 0:00:19 lr 0.000034 wd 0.0500 time 0.2565 (0.2647) data time 0.0009 (0.0017) model time 0.2556 (0.2630) loss 5.3073 (5.4278) grad_norm 2.3614 (3.3338) loss_scale 128.0000 (95.9419) mem 9655MB
[2024-08-04 10:56:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][560/625] eta 0:00:17 lr 0.000034 wd 0.0500 time 0.2613 (0.2648) data time 0.0006 (0.0017) model time 0.2607 (0.2631) loss 6.2753 (5.4316) grad_norm 3.0293 (3.3360) loss_scale 128.0000 (96.5134) mem 9655MB
[2024-08-04 10:56:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][570/625] eta 0:00:14 lr 0.000034 wd 0.0500 time 0.2533 (0.2646) data time 0.0009 (0.0017) model time 0.2524 (0.2629) loss 5.4105 (5.4276) grad_norm 2.0498 (3.3327) loss_scale 128.0000 (97.0648) mem 9655MB
[2024-08-04 10:56:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][580/625] eta 0:00:11 lr 0.000034 wd 0.0500 time 0.2564 (0.2644) data time 0.0005 (0.0017) model time 0.2558 (0.2628) loss 5.6923 (5.4246) grad_norm 7.4530 (3.3337) loss_scale 128.0000 (97.5972) mem 9655MB
[2024-08-04 10:57:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][590/625] eta 0:00:09 lr 0.000034 wd 0.0500 time 0.2560 (0.2643) data time 0.0007 (0.0017) model time 0.2553 (0.2626) loss 4.9822 (5.4268) grad_norm 3.5676 (3.3316) loss_scale 128.0000 (98.1117) mem 9655MB
[2024-08-04 10:57:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][600/625] eta 0:00:06 lr 0.000034 wd 0.0500 time 0.2521 (0.2642) data time 0.0007 (0.0017) model time 0.2514 (0.2625) loss 6.1896 (5.4310) grad_norm 1.7901 (3.3213) loss_scale 128.0000 (98.6090) mem 9655MB
[2024-08-04 10:57:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][610/625] eta 0:00:03 lr 0.000034 wd 0.0500 time 0.2532 (0.2640) data time 0.0004 (0.0016) model time 0.2528 (0.2624) loss 4.8513 (5.4265) grad_norm 2.0643 (3.3182) loss_scale 128.0000 (99.0900) mem 9655MB
[2024-08-04 10:57:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][620/625] eta 0:00:01 lr 0.000034 wd 0.0500 time 0.2536 (0.2639) data time 0.0006 (0.0016) model time 0.2530 (0.2622) loss 6.5121 (5.4235) grad_norm 2.2043 (3.3202) loss_scale 128.0000 (99.5556) mem 9655MB
[2024-08-04 10:57:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 284 training takes 0:02:44
[2024-08-04 10:57:09 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 10:57:09 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 10:57:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.477 (0.477) Loss 0.6050 (0.6050) Acc@1 90.674 (90.674) Acc@5 98.828 (98.828) Mem 9655MB
[2024-08-04 10:57:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 0.9048 (0.7163) Acc@1 81.543 (87.336) Acc@5 96.729 (97.900) Mem 9655MB
[2024-08-04 10:57:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0059 (0.8304) Acc@1 78.857 (84.277) Acc@5 95.654 (96.787) Mem 9655MB
[2024-08-04 10:57:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.933 Acc@5 96.801
[2024-08-04 10:57:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.9%
[2024-08-04 10:57:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.711 (0.711) Loss 0.5903 (0.5903) Acc@1 90.234 (90.234) Acc@5 98.779 (98.779) Mem 9655MB
[2024-08-04 10:57:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.123) Loss 0.8940 (0.7065) Acc@1 82.080 (87.194) Acc@5 96.631 (97.834) Mem 9655MB
[2024-08-04 10:57:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.091) Loss 1.0020 (0.8239) Acc@1 79.102 (84.156) Acc@5 95.508 (96.726) Mem 9655MB
[2024-08-04 10:57:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.809 Acc@5 96.749
[2024-08-04 10:57:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.8%
[2024-08-04 10:57:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][0/625] eta 0:11:04 lr 0.000034 wd 0.0500 time 1.0634 (1.0634) data time 0.4398 (0.4398) model time 0.0000 (0.0000) loss 5.1323 (5.1323) grad_norm 2.6892 (2.6892) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:57:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][10/625] eta 0:03:29 lr 0.000034 wd 0.0500 time 0.2541 (0.3412) data time 0.0007 (0.0409) model time 0.0000 (0.0000) loss 4.5729 (5.5325) grad_norm 3.4898 (2.6634) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:57:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][20/625] eta 0:03:01 lr 0.000034 wd 0.0500 time 0.2576 (0.3006) data time 0.0008 (0.0219) model time 0.0000 (0.0000) loss 5.2872 (5.5279) grad_norm 2.5154 (3.4286) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:57:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][30/625] eta 0:02:53 lr 0.000034 wd 0.0500 time 0.2527 (0.2918) data time 0.0007 (0.0151) model time 0.0000 (0.0000) loss 6.0689 (5.5123) grad_norm 3.0869 (4.3131) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:57:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][40/625] eta 0:02:45 lr 0.000034 wd 0.0500 time 0.2571 (0.2832) data time 0.0009 (0.0117) model time 0.0000 (0.0000) loss 5.3971 (5.4761) grad_norm 1.7438 (3.9649) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:57:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][50/625] eta 0:02:39 lr 0.000034 wd 0.0500 time 0.2516 (0.2778) data time 0.0008 (0.0095) model time 0.0000 (0.0000) loss 5.0404 (5.4572) grad_norm 4.3562 (3.8540) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:57:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][60/625] eta 0:02:36 lr 0.000034 wd 0.0500 time 0.4154 (0.2769) data time 0.0008 (0.0081) model time 0.4146 (0.2714) loss 4.9373 (5.4435) grad_norm 4.7905 (3.7686) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:57:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][70/625] eta 0:02:32 lr 0.000034 wd 0.0500 time 0.2548 (0.2740) data time 0.0011 (0.0071) model time 0.2537 (0.2632) loss 5.7328 (5.4091) grad_norm 3.6007 (4.0174) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:57:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][80/625] eta 0:02:28 lr 0.000034 wd 0.0500 time 0.2535 (0.2718) data time 0.0009 (0.0063) model time 0.2526 (0.2606) loss 5.2343 (5.3758) grad_norm 2.4405 (3.8776) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:57:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][90/625] eta 0:02:24 lr 0.000034 wd 0.0500 time 0.2572 (0.2700) data time 0.0008 (0.0057) model time 0.2563 (0.2592) loss 6.5015 (5.4087) grad_norm 3.8186 (3.8189) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:57:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][100/625] eta 0:02:22 lr 0.000034 wd 0.0500 time 0.2590 (0.2706) data time 0.0008 (0.0053) model time 0.2582 (0.2623) loss 5.6436 (5.3967) grad_norm 3.0091 (3.8129) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:57:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][110/625] eta 0:02:20 lr 0.000034 wd 0.0500 time 0.2518 (0.2719) data time 0.0007 (0.0049) model time 0.2511 (0.2660) loss 5.6041 (5.3734) grad_norm 2.1413 (3.6942) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:57:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][120/625] eta 0:02:16 lr 0.000034 wd 0.0500 time 0.2585 (0.2707) data time 0.0007 (0.0045) model time 0.2578 (0.2645) loss 5.1458 (5.3567) grad_norm 1.7744 (3.6154) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:57:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][130/625] eta 0:02:13 lr 0.000034 wd 0.0500 time 0.2612 (0.2695) data time 0.0007 (0.0043) model time 0.2604 (0.2633) loss 5.6355 (5.3490) grad_norm 2.0288 (3.5454) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:57:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][140/625] eta 0:02:10 lr 0.000034 wd 0.0500 time 0.2570 (0.2685) data time 0.0011 (0.0040) model time 0.2559 (0.2624) loss 4.8611 (5.3599) grad_norm 2.8568 (3.4855) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:57:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][150/625] eta 0:02:07 lr 0.000034 wd 0.0500 time 0.2551 (0.2690) data time 0.0007 (0.0038) model time 0.2544 (0.2636) loss 5.6563 (5.3543) grad_norm 2.2077 (3.4177) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:57:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][160/625] eta 0:02:04 lr 0.000034 wd 0.0500 time 0.2529 (0.2682) data time 0.0007 (0.0036) model time 0.2522 (0.2628) loss 6.2157 (5.3676) grad_norm 2.6305 (3.3763) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:57:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][170/625] eta 0:02:01 lr 0.000033 wd 0.0500 time 0.2599 (0.2675) data time 0.0006 (0.0035) model time 0.2594 (0.2621) loss 5.3924 (5.3718) grad_norm 4.2468 (3.3867) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:58:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][180/625] eta 0:01:59 lr 0.000033 wd 0.0500 time 0.2562 (0.2678) data time 0.0007 (0.0033) model time 0.2555 (0.2629) loss 5.9845 (5.3767) grad_norm 3.2980 (3.3921) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:58:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][190/625] eta 0:01:56 lr 0.000033 wd 0.0500 time 0.2543 (0.2681) data time 0.0010 (0.0032) model time 0.2533 (0.2637) loss 5.3905 (5.3689) grad_norm 10.1920 (3.4176) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:58:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][200/625] eta 0:01:54 lr 0.000033 wd 0.0500 time 0.2570 (0.2686) data time 0.0009 (0.0031) model time 0.2561 (0.2645) loss 4.9488 (5.3704) grad_norm 3.8821 (3.4030) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:58:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][210/625] eta 0:01:51 lr 0.000033 wd 0.0500 time 0.2591 (0.2679) data time 0.0007 (0.0030) model time 0.2585 (0.2639) loss 4.3815 (5.3611) grad_norm 7.6087 (3.4064) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:58:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][220/625] eta 0:01:48 lr 0.000033 wd 0.0500 time 0.2545 (0.2674) data time 0.0008 (0.0029) model time 0.2537 (0.2634) loss 5.8935 (5.3706) grad_norm 3.1940 (3.3812) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:58:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][230/625] eta 0:01:45 lr 0.000033 wd 0.0500 time 0.2587 (0.2669) data time 0.0006 (0.0028) model time 0.2581 (0.2629) loss 6.5494 (5.3801) grad_norm 4.4111 (3.5079) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:58:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][240/625] eta 0:01:42 lr 0.000033 wd 0.0500 time 0.2555 (0.2673) data time 0.0008 (0.0027) model time 0.2547 (0.2636) loss 4.9340 (5.3649) grad_norm 2.7557 (3.4623) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:58:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][250/625] eta 0:01:40 lr 0.000033 wd 0.0500 time 0.2553 (0.2669) data time 0.0009 (0.0027) model time 0.2545 (0.2631) loss 5.2398 (5.3602) grad_norm 2.7471 (3.4461) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:58:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][260/625] eta 0:01:37 lr 0.000033 wd 0.0500 time 0.2541 (0.2664) data time 0.0008 (0.0026) model time 0.2533 (0.2628) loss 5.4245 (5.3642) grad_norm 3.0601 (3.4307) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 10:58:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][270/625] eta
0:01:34 lr 0.000033 wd 0.0500 time 0.2516 (0.2661) data time 0.0010 (0.0025) model time 0.2506 (0.2624) loss 5.3890 (5.3667) grad_norm 2.5346 (3.3950) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:58:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][280/625] eta 0:01:31 lr 0.000033 wd 0.0500 time 0.2576 (0.2657) data time 0.0006 (0.0025) model time 0.2570 (0.2621) loss 6.1936 (5.3702) grad_norm 2.1832 (3.3731) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:58:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][290/625] eta 0:01:28 lr 0.000033 wd 0.0500 time 0.2539 (0.2653) data time 0.0011 (0.0024) model time 0.2528 (0.2618) loss 5.0982 (5.3765) grad_norm 3.2038 (3.3532) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:58:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][300/625] eta 0:01:26 lr 0.000033 wd 0.0500 time 0.2556 (0.2657) data time 0.0008 (0.0024) model time 0.2548 (0.2623) loss 4.6850 (5.3804) grad_norm 2.9600 (3.3619) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:58:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][310/625] eta 0:01:23 lr 0.000033 wd 0.0500 time 0.2544 (0.2654) data time 0.0018 (0.0023) model time 0.2526 (0.2620) loss 6.8482 (5.3883) grad_norm 2.6993 (3.3436) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:58:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][320/625] eta 0:01:20 lr 0.000033 wd 0.0500 time 0.2601 (0.2651) data time 0.0007 (0.0023) model time 0.2595 (0.2618) loss 6.2544 (5.3878) grad_norm 1.6403 (3.3220) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:58:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][330/625] eta 0:01:18 lr 0.000033 wd 0.0500 time 0.2588 (0.2649) data time 0.0008 (0.0023) model time 0.2580 (0.2616) loss 5.3678 (5.3868) grad_norm 7.0415 (3.3237) loss_scale 
128.0000 (128.0000) mem 9655MB [2024-08-04 10:58:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][340/625] eta 0:01:15 lr 0.000033 wd 0.0500 time 0.2559 (0.2647) data time 0.0010 (0.0022) model time 0.2550 (0.2614) loss 5.8338 (5.3882) grad_norm 9.8772 (3.3532) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:58:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][350/625] eta 0:01:12 lr 0.000033 wd 0.0500 time 0.2553 (0.2644) data time 0.0011 (0.0022) model time 0.2542 (0.2612) loss 6.6022 (5.3922) grad_norm 2.2791 (3.3429) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:58:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][360/625] eta 0:01:10 lr 0.000033 wd 0.0500 time 0.2565 (0.2648) data time 0.0008 (0.0021) model time 0.2557 (0.2617) loss 5.8547 (5.4032) grad_norm 2.1237 (3.3591) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:58:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][370/625] eta 0:01:07 lr 0.000033 wd 0.0500 time 0.2692 (0.2645) data time 0.0014 (0.0021) model time 0.2678 (0.2615) loss 4.6423 (5.4070) grad_norm 1.9472 (3.3526) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:58:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][380/625] eta 0:01:05 lr 0.000033 wd 0.0500 time 0.2543 (0.2657) data time 0.0006 (0.0021) model time 0.2537 (0.2629) loss 4.9753 (5.4093) grad_norm 2.7010 (3.3974) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:58:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][390/625] eta 0:01:02 lr 0.000033 wd 0.0500 time 0.2589 (0.2655) data time 0.0006 (0.0020) model time 0.2582 (0.2627) loss 6.4695 (5.4150) grad_norm 2.5481 (3.3855) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:59:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][400/625] eta 0:00:59 lr 0.000033 wd 
0.0500 time 0.2538 (0.2653) data time 0.0009 (0.0020) model time 0.2529 (0.2625) loss 4.2988 (5.4173) grad_norm 1.8009 (3.3624) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:59:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][410/625] eta 0:00:56 lr 0.000033 wd 0.0500 time 0.2598 (0.2650) data time 0.0008 (0.0020) model time 0.2589 (0.2623) loss 5.4295 (5.4218) grad_norm 2.8526 (3.3700) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:59:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][420/625] eta 0:00:54 lr 0.000033 wd 0.0500 time 0.2589 (0.2648) data time 0.0006 (0.0020) model time 0.2583 (0.2621) loss 5.2832 (5.4260) grad_norm 2.2130 (3.3834) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:59:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][430/625] eta 0:00:51 lr 0.000033 wd 0.0500 time 0.2535 (0.2646) data time 0.0008 (0.0019) model time 0.2527 (0.2619) loss 6.0098 (5.4242) grad_norm 3.8036 (3.3663) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:59:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][440/625] eta 0:00:49 lr 0.000033 wd 0.0500 time 0.2575 (0.2649) data time 0.0009 (0.0019) model time 0.2566 (0.2623) loss 5.1891 (5.4265) grad_norm 5.4087 (3.3536) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:59:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][450/625] eta 0:00:46 lr 0.000033 wd 0.0500 time 0.2587 (0.2651) data time 0.0009 (0.0019) model time 0.2578 (0.2625) loss 5.4779 (5.4302) grad_norm 2.3440 (3.3426) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:59:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][460/625] eta 0:00:43 lr 0.000033 wd 0.0500 time 0.2597 (0.2649) data time 0.0007 (0.0019) model time 0.2589 (0.2623) loss 6.2557 (5.4295) grad_norm 2.2030 (3.3355) loss_scale 128.0000 (128.0000) mem 9655MB 
[2024-08-04 10:59:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][470/625] eta 0:00:41 lr 0.000033 wd 0.0500 time 0.2512 (0.2647) data time 0.0008 (0.0019) model time 0.2505 (0.2622) loss 5.2252 (5.4340) grad_norm 1.9374 (3.3153) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:59:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][480/625] eta 0:00:38 lr 0.000033 wd 0.0500 time 0.4642 (0.2650) data time 0.0010 (0.0018) model time 0.4632 (0.2625) loss 5.1293 (5.4346) grad_norm 1.5934 (3.2951) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:59:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][490/625] eta 0:00:35 lr 0.000033 wd 0.0500 time 0.2615 (0.2648) data time 0.0007 (0.0018) model time 0.2608 (0.2624) loss 5.6965 (5.4335) grad_norm 3.7397 (3.2886) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:59:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][500/625] eta 0:00:33 lr 0.000033 wd 0.0500 time 0.2592 (0.2650) data time 0.0008 (0.0018) model time 0.2584 (0.2626) loss 4.6926 (5.4340) grad_norm 2.4162 (3.2741) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:59:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][510/625] eta 0:00:30 lr 0.000033 wd 0.0500 time 0.2515 (0.2648) data time 0.0016 (0.0018) model time 0.2499 (0.2625) loss 5.7634 (5.4325) grad_norm 2.5362 (3.2624) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:59:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][520/625] eta 0:00:27 lr 0.000032 wd 0.0500 time 0.2562 (0.2649) data time 0.0010 (0.0018) model time 0.2552 (0.2626) loss 5.7729 (5.4341) grad_norm 2.0535 (3.2839) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:59:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][530/625] eta 0:00:25 lr 0.000032 wd 0.0500 time 0.2532 (0.2651) data 
time 0.0009 (0.0017) model time 0.2523 (0.2628) loss 5.2003 (5.4285) grad_norm 2.7896 (3.2670) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:59:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][540/625] eta 0:00:22 lr 0.000032 wd 0.0500 time 0.2569 (0.2653) data time 0.0007 (0.0017) model time 0.2561 (0.2631) loss 6.2134 (5.4322) grad_norm 2.9444 (3.2536) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:59:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][550/625] eta 0:00:19 lr 0.000032 wd 0.0500 time 0.2589 (0.2651) data time 0.0005 (0.0017) model time 0.2584 (0.2629) loss 4.9075 (5.4304) grad_norm 2.8174 (3.2419) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:59:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][560/625] eta 0:00:17 lr 0.000032 wd 0.0500 time 0.2571 (0.2649) data time 0.0008 (0.0017) model time 0.2564 (0.2627) loss 4.9796 (5.4267) grad_norm 2.0947 (3.2339) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:59:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][570/625] eta 0:00:14 lr 0.000032 wd 0.0500 time 0.2509 (0.2648) data time 0.0010 (0.0017) model time 0.2499 (0.2626) loss 5.0593 (5.4319) grad_norm 2.3489 (3.2231) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:59:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][580/625] eta 0:00:11 lr 0.000032 wd 0.0500 time 0.2569 (0.2647) data time 0.0009 (0.0017) model time 0.2560 (0.2625) loss 5.6207 (5.4330) grad_norm 2.1505 (3.2161) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:59:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][590/625] eta 0:00:09 lr 0.000032 wd 0.0500 time 0.2526 (0.2645) data time 0.0010 (0.0017) model time 0.2516 (0.2623) loss 5.7579 (5.4352) grad_norm 3.2015 (3.2072) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:59:52 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][600/625] eta 0:00:06 lr 0.000032 wd 0.0500 time 0.2556 (0.2647) data time 0.0017 (0.0017) model time 0.2538 (0.2625) loss 5.6071 (5.4336) grad_norm 3.2591 (3.1945) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:59:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][610/625] eta 0:00:03 lr 0.000032 wd 0.0500 time 0.2522 (0.2645) data time 0.0004 (0.0016) model time 0.2518 (0.2624) loss 4.4792 (5.4256) grad_norm 2.0936 (3.1830) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:59:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][620/625] eta 0:00:01 lr 0.000032 wd 0.0500 time 0.2526 (0.2643) data time 0.0006 (0.0016) model time 0.2520 (0.2622) loss 6.1079 (5.4264) grad_norm 2.5644 (3.1722) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 10:59:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 285 training takes 0:02:45 [2024-08-04 10:59:59 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 10:59:59 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 11:00:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.490 (0.490) Loss 0.6074 (0.6074) Acc@1 90.186 (90.186) Acc@5 98.828 (98.828) Mem 9655MB [2024-08-04 11:00:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 0.9077 (0.7203) Acc@1 81.494 (87.282) Acc@5 96.582 (97.860) Mem 9655MB [2024-08-04 11:00:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 0.9976 (0.8342) Acc@1 79.346 (84.277) Acc@5 95.654 (96.752) Mem 9655MB [2024-08-04 11:00:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.975 Acc@5 96.779 [2024-08-04 11:00:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-08-04 11:00:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.739 (0.739) Loss 0.5898 (0.5898) Acc@1 90.283 (90.283) Acc@5 98.828 (98.828) Mem 9655MB [2024-08-04 11:00:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.125) Loss 0.8955 (0.7065) Acc@1 82.080 (87.194) Acc@5 96.631 (97.829) Mem 9655MB [2024-08-04 11:00:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 1.0020 (0.8239) Acc@1 79.053 (84.156) Acc@5 95.459 (96.717) Mem 9655MB [2024-08-04 11:00:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.809 Acc@5 96.741 [2024-08-04 11:00:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-04 11:00:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][0/625] eta 0:11:23 lr 0.000032 wd 0.0500 time 1.0934 (1.0934) data time 0.5369 (0.5369) model time 0.0000 (0.0000) loss 6.2442 (6.2442) grad_norm 1.8537 (1.8537) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:00:07 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [286/300][10/625] eta 0:03:24 lr 0.000032 wd 0.0500 time 0.2534 (0.3325) data time 0.0008 (0.0497) model time 0.0000 (0.0000) loss 6.4405 (5.5299) grad_norm 3.3022 (2.8434) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:00:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][20/625] eta 0:03:04 lr 0.000032 wd 0.0500 time 0.2541 (0.3056) data time 0.0010 (0.0265) model time 0.0000 (0.0000) loss 5.7687 (5.3981) grad_norm 3.2202 (2.9318) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:00:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][30/625] eta 0:02:52 lr 0.000032 wd 0.0500 time 0.2562 (0.2897) data time 0.0010 (0.0183) model time 0.0000 (0.0000) loss 5.0246 (5.2874) grad_norm 2.5555 (2.7948) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:00:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][40/625] eta 0:02:46 lr 0.000032 wd 0.0500 time 0.2610 (0.2845) data time 0.0006 (0.0140) model time 0.0000 (0.0000) loss 4.9562 (5.2852) grad_norm 3.1077 (2.7578) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:00:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][50/625] eta 0:02:40 lr 0.000032 wd 0.0500 time 0.2548 (0.2789) data time 0.0008 (0.0115) model time 0.0000 (0.0000) loss 5.0920 (5.2307) grad_norm 2.1196 (2.7855) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:00:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][60/625] eta 0:02:35 lr 0.000032 wd 0.0500 time 0.2531 (0.2757) data time 0.0007 (0.0097) model time 0.2523 (0.2586) loss 4.4262 (5.2406) grad_norm 1.9080 (2.6909) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:00:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][70/625] eta 0:02:31 lr 0.000032 wd 0.0500 time 0.2606 (0.2729) data time 0.0008 (0.0085) model time 0.2598 (0.2566) loss 
4.6094 (5.2728) grad_norm 2.5877 (2.7668) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:00:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][80/625] eta 0:02:27 lr 0.000032 wd 0.0500 time 0.2549 (0.2708) data time 0.0010 (0.0076) model time 0.2539 (0.2560) loss 4.9005 (5.2925) grad_norm 2.1190 (2.7335) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:00:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][90/625] eta 0:02:24 lr 0.000032 wd 0.0500 time 0.2565 (0.2693) data time 0.0009 (0.0069) model time 0.2556 (0.2561) loss 5.0395 (5.3202) grad_norm 1.8959 (2.7076) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:00:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][100/625] eta 0:02:21 lr 0.000032 wd 0.0500 time 0.2517 (0.2697) data time 0.0008 (0.0063) model time 0.2508 (0.2593) loss 6.4761 (5.3331) grad_norm 2.0826 (2.8426) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:00:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][110/625] eta 0:02:19 lr 0.000032 wd 0.0500 time 0.2641 (0.2700) data time 0.0007 (0.0058) model time 0.2634 (0.2615) loss 4.9394 (5.3370) grad_norm 2.5067 (2.9055) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:00:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][120/625] eta 0:02:15 lr 0.000032 wd 0.0500 time 0.2604 (0.2689) data time 0.0008 (0.0054) model time 0.2596 (0.2607) loss 6.8818 (5.3615) grad_norm 3.4886 (2.8857) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:00:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][130/625] eta 0:02:12 lr 0.000032 wd 0.0500 time 0.2561 (0.2679) data time 0.0010 (0.0050) model time 0.2551 (0.2600) loss 5.6855 (5.3585) grad_norm 2.3644 (2.8925) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:00:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO 
Train: [286/300][140/625] eta 0:02:09 lr 0.000032 wd 0.0500 time 0.2644 (0.2671) data time 0.0009 (0.0047) model time 0.2635 (0.2595) loss 5.4153 (5.3487) grad_norm 1.7187 (2.8600) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:00:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][150/625] eta 0:02:07 lr 0.000032 wd 0.0500 time 0.2584 (0.2677) data time 0.0013 (0.0045) model time 0.2571 (0.2610) loss 5.6375 (5.3732) grad_norm 2.8787 (2.8925) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:00:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][160/625] eta 0:02:04 lr 0.000032 wd 0.0500 time 0.2578 (0.2670) data time 0.0008 (0.0043) model time 0.2571 (0.2605) loss 5.2453 (5.3865) grad_norm 1.9347 (2.8840) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:00:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][170/625] eta 0:02:01 lr 0.000032 wd 0.0500 time 0.2573 (0.2665) data time 0.0011 (0.0041) model time 0.2562 (0.2602) loss 6.1381 (5.3873) grad_norm 2.3512 (2.8785) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:00:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][180/625] eta 0:01:59 lr 0.000032 wd 0.0500 time 0.2549 (0.2677) data time 0.0009 (0.0039) model time 0.2540 (0.2624) loss 5.3363 (5.3913) grad_norm 6.0897 (2.8992) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:00:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][190/625] eta 0:01:56 lr 0.000032 wd 0.0500 time 0.2547 (0.2671) data time 0.0007 (0.0037) model time 0.2541 (0.2619) loss 5.1756 (5.4104) grad_norm 2.3102 (2.9050) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:00:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][200/625] eta 0:01:53 lr 0.000032 wd 0.0500 time 0.2603 (0.2665) data time 0.0008 (0.0036) model time 0.2595 (0.2614) loss 4.8511 (5.4117) grad_norm 
1.5597 (2.8957) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:00:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][210/625] eta 0:01:50 lr 0.000032 wd 0.0500 time 0.2575 (0.2662) data time 0.0006 (0.0035) model time 0.2568 (0.2612) loss 4.4705 (5.4074) grad_norm 4.0911 (2.8931) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:01:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][220/625] eta 0:01:47 lr 0.000032 wd 0.0500 time 0.2603 (0.2658) data time 0.0006 (0.0034) model time 0.2597 (0.2609) loss 4.9971 (5.3972) grad_norm 2.7915 (2.9048) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:01:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][230/625] eta 0:01:45 lr 0.000032 wd 0.0500 time 0.2515 (0.2661) data time 0.0013 (0.0032) model time 0.2503 (0.2615) loss 5.2520 (5.3980) grad_norm 3.1116 (2.8968) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:01:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][240/625] eta 0:01:42 lr 0.000032 wd 0.0500 time 0.2567 (0.2664) data time 0.0006 (0.0031) model time 0.2561 (0.2621) loss 4.1036 (5.3785) grad_norm 1.7427 (2.8916) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:01:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][250/625] eta 0:01:40 lr 0.000032 wd 0.0500 time 0.2536 (0.2670) data time 0.0008 (0.0031) model time 0.2528 (0.2631) loss 6.2570 (5.3683) grad_norm 2.5897 (2.8954) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:01:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][260/625] eta 0:01:37 lr 0.000031 wd 0.0500 time 0.2588 (0.2666) data time 0.0006 (0.0030) model time 0.2582 (0.2627) loss 4.6893 (5.3709) grad_norm 2.5987 (2.9124) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:01:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][270/625] eta 
0:01:34 lr 0.000031 wd 0.0500 time 0.2531 (0.2662) data time 0.0008 (0.0029) model time 0.2524 (0.2624) loss 5.9000 (5.3758) grad_norm 3.4362 (2.8974) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:01:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][280/625] eta 0:01:31 lr 0.000031 wd 0.0500 time 0.2569 (0.2659) data time 0.0006 (0.0028) model time 0.2562 (0.2621) loss 5.8631 (5.3732) grad_norm 8.9698 (2.9346) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:01:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][290/625] eta 0:01:28 lr 0.000031 wd 0.0500 time 0.2625 (0.2656) data time 0.0006 (0.0028) model time 0.2619 (0.2619) loss 5.0743 (5.3671) grad_norm 2.4556 (2.9232) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:01:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][300/625] eta 0:01:26 lr 0.000031 wd 0.0500 time 0.2566 (0.2657) data time 0.0011 (0.0027) model time 0.2555 (0.2622) loss 4.7403 (5.3592) grad_norm 4.1003 (2.9189) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:01:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][310/625] eta 0:01:23 lr 0.000031 wd 0.0500 time 0.2555 (0.2655) data time 0.0011 (0.0026) model time 0.2544 (0.2620) loss 5.8263 (5.3623) grad_norm 1.9371 (2.9401) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:01:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][320/625] eta 0:01:20 lr 0.000031 wd 0.0500 time 0.2532 (0.2653) data time 0.0007 (0.0026) model time 0.2525 (0.2618) loss 4.5849 (5.3621) grad_norm 2.3303 (2.9302) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:01:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][330/625] eta 0:01:18 lr 0.000031 wd 0.0500 time 0.2568 (0.2656) data time 0.0006 (0.0025) model time 0.2562 (0.2622) loss 5.0740 (5.3603) grad_norm 2.2470 (2.9223) loss_scale 
128.0000 (128.0000) mem 9655MB [2024-08-04 11:01:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][340/625] eta 0:01:15 lr 0.000031 wd 0.0500 time 0.2517 (0.2653) data time 0.0009 (0.0025) model time 0.2508 (0.2620) loss 5.6196 (5.3603) grad_norm 2.8011 (3.0046) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:01:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][350/625] eta 0:01:12 lr 0.000031 wd 0.0500 time 0.2578 (0.2650) data time 0.0010 (0.0024) model time 0.2568 (0.2617) loss 6.1907 (5.3595) grad_norm 1.6676 (2.9946) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:01:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][360/625] eta 0:01:10 lr 0.000031 wd 0.0500 time 0.2553 (0.2651) data time 0.0007 (0.0024) model time 0.2545 (0.2619) loss 4.5345 (5.3558) grad_norm 4.5327 (2.9920) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:01:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][370/625] eta 0:01:07 lr 0.000031 wd 0.0500 time 0.2523 (0.2658) data time 0.0010 (0.0024) model time 0.2513 (0.2628) loss 5.9577 (5.3524) grad_norm 5.4245 (2.9874) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:01:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][380/625] eta 0:01:05 lr 0.000031 wd 0.0500 time 0.2559 (0.2656) data time 0.0008 (0.0023) model time 0.2551 (0.2626) loss 5.1482 (5.3504) grad_norm 1.6877 (2.9946) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:01:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][390/625] eta 0:01:02 lr 0.000031 wd 0.0500 time 0.2544 (0.2658) data time 0.0009 (0.0023) model time 0.2535 (0.2629) loss 4.7088 (5.3429) grad_norm 1.9911 (2.9955) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:01:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][400/625] eta 0:00:59 lr 0.000031 wd 
0.0500 time 0.2521 (0.2660) data time 0.0010 (0.0023) model time 0.2511 (0.2632) loss 6.3541 (5.3510) grad_norm 3.0130 (2.9937) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:01:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][410/625] eta 0:00:57 lr 0.000031 wd 0.0500 time 0.2566 (0.2658) data time 0.0011 (0.0022) model time 0.2555 (0.2630) loss 5.2398 (5.3519) grad_norm 2.4661 (2.9777) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:01:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][420/625] eta 0:00:54 lr 0.000031 wd 0.0500 time 0.2567 (0.2660) data time 0.0008 (0.0022) model time 0.2560 (0.2633) loss 5.8547 (5.3543) grad_norm 1.9149 (2.9667) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:01:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][430/625] eta 0:00:51 lr 0.000031 wd 0.0500 time 0.2542 (0.2661) data time 0.0010 (0.0022) model time 0.2533 (0.2635) loss 5.8261 (5.3588) grad_norm 11.7697 (2.9766) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:02:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][440/625] eta 0:00:49 lr 0.000031 wd 0.0500 time 0.2538 (0.2659) data time 0.0008 (0.0021) model time 0.2530 (0.2633) loss 4.7101 (5.3616) grad_norm 2.8613 (2.9841) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:02:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][450/625] eta 0:00:46 lr 0.000031 wd 0.0500 time 0.2543 (0.2657) data time 0.0006 (0.0021) model time 0.2537 (0.2631) loss 6.2420 (5.3700) grad_norm 7.1395 (2.9895) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:02:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][460/625] eta 0:00:43 lr 0.000031 wd 0.0500 time 0.2554 (0.2655) data time 0.0010 (0.0021) model time 0.2544 (0.2629) loss 6.0913 (5.3809) grad_norm 2.4493 (2.9937) loss_scale 128.0000 (128.0000) mem 
9655MB
[2024-08-04 11:02:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][470/625] eta 0:00:41 lr 0.000031 wd 0.0500 time 0.2556 (0.2653) data time 0.0007 (0.0021) model time 0.2549 (0.2627) loss 5.3068 (5.3795) grad_norm 2.7588 (3.0039) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:02:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][480/625] eta 0:00:38 lr 0.000031 wd 0.0500 time 0.2573 (0.2651) data time 0.0007 (0.0021) model time 0.2567 (0.2625) loss 5.2125 (5.3763) grad_norm 1.6261 (2.9876) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:02:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][490/625] eta 0:00:35 lr 0.000031 wd 0.0500 time 0.2534 (0.2649) data time 0.0010 (0.0020) model time 0.2523 (0.2623) loss 5.1043 (5.3769) grad_norm 2.5705 (2.9777) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:02:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][500/625] eta 0:00:33 lr 0.000031 wd 0.0500 time 0.2558 (0.2648) data time 0.0007 (0.0020) model time 0.2551 (0.2622) loss 5.4961 (5.3786) grad_norm 2.5457 (2.9774) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:02:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][510/625] eta 0:00:30 lr 0.000031 wd 0.0500 time 0.2523 (0.2646) data time 0.0012 (0.0020) model time 0.2511 (0.2620) loss 5.2134 (5.3821) grad_norm 2.8785 (2.9816) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:02:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][520/625] eta 0:00:27 lr 0.000031 wd 0.0500 time 0.2565 (0.2644) data time 0.0010 (0.0020) model time 0.2555 (0.2619) loss 5.2825 (5.3835) grad_norm 2.1603 (2.9806) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:02:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][530/625] eta 0:00:25 lr 0.000031 wd 0.0500 time 0.2593 (0.2645) data time 0.0010 (0.0020) model time 0.2583 (0.2621) loss 4.9595 (5.3845) grad_norm 2.0754 (2.9834) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:02:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][540/625] eta 0:00:22 lr 0.000031 wd 0.0500 time 0.2579 (0.2644) data time 0.0006 (0.0019) model time 0.2573 (0.2619) loss 6.1958 (5.3853) grad_norm 2.6230 (3.0133) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:02:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][550/625] eta 0:00:19 lr 0.000031 wd 0.0500 time 0.2536 (0.2645) data time 0.0007 (0.0019) model time 0.2528 (0.2621) loss 4.8530 (5.3831) grad_norm 1.9552 (3.0046) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:02:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][560/625] eta 0:00:17 lr 0.000031 wd 0.0500 time 0.2593 (0.2647) data time 0.0008 (0.0019) model time 0.2586 (0.2623) loss 4.9196 (5.3853) grad_norm 1.4894 (2.9952) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:02:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][570/625] eta 0:00:14 lr 0.000031 wd 0.0500 time 0.2571 (0.2648) data time 0.0006 (0.0019) model time 0.2565 (0.2625) loss 5.8776 (5.3884) grad_norm 3.2738 (2.9900) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:02:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][580/625] eta 0:00:11 lr 0.000031 wd 0.0500 time 0.2555 (0.2649) data time 0.0013 (0.0019) model time 0.2542 (0.2626) loss 5.0925 (5.3897) grad_norm 2.8230 (2.9805) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:02:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][590/625] eta 0:00:09 lr 0.000031 wd 0.0500 time 0.2554 (0.2648) data time 0.0009 (0.0018) model time 0.2545 (0.2625) loss 5.7657 (5.3924) grad_norm 3.4765 (2.9855) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:02:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][600/625] eta 0:00:06 lr 0.000031 wd 0.0500 time 0.2549 (0.2646) data time 0.0010 (0.0018) model time 0.2539 (0.2623) loss 5.2858 (5.3951) grad_norm 1.7704 (2.9889) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:02:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][610/625] eta 0:00:03 lr 0.000031 wd 0.0500 time 0.2552 (0.2645) data time 0.0006 (0.0018) model time 0.2546 (0.2622) loss 5.1701 (5.3928) grad_norm 3.7111 (2.9920) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:02:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][620/625] eta 0:00:01 lr 0.000031 wd 0.0500 time 0.2522 (0.2643) data time 0.0004 (0.0018) model time 0.2518 (0.2620) loss 6.1405 (5.3956) grad_norm 2.5344 (2.9882) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:02:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 286 training takes 0:02:45
[2024-08-04 11:02:48 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 11:02:49 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 11:02:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.492 (0.492) Loss 0.6016 (0.6016) Acc@1 90.234 (90.234) Acc@5 98.779 (98.779) Mem 9655MB
[2024-08-04 11:02:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.096) Loss 0.9009 (0.7135) Acc@1 81.787 (87.260) Acc@5 96.777 (97.865) Mem 9655MB
[2024-08-04 11:02:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 1.0049 (0.8300) Acc@1 78.955 (84.268) Acc@5 95.801 (96.775) Mem 9655MB
[2024-08-04 11:02:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.963 Acc@5 96.783
[2024-08-04 11:02:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.0%
[2024-08-04 11:02:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.803 (0.803) Loss 0.5903 (0.5903) Acc@1 90.381 (90.381) Acc@5 98.828 (98.828) Mem 9655MB
[2024-08-04 11:02:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.130) Loss 0.8945 (0.7067) Acc@1 82.080 (87.211) Acc@5 96.582 (97.825) Mem 9655MB
[2024-08-04 11:02:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.094) Loss 1.0010 (0.8238) Acc@1 79.102 (84.166) Acc@5 95.459 (96.717) Mem 9655MB
[2024-08-04 11:02:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.821 Acc@5 96.737
[2024-08-04 11:02:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.8%
[2024-08-04 11:02:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][0/625] eta 0:11:20 lr 0.000031 wd 0.0500 time 1.0886 (1.0886) data time 0.5230 (0.5230) model time 0.0000 (0.0000) loss 5.4433 (5.4433) grad_norm 3.4728 (3.4728) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:02:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][10/625] eta 0:03:23 lr 0.000030 wd 0.0500 time 0.2570 (0.3316) data time 0.0006 (0.0484) model time 0.0000 (0.0000) loss 5.9576 (5.5353) grad_norm 3.0836 (2.5041) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:02:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][20/625] eta 0:02:59 lr 0.000030 wd 0.0500 time 0.2553 (0.2961) data time 0.0009 (0.0258) model time 0.0000 (0.0000) loss 5.5572 (5.5214) grad_norm 1.9219 (2.5062) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:03:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][30/625] eta 0:02:48 lr 0.000030 wd 0.0500 time 0.2588 (0.2836) data time 0.0006 (0.0177) model time 0.0000 (0.0000) loss 5.7214 (5.5258) grad_norm 2.2541 (2.5491) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:03:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][40/625] eta 0:02:44 lr 0.000030 wd 0.0500 time 0.2512 (0.2814) data time 0.0008 (0.0136) model time 0.0000 (0.0000) loss 4.4080 (5.5038) grad_norm 2.0154 (2.4377) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:03:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][50/625] eta 0:02:38 lr 0.000030 wd 0.0500 time 0.2532 (0.2764) data time 0.0008 (0.0111) model time 0.0000 (0.0000) loss 5.8520 (5.5077) grad_norm 1.8206 (2.4323) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:03:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][60/625] eta 0:02:35 lr 0.000030 wd 0.0500 time 0.3795 (0.2752) data time 0.0008 (0.0094) model time 0.3788 (0.2682) loss 6.1448 (5.5185) grad_norm 2.9991 (2.5130) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:03:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][70/625] eta 0:02:32 lr 0.000030 wd 0.0500 time 0.2493 (0.2749) data time 0.0011 (0.0082) model time 0.2482 (0.2703) loss 5.3746 (5.4994) grad_norm 2.5660 (2.5325) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:03:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][80/625] eta 0:02:28 lr 0.000030 wd 0.0500 time 0.2607 (0.2728) data time 0.0009 (0.0073) model time 0.2598 (0.2658) loss 5.3173 (5.5400) grad_norm 4.8301 (2.7224) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:03:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][90/625] eta 0:02:25 lr 0.000030 wd 0.0500 time 0.2542 (0.2710) data time 0.0006 (0.0066) model time 0.2535 (0.2633) loss 5.2344 (5.4854) grad_norm 2.6310 (2.7567) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:03:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][100/625] eta 0:02:22 lr 0.000030 wd 0.0500 time 0.2583 (0.2706) data time 0.0008 (0.0061) model time 0.2575 (0.2639) loss 5.0970 (5.4741) grad_norm 1.9206 (2.6923) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:03:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][110/625] eta 0:02:18 lr 0.000030 wd 0.0500 time 0.2563 (0.2692) data time 0.0009 (0.0056) model time 0.2554 (0.2622) loss 6.0596 (5.4968) grad_norm 1.6940 (2.7103) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:03:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][120/625] eta 0:02:16 lr 0.000030 wd 0.0500 time 0.2531 (0.2695) data time 0.0009 (0.0052) model time 0.2523 (0.2636) loss 4.7176 (5.5035) grad_norm 3.1212 (2.6929) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:03:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][130/625] eta 0:02:12 lr 0.000030 wd 0.0500 time 0.2588 (0.2685) data time 0.0008 (0.0049) model time 0.2580 (0.2625) loss 4.9753 (5.4847) grad_norm 3.0241 (2.6826) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:03:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][140/625] eta 0:02:10 lr 0.000030 wd 0.0500 time 0.2574 (0.2690) data time 0.0008 (0.0046) model time 0.2565 (0.2639) loss 6.2237 (5.4968) grad_norm 2.3173 (2.7545) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:03:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][150/625] eta 0:02:07 lr 0.000030 wd 0.0500 time 0.2559 (0.2681) data time 0.0005 (0.0044) model time 0.2553 (0.2630) loss 5.4566 (5.4987) grad_norm 3.4612 (2.7932) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:03:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][160/625] eta 0:02:04 lr 0.000030 wd 0.0500 time 0.2569 (0.2674) data time 0.0006 (0.0041) model time 0.2562 (0.2623) loss 4.8735 (5.4893) grad_norm 2.3084 (2.7999) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:03:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][170/625] eta 0:02:01 lr 0.000030 wd 0.0500 time 0.2543 (0.2667) data time 0.0011 (0.0040) model time 0.2532 (0.2616) loss 4.7664 (5.4879) grad_norm 1.8959 (2.7963) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:03:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][180/625] eta 0:01:58 lr 0.000030 wd 0.0500 time 0.2577 (0.2667) data time 0.0010 (0.0038) model time 0.2567 (0.2620) loss 6.1309 (5.4923) grad_norm 2.1754 (2.8496) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:03:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][190/625] eta 0:01:55 lr 0.000030 wd 0.0500 time 0.2707 (0.2662) data time 0.0015 (0.0036) model time 0.2693 (0.2616) loss 5.4974 (5.4948) grad_norm 3.0124 (2.8481) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:03:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][200/625] eta 0:01:53 lr 0.000030 wd 0.0500 time 0.2589 (0.2666) data time 0.0008 (0.0035) model time 0.2581 (0.2623) loss 4.6600 (5.5040) grad_norm 2.2859 (2.8202) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:03:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][210/625] eta 0:01:50 lr 0.000030 wd 0.0500 time 0.2715 (0.2661) data time 0.0007 (0.0034) model time 0.2708 (0.2620) loss 6.4926 (5.5013) grad_norm 7.5851 (2.8428) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:03:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][220/625] eta 0:01:47 lr 0.000030 wd 0.0500 time 0.2614 (0.2658) data time 0.0008 (0.0033) model time 0.2607 (0.2617) loss 5.1215 (5.5044) grad_norm 2.4193 (2.8495) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:03:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][230/625] eta 0:01:44 lr 0.000030 wd 0.0500 time 0.2531 (0.2654) data time 0.0007 (0.0032) model time 0.2524 (0.2613) loss 6.0329 (5.5032) grad_norm 2.9399 (2.8601) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:03:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][240/625] eta 0:01:42 lr 0.000030 wd 0.0500 time 0.2565 (0.2650) data time 0.0011 (0.0031) model time 0.2554 (0.2610) loss 5.4743 (5.5023) grad_norm 2.0280 (2.8659) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:03:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][250/625] eta 0:01:39 lr 0.000030 wd 0.0500 time 0.2540 (0.2646) data time 0.0007 (0.0030) model time 0.2533 (0.2607) loss 5.1667 (5.4944) grad_norm 1.7801 (2.8928) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:04:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][260/625] eta 0:01:36 lr 0.000030 wd 0.0500 time 0.2541 (0.2643) data time 0.0012 (0.0029) model time 0.2529 (0.2604) loss 6.3952 (5.4866) grad_norm 1.9520 (2.8683) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:04:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][270/625] eta 0:01:33 lr 0.000030 wd 0.0500 time 0.2526 (0.2640) data time 0.0007 (0.0028) model time 0.2519 (0.2602) loss 4.9899 (5.4871) grad_norm 3.2340 (2.8506) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:04:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][280/625] eta 0:01:30 lr 0.000030 wd 0.0500 time 0.2526 (0.2637) data time 0.0008 (0.0028) model time 0.2518 (0.2600) loss 5.2590 (5.4874) grad_norm 2.3210 (2.8268) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:04:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][290/625] eta 0:01:28 lr 0.000030 wd 0.0500 time 0.2554 (0.2635) data time 0.0009 (0.0027) model time 0.2546 (0.2598) loss 4.4288 (5.4822) grad_norm 2.2504 (2.8764) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:04:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][300/625] eta 0:01:25 lr 0.000030 wd 0.0500 time 0.3581 (0.2636) data time 0.0011 (0.0027) model time 0.3570 (0.2601) loss 5.2354 (5.4889) grad_norm 3.5024 (2.8667) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:04:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][310/625] eta 0:01:22 lr 0.000030 wd 0.0500 time 0.2564 (0.2634) data time 0.0007 (0.0026) model time 0.2557 (0.2599) loss 5.0096 (5.4734) grad_norm 2.2098 (2.8746) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:04:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][320/625] eta 0:01:20 lr 0.000030 wd 0.0500 time 0.2562 (0.2632) data time 0.0007 (0.0025) model time 0.2555 (0.2597) loss 4.6437 (5.4731) grad_norm 2.0078 (2.8631) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:04:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][330/625] eta 0:01:17 lr 0.000030 wd 0.0500 time 0.2546 (0.2629) data time 0.0007 (0.0025) model time 0.2539 (0.2595) loss 5.6560 (5.4658) grad_norm 2.1314 (2.8505) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:04:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][340/625] eta 0:01:15 lr 0.000030 wd 0.0500 time 0.2550 (0.2633) data time 0.0006 (0.0024) model time 0.2544 (0.2601) loss 4.7953 (5.4627) grad_norm 3.1581 (2.8514) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:04:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][350/625] eta 0:01:12 lr 0.000030 wd 0.0500 time 0.2568 (0.2637) data time 0.0008 (0.0024) model time 0.2561 (0.2606) loss 5.8288 (5.4594) grad_norm 2.1748 (2.8538) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:04:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][360/625] eta 0:01:09 lr 0.000030 wd 0.0500 time 0.2552 (0.2635) data time 0.0007 (0.0024) model time 0.2545 (0.2604) loss 5.7033 (5.4542) grad_norm 3.7393 (2.8580) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:04:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][370/625] eta 0:01:07 lr 0.000030 wd 0.0500 time 0.2635 (0.2638) data time 0.0012 (0.0023) model time 0.2623 (0.2609) loss 5.9977 (5.4550) grad_norm 2.4348 (2.8591) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:04:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][380/625] eta 0:01:04 lr 0.000030 wd 0.0500 time 0.2557 (0.2636) data time 0.0017 (0.0023) model time 0.2539 (0.2608) loss 5.6390 (5.4518) grad_norm 1.8132 (2.8548) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:04:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][390/625] eta 0:01:01 lr 0.000030 wd 0.0500 time 0.2530 (0.2635) data time 0.0009 (0.0023) model time 0.2521 (0.2606) loss 6.1333 (5.4527) grad_norm 2.3691 (2.9252) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:04:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][400/625] eta 0:00:59 lr 0.000030 wd 0.0500 time 0.2551 (0.2633) data time 0.0009 (0.0022) model time 0.2542 (0.2605) loss 5.9604 (5.4430) grad_norm 2.6522 (2.9243) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:04:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][410/625] eta 0:00:56 lr 0.000029 wd 0.0500 time 0.2563 (0.2631) data time 0.0008 (0.0022) model time 0.2555 (0.2603) loss 5.7769 (5.4413) grad_norm 2.5581 (2.9228) loss_scale 256.0000 (131.1144) mem 9655MB
[2024-08-04 11:04:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][420/625] eta 0:00:53 lr 0.000029 wd 0.0500 time 0.2541 (0.2634) data time 0.0007 (0.0022) model time 0.2534 (0.2607) loss 5.5711 (5.4418) grad_norm 1.8871 (2.9346) loss_scale 256.0000 (134.0808) mem 9655MB
[2024-08-04 11:04:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][430/625] eta 0:00:51 lr 0.000029 wd 0.0500 time 0.2558 (0.2632) data time 0.0009 (0.0021) model time 0.2549 (0.2606) loss 4.8427 (5.4440) grad_norm 2.2860 (2.9321) loss_scale 256.0000 (136.9095) mem 9655MB
[2024-08-04 11:04:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][440/625] eta 0:00:48 lr 0.000029 wd 0.0500 time 0.2588 (0.2635) data time 0.0006 (0.0021) model time 0.2582 (0.2609) loss 6.0963 (5.4486) grad_norm 2.3652 (2.9556) loss_scale 256.0000 (139.6100) mem 9655MB
[2024-08-04 11:04:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][450/625] eta 0:00:46 lr 0.000029 wd 0.0500 time 0.2567 (0.2636) data time 0.0005 (0.0021) model time 0.2562 (0.2611) loss 5.3510 (5.4457) grad_norm 2.5182 (2.9628) loss_scale 256.0000 (142.1907) mem 9655MB
[2024-08-04 11:04:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][460/625] eta 0:00:43 lr 0.000029 wd 0.0500 time 0.2519 (0.2639) data time 0.0008 (0.0020) model time 0.2511 (0.2615) loss 4.7068 (5.4426) grad_norm 2.5178 (2.9767) loss_scale 256.0000 (144.6594) mem 9655MB
[2024-08-04 11:04:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][470/625] eta 0:00:40 lr 0.000029 wd 0.0500 time 0.2562 (0.2638) data time 0.0009 (0.0020) model time 0.2553 (0.2613) loss 5.2753 (5.4392) grad_norm 2.7300 (2.9944) loss_scale 256.0000 (147.0234) mem 9655MB
[2024-08-04 11:05:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][480/625] eta 0:00:38 lr 0.000029 wd 0.0500 time 0.2570 (0.2644) data time 0.0006 (0.0020) model time 0.2564 (0.2620) loss 5.5480 (5.4396) grad_norm 2.6768 (2.9877) loss_scale 256.0000 (149.2890) mem 9655MB
[2024-08-04 11:05:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][490/625] eta 0:00:35 lr 0.000029 wd 0.0500 time 0.2542 (0.2642) data time 0.0007 (0.0020) model time 0.2535 (0.2619) loss 4.4669 (5.4393) grad_norm 3.4583 (2.9806) loss_scale 256.0000 (151.4623) mem 9655MB
[2024-08-04 11:05:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][500/625] eta 0:00:33 lr 0.000029 wd 0.0500 time 0.2533 (0.2641) data time 0.0010 (0.0020) model time 0.2523 (0.2617) loss 4.7067 (5.4391) grad_norm 2.0465 (2.9817) loss_scale 256.0000 (153.5489) mem 9655MB
[2024-08-04 11:05:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][510/625] eta 0:00:30 lr 0.000029 wd 0.0500 time 0.2533 (0.2643) data time 0.0009 (0.0019) model time 0.2524 (0.2620) loss 4.7005 (5.4386) grad_norm 4.3979 (2.9808) loss_scale 256.0000 (155.5538) mem 9655MB
[2024-08-04 11:05:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][520/625] eta 0:00:27 lr 0.000029 wd 0.0500 time 0.2562 (0.2642) data time 0.0008 (0.0019) model time 0.2553 (0.2619) loss 6.2007 (5.4423) grad_norm 2.5943 (2.9751) loss_scale 256.0000 (157.4818) mem 9655MB
[2024-08-04 11:05:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][530/625] eta 0:00:25 lr 0.000029 wd 0.0500 time 0.2561 (0.2640) data time 0.0009 (0.0019) model time 0.2552 (0.2618) loss 5.1353 (5.4420) grad_norm 2.5968 (2.9674) loss_scale 256.0000 (159.3371) mem 9655MB
[2024-08-04 11:05:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][540/625] eta 0:00:22 lr 0.000029 wd 0.0500 time 0.2624 (0.2639) data time 0.0008 (0.0019) model time 0.2617 (0.2617) loss 6.0632 (5.4421) grad_norm 2.2121 (2.9616) loss_scale 256.0000 (161.1238) mem 9655MB
[2024-08-04 11:05:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][550/625] eta 0:00:19 lr 0.000029 wd 0.0500 time 0.2504 (0.2641) data time 0.0010 (0.0019) model time 0.2495 (0.2619) loss 5.1422 (5.4408) grad_norm 1.7761 (2.9526) loss_scale 256.0000 (162.8457) mem 9655MB
[2024-08-04 11:05:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][560/625] eta 0:00:17 lr 0.000029 wd 0.0500 time 0.2550 (0.2640) data time 0.0009 (0.0019) model time 0.2541 (0.2618) loss 5.4730 (5.4396) grad_norm 6.5584 (2.9754) loss_scale 256.0000 (164.5062) mem 9655MB
[2024-08-04 11:05:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][570/625] eta 0:00:14 lr 0.000029 wd 0.0500 time 0.2513 (0.2638) data time 0.0008 (0.0018) model time 0.2505 (0.2616) loss 4.9836 (5.4394) grad_norm 2.6127 (2.9681) loss_scale 256.0000 (166.1086) mem 9655MB
[2024-08-04 11:05:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][580/625] eta 0:00:11 lr 0.000029 wd 0.0500 time 0.2580 (0.2637) data time 0.0008 (0.0018) model time 0.2572 (0.2615) loss 5.6283 (5.4366) grad_norm 2.0655 (2.9903) loss_scale 256.0000 (167.6558) mem 9655MB
[2024-08-04 11:05:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][590/625] eta 0:00:09 lr 0.000029 wd 0.0500 time 0.2577 (0.2638) data time 0.0012 (0.0018) model time 0.2565 (0.2617) loss 4.8342 (5.4392) grad_norm 2.4980 (2.9835) loss_scale 256.0000 (169.1506) mem 9655MB
[2024-08-04 11:05:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][600/625] eta 0:00:06 lr 0.000029 wd 0.0500 time 0.2585 (0.2639) data time 0.0011 (0.0018) model time 0.2574 (0.2619) loss 5.8280 (5.4418) grad_norm 2.6181 (2.9758) loss_scale 256.0000 (170.5957) mem 9655MB
[2024-08-04 11:05:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][610/625] eta 0:00:03 lr 0.000029 wd 0.0500 time 0.2534 (0.2638) data time 0.0004 (0.0018) model time 0.2530 (0.2617) loss 4.7124 (5.4372) grad_norm 3.1889 (2.9812) loss_scale 256.0000 (171.9935) mem 9655MB
[2024-08-04 11:05:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][620/625] eta 0:00:01 lr 0.000029 wd 0.0500 time 0.2530 (0.2636) data time 0.0004 (0.0018) model time 0.2527 (0.2616) loss 6.5196 (5.4389) grad_norm 2.9087 (2.9768) loss_scale 256.0000 (173.3462) mem 9655MB
[2024-08-04 11:05:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 287 training takes 0:02:44
[2024-08-04 11:05:37 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 11:05:38 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 11:05:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.502 (0.502) Loss 0.5991 (0.5991) Acc@1 90.234 (90.234) Acc@5 98.828 (98.828) Mem 9655MB
[2024-08-04 11:05:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 0.8979 (0.7109) Acc@1 81.543 (87.243) Acc@5 96.826 (97.887) Mem 9655MB
[2024-08-04 11:05:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 0.9976 (0.8249) Acc@1 78.760 (84.256) Acc@5 95.752 (96.794) Mem 9655MB
[2024-08-04 11:05:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.951 Acc@5 96.813
[2024-08-04 11:05:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.0%
[2024-08-04 11:05:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.766 (0.766) Loss 0.5913 (0.5913) Acc@1 90.381 (90.381) Acc@5 98.828 (98.828) Mem 9655MB
[2024-08-04 11:05:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.126) Loss 0.8936 (0.7067) Acc@1 82.178 (87.229) Acc@5 96.533 (97.820) Mem 9655MB
[2024-08-04 11:05:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 0.9995 (0.8235) Acc@1 78.955 (84.166) Acc@5 95.410 (96.703) Mem 9655MB
[2024-08-04 11:05:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.829 Acc@5 96.729
[2024-08-04 11:05:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.8%
[2024-08-04 11:05:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.83%
[2024-08-04 11:05:42 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 11:05:42 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 11:05:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][0/625] eta 0:07:12 lr 0.000029 wd 0.0500 time 0.6926 (0.6926) data time 0.4561 (0.4561) model time 0.0000 (0.0000) loss 4.1837 (4.1837) grad_norm 2.5768 (2.5768) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:05:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][10/625] eta 0:03:13 lr 0.000029 wd 0.0500 time 0.2549 (0.3144) data time 0.0007 (0.0423) model time 0.0000 (0.0000) loss 4.8267 (5.4888) grad_norm 2.1489 (4.0996) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:05:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][20/625] eta 0:02:57 lr 0.000029 wd 0.0500 time 0.2555 (0.2932) data time 0.0009 (0.0225) model time 0.0000 (0.0000) loss 4.5474 (5.5639) grad_norm 2.4085 (3.9008) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:05:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][30/625] eta 0:02:47 lr 0.000029 wd 0.0500 time 0.2584 (0.2817) data time 0.0008 (0.0155) model time 0.0000 (0.0000) loss 5.7479 (5.5731) grad_norm 2.3727 (3.6946) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:05:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][40/625] eta 0:02:43 lr 0.000029 wd 0.0500 time 0.2595 (0.2790) data time 0.0007 (0.0120) model time 0.0000 (0.0000) loss 5.6229 (5.4677) grad_norm 2.6199 (3.6384) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:05:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][50/625] eta 0:02:43 lr 0.000029 wd 0.0500 time 0.2562 (0.2841) data time 0.0008 (0.0098) model time 0.0000 (0.0000) loss 5.5967 (5.4508) grad_norm 2.2152 (3.4527) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:06:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][60/625] eta 0:02:37 lr 0.000029 wd 0.0500 time 0.2596 (0.2794) data time 0.0010 (0.0084) model time 0.2586 (0.2543) loss 5.8572 (5.4657) grad_norm 2.2366 (3.2749) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:06:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][70/625] eta 0:02:34 lr 0.000029 wd 0.0500 time 0.2560 (0.2787) data time 0.0006 (0.0073) model time 0.2554 (0.2639) loss 5.9294 (5.4705) grad_norm 5.1312 (3.2794) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:06:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][80/625] eta 0:02:30 lr 0.000029 wd 0.0500 time 0.2569 (0.2759) data time 0.0009 (0.0065) model time 0.2560 (0.2611) loss 5.4843 (5.4865) grad_norm 2.1509 (3.1985) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:06:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][90/625] eta 0:02:26 lr 0.000029 wd 0.0500 time 0.2564 (0.2738) data time 0.0011 (0.0059) model time 0.2553 (0.2598) loss 5.1246 (5.4837) grad_norm 1.7860 (3.1647) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:06:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][100/625] eta 0:02:22 lr 0.000029 wd 0.0500 time 0.2578 (0.2722) data time 0.0008 (0.0054) model time 0.2570 (0.2591) loss 5.0395 (5.4746) grad_norm 1.8001 (3.0937) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:06:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][110/625] eta 0:02:21 lr 0.000029 wd 0.0500 time 0.2577 (0.2740) data time 0.0008 (0.0050) model time 0.2570 (0.2644) loss 5.1853 (5.4479) grad_norm 3.1054 (3.0488) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:06:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][120/625] eta 0:02:17 lr 0.000029 wd 0.0500 time 0.2549 (0.2725) data time 0.0010 (0.0047) model time 0.2539 (0.2631) loss 5.2558 (5.4612) grad_norm 2.3295 (2.9985) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:06:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][130/625] eta 0:02:14 lr 0.000029 wd 0.0500 time 0.2554 (0.2713) data time 0.0006 (0.0044) model time 0.2548 (0.2622) loss 4.5183 (5.4585) grad_norm 2.5189 (2.9849) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:06:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][140/625] eta 0:02:11 lr 0.000029 wd 0.0500 time 0.2554 (0.2703) data time 0.0009 (0.0041) model time 0.2545 (0.2615) loss 5.9262 (5.4855) grad_norm 2.3833 (2.9457) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:06:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][150/625] eta 0:02:07 lr 0.000029 wd 0.0500 time 0.2500 (0.2694) data time 0.0008 (0.0039) model time 0.2492 (0.2609) loss 6.1131 (5.5058) grad_norm 2.2010 (2.9224) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:06:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][160/625] eta 0:02:04 lr 0.000029 wd 0.0500 time 0.2553 (0.2686) data time 0.0009 (0.0037) model time 0.2544 (0.2604) loss 4.4944 (5.4804) grad_norm 1.6418 (2.9103) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:06:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][170/625] eta 0:02:02 lr 0.000029 wd 0.0500 time 0.2548 (0.2689) data time 0.0009 (0.0036) model time 0.2539 (0.2615) loss 5.4907 (5.4772) grad_norm 2.8170 (2.9193) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:06:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][180/625] eta 0:01:59 lr 0.000029 wd 0.0500 time 0.2600 (0.2682) data time 0.0010 (0.0034) model time 0.2590 (0.2610) loss 5.3206 (5.4787) grad_norm 3.3432 (2.9422) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:06:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][190/625] eta 0:01:56 lr 0.000029 wd 0.0500 time 0.2565 (0.2675) data time 0.0006 (0.0033) model time 0.2558 (0.2605) loss 5.4882 (5.4691) grad_norm 3.1019 (2.9727) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:06:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][200/625] eta 0:01:53 lr 0.000028 wd 0.0500 time 0.2570 (0.2670) data time 0.0009 (0.0032) model time 0.2561 (0.2603) loss 4.6728 (5.4595) grad_norm 3.0087 (2.9592) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:06:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][210/625] eta 0:01:50 lr 0.000028 wd 0.0500 time 0.2557 (0.2665) data time 0.0009 (0.0031) model time 0.2548 (0.2599) loss 4.6462 (5.4495) grad_norm 3.5590 (2.9581) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:06:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][220/625] eta 0:01:47 lr 0.000028 wd 0.0500 time 0.2563 (0.2661) data time 0.0012 (0.0030) model time 0.2551 (0.2597) loss 4.7148 (5.4505) grad_norm 2.8948 (2.9533) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:06:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][230/625] eta 0:01:44 lr 0.000028 wd 0.0500 time 0.2565 (0.2656) data time 0.0010 (0.0029) model time 0.2555 (0.2595) loss 4.7986 (5.4485) grad_norm 2.1483 (2.9548) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:06:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][240/625] eta 0:01:42 lr 0.000028 wd 0.0500 time 0.2583 (0.2652) data time 0.0009 (0.0028) model time 0.2574 (0.2592) loss 5.5565 (5.4452) grad_norm 2.2348 (2.9673) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:06:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][250/625] eta 0:01:39 lr 0.000028 wd 0.0500 time 0.2547 (0.2654) data time 0.0009 (0.0028) model time 0.2538 (0.2596) loss 4.1266 (5.4345) grad_norm 2.1497 (2.9608) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:06:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][260/625] eta 0:01:37 lr 0.000028 wd 0.0500 time 0.2547 (0.2658) data time 0.0012 (0.0027) model time 0.2536 (0.2603) loss 5.0912 (5.4313) grad_norm 4.7209 (2.9549) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:06:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][270/625] eta 0:01:34 lr 0.000028 wd 0.0500 time 0.2712 (0.2662) data time 0.0008 (0.0026) model time 0.2704 (0.2610) loss 5.0661 (5.4254) grad_norm 6.5366 (2.9702) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:06:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][280/625] eta 0:01:31 lr 0.000028 wd 0.0500 time 0.2554 (0.2658) data time 0.0018 (0.0026) model time 0.2536 (0.2608) loss 5.0798 (5.4137) grad_norm 5.0861 (3.0392) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:07:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][290/625] eta 0:01:28 lr 0.000028 wd 0.0500 time 0.2583 (0.2656) data time 0.0007 (0.0025) model time 0.2577 (0.2607) loss 5.2670 (5.4069) grad_norm 3.4111 (3.0749) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:07:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][300/625] eta 0:01:26 lr 0.000028 wd 0.0500 time 0.2549 (0.2658) data time 0.0010 (0.0025) model time 0.2539 (0.2611) loss 4.9746 (5.4015) grad_norm 1.7449 (3.0718) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:07:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][310/625] eta 0:01:23 lr 0.000028 wd 0.0500 time 0.2542 (0.2655) data time 0.0008 (0.0024) model time 0.2535 (0.2609) loss 5.7299 (5.3984) grad_norm 4.8691 (3.0742) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:07:08 vssd_mesa_retrain_tiny_e300]
(main_hfai_mnodes.py 367): INFO Train: [288/300][320/625] eta 0:01:20 lr 0.000028 wd 0.0500 time 0.2573 (0.2653) data time 0.0007 (0.0024) model time 0.2566 (0.2608) loss 6.2409 (5.4033) grad_norm 3.6613 (3.0587) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:07:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][330/625] eta 0:01:18 lr 0.000028 wd 0.0500 time 0.2565 (0.2650) data time 0.0009 (0.0023) model time 0.2556 (0.2606) loss 5.1369 (5.4029) grad_norm 4.2265 (3.0671) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:07:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][340/625] eta 0:01:15 lr 0.000028 wd 0.0500 time 0.2566 (0.2648) data time 0.0011 (0.0023) model time 0.2555 (0.2604) loss 4.3912 (5.4059) grad_norm 1.6209 (3.0421) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:07:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][350/625] eta 0:01:12 lr 0.000028 wd 0.0500 time 0.2541 (0.2646) data time 0.0009 (0.0023) model time 0.2533 (0.2603) loss 5.3224 (5.3999) grad_norm 2.0264 (3.0306) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:07:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][360/625] eta 0:01:10 lr 0.000028 wd 0.0500 time 0.2592 (0.2644) data time 0.0008 (0.0022) model time 0.2584 (0.2601) loss 5.8640 (5.4047) grad_norm 3.5028 (3.0657) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:07:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][370/625] eta 0:01:07 lr 0.000028 wd 0.0500 time 0.2678 (0.2647) data time 0.0010 (0.0022) model time 0.2668 (0.2606) loss 6.2291 (5.4052) grad_norm 2.2858 (3.0787) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:07:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][380/625] eta 0:01:04 lr 0.000028 wd 0.0500 time 0.2598 (0.2645) data time 0.0008 (0.0022) model time 0.2590 (0.2605) 
loss 6.3677 (5.4179) grad_norm 3.1502 (3.0735) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:07:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][390/625] eta 0:01:02 lr 0.000028 wd 0.0500 time 0.2592 (0.2643) data time 0.0007 (0.0021) model time 0.2585 (0.2603) loss 5.7025 (5.4180) grad_norm 2.2387 (3.0825) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:07:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][400/625] eta 0:00:59 lr 0.000028 wd 0.0500 time 0.2580 (0.2645) data time 0.0009 (0.0021) model time 0.2571 (0.2607) loss 4.7072 (5.4199) grad_norm 2.0632 (3.0742) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:07:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][410/625] eta 0:00:56 lr 0.000028 wd 0.0500 time 0.2542 (0.2643) data time 0.0007 (0.0021) model time 0.2534 (0.2605) loss 4.2662 (5.4175) grad_norm 1.8450 (3.0654) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:07:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][420/625] eta 0:00:54 lr 0.000028 wd 0.0500 time 0.2550 (0.2641) data time 0.0009 (0.0020) model time 0.2541 (0.2604) loss 5.9287 (5.4093) grad_norm 2.7260 (3.0659) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:07:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][430/625] eta 0:00:51 lr 0.000028 wd 0.0500 time 0.2521 (0.2640) data time 0.0010 (0.0020) model time 0.2511 (0.2603) loss 5.4091 (5.4052) grad_norm 3.1301 (3.0646) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:07:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][440/625] eta 0:00:48 lr 0.000028 wd 0.0500 time 0.2702 (0.2643) data time 0.0010 (0.0020) model time 0.2693 (0.2608) loss 6.1833 (5.4040) grad_norm 3.2527 (3.0664) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:07:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [288/300][450/625] eta 0:00:46 lr 0.000028 wd 0.0500 time 0.2552 (0.2641) data time 0.0008 (0.0020) model time 0.2545 (0.2606) loss 4.6773 (5.4016) grad_norm 2.0368 (3.0659) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:07:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][460/625] eta 0:00:43 lr 0.000028 wd 0.0500 time 0.2569 (0.2640) data time 0.0009 (0.0019) model time 0.2560 (0.2605) loss 5.5476 (5.4061) grad_norm 3.5292 (3.0591) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:07:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][470/625] eta 0:00:40 lr 0.000028 wd 0.0500 time 0.2564 (0.2638) data time 0.0009 (0.0019) model time 0.2555 (0.2604) loss 6.0563 (5.4086) grad_norm 2.6185 (3.0511) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:07:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][480/625] eta 0:00:38 lr 0.000028 wd 0.0500 time 0.2526 (0.2636) data time 0.0010 (0.0019) model time 0.2516 (0.2602) loss 6.0872 (5.4044) grad_norm 1.9852 (3.0574) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:07:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][490/625] eta 0:00:35 lr 0.000028 wd 0.0500 time 0.2572 (0.2635) data time 0.0016 (0.0019) model time 0.2556 (0.2601) loss 5.8304 (5.4085) grad_norm 2.7636 (3.0611) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:07:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][500/625] eta 0:00:32 lr 0.000028 wd 0.0500 time 0.2538 (0.2633) data time 0.0008 (0.0019) model time 0.2530 (0.2600) loss 5.1942 (5.4064) grad_norm 2.0403 (3.0543) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:07:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][510/625] eta 0:00:30 lr 0.000028 wd 0.0500 time 0.2532 (0.2638) data time 0.0008 (0.0018) model time 0.2524 (0.2606) loss 6.2402 (5.4047) grad_norm 
1.9598 (3.0450) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:08:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][520/625] eta 0:00:27 lr 0.000028 wd 0.0500 time 0.2610 (0.2637) data time 0.0009 (0.0018) model time 0.2601 (0.2605) loss 5.3306 (5.4084) grad_norm 2.3156 (3.0422) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:08:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][530/625] eta 0:00:25 lr 0.000028 wd 0.0500 time 0.2530 (0.2639) data time 0.0009 (0.0018) model time 0.2521 (0.2608) loss 5.5990 (5.4030) grad_norm 2.6167 (3.0421) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:08:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][540/625] eta 0:00:22 lr 0.000028 wd 0.0500 time 0.2567 (0.2638) data time 0.0010 (0.0018) model time 0.2557 (0.2607) loss 5.6400 (5.4044) grad_norm 1.7190 (3.0652) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:08:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][550/625] eta 0:00:19 lr 0.000028 wd 0.0500 time 0.2533 (0.2636) data time 0.0007 (0.0018) model time 0.2526 (0.2606) loss 4.3139 (5.3984) grad_norm 4.4857 (3.0648) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:08:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][560/625] eta 0:00:17 lr 0.000028 wd 0.0500 time 0.2568 (0.2635) data time 0.0008 (0.0018) model time 0.2560 (0.2605) loss 6.0617 (5.3979) grad_norm 2.3062 (3.0581) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:08:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][570/625] eta 0:00:14 lr 0.000028 wd 0.0500 time 0.2545 (0.2634) data time 0.0008 (0.0018) model time 0.2537 (0.2604) loss 6.6085 (5.4036) grad_norm 2.0008 (3.0483) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:08:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][580/625] eta 
0:00:11 lr 0.000028 wd 0.0500 time 0.2556 (0.2632) data time 0.0007 (0.0017) model time 0.2549 (0.2603) loss 5.9268 (5.4028) grad_norm 2.5574 (3.0362) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:08:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][590/625] eta 0:00:09 lr 0.000028 wd 0.0500 time 0.2503 (0.2631) data time 0.0009 (0.0017) model time 0.2493 (0.2602) loss 5.3303 (5.4027) grad_norm 2.5756 (3.0290) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:08:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][600/625] eta 0:00:06 lr 0.000028 wd 0.0500 time 0.2609 (0.2633) data time 0.0009 (0.0017) model time 0.2600 (0.2604) loss 5.6144 (5.4007) grad_norm 5.2393 (3.0268) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:08:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][610/625] eta 0:00:03 lr 0.000028 wd 0.0500 time 0.2521 (0.2632) data time 0.0004 (0.0017) model time 0.2517 (0.2603) loss 6.3818 (5.3978) grad_norm 2.5429 (3.0236) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:08:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][620/625] eta 0:00:01 lr 0.000028 wd 0.0500 time 0.2506 (0.2633) data time 0.0007 (0.0017) model time 0.2499 (0.2605) loss 4.8082 (5.3935) grad_norm 2.8020 (3.0245) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:08:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 288 training takes 0:02:44 [2024-08-04 11:08:27 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 11:08:27 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 11:08:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.463 (0.463) Loss 0.6011 (0.6011) Acc@1 90.527 (90.527) Acc@5 98.828 (98.828) Mem 9655MB [2024-08-04 11:08:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.096) Loss 0.8989 (0.7124) Acc@1 81.787 (87.336) Acc@5 96.777 (97.865) Mem 9655MB [2024-08-04 11:08:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 0.9990 (0.8262) Acc@1 79.199 (84.284) Acc@5 95.410 (96.798) Mem 9655MB [2024-08-04 11:08:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.975 Acc@5 96.829 [2024-08-04 11:08:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-08-04 11:08:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.724 (0.724) Loss 0.5913 (0.5913) Acc@1 90.332 (90.332) Acc@5 98.828 (98.828) Mem 9655MB [2024-08-04 11:08:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.126) Loss 0.8936 (0.7068) Acc@1 82.324 (87.238) Acc@5 96.533 (97.829) Mem 9655MB [2024-08-04 11:08:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 0.9995 (0.8236) Acc@1 78.906 (84.175) Acc@5 95.459 (96.710) Mem 9655MB [2024-08-04 11:08:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.833 Acc@5 96.739 [2024-08-04 11:08:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-04 11:08:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.83% [2024-08-04 11:08:31 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 11:08:32 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 11:08:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][0/625] eta 0:07:21 lr 0.000028 wd 0.0500 time 0.7062 (0.7062) data time 0.4599 (0.4599) model time 0.0000 (0.0000) loss 5.6942 (5.6942) grad_norm 2.2066 (2.2066) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:08:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][10/625] eta 0:03:02 lr 0.000028 wd 0.0500 time 0.2553 (0.2961) data time 0.0011 (0.0427) model time 0.0000 (0.0000) loss 5.0145 (5.5355) grad_norm 1.9918 (3.0590) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:08:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][20/625] eta 0:02:47 lr 0.000027 wd 0.0500 time 0.2530 (0.2766) data time 0.0007 (0.0228) model time 0.0000 (0.0000) loss 5.5381 (5.5503) grad_norm 2.5201 (2.9340) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:08:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][30/625] eta 0:02:44 lr 0.000027 wd 0.0500 time 0.2561 (0.2761) data time 0.0007 (0.0157) model time 0.0000 (0.0000) loss 4.8829 (5.4535) grad_norm 4.9337 (3.1413) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:08:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][40/625] eta 0:02:40 lr 0.000027 wd 0.0500 time 0.2538 (0.2745) data time 0.0010 (0.0121) model time 0.0000 (0.0000) loss 5.0969 (5.4533) grad_norm 2.8357 (3.0945) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:08:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][50/625] eta 0:02:37 lr 0.000027 wd 0.0500 time 0.2526 (0.2747) data time 0.0010 (0.0099) model time 0.0000 (0.0000) loss 5.0738 (5.3990) grad_norm 3.4485 (3.1100) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 
11:08:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][60/625] eta 0:02:34 lr 0.000027 wd 0.0500 time 0.2535 (0.2734) data time 0.0007 (0.0084) model time 0.2527 (0.2658) loss 4.9937 (5.3859) grad_norm 3.9767 (3.1118) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:08:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][70/625] eta 0:02:31 lr 0.000027 wd 0.0500 time 0.2560 (0.2735) data time 0.0011 (0.0074) model time 0.2549 (0.2694) loss 5.9190 (5.3658) grad_norm 4.5312 (3.0906) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:08:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][80/625] eta 0:02:27 lr 0.000027 wd 0.0500 time 0.2576 (0.2714) data time 0.0009 (0.0066) model time 0.2568 (0.2647) loss 4.3504 (5.3474) grad_norm 2.0388 (3.0267) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:08:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][90/625] eta 0:02:24 lr 0.000027 wd 0.0500 time 0.2559 (0.2697) data time 0.0010 (0.0060) model time 0.2549 (0.2623) loss 4.9063 (5.3689) grad_norm 12.1997 (3.1682) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:08:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][100/625] eta 0:02:20 lr 0.000027 wd 0.0500 time 0.2546 (0.2685) data time 0.0008 (0.0055) model time 0.2538 (0.2612) loss 5.4070 (5.3979) grad_norm 2.9139 (3.1475) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:09:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][110/625] eta 0:02:18 lr 0.000027 wd 0.0500 time 0.2557 (0.2692) data time 0.0009 (0.0051) model time 0.2547 (0.2635) loss 5.4769 (5.4216) grad_norm 2.6514 (3.3601) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:09:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][120/625] eta 0:02:16 lr 0.000027 wd 0.0500 time 0.2595 (0.2699) data time 0.0009 
(0.0048) model time 0.2586 (0.2654) loss 6.1747 (5.4105) grad_norm 2.9997 (3.2648) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:09:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][130/625] eta 0:02:13 lr 0.000027 wd 0.0500 time 0.2675 (0.2706) data time 0.0011 (0.0045) model time 0.2663 (0.2670) loss 5.2245 (5.4038) grad_norm 2.4934 (3.2004) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:09:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][140/625] eta 0:02:11 lr 0.000027 wd 0.0500 time 0.2545 (0.2709) data time 0.0010 (0.0042) model time 0.2535 (0.2677) loss 5.5090 (5.3959) grad_norm 2.7825 (3.2512) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:09:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][150/625] eta 0:02:08 lr 0.000027 wd 0.0500 time 0.2554 (0.2700) data time 0.0012 (0.0040) model time 0.2542 (0.2666) loss 5.3788 (5.3978) grad_norm 2.4510 (3.3140) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:09:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][160/625] eta 0:02:05 lr 0.000027 wd 0.0500 time 0.2529 (0.2691) data time 0.0007 (0.0038) model time 0.2522 (0.2655) loss 5.0817 (5.4042) grad_norm 3.1066 (3.2754) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:09:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][170/625] eta 0:02:02 lr 0.000027 wd 0.0500 time 0.2573 (0.2696) data time 0.0009 (0.0036) model time 0.2563 (0.2664) loss 5.6706 (5.4107) grad_norm 2.2937 (3.2538) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:09:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][180/625] eta 0:02:00 lr 0.000027 wd 0.0500 time 0.2572 (0.2704) data time 0.0012 (0.0035) model time 0.2560 (0.2677) loss 5.5710 (5.4144) grad_norm 1.6205 (3.2224) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:09:23 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][190/625] eta 0:01:57 lr 0.000027 wd 0.0500 time 0.2544 (0.2696) data time 0.0009 (0.0034) model time 0.2535 (0.2667) loss 5.1696 (5.4102) grad_norm 4.3599 (3.2113) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:09:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][200/625] eta 0:01:54 lr 0.000027 wd 0.0500 time 0.2561 (0.2689) data time 0.0009 (0.0033) model time 0.2552 (0.2659) loss 5.3109 (5.4122) grad_norm 3.5041 (3.1983) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:09:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][210/625] eta 0:01:51 lr 0.000027 wd 0.0500 time 0.2555 (0.2691) data time 0.0012 (0.0032) model time 0.2543 (0.2664) loss 6.3458 (5.4022) grad_norm 2.3273 (3.1727) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:09:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][220/625] eta 0:01:48 lr 0.000027 wd 0.0500 time 0.2584 (0.2685) data time 0.0008 (0.0031) model time 0.2576 (0.2657) loss 5.7756 (5.4062) grad_norm 3.3216 (3.1680) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:09:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][230/625] eta 0:01:45 lr 0.000027 wd 0.0500 time 0.2583 (0.2680) data time 0.0007 (0.0030) model time 0.2576 (0.2651) loss 5.3673 (5.4101) grad_norm 2.4979 (3.1528) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:09:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][240/625] eta 0:01:43 lr 0.000027 wd 0.0500 time 0.2585 (0.2689) data time 0.0008 (0.0029) model time 0.2577 (0.2663) loss 5.1270 (5.4143) grad_norm 1.9430 (3.1497) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:09:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][250/625] eta 0:01:40 lr 0.000027 wd 0.0500 time 0.2530 (0.2683) data time 0.0011 (0.0028) 
model time 0.2519 (0.2657) loss 5.4291 (5.4105) grad_norm 2.2856 (3.1474) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:09:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][260/625] eta 0:01:37 lr 0.000027 wd 0.0500 time 0.2535 (0.2679) data time 0.0011 (0.0028) model time 0.2524 (0.2652) loss 4.7639 (5.4041) grad_norm 2.9901 (3.1823) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:09:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][270/625] eta 0:01:34 lr 0.000027 wd 0.0500 time 0.2608 (0.2675) data time 0.0007 (0.0027) model time 0.2601 (0.2648) loss 5.6857 (5.4047) grad_norm 2.2362 (3.1726) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:09:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][280/625] eta 0:01:32 lr 0.000027 wd 0.0500 time 0.2591 (0.2671) data time 0.0011 (0.0026) model time 0.2580 (0.2644) loss 5.3449 (5.4070) grad_norm 2.4439 (3.1542) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:09:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][290/625] eta 0:01:29 lr 0.000027 wd 0.0500 time 0.2573 (0.2675) data time 0.0009 (0.0026) model time 0.2564 (0.2649) loss 5.3499 (5.3986) grad_norm 2.2364 (3.1501) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:09:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][300/625] eta 0:01:26 lr 0.000027 wd 0.0500 time 0.2570 (0.2671) data time 0.0011 (0.0025) model time 0.2558 (0.2645) loss 5.5384 (5.4052) grad_norm 20.0065 (3.2055) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:09:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][310/625] eta 0:01:24 lr 0.000027 wd 0.0500 time 0.2562 (0.2667) data time 0.0008 (0.0025) model time 0.2554 (0.2642) loss 5.8245 (5.4087) grad_norm 2.5780 (3.1877) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:09:58 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [289/300][320/625] eta 0:01:21 lr 0.000027 wd 0.0500 time 0.2700 (0.2665) data time 0.0008 (0.0024) model time 0.2691 (0.2639) loss 5.4600 (5.4128) grad_norm 5.0395 (3.1741) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:10:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][330/625] eta 0:01:18 lr 0.000027 wd 0.0500 time 0.2594 (0.2665) data time 0.0008 (0.0024) model time 0.2586 (0.2640) loss 5.3367 (5.4160) grad_norm 2.4929 (3.1738) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:10:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][340/625] eta 0:01:16 lr 0.000027 wd 0.0500 time 0.2587 (0.2668) data time 0.0012 (0.0023) model time 0.2575 (0.2644) loss 5.7495 (5.4052) grad_norm 2.5215 (3.1645) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:10:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][350/625] eta 0:01:13 lr 0.000027 wd 0.0500 time 0.2566 (0.2671) data time 0.0006 (0.0023) model time 0.2559 (0.2648) loss 6.1291 (5.4069) grad_norm 2.7516 (3.1481) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:10:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][360/625] eta 0:01:10 lr 0.000027 wd 0.0500 time 0.2582 (0.2677) data time 0.0008 (0.0023) model time 0.2573 (0.2656) loss 5.0416 (5.4026) grad_norm 2.2264 (3.1365) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:10:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][370/625] eta 0:01:08 lr 0.000027 wd 0.0500 time 0.2595 (0.2674) data time 0.0014 (0.0022) model time 0.2581 (0.2652) loss 5.0802 (5.3993) grad_norm 5.5683 (3.1455) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:10:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][380/625] eta 0:01:05 lr 0.000027 wd 0.0500 time 0.2664 (0.2671) data time 0.0010 (0.0022) model time 0.2654 (0.2650) 
loss 5.6481 (5.3967) grad_norm 2.2551 (3.1370) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:10:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][390/625] eta 0:01:02 lr 0.000027 wd 0.0500 time 0.2552 (0.2668) data time 0.0009 (0.0022) model time 0.2543 (0.2646) loss 5.4865 (5.3982) grad_norm 2.4383 (3.1301) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:10:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][400/625] eta 0:00:59 lr 0.000027 wd 0.0500 time 0.2541 (0.2665) data time 0.0008 (0.0021) model time 0.2533 (0.2644) loss 5.7970 (5.3911) grad_norm 2.4208 (3.1300) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:10:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][410/625] eta 0:00:57 lr 0.000027 wd 0.0500 time 0.2585 (0.2663) data time 0.0006 (0.0021) model time 0.2579 (0.2641) loss 5.5148 (5.3896) grad_norm 2.6828 (3.1313) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:10:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][420/625] eta 0:00:54 lr 0.000027 wd 0.0500 time 0.2594 (0.2660) data time 0.0006 (0.0021) model time 0.2588 (0.2638) loss 5.3563 (5.3908) grad_norm 2.2785 (3.1155) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:10:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][430/625] eta 0:00:51 lr 0.000027 wd 0.0500 time 0.2585 (0.2658) data time 0.0011 (0.0020) model time 0.2573 (0.2636) loss 4.4931 (5.3889) grad_norm 3.1298 (3.1011) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:10:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][440/625] eta 0:00:49 lr 0.000027 wd 0.0500 time 0.2562 (0.2656) data time 0.0007 (0.0020) model time 0.2555 (0.2634) loss 5.8627 (5.3888) grad_norm 2.4425 (3.0854) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:10:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [289/300][450/625] eta 0:00:46 lr 0.000027 wd 0.0500 time 0.2588 (0.2657) data time 0.0010 (0.0020) model time 0.2578 (0.2635) loss 5.9158 (5.3926) grad_norm 2.4773 (3.0758) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:10:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][460/625] eta 0:00:43 lr 0.000027 wd 0.0500 time 0.2616 (0.2655) data time 0.0008 (0.0020) model time 0.2608 (0.2634) loss 5.9944 (5.3934) grad_norm 6.0361 (3.0727) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:10:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][470/625] eta 0:00:41 lr 0.000027 wd 0.0500 time 0.2568 (0.2653) data time 0.0008 (0.0020) model time 0.2560 (0.2632) loss 5.4892 (5.3922) grad_norm 3.3309 (3.0726) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:10:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][480/625] eta 0:00:38 lr 0.000027 wd 0.0500 time 0.2585 (0.2651) data time 0.0008 (0.0019) model time 0.2577 (0.2630) loss 5.0321 (5.3884) grad_norm 2.8041 (3.0732) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:10:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][490/625] eta 0:00:35 lr 0.000026 wd 0.0500 time 0.2526 (0.2649) data time 0.0011 (0.0019) model time 0.2515 (0.2628) loss 4.9883 (5.3955) grad_norm 2.6272 (3.0630) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:10:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][500/625] eta 0:00:33 lr 0.000026 wd 0.0500 time 0.2573 (0.2648) data time 0.0008 (0.0019) model time 0.2566 (0.2626) loss 4.7090 (5.3922) grad_norm 1.8511 (3.0614) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:10:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][510/625] eta 0:00:30 lr 0.000026 wd 0.0500 time 0.2546 (0.2650) data time 0.0008 (0.0019) model time 0.2538 (0.2629) loss 4.6387 (5.3902) grad_norm 
18.8383 (3.0843) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:10:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][520/625] eta 0:00:27 lr 0.000026 wd 0.0500 time 0.2547 (0.2648) data time 0.0008 (0.0019) model time 0.2539 (0.2627) loss 5.2641 (5.3869) grad_norm 3.6267 (3.1504) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:10:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][530/625] eta 0:00:25 lr 0.000026 wd 0.0500 time 0.2565 (0.2646) data time 0.0008 (0.0019) model time 0.2557 (0.2625) loss 4.7069 (5.3885) grad_norm 2.9343 (3.1419) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:10:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][540/625] eta 0:00:22 lr 0.000026 wd 0.0500 time 0.2702 (0.2645) data time 0.0005 (0.0018) model time 0.2697 (0.2624) loss 6.0551 (5.3913) grad_norm 2.0362 (3.1376) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:10:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][550/625] eta 0:00:19 lr 0.000026 wd 0.0500 time 0.2573 (0.2643) data time 0.0008 (0.0018) model time 0.2565 (0.2623) loss 5.7853 (5.3902) grad_norm 3.0166 (3.1354) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:11:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][560/625] eta 0:00:17 lr 0.000026 wd 0.0500 time 0.2582 (0.2642) data time 0.0008 (0.0018) model time 0.2574 (0.2621) loss 5.0191 (5.3899) grad_norm 3.1472 (3.1234) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:11:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][570/625] eta 0:00:14 lr 0.000026 wd 0.0500 time 0.2562 (0.2640) data time 0.0010 (0.0018) model time 0.2552 (0.2620) loss 5.1220 (5.3917) grad_norm 4.2202 (3.1245) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:11:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][580/625] 
eta 0:00:11 lr 0.000026 wd 0.0500 time 0.2628 (0.2639) data time 0.0007 (0.0018) model time 0.2621 (0.2619) loss 5.2897 (5.3856) grad_norm 3.1997 (3.1280) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:11:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][590/625] eta 0:00:09 lr 0.000026 wd 0.0500 time 0.2551 (0.2638) data time 0.0008 (0.0018) model time 0.2542 (0.2617) loss 5.2822 (5.3874) grad_norm 25.5781 (3.1614) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:11:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][600/625] eta 0:00:06 lr 0.000026 wd 0.0500 time 0.2540 (0.2636) data time 0.0007 (0.0017) model time 0.2533 (0.2616) loss 5.2495 (5.3906) grad_norm 3.1331 (3.1815) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:11:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][610/625] eta 0:00:03 lr 0.000026 wd 0.0500 time 0.2532 (0.2635) data time 0.0004 (0.0017) model time 0.2529 (0.2615) loss 4.8444 (5.3931) grad_norm 2.7191 (3.1835) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:11:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][620/625] eta 0:00:01 lr 0.000026 wd 0.0500 time 0.2540 (0.2634) data time 0.0005 (0.0017) model time 0.2535 (0.2614) loss 6.1989 (5.3957) grad_norm 2.3850 (3.1844) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:11:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 289 training takes 0:02:44 [2024-08-04 11:11:17 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 11:11:17 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 11:11:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.462 (0.462) Loss 0.5967 (0.5967) Acc@1 90.088 (90.088) Acc@5 98.828 (98.828) Mem 9655MB [2024-08-04 11:11:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.094) Loss 0.8955 (0.7083) Acc@1 81.787 (87.269) Acc@5 96.631 (97.874) Mem 9655MB [2024-08-04 11:11:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.075) Loss 0.9995 (0.8238) Acc@1 79.102 (84.289) Acc@5 95.605 (96.780) Mem 9655MB [2024-08-04 11:11:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.963 Acc@5 96.793 [2024-08-04 11:11:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-08-04 11:11:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.704 (0.704) Loss 0.5913 (0.5913) Acc@1 90.332 (90.332) Acc@5 98.828 (98.828) Mem 9655MB [2024-08-04 11:11:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.126) Loss 0.8936 (0.7068) Acc@1 82.373 (87.269) Acc@5 96.631 (97.843) Mem 9655MB [2024-08-04 11:11:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 0.9990 (0.8233) Acc@1 78.809 (84.191) Acc@5 95.459 (96.722) Mem 9655MB [2024-08-04 11:11:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.851 Acc@5 96.749 [2024-08-04 11:11:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-04 11:11:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.85% [2024-08-04 11:11:21 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 11:11:22 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 11:11:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][0/625] eta 0:07:41 lr 0.000026 wd 0.0500 time 0.7376 (0.7376) data time 0.4892 (0.4892) model time 0.0000 (0.0000) loss 5.9265 (5.9265) grad_norm 2.4413 (2.4413) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:11:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][10/625] eta 0:03:11 lr 0.000026 wd 0.0500 time 0.2587 (0.3115) data time 0.0008 (0.0454) model time 0.0000 (0.0000) loss 5.0334 (5.3889) grad_norm 2.9396 (2.6118) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:11:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][20/625] eta 0:02:52 lr 0.000026 wd 0.0500 time 0.2617 (0.2853) data time 0.0007 (0.0242) model time 0.0000 (0.0000) loss 5.9094 (5.4857) grad_norm 3.2088 (2.5945) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:11:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][30/625] eta 0:02:43 lr 0.000026 wd 0.0500 time 0.2531 (0.2756) data time 0.0010 (0.0167) model time 0.0000 (0.0000) loss 5.3582 (5.4216) grad_norm 2.0532 (2.5632) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:11:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][40/625] eta 0:02:40 lr 0.000026 wd 0.0500 time 0.2535 (0.2751) data time 0.0009 (0.0129) model time 0.0000 (0.0000) loss 5.7843 (5.4071) grad_norm 2.3175 (2.5745) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:11:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][50/625] eta 0:02:38 lr 0.000026 wd 0.0500 time 0.2565 (0.2749) data time 0.0009 (0.0105) model time 0.0000 (0.0000) loss 5.2603 (5.4122) grad_norm 3.0734 (2.8432) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 
11:11:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][60/625] eta 0:02:34 lr 0.000026 wd 0.0500 time 0.3923 (0.2739) data time 0.0009 (0.0090) model time 0.3915 (0.2680) loss 5.4691 (5.4015) grad_norm 1.9203 (2.7539) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:11:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][70/625] eta 0:02:30 lr 0.000026 wd 0.0500 time 0.2498 (0.2713) data time 0.0010 (0.0078) model time 0.2489 (0.2613) loss 5.4516 (5.4150) grad_norm 1.9720 (2.7684) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:11:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][80/625] eta 0:02:26 lr 0.000026 wd 0.0500 time 0.2557 (0.2694) data time 0.0011 (0.0070) model time 0.2546 (0.2591) loss 4.8239 (5.4304) grad_norm 1.8645 (2.8335) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:11:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][90/625] eta 0:02:23 lr 0.000026 wd 0.0500 time 0.2548 (0.2681) data time 0.0008 (0.0063) model time 0.2540 (0.2585) loss 5.2742 (5.4286) grad_norm 2.2422 (2.8511) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:11:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][100/625] eta 0:02:21 lr 0.000026 wd 0.0500 time 0.2611 (0.2690) data time 0.0009 (0.0058) model time 0.2602 (0.2620) loss 5.9924 (5.4466) grad_norm 1.9919 (2.8148) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:11:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][110/625] eta 0:02:17 lr 0.000026 wd 0.0500 time 0.2547 (0.2678) data time 0.0011 (0.0053) model time 0.2536 (0.2608) loss 5.5871 (5.4294) grad_norm 1.8021 (2.7863) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:11:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][120/625] eta 0:02:15 lr 0.000026 wd 0.0500 time 0.2550 (0.2686) data time 0.0006 
(0.0050) model time 0.2544 (0.2630) loss 4.5633 (5.4273) grad_norm 2.3616 (2.8035) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:11:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][130/625] eta 0:02:13 lr 0.000026 wd 0.0500 time 0.2546 (0.2689) data time 0.0006 (0.0047) model time 0.2539 (0.2642) loss 5.7085 (5.4283) grad_norm 1.9269 (2.7987) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:11:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][140/625] eta 0:02:09 lr 0.000026 wd 0.0500 time 0.2549 (0.2680) data time 0.0009 (0.0044) model time 0.2540 (0.2632) loss 5.8450 (5.4239) grad_norm 2.7872 (2.8234) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:12:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][150/625] eta 0:02:07 lr 0.000026 wd 0.0500 time 0.2554 (0.2682) data time 0.0007 (0.0042) model time 0.2547 (0.2639) loss 6.2674 (5.4264) grad_norm 2.6962 (2.7963) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:12:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][160/625] eta 0:02:04 lr 0.000026 wd 0.0500 time 0.2588 (0.2675) data time 0.0008 (0.0040) model time 0.2581 (0.2632) loss 5.2294 (5.4344) grad_norm 2.3114 (2.8025) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:12:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][170/625] eta 0:02:01 lr 0.000026 wd 0.0500 time 0.2563 (0.2668) data time 0.0011 (0.0038) model time 0.2552 (0.2625) loss 5.0668 (5.4534) grad_norm 1.9172 (2.8404) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:12:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][180/625] eta 0:01:58 lr 0.000026 wd 0.0500 time 0.2608 (0.2663) data time 0.0007 (0.0036) model time 0.2600 (0.2620) loss 6.0228 (5.4431) grad_norm 1.9296 (2.8688) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:12:12 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][190/625] eta 0:01:55 lr 0.000026 wd 0.0500 time 0.2553 (0.2658) data time 0.0009 (0.0035) model time 0.2544 (0.2615) loss 4.5168 (5.4269) grad_norm 2.7632 (2.8603) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:12:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][200/625] eta 0:01:52 lr 0.000026 wd 0.0500 time 0.2543 (0.2653) data time 0.0008 (0.0034) model time 0.2535 (0.2611) loss 5.0073 (5.4273) grad_norm 2.4577 (2.8735) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:12:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][210/625] eta 0:01:50 lr 0.000026 wd 0.0500 time 0.2558 (0.2664) data time 0.0007 (0.0033) model time 0.2552 (0.2628) loss 6.0109 (5.4387) grad_norm 3.3433 (2.8675) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:12:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][220/625] eta 0:01:47 lr 0.000026 wd 0.0500 time 0.2551 (0.2660) data time 0.0006 (0.0031) model time 0.2545 (0.2624) loss 5.6104 (5.4304) grad_norm 1.7820 (2.8832) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:12:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][230/625] eta 0:01:45 lr 0.000026 wd 0.0500 time 0.2586 (0.2664) data time 0.0010 (0.0030) model time 0.2576 (0.2631) loss 5.3521 (5.4332) grad_norm 1.8860 (2.8759) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:12:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][240/625] eta 0:01:42 lr 0.000026 wd 0.0500 time 0.2663 (0.2669) data time 0.0010 (0.0030) model time 0.2654 (0.2638) loss 5.7276 (5.4370) grad_norm 3.8818 (2.8670) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:12:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][250/625] eta 0:01:39 lr 0.000026 wd 0.0500 time 0.2551 (0.2665) data time 0.0007 (0.0029) 
model time 0.2544 (0.2634) loss 4.4219 (5.4252) grad_norm 1.7966 (2.9343) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:12:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][260/625] eta 0:01:37 lr 0.000026 wd 0.0500 time 0.2554 (0.2661) data time 0.0011 (0.0028) model time 0.2543 (0.2630) loss 4.9023 (5.4178) grad_norm 4.1696 (2.9454) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:12:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][270/625] eta 0:01:34 lr 0.000026 wd 0.0500 time 0.2589 (0.2664) data time 0.0007 (0.0027) model time 0.2582 (0.2635) loss 5.1479 (5.4127) grad_norm 1.8895 (2.9291) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:12:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][280/625] eta 0:01:31 lr 0.000026 wd 0.0500 time 0.2579 (0.2660) data time 0.0011 (0.0027) model time 0.2568 (0.2632) loss 5.8141 (5.4154) grad_norm 2.6634 (2.9190) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:12:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][290/625] eta 0:01:29 lr 0.000026 wd 0.0500 time 0.2565 (0.2657) data time 0.0011 (0.0026) model time 0.2555 (0.2628) loss 5.9396 (5.4269) grad_norm 2.0380 (2.9217) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:12:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][300/625] eta 0:01:26 lr 0.000026 wd 0.0500 time 0.2554 (0.2659) data time 0.0007 (0.0026) model time 0.2547 (0.2631) loss 5.8905 (5.4227) grad_norm 2.4873 (2.9112) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:12:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][310/625] eta 0:01:23 lr 0.000026 wd 0.0500 time 0.2608 (0.2656) data time 0.0006 (0.0025) model time 0.2602 (0.2628) loss 5.1869 (5.4248) grad_norm 1.9909 (2.8973) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:12:47 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [290/300][320/625] eta 0:01:20 lr 0.000026 wd 0.0500 time 0.2568 (0.2653) data time 0.0008 (0.0025) model time 0.2561 (0.2625) loss 5.0357 (5.4294) grad_norm 4.3089 (2.9082) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:12:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][330/625] eta 0:01:18 lr 0.000026 wd 0.0500 time 0.2592 (0.2650) data time 0.0008 (0.0024) model time 0.2584 (0.2623) loss 5.8354 (5.4254) grad_norm 2.7003 (2.9087) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:12:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][340/625] eta 0:01:15 lr 0.000026 wd 0.0500 time 0.2554 (0.2648) data time 0.0009 (0.0024) model time 0.2544 (0.2621) loss 6.2238 (5.4356) grad_norm 3.8982 (2.9181) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:12:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][350/625] eta 0:01:12 lr 0.000026 wd 0.0500 time 0.2620 (0.2651) data time 0.0006 (0.0023) model time 0.2614 (0.2625) loss 4.7801 (5.4231) grad_norm 2.2077 (2.9315) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:12:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][360/625] eta 0:01:10 lr 0.000026 wd 0.0500 time 0.2589 (0.2655) data time 0.0007 (0.0023) model time 0.2583 (0.2630) loss 4.8877 (5.4227) grad_norm 3.4932 (2.9389) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:13:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][370/625] eta 0:01:07 lr 0.000026 wd 0.0500 time 0.2518 (0.2658) data time 0.0007 (0.0023) model time 0.2511 (0.2634) loss 5.6724 (5.4224) grad_norm 2.9529 (2.9341) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:13:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][380/625] eta 0:01:05 lr 0.000025 wd 0.0500 time 0.2560 (0.2659) data time 0.0007 (0.0022) model time 0.2553 (0.2636) 
loss 5.7073 (5.4213) grad_norm 2.1473 (2.9610) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:13:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][390/625] eta 0:01:02 lr 0.000025 wd 0.0500 time 0.2579 (0.2657) data time 0.0008 (0.0022) model time 0.2571 (0.2633) loss 5.6859 (5.4214) grad_norm 1.8941 (2.9448) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:13:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][400/625] eta 0:00:59 lr 0.000025 wd 0.0500 time 0.2570 (0.2657) data time 0.0007 (0.0022) model time 0.2563 (0.2634) loss 5.8855 (5.4316) grad_norm 2.6735 (2.9617) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:13:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][410/625] eta 0:00:57 lr 0.000025 wd 0.0500 time 0.4371 (0.2659) data time 0.0010 (0.0021) model time 0.4361 (0.2637) loss 6.0921 (5.4321) grad_norm 2.7256 (2.9640) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:13:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][420/625] eta 0:00:54 lr 0.000025 wd 0.0500 time 0.2561 (0.2657) data time 0.0010 (0.0021) model time 0.2551 (0.2635) loss 5.8662 (5.4329) grad_norm 3.9452 (2.9596) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:13:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][430/625] eta 0:00:51 lr 0.000025 wd 0.0500 time 0.2586 (0.2655) data time 0.0008 (0.0021) model time 0.2578 (0.2633) loss 5.2756 (5.4269) grad_norm 2.3398 (2.9547) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:13:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][440/625] eta 0:00:49 lr 0.000025 wd 0.0500 time 0.2569 (0.2657) data time 0.0007 (0.0020) model time 0.2562 (0.2635) loss 6.5783 (5.4276) grad_norm 2.2947 (2.9564) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:13:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [290/300][450/625] eta 0:00:46 lr 0.000025 wd 0.0500 time 0.2546 (0.2655) data time 0.0011 (0.0020) model time 0.2536 (0.2633) loss 5.0061 (5.4210) grad_norm 5.7790 (2.9509) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:13:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][460/625] eta 0:00:43 lr 0.000025 wd 0.0500 time 0.2534 (0.2653) data time 0.0007 (0.0020) model time 0.2527 (0.2631) loss 5.2616 (5.4174) grad_norm 2.8955 (2.9432) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:13:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][470/625] eta 0:00:41 lr 0.000025 wd 0.0500 time 0.2539 (0.2655) data time 0.0006 (0.0020) model time 0.2532 (0.2634) loss 4.6048 (5.4139) grad_norm 2.3639 (2.9344) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:13:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][480/625] eta 0:00:38 lr 0.000025 wd 0.0500 time 0.2585 (0.2653) data time 0.0007 (0.0019) model time 0.2578 (0.2632) loss 6.1989 (5.4183) grad_norm 3.1895 (2.9297) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:13:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][490/625] eta 0:00:35 lr 0.000025 wd 0.0500 time 0.2550 (0.2651) data time 0.0007 (0.0019) model time 0.2543 (0.2631) loss 5.9496 (5.4217) grad_norm 2.4392 (2.9485) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:13:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][500/625] eta 0:00:33 lr 0.000025 wd 0.0500 time 0.2566 (0.2650) data time 0.0008 (0.0019) model time 0.2558 (0.2629) loss 4.9958 (5.4171) grad_norm 1.9532 (2.9443) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:13:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][510/625] eta 0:00:30 lr 0.000025 wd 0.0500 time 0.2538 (0.2648) data time 0.0008 (0.0019) model time 0.2530 (0.2628) loss 4.5221 (5.4192) grad_norm 
1.7690 (2.9393) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:13:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][520/625] eta 0:00:27 lr 0.000025 wd 0.0500 time 0.2585 (0.2647) data time 0.0006 (0.0019) model time 0.2579 (0.2626) loss 5.2038 (5.4175) grad_norm 2.1507 (2.9320) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:13:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][530/625] eta 0:00:25 lr 0.000025 wd 0.0500 time 0.2571 (0.2645) data time 0.0011 (0.0018) model time 0.2560 (0.2625) loss 4.6697 (5.4202) grad_norm 3.0406 (2.9220) loss_scale 512.0000 (258.4105) mem 9655MB [2024-08-04 11:13:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][540/625] eta 0:00:22 lr 0.000025 wd 0.0500 time 0.2596 (0.2647) data time 0.0008 (0.0018) model time 0.2589 (0.2627) loss 5.7449 (5.4213) grad_norm 2.4909 (inf) loss_scale 256.0000 (258.8392) mem 9655MB [2024-08-04 11:13:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][550/625] eta 0:00:19 lr 0.000025 wd 0.0500 time 0.2520 (0.2648) data time 0.0008 (0.0018) model time 0.2513 (0.2629) loss 4.8900 (5.4177) grad_norm 3.0009 (inf) loss_scale 256.0000 (258.7877) mem 9655MB [2024-08-04 11:13:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][560/625] eta 0:00:17 lr 0.000025 wd 0.0500 time 0.2563 (0.2650) data time 0.0005 (0.0018) model time 0.2558 (0.2631) loss 6.1539 (5.4181) grad_norm 2.7490 (inf) loss_scale 256.0000 (258.7380) mem 9655MB [2024-08-04 11:13:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][570/625] eta 0:00:14 lr 0.000025 wd 0.0500 time 0.2527 (0.2648) data time 0.0010 (0.0018) model time 0.2517 (0.2629) loss 5.4014 (5.4155) grad_norm 3.1106 (inf) loss_scale 256.0000 (258.6900) mem 9655MB [2024-08-04 11:13:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][580/625] eta 0:00:11 lr 
0.000025 wd 0.0500 time 0.2609 (0.2647) data time 0.0007 (0.0018) model time 0.2603 (0.2628) loss 5.0739 (5.4153) grad_norm 3.3435 (inf) loss_scale 256.0000 (258.6437) mem 9655MB [2024-08-04 11:13:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][590/625] eta 0:00:09 lr 0.000025 wd 0.0500 time 0.2581 (0.2646) data time 0.0016 (0.0017) model time 0.2565 (0.2627) loss 6.2002 (5.4151) grad_norm 3.1836 (inf) loss_scale 256.0000 (258.5990) mem 9655MB [2024-08-04 11:14:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][600/625] eta 0:00:06 lr 0.000025 wd 0.0500 time 0.2600 (0.2647) data time 0.0006 (0.0017) model time 0.2594 (0.2629) loss 4.1800 (5.4113) grad_norm 1.8761 (inf) loss_scale 256.0000 (258.5557) mem 9655MB [2024-08-04 11:14:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][610/625] eta 0:00:03 lr 0.000025 wd 0.0500 time 0.2554 (0.2646) data time 0.0004 (0.0017) model time 0.2550 (0.2627) loss 4.9951 (5.4115) grad_norm 1.9239 (inf) loss_scale 256.0000 (258.5139) mem 9655MB [2024-08-04 11:14:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][620/625] eta 0:00:01 lr 0.000025 wd 0.0500 time 0.2547 (0.2644) data time 0.0005 (0.0017) model time 0.2542 (0.2626) loss 5.1749 (5.4056) grad_norm 4.4016 (inf) loss_scale 256.0000 (258.4734) mem 9655MB [2024-08-04 11:14:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 290 training takes 0:02:45 [2024-08-04 11:14:07 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 11:14:07 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 11:14:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.476 (0.476) Loss 0.5986 (0.5986) Acc@1 90.283 (90.283) Acc@5 98.779 (98.779) Mem 9655MB [2024-08-04 11:14:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 0.8999 (0.7124) Acc@1 81.738 (87.211) Acc@5 96.875 (97.909) Mem 9655MB [2024-08-04 11:14:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 0.9966 (0.8272) Acc@1 78.760 (84.268) Acc@5 95.410 (96.773) Mem 9655MB [2024-08-04 11:14:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.949 Acc@5 96.795 [2024-08-04 11:14:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-04 11:14:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.745 (0.745) Loss 0.5918 (0.5918) Acc@1 90.332 (90.332) Acc@5 98.828 (98.828) Mem 9655MB [2024-08-04 11:14:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.124) Loss 0.8936 (0.7066) Acc@1 82.471 (87.287) Acc@5 96.680 (97.856) Mem 9655MB [2024-08-04 11:14:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.091) Loss 0.9990 (0.8231) Acc@1 78.906 (84.203) Acc@5 95.459 (96.731) Mem 9655MB [2024-08-04 11:14:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.859 Acc@5 96.755 [2024-08-04 11:14:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-04 11:14:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.86% [2024-08-04 11:14:11 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving...... 
[2024-08-04 11:14:12 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!! [2024-08-04 11:14:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][0/625] eta 0:08:17 lr 0.000025 wd 0.0500 time 0.7957 (0.7957) data time 0.5565 (0.5565) model time 0.0000 (0.0000) loss 5.4827 (5.4827) grad_norm 1.7752 (1.7752) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:14:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][10/625] eta 0:03:07 lr 0.000025 wd 0.0500 time 0.2528 (0.3045) data time 0.0010 (0.0514) model time 0.0000 (0.0000) loss 4.9901 (5.5901) grad_norm 9.1754 (4.4385) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:14:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][20/625] eta 0:02:55 lr 0.000025 wd 0.0500 time 0.2575 (0.2906) data time 0.0008 (0.0273) model time 0.0000 (0.0000) loss 4.9635 (5.4211) grad_norm 3.9803 (4.2621) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:14:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][30/625] eta 0:02:50 lr 0.000025 wd 0.0500 time 0.2532 (0.2861) data time 0.0009 (0.0188) model time 0.0000 (0.0000) loss 5.4059 (5.4462) grad_norm 2.7065 (3.8639) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:14:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][40/625] eta 0:02:45 lr 0.000025 wd 0.0500 time 0.2573 (0.2834) data time 0.0006 (0.0144) model time 0.0000 (0.0000) loss 5.5296 (5.4518) grad_norm 2.9425 (3.6069) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:14:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][50/625] eta 0:02:39 lr 0.000025 wd 0.0500 time 0.2604 (0.2780) data time 0.0007 (0.0118) model time 0.0000 (0.0000) loss 4.9937 (5.4361) grad_norm 4.0974 (3.4699) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 
11:14:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][60/625] eta 0:02:35 lr 0.000025 wd 0.0500 time 0.2633 (0.2745) data time 0.0010 (0.0100) model time 0.2623 (0.2555) loss 6.4340 (5.4364) grad_norm 2.9361 (3.5596) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:14:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][70/625] eta 0:02:30 lr 0.000025 wd 0.0500 time 0.2655 (0.2719) data time 0.0009 (0.0087) model time 0.2646 (0.2555) loss 4.6742 (5.4190) grad_norm 13.7069 (3.6333) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:14:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][80/625] eta 0:02:27 lr 0.000025 wd 0.0500 time 0.2563 (0.2699) data time 0.0008 (0.0077) model time 0.2554 (0.2553) loss 5.1483 (5.3938) grad_norm 6.1251 (3.5791) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:14:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][90/625] eta 0:02:23 lr 0.000025 wd 0.0500 time 0.2554 (0.2684) data time 0.0008 (0.0070) model time 0.2546 (0.2552) loss 4.8838 (5.3996) grad_norm 5.4826 (3.5310) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:14:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][100/625] eta 0:02:20 lr 0.000025 wd 0.0500 time 0.2583 (0.2671) data time 0.0011 (0.0064) model time 0.2572 (0.2550) loss 4.7168 (5.4246) grad_norm 2.8596 (3.4230) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:14:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][110/625] eta 0:02:17 lr 0.000025 wd 0.0500 time 0.2568 (0.2661) data time 0.0009 (0.0059) model time 0.2559 (0.2550) loss 5.2023 (5.4451) grad_norm 1.4674 (3.4199) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:14:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][120/625] eta 0:02:14 lr 0.000025 wd 0.0500 time 0.2651 (0.2663) data time 0.0016 
(0.0055) model time 0.2634 (0.2567) loss 4.5296 (5.4085) grad_norm 2.6217 (3.3927) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:14:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][130/625] eta 0:02:11 lr 0.000025 wd 0.0500 time 0.2555 (0.2654) data time 0.0009 (0.0052) model time 0.2545 (0.2565) loss 5.7146 (5.4184) grad_norm 2.9816 (3.4316) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:14:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][140/625] eta 0:02:08 lr 0.000025 wd 0.0500 time 0.2573 (0.2659) data time 0.0007 (0.0049) model time 0.2566 (0.2581) loss 4.8326 (5.3877) grad_norm 7.1526 (3.4064) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:14:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][150/625] eta 0:02:05 lr 0.000025 wd 0.0500 time 0.2538 (0.2652) data time 0.0011 (0.0046) model time 0.2527 (0.2578) loss 5.7961 (5.4146) grad_norm 2.5344 (3.3519) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:14:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][160/625] eta 0:02:03 lr 0.000025 wd 0.0500 time 0.2569 (0.2646) data time 0.0008 (0.0044) model time 0.2561 (0.2575) loss 5.9245 (5.4298) grad_norm 2.2448 (3.3063) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:14:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][170/625] eta 0:02:00 lr 0.000025 wd 0.0500 time 0.2583 (0.2641) data time 0.0008 (0.0042) model time 0.2574 (0.2572) loss 5.4896 (5.4432) grad_norm 2.9685 (3.2971) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:15:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][180/625] eta 0:01:57 lr 0.000025 wd 0.0500 time 0.2552 (0.2643) data time 0.0010 (0.0040) model time 0.2541 (0.2580) loss 5.7476 (5.4390) grad_norm 2.4610 (3.2429) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:15:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][190/625] eta 0:01:54 lr 0.000025 wd 0.0500 time 0.2550 (0.2640) data time 0.0009 (0.0038) model time 0.2541 (0.2579) loss 6.7620 (5.4368) grad_norm 2.3166 (3.2434) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:15:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][200/625] eta 0:01:52 lr 0.000025 wd 0.0500 time 0.2562 (0.2642) data time 0.0009 (0.0037) model time 0.2553 (0.2585) loss 5.2110 (5.4332) grad_norm 3.8315 (3.3164) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:15:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][210/625] eta 0:01:49 lr 0.000025 wd 0.0500 time 0.2501 (0.2638) data time 0.0009 (0.0036) model time 0.2492 (0.2583) loss 5.2211 (5.4230) grad_norm 2.0773 (3.2794) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:15:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][220/625] eta 0:01:46 lr 0.000025 wd 0.0500 time 0.2570 (0.2634) data time 0.0007 (0.0035) model time 0.2563 (0.2581) loss 5.0679 (5.4338) grad_norm 2.0912 (3.2472) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:15:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][230/625] eta 0:01:43 lr 0.000025 wd 0.0500 time 0.2607 (0.2631) data time 0.0006 (0.0033) model time 0.2601 (0.2579) loss 5.0824 (5.4266) grad_norm 2.4729 (3.2711) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:15:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][240/625] eta 0:01:41 lr 0.000025 wd 0.0500 time 0.2587 (0.2628) data time 0.0010 (0.0032) model time 0.2577 (0.2578) loss 5.0593 (5.4198) grad_norm 1.8324 (3.2512) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:15:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][250/625] eta 0:01:38 lr 0.000025 wd 0.0500 time 0.2516 (0.2625) data time 0.0010 (0.0031) model time 0.2506 (0.2577) loss 5.2557 (5.4048) grad_norm 3.3426 (3.2270) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:15:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][260/625] eta 0:01:35 lr 0.000025 wd 0.0500 time 0.2591 (0.2623) data time 0.0006 (0.0030) model time 0.2584 (0.2576) loss 4.4896 (5.3964) grad_norm 2.7200 (3.2325) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:15:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][270/625] eta 0:01:33 lr 0.000025 wd 0.0500 time 0.2578 (0.2628) data time 0.0009 (0.0030) model time 0.2570 (0.2584) loss 5.4985 (5.3996) grad_norm 2.2010 (3.2536) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:15:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][280/625] eta 0:01:30 lr 0.000025 wd 0.0500 time 0.2539 (0.2626) data time 0.0008 (0.0029) model time 0.2532 (0.2583) loss 5.7997 (5.3990) grad_norm 2.8268 (3.2274) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:15:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][290/625] eta 0:01:27 lr 0.000025 wd 0.0500 time 0.2556 (0.2624) data time 0.0008 (0.0028) model time 0.2547 (0.2581) loss 5.6057 (5.4055) grad_norm 2.0156 (3.2066) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:15:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][300/625] eta 0:01:25 lr 0.000025 wd 0.0500 time 0.2557 (0.2625) data time 0.0007 (0.0028) model time 0.2549 (0.2584) loss 5.4953 (5.4084) grad_norm 3.4790 (3.1974) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:15:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][310/625] eta 0:01:22 lr 0.000025 wd 0.0500 time 0.2590 (0.2623) data time 0.0007 (0.0027) model time 0.2582 (0.2583) loss 5.1401 (5.4028) grad_norm 2.7124 (3.1709) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:15:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][320/625] eta 0:01:19 lr 0.000024 wd 0.0500 time 0.2555 (0.2621) data time 0.0010 (0.0027) model time 0.2545 (0.2582) loss 4.9152 (5.4006) grad_norm 2.4640 (3.1759) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:15:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][330/625] eta 0:01:17 lr 0.000024 wd 0.0500 time 0.2509 (0.2630) data time 0.0008 (0.0026) model time 0.2502 (0.2593) loss 5.8866 (5.4135) grad_norm 2.2132 (3.1756) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:15:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][340/625] eta 0:01:14 lr 0.000024 wd 0.0500 time 0.2544 (0.2628) data time 0.0008 (0.0026) model time 0.2536 (0.2591) loss 5.0489 (5.4208) grad_norm 5.7602 (3.1758) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:15:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][350/625] eta 0:01:12 lr 0.000024 wd 0.0500 time 0.2588 (0.2626) data time 0.0009 (0.0025) model time 0.2579 (0.2590) loss 6.0770 (5.4271) grad_norm 3.7334 (3.1691) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:15:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][360/625] eta 0:01:09 lr 0.000024 wd 0.0500 time 0.2587 (0.2624) data time 0.0008 (0.0025) model time 0.2579 (0.2589) loss 4.7768 (5.4306) grad_norm 4.1490 (3.2149) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:15:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][370/625] eta 0:01:07 lr 0.000024 wd 0.0500 time 0.2530 (0.2628) data time 0.0010 (0.0024) model time 0.2520 (0.2594) loss 5.5895 (5.4426) grad_norm 1.9613 (3.2138) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:15:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][380/625] eta 0:01:04 lr 0.000024 wd 0.0500 time 0.2562 (0.2626) data time 0.0009 (0.0024) model time 0.2554 (0.2593) loss 5.0316 (5.4394) grad_norm 2.4165 (3.2210) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:15:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][390/625] eta 0:01:01 lr 0.000024 wd 0.0500 time 0.2545 (0.2624) data time 0.0007 (0.0024) model time 0.2537 (0.2591) loss 5.5692 (5.4300) grad_norm 4.2318 (3.2184) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:15:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][400/625] eta 0:00:59 lr 0.000024 wd 0.0500 time 0.2561 (0.2625) data time 0.0005 (0.0023) model time 0.2555 (0.2593) loss 5.8739 (5.4304) grad_norm 2.4004 (3.2159) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:16:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][410/625] eta 0:00:56 lr 0.000024 wd 0.0500 time 0.2571 (0.2628) data time 0.0009 (0.0023) model time 0.2562 (0.2597) loss 6.0977 (5.4291) grad_norm 3.1210 (3.2157) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:16:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][420/625] eta 0:00:53 lr 0.000024 wd 0.0500 time 0.2544 (0.2626) data time 0.0009 (0.0023) model time 0.2536 (0.2596) loss 5.7725 (5.4306) grad_norm 2.7087 (3.1977) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:16:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][430/625] eta 0:00:51 lr 0.000024 wd 0.0500 time 0.2531 (0.2625) data time 0.0009 (0.0022) model time 0.2522 (0.2595) loss 4.6332 (5.4301) grad_norm 2.8341 (3.1985) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:16:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][440/625] eta 0:00:48 lr 0.000024 wd 0.0500 time 0.2594 (0.2626) data time 0.0007 (0.0022) model time 0.2586 (0.2596) loss 4.4801 (5.4237) grad_norm 2.0704 (3.1885) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:16:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][450/625] eta 0:00:46 lr 0.000024 wd 0.0500 time 0.2525 (0.2629) data time 0.0007 (0.0022) model time 0.2518 (0.2601) loss 5.8207 (5.4199) grad_norm 2.8137 (3.1676) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:16:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][460/625] eta 0:00:43 lr 0.000024 wd 0.0500 time 0.2541 (0.2643) data time 0.0006 (0.0021) model time 0.2535 (0.2616) loss 5.1071 (5.4159) grad_norm 6.5351 (3.1830) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:16:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][470/625] eta 0:00:40 lr 0.000024 wd 0.0500 time 0.2576 (0.2641) data time 0.0008 (0.0021) model time 0.2567 (0.2615) loss 5.9907 (5.4179) grad_norm 2.5951 (3.1766) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:16:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][480/625] eta 0:00:38 lr 0.000024 wd 0.0500 time 0.2557 (0.2639) data time 0.0009 (0.0021) model time 0.2548 (0.2613) loss 5.1407 (5.4176) grad_norm 4.3261 (3.1791) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:16:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][490/625] eta 0:00:35 lr 0.000024 wd 0.0500 time 0.2533 (0.2638) data time 0.0009 (0.0021) model time 0.2524 (0.2612) loss 6.1126 (5.4194) grad_norm 4.6208 (3.1747) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:16:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][500/625] eta 0:00:32 lr 0.000024 wd 0.0500 time 0.2545 (0.2637) data time 0.0009 (0.0020) model time 0.2536 (0.2611) loss 4.8939 (5.4194) grad_norm 4.7208 (3.1847) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:16:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][510/625] eta 0:00:30 lr 0.000024 wd 0.0500 time 0.2635 (0.2639) data time 0.0006 (0.0020) model time 0.2629 (0.2614) loss 6.1790 (5.4128) grad_norm 6.4721 (3.1812) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:16:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][520/625] eta 0:00:27 lr 0.000024 wd 0.0500 time 0.2545 (0.2638) data time 0.0009 (0.0020) model time 0.2536 (0.2613) loss 5.4110 (5.4078) grad_norm 2.4722 (3.1992) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:16:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][530/625] eta 0:00:25 lr 0.000024 wd 0.0500 time 0.2546 (0.2636) data time 0.0008 (0.0020) model time 0.2538 (0.2612) loss 5.3576 (5.4093) grad_norm 8.3847 (3.2062) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:16:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][540/625] eta 0:00:22 lr 0.000024 wd 0.0500 time 0.2516 (0.2635) data time 0.0009 (0.0020) model time 0.2507 (0.2610) loss 5.7180 (5.4052) grad_norm 2.6111 (3.2254) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:16:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][550/625] eta 0:00:19 lr 0.000024 wd 0.0500 time 0.2569 (0.2634) data time 0.0006 (0.0019) model time 0.2562 (0.2609) loss 5.4779 (5.4055) grad_norm 3.2125 (3.2269) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:16:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][560/625] eta 0:00:17 lr 0.000024 wd 0.0500 time 0.2549 (0.2636) data time 0.0010 (0.0019) model time 0.2539 (0.2612) loss 5.4012 (5.4072) grad_norm 2.0460 (3.2255) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:16:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][570/625] eta 0:00:14 lr 0.000024 wd 0.0500 time 0.2508 (0.2634) data time 0.0009 (0.0019) model time 0.2499 (0.2611) loss 5.9030 (5.4091) grad_norm 2.1471 (3.2111) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:16:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][580/625] eta 0:00:11 lr 0.000024 wd 0.0500 time 0.2591 (0.2636) data time 0.0008 (0.0019) model time 0.2583 (0.2612) loss 6.6002 (5.4134) grad_norm 3.0980 (3.2055) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:16:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][590/625] eta 0:00:09 lr 0.000024 wd 0.0500 time 0.2567 (0.2637) data time 0.0005 (0.0019) model time 0.2562 (0.2614) loss 4.9301 (5.4138) grad_norm 4.8777 (3.2186) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:16:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][600/625] eta 0:00:06 lr 0.000024 wd 0.0500 time 0.2550 (0.2639) data time 0.0009 (0.0019) model time 0.2541 (0.2616) loss 5.4395 (5.4122) grad_norm 2.0563 (3.2196) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:16:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][610/625] eta 0:00:03 lr 0.000024 wd 0.0500 time 0.2516 (0.2637) data time 0.0004 (0.0018) model time 0.2513 (0.2615) loss 5.8683 (5.4087) grad_norm 1.7841 (3.2163) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:16:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][620/625] eta 0:00:01 lr 0.000024 wd 0.0500 time 0.2547 (0.2636) data time 0.0005 (0.0018) model time 0.2543 (0.2614) loss 5.7058 (5.4041) grad_norm 4.6753 (3.2209) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:16:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 291 training takes 0:02:44
[2024-08-04 11:16:56 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 11:16:57 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 11:16:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.546 (0.546) Loss 0.5996 (0.5996) Acc@1 90.332 (90.332) Acc@5 98.877 (98.877) Mem 9655MB
[2024-08-04 11:16:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.103) Loss 0.8916 (0.7106) Acc@1 81.982 (87.349) Acc@5 96.777 (97.905) Mem 9655MB
[2024-08-04 11:16:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.080) Loss 0.9946 (0.8249) Acc@1 79.150 (84.382) Acc@5 95.508 (96.780) Mem 9655MB
[2024-08-04 11:16:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.057 Acc@5 96.803
[2024-08-04 11:16:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.1%
[2024-08-04 11:16:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 84.06%
[2024-08-04 11:16:59 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-08-04 11:16:59 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-08-04 11:17:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.546 (0.546) Loss 0.5918 (0.5918) Acc@1 90.332 (90.332) Acc@5 98.828 (98.828) Mem 9655MB
[2024-08-04 11:17:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.100) Loss 0.8936 (0.7067) Acc@1 82.373 (87.282) Acc@5 96.680 (97.865) Mem 9655MB
[2024-08-04 11:17:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 0.9990 (0.8232) Acc@1 78.857 (84.212) Acc@5 95.508 (96.749) Mem 9655MB
[2024-08-04 11:17:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.867 Acc@5 96.769
[2024-08-04 11:17:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9%
[2024-08-04 11:17:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.87%
[2024-08-04 11:17:01 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 11:17:02 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 11:17:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][0/625] eta 0:08:22 lr 0.000024 wd 0.0500 time 0.8036 (0.8036) data time 0.5637 (0.5637) model time 0.0000 (0.0000) loss 4.3658 (4.3658) grad_norm 4.2893 (4.2893) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:17:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][10/625] eta 0:03:08 lr 0.000024 wd 0.0500 time 0.2530 (0.3060) data time 0.0007 (0.0520) model time 0.0000 (0.0000) loss 5.5441 (5.1170) grad_norm 4.1496 (3.3809) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:17:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][20/625] eta 0:02:50 lr 0.000024 wd 0.0500 time 0.2567 (0.2821) data time 0.0010 (0.0277) model time 0.0000 (0.0000) loss 5.4645 (5.3571) grad_norm 2.7668 (3.6909) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:17:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][30/625] eta 0:02:46 lr 0.000024 wd 0.0500 time 0.2539 (0.2802) data time 0.0009 (0.0191) model time 0.0000 (0.0000) loss 5.8812 (5.3074) grad_norm 2.5240 (3.4337) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:17:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][40/625] eta 0:02:43 lr 0.000024 wd 0.0500 time 0.2519 (0.2792) data time 0.0008 (0.0146) model time 0.0000 (0.0000) loss 5.1073 (5.3387) grad_norm 3.1418 (3.2293) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:17:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][50/625] eta 0:02:40 lr 0.000024 wd 0.0500 time 0.2541 (0.2785) data time 0.0008 (0.0119) model time 0.0000 (0.0000) loss 5.4004 (5.3577) grad_norm 2.7134 (3.0847) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:17:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][60/625] eta 0:02:36 lr 0.000024 wd 0.0500 time 0.3730 (0.2766) data time 0.0010 (0.0102) model time 0.3720 (0.2662) loss 4.8306 (5.3040) grad_norm 1.9028 (3.0258) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:17:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][70/625] eta 0:02:31 lr 0.000024 wd 0.0500 time 0.2535 (0.2736) data time 0.0006 (0.0088) model time 0.2529 (0.2602) loss 4.7908 (5.3217) grad_norm 2.8838 (3.0161) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:17:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][80/625] eta 0:02:29 lr 0.000024 wd 0.0500 time 0.2525 (0.2736) data time 0.0007 (0.0079) model time 0.2519 (0.2646) loss 4.9928 (5.3359) grad_norm 3.1462 (3.0319) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:17:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][90/625] eta 0:02:25 lr 0.000024 wd 0.0500 time 0.2548 (0.2718) data time 0.0006 (0.0071) model time 0.2542 (0.2625) loss 4.8847 (5.3168) grad_norm 2.8503 (3.1114) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:17:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][100/625] eta 0:02:22 lr 0.000024 wd 0.0500 time 0.2593 (0.2723) data time 0.0009 (0.0065) model time 0.2584 (0.2652) loss 5.1224 (5.3478) grad_norm 2.4165 (3.0905) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:17:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][110/625] eta 0:02:19 lr 0.000024 wd 0.0500 time 0.2577 (0.2708) data time 0.0006 (0.0060) model time 0.2571 (0.2634) loss 5.5684 (5.3704) grad_norm 3.5270 (3.3192) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:17:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][120/625] eta 0:02:16 lr 0.000024 wd 0.0500 time 0.2625 (0.2696) data time 0.0008 (0.0055) model time 0.2618 (0.2623) loss 6.2345 (5.3870) grad_norm 2.3332 (3.2732) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:17:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][130/625] eta 0:02:12 lr 0.000024 wd 0.0500 time 0.2502 (0.2686) data time 0.0009 (0.0052) model time 0.2493 (0.2614) loss 6.6339 (5.4011) grad_norm 2.3015 (3.2407) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:17:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][140/625] eta 0:02:09 lr 0.000024 wd 0.0500 time 0.2553 (0.2677) data time 0.0008 (0.0049) model time 0.2545 (0.2607) loss 5.9713 (5.3954) grad_norm 2.3118 (3.2425) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:17:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][150/625] eta 0:02:06 lr 0.000024 wd 0.0500 time 0.2589 (0.2669) data time 0.0010 (0.0046) model time 0.2579 (0.2602) loss 5.8419 (5.3916) grad_norm 2.2454 (3.2295) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:17:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][160/625] eta 0:02:04 lr 0.000024 wd 0.0500 time 0.2592 (0.2675) data time 0.0007 (0.0044) model time 0.2585 (0.2615) loss 4.6172 (5.4083) grad_norm 2.3987 (3.2274) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:17:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][170/625] eta 0:02:01 lr 0.000024 wd 0.0500 time 0.2513 (0.2669) data time 0.0007 (0.0042) model time 0.2506 (0.2611) loss 5.4547 (5.4033) grad_norm 2.8241 (3.1906) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:17:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][180/625] eta 0:01:58 lr 0.000024 wd 0.0500 time 0.2532 (0.2663) data time 0.0010 (0.0040) model time 0.2522 (0.2606) loss 5.1163 (5.4026) grad_norm 5.2213 (3.2113) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:17:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][190/625] eta 0:01:55 lr 0.000024 wd 0.0500 time 0.2559 (0.2657) data time 0.0009 (0.0038) model time 0.2550 (0.2602) loss 5.4835 (5.3990) grad_norm 2.0777 (3.2210) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:17:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][200/625] eta 0:01:52 lr 0.000024 wd 0.0500 time 0.2570 (0.2652) data time 0.0008 (0.0037) model time 0.2562 (0.2598) loss 5.3077 (5.3924) grad_norm 3.0428 (3.2283) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:17:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][210/625] eta 0:01:50 lr 0.000024 wd 0.0500 time 0.2554 (0.2655) data time 0.0009 (0.0036) model time 0.2546 (0.2604) loss 5.9507 (5.3997) grad_norm 2.7782 (3.2814) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:18:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][220/625] eta 0:01:47 lr 0.000024 wd 0.0500 time 0.2532 (0.2651) data time 0.0007 (0.0034) model time 0.2525 (0.2601) loss 5.5462 (5.3959) grad_norm 2.0339 (3.2415) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:18:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][230/625] eta 0:01:44 lr 0.000024 wd 0.0500 time 0.2557 (0.2648) data time 0.0008 (0.0033) model time 0.2549 (0.2600) loss 5.0633 (5.4097) grad_norm 4.0470 (3.2114) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:18:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][240/625] eta 0:01:42 lr 0.000024 wd 0.0500 time 0.2612 (0.2653) data time 0.0018 (0.0032) model time 0.2594 (0.2608) loss 5.3878 (5.4181) grad_norm 2.9867 (3.1987) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:18:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][250/625] eta 0:01:39 lr 0.000024 wd 0.0500 time 0.2524 (0.2649) data time 0.0009 (0.0031) model time 0.2514 (0.2606) loss 5.8941 (5.4133) grad_norm 2.1927 (3.1819) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:18:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][260/625] eta 0:01:36 lr 0.000024 wd 0.0500 time 0.2530 (0.2646) data time 0.0011 (0.0031) model time 0.2518 (0.2603) loss 5.0952 (5.4159) grad_norm 1.9703 (3.1672) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:18:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][270/625] eta 0:01:33 lr 0.000024 wd 0.0500 time 0.2560 (0.2643) data time 0.0009 (0.0030) model time 0.2551 (0.2601) loss 5.0644 (5.4190) grad_norm 4.0236 (3.1720) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:18:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][280/625] eta 0:01:31 lr 0.000024 wd 0.0500 time 0.2572 (0.2640) data time 0.0008 (0.0029) model time 0.2564 (0.2599) loss 6.2891 (5.4184) grad_norm 2.8898 (3.1920) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:18:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][290/625] eta 0:01:28 lr 0.000024 wd 0.0500 time 0.2627 (0.2639) data time 0.0007 (0.0028) model time 0.2620 (0.2598) loss 5.1502 (5.4113) grad_norm 3.2960 (3.1675) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:18:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][300/625] eta 0:01:25 lr 0.000024 wd 0.0500 time 0.2583 (0.2636) data time 0.0008 (0.0028) model time 0.2574 (0.2597) loss 5.2276 (5.4127) grad_norm 2.0592 (3.1567) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:18:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][310/625] eta 0:01:22 lr 0.000024 wd 0.0500 time 0.2556 (0.2634) data time 0.0008 (0.0027) model time 0.2547 (0.2595) loss 5.3084 (5.4123) grad_norm 3.8836 (3.1772) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:18:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][320/625] eta 0:01:20 lr 0.000023 wd 0.0500 time 0.2565 (0.2631) data time 0.0008 (0.0027) model time 0.2557 (0.2593) loss 5.4377 (5.4039) grad_norm 2.7344 (3.1614) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:18:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][330/625] eta 0:01:17 lr 0.000023 wd 0.0500 time 0.2564 (0.2633) data time 0.0007 (0.0026) model time 0.2557 (0.2597) loss 4.6372 (5.3999) grad_norm 3.8290 (3.1955) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:18:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][340/625] eta 0:01:14 lr 0.000023 wd 0.0500 time 0.2562 (0.2631) data time 0.0009 (0.0026) model time 0.2553 (0.2595) loss 4.8853 (5.3928) grad_norm 3.4483 (3.1845) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:18:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][350/625] eta 0:01:12 lr 0.000023 wd 0.0500 time 0.2561 (0.2635) data time 0.0006 (0.0025) model time 0.2555 (0.2600) loss 6.3520 (5.3888) grad_norm 2.2764 (3.1920) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:18:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][360/625] eta 0:01:09 lr 0.000023 wd 0.0500 time 0.2552 (0.2633) data time 0.0008 (0.0025) model time 0.2544 (0.2599) loss 5.7178 (5.3919) grad_norm 3.4207 (3.1799) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:18:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][370/625] eta 0:01:07 lr 0.000023 wd 0.0500 time 0.2528 (0.2631) data time 0.0010 (0.0024) model time 0.2517 (0.2597) loss 4.9458 (5.3862) grad_norm 2.6118 (3.1774) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:18:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][380/625] eta 0:01:04 lr 0.000023 wd 0.0500 time 0.2526 (0.2633) data time 0.0009 (0.0024) model time 0.2517 (0.2600) loss 4.7425 (5.3767) grad_norm 2.9073 (3.1688) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:18:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][390/625] eta 0:01:01 lr 0.000023 wd 0.0500 time 0.2596 (0.2631) data time 0.0007 (0.0024) model time 0.2590 (0.2599) loss 5.0629 (5.3767) grad_norm 2.9871 (3.2036) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:18:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][400/625] eta 0:00:59 lr 0.000023 wd 0.0500 time 0.2537 (0.2637) data time 0.0006 (0.0023) model time 0.2530 (0.2606) loss 6.1073 (5.3748) grad_norm 3.2639 (3.2482) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:18:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][410/625] eta 0:00:56 lr 0.000023 wd 0.0500 time 0.2577 (0.2635) data time 0.0011 (0.0023) model time 0.2567 (0.2605) loss 4.8748 (5.3688) grad_norm 2.6490 (3.2304) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:18:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][420/625] eta 0:00:53 lr 0.000023 wd 0.0500 time 0.2563 (0.2634) data time 0.0012 (0.0023) model time 0.2552 (0.2604) loss 5.4378 (5.3710) grad_norm 2.0125 (3.2158) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:18:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][430/625] eta 0:00:51 lr 0.000023 wd 0.0500 time 0.2518 (0.2632) data time 0.0007 (0.0022) model time 0.2511 (0.2602) loss 5.1262 (5.3680) grad_norm 4.3486 (3.2151) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:18:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][440/625] eta 0:00:48 lr 0.000023 wd 0.0500 time 0.2598 (0.2635) data time 0.0007 (0.0022) model time 0.2591 (0.2606) loss 5.0735 (5.3679) grad_norm 2.1051 (3.2047) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:19:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][450/625] eta 0:00:46 lr 0.000023 wd 0.0500 time 0.2581 (0.2638) data time 0.0008 (0.0022) model time 0.2574 (0.2610) loss 5.7756 (5.3732) grad_norm 3.0095 (3.2020) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:19:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][460/625] eta 0:00:43 lr 0.000023 wd 0.0500 time 0.2588 (0.2636) data time 0.0006 (0.0021) model time 0.2582 (0.2609) loss 6.1975 (5.3800) grad_norm 2.4656 (3.1909) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:19:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][470/625] eta 0:00:40 lr 0.000023 wd 0.0500 time 0.2522 (0.2635) data time 0.0006 (0.0021) model time 0.2516 (0.2607) loss 4.6213 (5.3808) grad_norm 2.5034 (3.1780) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:19:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][480/625] eta 0:00:38 lr 0.000023 wd 0.0500 time 0.2551 (0.2633) data time 0.0006 (0.0021) model time 0.2545 (0.2606) loss 5.7968 (5.3773) grad_norm 1.8045 (3.1762) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:19:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][490/625] eta 0:00:35 lr 0.000023 wd 0.0500 time 0.2542 (0.2632) data time 0.0006 (0.0021) model time 0.2536 (0.2605) loss 4.7639 (5.3793) grad_norm 2.0332 (3.1720) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:19:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][500/625] eta 0:00:32 lr 0.000023 wd 0.0500 time 0.2544 (0.2630) data time 0.0009 (0.0020) model time 0.2535 (0.2604) loss 5.6744 (5.3828) grad_norm 3.3056 (3.1822) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:19:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][510/625] eta 0:00:30 lr 0.000023 wd 0.0500 time 0.2552 (0.2636) data time 0.0008 (0.0020) model time 0.2543 (0.2610) loss 4.5754 (5.3832) grad_norm 2.4816 (3.2659) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:19:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][520/625] eta 0:00:27 lr 0.000023 wd 0.0500 time 0.2554 (0.2634) data time 0.0008 (0.0020) model time 0.2545 (0.2608) loss 5.2838 (5.3783) grad_norm 3.1749 (3.2603) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:19:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][530/625] eta 0:00:25 lr 0.000023 wd 0.0500 time 0.2563 (0.2633) data time 0.0006 (0.0020) model time 0.2557 (0.2607) loss 5.8980 (5.3780) grad_norm 2.7710 (3.2424) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:19:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][540/625] eta 0:00:22 lr 0.000023 wd 0.0500 time 0.2555 (0.2631) data time 0.0010 (0.0020) model time 0.2545 (0.2606) loss 5.1026 (5.3781) grad_norm 3.1792 (3.2425) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:19:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][550/625] eta 0:00:19 lr 0.000023 wd 0.0500 time 0.2551 (0.2630) data time 0.0011 (0.0019) model time 0.2539 (0.2605) loss 4.7494 (5.3749) grad_norm 2.4832 (3.2434) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:19:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][560/625] eta 0:00:17 lr 0.000023 wd 0.0500 time 0.2554 (0.2629) data time 0.0008 (0.0019) model time 0.2546 (0.2604) loss 6.2701 (5.3789) grad_norm 3.5562 (3.2530) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:19:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][570/625] eta 0:00:14 lr 0.000023 wd 0.0500 time 0.2593 (0.2631) data time 0.0008 (0.0019) model time 0.2586 (0.2606) loss 5.9928 (5.3859) grad_norm 2.4553 (3.2420) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:19:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][580/625] eta 0:00:11 lr 0.000023 wd 0.0500 time 0.2577 (0.2629) data time 0.0011 (0.0019) model time 0.2566 (0.2605) loss 4.9754 (5.3814) grad_norm 3.7710 (3.2472) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:19:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][590/625] eta 0:00:09 lr 0.000023 wd 0.0500 time 0.2541 (0.2628) data time 0.0007 (0.0019) model time 0.2534 (0.2604) loss 5.6801 (5.3814) grad_norm 3.4598 (3.2481) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:19:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][600/625] eta 0:00:06 lr 0.000023 wd 0.0500 time 0.2597 (0.2627) data time 0.0010 (0.0019) model time 0.2587 (0.2603) loss 5.8726 (5.3816) grad_norm 4.1551 (inf) loss_scale 128.0000 (255.3611) mem 9655MB
[2024-08-04 11:19:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][610/625] eta 0:00:03 lr 0.000023 wd 0.0500 time 0.2520 (0.2629) data time 0.0006 (0.0019) model time 0.2515 (0.2606) loss 5.0145 (5.3832) grad_norm 1.9785 (inf) loss_scale 128.0000 (253.2766) mem 9655MB
[2024-08-04 11:19:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][620/625] eta 0:00:01 lr 0.000023 wd 0.0500 time 0.2544 (0.2628) data time 0.0005 (0.0018) model time 0.2539 (0.2604) loss 4.6560 (5.3802) grad_norm 1.9960 (inf) loss_scale 128.0000 (251.2593) mem 9655MB
[2024-08-04 11:19:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 292 training takes 0:02:44
[2024-08-04 11:19:46 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 11:19:46 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 11:19:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.514 (0.514) Loss 0.6006 (0.6006) Acc@1 90.430 (90.430) Acc@5 98.828 (98.828) Mem 9655MB
[2024-08-04 11:19:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 0.8984 (0.7105) Acc@1 82.080 (87.322) Acc@5 96.729 (97.865) Mem 9655MB
[2024-08-04 11:19:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 0.9912 (0.8251) Acc@1 79.443 (84.380) Acc@5 95.654 (96.768) Mem 9655MB
[2024-08-04 11:19:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.061 Acc@5 96.791
[2024-08-04 11:19:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.1%
[2024-08-04 11:19:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 84.06%
[2024-08-04 11:19:48 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saving......
[2024-08-04 11:19:49 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt.pth saved !!!
[2024-08-04 11:19:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.475 (0.475) Loss 0.5918 (0.5918) Acc@1 90.332 (90.332) Acc@5 98.828 (98.828) Mem 9655MB
[2024-08-04 11:19:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.094) Loss 0.8931 (0.7065) Acc@1 82.471 (87.318) Acc@5 96.680 (97.865) Mem 9655MB
[2024-08-04 11:19:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.075) Loss 0.9976 (0.8229) Acc@1 78.857 (84.235) Acc@5 95.508 (96.742) Mem 9655MB
[2024-08-04 11:19:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.893 Acc@5 96.765
[2024-08-04 11:19:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9%
[2024-08-04 11:19:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.89%
[2024-08-04 11:19:51 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 11:19:51 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 11:19:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][0/625] eta 0:07:50 lr 0.000023 wd 0.0500 time 0.7521 (0.7521) data time 0.5103 (0.5103) model time 0.0000 (0.0000) loss 4.8101 (4.8101) grad_norm 3.3484 (3.3484) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:19:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][10/625] eta 0:03:23 lr 0.000023 wd 0.0500 time 0.3832 (0.3312) data time 0.0010 (0.0473) model time 0.0000 (0.0000) loss 5.4408 (5.3022) grad_norm 2.5833 (2.6619) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:19:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][20/625] eta 0:02:58 lr 0.000023 wd 0.0500 time 0.2555 (0.2954) data time 0.0007 (0.0252) model time 0.0000 (0.0000) loss 5.2917 (5.3480) grad_norm 2.2756 (3.0141) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:20:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][30/625] eta 0:02:48 lr 0.000023 wd 0.0500 time 0.2646 (0.2833) data time 0.0010 (0.0174) model time 0.0000 (0.0000) loss 4.6507 (5.3395) grad_norm 1.7412 (2.9662) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:20:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][40/625] eta 0:02:41 lr 0.000023 wd 0.0500 time 0.2569 (0.2769) data time 0.0007 (0.0134) model time 0.0000 (0.0000) loss 4.5385 (5.3681) grad_norm 2.9877 (2.9143) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:20:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][50/625] eta 0:02:40 lr 0.000023 wd 0.0500 time 0.2550 (0.2793) data time 0.0011 (0.0110) model time 0.0000 (0.0000) loss 5.2317 (5.3476) grad_norm 3.6432 (3.0001) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:20:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][60/625] eta 0:02:38 lr 0.000023 wd 0.0500 time 0.3774 (0.2802) data time 0.0007 (0.0093) model time 0.3767 (0.2837) loss 6.9707 (5.3648) grad_norm 18.8232 (3.2678) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:20:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][70/625] eta 0:02:33 lr 0.000023 wd 0.0500 time 0.2515 (0.2766) data time 0.0010 (0.0081) model time 0.2504 (0.2688) loss 5.5474 (5.4062) grad_norm 3.0811 (3.1932) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:20:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][80/625] eta 0:02:29 lr 0.000023 wd 0.0500 time 0.2509 (0.2740) data time 0.0008 (0.0072) model time 0.2501 (0.2639) loss 5.2203 (5.3855) grad_norm 2.8343 (3.1359) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:20:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][90/625] eta 0:02:25 lr 0.000023 wd 0.0500 time 0.2553 (0.2720) data time 0.0008 (0.0066) model time 0.2545 (0.2616) loss 5.7928 (5.3509) grad_norm 3.8361 (3.1579) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:20:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][100/625] eta 0:02:23 lr 0.000023 wd 0.0500 time 0.2543 (0.2741) data time 0.0009 (0.0060) model time 0.2534 (0.2679) loss 5.8195 (5.3687) grad_norm 2.1110 (3.1566) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:20:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][110/625] eta 0:02:21 lr 0.000023 wd 0.0500 time 0.2555 (0.2739) data time 0.0006 (0.0055) model time 0.2549 (0.2684) loss 5.6454 (5.3455) grad_norm 2.3822 (3.1367) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:20:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][120/625] eta 0:02:17 lr 0.000023 wd 0.0500 time 0.2575 (0.2727) data time 0.0007 (0.0052) model time 0.2567 (0.2670) loss 5.4975 (5.3578) grad_norm 3.6742 (3.1027) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:20:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][130/625] eta 0:02:14 lr 0.000023 wd 0.0500 time 0.2561 (0.2714) data time 0.0008 (0.0048) model time 0.2554 (0.2655) loss 5.1506 (5.3250) grad_norm 2.4938 (3.0932) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:20:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][140/625] eta 0:02:12 lr 0.000023 wd 0.0500 time 0.2562 (0.2730) data time 0.0011 (0.0046) model time 0.2551 (0.2686) loss 5.6947 (5.3293) grad_norm 1.8431 (3.0819) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:20:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][150/625] eta 0:02:10 lr 0.000023 wd 0.0500 time 0.2556 (0.2753) data time 0.0010 (0.0043) model time 0.2547 (0.2724) loss 6.3374 (5.3222) grad_norm 2.4473 (3.0518) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:20:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][160/625] eta 0:02:07 lr 0.000023 wd 0.0500 time 0.2567 (0.2741) data time 0.0007 (0.0041) model time 0.2560 (0.2708) loss 6.1943 (5.3343) grad_norm 4.2858 (3.1581) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:20:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][170/625] eta 0:02:04 lr 0.000023 wd 0.0500 time 0.2564 (0.2731) data time 0.0008 (0.0039) model time 0.2556 (0.2695) loss 5.1635 (5.3356) grad_norm 1.7739 (3.1195) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:20:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][180/625] eta 0:02:01 lr 0.000023 wd 0.0500 time 0.2568 (0.2730) data time 0.0007 (0.0038) model time 0.2561 (0.2695) loss 5.1926 (5.3487) grad_norm 5.3735 (3.1275) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:20:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][190/625] eta 0:01:58 lr 0.000023 wd 0.0500 time 0.2513 (0.2720) data time 0.0010 (0.0036) model time 0.2503 (0.2684) loss 4.9762 (5.3485) grad_norm 2.4649 (3.0940) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:20:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][200/625] eta 0:01:55 lr 0.000023 wd 0.0500 time 0.2516 (0.2712) data time 0.0008 (0.0035) model time 0.2508 (0.2676) loss 5.7706 (5.3563) grad_norm 11.7537 (3.1889) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:20:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][210/625] eta 0:01:52 lr 0.000023 wd 0.0500 time 0.2528 (0.2705) data time 0.0008 (0.0034) model time 0.2520 (0.2667) loss 6.3519 (5.3487) grad_norm 3.1961 (3.1705) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:20:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][220/625] eta 0:01:49 lr 0.000023 wd 0.0500 time 0.2576 (0.2699) data time 0.0006 (0.0033) model time 0.2570 (0.2661) loss 5.9613 (5.3508) grad_norm 2.7253 (3.1354) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:20:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][230/625] eta 0:01:46 lr 0.000023 wd 0.0500 time 0.2565 (0.2693) data time 0.0010 (0.0032) model time 0.2555 (0.2656) loss 6.1711 (5.3458) grad_norm 2.3163 (3.1340) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:20:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][240/625] eta 0:01:43 lr 0.000023 wd 0.0500 time 0.2550 (0.2687) data time 0.0008 (0.0031) model time 0.2542 (0.2650) loss 5.0638 (5.3540) grad_norm 3.5524 (3.1005) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:20:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][250/625] eta 0:01:40 lr 0.000023 wd 0.0500 time 0.2494 (0.2682) data time 0.0009 (0.0030) model time 0.2485 (0.2645) loss 5.4034 (5.3546) grad_norm 2.7376 (3.0979) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:21:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][260/625] eta 0:01:38 lr 0.000023 wd 0.0500 time 0.2537 (0.2685) data time 0.0009 (0.0029) model time 0.2528 (0.2649) loss 4.6511 (5.3523) grad_norm 2.0930 (3.0695) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:21:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][270/625] eta 0:01:35 lr 0.000023 wd 0.0500 time 0.2623 (0.2681) data time 0.0009 (0.0028) model time 0.2614 (0.2646) loss 6.1895 (5.3587) grad_norm 2.5096 (3.0521) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:21:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][280/625] eta 0:01:32 lr 0.000023 wd 0.0500 time 0.2532 (0.2677) data time 0.0013 (0.0028) model time 0.2519 (0.2642) loss 5.6251 (5.3534) grad_norm 1.9793 (3.0514) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:21:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][290/625] eta 0:01:29 lr 0.000023 wd 0.0500 time 0.2553 (0.2673) data time 0.0010 (0.0027) model time 0.2542 (0.2638) loss 5.6579 (5.3518) grad_norm 2.1170 (3.0471) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:21:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][300/625] eta 0:01:26 lr 0.000023 wd 0.0500 time 0.2535 (0.2669) data time 0.0012 (0.0026) model time 0.2523 (0.2634) loss 5.9808 (5.3520) grad_norm 1.9929 (3.0491) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:21:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][310/625] eta 0:01:23 lr 0.000023 wd 0.0500 time 0.2573 (0.2665) data time 0.0008 (0.0026) model time 0.2565 (0.2631) loss 4.3934 (5.3558) grad_norm 2.9472 (3.0322) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:21:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][320/625] eta 0:01:21 lr 0.000023 wd 0.0500 time 0.2552 (0.2662) data time 0.0008 (0.0025) model time 0.2543 (0.2628) loss 4.6226 (5.3500) grad_norm 3.4295 (3.0220) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:21:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][330/625] eta 0:01:18 lr 0.000023 wd 0.0500 time 0.2548 (0.2659) data time 0.0011 (0.0025) model time 0.2537 (0.2625) loss 5.5086 (5.3593) grad_norm 2.7080 (3.0115) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:21:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][340/625] eta 0:01:15 lr 0.000023 wd 0.0500 time 0.2533 (0.2662) data time 0.0007 (0.0024) model time 0.2526 (0.2630) loss 4.2465 (5.3453) grad_norm 4.0148 (3.0095) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:21:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][350/625] eta 0:01:13 lr 0.000023 wd 0.0500 time 0.2532 (0.2659) data time 0.0010 (0.0024) model time 0.2522 (0.2627) loss 5.4000 (5.3473) grad_norm 2.6017 (2.9938) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:21:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][360/625] eta 0:01:10 lr 0.000023 wd 0.0500 time 0.2568 (0.2657) data time 0.0010 (0.0024) model time 0.2558 (0.2625) loss 4.5632 (5.3495) grad_norm 3.0186 (2.9877) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:21:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][370/625] eta 0:01:07 lr 0.000023 wd 0.0500 time 0.2535 (0.2659) data time 0.0008 (0.0023) model time 0.2527 (0.2628) loss 4.5775 (5.3473) grad_norm 2.3190 (2.9823) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:21:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][380/625] eta 0:01:05 lr 0.000023 wd 0.0500 time 0.2768 (0.2660) data time 0.0006 (0.0023) model time 0.2762 (0.2630) loss 4.8165 (5.3481) grad_norm 2.0228 (2.9782) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:21:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][390/625] eta 0:01:02 lr 0.000023 wd 0.0500 time 0.2554 (0.2660) data time 0.0008 (0.0023) model time 0.2546 (0.2631) loss 5.3520 (5.3520) grad_norm 2.6143 (3.0168) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:21:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][400/625] eta 0:00:59 lr 0.000023 wd 0.0500 time 0.2563 (0.2658) data time 0.0007 (0.0022) model time 0.2556 (0.2629) loss 5.0298 (5.3517) grad_norm 2.1525 (3.0161) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:21:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][410/625] eta 0:00:57 lr 0.000023 wd 0.0500 time 0.2490 (0.2655) data time 0.0010 (0.0022) model time 0.2480 (0.2627) loss 5.3986 (5.3496) grad_norm 2.4073 (3.0170) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:21:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][420/625] eta 0:00:54 lr 0.000022 wd 0.0500 time 0.2559 (0.2653) data time 0.0009 (0.0022) model time 0.2550 (0.2624) loss 5.7567 (5.3469) grad_norm 3.9645 (3.0450) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:21:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][430/625] eta 0:00:51 lr 0.000022 wd 0.0500 time 0.2584 (0.2652) data time 0.0008 (0.0021) model time 0.2577 (0.2623) loss 5.8635 (5.3448) grad_norm 1.9869 (3.0443) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:21:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][440/625] eta 0:00:49 lr 0.000022 wd 0.0500 time 0.2592 (0.2650) data time 0.0006 (0.0021) model time 0.2586 (0.2621) loss 4.5614 (5.3494) grad_norm 2.9198 (3.0720) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:21:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][450/625] eta 0:00:46 lr 0.000022 wd 0.0500 time 0.2491 (0.2648) data time 0.0010 (0.0021) model time 0.2482 (0.2620) loss 5.5167 (5.3515) grad_norm 3.4785 (3.0821) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:21:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][460/625] eta 0:00:43 lr 0.000022 wd 0.0500 time 0.2621 (0.2649) data time 0.0008 (0.0021) model time 0.2613 (0.2622) loss 3.8992 (5.3480) grad_norm 1.8157 (3.0751) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:21:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][470/625] eta 0:00:41 lr 0.000022 wd 0.0500 time 0.2580 (0.2648) data time 0.0007 (0.0020) model time 0.2572 (0.2621) loss 5.2140 (5.3508) grad_norm 4.0088 (3.0702) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:21:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][480/625] eta 0:00:38 lr 0.000022 wd 0.0500 time 0.2556 (0.2649) data time 0.0009 (0.0020) model time 0.2548 (0.2623) loss 6.4081 (5.3512) grad_norm 2.4116 (3.0645) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:22:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][490/625] eta 0:00:35 lr 0.000022 wd 0.0500 time 0.2531 (0.2647) data time 0.0011 (0.0020) model time 0.2520 (0.2621) loss 5.7626 (5.3525) grad_norm 2.1809 (3.0622) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:22:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][500/625] eta 0:00:33 lr 0.000022 wd 0.0500 time 0.2533 (0.2646) data time 0.0007 (0.0020) model time 0.2526 (0.2620) loss 5.8546 (5.3548) grad_norm 1.8490 (3.0479) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:22:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][510/625] eta 0:00:30 lr 0.000022 wd 0.0500 time 0.2598 (0.2644) data time 0.0006 (0.0019) model time 0.2592 (0.2618) loss 5.8046 (5.3565) grad_norm 4.0106 (3.0418) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:22:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][520/625] eta 0:00:27 lr 0.000022 wd 0.0500 time 0.2539 (0.2643) data time 0.0011 (0.0019) model time 0.2527 (0.2617) loss 5.8451 (5.3606) grad_norm 1.7344 (3.0401) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:22:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][530/625] eta 0:00:25 lr 0.000022 wd 0.0500 time 0.2536 (0.2641) data time 0.0007 (0.0019) model time 0.2529 (0.2616) loss 5.3954 (5.3560) grad_norm 2.8822 (3.0393) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:22:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][540/625] eta 0:00:22 lr 0.000022 wd 0.0500 time 0.2554 (0.2640) data time 0.0009 (0.0019) model time 0.2545 (0.2615) loss 5.6289 (5.3538) grad_norm 2.0152 (3.0426) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:22:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][550/625] eta 0:00:19 lr 0.000022 wd 0.0500 time 0.2545 (0.2639) data time 0.0008 (0.0019) model time 0.2537 (0.2613) loss 4.4637 (5.3491) grad_norm 3.5522 (3.0984) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:22:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][560/625] eta 0:00:17 lr 0.000022 wd 0.0500 time 0.2537 (0.2641) data time 0.0009 (0.0019) model time 0.2528 (0.2616) loss 4.7075 (5.3453) grad_norm 3.4679 (3.1053) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:22:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][570/625] eta 0:00:14 lr 0.000022 wd 0.0500 time 0.2552 (0.2639) data time 0.0007 (0.0018) model time 0.2545 (0.2615) loss 4.6624 (5.3430) grad_norm 2.4089 (3.0994) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:22:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][580/625] eta 0:00:11 lr 0.000022 wd 0.0500 time 0.2581 (0.2638) data time 0.0006 (0.0018) model time 0.2575 (0.2614) loss 4.8231 (5.3432) grad_norm 2.6760 (3.0973) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:22:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][590/625] eta 0:00:09 lr 0.000022 wd 0.0500 time 0.2522 (0.2636) data time 0.0008 (0.0018) model time 0.2514 (0.2612) loss 5.7142 (5.3446) grad_norm 2.2316 (3.0991) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:22:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][600/625] eta 0:00:06 lr 0.000022 wd 0.0500 time 0.2551 (0.2638) data time 0.0009 (0.0018) model time 0.2542 (0.2615) loss 5.2450 (5.3413) grad_norm 2.6297 (3.0961) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:22:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][610/625] eta 0:00:03 lr 0.000022 wd 0.0500 time 0.2516 (0.2637) data time 0.0006 (0.0018) model time 0.2510 (0.2613) loss 5.2354 (5.3365) grad_norm 2.6589 (3.1020) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:22:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][620/625] eta 0:00:01 lr 0.000022 wd 0.0500 time 0.2531 (0.2635) data time 0.0004 (0.0018) model time 0.2527 (0.2612) loss 4.6742 (5.3385) grad_norm 2.6911 (3.1095) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:22:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 293 training takes 0:02:44
[2024-08-04 11:22:36 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 11:22:36 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 11:22:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.507 (0.507) Loss 0.6055 (0.6055) Acc@1 90.186 (90.186) Acc@5 98.877 (98.877) Mem 9655MB
[2024-08-04 11:22:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.098) Loss 0.8979 (0.7130) Acc@1 82.129 (87.300) Acc@5 96.875 (97.909) Mem 9655MB
[2024-08-04 11:22:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 0.9995 (0.8285) Acc@1 78.955 (84.301) Acc@5 95.557 (96.794) Mem 9655MB
[2024-08-04 11:22:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.979 Acc@5 96.817
[2024-08-04 11:22:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.0%
[2024-08-04 11:22:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.693 (0.693) Loss 0.5923 (0.5923) Acc@1 90.283 (90.283) Acc@5 98.828 (98.828) Mem 9655MB
[2024-08-04 11:22:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.121) Loss 0.8926 (0.7068) Acc@1 82.520 (87.327) Acc@5 96.631 (97.865) Mem 9655MB
[2024-08-04 11:22:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.089) Loss 0.9976 (0.8231) Acc@1 78.857 (84.254) Acc@5 95.459 (96.749) Mem 9655MB
[2024-08-04 11:22:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.911 Acc@5 96.769
[2024-08-04 11:22:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9%
[2024-08-04 11:22:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.91%
[2024-08-04 11:22:40 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 11:22:41 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 11:22:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][0/625] eta 0:07:08 lr 0.000022 wd 0.0500 time 0.6857 (0.6857) data time 0.4353 (0.4353) model time 0.0000 (0.0000) loss 5.0730 (5.0730) grad_norm 2.7447 (2.7447) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:22:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][10/625] eta 0:03:08 lr 0.000022 wd 0.0500 time 0.2512 (0.3072) data time 0.0010 (0.0404) model time 0.0000 (0.0000) loss 5.3321 (5.2440) grad_norm 2.5990 (2.6008) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:22:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][20/625] eta 0:02:51 lr 0.000022 wd 0.0500 time 0.2554 (0.2831) data time 0.0010 (0.0216) model time 0.0000 (0.0000) loss 5.3382 (5.2713) grad_norm 2.7854 (3.4069) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:22:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][30/625] eta 0:02:46 lr 0.000022 wd 0.0500 time 0.2573 (0.2802) data time 0.0008 (0.0149) model time 0.0000 (0.0000) loss 5.5228 (5.3361) grad_norm 2.3389 (3.0926) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:22:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][40/625] eta 0:02:43 lr 0.000022 wd 0.0500 time 0.2556 (0.2786) data time 0.0007 (0.0115) model time 0.0000 (0.0000) loss 4.9619 (5.3273) grad_norm 1.5735 (3.1120) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:22:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][50/625] eta 0:02:37 lr 0.000022 wd 0.0500 time 0.2570 (0.2742) data time 0.0009 (0.0094) model time 0.0000 (0.0000) loss 5.7245 (5.3893) grad_norm 2.1738 (3.1345) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:22:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][60/625] eta 0:02:33 lr 0.000022 wd 0.0500 time 0.2513 (0.2711) data time 0.0009 (0.0081) model time 0.2504 (0.2541) loss 4.8431 (5.3539) grad_norm 3.4643 (3.2856) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:23:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][70/625] eta 0:02:30 lr 0.000022 wd 0.0500 time 0.2531 (0.2715) data time 0.0011 (0.0071) model time 0.2521 (0.2635) loss 6.3658 (5.3797) grad_norm 3.3632 (3.9081) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:23:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][80/625] eta 0:02:28 lr 0.000022 wd 0.0500 time 0.2564 (0.2721) data time 0.0006 (0.0063) model time 0.2557 (0.2675) loss 4.2956 (5.3726) grad_norm 2.4066 (4.3027) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:23:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][90/625] eta 0:02:24 lr 0.000022 wd 0.0500 time 0.2562 (0.2704) data time 0.0010 (0.0058) model time 0.2552 (0.2645) loss 5.1675 (5.3872) grad_norm 2.4650 (4.5495) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:23:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][100/625] eta 0:02:21 lr 0.000022 wd 0.0500 time 0.2558 (0.2690) data time 0.0016 (0.0053) model time 0.2542 (0.2626) loss 6.6499 (5.4134) grad_norm 2.6282 (4.3738) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:23:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][110/625] eta 0:02:17 lr 0.000022 wd 0.0500 time 0.2539 (0.2677) data time 0.0007 (0.0049) model time 0.2532 (0.2612) loss 6.0691 (5.4149) grad_norm 2.1484 (4.2358) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:23:13 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][120/625] eta 0:02:15 lr 0.000022 wd 0.0500 time 0.2562 (0.2682) data time 0.0013 (0.0046) model time 0.2549 (0.2628) loss 4.5350 (5.4088) grad_norm 2.3617 (4.1368) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:23:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][130/625] eta 0:02:12 lr 0.000022 wd 0.0500 time 0.2531 (0.2674) data time 0.0008 (0.0043) model time 0.2523 (0.2620) loss 5.4146 (5.4035) grad_norm 1.9327 (4.0139) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:23:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][140/625] eta 0:02:10 lr 0.000022 wd 0.0500 time 0.2576 (0.2681) data time 0.0010 (0.0041) model time 0.2566 (0.2636) loss 5.8040 (5.4170) grad_norm 1.9399 (3.9455) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:23:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][150/625] eta 0:02:06 lr 0.000022 wd 0.0500 time 0.2551 (0.2673) data time 0.0006 (0.0039) model time 0.2544 (0.2627) loss 6.0808 (5.4192) grad_norm 2.1465 (3.8781) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:23:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][160/625] eta 0:02:03 lr 0.000022 wd 0.0500 time 0.2593 (0.2666) data time 0.0006 (0.0037) model time 0.2587 (0.2622) loss 4.9066 (5.3957) grad_norm 3.0476 (3.8060) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:23:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][170/625] eta 0:02:01 lr 0.000022 wd 0.0500 time 0.2589 (0.2661) data time 0.0008 (0.0035) model time 0.2582 (0.2617) loss 5.2433 (5.3867) grad_norm 2.1877 (3.8356) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:23:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][180/625] eta 0:01:58 lr 0.000022 wd 0.0500 time 0.2572 (0.2663) data time 0.0010 (0.0034) model time 0.2562 (0.2622) loss 5.0280 (5.3901) grad_norm 2.5274 (3.7685) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:23:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][190/625] eta 0:01:55 lr 0.000022 wd 0.0500 time 0.2572 (0.2657) data time 0.0007 (0.0032) model time 0.2564 (0.2617) loss 5.7287 (5.3972) grad_norm 2.6252 (3.7487) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:23:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][200/625] eta 0:01:52 lr 0.000022 wd 0.0500 time 0.2671 (0.2654) data time 0.0010 (0.0031) model time 0.2660 (0.2614) loss 5.6741 (5.4041) grad_norm 2.6861 (3.6977) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:23:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][210/625] eta 0:01:50 lr 0.000022 wd 0.0500 time 0.2523 (0.2659) data time 0.0007 (0.0030) model time 0.2516 (0.2623) loss 5.6229 (5.4167) grad_norm 1.8363 (3.6700) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:23:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][220/625] eta 0:01:48 lr 0.000022 wd 0.0500 time 0.4639 (0.2673) data time 0.0008 (0.0029) model time 0.4631 (0.2642) loss 5.2107 (5.4068) grad_norm 3.1686 (3.6580) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:23:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][230/625] eta 0:01:45 lr 0.000022 wd 0.0500 time 0.2566 (0.2668) data time 0.0009 (0.0028) model time 0.2557 (0.2637) loss 5.0695 (5.3842) grad_norm 3.4533 (3.6408) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:23:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][240/625] eta 0:01:42 lr 0.000022 wd 0.0500 time 0.2561 (0.2664) data time 0.0009 (0.0028) model time 0.2553 (0.2633) loss 4.7503 (5.3808) grad_norm 2.0873 (3.5892) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:23:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][250/625] eta 0:01:39 lr 0.000022 wd 0.0500 time 0.2525 (0.2659) data time 0.0006 (0.0027) model time 0.2519 (0.2629) loss 5.7240 (5.3839) grad_norm 2.6465 (3.6981) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:23:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][260/625] eta 0:01:36 lr 0.000022 wd 0.0500 time 0.2527 (0.2655) data time 0.0009 (0.0026) model time 0.2517 (0.2625) loss 4.5160 (5.3718) grad_norm 3.5284 (3.6834) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:23:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][270/625] eta 0:01:34 lr 0.000022 wd 0.0500 time 0.2516 (0.2658) data time 0.0010 (0.0026) model time 0.2506 (0.2630) loss 5.5214 (5.3775) grad_norm 2.9481 (3.6677) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:23:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][280/625] eta 0:01:31 lr 0.000022 wd 0.0500 time 0.2549 (0.2655) data time 0.0006 (0.0025) model time 0.2542 (0.2626) loss 5.7191 (5.3791) grad_norm 4.9651 (3.6454) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:23:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][290/625] eta 0:01:29 lr 0.000022 wd 0.0500 time 0.2584 (0.2659) data time 0.0008 (0.0024) model time 0.2576 (0.2632) loss 4.5127 (5.3742) grad_norm 2.5286 (3.6218) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:24:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][300/625] eta 0:01:26 lr 0.000022 wd 0.0500 time 0.2517 (0.2662) data time 0.0008 (0.0024) model time 0.2509 (0.2636) loss 5.3047 (5.3717) grad_norm 2.9864 (3.6189) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:24:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][310/625] eta 0:01:23 lr 0.000022 wd 0.0500 time 0.2560 (0.2665) data time 0.0009 (0.0023) model time 0.2551 (0.2640) loss 4.6767 (5.3744) grad_norm 2.6500 (3.6279) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:24:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][320/625] eta 0:01:21 lr 0.000022 wd 0.0500 time 0.2595 (0.2662) data time 0.0008 (0.0023) model time 0.2587 (0.2637) loss 6.1000 (5.3748) grad_norm 3.7064 (3.6014) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:24:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][330/625] eta 0:01:18 lr 0.000022 wd 0.0500 time 0.4368 (0.2664) data time 0.0009 (0.0023) model time 0.4360 (0.2641) loss 5.7703 (5.3813) grad_norm 2.4429 (3.6355) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:24:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][340/625] eta 0:01:15 lr 0.000022 wd 0.0500 time 0.2623 (0.2662) data time 0.0007 (0.0022) model time 0.2616 (0.2639) loss 5.8000 (5.3832) grad_norm 2.3039 (3.6312) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:24:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][350/625] eta 0:01:13 lr 0.000022 wd 0.0500 time 0.2553 (0.2659) data time 0.0008 (0.0022) model time 0.2545 (0.2636) loss 6.0485 (5.3918) grad_norm 2.1221 (3.5949) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:24:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][360/625] eta 0:01:10 lr 0.000022 wd 0.0500 time 0.4693 (0.2668) data time 0.0008 (0.0021) model time 0.4686 (0.2646) loss 5.6557 (5.3942) grad_norm 2.0554 (3.5720) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:24:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][370/625] eta 0:01:08 lr 0.000022 wd 0.0500 time 0.2552 (0.2670) data time 0.0008 (0.0021) model time 0.2544 (0.2649) loss 5.5873 (5.3977) grad_norm 4.8929 (3.5794) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:24:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][380/625] eta 0:01:05 lr 0.000022 wd 0.0500 time 0.2576 (0.2667) data time 0.0007 (0.0021) model time 0.2569 (0.2646)
loss 5.7869 (5.3982) grad_norm 3.5462 (3.5821) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:24:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][390/625] eta 0:01:02 lr 0.000022 wd 0.0500 time 0.2532 (0.2668) data time 0.0010 (0.0021) model time 0.2522 (0.2647) loss 4.9300 (5.3960) grad_norm 2.0052 (3.5479) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:24:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][400/625] eta 0:00:59 lr 0.000022 wd 0.0500 time 0.2577 (0.2665) data time 0.0008 (0.0020) model time 0.2569 (0.2645) loss 4.7374 (5.3911) grad_norm 3.1988 (3.5216) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:24:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][410/625] eta 0:00:57 lr 0.000022 wd 0.0500 time 0.2546 (0.2663) data time 0.0009 (0.0020) model time 0.2537 (0.2642) loss 6.5252 (5.3944) grad_norm 3.7379 (3.5088) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:24:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][420/625] eta 0:00:54 lr 0.000022 wd 0.0500 time 0.2554 (0.2670) data time 0.0008 (0.0020) model time 0.2546 (0.2650) loss 6.3132 (5.3980) grad_norm 2.6520 (3.5130) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:24:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][430/625] eta 0:00:52 lr 0.000022 wd 0.0500 time 0.2537 (0.2667) data time 0.0015 (0.0020) model time 0.2521 (0.2648) loss 5.6711 (5.3955) grad_norm 2.7035 (3.5002) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:24:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][440/625] eta 0:00:49 lr 0.000022 wd 0.0500 time 0.2572 (0.2669) data time 0.0006 (0.0019) model time 0.2566 (0.2650) loss 6.4754 (5.3994) grad_norm 3.9813 (3.5141) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:24:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): 
INFO Train: [294/300][450/625] eta 0:00:46 lr 0.000022 wd 0.0500 time 0.2569 (0.2667) data time 0.0010 (0.0019) model time 0.2559 (0.2648) loss 5.8859 (5.4044) grad_norm 2.4835 (3.5070) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:24:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][460/625] eta 0:00:43 lr 0.000022 wd 0.0500 time 0.2639 (0.2665) data time 0.0008 (0.0019) model time 0.2631 (0.2646) loss 5.1809 (5.4048) grad_norm 3.6790 (3.5178) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:24:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][470/625] eta 0:00:41 lr 0.000022 wd 0.0500 time 0.2558 (0.2663) data time 0.0008 (0.0019) model time 0.2550 (0.2644) loss 5.0830 (5.3979) grad_norm 2.7519 (3.5081) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:24:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][480/625] eta 0:00:38 lr 0.000022 wd 0.0500 time 0.2558 (0.2661) data time 0.0008 (0.0018) model time 0.2550 (0.2642) loss 6.1033 (5.3956) grad_norm 2.6904 (3.5071) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:24:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][490/625] eta 0:00:35 lr 0.000022 wd 0.0500 time 0.2527 (0.2659) data time 0.0010 (0.0018) model time 0.2517 (0.2639) loss 4.6272 (5.3948) grad_norm 3.7737 (3.5088) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:24:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][500/625] eta 0:00:33 lr 0.000022 wd 0.0500 time 0.2617 (0.2657) data time 0.0006 (0.0018) model time 0.2611 (0.2638) loss 4.4423 (5.3941) grad_norm 1.9939 (3.4915) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:24:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][510/625] eta 0:00:30 lr 0.000022 wd 0.0500 time 0.2577 (0.2659) data time 0.0008 (0.0018) model time 0.2569 (0.2640) loss 6.5081 (5.3946) grad_norm 
3.7366 (3.4871) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:24:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][520/625] eta 0:00:27 lr 0.000022 wd 0.0500 time 0.2546 (0.2660) data time 0.0010 (0.0018) model time 0.2536 (0.2641) loss 5.5094 (5.3977) grad_norm 2.2624 (3.4708) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:25:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][530/625] eta 0:00:25 lr 0.000022 wd 0.0500 time 0.2602 (0.2658) data time 0.0008 (0.0018) model time 0.2594 (0.2639) loss 6.1718 (5.4007) grad_norm 1.7568 (3.4815) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:25:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][540/625] eta 0:00:22 lr 0.000022 wd 0.0500 time 0.2618 (0.2656) data time 0.0006 (0.0017) model time 0.2612 (0.2638) loss 4.8558 (5.4032) grad_norm 3.4544 (3.4636) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:25:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][550/625] eta 0:00:19 lr 0.000022 wd 0.0500 time 0.2579 (0.2654) data time 0.0010 (0.0017) model time 0.2568 (0.2636) loss 4.9848 (5.4068) grad_norm 2.3836 (3.4513) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:25:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][560/625] eta 0:00:17 lr 0.000022 wd 0.0500 time 0.2591 (0.2656) data time 0.0009 (0.0017) model time 0.2583 (0.2638) loss 5.5896 (5.4026) grad_norm 2.2511 (3.4296) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:25:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][570/625] eta 0:00:14 lr 0.000022 wd 0.0500 time 0.2563 (0.2654) data time 0.0007 (0.0017) model time 0.2556 (0.2636) loss 6.2908 (5.4025) grad_norm 2.2434 (3.4805) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:25:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][580/625] eta 
0:00:11 lr 0.000022 wd 0.0500 time 0.2555 (0.2653) data time 0.0009 (0.0017) model time 0.2547 (0.2635) loss 5.7249 (5.4024) grad_norm 3.4682 (3.4828) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:25:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][590/625] eta 0:00:09 lr 0.000022 wd 0.0500 time 0.2529 (0.2651) data time 0.0012 (0.0017) model time 0.2517 (0.2633) loss 5.4250 (5.3973) grad_norm 1.7240 (3.4994) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:25:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][600/625] eta 0:00:06 lr 0.000022 wd 0.0500 time 0.2568 (0.2650) data time 0.0010 (0.0017) model time 0.2558 (0.2632) loss 5.4218 (5.3949) grad_norm 2.0063 (3.4879) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:25:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][610/625] eta 0:00:03 lr 0.000022 wd 0.0500 time 0.2525 (0.2648) data time 0.0006 (0.0017) model time 0.2519 (0.2630) loss 5.5631 (5.3951) grad_norm 3.3945 (3.4836) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:25:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][620/625] eta 0:00:01 lr 0.000022 wd 0.0500 time 0.2541 (0.2646) data time 0.0005 (0.0017) model time 0.2535 (0.2628) loss 6.5984 (5.3936) grad_norm 6.0502 (3.4874) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:25:26 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 294 training takes 0:02:45 [2024-08-04 11:25:26 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 11:25:27 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 11:25:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.619 (0.619) Loss 0.6001 (0.6001) Acc@1 90.479 (90.479) Acc@5 98.877 (98.877) Mem 9655MB [2024-08-04 11:25:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.108) Loss 0.9058 (0.7132) Acc@1 81.689 (87.300) Acc@5 96.777 (97.909) Mem 9655MB [2024-08-04 11:25:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.083) Loss 0.9971 (0.8292) Acc@1 79.395 (84.333) Acc@5 95.752 (96.826) Mem 9655MB [2024-08-04 11:25:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.029 Acc@5 96.847 [2024-08-04 11:25:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-08-04 11:25:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.728 (0.728) Loss 0.5923 (0.5923) Acc@1 90.332 (90.332) Acc@5 98.828 (98.828) Mem 9655MB [2024-08-04 11:25:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.126) Loss 0.8921 (0.7065) Acc@1 82.471 (87.314) Acc@5 96.631 (97.865) Mem 9655MB [2024-08-04 11:25:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.092) Loss 0.9976 (0.8228) Acc@1 78.955 (84.259) Acc@5 95.459 (96.742) Mem 9655MB [2024-08-04 11:25:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.911 Acc@5 96.761 [2024-08-04 11:25:31 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-04 11:25:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][0/625] eta 0:11:21 lr 0.000022 wd 0.0500 time 1.0900 (1.0900) data time 0.7247 (0.7247) model time 0.0000 (0.0000) loss 4.6965 (4.6965) grad_norm 2.0809 (2.0809) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:25:35 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [295/300][10/625] eta 0:03:24 lr 0.000022 wd 0.0500 time 0.2588 (0.3325) data time 0.0006 (0.0667) model time 0.0000 (0.0000) loss 5.8135 (5.4880) grad_norm 1.8535 (3.2168) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:25:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][20/625] eta 0:02:58 lr 0.000022 wd 0.0500 time 0.2540 (0.2958) data time 0.0009 (0.0354) model time 0.0000 (0.0000) loss 4.5744 (5.2436) grad_norm 14.9333 (3.8608) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:25:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][30/625] eta 0:02:48 lr 0.000022 wd 0.0500 time 0.2549 (0.2828) data time 0.0011 (0.0243) model time 0.0000 (0.0000) loss 5.4418 (5.3422) grad_norm 5.2663 (3.8930) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:25:42 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][40/625] eta 0:02:41 lr 0.000022 wd 0.0500 time 0.2583 (0.2763) data time 0.0009 (0.0186) model time 0.0000 (0.0000) loss 5.3983 (5.3882) grad_norm 2.8197 (3.7942) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:25:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][50/625] eta 0:02:38 lr 0.000022 wd 0.0500 time 0.2573 (0.2758) data time 0.0006 (0.0152) model time 0.0000 (0.0000) loss 4.7806 (5.3645) grad_norm 3.8478 (3.7205) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:25:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][60/625] eta 0:02:33 lr 0.000021 wd 0.0500 time 0.2561 (0.2725) data time 0.0011 (0.0129) model time 0.2551 (0.2546) loss 5.5009 (5.3471) grad_norm 1.7838 (3.5713) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:25:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][70/625] eta 0:02:29 lr 0.000021 wd 0.0500 time 0.2531 (0.2701) data time 0.0011 (0.0112) model time 0.2521 (0.2546) loss 
5.5044 (5.3392) grad_norm 3.2241 (3.4726) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:25:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][80/625] eta 0:02:26 lr 0.000021 wd 0.0500 time 0.2612 (0.2684) data time 0.0010 (0.0099) model time 0.2602 (0.2548) loss 5.7834 (5.3363) grad_norm 2.6257 (3.3909) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:25:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][90/625] eta 0:02:22 lr 0.000021 wd 0.0500 time 0.2566 (0.2670) data time 0.0008 (0.0089) model time 0.2558 (0.2549) loss 5.6323 (5.3420) grad_norm 2.4846 (3.3323) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:25:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][100/625] eta 0:02:19 lr 0.000021 wd 0.0500 time 0.2576 (0.2660) data time 0.0008 (0.0081) model time 0.2568 (0.2551) loss 4.7071 (5.3486) grad_norm 4.0984 (3.3291) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:26:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][110/625] eta 0:02:17 lr 0.000021 wd 0.0500 time 0.2584 (0.2678) data time 0.0008 (0.0075) model time 0.2576 (0.2601) loss 5.3029 (5.3529) grad_norm 3.4430 (3.3473) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:26:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][120/625] eta 0:02:14 lr 0.000021 wd 0.0500 time 0.2534 (0.2668) data time 0.0009 (0.0069) model time 0.2525 (0.2593) loss 6.3170 (5.3590) grad_norm 2.6757 (3.3114) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:26:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][130/625] eta 0:02:11 lr 0.000021 wd 0.0500 time 0.2567 (0.2660) data time 0.0009 (0.0065) model time 0.2558 (0.2588) loss 6.1333 (5.3671) grad_norm 2.9937 (3.3039) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:26:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO 
Train: [295/300][140/625] eta 0:02:08 lr 0.000021 wd 0.0500 time 0.2596 (0.2653) data time 0.0008 (0.0061) model time 0.2588 (0.2584) loss 4.7538 (5.3842) grad_norm 2.3985 (3.4347) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:26:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][150/625] eta 0:02:06 lr 0.000021 wd 0.0500 time 0.2591 (0.2659) data time 0.0009 (0.0058) model time 0.2582 (0.2600) loss 4.6573 (5.3719) grad_norm 2.6523 (3.3857) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:26:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][160/625] eta 0:02:03 lr 0.000021 wd 0.0500 time 0.2566 (0.2666) data time 0.0009 (0.0055) model time 0.2557 (0.2614) loss 5.8674 (5.3858) grad_norm 7.2646 (3.3899) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:26:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][170/625] eta 0:02:01 lr 0.000021 wd 0.0500 time 0.2550 (0.2660) data time 0.0006 (0.0052) model time 0.2544 (0.2609) loss 5.4967 (5.3817) grad_norm 2.0936 (3.3696) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:26:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][180/625] eta 0:01:58 lr 0.000021 wd 0.0500 time 0.2578 (0.2655) data time 0.0012 (0.0050) model time 0.2567 (0.2604) loss 5.0710 (5.3751) grad_norm 3.3206 (3.3868) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:26:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][190/625] eta 0:01:55 lr 0.000021 wd 0.0500 time 0.2606 (0.2651) data time 0.0006 (0.0048) model time 0.2600 (0.2602) loss 5.0887 (5.3748) grad_norm 1.8137 (3.3521) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:26:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][200/625] eta 0:01:52 lr 0.000021 wd 0.0500 time 0.2626 (0.2655) data time 0.0011 (0.0046) model time 0.2615 (0.2611) loss 5.7683 (5.3799) grad_norm 
3.0562 (3.3570) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:26:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][210/625] eta 0:01:50 lr 0.000021 wd 0.0500 time 0.2572 (0.2657) data time 0.0006 (0.0044) model time 0.2566 (0.2616) loss 5.6214 (5.3807) grad_norm 3.4006 (3.3340) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:26:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][220/625] eta 0:01:47 lr 0.000021 wd 0.0500 time 0.2595 (0.2653) data time 0.0008 (0.0042) model time 0.2588 (0.2612) loss 5.1978 (5.3695) grad_norm 2.1617 (3.3318) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:26:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][230/625] eta 0:01:44 lr 0.000021 wd 0.0500 time 0.2563 (0.2650) data time 0.0007 (0.0041) model time 0.2556 (0.2609) loss 5.9332 (5.3756) grad_norm 2.9418 (3.3213) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:26:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][240/625] eta 0:01:41 lr 0.000021 wd 0.0500 time 0.2586 (0.2646) data time 0.0008 (0.0040) model time 0.2579 (0.2607) loss 6.2844 (5.3859) grad_norm 2.8756 (3.3088) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:26:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][250/625] eta 0:01:39 lr 0.000021 wd 0.0500 time 0.2579 (0.2643) data time 0.0006 (0.0039) model time 0.2574 (0.2604) loss 4.9459 (5.3731) grad_norm 1.9658 (3.3205) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:26:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][260/625] eta 0:01:36 lr 0.000021 wd 0.0500 time 0.2522 (0.2647) data time 0.0010 (0.0037) model time 0.2512 (0.2611) loss 5.9505 (5.3698) grad_norm 2.1218 (3.2939) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:26:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][270/625] eta 
0:01:34 lr 0.000021 wd 0.0500 time 0.2579 (0.2651) data time 0.0009 (0.0037) model time 0.2570 (0.2616) loss 5.1454 (5.3695) grad_norm 1.8212 (3.2781) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:26:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][280/625] eta 0:01:31 lr 0.000021 wd 0.0500 time 0.2540 (0.2648) data time 0.0010 (0.0036) model time 0.2530 (0.2613) loss 5.6279 (5.3663) grad_norm 5.0098 (3.2504) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:26:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][290/625] eta 0:01:28 lr 0.000021 wd 0.0500 time 0.2560 (0.2648) data time 0.0008 (0.0035) model time 0.2552 (0.2615) loss 5.2945 (5.3690) grad_norm 2.2013 (3.2197) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:26:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][300/625] eta 0:01:25 lr 0.000021 wd 0.0500 time 0.2568 (0.2646) data time 0.0008 (0.0034) model time 0.2560 (0.2613) loss 4.9694 (5.3793) grad_norm 1.9886 (3.2287) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:26:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][310/625] eta 0:01:23 lr 0.000021 wd 0.0500 time 0.2524 (0.2649) data time 0.0009 (0.0033) model time 0.2514 (0.2618) loss 5.9533 (5.3776) grad_norm 2.1778 (3.2521) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:26:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][320/625] eta 0:01:20 lr 0.000021 wd 0.0500 time 0.2575 (0.2647) data time 0.0009 (0.0032) model time 0.2566 (0.2616) loss 5.9904 (5.3887) grad_norm 2.9543 (3.2433) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:26:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][330/625] eta 0:01:18 lr 0.000021 wd 0.0500 time 0.2562 (0.2650) data time 0.0008 (0.0032) model time 0.2554 (0.2621) loss 5.4768 (5.3895) grad_norm 1.7534 (3.2258) loss_scale 
128.0000 (128.0000) mem 9655MB [2024-08-04 11:27:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][340/625] eta 0:01:15 lr 0.000021 wd 0.0500 time 0.2558 (0.2651) data time 0.0009 (0.0031) model time 0.2549 (0.2622) loss 4.5424 (5.3918) grad_norm 2.3782 (3.2328) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:27:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][350/625] eta 0:01:12 lr 0.000021 wd 0.0500 time 0.2512 (0.2652) data time 0.0009 (0.0031) model time 0.2504 (0.2624) loss 5.3667 (5.3955) grad_norm 2.0796 (3.2167) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:27:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][360/625] eta 0:01:10 lr 0.000021 wd 0.0500 time 0.2590 (0.2653) data time 0.0009 (0.0030) model time 0.2581 (0.2626) loss 4.4475 (5.3953) grad_norm 3.8260 (3.2480) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:27:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][370/625] eta 0:01:07 lr 0.000021 wd 0.0500 time 0.2569 (0.2651) data time 0.0008 (0.0029) model time 0.2561 (0.2624) loss 6.0907 (5.3959) grad_norm 5.6558 (3.2467) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:27:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][380/625] eta 0:01:04 lr 0.000021 wd 0.0500 time 0.2533 (0.2648) data time 0.0010 (0.0029) model time 0.2523 (0.2621) loss 6.1427 (5.4056) grad_norm 2.7818 (3.3259) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:27:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][390/625] eta 0:01:02 lr 0.000021 wd 0.0500 time 0.2567 (0.2646) data time 0.0007 (0.0028) model time 0.2560 (0.2619) loss 5.1829 (5.4073) grad_norm 4.2679 (3.3222) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:27:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][400/625] eta 0:00:59 lr 0.000021 wd 
0.0500 time 0.2626 (0.2644) data time 0.0007 (0.0028) model time 0.2619 (0.2618) loss 5.5318 (5.4076) grad_norm 3.2412 (3.3171) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:27:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][410/625] eta 0:00:56 lr 0.000021 wd 0.0500 time 0.4584 (0.2647) data time 0.0011 (0.0027) model time 0.4574 (0.2622) loss 5.7230 (5.4082) grad_norm 3.2339 (3.3010) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:27:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][420/625] eta 0:00:54 lr 0.000021 wd 0.0500 time 0.2562 (0.2645) data time 0.0008 (0.0027) model time 0.2553 (0.2620) loss 5.7185 (5.4087) grad_norm 2.7309 (3.3103) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:27:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][430/625] eta 0:00:51 lr 0.000021 wd 0.0500 time 0.2592 (0.2648) data time 0.0010 (0.0027) model time 0.2583 (0.2624) loss 4.6410 (5.4047) grad_norm 2.9220 (3.2976) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:27:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][440/625] eta 0:00:49 lr 0.000021 wd 0.0500 time 0.2556 (0.2650) data time 0.0009 (0.0026) model time 0.2547 (0.2627) loss 6.4384 (5.4036) grad_norm 1.8518 (3.2857) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:27:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][450/625] eta 0:00:46 lr 0.000021 wd 0.0500 time 0.2603 (0.2648) data time 0.0008 (0.0026) model time 0.2595 (0.2625) loss 5.6038 (5.4029) grad_norm 2.9381 (3.2776) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:27:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][460/625] eta 0:00:43 lr 0.000021 wd 0.0500 time 0.2569 (0.2646) data time 0.0008 (0.0025) model time 0.2561 (0.2623) loss 5.4608 (5.3973) grad_norm 2.7876 (3.2639) loss_scale 128.0000 (128.0000) mem 9655MB 
[2024-08-04 11:27:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][470/625] eta 0:00:40 lr 0.000021 wd 0.0500 time 0.2565 (0.2645) data time 0.0008 (0.0025) model time 0.2557 (0.2621) loss 5.0935 (5.3947) grad_norm 2.5791 (3.2529) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:27:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][480/625] eta 0:00:38 lr 0.000021 wd 0.0500 time 0.2607 (0.2643) data time 0.0005 (0.0025) model time 0.2601 (0.2620) loss 4.5743 (5.3967) grad_norm 2.9566 (3.2566) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:27:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][490/625] eta 0:00:35 lr 0.000021 wd 0.0500 time 0.2601 (0.2641) data time 0.0009 (0.0024) model time 0.2593 (0.2618) loss 5.3063 (5.3937) grad_norm 3.2042 (3.2418) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:27:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][500/625] eta 0:00:32 lr 0.000021 wd 0.0500 time 0.2544 (0.2640) data time 0.0008 (0.0024) model time 0.2537 (0.2617) loss 5.8631 (5.3936) grad_norm 2.9537 (3.2510) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:27:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][510/625] eta 0:00:30 lr 0.000021 wd 0.0500 time 0.2601 (0.2638) data time 0.0008 (0.0024) model time 0.2593 (0.2615) loss 6.5468 (5.3878) grad_norm 2.5386 (3.2739) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:27:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][520/625] eta 0:00:27 lr 0.000021 wd 0.0500 time 0.2544 (0.2637) data time 0.0007 (0.0023) model time 0.2537 (0.2614) loss 4.5411 (5.3840) grad_norm 2.4981 (3.2734) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:27:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][530/625] eta 0:00:25 lr 0.000021 wd 0.0500 time 0.2519 (0.2639) data 
time 0.0008 (0.0023) model time 0.2511 (0.2617) loss 4.3696 (5.3815) grad_norm 2.1175 (3.2642) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:27:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][540/625] eta 0:00:22 lr 0.000021 wd 0.0500 time 0.2532 (0.2639) data time 0.0006 (0.0023) model time 0.2527 (0.2617) loss 5.0377 (5.3791) grad_norm 8.6822 (3.2693) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:27:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][550/625] eta 0:00:19 lr 0.000021 wd 0.0500 time 0.2553 (0.2640) data time 0.0007 (0.0023) model time 0.2545 (0.2618) loss 5.3577 (5.3835) grad_norm 3.0020 (3.2612) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:27:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][560/625] eta 0:00:17 lr 0.000021 wd 0.0500 time 0.2580 (0.2639) data time 0.0007 (0.0022) model time 0.2573 (0.2617) loss 6.4258 (5.3885) grad_norm 4.2186 (3.2641) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:28:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][570/625] eta 0:00:14 lr 0.000021 wd 0.0500 time 0.2562 (0.2641) data time 0.0011 (0.0022) model time 0.2551 (0.2620) loss 5.3899 (5.3894) grad_norm 1.9968 (3.2617) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:28:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][580/625] eta 0:00:11 lr 0.000021 wd 0.0500 time 0.2575 (0.2640) data time 0.0012 (0.0022) model time 0.2563 (0.2619) loss 5.8800 (5.3916) grad_norm 3.5963 (3.2642) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:28:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][590/625] eta 0:00:09 lr 0.000021 wd 0.0500 time 0.2566 (0.2638) data time 0.0009 (0.0022) model time 0.2557 (0.2617) loss 4.6657 (5.3908) grad_norm 2.9147 (3.2691) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:28:09 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][600/625] eta 0:00:06 lr 0.000021 wd 0.0500 time 0.2595 (0.2637) data time 0.0007 (0.0022) model time 0.2588 (0.2617) loss 4.6895 (5.3862) grad_norm 2.4994 (3.2827) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:28:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][610/625] eta 0:00:03 lr 0.000021 wd 0.0500 time 0.2519 (0.2636) data time 0.0004 (0.0021) model time 0.2514 (0.2615) loss 5.4157 (5.3890) grad_norm 4.9759 (3.2768) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:28:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][620/625] eta 0:00:01 lr 0.000021 wd 0.0500 time 0.2523 (0.2634) data time 0.0004 (0.0021) model time 0.2518 (0.2614) loss 4.4425 (5.3835) grad_norm 3.1486 (3.2701) loss_scale 128.0000 (128.0000) mem 9655MB [2024-08-04 11:28:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 295 training takes 0:02:44 [2024-08-04 11:28:15 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 11:28:16 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 11:28:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.574 (0.574) Loss 0.6045 (0.6045) Acc@1 90.430 (90.430) Acc@5 98.877 (98.877) Mem 9655MB
[2024-08-04 11:28:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.104) Loss 0.9067 (0.7159) Acc@1 81.787 (87.322) Acc@5 96.582 (97.892) Mem 9655MB
[2024-08-04 11:28:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.081) Loss 1.0000 (0.8301) Acc@1 79.297 (84.345) Acc@5 95.557 (96.789) Mem 9655MB
[2024-08-04 11:28:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.013 Acc@5 96.825
[2024-08-04 11:28:18 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.0%
[2024-08-04 11:28:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.742 (0.742) Loss 0.5938 (0.5938) Acc@1 90.381 (90.381) Acc@5 98.828 (98.828) Mem 9655MB
[2024-08-04 11:28:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.056 (0.124) Loss 0.8926 (0.7069) Acc@1 82.373 (87.287) Acc@5 96.680 (97.869) Mem 9655MB
[2024-08-04 11:28:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.091) Loss 0.9980 (0.8230) Acc@1 79.150 (84.270) Acc@5 95.459 (96.742) Mem 9655MB
[2024-08-04 11:28:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.925 Acc@5 96.761
[2024-08-04 11:28:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9%
[2024-08-04 11:28:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.93%
[2024-08-04 11:28:20 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 11:28:21 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 11:28:21 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][0/625] eta 0:07:00 lr 0.000021 wd 0.0500 time 0.6734 (0.6734) data time 0.4114 (0.4114) model time 0.0000 (0.0000) loss 5.6006 (5.6006) grad_norm 2.6253 (2.6253) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:28:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][10/625] eta 0:03:12 lr 0.000021 wd 0.0500 time 0.2562 (0.3124) data time 0.0006 (0.0383) model time 0.0000 (0.0000) loss 5.7091 (5.4399) grad_norm 3.7263 (3.0923) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:28:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][20/625] eta 0:02:52 lr 0.000021 wd 0.0500 time 0.2578 (0.2855) data time 0.0006 (0.0205) model time 0.0000 (0.0000) loss 5.9376 (5.2842) grad_norm 3.9607 (2.8532) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:28:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][30/625] eta 0:02:44 lr 0.000021 wd 0.0500 time 0.2564 (0.2761) data time 0.0008 (0.0142) model time 0.0000 (0.0000) loss 5.2353 (5.3244) grad_norm 1.9256 (2.7969) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:28:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][40/625] eta 0:02:41 lr 0.000021 wd 0.0500 time 0.2571 (0.2761) data time 0.0007 (0.0109) model time 0.0000 (0.0000) loss 6.5765 (5.3520) grad_norm 1.7521 (2.8055) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:28:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][50/625] eta 0:02:40 lr 0.000021 wd 0.0500 time 0.2565 (0.2793) data time 0.0009 (0.0090) model time 0.0000 (0.0000) loss 5.8167 (5.3482) grad_norm 4.6176 (3.0530) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:28:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][60/625] eta 0:02:37 lr 0.000021 wd 0.0500 time 0.2578 (0.2786) data time 0.0006 (0.0077) model time 0.2572 (0.2745) loss 5.2704 (5.3308) grad_norm 1.7144 (3.1777) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:28:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][70/625] eta 0:02:32 lr 0.000021 wd 0.0500 time 0.2588 (0.2754) data time 0.0008 (0.0067) model time 0.2580 (0.2645) loss 5.0119 (5.3298) grad_norm 3.5058 (3.1851) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:28:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][80/625] eta 0:02:28 lr 0.000021 wd 0.0500 time 0.2535 (0.2730) data time 0.0009 (0.0060) model time 0.2526 (0.2613) loss 5.9117 (5.3514) grad_norm 2.8862 (3.1555) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:28:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][90/625] eta 0:02:24 lr 0.000021 wd 0.0500 time 0.2557 (0.2710) data time 0.0006 (0.0054) model time 0.2551 (0.2596) loss 5.6189 (5.3748) grad_norm 3.3757 (3.1618) loss_scale 128.0000 (128.0000) mem 9655MB
[2024-08-04 11:28:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][100/625] eta 0:02:21 lr 0.000021 wd 0.0500 time 0.2549 (0.2695) data time 0.0009 (0.0050) model time 0.2540 (0.2586) loss 5.4729 (5.3534) grad_norm 2.3094 (3.1245) loss_scale 256.0000 (131.8020) mem 9655MB
[2024-08-04 11:28:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][110/625] eta 0:02:19 lr 0.000021 wd 0.0500 time 0.2531 (0.2716) data time 0.0010 (0.0046) model time 0.2522 (0.2641) loss 5.8693 (5.3453) grad_norm 7.2371 (3.1386) loss_scale 256.0000 (142.9910) mem 9655MB
[2024-08-04 11:28:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][120/625] eta 0:02:17 lr 0.000021 wd 0.0500 time 0.2558 (0.2719) data time 0.0007 (0.0043) model time 0.2551 (0.2657) loss 6.1004 (5.3354) grad_norm 3.9739 (3.1335) loss_scale 256.0000 (152.3306) mem 9655MB
[2024-08-04 11:28:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][130/625] eta 0:02:13 lr 0.000021 wd 0.0500 time 0.2535 (0.2707) data time 0.0007 (0.0041) model time 0.2527 (0.2643) loss 5.4414 (5.3460) grad_norm 2.5254 (3.1113) loss_scale 256.0000 (160.2443) mem 9655MB
[2024-08-04 11:28:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][140/625] eta 0:02:11 lr 0.000021 wd 0.0500 time 0.2573 (0.2710) data time 0.0007 (0.0039) model time 0.2566 (0.2654) loss 4.8338 (5.3350) grad_norm 4.2826 (3.1059) loss_scale 256.0000 (167.0355) mem 9655MB
[2024-08-04 11:29:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][150/625] eta 0:02:08 lr 0.000021 wd 0.0500 time 0.2545 (0.2700) data time 0.0009 (0.0037) model time 0.2535 (0.2644) loss 5.1586 (5.3471) grad_norm 2.2048 (3.0998) loss_scale 256.0000 (172.9272) mem 9655MB
[2024-08-04 11:29:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][160/625] eta 0:02:05 lr 0.000021 wd 0.0500 time 0.2583 (0.2693) data time 0.0007 (0.0035) model time 0.2576 (0.2637) loss 4.6381 (5.3330) grad_norm 1.9068 (3.0645) loss_scale 256.0000 (178.0870) mem 9655MB
[2024-08-04 11:29:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][170/625] eta 0:02:02 lr 0.000021 wd 0.0500 time 0.2570 (0.2685) data time 0.0007 (0.0033) model time 0.2563 (0.2630) loss 5.1243 (5.3441) grad_norm 5.4465 (3.0454) loss_scale 256.0000 (182.6433) mem 9655MB
[2024-08-04 11:29:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][180/625] eta 0:01:59 lr 0.000021 wd 0.0500 time 0.2582 (0.2678) data time 0.0012 (0.0032) model time 0.2570 (0.2623) loss 4.4247 (5.3219) grad_norm 2.8573 (3.0459) loss_scale 256.0000 (186.6961) mem 9655MB
[2024-08-04 11:29:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][190/625] eta 0:01:56 lr 0.000021 wd 0.0500 time 0.2565 (0.2680) data time 0.0009 (0.0031) model time 0.2556 (0.2629) loss 4.4820 (5.3028) grad_norm 2.6861 (3.0207) loss_scale 256.0000 (190.3246) mem 9655MB
[2024-08-04 11:29:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][200/625] eta 0:01:53 lr 0.000021 wd 0.0500 time 0.2592 (0.2674) data time 0.0008 (0.0030) model time 0.2583 (0.2624) loss 4.3361 (5.3065) grad_norm 2.4577 (3.0060) loss_scale 256.0000 (193.5920) mem 9655MB
[2024-08-04 11:29:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][210/625] eta 0:01:50 lr 0.000021 wd 0.0500 time 0.2582 (0.2670) data time 0.0007 (0.0029) model time 0.2576 (0.2621) loss 5.2665 (5.3111) grad_norm 3.3035 (3.0153) loss_scale 256.0000 (196.5498) mem 9655MB
[2024-08-04 11:29:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][220/625] eta 0:01:47 lr 0.000021 wd 0.0500 time 0.2570 (0.2665) data time 0.0008 (0.0028) model time 0.2562 (0.2616) loss 5.1742 (5.3083) grad_norm 2.5079 (3.0275) loss_scale 256.0000 (199.2398) mem 9655MB
[2024-08-04 11:29:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][230/625] eta 0:01:45 lr 0.000021 wd 0.0500 time 0.2591 (0.2666) data time 0.0008 (0.0028) model time 0.2583 (0.2620) loss 5.6583 (5.3115) grad_norm 2.5857 (3.0232) loss_scale 256.0000 (201.6970) mem 9655MB
[2024-08-04 11:29:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][240/625] eta 0:01:42 lr 0.000021 wd 0.0500 time 0.2564 (0.2662) data time 0.0007 (0.0027) model time 0.2558 (0.2617) loss 6.2728 (5.3017) grad_norm 2.3206 (3.0215) loss_scale 256.0000 (203.9502) mem 9655MB
[2024-08-04 11:29:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][250/625] eta 0:01:39 lr 0.000021 wd 0.0500 time 0.2584 (0.2658) data time 0.0006 (0.0026) model time 0.2578 (0.2614) loss 5.2911 (5.3059) grad_norm 2.5343 (3.0200) loss_scale 256.0000 (206.0239) mem 9655MB
[2024-08-04 11:29:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][260/625] eta 0:01:37 lr 0.000021 wd 0.0500 time 0.2570 (0.2666) data time 0.0007 (0.0025) model time 0.2563 (0.2626) loss 5.5973 (5.3037) grad_norm 2.5350 (3.0703) loss_scale 256.0000 (207.9387) mem 9655MB
[2024-08-04 11:29:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][270/625] eta 0:01:34 lr 0.000021 wd 0.0500 time 0.2593 (0.2663) data time 0.0010 (0.0025) model time 0.2583 (0.2623) loss 5.8617 (5.3062) grad_norm 4.2879 (3.1032) loss_scale 256.0000 (209.7122) mem 9655MB
[2024-08-04 11:29:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][280/625] eta 0:01:31 lr 0.000021 wd 0.0500 time 0.2575 (0.2659) data time 0.0009 (0.0024) model time 0.2566 (0.2620) loss 4.5409 (5.3069) grad_norm 3.4386 (3.1755) loss_scale 256.0000 (211.3594) mem 9655MB
[2024-08-04 11:29:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][290/625] eta 0:01:28 lr 0.000021 wd 0.0500 time 0.2628 (0.2656) data time 0.0009 (0.0024) model time 0.2619 (0.2617) loss 5.2389 (5.3165) grad_norm 2.5881 (3.1525) loss_scale 256.0000 (212.8935) mem 9655MB
[2024-08-04 11:29:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][300/625] eta 0:01:26 lr 0.000021 wd 0.0500 time 0.2571 (0.2653) data time 0.0009 (0.0023) model time 0.2563 (0.2614) loss 5.7909 (5.3170) grad_norm 2.2794 (3.1284) loss_scale 256.0000 (214.3256) mem 9655MB
[2024-08-04 11:29:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][310/625] eta 0:01:23 lr 0.000021 wd 0.0500 time 0.2541 (0.2650) data time 0.0008 (0.0023) model time 0.2532 (0.2612) loss 5.2307 (5.3130) grad_norm 7.7271 (3.1379) loss_scale 256.0000 (215.6656) mem 9655MB
[2024-08-04 11:29:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][320/625] eta 0:01:20 lr 0.000021 wd 0.0500 time 0.2526 (0.2647) data time 0.0009 (0.0022) model time 0.2518 (0.2610) loss 5.5996 (5.3075) grad_norm 4.0277 (3.1300) loss_scale 256.0000 (216.9221) mem 9655MB
[2024-08-04 11:29:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][330/625] eta 0:01:18 lr 0.000021 wd 0.0500 time 0.2573 (0.2649) data time 0.0006 (0.0022) model time 0.2568 (0.2613) loss 5.3966 (5.2985) grad_norm 2.0579 (3.1146) loss_scale 256.0000 (218.1027) mem 9655MB
[2024-08-04 11:29:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][340/625] eta 0:01:15 lr 0.000021 wd 0.0500 time 0.2548 (0.2647) data time 0.0008 (0.0022) model time 0.2540 (0.2611) loss 5.7969 (5.2955) grad_norm 2.4296 (3.1191) loss_scale 256.0000 (219.2141) mem 9655MB
[2024-08-04 11:29:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][350/625] eta 0:01:12 lr 0.000021 wd 0.0500 time 0.2537 (0.2644) data time 0.0008 (0.0021) model time 0.2529 (0.2610) loss 4.8761 (5.2979) grad_norm 1.7562 (3.0971) loss_scale 256.0000 (220.2621) mem 9655MB
[2024-08-04 11:29:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][360/625] eta 0:01:10 lr 0.000021 wd 0.0500 time 0.2568 (0.2642) data time 0.0009 (0.0021) model time 0.2559 (0.2608) loss 6.4055 (5.3032) grad_norm 2.5297 (3.0923) loss_scale 256.0000 (221.2521) mem 9655MB
[2024-08-04 11:29:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][370/625] eta 0:01:07 lr 0.000021 wd 0.0500 time 0.2582 (0.2640) data time 0.0005 (0.0021) model time 0.2577 (0.2606) loss 5.3749 (5.3068) grad_norm 4.0765 (3.1012) loss_scale 256.0000 (222.1887) mem 9655MB
[2024-08-04 11:30:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][380/625] eta 0:01:04 lr 0.000021 wd 0.0500 time 0.2568 (0.2638) data time 0.0009 (0.0020) model time 0.2559 (0.2605) loss 4.8331 (5.3054) grad_norm 2.7791 (3.0988) loss_scale 256.0000 (223.0761) mem 9655MB
[2024-08-04 11:30:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][390/625] eta 0:01:02 lr 0.000021 wd 0.0500 time 0.2557 (0.2641) data time 0.0007 (0.0020) model time 0.2550 (0.2609) loss 5.1094 (5.3076) grad_norm 4.0909 (3.0998) loss_scale 256.0000 (223.9182) mem 9655MB
[2024-08-04 11:30:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][400/625] eta 0:00:59 lr 0.000021 wd 0.0500 time 0.2573 (0.2640) data time 0.0008 (0.0020) model time 0.2565 (0.2608) loss 5.2544 (5.3019) grad_norm 2.6493 (3.0859) loss_scale 256.0000 (224.7182) mem 9655MB
[2024-08-04 11:30:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][410/625] eta 0:00:56 lr 0.000021 wd 0.0500 time 0.2523 (0.2638) data time 0.0009 (0.0019) model time 0.2515 (0.2606) loss 5.0623 (5.3089) grad_norm 2.0462 (3.0889) loss_scale 256.0000 (225.4793) mem 9655MB
[2024-08-04 11:30:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][420/625] eta 0:00:54 lr 0.000021 wd 0.0500 time 0.2539 (0.2641) data time 0.0010 (0.0019) model time 0.2530 (0.2610) loss 4.2990 (5.3041) grad_norm 3.0831 (3.0786) loss_scale 256.0000 (226.2043) mem 9655MB
[2024-08-04 11:30:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][430/625] eta 0:00:51 lr 0.000021 wd 0.0500 time 0.2545 (0.2639) data time 0.0007 (0.0019) model time 0.2538 (0.2609) loss 5.3417 (5.3116) grad_norm 3.5840 (3.0734) loss_scale 256.0000 (226.8956) mem 9655MB
[2024-08-04 11:30:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][440/625] eta 0:00:48 lr 0.000021 wd 0.0500 time 0.2578 (0.2637) data time 0.0008 (0.0019) model time 0.2570 (0.2607) loss 6.0284 (5.3126) grad_norm 2.3635 (3.0650) loss_scale 256.0000 (227.5556) mem 9655MB
[2024-08-04 11:30:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][450/625] eta 0:00:46 lr 0.000021 wd 0.0500 time 0.2694 (0.2635) data time 0.0009 (0.0019) model time 0.2684 (0.2606) loss 4.8181 (5.3078) grad_norm 2.9514 (3.0626) loss_scale 256.0000 (228.1863) mem 9655MB
[2024-08-04 11:30:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][460/625] eta 0:00:43 lr 0.000021 wd 0.0500 time 0.2534 (0.2641) data time 0.0012 (0.0018) model time 0.2522 (0.2612) loss 6.3260 (5.3101) grad_norm 2.5184 (3.0505) loss_scale 256.0000 (228.7896) mem 9655MB
[2024-08-04 11:30:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][470/625] eta 0:00:40 lr 0.000021 wd 0.0500 time 0.3827 (0.2642) data time 0.0009 (0.0018) model time 0.3818 (0.2614) loss 5.8220 (5.3141) grad_norm 2.1350 (3.0893) loss_scale 256.0000 (229.3673) mem 9655MB
[2024-08-04 11:30:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][480/625] eta 0:00:38 lr 0.000021 wd 0.0500 time 0.2581 (0.2640) data time 0.0007 (0.0018) model time 0.2574 (0.2612) loss 4.9693 (5.3156) grad_norm 2.3309 (3.0830) loss_scale 256.0000 (229.9210) mem 9655MB
[2024-08-04 11:30:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][490/625] eta 0:00:35 lr 0.000021 wd 0.0500 time 0.2571 (0.2639) data time 0.0006 (0.0018) model time 0.2566 (0.2611) loss 6.4195 (5.3157) grad_norm 2.6373 (3.0768) loss_scale 256.0000 (230.4521) mem 9655MB
[2024-08-04 11:30:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][500/625] eta 0:00:33 lr 0.000021 wd 0.0500 time 0.2564 (0.2641) data time 0.0007 (0.0018) model time 0.2557 (0.2615) loss 5.8105 (5.3164) grad_norm 2.7719 (3.0644) loss_scale 256.0000 (230.9621) mem 9655MB
[2024-08-04 11:30:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][510/625] eta 0:00:30 lr 0.000021 wd 0.0500 time 0.2575 (0.2643) data time 0.0007 (0.0017) model time 0.2568 (0.2617) loss 5.6875 (5.3177) grad_norm 4.2159 (3.1343) loss_scale 256.0000 (231.4521) mem 9655MB
[2024-08-04 11:30:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][520/625] eta 0:00:27 lr 0.000021 wd 0.0500 time 0.2561 (0.2642) data time 0.0008 (0.0017) model time 0.2553 (0.2616) loss 5.5426 (5.3203) grad_norm 1.9964 (3.1298) loss_scale 256.0000 (231.9232) mem 9655MB
[2024-08-04 11:30:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][530/625] eta 0:00:25 lr 0.000021 wd 0.0500 time 0.4596 (0.2644) data time 0.0007 (0.0017) model time 0.4589 (0.2619) loss 4.7516 (5.3205) grad_norm 2.7085 (3.1191) loss_scale 256.0000 (232.3766) mem 9655MB
[2024-08-04 11:30:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][540/625] eta 0:00:22 lr 0.000021 wd 0.0500 time 0.2573 (0.2647) data time 0.0009 (0.0017) model time 0.2564 (0.2622) loss 5.8301 (5.3225) grad_norm 3.3073 (3.1106) loss_scale 256.0000 (232.8133) mem 9655MB
[2024-08-04 11:30:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][550/625] eta 0:00:19 lr 0.000021 wd 0.0500 time 0.2563 (0.2648) data time 0.0009 (0.0017) model time 0.2554 (0.2624) loss 4.6655 (5.3243) grad_norm 2.8357 (3.1027) loss_scale 256.0000 (233.2341) mem 9655MB
[2024-08-04 11:30:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][560/625] eta 0:00:17 lr 0.000021 wd 0.0500 time 0.2589 (0.2647) data time 0.0008 (0.0017) model time 0.2581 (0.2623) loss 5.8587 (5.3256) grad_norm 2.9023 (3.1044) loss_scale 256.0000 (233.6399) mem 9655MB
[2024-08-04 11:30:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][570/625] eta 0:00:14 lr 0.000021 wd 0.0500 time 0.2568 (0.2647) data time 0.0006 (0.0017) model time 0.2562 (0.2624) loss 4.8705 (5.3242) grad_norm 2.8064 (3.0961) loss_scale 256.0000 (234.0315) mem 9655MB
[2024-08-04 11:30:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][580/625] eta 0:00:11 lr 0.000021 wd 0.0500 time 0.2564 (0.2646) data time 0.0009 (0.0016) model time 0.2555 (0.2622) loss 4.8755 (5.3218) grad_norm 2.7084 (3.0913) loss_scale 256.0000 (234.4096) mem 9655MB
[2024-08-04 11:30:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][590/625] eta 0:00:09 lr 0.000021 wd 0.0500 time 0.4571 (0.2648) data time 0.0009 (0.0016) model time 0.4563 (0.2625) loss 5.5950 (5.3209) grad_norm 2.9435 (3.0932) loss_scale 256.0000 (234.7750) mem 9655MB
[2024-08-04 11:31:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][600/625] eta 0:00:06 lr 0.000021 wd 0.0500 time 0.2549 (0.2649) data time 0.0010 (0.0016) model time 0.2539 (0.2627) loss 4.8885 (5.3197) grad_norm 4.0134 (3.0933) loss_scale 256.0000 (235.1281) mem 9655MB
[2024-08-04 11:31:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][610/625] eta 0:00:03 lr 0.000021 wd 0.0500 time 0.2526 (0.2648) data time 0.0006 (0.0016) model time 0.2520 (0.2626) loss 5.2876 (5.3194) grad_norm 2.6138 (3.0864) loss_scale 256.0000 (235.4697) mem 9655MB
[2024-08-04 11:31:05 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][620/625] eta 0:00:01 lr 0.000021 wd 0.0500 time 0.2525 (0.2647) data time 0.0006 (0.0016) model time 0.2519 (0.2624) loss 6.2701 (5.3226) grad_norm 3.6177 (3.0855) loss_scale 256.0000 (235.8003) mem 9655MB
[2024-08-04 11:31:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 296 training takes 0:02:45
[2024-08-04 11:31:06 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 11:31:06 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 11:31:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.483 (0.483) Loss 0.6006 (0.6006) Acc@1 90.283 (90.283) Acc@5 98.828 (98.828) Mem 9655MB
[2024-08-04 11:31:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.095) Loss 0.9004 (0.7128) Acc@1 81.787 (87.322) Acc@5 96.777 (97.883) Mem 9655MB
[2024-08-04 11:31:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.076) Loss 0.9956 (0.8276) Acc@1 79.346 (84.331) Acc@5 95.850 (96.780) Mem 9655MB
[2024-08-04 11:31:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.997 Acc@5 96.801
[2024-08-04 11:31:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.0%
[2024-08-04 11:31:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.805 (0.805) Loss 0.5928 (0.5928) Acc@1 90.381 (90.381) Acc@5 98.828 (98.828) Mem 9655MB
[2024-08-04 11:31:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.129) Loss 0.8926 (0.7069) Acc@1 82.275 (87.291) Acc@5 96.680 (97.874) Mem 9655MB
[2024-08-04 11:31:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.094) Loss 0.9966 (0.8228) Acc@1 79.004 (84.263) Acc@5 95.459 (96.745) Mem 9655MB
[2024-08-04 11:31:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.915 Acc@5 96.763
[2024-08-04 11:31:10 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9%
[2024-08-04 11:31:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][0/625] eta 0:11:21 lr 0.000021 wd 0.0500 time 1.0899 (1.0899) data time 0.4073 (0.4073) model time 0.0000 (0.0000) loss 6.3460 (6.3460) grad_norm 2.6534 (2.6534) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:31:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][10/625] eta 0:03:33 lr 0.000021 wd 0.0500 time 0.2529 (0.3478) data time 0.0008 (0.0378) model time 0.0000 (0.0000) loss 4.7174 (5.4327) grad_norm 11.8460 (3.6554) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:31:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][20/625] eta 0:03:04 lr 0.000021 wd 0.0500 time 0.2524 (0.3045) data time 0.0007 (0.0202) model time 0.0000 (0.0000) loss 5.8398 (5.3296) grad_norm 5.7594 (3.8478) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:31:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][30/625] eta 0:02:54 lr 0.000021 wd 0.0500 time 0.2574 (0.2931) data time 0.0009 (0.0140) model time 0.0000 (0.0000) loss 5.8447 (5.4322) grad_norm 1.9343 (3.7520) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:31:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][40/625] eta 0:02:47 lr 0.000021 wd 0.0500 time 0.2585 (0.2870) data time 0.0007 (0.0108) model time 0.0000 (0.0000) loss 5.1244 (5.4418) grad_norm 3.3124 (3.5008) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:31:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][50/625] eta 0:02:43 lr 0.000021 wd 0.0500 time 0.2530 (0.2836) data time 0.0008 (0.0089) model time 0.0000 (0.0000) loss 5.5554 (5.4223) grad_norm 2.3975 (3.3664) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:31:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][60/625] eta 0:02:37 lr 0.000021 wd 0.0500 time 0.2525 (0.2791) data time 0.0010 (0.0076) model time 0.2516 (0.2554) loss 5.9362 (5.4214) grad_norm 2.8665 (3.4175) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:31:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][70/625] eta 0:02:34 lr 0.000021 wd 0.0500 time 0.2616 (0.2787) data time 0.0007 (0.0066) model time 0.2609 (0.2653) loss 5.8583 (5.3994) grad_norm 4.0587 (3.3153) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:31:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][80/625] eta 0:02:30 lr 0.000021 wd 0.0500 time 0.2566 (0.2759) data time 0.0008 (0.0060) model time 0.2558 (0.2618) loss 6.2680 (5.4062) grad_norm 3.8861 (3.2546) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:31:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][90/625] eta 0:02:27 lr 0.000021 wd 0.0500 time 0.4387 (0.2757) data time 0.0009 (0.0054) model time 0.4377 (0.2647) loss 5.1682 (5.4145) grad_norm 2.0446 (3.3568) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:31:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][100/625] eta 0:02:23 lr 0.000021 wd 0.0500 time 0.2580 (0.2736) data time 0.0009 (0.0050) model time 0.2572 (0.2625) loss 5.8176 (5.4480) grad_norm 3.9013 (3.2983) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:31:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][110/625] eta 0:02:20 lr 0.000020 wd 0.0500 time 0.2558 (0.2720) data time 0.0006 (0.0046) model time 0.2553 (0.2613) loss 4.6477 (5.4379) grad_norm 3.1160 (3.2857) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:31:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][120/625] eta 0:02:16 lr 0.000020 wd 0.0500 time 0.2567 (0.2707) data time 0.0007 (0.0043) model time 0.2561 (0.2604) loss 4.3338 (5.4279) grad_norm 1.9264 (3.2696) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:31:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][130/625] eta 0:02:13 lr 0.000020 wd 0.0500 time 0.2537 (0.2697) data time 0.0008 (0.0040) model time 0.2529 (0.2598) loss 5.1857 (5.4176) grad_norm 1.6657 (3.1977) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:31:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][140/625] eta 0:02:10 lr 0.000020 wd 0.0500 time 0.2554 (0.2699) data time 0.0011 (0.0038) model time 0.2543 (0.2611) loss 5.8463 (5.4160) grad_norm 1.8358 (3.1954) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:31:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][150/625] eta 0:02:07 lr 0.000020 wd 0.0500 time 0.2599 (0.2689) data time 0.0008 (0.0036) model time 0.2591 (0.2605) loss 5.7323 (5.4173) grad_norm 5.6666 (3.3925) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:31:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][160/625] eta 0:02:04 lr 0.000020 wd 0.0500 time 0.2622 (0.2682) data time 0.0009 (0.0035) model time 0.2613 (0.2602) loss 5.4304 (5.4146) grad_norm 2.9441 (3.3634) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:31:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][170/625] eta 0:02:01 lr 0.000020 wd 0.0500 time 0.2541 (0.2675) data time 0.0009 (0.0033) model time 0.2532 (0.2598) loss 4.9116 (5.4044) grad_norm 2.0314 (3.3169) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:31:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][180/625] eta 0:01:58 lr 0.000020 wd 0.0500 time 0.2545 (0.2669) data time 0.0007 (0.0032) model time 0.2538 (0.2594) loss 4.3061 (5.3829) grad_norm 4.4294 (3.2969) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:32:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][190/625] eta 0:01:55 lr 0.000020 wd 0.0500 time 0.2592 (0.2663) data time 0.0007 (0.0031) model time 0.2585 (0.2591) loss 4.7228 (5.3850) grad_norm 2.7461 (3.4032) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:32:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][200/625] eta 0:01:52 lr 0.000020 wd 0.0500 time 0.2540 (0.2658) data time 0.0009 (0.0030) model time 0.2531 (0.2588) loss 5.3182 (5.3917) grad_norm 2.0759 (3.4438) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:32:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][210/625] eta 0:01:50 lr 0.000020 wd 0.0500 time 0.2599 (0.2659) data time 0.0008 (0.0029) model time 0.2590 (0.2593) loss 5.9669 (5.3882) grad_norm 3.8331 (3.4137) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:32:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][220/625] eta 0:01:47 lr 0.000020 wd 0.0500 time 0.2545 (0.2654) data time 0.0011 (0.0028) model time 0.2533 (0.2591) loss 5.7108 (5.3761) grad_norm 3.2441 (3.7187) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:32:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][230/625] eta 0:01:45 lr 0.000020 wd 0.0500 time 0.4646 (0.2659) data time 0.0007 (0.0027) model time 0.4639 (0.2600) loss 4.9262 (5.3844) grad_norm 2.4435 (3.6778) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:32:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][240/625] eta 0:01:42 lr 0.000020 wd 0.0500 time 0.2572 (0.2655) data time 0.0010 (0.0026) model time 0.2562 (0.2597) loss 5.1762 (5.3720) grad_norm 2.9612 (3.6360) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:32:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][250/625] eta 0:01:39 lr 0.000020 wd 0.0500 time 0.2512 (0.2651) data time 0.0012 (0.0026) model time 0.2500 (0.2594) loss 5.3506 (5.3812) grad_norm 5.6332 (3.6208) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:32:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][260/625] eta 0:01:36 lr 0.000020 wd 0.0500 time 0.2549 (0.2647) data time 0.0010 (0.0025) model time 0.2540 (0.2591) loss 4.9993 (5.3747) grad_norm 2.5986 (3.6213) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:32:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][270/625] eta 0:01:34 lr 0.000020 wd 0.0500 time 0.2588 (0.2651) data time 0.0006 (0.0024) model time 0.2582 (0.2599) loss 5.8441 (5.3769) grad_norm 2.2730 (3.6086) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:32:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][280/625] eta 0:01:31 lr 0.000020 wd 0.0500 time 0.2564 (0.2648) data time 0.0007 (0.0024) model time 0.2558 (0.2597) loss 5.5935 (5.3721) grad_norm 2.4406 (3.5776) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:32:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][290/625] eta 0:01:28 lr 0.000020 wd 0.0500 time 0.2557 (0.2645) data time 0.0009 (0.0023) model time 0.2549 (0.2595) loss 5.6027 (5.3738) grad_norm 1.8374 (3.6133) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:32:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][300/625] eta 0:01:25 lr 0.000020 wd 0.0500 time 0.2533 (0.2642) data time 0.0010 (0.0023) model time 0.2523 (0.2593) loss 5.2221 (5.3721) grad_norm 4.6341 (3.5785) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:32:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][310/625] eta 0:01:23 lr 0.000020 wd 0.0500 time 0.2557 (0.2639) data time 0.0011 (0.0022) model time 0.2546 (0.2591) loss 6.0041 (5.3692) grad_norm 3.9664 (3.5555) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:32:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][320/625] eta 0:01:20 lr 0.000020 wd 0.0500 time 0.2581 (0.2637) data time 0.0008 (0.0022) model time 0.2573 (0.2590) loss 6.4087 (5.3809) grad_norm 2.1408 (3.5290) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:32:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][330/625] eta 0:01:17 lr 0.000020 wd 0.0500 time 0.2566 (0.2635) data time 0.0006 (0.0022) model time 0.2560 (0.2589) loss 4.5184 (5.3745) grad_norm 2.0799 (3.4944) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:32:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][340/625] eta 0:01:15 lr 0.000020 wd 0.0500 time 0.2606 (0.2633) data time 0.0009 (0.0021) model time 0.2596 (0.2588) loss 4.7162 (5.3681) grad_norm 3.1433 (3.4814) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:32:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][350/625] eta 0:01:12 lr 0.000020 wd 0.0500 time 0.2499 (0.2637) data time 0.0011 (0.0021) model time 0.2488 (0.2593) loss 5.4111 (5.3760) grad_norm 2.5494 (3.5130) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:32:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][360/625] eta 0:01:09 lr 0.000020 wd 0.0500 time 0.2552 (0.2640) data time 0.0008 (0.0021) model time 0.2544 (0.2598) loss 5.2592 (5.3776) grad_norm 3.5176 (3.4909) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:32:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][370/625] eta 0:01:07 lr 0.000020 wd 0.0500 time 0.2583 (0.2638) data time 0.0008 (0.0020) model time 0.2575 (0.2597) loss 5.6970 (5.3832) grad_norm 1.8284 (3.4661) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:32:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][380/625] eta 0:01:04 lr 0.000020 wd 0.0500 time 0.2528 (0.2639) data time 0.0006 (0.0020) model time 0.2522 (0.2599) loss 4.6713 (5.3800) grad_norm 1.7673 (3.4459) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:32:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][390/625] eta 0:01:01 lr 0.000020 wd 0.0500 time 0.2557 (0.2637) data time 0.0010 (0.0020) model time 0.2547 (0.2598) loss 5.8302 (5.3916) grad_norm 3.2335 (3.4343) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:32:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][400/625] eta 0:00:59 lr 0.000020 wd 0.0500 time 0.2562 (0.2640) data time 0.0007 (0.0020) model time 0.2556 (0.2602) loss 5.3356 (5.3982) grad_norm 2.0996 (3.4343) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:32:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][410/625] eta 0:00:56 lr 0.000020 wd 0.0500 time 0.2632 (0.2639) data time 0.0010 (0.0019) model time 0.2622 (0.2601) loss 5.5282 (5.3906) grad_norm 2.0117 (3.4166) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:33:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][420/625] eta 0:00:54 lr 0.000020 wd 0.0500 time 0.2551 (0.2637) data time 0.0009 (0.0019) model time 0.2541 (0.2600) loss 5.4919 (5.3880) grad_norm 2.2357 (3.4023) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:33:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][430/625] eta 0:00:51 lr 0.000020 wd 0.0500 time 0.2557 (0.2635) data time 0.0009 (0.0019) model time 0.2549 (0.2599) loss 5.3194 (5.3896) grad_norm 2.4892 (3.3884) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:33:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][440/625] eta 0:00:48 lr 0.000020 wd 0.0500 time 0.2553 (0.2637) data time 0.0008 (0.0019) model time 0.2546 (0.2601) loss 4.5292 (5.3887) grad_norm 2.3982 (3.3850) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:33:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][450/625] eta 0:00:46 lr 0.000020 wd 0.0500 time 0.2573 (0.2635) data time 0.0006 (0.0018) model time 0.2567 (0.2600) loss 6.0855 (5.3855) grad_norm 3.4251 (3.3722) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:33:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][460/625] eta 0:00:43 lr 0.000020 wd 0.0500 time 0.2575 (0.2633) data time 0.0009 (0.0018) model time 0.2566 (0.2599) loss 6.1297 (5.3875) grad_norm 2.8899 (3.3550) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:33:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][470/625] eta 0:00:40 lr 0.000020 wd 0.0500 time 0.2564 (0.2636) data time 0.0009 (0.0018) model time 0.2555 (0.2602) loss 4.8627 (5.3837) grad_norm 2.0486 (3.3438) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:33:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][480/625] eta 0:00:38 lr 0.000020 wd 0.0500 time 0.2538 (0.2635) data time 0.0007 (0.0018) model time 0.2531 (0.2601) loss 4.3976 (5.3800) grad_norm 1.9927 (3.3750) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:33:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][490/625] eta 0:00:35 lr 0.000020 wd 0.0500 time 0.2579 (0.2637) data time 0.0009 (0.0018) model time 0.2570 (0.2605) loss 5.3881 (5.3840) grad_norm 2.5419 (3.3773) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:33:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][500/625] eta 0:00:32 lr 0.000020 wd 0.0500 time 0.2571 (0.2636) data time 0.0010 (0.0018) model time 0.2561 (0.2604) loss 5.9069 (5.3830) grad_norm 2.4379 (3.3768) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:33:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][510/625] eta 0:00:30 lr 0.000020 wd 0.0500 time 0.2539 (0.2639) data time 0.0009 (0.0017) model time 0.2529 (0.2607) loss 5.1056 (5.3809) grad_norm 3.4972 (3.3668) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:33:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][520/625] eta 0:00:27 lr 0.000020 wd 0.0500 time 0.2503 (0.2637) data time 0.0009 (0.0017) model time 0.2494 (0.2606) loss 5.7433 (5.3755) grad_norm 2.1555 (3.3526) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:33:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][530/625] eta 0:00:25 lr 0.000020 wd 0.0500 time 0.2536 (0.2636) data 
time 0.0009 (0.0017) model time 0.2526 (0.2605) loss 5.0224 (5.3757) grad_norm 2.0506 (3.3478) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:33:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][540/625] eta 0:00:22 lr 0.000020 wd 0.0500 time 0.2518 (0.2634) data time 0.0009 (0.0017) model time 0.2509 (0.2604) loss 4.7867 (5.3763) grad_norm 3.4068 (3.3715) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:33:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][550/625] eta 0:00:19 lr 0.000020 wd 0.0500 time 0.2576 (0.2635) data time 0.0008 (0.0017) model time 0.2567 (0.2605) loss 5.2133 (5.3718) grad_norm 1.9930 (3.3579) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:33:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][560/625] eta 0:00:17 lr 0.000020 wd 0.0500 time 0.2548 (0.2634) data time 0.0010 (0.0017) model time 0.2538 (0.2604) loss 6.1928 (5.3738) grad_norm 3.0838 (3.3475) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:33:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][570/625] eta 0:00:14 lr 0.000020 wd 0.0500 time 0.2594 (0.2633) data time 0.0009 (0.0017) model time 0.2585 (0.2603) loss 5.5710 (5.3745) grad_norm 3.1311 (3.3413) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:33:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][580/625] eta 0:00:11 lr 0.000020 wd 0.0500 time 0.2536 (0.2631) data time 0.0009 (0.0016) model time 0.2527 (0.2602) loss 5.0130 (5.3784) grad_norm 2.1374 (3.3594) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:33:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][590/625] eta 0:00:09 lr 0.000020 wd 0.0500 time 0.2564 (0.2630) data time 0.0007 (0.0016) model time 0.2557 (0.2601) loss 5.5342 (5.3814) grad_norm 1.9615 (3.3496) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:33:49 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][600/625] eta 0:00:06 lr 0.000020 wd 0.0500 time 0.2563 (0.2631) data time 0.0007 (0.0016) model time 0.2556 (0.2603) loss 4.1338 (5.3758) grad_norm 2.1397 (3.3619) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:33:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][610/625] eta 0:00:03 lr 0.000020 wd 0.0500 time 0.2534 (0.2633) data time 0.0006 (0.0016) model time 0.2529 (0.2605) loss 5.1885 (5.3750) grad_norm 2.2620 (3.3515) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:33:54 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][620/625] eta 0:00:01 lr 0.000020 wd 0.0500 time 0.2541 (0.2636) data time 0.0004 (0.0016) model time 0.2537 (0.2609) loss 4.7833 (5.3723) grad_norm 2.5775 (3.3363) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:33:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 297 training takes 0:02:44 [2024-08-04 11:33:55 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 11:33:56 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 11:33:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.502 (0.502) Loss 0.6035 (0.6035) Acc@1 90.576 (90.576) Acc@5 98.828 (98.828) Mem 9655MB [2024-08-04 11:33:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.054 (0.098) Loss 0.9028 (0.7160) Acc@1 81.934 (87.367) Acc@5 96.826 (97.892) Mem 9655MB [2024-08-04 11:33:57 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.078) Loss 0.9941 (0.8300) Acc@1 79.297 (84.361) Acc@5 95.947 (96.794) Mem 9655MB [2024-08-04 11:33:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.025 Acc@5 96.807 [2024-08-04 11:33:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-08-04 11:33:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.679 (0.679) Loss 0.5938 (0.5938) Acc@1 90.381 (90.381) Acc@5 98.828 (98.828) Mem 9655MB [2024-08-04 11:33:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.053 (0.129) Loss 0.8926 (0.7071) Acc@1 82.080 (87.260) Acc@5 96.680 (97.878) Mem 9655MB [2024-08-04 11:33:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.094) Loss 0.9961 (0.8230) Acc@1 78.955 (84.247) Acc@5 95.557 (96.754) Mem 9655MB [2024-08-04 11:34:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.911 Acc@5 96.771 [2024-08-04 11:34:00 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-04 11:34:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][0/625] eta 0:11:15 lr 0.000020 wd 0.0500 time 1.0810 (1.0810) data time 0.5071 (0.5071) model time 0.0000 (0.0000) loss 6.1642 (6.1642) grad_norm 2.5198 (2.5198) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:34:03 vssd_mesa_retrain_tiny_e300] 
(main_hfai_mnodes.py 367): INFO Train: [298/300][10/625] eta 0:03:23 lr 0.000020 wd 0.0500 time 0.2513 (0.3306) data time 0.0010 (0.0470) model time 0.0000 (0.0000) loss 5.0190 (5.5445) grad_norm 3.0506 (2.9301) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:34:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][20/625] eta 0:02:58 lr 0.000020 wd 0.0500 time 0.2547 (0.2951) data time 0.0010 (0.0250) model time 0.0000 (0.0000) loss 4.8767 (5.4787) grad_norm 5.0472 (3.0276) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:34:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][30/625] eta 0:02:48 lr 0.000020 wd 0.0500 time 0.2529 (0.2827) data time 0.0008 (0.0172) model time 0.0000 (0.0000) loss 4.9282 (5.4391) grad_norm 2.4902 (3.1443) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:34:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][40/625] eta 0:02:43 lr 0.000020 wd 0.0500 time 0.2564 (0.2793) data time 0.0007 (0.0133) model time 0.0000 (0.0000) loss 5.2919 (5.4444) grad_norm 2.0923 (3.2734) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:34:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][50/625] eta 0:02:37 lr 0.000020 wd 0.0500 time 0.2567 (0.2748) data time 0.0008 (0.0109) model time 0.0000 (0.0000) loss 5.5546 (5.4376) grad_norm 2.2464 (3.1309) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:34:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][60/625] eta 0:02:34 lr 0.000020 wd 0.0500 time 0.3577 (0.2732) data time 0.0010 (0.0093) model time 0.3566 (0.2644) loss 5.6801 (5.4144) grad_norm 3.3857 (3.0548) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:34:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][70/625] eta 0:02:33 lr 0.000020 wd 0.0500 time 0.2547 (0.2762) data time 0.0008 (0.0081) model time 0.2538 (0.2790) loss 
5.3195 (5.4107) grad_norm 3.7459 (3.1207) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:34:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][80/625] eta 0:02:29 lr 0.000020 wd 0.0500 time 0.2533 (0.2736) data time 0.0011 (0.0072) model time 0.2522 (0.2707) loss 6.3351 (5.4169) grad_norm 2.5900 (3.0342) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:34:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][90/625] eta 0:02:25 lr 0.000020 wd 0.0500 time 0.2539 (0.2717) data time 0.0009 (0.0065) model time 0.2530 (0.2668) loss 4.5986 (5.3597) grad_norm 2.1089 (3.0103) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:34:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][100/625] eta 0:02:21 lr 0.000020 wd 0.0500 time 0.2529 (0.2701) data time 0.0006 (0.0060) model time 0.2523 (0.2643) loss 5.4303 (5.3790) grad_norm 3.2433 (3.0664) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:34:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][110/625] eta 0:02:19 lr 0.000020 wd 0.0500 time 0.2566 (0.2703) data time 0.0012 (0.0055) model time 0.2554 (0.2655) loss 5.3942 (5.3999) grad_norm 4.3595 (3.0856) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:34:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][120/625] eta 0:02:15 lr 0.000020 wd 0.0500 time 0.2559 (0.2692) data time 0.0007 (0.0051) model time 0.2552 (0.2642) loss 4.2797 (5.3726) grad_norm 4.5305 (3.1141) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:34:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][130/625] eta 0:02:12 lr 0.000020 wd 0.0500 time 0.2602 (0.2683) data time 0.0005 (0.0048) model time 0.2597 (0.2632) loss 4.9995 (5.3714) grad_norm 4.7553 (3.1239) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:34:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO 
Train: [298/300][140/625] eta 0:02:09 lr 0.000020 wd 0.0500 time 0.2541 (0.2675) data time 0.0010 (0.0045) model time 0.2531 (0.2624) loss 6.0519 (5.3618) grad_norm 2.6207 (3.1033) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:34:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][150/625] eta 0:02:07 lr 0.000020 wd 0.0500 time 0.2572 (0.2687) data time 0.0006 (0.0043) model time 0.2565 (0.2647) loss 6.2039 (5.3570) grad_norm 2.2192 (3.0710) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:34:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][160/625] eta 0:02:04 lr 0.000020 wd 0.0500 time 0.2903 (0.2682) data time 0.0009 (0.0041) model time 0.2894 (0.2642) loss 5.1012 (5.3468) grad_norm 2.2245 (3.0592) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:34:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][170/625] eta 0:02:01 lr 0.000020 wd 0.0500 time 0.2559 (0.2675) data time 0.0010 (0.0039) model time 0.2548 (0.2634) loss 5.4846 (5.3349) grad_norm 1.8313 (3.0423) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:34:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][180/625] eta 0:01:58 lr 0.000020 wd 0.0500 time 0.2547 (0.2669) data time 0.0007 (0.0037) model time 0.2540 (0.2628) loss 5.1389 (5.3323) grad_norm 5.5317 (3.0352) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:34:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][190/625] eta 0:01:55 lr 0.000020 wd 0.0500 time 0.2503 (0.2663) data time 0.0009 (0.0036) model time 0.2494 (0.2624) loss 6.0920 (5.3472) grad_norm 2.0385 (3.0538) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:34:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][200/625] eta 0:01:53 lr 0.000020 wd 0.0500 time 0.3761 (0.2674) data time 0.0008 (0.0035) model time 0.3753 (0.2640) loss 5.5162 (5.3448) grad_norm 
1.5919 (3.0428) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:34:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][210/625] eta 0:01:50 lr 0.000020 wd 0.0500 time 0.2544 (0.2669) data time 0.0007 (0.0033) model time 0.2537 (0.2634) loss 4.3656 (5.3499) grad_norm 4.5741 (3.0639) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:34:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][220/625] eta 0:01:47 lr 0.000020 wd 0.0500 time 0.2576 (0.2664) data time 0.0005 (0.0032) model time 0.2570 (0.2629) loss 5.4009 (5.3446) grad_norm 2.0742 (3.1255) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:35:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][230/625] eta 0:01:45 lr 0.000020 wd 0.0500 time 0.2505 (0.2668) data time 0.0009 (0.0031) model time 0.2496 (0.2637) loss 5.0382 (5.3498) grad_norm 2.4539 (3.1206) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:35:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][240/625] eta 0:01:42 lr 0.000020 wd 0.0500 time 0.2559 (0.2672) data time 0.0009 (0.0030) model time 0.2550 (0.2643) loss 4.6333 (5.3384) grad_norm 2.6578 (3.2131) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:35:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][250/625] eta 0:01:40 lr 0.000020 wd 0.0500 time 0.2527 (0.2668) data time 0.0010 (0.0030) model time 0.2517 (0.2638) loss 6.2247 (5.3383) grad_norm 3.1939 (3.2076) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:35:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][260/625] eta 0:01:37 lr 0.000020 wd 0.0500 time 0.2602 (0.2665) data time 0.0007 (0.0029) model time 0.2595 (0.2635) loss 6.2789 (5.3342) grad_norm 4.1241 (3.2031) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:35:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][270/625] eta 
0:01:34 lr 0.000020 wd 0.0500 time 0.2596 (0.2661) data time 0.0016 (0.0028) model time 0.2580 (0.2632) loss 4.6114 (5.3212) grad_norm 2.1743 (3.4428) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:35:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][280/625] eta 0:01:31 lr 0.000020 wd 0.0500 time 0.2530 (0.2658) data time 0.0008 (0.0027) model time 0.2522 (0.2628) loss 5.8414 (5.3248) grad_norm 2.6249 (3.4338) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:35:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][290/625] eta 0:01:28 lr 0.000020 wd 0.0500 time 0.2546 (0.2654) data time 0.0008 (0.0027) model time 0.2538 (0.2625) loss 4.7059 (5.3205) grad_norm 3.8920 (3.4063) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:35:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][300/625] eta 0:01:26 lr 0.000020 wd 0.0500 time 0.2548 (0.2655) data time 0.0009 (0.0026) model time 0.2539 (0.2627) loss 5.6390 (5.3167) grad_norm 3.1055 (3.3969) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:35:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][310/625] eta 0:01:23 lr 0.000020 wd 0.0500 time 0.2565 (0.2652) data time 0.0007 (0.0026) model time 0.2558 (0.2624) loss 5.4004 (5.3186) grad_norm 3.5295 (3.3751) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:35:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][320/625] eta 0:01:20 lr 0.000020 wd 0.0500 time 0.2553 (0.2649) data time 0.0017 (0.0025) model time 0.2536 (0.2622) loss 5.1471 (5.3157) grad_norm 1.6947 (3.3708) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:35:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][330/625] eta 0:01:18 lr 0.000020 wd 0.0500 time 0.2539 (0.2647) data time 0.0010 (0.0025) model time 0.2530 (0.2619) loss 6.2101 (5.3183) grad_norm 2.3789 (3.3417) loss_scale 
256.0000 (256.0000) mem 9655MB [2024-08-04 11:35:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][340/625] eta 0:01:15 lr 0.000020 wd 0.0500 time 0.2595 (0.2645) data time 0.0008 (0.0024) model time 0.2587 (0.2617) loss 5.6812 (5.3145) grad_norm 1.9419 (3.3240) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:35:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][350/625] eta 0:01:13 lr 0.000020 wd 0.0500 time 0.4403 (0.2657) data time 0.0010 (0.0024) model time 0.4392 (0.2633) loss 5.2992 (5.3243) grad_norm 3.5594 (3.3376) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:35:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][360/625] eta 0:01:10 lr 0.000020 wd 0.0500 time 0.2577 (0.2655) data time 0.0011 (0.0023) model time 0.2566 (0.2630) loss 4.6917 (5.3235) grad_norm 2.8246 (3.3755) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:35:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][370/625] eta 0:01:07 lr 0.000020 wd 0.0500 time 0.2566 (0.2652) data time 0.0008 (0.0023) model time 0.2558 (0.2628) loss 4.7732 (5.3262) grad_norm 3.0850 (3.3640) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:35:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][380/625] eta 0:01:04 lr 0.000020 wd 0.0500 time 0.2595 (0.2651) data time 0.0009 (0.0023) model time 0.2586 (0.2626) loss 4.9994 (5.3239) grad_norm 2.2969 (3.3510) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:35:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][390/625] eta 0:01:02 lr 0.000020 wd 0.0500 time 0.2543 (0.2648) data time 0.0008 (0.0022) model time 0.2535 (0.2624) loss 5.7412 (5.3283) grad_norm 3.8565 (3.3421) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:35:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][400/625] eta 0:00:59 lr 0.000020 wd 
0.0500 time 0.2560 (0.2646) data time 0.0007 (0.0022) model time 0.2553 (0.2622) loss 4.5697 (5.3245) grad_norm 3.4513 (3.3342) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:35:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][410/625] eta 0:00:56 lr 0.000020 wd 0.0500 time 0.2578 (0.2644) data time 0.0008 (0.0022) model time 0.2570 (0.2620) loss 5.4211 (5.3233) grad_norm 2.7579 (3.3211) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:35:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][420/625] eta 0:00:54 lr 0.000020 wd 0.0500 time 0.2564 (0.2642) data time 0.0010 (0.0021) model time 0.2554 (0.2618) loss 5.0610 (5.3243) grad_norm 5.8644 (3.3164) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:35:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][430/625] eta 0:00:51 lr 0.000020 wd 0.0500 time 0.2565 (0.2640) data time 0.0012 (0.0021) model time 0.2553 (0.2616) loss 5.0971 (5.3280) grad_norm 5.0580 (3.3091) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:35:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][440/625] eta 0:00:48 lr 0.000020 wd 0.0500 time 0.2553 (0.2638) data time 0.0012 (0.0021) model time 0.2541 (0.2615) loss 5.7264 (5.3240) grad_norm 2.4036 (3.3445) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:35:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][450/625] eta 0:00:46 lr 0.000020 wd 0.0500 time 0.2508 (0.2637) data time 0.0009 (0.0021) model time 0.2499 (0.2613) loss 5.6273 (5.3285) grad_norm 2.0346 (3.3351) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:36:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][460/625] eta 0:00:43 lr 0.000020 wd 0.0500 time 0.2580 (0.2639) data time 0.0009 (0.0020) model time 0.2570 (0.2616) loss 5.9019 (5.3294) grad_norm 2.0113 (3.3124) loss_scale 256.0000 (256.0000) mem 9655MB 
[2024-08-04 11:36:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][470/625] eta 0:00:40 lr 0.000020 wd 0.0500 time 0.2559 (0.2637) data time 0.0010 (0.0020) model time 0.2549 (0.2615) loss 5.7659 (5.3386) grad_norm 1.8425 (3.3091) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:36:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][480/625] eta 0:00:38 lr 0.000020 wd 0.0500 time 0.2567 (0.2636) data time 0.0011 (0.0020) model time 0.2556 (0.2614) loss 6.1952 (5.3422) grad_norm 2.6019 (3.3019) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:36:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][490/625] eta 0:00:35 lr 0.000020 wd 0.0500 time 0.2506 (0.2634) data time 0.0010 (0.0020) model time 0.2496 (0.2612) loss 4.3605 (5.3451) grad_norm 1.8736 (3.2905) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:36:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][500/625] eta 0:00:32 lr 0.000020 wd 0.0500 time 0.2561 (0.2633) data time 0.0008 (0.0019) model time 0.2553 (0.2611) loss 5.0747 (5.3428) grad_norm 2.8972 (3.2806) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:36:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][510/625] eta 0:00:30 lr 0.000020 wd 0.0500 time 0.2531 (0.2632) data time 0.0010 (0.0019) model time 0.2521 (0.2610) loss 6.3071 (5.3487) grad_norm 2.9974 (3.2794) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:36:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][520/625] eta 0:00:27 lr 0.000020 wd 0.0500 time 0.2585 (0.2635) data time 0.0008 (0.0019) model time 0.2577 (0.2613) loss 5.0523 (5.3436) grad_norm 3.0901 (3.2749) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:36:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][530/625] eta 0:00:25 lr 0.000020 wd 0.0500 time 0.2592 (0.2633) data 
time 0.0006 (0.0019) model time 0.2586 (0.2612) loss 4.3808 (5.3414) grad_norm 2.6591 (3.2728) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:36:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][540/625] eta 0:00:22 lr 0.000020 wd 0.0500 time 0.2567 (0.2632) data time 0.0009 (0.0019) model time 0.2558 (0.2611) loss 5.4950 (5.3421) grad_norm 2.8280 (3.2717) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:36:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][550/625] eta 0:00:19 lr 0.000020 wd 0.0500 time 0.2558 (0.2633) data time 0.0009 (0.0019) model time 0.2548 (0.2612) loss 5.4601 (5.3455) grad_norm 2.4543 (3.2601) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:36:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][560/625] eta 0:00:17 lr 0.000020 wd 0.0500 time 0.2552 (0.2635) data time 0.0012 (0.0018) model time 0.2540 (0.2614) loss 5.2843 (5.3425) grad_norm 4.6222 (3.2615) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:36:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][570/625] eta 0:00:14 lr 0.000020 wd 0.0500 time 0.2544 (0.2634) data time 0.0007 (0.0018) model time 0.2538 (0.2613) loss 5.2884 (5.3440) grad_norm 2.2060 (3.2757) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:36:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][580/625] eta 0:00:11 lr 0.000020 wd 0.0500 time 0.2564 (0.2635) data time 0.0007 (0.0018) model time 0.2557 (0.2615) loss 5.5175 (5.3461) grad_norm 3.0243 (3.2766) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:36:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][590/625] eta 0:00:09 lr 0.000020 wd 0.0500 time 0.2568 (0.2634) data time 0.0011 (0.0018) model time 0.2557 (0.2613) loss 4.7258 (5.3474) grad_norm 3.5810 (3.2672) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:36:38 
vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][600/625] eta 0:00:06 lr 0.000020 wd 0.0500 time 0.2546 (0.2636) data time 0.0011 (0.0018) model time 0.2536 (0.2616) loss 6.0458 (5.3488) grad_norm 4.8255 (3.2617) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:36:41 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][610/625] eta 0:00:03 lr 0.000020 wd 0.0500 time 0.2521 (0.2635) data time 0.0004 (0.0018) model time 0.2517 (0.2615) loss 4.5361 (5.3482) grad_norm 2.7477 (3.3046) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:36:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][620/625] eta 0:00:01 lr 0.000020 wd 0.0500 time 0.2540 (0.2633) data time 0.0005 (0.0018) model time 0.2534 (0.2614) loss 5.6149 (5.3470) grad_norm 2.2855 (3.2897) loss_scale 256.0000 (256.0000) mem 9655MB [2024-08-04 11:36:44 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 298 training takes 0:02:44 [2024-08-04 11:36:44 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving...... [2024-08-04 11:36:45 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!! 
[2024-08-04 11:36:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.549 (0.549) Loss 0.5996 (0.5996) Acc@1 90.479 (90.479) Acc@5 98.779 (98.779) Mem 9655MB
[2024-08-04 11:36:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.102) Loss 0.9028 (0.7124) Acc@1 81.885 (87.380) Acc@5 96.875 (97.909) Mem 9655MB
[2024-08-04 11:36:46 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.079) Loss 0.9971 (0.8273) Acc@1 78.955 (84.331) Acc@5 95.703 (96.789) Mem 9655MB
[2024-08-04 11:36:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.015 Acc@5 96.803
[2024-08-04 11:36:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.0%
[2024-08-04 11:36:47 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.685 (0.685) Loss 0.5938 (0.5938) Acc@1 90.332 (90.332) Acc@5 98.828 (98.828) Mem 9655MB
[2024-08-04 11:36:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.127) Loss 0.8926 (0.7071) Acc@1 82.178 (87.282) Acc@5 96.729 (97.896) Mem 9655MB
[2024-08-04 11:36:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 0.9961 (0.8228) Acc@1 78.857 (84.261) Acc@5 95.557 (96.773) Mem 9655MB
[2024-08-04 11:36:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.919 Acc@5 96.783
[2024-08-04 11:36:49 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9%
[2024-08-04 11:36:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][0/625] eta 0:11:17 lr 0.000020 wd 0.0500 time 1.0840 (1.0840) data time 0.6825 (0.6825) model time 0.0000 (0.0000) loss 4.3196 (4.3196) grad_norm 1.8378 (1.8378) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:36:52 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][10/625] eta 0:03:23 lr 0.000020 wd 0.0500 time 0.2554 (0.3315) data time 0.0009 (0.0629) model time 0.0000 (0.0000) loss 4.9172 (5.2256) grad_norm 3.6634 (3.0499) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:36:55 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][20/625] eta 0:02:58 lr 0.000020 wd 0.0500 time 0.2579 (0.2954) data time 0.0007 (0.0334) model time 0.0000 (0.0000) loss 5.4771 (5.2593) grad_norm 2.6426 (2.8448) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:36:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][30/625] eta 0:02:54 lr 0.000020 wd 0.0500 time 0.2580 (0.2937) data time 0.0005 (0.0229) model time 0.0000 (0.0000) loss 5.6062 (5.2116) grad_norm 2.4652 (2.7901) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:37:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][40/625] eta 0:02:51 lr 0.000020 wd 0.0500 time 0.2532 (0.2940) data time 0.0009 (0.0175) model time 0.0000 (0.0000) loss 5.8225 (5.2776) grad_norm 3.5080 (2.7217) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:37:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][50/625] eta 0:02:45 lr 0.000020 wd 0.0500 time 0.2588 (0.2871) data time 0.0008 (0.0143) model time 0.0000 (0.0000) loss 6.2130 (5.2649) grad_norm 2.0139 (2.7184) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:37:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][60/625] eta 0:02:40 lr 0.000020 wd 0.0500 time 0.2593 (0.2838) data time 0.0007 (0.0121) model time 0.2585 (0.2663) loss 4.1307 (5.2435) grad_norm 4.3344 (2.9902) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:37:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][70/625] eta 0:02:35 lr 0.000020 wd 0.0500 time 0.2515 (0.2799) data time 0.0011 (0.0105) model time 0.2505 (0.2607) loss 5.7111 (5.2487) grad_norm 3.0395 (3.0437) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:37:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][80/625] eta 0:02:32 lr 0.000020 wd 0.0500 time 0.2560 (0.2794) data time 0.0010 (0.0093) model time 0.2550 (0.2654) loss 5.3183 (5.2529) grad_norm 2.0098 (3.0667) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:37:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][90/625] eta 0:02:28 lr 0.000020 wd 0.0500 time 0.2544 (0.2768) data time 0.0009 (0.0084) model time 0.2535 (0.2626) loss 5.8316 (5.2781) grad_norm 4.0416 (3.0369) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:37:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][100/625] eta 0:02:24 lr 0.000020 wd 0.0500 time 0.2559 (0.2746) data time 0.0010 (0.0077) model time 0.2549 (0.2610) loss 4.3409 (5.2726) grad_norm 3.7650 (3.1123) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:37:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][110/625] eta 0:02:20 lr 0.000020 wd 0.0500 time 0.2604 (0.2729) data time 0.0006 (0.0071) model time 0.2598 (0.2599) loss 4.8462 (5.2821) grad_norm 5.3827 (3.1546) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:37:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][120/625] eta 0:02:17 lr 0.000020 wd 0.0500 time 0.2531 (0.2714) data time 0.0009 (0.0066) model time 0.2522 (0.2590) loss 5.1626 (5.3032) grad_norm 3.6806 (3.1365) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:37:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][130/625] eta 0:02:13 lr 0.000020 wd 0.0500 time 0.2517 (0.2702) data time 0.0008 (0.0061) model time 0.2509 (0.2584) loss 4.9371 (5.2990) grad_norm 1.9065 (3.1224) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:37:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][140/625] eta 0:02:10 lr 0.000020 wd 0.0500 time 0.2518 (0.2691) data time 0.0016 (0.0058) model time 0.2502 (0.2580) loss 4.8912 (5.3066) grad_norm 3.7541 (3.1094) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:37:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][150/625] eta 0:02:07 lr 0.000020 wd 0.0500 time 0.2463 (0.2682) data time 0.0009 (0.0055) model time 0.2454 (0.2576) loss 5.2861 (5.3151) grad_norm 2.7174 (3.1094) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:37:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][160/625] eta 0:02:04 lr 0.000020 wd 0.0500 time 0.2611 (0.2686) data time 0.0009 (0.0052) model time 0.2602 (0.2591) loss 4.6904 (5.3147) grad_norm 3.8520 (3.1078) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:37:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][170/625] eta 0:02:01 lr 0.000020 wd 0.0500 time 0.2582 (0.2679) data time 0.0011 (0.0049) model time 0.2571 (0.2588) loss 5.4765 (5.3262) grad_norm 3.1172 (3.0920) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:37:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][180/625] eta 0:01:59 lr 0.000020 wd 0.0500 time 0.2542 (0.2694) data time 0.0009 (0.0047) model time 0.2533 (0.2615) loss 5.8367 (5.3186) grad_norm 4.8599 (3.0899) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:37:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][190/625] eta 0:01:56 lr 0.000020 wd 0.0500 time 0.2576 (0.2687) data time 0.0009 (0.0045) model time 0.2567 (0.2611) loss 5.3280 (5.3085) grad_norm 2.7795 (3.0973) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:37:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][200/625] eta 0:01:53 lr 0.000020 wd 0.0500 time 0.2660 (0.2682) data time 0.0010 (0.0044) model time 0.2649 (0.2608) loss 5.6793 (5.3160) grad_norm 2.4824 (3.0825) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:37:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][210/625] eta 0:01:51 lr 0.000020 wd 0.0500 time 0.2575 (0.2676) data time 0.0009 (0.0042) model time 0.2567 (0.2604) loss 4.9639 (5.3131) grad_norm 3.1983 (3.0744) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:37:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][220/625] eta 0:01:48 lr 0.000020 wd 0.0500 time 0.2549 (0.2670) data time 0.0010 (0.0041) model time 0.2539 (0.2600) loss 5.4190 (5.3160) grad_norm 2.7759 (3.0854) loss_scale 256.0000 (256.0000) mem 9655MB
[2024-08-04 11:37:50 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][230/625] eta 0:01:45 lr 0.000020 wd 0.0500 time 0.2548 (0.2665) data time 0.0009 (0.0039) model time 0.2540 (0.2597) loss 5.2392 (5.3246) grad_norm 2.0110 (3.1026) loss_scale 512.0000 (264.8658) mem 9655MB
[2024-08-04 11:37:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][240/625] eta 0:01:42 lr 0.000020 wd 0.0500 time 0.2542 (0.2668) data time 0.0007 (0.0038) model time 0.2534 (0.2604) loss 4.3656 (5.3109) grad_norm 2.8359 (3.0971) loss_scale 512.0000 (275.1203) mem 9655MB
[2024-08-04 11:37:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][250/625] eta 0:01:39 lr 0.000020 wd 0.0500 time 0.2518 (0.2664) data time 0.0008 (0.0037) model time 0.2510 (0.2601) loss 5.0161 (5.3114) grad_norm 3.2260 (3.0771) loss_scale 512.0000 (284.5578) mem 9655MB
[2024-08-04 11:37:58 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][260/625] eta 0:01:37 lr 0.000020 wd 0.0500 time 0.2552 (0.2660) data time 0.0007 (0.0036) model time 0.2546 (0.2599) loss 4.0871 (5.3179) grad_norm 2.0967 (3.1110) loss_scale 512.0000 (293.2720) mem 9655MB
[2024-08-04 11:38:01 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][270/625] eta 0:01:34 lr 0.000020 wd 0.0500 time 0.2610 (0.2657) data time 0.0009 (0.0035) model time 0.2601 (0.2597) loss 5.0921 (5.3087) grad_norm 1.8059 (3.1015) loss_scale 512.0000 (301.3432) mem 9655MB
[2024-08-04 11:38:03 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][280/625] eta 0:01:31 lr 0.000020 wd 0.0500 time 0.2526 (0.2654) data time 0.0010 (0.0034) model time 0.2516 (0.2596) loss 4.5134 (5.3042) grad_norm 2.2457 (3.0873) loss_scale 512.0000 (308.8399) mem 9655MB
[2024-08-04 11:38:06 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][290/625] eta 0:01:28 lr 0.000020 wd 0.0500 time 0.2594 (0.2651) data time 0.0006 (0.0033) model time 0.2588 (0.2594) loss 5.5720 (5.3094) grad_norm 3.0954 (3.1064) loss_scale 512.0000 (315.8213) mem 9655MB
[2024-08-04 11:38:08 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][300/625] eta 0:01:26 lr 0.000020 wd 0.0500 time 0.2583 (0.2647) data time 0.0006 (0.0032) model time 0.2577 (0.2592) loss 6.0102 (5.3180) grad_norm 3.1824 (3.0859) loss_scale 512.0000 (322.3389) mem 9655MB
[2024-08-04 11:38:11 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][310/625] eta 0:01:23 lr 0.000020 wd 0.0500 time 0.2548 (0.2644) data time 0.0008 (0.0032) model time 0.2540 (0.2590) loss 6.2804 (5.3294) grad_norm 2.2518 (3.0671) loss_scale 512.0000 (328.4373) mem 9655MB
[2024-08-04 11:38:14 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][320/625] eta 0:01:20 lr 0.000020 wd 0.0500 time 0.2569 (0.2642) data time 0.0009 (0.0031) model time 0.2560 (0.2589) loss 5.1323 (5.3303) grad_norm 3.8579 (3.0526) loss_scale 512.0000 (334.1558) mem 9655MB
[2024-08-04 11:38:16 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][330/625] eta 0:01:18 lr 0.000020 wd 0.0500 time 0.2627 (0.2646) data time 0.0008 (0.0030) model time 0.2619 (0.2595) loss 4.8666 (5.3308) grad_norm 4.7489 (3.0646) loss_scale 512.0000 (339.5287) mem 9655MB
[2024-08-04 11:38:19 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][340/625] eta 0:01:15 lr 0.000020 wd 0.0500 time 0.2537 (0.2643) data time 0.0008 (0.0030) model time 0.2529 (0.2593) loss 5.0596 (5.3231) grad_norm 2.6844 (3.0676) loss_scale 512.0000 (344.5865) mem 9655MB
[2024-08-04 11:38:22 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][350/625] eta 0:01:12 lr 0.000020 wd 0.0500 time 0.2522 (0.2645) data time 0.0007 (0.0029) model time 0.2515 (0.2597) loss 5.8072 (5.3328) grad_norm 3.7628 (3.0732) loss_scale 512.0000 (349.3561) mem 9655MB
[2024-08-04 11:38:24 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][360/625] eta 0:01:10 lr 0.000020 wd 0.0500 time 0.2560 (0.2643) data time 0.0009 (0.0029) model time 0.2550 (0.2596) loss 5.4008 (5.3402) grad_norm 4.1871 (3.0715) loss_scale 512.0000 (353.8615) mem 9655MB
[2024-08-04 11:38:27 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][370/625] eta 0:01:07 lr 0.000020 wd 0.0500 time 0.2615 (0.2646) data time 0.0008 (0.0028) model time 0.2607 (0.2600) loss 5.8823 (5.3348) grad_norm 3.1298 (3.0883) loss_scale 512.0000 (358.1240) mem 9655MB
[2024-08-04 11:38:29 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][380/625] eta 0:01:04 lr 0.000020 wd 0.0500 time 0.2620 (0.2643) data time 0.0008 (0.0028) model time 0.2612 (0.2598) loss 6.0618 (5.3372) grad_norm 2.4500 (3.0821) loss_scale 512.0000 (362.1627) mem 9655MB
[2024-08-04 11:38:32 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][390/625] eta 0:01:02 lr 0.000020 wd 0.0500 time 0.2524 (0.2641) data time 0.0009 (0.0027) model time 0.2515 (0.2597) loss 5.8643 (5.3369) grad_norm 2.0142 (3.0690) loss_scale 512.0000 (365.9949) mem 9655MB
[2024-08-04 11:38:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][400/625] eta 0:00:59 lr 0.000020 wd 0.0500 time 0.2526 (0.2644) data time 0.0011 (0.0027) model time 0.2515 (0.2602) loss 5.4819 (5.3398) grad_norm 3.3523 (3.0595) loss_scale 512.0000 (369.6359) mem 9655MB
[2024-08-04 11:38:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][410/625] eta 0:00:56 lr 0.000020 wd 0.0500 time 0.2537 (0.2643) data time 0.0010 (0.0026) model time 0.2527 (0.2600) loss 5.7816 (5.3363) grad_norm 3.7816 (3.0524) loss_scale 512.0000 (373.0998) mem 9655MB
[2024-08-04 11:38:40 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][420/625] eta 0:00:54 lr 0.000020 wd 0.0500 time 0.2584 (0.2650) data time 0.0008 (0.0026) model time 0.2576 (0.2610) loss 5.9436 (5.3355) grad_norm 1.8828 (3.0401) loss_scale 512.0000 (376.3990) mem 9655MB
[2024-08-04 11:38:43 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][430/625] eta 0:00:51 lr 0.000020 wd 0.0500 time 0.2562 (0.2648) data time 0.0008 (0.0026) model time 0.2554 (0.2608) loss 6.1214 (5.3369) grad_norm 2.1706 (3.0428) loss_scale 512.0000 (379.5452) mem 9655MB
[2024-08-04 11:38:45 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][440/625] eta 0:00:48 lr 0.000020 wd 0.0500 time 0.2568 (0.2646) data time 0.0010 (0.0025) model time 0.2559 (0.2607) loss 6.1202 (5.3355) grad_norm 2.5039 (3.0435) loss_scale 512.0000 (382.5488) mem 9655MB
[2024-08-04 11:38:48 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][450/625] eta 0:00:46 lr 0.000020 wd 0.0500 time 0.2559 (0.2648) data time 0.0006 (0.0025) model time 0.2554 (0.2610) loss 6.3904 (5.3370) grad_norm 2.7406 (3.0388) loss_scale 512.0000 (385.4191) mem 9655MB
[2024-08-04 11:38:51 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][460/625] eta 0:00:43 lr 0.000020 wd 0.0500 time 0.2503 (0.2651) data time 0.0010 (0.0025) model time 0.2493 (0.2613) loss 5.8959 (5.3388) grad_norm 4.2961 (3.0379) loss_scale 512.0000 (388.1649) mem 9655MB
[2024-08-04 11:38:53 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][470/625] eta 0:00:41 lr 0.000020 wd 0.0500 time 0.2552 (0.2649) data time 0.0006 (0.0024) model time 0.2545 (0.2612) loss 5.7248 (5.3328) grad_norm 2.6884 (3.0298) loss_scale 512.0000 (390.7941) mem 9655MB
[2024-08-04 11:38:56 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][480/625] eta 0:00:38 lr 0.000020 wd 0.0500 time 0.2558 (0.2651) data time 0.0012 (0.0024) model time 0.2546 (0.2615) loss 4.7664 (5.3331) grad_norm 1.7885 (3.0152) loss_scale 512.0000 (393.3139) mem 9655MB
[2024-08-04 11:38:59 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][490/625] eta 0:00:35 lr 0.000020 wd 0.0500 time 0.2555 (0.2653) data time 0.0007 (0.0024) model time 0.2548 (0.2618) loss 5.5947 (5.3359) grad_norm 1.9551 (3.0125) loss_scale 512.0000 (395.7312) mem 9655MB
[2024-08-04 11:39:02 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][500/625] eta 0:00:33 lr 0.000020 wd 0.0500 time 0.2560 (0.2651) data time 0.0009 (0.0023) model time 0.2551 (0.2617) loss 5.0131 (5.3356) grad_norm 3.0976 (3.0127) loss_scale 512.0000 (398.0519) mem 9655MB
[2024-08-04 11:39:04 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][510/625] eta 0:00:30 lr 0.000020 wd 0.0500 time 0.2651 (0.2654) data time 0.0010 (0.0023) model time 0.2641 (0.2620) loss 5.7692 (5.3331) grad_norm 3.2472 (3.0102) loss_scale 512.0000 (400.2818) mem 9655MB
[2024-08-04 11:39:07 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][520/625] eta 0:00:27 lr 0.000020 wd 0.0500 time 0.2558 (0.2652) data time 0.0009 (0.0023) model time 0.2549 (0.2619) loss 5.1614 (5.3319) grad_norm 2.1738 (3.0079) loss_scale 512.0000 (402.4261) mem 9655MB
[2024-08-04 11:39:09 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][530/625] eta 0:00:25 lr 0.000020 wd 0.0500 time 0.2611 (0.2650) data time 0.0006 (0.0022) model time 0.2605 (0.2617) loss 4.9311 (5.3328) grad_norm 3.2901 (3.0002) loss_scale 512.0000 (404.4896) mem 9655MB
[2024-08-04 11:39:12 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][540/625] eta 0:00:22 lr 0.000020 wd 0.0500 time 0.2595 (0.2649) data time 0.0006 (0.0022) model time 0.2589 (0.2616) loss 4.7297 (5.3329) grad_norm 2.5560 (3.0133) loss_scale 512.0000 (406.4769) mem 9655MB
[2024-08-04 11:39:15 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][550/625] eta 0:00:19 lr 0.000020 wd 0.0500 time 0.2593 (0.2647) data time 0.0008 (0.0022) model time 0.2585 (0.2614) loss 5.5886 (5.3270) grad_norm 3.1451 (3.0325) loss_scale 512.0000 (408.3920) mem 9655MB
[2024-08-04 11:39:17 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][560/625] eta 0:00:17 lr 0.000020 wd 0.0500 time 0.2567 (0.2648) data time 0.0007 (0.0022) model time 0.2560 (0.2616) loss 5.1413 (5.3219) grad_norm 4.1704 (3.0360) loss_scale 512.0000 (410.2389) mem 9655MB
[2024-08-04 11:39:20 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][570/625] eta 0:00:14 lr 0.000020 wd 0.0500 time 0.2552 (0.2647) data time 0.0009 (0.0022) model time 0.2544 (0.2615) loss 4.3962 (5.3181) grad_norm 3.8897 (3.0382) loss_scale 512.0000 (412.0210) mem 9655MB
[2024-08-04 11:39:23 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][580/625] eta 0:00:11 lr 0.000020 wd 0.0500 time 0.2565 (0.2647) data time 0.0008 (0.0021) model time 0.2557 (0.2616) loss 4.6308 (5.3205) grad_norm 2.3043 (inf) loss_scale 256.0000 (411.5387) mem 9655MB
[2024-08-04 11:39:25 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][590/625] eta 0:00:09 lr 0.000020 wd 0.0500 time 0.2557 (0.2646) data time 0.0010 (0.0021) model time 0.2547 (0.2615) loss 4.6109 (5.3164) grad_norm 2.5876 (inf) loss_scale 256.0000 (408.9069) mem 9655MB
[2024-08-04 11:39:28 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][600/625] eta 0:00:06 lr 0.000020 wd 0.0500 time 0.2546 (0.2645) data time 0.0008 (0.0021) model time 0.2538 (0.2614) loss 4.5838 (5.3193) grad_norm 2.3370 (inf) loss_scale 256.0000 (406.3627) mem 9655MB
[2024-08-04 11:39:30 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][610/625] eta 0:00:03 lr 0.000020 wd 0.0500 time 0.2520 (0.2644) data time 0.0006 (0.0021) model time 0.2513 (0.2613) loss 5.4142 (5.3237) grad_norm 2.5382 (inf) loss_scale 256.0000 (403.9018) mem 9655MB
[2024-08-04 11:39:33 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][620/625] eta 0:00:01 lr 0.000020 wd 0.0500 time 0.2531 (0.2642) data time 0.0004 (0.0021) model time 0.2527 (0.2612) loss 6.1568 (5.3291) grad_norm 3.5600 (inf) loss_scale 256.0000 (401.5201) mem 9655MB
[2024-08-04 11:39:34 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 394): INFO EPOCH 299 training takes 0:02:45
[2024-08-04 11:39:34 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saving......
[2024-08-04 11:39:34 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/latest_ckpt.pth saved !!!
[2024-08-04 11:39:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.509 (0.509) Loss 0.6079 (0.6079) Acc@1 90.332 (90.332) Acc@5 98.828 (98.828) Mem 9655MB
[2024-08-04 11:39:35 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.097) Loss 0.9038 (0.7184) Acc@1 81.885 (87.358) Acc@5 96.729 (97.874) Mem 9655MB
[2024-08-04 11:39:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.077) Loss 0.9980 (0.8320) Acc@1 79.297 (84.338) Acc@5 95.801 (96.794) Mem 9655MB
[2024-08-04 11:39:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.025 Acc@5 96.809
[2024-08-04 11:39:36 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.0%
[2024-08-04 11:39:37 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.806 (0.806) Loss 0.5938 (0.5938) Acc@1 90.283 (90.283) Acc@5 98.828 (98.828) Mem 9655MB
[2024-08-04 11:39:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.055 (0.128) Loss 0.8926 (0.7071) Acc@1 82.178 (87.287) Acc@5 96.631 (97.883) Mem 9655MB
[2024-08-04 11:39:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.055 (0.093) Loss 0.9961 (0.8229) Acc@1 78.906 (84.282) Acc@5 95.605 (96.768) Mem 9655MB
[2024-08-04 11:39:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.941 Acc@5 96.785
[2024-08-04 11:39:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9%
[2024-08-04 11:39:38 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.94%
[2024-08-04 11:39:38 vssd_mesa_retrain_tiny_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saving......
[2024-08-04 11:39:39 vssd_mesa_retrain_tiny_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_tiny_e300/20240725135109/best_ckpt_ema.pth saved !!!
[2024-08-04 11:39:39 vssd_mesa_retrain_tiny_e300] (main_hfai_mnodes.py 291): INFO Training time 7:51:20